Google Summer of Code 2026

List of project ideas for contributors applying to the Google Summer of Code program in 2026 (GSoC 2026).

About CocoIndex

CocoIndex is an ultra-performant data transformation framework for AI, with its core engine written in Rust. It helps keep AI systems fresh and reliable through incremental processing, at any scale.

Repository: github.com/cocoindex-io/cocoindex
Documentation: cocoindex.io/docs
License: Apache 2.0

Timeline/Milestones

Please always refer to the official Google Summer of Code timeline.

Application Process

0. Get Familiar with GSoC

First of all, if you have not done so yet, read the GSoC contributor guide, which explains the application process and how the program works overall. Use its left-side menu to jump to the sections that interest you most, although we recommend reading everything.

1. Discuss the Project Idea with the Mentor(s)

This step is required unless you have dived into the existing codebase, understood everything perfectly (very hard), and the idea you prefer is on the list below.

If your idea is not listed, please discuss it with the mentors in the available contact channels. We're always open to new ideas and won't hesitate to choose them if you demonstrate that you are a good candidate!

2. Understand That

You're committing to a project, and we may ask you to publish your weekly progress on it.
We will ask you to give feedback on our mentorship and guidance.
You wholeheartedly agree with our community values of being inclusive, welcoming, and supportive.
You must tell us if there's any proposed idea that you don't think would fit the timeline or could be boring (yes, we're asking for feedback).

3. Fill Out the Application Form

We recommend following Google's guide to Writing a Proposal. We won't be too strict about the format and we won't provide a template, but hey, we're giving you a starting point!

You can send the proposal in any readable format you wish (Google Docs, plain text, Markdown, etc.), preferably hosted online and accessible in a common browser without downloading anything.

We highly recommend asking the community or prospective mentors for a review at any time before the contributor application deadline. It's much easier to get feedback early than to wait until the last moment.

Project Ideas

You can also propose your own ideas!

JavaScript/TypeScript SDK for CocoIndex

Skills: Rust, TypeScript/JavaScript, Node.js, napi-rs or wasm-bindgen, npm packaging

Expected size of the project: Large (~350 hours)

Difficulty rating: Hard

Description:

CocoIndex currently provides a Python SDK that wraps its high-performance Rust core engine. This project aims to bring CocoIndex to the JavaScript/TypeScript ecosystem by building a complete SDK that enables Node.js developers to use CocoIndex's incremental data transformation capabilities.

The project involves three major components:

Rust Bindings for Node.js: Create native Node.js bindings using napi-rs (recommended) or WebAssembly. This mirrors the existing Python bindings and exposes the core engine to JavaScript.
TypeScript/JavaScript Library: Build the high-level SDK that provides an idiomatic JavaScript/TypeScript API, including function decorators/wrappers, type-safe target state declarations, async/await integration, and full TypeScript type definitions.
Node.js Examples: Create example applications demonstrating common use cases (file processing, database sync, etc.) that run on Node.js.

Expected outcomes:

Rust crate with napi-rs bindings exposing the core engine to Node.js
TypeScript package with idiomatic APIs matching Python SDK patterns
10 working examples
npm package published and installable
Documentation for getting started with the JS/TS SDK
CI/CD pipeline for building and testing the bindings

Possible mentors:

George He - georgehe0, cofounder & maintainer of CocoIndex, ex-Google Infra lead
Linghua Jin - badmonster0, cofounder & maintainer of CocoIndex, ex-Google Tech lead

Resources:

napi-rs documentation
Existing Python SDK related code
PyO3 documentation - understand current binding patterns

Incremental Codebase RAG Engine with MCP Server

Skills: Python, code parsing (tree-sitter), vector databases, LLM APIs, MCP (Model Context Protocol)

Expected size of the project: Medium (~175 hours)

Difficulty rating: Medium

Description:

Build an intelligent code understanding engine powered by CocoIndex that extracts, indexes, and maintains a knowledge graph from Python codebases. The engine combines structural code analysis with LLM-powered summarization to enable AI agents to understand and reason about code.

Key capabilities:

Structured Code Extraction: Parse Python codebases using tree-sitter to extract coarse-grained entities (classes and functions) and their relationships (imports, calls). Extract existing docstrings and comments.
LLM Summarization: Generate summaries for classes and functions using LLM APIs, providing semantic understanding of what each code component does.
Incremental Updates: Leverage CocoIndex's incremental processing to efficiently update the knowledge graph when code changes—only re-parse and re-summarize modified entities.
MCP Server for AI Agents: Expose the indexed knowledge through a Model Context Protocol (MCP) server with a focused set of tools (e.g., search entities, get entity details, list relationships, get file overview).
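The extraction and incremental-update capabilities above can be prototyped before touching tree-sitter or CocoIndex. The following is a minimal sketch under stated assumptions, not the project's design. It swaps in Python's built-in `ast` module for tree-sitter, which would still be needed for the multi-language stretch goal. The names `Entity`, `extract`, `refresh_summaries`, and `fake_llm` are hypothetical. Each entity is fingerprinted by hashing its source text, so on the next run only new or changed entities go to the (stubbed) LLM. In the real project, that bookkeeping is what CocoIndex's incremental processing is meant to handle.

```python
from __future__ import annotations

import ast
import hashlib
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Entity:
    kind: str                 # "class" or "function"
    name: str                 # qualified name, e.g. "Greeter.greet"
    docstring: Optional[str]  # existing docstring, if any
    fingerprint: str          # SHA-256 of the entity's source text
    calls: set[str] = field(default_factory=set)  # names of directly called functions


def extract(source: str) -> tuple[list[Entity], set[str]]:
    """Return the classes/functions defined in one file and the modules it imports."""
    tree = ast.parse(source)
    imports: set[str] = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            imports.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            imports.add(node.module)

    entities: list[Entity] = []

    def visit(parent: ast.AST, prefix: str) -> None:
        for child in ast.iter_child_nodes(parent):
            if not isinstance(child, (ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef)):
                continue
            name = prefix + child.name
            text = ast.get_source_segment(source, child) or ""
            # Only plain calls like `helper(x)` are recorded; `obj.method(x)` is skipped.
            calls = {
                n.func.id
                for n in ast.walk(child)
                if isinstance(n, ast.Call) and isinstance(n.func, ast.Name)
            }
            entities.append(Entity(
                kind="class" if isinstance(child, ast.ClassDef) else "function",
                name=name,
                docstring=ast.get_docstring(child),
                fingerprint=hashlib.sha256(text.encode()).hexdigest(),
                calls=calls,
            ))
            visit(child, name + ".")  # descend into methods and nested definitions

    visit(tree, "")
    return entities, imports


def refresh_summaries(
    entities: list[Entity],
    cache: dict[str, tuple[str, str]],
    summarize: Callable[[Entity], str],
) -> tuple[dict[str, tuple[str, str]], list[str]]:
    """Reuse cached summaries for unchanged entities; summarize only new or changed ones.

    Entities that no longer exist simply drop out of the returned cache.
    """
    new_cache: dict[str, tuple[str, str]] = {}
    resummarized: list[str] = []
    for e in entities:
        cached = cache.get(e.name)
        if cached is not None and cached[0] == e.fingerprint:
            new_cache[e.name] = cached
        else:
            new_cache[e.name] = (e.fingerprint, summarize(e))
            resummarized.append(e.name)
    return new_cache, resummarized


def fake_llm(entity: Entity) -> str:
    """Placeholder for a real LLM summarization call."""
    return f"{entity.kind} {entity.name}"


v1 = '''
import os

class Greeter:
    """Says hello."""
    def greet(self, who):
        return fmt(who)

def fmt(who):
    return "hi " + who
'''
v2 = v1.replace('"hi "', '"hello "')  # edit only the body of fmt

cache, changed = refresh_summaries(extract(v1)[0], {}, fake_llm)
cache, changed = refresh_summaries(extract(v2)[0], cache, fake_llm)
print(changed)  # ['fmt']
```

In this sketch, editing a method also changes the fingerprint of its enclosing class, so the class summary is regenerated too. That roughly matches the idea of summaries derived from their components, at the cost of some extra LLM calls.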

Expected outcomes:

Code parsing pipeline for Python using tree-sitter
Knowledge graph capturing classes, functions, relationships, and summaries
Incremental indexing that efficiently handles code changes
MCP server with 3-4 essential tools for code understanding
Example integration showing the MCP server working with an AI agent
Documentation and usage guide

Optional/Stretch goals:

Additional language support (TypeScript, Rust)
Hierarchical aggregation (module-level summaries derived from their components)

Possible mentors:

George He - georgehe0, cofounder & maintainer of CocoIndex, ex-Google Infra lead
Linghua Jin - badmonster0, cofounder & maintainer of CocoIndex, ex-Google Tech lead

Resources:

Tree-sitter - incremental parsing library
Model Context Protocol (MCP) - protocol for AI tool integration
CocoIndex documentation - understanding incremental processing

Incremental GraphRAG Implementation

Skills: Python, graph algorithms, LLM APIs, understanding of RAG systems

Expected size of the project: Medium (~175 hours)

Difficulty rating: Medium

Description:

Implement Microsoft's GraphRAG technique using CocoIndex, creating a GraphRAG system that supports incremental processing. GraphRAG enhances retrieval-augmented generation by building a knowledge graph from documents—extracting entities and relationships, detecting communities, and generating summaries at multiple levels of abstraction.

GraphRAG involves multiple processing stages with different incrementalization characteristics:

Easy to incrementalize: Entity/relationship extraction, text chunking, and per-chunk processing can run independently on each input document.
Hard to incrementalize: Community detection and global summarization require a holistic view of the entire graph, making incremental updates challenging.

Scope:

Implement the full GraphRAG pipeline with CocoIndex. Incrementalize the straightforward stages (chunking, entity extraction, relationship extraction)—this is where most processing cost lies. For global stages (community detection, hierarchical summarization), rerun the entire stage when any input changes. Expose query capabilities through a simple MCP server.
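A minimal sketch of the split described in this scope follows. Per-document extraction is cached by content hash and recomputed only for changed documents. The global stage is rerun from scratch whenever any document is added, changed, or removed. Two stand-ins here are assumptions, not GraphRAG's method. First, `extract_graph` links capitalized words that share a sentence, whereas GraphRAG uses an LLM for extraction. Second, `communities` returns connected components, whereas Microsoft's GraphRAG uses Leiden community detection. The class name `IncrementalGraphRAG` is hypothetical.

```python
from __future__ import annotations

import hashlib
import itertools
import re
from typing import Optional


def extract_graph(text: str) -> set[tuple[str, str]]:
    """Per-document stage (toy stand-in for LLM extraction).

    Capitalized words in the same sentence are treated as related entities.
    """
    edges: set[tuple[str, str]] = set()
    for sentence in re.split(r"[.!?]", text):
        names = sorted(set(re.findall(r"\b[A-Z][a-z]+\b", sentence)))
        edges.update(itertools.combinations(names, 2))
    return edges


def communities(edges: set[tuple[str, str]]) -> list[set[str]]:
    """Global stage stand-in: connected components of the whole graph."""
    adjacency: dict[str, set[str]] = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    seen: set[str] = set()
    result: list[set[str]] = []
    for start in sorted(adjacency):
        if start in seen:
            continue
        component: set[str] = set()
        stack = [start]
        while stack:
            node = stack.pop()
            if node not in component:
                component.add(node)
                stack.extend(adjacency[node] - component)
        seen |= component
        result.append(component)
    return result


class IncrementalGraphRAG:
    def __init__(self) -> None:
        # doc_id -> (content hash, extracted edges)
        self.cache: dict[str, tuple[str, set[tuple[str, str]]]] = {}
        self.result: Optional[list[set[str]]] = None
        self.extract_calls = 0
        self.global_runs = 0

    def run(self, docs: dict[str, str]) -> list[set[str]]:
        new_cache = {}
        for doc_id, text in docs.items():
            digest = hashlib.sha256(text.encode()).hexdigest()
            cached = self.cache.get(doc_id)
            if cached is not None and cached[0] == digest:
                new_cache[doc_id] = cached  # unchanged document: reuse extraction
            else:
                self.extract_calls += 1
                new_cache[doc_id] = (digest, extract_graph(text))
        # Any add, change, or removal invalidates the global stage.
        changed = new_cache != self.cache
        self.cache = new_cache
        if changed or self.result is None:
            self.global_runs += 1
            all_edges = set().union(*(edges for _, edges in new_cache.values()))
            self.result = communities(all_edges)
        return self.result


pipeline = IncrementalGraphRAG()
doc_a = "Alice met Bob. Carol likes Dave."
first = pipeline.run({"a": doc_a, "b": "Bob knows Carol."})
second = pipeline.run({"a": doc_a, "b": "Erin knows Frank."})  # only "b" is re-extracted
third = pipeline.run({"a": doc_a, "b": "Erin knows Frank."})   # nothing changed: no reruns
```

Across the three runs above, only three extractions happen (two initially, then one for the edited document). The global stage runs twice, because the third run sees no changes.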

Expected outcomes:

Full GraphRAG pipeline implementation using CocoIndex
Incremental processing for document-level stages (entity/relationship extraction)
Working global stages (community detection, summarization) that rerun on changes
Simple MCP server with tools for local and global GraphRAG queries
Documentation and example usage

Optional/Stretch goals:

Advanced incremental processing for global stages (may require CocoIndex engine work)
Performance benchmarks comparing incremental vs. full reprocessing

Possible mentors:

George He - georgehe0, cofounder & maintainer of CocoIndex, ex-Google Infra lead
Linghua Jin - badmonster0, cofounder & maintainer of CocoIndex, ex-Google Tech lead

Resources:

GraphRAG Paper - original research paper
Microsoft GraphRAG - reference implementation
CocoIndex documentation - understanding incremental processing
Model Context Protocol (MCP) - protocol for AI tool integration

Benchmarking Framework for CocoIndex

Skills: Python, performance profiling, data visualization, CI/CD pipelines

Expected size of the project: Medium (~175 hours)

Difficulty rating: Medium

Description:

Create a comprehensive benchmarking framework for CocoIndex that enables measuring, comparing, and reporting performance across different use cases. The framework serves three distinct audiences with different needs:

Use cases:

Engine Evaluation (for engine developers): Benchmark the CocoIndex core engine using curated application code, curated input datasets, and curated connectors. This helps measure engine performance improvements across releases and identify regressions.
Connector Evaluation (for connector developers): Benchmark specific connectors using curated application code and input data. Includes glue code adapters that feed standardized test data into each connector, enabling fair comparisons between connector implementations.
Application Evaluation (for application developers): Provide reusable infrastructure so developers can benchmark their own applications with their own data, using the same tooling and reporting capabilities.

Key components:

Benchmark Suite: Curated set of representative workloads (small/medium/large datasets, various processing patterns)
Runner Infrastructure: Automated execution with resource monitoring (CPU, memory, I/O, time)
Scoring System: Standardized metrics for throughput, latency, incremental update efficiency, and resource usage
Reporting: Generate human-readable reports and machine-readable outputs for CI integration
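The runner, scoring, and reporting components could start as small as the sketch below. It times each item, records peak memory, and emits a JSON-serializable report suitable for CI. The function name and all metric keys are hypothetical. One caveat: `tracemalloc` only tracks allocations made by the Python interpreter, so it cannot see memory allocated inside CocoIndex's Rust engine. A real runner would likely need process-level measurements, such as resident set size, instead.

```python
from __future__ import annotations

import json
import statistics
import time
import tracemalloc
from typing import Any, Callable, Sequence


def run_benchmark(
    name: str,
    process: Callable[[Any], Any],
    items: Sequence[Any],
    repeats: int = 3,
) -> dict[str, Any]:
    """Time `process` over `items` and return throughput, latency, and memory metrics."""
    if len(items) * repeats < 2:
        raise ValueError("need at least two timed items to compute percentiles")
    latencies: list[float] = []
    tracemalloc.start()
    started = time.perf_counter()
    for _ in range(repeats):
        for item in items:
            t0 = time.perf_counter()
            process(item)
            latencies.append(time.perf_counter() - t0)
    wall = max(time.perf_counter() - started, 1e-9)  # guard against division by zero
    _, peak_bytes = tracemalloc.get_traced_memory()
    tracemalloc.stop()

    # 99 cut points; index 49 is the median, index 94 the 95th percentile.
    cuts = statistics.quantiles(latencies, n=100, method="inclusive")
    return {
        "name": name,
        "items": len(latencies),
        "wall_seconds": wall,
        "throughput_items_per_s": len(latencies) / wall,
        "latency_p50_ms": cuts[49] * 1000,
        "latency_p95_ms": cuts[94] * 1000,
        "peak_python_heap_bytes": peak_bytes,
    }


report = run_benchmark("toy-tokenize", str.split, ["a b c d"] * 100)
print(json.dumps(report, indent=2))  # machine-readable output for CI
```

You could run the same harness once on full reprocessing and once on an incremental update of the same dataset. The ratio of the two wall-clock times would then be one possible basis for the incremental-update-efficiency metric.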

Expected outcomes:

Benchmark suite with curated applications, datasets, and connector adapters
CLI tool for running benchmarks locally
Scoring system with well-defined metrics
Report generation (HTML reports, JSON output for CI)
Documentation for using the framework and adding new benchmarks

Optional/Stretch goals:

CI integration for automated performance regression testing
Historical tracking and trend visualization
Comparison mode for A/B testing engine or connector changes

Possible mentors:

George He - georgehe0, cofounder & maintainer of CocoIndex, ex-Google Infra lead
Linghua Jin - badmonster0, cofounder & maintainer of CocoIndex, ex-Google Tech lead

Resources:

CocoIndex documentation
CocoIndex examples
pytest-benchmark - Python benchmarking reference
Criterion.rs - Rust benchmarking patterns

Connector Testing Infrastructure

Skills: Python, testing frameworks, understanding of incremental processing and state management

Expected size of the project: Medium (~175 hours)

Difficulty rating: Medium

Description:

Build a comprehensive testing infrastructure for CocoIndex connectors. CocoIndex is an incremental processing engine—connectors bridge "states" and "changes" while application developers only think in terms of desired states. This means connector implementations must correctly handle a variety of state transition scenarios and edge cases that application developers never see.

Testing scenarios to cover:

Basic state transitions:
- Items added (new entries appear in source)
- Items deleted (entries removed from source)
- Items updated (entries modified in source)
- Mixed operations (combinations of add/delete/update in a single run)
Edge cases and failure modes:
- Idempotency: If an interrupt occurs between committing output to the target and committing metadata to CocoIndex's internal storage, the next run will re-trigger the same commit—connectors must handle this gracefully
- Partial failures: Some items succeed, others fail
- Empty states: No items, all items deleted
- Large batches: Many items changing at once
Incremental correctness:
- Verify that incremental updates produce the same final state as full reprocessing
- Detect state drift over multiple incremental runs
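The basic transitions above reduce to one question: given the previously committed state and the new desired state, which changes must the connector apply? The sketch below assumes a simple key/value model of target state. `Changes`, `diff`, `FakeTarget`, and the scenario table are hypothetical and do not reflect CocoIndex's actual connector interface. Each scenario seeds the target, applies the computed changes, and verifies the result against the desired state.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Changes:
    upserts: dict[str, str] = field(default_factory=dict)  # adds and updates
    deletes: set[str] = field(default_factory=set)


def diff(previous: dict[str, str], desired: dict[str, str]) -> Changes:
    """Turn two desired states into the changes a connector must apply."""
    upserts = {k: v for k, v in desired.items() if previous.get(k) != v}
    deletes = set(previous) - set(desired)
    return Changes(upserts, deletes)


class FakeTarget:
    """In-memory stand-in for a target connector, for testing only."""

    def __init__(self) -> None:
        self.rows: dict[str, str] = {}

    def apply(self, changes: Changes) -> None:
        for key in changes.deletes:
            self.rows.pop(key, None)  # tolerate rows that are already gone
        self.rows.update(changes.upserts)


# name -> (state before, desired state after)
SCENARIOS: dict[str, tuple[dict[str, str], dict[str, str]]] = {
    "added": ({"a": "1"}, {"a": "1", "b": "2"}),
    "deleted": ({"a": "1", "b": "2"}, {"a": "1"}),
    "updated": ({"a": "1"}, {"a": "9"}),
    "mixed": ({"a": "1", "b": "2"}, {"a": "9", "c": "3"}),
    "all_deleted": ({"a": "1", "b": "2"}, {}),
    "empty": ({}, {}),
    "large_batch": ({}, {f"k{i}": str(i) for i in range(10_000)}),
}


def run_scenario(before: dict[str, str], after: dict[str, str]) -> bool:
    target = FakeTarget()
    target.apply(diff({}, before))     # seed the target with the initial state
    target.apply(diff(before, after))  # the incremental step under test
    return target.rows == after        # state verification


results = {name: run_scenario(b, a) for name, (b, a) in SCENARIOS.items()}
print(results)
```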

Key components:

Test Harness: Framework for simulating state changes and running connectors through test scenarios
Scenario Library: Pre-built test scenarios covering common and edge cases
Fault Injection: Tools to simulate failures (interrupts, partial commits, network errors)
State Verification: Utilities to compare expected vs. actual target state after operations
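To inject the idempotency fault, a test can stop a sync after the target commit but before the metadata commit. It then replays the sync from the stale metadata, as the next run would. This sketch assumes a key/value state model, and every name in it is hypothetical, including the `metadata` dict standing in for CocoIndex's internal storage. A blind-insert connector fails the check because the replay duplicates rows, while an upsert-based connector passes.

```python
from __future__ import annotations


class SimulatedInterrupt(Exception):
    """Injected fault: crash after the target commit, before the metadata commit."""


class BlindInsertTarget:
    """Deliberately non-idempotent connector: appends rows without checking keys."""

    def __init__(self) -> None:
        self.rows: list[tuple[str, str]] = []

    def apply(self, upserts: dict[str, str], deletes: set[str]) -> None:
        self.rows = [(k, v) for k, v in self.rows if k not in deletes]
        self.rows.extend(upserts.items())

    def state(self) -> list[tuple[str, str]]:
        return sorted(self.rows)


class UpsertTarget:
    """Idempotent connector: writes are keyed, so replaying them is harmless."""

    def __init__(self) -> None:
        self.rows: dict[str, str] = {}

    def apply(self, upserts: dict[str, str], deletes: set[str]) -> None:
        for key in deletes:
            self.rows.pop(key, None)
        self.rows.update(upserts)

    def state(self) -> list[tuple[str, str]]:
        return sorted(self.rows.items())


def sync(target, metadata: dict[str, str], desired: dict[str, str], crash: bool = False) -> None:
    upserts = {k: v for k, v in desired.items() if metadata.get(k) != v}
    deletes = set(metadata) - set(desired)
    target.apply(upserts, deletes)  # 1. commit output to the target
    if crash:
        raise SimulatedInterrupt()  # fault injected between the two commits
    metadata.clear()                # 2. commit tracking metadata
    metadata.update(desired)


def survives_replay(make_target, desired: dict[str, str]) -> bool:
    target, metadata = make_target(), {}
    try:
        sync(target, metadata, desired, crash=True)
    except SimulatedInterrupt:
        pass
    sync(target, metadata, desired)  # the next run re-triggers the same commit
    return target.state() == sorted(desired.items())


desired = {"a": "1", "b": "2"}
print(survives_replay(UpsertTarget, desired))       # True
print(survives_replay(BlindInsertTarget, desired))  # False
```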

Expected outcomes:

Reusable test harness for connector developers
Library of test scenarios (basic transitions, edge cases, failure modes)
Fault injection utilities for testing idempotency and recovery
Documentation and examples for testing new connectors
Tests applied to existing connectors as validation

Optional/Stretch goals:

Property-based testing for generating random state transition sequences
Integration with CI for automated connector testing
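Before adopting a property-based testing library such as Hypothesis, random state-transition sequences can be approximated with a seeded generator, which keeps failures reproducible. This hypothetical sketch applies each random desired state incrementally and compares the result against a full rebuild from empty at every step. It returns the first step where the two diverge. A deliberately buggy variant that ignores deletions shows how the check catches drift.

```python
from __future__ import annotations

import random
from typing import Callable, Optional

Apply = Callable[[dict, dict, dict], None]


def apply_incremental(target: dict, previous: dict, desired: dict) -> None:
    for key in set(previous) - set(desired):
        target.pop(key, None)
    for key, value in desired.items():
        if previous.get(key) != value:
            target[key] = value


def apply_incremental_buggy(target: dict, previous: dict, desired: dict) -> None:
    """Deliberately broken: never propagates deletions."""
    for key, value in desired.items():
        if previous.get(key) != value:
            target[key] = value


def random_step(rng: random.Random, state: dict[str, str]) -> dict[str, str]:
    """Produce the next desired state via 1-3 random adds, deletes, or updates."""
    new = dict(state)
    for _ in range(rng.randint(1, 3)):
        op = rng.choice(["add", "delete", "update"])
        if op == "add" or not new:
            new[f"k{rng.randint(0, 20)}"] = str(rng.randint(0, 9))
        elif op == "delete":
            del new[rng.choice(sorted(new))]
        else:
            new[rng.choice(sorted(new))] = str(rng.randint(0, 9))
    return new


def first_drift(apply: Apply, seed: int, steps: int = 200) -> Optional[int]:
    """Return the first step where incremental != full reprocessing, else None."""
    rng = random.Random(seed)  # seeded, so any failure is reproducible
    target: dict[str, str] = {}
    previous: dict[str, str] = {}
    for step in range(steps):
        desired = random_step(rng, previous)
        apply(target, previous, desired)  # incremental update
        full: dict[str, str] = {}
        apply(full, {}, desired)          # full reprocessing from scratch
        if target != full:
            return step
        previous = desired
    return None


print([first_drift(apply_incremental, s) for s in range(5)])  # [None, None, None, None, None]
```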

Possible mentors:

George He - georgehe0, cofounder & maintainer of CocoIndex, ex-Google Infra lead
Linghua Jin - badmonster0, cofounder & maintainer of CocoIndex, ex-Google Tech lead

Resources:

Find Us

Join our community to discuss project ideas, get help, and connect with mentors:

Discord:
GitHub: github.com/cocoindex-io/cocoindex
Twitter/X: @cocoindex_io
YouTube: youtube.com/@cocoindex-io
Email: hi@cocoindex.io

Getting Started with CocoIndex

Before applying, we recommend familiarizing yourself with CocoIndex:

Read the documentation: cocoindex.io/docs
Try the quickstart: Getting Started Guide
Watch tutorials: YouTube Channel
Explore examples: github.com/cocoindex-io/cocoindex/tree/main/examples
Join our Discord: Ask questions and introduce yourself!

CocoIndex – Contributor Proposal Guidance

This page explains how to write a strong GSoC proposal for CocoIndex, what we expect it to include, and how to get in touch with mentors.

Before you write a proposal

We strongly prefer contributors who have already interacted with CocoIndex a bit:

Read the CocoIndex docs and quickstart to understand what the project does and where your skills fit.
Explore the GitHub repo (code layout, issues, examples).
Join our public communication channels and introduce yourself.
Make at least one small contribution, if possible (docs, tests, or a “good first issue” PR).

What to include in your proposal

We recommend the following structure.

Title and project idea
- A short, descriptive title that includes “CocoIndex” and the idea name (for example, “CocoIndex: Incremental connector for X”).
- Link to the idea from our Ideas page or clearly mark it as a self‑proposed idea.
About you
- Name, email, GitHub, time zone, and expected weekly availability.
- Briefly describe your relevant experience: Rust, Python, data pipelines, databases, or AI/ML.
- Link to any open‑source work (including CocoIndex contributions, if any).
Project overview and motivation
- 3–5 sentences explaining what you want to build, who it helps, and why it matters for CocoIndex.
- In your own words, describe the problem you are solving and show that you’ve read the relevant docs/code.
Technical plan
- Break the work into clear phases (design, prototype, implementation, tests, docs, examples).
- For each phase, describe your approach: which components of CocoIndex you’ll touch, technologies used, and any initial design ideas.
- Mention risks or unknowns and how you plan to de‑risk them (spikes, early prototypes, mentor check‑ins).
Timeline and milestones
- Provide a week‑by‑week or phase‑by‑phase schedule that aligns with the GSoC timeline.
- List specific deliverables by the midterm evaluation and by the final evaluation (e.g., “connector supports basic CRUD and passes tests,” “benchmark suite for N scenarios,” “example notebook published”).
- Note any periods when you’ll be unavailable (exams, holidays, etc.).
Communication plan
- How often you plan to send progress updates (we expect at least two public updates per week plus a weekly mentor meeting).
- Your preferred communication channels (GitHub Discussions, chat, email) and how you’ll ask for help when blocked.
After GSoC
- A short paragraph on how you’d like to continue contributing to CocoIndex after the program (maintaining your feature, fixing bugs, writing docs, or mentoring new contributors).

Prerequisites

We generally expect the following before we accept a proposal:

You have successfully built and run CocoIndex locally (Rust and/or Python as appropriate).
You have made at least one small public contribution to CocoIndex (or a clearly related repo), or you have had a substantial technical discussion with mentors about your idea.
You have discussed your draft proposal with a mentor and refined scope based on their feedback.

How to get in touch

Check the CocoIndex GSoC page (linked from our README and docs) for:
- Current GSoC ideas and mentors.
- Links to our communication channels.
Start by posting in the public channel or idea discussion with:
- A short intro,
- The idea you’re considering, and
- Any initial questions or draft plans.

Please also read the official GSoC guides on writing and submitting proposals, and submit your proposal early so mentors have time to review and give feedback before the deadline.

We're excited to welcome GSoC contributors to the CocoIndex community!
