A fast, lightweight and easy-to-use Python library for splitting text into semantically meaningful chunks.
Updated Oct 28, 2025 - Python
Fully neural approach for text chunking
🍱 semantic-chunking ⇢ semantically create chunks from large documents for passing to LLM workflows
RAG boilerplate with semantic/propositional chunking, hybrid search (BM25 + dense), LLM reranking, query enhancement agents, CrewAI orchestration, Qdrant vector search, Redis/Mongo sessioning, Celery ingestion pipeline, Gradio UI, and an evaluation suite (Hit-Rate, MRR, hybrid configs).
🍶 llm-distillery ⇢ use LLMs to run map-reduce summarization tasks on large documents until a target token size is met.
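The map-reduce distillation idea above can be sketched in a few lines: split the text into chunks, summarize each chunk (map), join the summaries (reduce), and repeat until the result fits a token budget. This is a minimal illustration, not llm-distillery's actual API; `summarize` stands in for an LLM call, and whitespace word counts stand in for real tokenization.

```python
def distill(text, summarize, chunk_tokens=1000, target_tokens=2000):
    """Repeatedly map-reduce summarize `text` until it fits `target_tokens`.

    `summarize` is a stand-in for an LLM summarization call; token counts
    are approximated by whitespace word counts for illustration only.
    """
    def n_tokens(t):
        return len(t.split())

    while n_tokens(text) > target_tokens:
        words = text.split()
        # Map: summarize each fixed-size chunk independently.
        chunks = [" ".join(words[i:i + chunk_tokens])
                  for i in range(0, len(words), chunk_tokens)]
        # Reduce: concatenate the chunk summaries into a new, shorter text.
        new_text = " ".join(summarize(c) for c in chunks)
        if n_tokens(new_text) >= n_tokens(text):
            break  # summaries stopped shrinking; avoid an infinite loop
        text = new_text
    return text
```

In practice each pass shrinks the document by roughly the summarizer's compression ratio, so even very large inputs converge to the target in a handful of passes.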
Semantic Chunking is a Python library for segmenting text into meaningful chunks using embeddings from Sentence Transformers.
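The core idea behind embedding-based chunkers like the one described above can be sketched as follows: embed consecutive sentences and start a new chunk wherever similarity drops below a threshold. This is a hedged sketch, not that library's API; `toy_embed` is a hypothetical stand-in for a real encoder such as a Sentence Transformers `model.encode` call.

```python
import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def semantic_chunks(sentences, embed, threshold=0.5):
    """Group consecutive sentences into chunks, breaking wherever the
    embedding similarity between neighbors drops below `threshold`."""
    if not sentences:
        return []
    vecs = [embed(s) for s in sentences]
    chunks, current = [], [sentences[0]]
    for prev, vec, sent in zip(vecs, vecs[1:], sentences[1:]):
        if cosine(prev, vec) >= threshold:
            current.append(sent)       # same topic: extend current chunk
        else:
            chunks.append(" ".join(current))
            current = [sent]           # similarity dropped: start new chunk
    chunks.append(" ".join(current))
    return chunks

# Hypothetical embedder for illustration; a real system would use a
# sentence-embedding model instead of keyword matching.
def toy_embed(s):
    return np.array([1.0, 0.0]) if "cat" in s else np.array([0.0, 1.0])
```

Real implementations typically add a token-length cap per chunk and compare against a rolling window of recent sentences rather than only the immediate neighbor, but the breakpoint-on-similarity-drop mechanism is the same.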
Advanced semantic text chunking with custom structural markers, whole-text coherence preservation, and flexible token management. Features async processing, LangChain integration, and dynamic drift detection. Ideal for RAG systems, augmented text processing, and domain-specific document analysis.
Rust CLI implementing the Recursive Language Model (RLM) pattern for Claude Code. Process documents 100x larger than context windows through intelligent chunking, SQLite persistence, and recursive sub-LLM orchestration.
Advanced local-first RAG system powered by Ollama and LangGraph. Optimized for high-performance sLLM orchestration featuring adaptive intent routing, semantic chunking, intelligent hybrid search (FAISS + BM25), and real-time thought streaming. Includes integrated PDF analysis and secure vector caching.
Semantic chunking algorithm in (mostly) Go
Japanese-optimized semantic text chunking for RAG applications
A Sidecar service for applications that need vector database functionality to augment their LLMs. This service provides embeddings and retrieval capabilities by abstracting embeddings generation (LiteLLM) and vector storage and search (Qdrant).
HR Policy Assistant (RAG-based chatbot): a conversational AI assistant for employees to query company HR policies. Built with LangChain and Qdrant, it semantically ingests HR documents, retrieves relevant policy information, reranks results with BM25/MMR, and delivers precise LLM-generated responses. Cloud-based vector storage ensures fast response times.
An all-in-one solution for converting documents into fine-tuning data for LLMs
A controlled study showing how different chunking strategies change which questions are even representable in retrieval-augmented generation systems—independent of retrieval quality.
Chomper - Chomp through any document. MCP server for parsing 36+ file formats with semantic chunking & TOON token optimization for Claude and AI systems.
Lightweight, composable TypeScript library for semantic chunking, workflow pipelining, and LLM orchestration.
Advanced RAG system combining semantic chunking, a ChromaDB vector store, and knowledge graphs. Built on the unofficial HuggingFace HotPotQA dataset (https://huggingface.co/datasets/ParthMandaliya/hotpot_qa) for multi-hop question answering.