Production-ready MCP server with RAG, memory, and tools
Stop rebuilding the same infrastructure. Connect any AI agent to long-term memory, document retrieval, and 8+ powerful tools through the Model Context Protocol.
Building AI agents? You keep reinventing:
- Long-term memory that persists across sessions
- Document retrieval (RAG) for knowledge access
- Tool integration (web search, vision, code execution, browser automation)
Every project starts from scratch. Every agent reinvents the wheel.
A-Modular-Kingdom is the infrastructure layer you're missing:
```bash
# Start the MCP server
python src/agent/host.py
```

Now any agent (Claude Desktop, custom chatbots, multi-agent systems) gets instant access to:
- ✅ Hierarchical memory (global rules, project context)
- ✅ 3 RAG implementations (v1/v2/v3) for document search
- ✅ 8 production-ready tools via the MCP protocol
One foundation. Infinite applications.
- ✨ Core Features
- 🚀 Quick Start
- 🛠️ Available Tools
- 📚 RAG System
- 🧠 Memory System
- 📦 Package Installation
- 🎯 Integration Examples
- 🤖 Example Applications
- 🤝 Contributing
## ✨ Core Features

- MCP Protocol - Standard interface for AI tool access
- 3 RAG Versions - Choose your retrieval strategy (FAISS, Qdrant, custom)
- Scoped Memory - Global rules, preferences, project-specific context
- 8+ Tools - Vision, code exec, browser, web search, TTS/STT, and more
- No Vendor Lock-in - Local Ollama models, open-source stack
- Production Ready - Smart reindexing, Unicode support, error handling
## 🚀 Quick Start

**Prerequisites:**

```
# Required
Python 3.10+
Ollama (for embeddings: ollama pull embeddinggemma)

# Optional
UV package manager (faster than pip)
```

**Installation:**

```bash
# Clone the repository
git clone https://github.com/MasihMoafi/A-Modular-Kingdom.git
cd A-Modular-Kingdom

# Install dependencies
uv sync
# or: pip install -r requirements.txt

# Pull required Ollama model
ollama pull embeddinggemma
```

**Run the server:**

```bash
# Start the host.py MCP server
python src/agent/host.py
```

**Option 1: Claude Desktop**
```json
// Add to claude_desktop_config.json
{
  "mcpServers": {
    "a-modular-kingdom": {
      "command": "python",
      "args": ["/full/path/to/A-Modular-Kingdom/src/agent/host.py"]
    }
  }
}
```

**Option 2: Interactive Client**

```bash
# Use the included chat interface
python src/agent/main.py
```

**Option 3: Custom Integration**
```python
# Connect via MCP in your own agent
from mcp import StdioServerParameters

server_params = StdioServerParameters(
    command="python",
    args=["/path/to/host.py"]
)
# Use with ToolCollection.from_mcp(server_params)
```

## 🛠️ Available Tools

The MCP server exposes these tools:
| Tool | Description | Use Case |
|---|---|---|
| `query_knowledge_base` | RAG search (v1/v2/v3) | "How does auth work in this codebase?" |
| `save_memory` | Scoped memory storage | Save global rules or project context |
| `search_memories` | Semantic memory search | Retrieve past decisions/preferences |
| `web_search` | DuckDuckGo search | Current events, latest docs |
| `browser_automation` | Playwright web scraping | Extract text/screenshots from URLs |
| `code_execute` | Safe Python sandbox | Run code in an isolated environment |
| `analyze_media` | Vision with Ollama | Analyze images/videos |
| `text_to_speech` | TTS (pyttsx3/kokoro) | Generate audio from text |
| `speech_to_text` | Whisper STT | Transcribe audio files |
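Any MCP-compatible client can call these tools over stdio. Here is a minimal sketch using the official `mcp` Python SDK; the tool argument shown (`query`) is an assumption for illustration, so check the server's tool schema for exact signatures:

```python
import asyncio

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

# Point this at your host.py checkout
params = StdioServerParameters(
    command="python",
    args=["/path/to/host.py"],
)

async def main() -> None:
    async with stdio_client(params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            # Call one of the tools listed above by name
            result = await session.call_tool(
                "web_search", arguments={"query": "Model Context Protocol spec"}
            )
            print(result.content)

asyncio.run(main())
```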
## 📚 RAG System

Three implementations with different trade-offs:
**V1**
- Stack: FAISS + BM25
- Speed: <1s
- Use Case: Small projects, quick prototypes

**V2**
- Stack: Qdrant + BM25 + CrossEncoder reranking
- Speed: <1s with smart caching
- Use Case: Production apps, large codebases
- Features: Smart reindexing, cloud-ready

**V3**
- Stack: Custom vector index + BM25 + RRF fusion + LLM reranking
- Speed: 2-3s (LLM reranking overhead)
- Use Case: Research, maximum accuracy
- Features: Contextual retrieval, custom distance metrics
Usage:

```python
# Via MCP tool
query_knowledge_base(
    query="How does authentication work?",
    version="v2",     # or "v1", "v3"
    doc_path="./src"  # optional
)
```

Supported Files: `.py`, `.md`, `.txt`, `.pdf`, `.ipynb`, `.js`, `.ts`
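V3's fusion step merges the BM25 and vector rankings before LLM reranking. As a rough illustration of the technique (not the repo's actual implementation), reciprocal rank fusion can be sketched in a few lines:

```python
# Illustrative reciprocal-rank-fusion (RRF) sketch; not the repo's code.
# Each input ranking is a list of document IDs, best match first.
def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking):
            # k=60 is the conventional RRF constant; it damps the
            # influence of any single ranker's top positions.
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.__getitem__, reverse=True)

# Fuse a BM25 ranking with a vector-search ranking
bm25 = ["auth.py", "login.py", "db.py"]
vector = ["login.py", "auth.py", "session.py"]
print(rrf_fuse([bm25, vector]))  # documents favored by both rise to the top
```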
## 🧠 Memory System

Hierarchical scoped memory with automatic categorization:
| Scope | Persistence | Use Case |
|---|---|---|
| Global Rules | Forever, all projects | "Always use type hints" |
| Global Preferences | Forever, all projects | "Prefer dark mode" |
| Global Personas | Forever, all projects | Reusable agent personalities |
| Project Context | Current project | Architecture decisions, tech stack |
| Project Sessions | Temporary | Current task, recent changes |
```python
# Save with explicit scope
save_memory(content="Always validate user input", scope="global_rules")

# Or use prefix shortcuts
save_memory(content="#global:rule:Never use eval()")
save_memory(content="#project:context:Uses FastAPI backend")

# Auto-inference from keywords
save_memory(content="User prefers Python 3.12")  # → global_preferences

# Search with priority (global → project)
search_memories(query="coding standards", top_k=5)
```

Storage: `~/.modular_kingdom/memories/` (global) + project-specific folders
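For intuition, the `#scope:category:` prefix shortcuts above can be routed with plain string matching. This is a hypothetical sketch of the idea, not the project's actual routing code; only the two prefixes shown above are taken from the examples:

```python
# Hypothetical sketch of prefix-based scope routing (not the project's code).
PREFIXES = {
    "#global:rule:": "global_rules",
    "#project:context:": "project_context",
    # ...other scopes from the table above would follow the same pattern
}

def parse_scope(content: str) -> tuple[str, str]:
    for prefix, scope in PREFIXES.items():
        if content.startswith(prefix):
            return scope, content.removeprefix(prefix)
    # No prefix: fall back to keyword-based auto-inference
    return "auto", content

print(parse_scope("#global:rule:Never use eval()"))
# -> ('global_rules', 'Never use eval()')
```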
## 📦 Package Installation

The MCP server can also be installed as a standalone package:
```bash
# Install with sentence-transformers (no Ollama required)
pip install rag-mem[local]

# Set embedding provider (add to your shell profile or script)
export MEMORY_MCP_EMBED_PROVIDER=sentence-transformers
export MEMORY_MCP_EMBED_MODEL=all-MiniLM-L6-v2
```

Python API:
```python
from memory_mcp.config import Settings
from memory_mcp.rag import RAGPipeline
from memory_mcp.memory import MemoryStore

# RAG - index and search any codebase
pipeline = RAGPipeline(Settings(), document_paths=["./src"])
pipeline.index()
results = pipeline.search("how does authentication work")

# Memory - persistent storage across sessions
store = MemoryStore(Settings())
store.add("User prefers dark mode")
results = store.search("preferences")
```

CLI Usage:
```bash
memory-mcp init                        # Initialize config
memory-mcp serve --docs ./documents    # Start MCP server
memory-mcp index ./path/to/files       # Index documents
```

Alternative: Use Ollama (local, private)
```bash
pip install rag-mem
ollama pull nomic-embed-text
# No env vars needed - Ollama is the default
```

Package Size: 58KB of code (note: ~2GB of dependencies with PyTorch)
## 🎯 Integration Examples

**Claude Code**

Already using Claude Code? Add the A-Modular-Kingdom tools:
```json
{
  "mcpServers": {
    "a-modular-kingdom": {
      "command": "python",
      "args": ["/path/to/src/agent/host.py"]
    }
  }
}
```

Now Claude has access to your codebase RAG, persistent memory, and all the tools.
**Gemini CLI**

```json
// gemini-extension.json
{
  "mcpServers": {
    "unified_knowledge_agent": {
      "command": "python",
      "args": ["/path/to/src/agent/host.py"]
    }
  }
}
```

**smolagents**

```python
from smolagents import ToolCallingAgent, ToolCollection
from mcp import StdioServerParameters

# Connect to the MCP server
params = StdioServerParameters(
    command="python",
    args=["/path/to/host.py"]
)

with ToolCollection.from_mcp(params) as tools:
    # Current smolagents versions also require a model=... argument
    agent = ToolCallingAgent(tools=list(tools.tools))
    result = agent.run("Search the codebase for auth logic")
```

## 🤖 Example Applications

This repository includes example multi-agent systems built on the foundation:
**Council Chamber**
- 3-tier agent hierarchy (Queen → Teacher → Code Agent)
- Validation loops and task delegation
- Uses ACP SDK + smolagents
- Location: `multiagents/council_chamber/`

**Gym**
- Fitness planning workflow (Interview → Plan → Nutrition)
- CrewAI-powered coordination
- Web interface included
- Location: `multiagents/gym/`
Note: These are demonstration applications, not the core product. The foundation (host.py) is the main offering.
**Architecture:**

```
┌───────────────────────────────────────┐
│          Your AI Application          │
│     (Agents, Chatbots, Workflows)     │
└───────────────────┬───────────────────┘
                    │ MCP Protocol
┌───────────────────┼───────────────────┐
│           A-Modular-Kingdom           │
│  ┌─────────┐ ┌─────────┐ ┌─────────┐  │
│  │   RAG   │ │ Memory  │ │  Tools  │  │
│  │ V1/V2/V3│ │ Scoped  │ │   8+    │  │
│  └─────────┘ └─────────┘ └─────────┘  │
│         host.py (MCP Server)          │
└───────────────────────────────────────┘
```
**Testing:**

```bash
# Run all tests
pytest tests/ -v

# Run specific test suites
pytest tests/test_rag_v2.py -v
pytest tests/test_rag_v3.py -v
pytest tests/test_memory_global.py -v

# Run benchmarks
python tests/benchmark_rag.py
```

Benchmark Results (GPU/CUDA):
| Version | Docs | Cold Start | Warm Query |
|---|---|---|---|
| V2 | 100 | 26.8s | 0.31s |
| V3 | 100 | 13.9s | 0.02s (15x faster!) |
Key Features:
- ✅ GPU acceleration (CUDA) for embeddings and reranking
- ✅ Smart caching (warm queries <0.5s)
- ✅ Tested with `.py`, `.md`, `.txt`, `.ipynb` files
- ✅ Global memory access from any directory
See detailed benchmarks: docs/RAG_PERFORMANCE.md
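Smart caching and reindexing are what keep warm queries fast: unchanged files are never re-embedded. As a hypothetical sketch of the general technique (not the repo's actual implementation), a content-hash manifest is enough to detect which files need re-embedding:

```python
# Hypothetical hash-manifest sketch of smart reindexing (illustrative only).
import hashlib
import json
from pathlib import Path

def files_to_reindex(paths: list[Path], manifest: Path) -> list[Path]:
    # Load the hashes recorded at the last indexing run, if any
    old = json.loads(manifest.read_text()) if manifest.exists() else {}
    new: dict[str, str] = {}
    dirty: list[Path] = []
    for p in paths:
        digest = hashlib.sha256(p.read_bytes()).hexdigest()
        new[str(p)] = digest
        if old.get(str(p)) != digest:
            dirty.append(p)  # content changed (or new file): re-embed
    manifest.write_text(json.dumps(new, indent=2))
    return dirty
```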
Package verified to work in isolation:

```bash
docker build -f Dockerfile.test -t rag-mem-test .
docker run --rm rag-mem-test
```

## 🤝 Contributing

Contributions welcome! Focus areas:
- Additional RAG strategies - New retrieval techniques
- New tool integrations - Expand MCP tool offerings
- Performance optimizations - Speed improvements
- Documentation improvements - Tutorials, examples
```bash
# Fork and clone
git clone https://github.com/MasihMoafi/A-Modular-Kingdom.git
cd A-Modular-Kingdom

# Create a branch
git checkout -b feature/your-feature

# Install dev dependencies
uv sync

# Make changes and test
pytest tests/

# Commit with a descriptive message
git commit -m "feat: add new tool"

# Push and create a PR
git push origin feature/your-feature
```

MIT License - See LICENSE for details
- Medium Article: https://medium.com/@masihmoafi12/a-modular-kingdom-fcaa69a6c1f0
- Demo Video: https://www.youtube.com/watch?v=hWoQnAr6R_E
- PyPI Package: rag-mem
A-Modular-Kingdom: The infrastructure layer AI agents deserve 🏰