Manoo is an offline, domain-specific humanoid assistant designed for Xibotix Private Limited. The system combines local large language model (LLM) inference via llama.cpp with Retrieval-Augmented Generation (RAG) to deliver concise, safety-aware answers about Xibotix, its founders, and its robotic hand–wrist rehabilitation devices (Gyrosphere, ExoFist, ExoCarp).
The project emphasizes edge deployment, low latency, privacy, and controlled generation, making it suitable for real-world humanoid and robotics demonstrations.
- Fully offline LLM inference using
llama.cpp - Lightweight Model: Qwen2.5-1.5B Instruct (GGUF Q4_K_M) optimized for edge devices
- RAG Architecture: Retrieval-Augmented Generation using ChromaDB
- Safety-First: Domain-restricted system prompt for factual consistency
- API: OpenAI-compatible REST API via
llama.cppserver - Metrics: Real-time latency measurement (retrieval, generation, end-to-end)
- Integration Ready: Designed for humanoid robots and embedded systems
User Query
|
v
Python Client (rag_ollama.py)
|
+--> ChromaDB (semantic retrieval)
|
v
Prompt Assembly
(System Prompt + Retrieved Context + User Query)
|
v
llama.cpp Server (Qwen2.5-1.5B GGUF, Metal acceleration)
|
v
Generated Response
|
v
Latency Metrics + Final Answer
- llama.cpp (Metal backend on Apple Silicon)
- Qwen2.5-1.5B Instruct (GGUF Q4_K_M)
- ChromaDB (persistent vector database)
- SentenceTransformers (
all-MiniLM-L6-v2)
- Python
- Requests (OpenAI-style API client)
manu_ai_offline/
│
├── rag_ollama.py # Main RAG pipeline
├── xibotix_db/ # ChromaDB persistent store
├── llama.cpp/ # llama.cpp build + models
└── README.md- Python 3.9+
- Apple Silicon Mac (for Metal acceleration)
- llama.cpp built with Metal support
- Git LFS (for large model files)
[git clone https://github.com/your-repo/manu_ai_offline.git](https://github.com/TaherPanbiharwala/xibotix.git)
cd manu_ai_offline
python -m venv venv
source venv/bin/activate
pip install -r requirements.txtcd llama.cpp/build
./bin/llama-server \
-m ../models/qwen2.5-1.5b-instruct-q4_k_m.gguf \
-ngl 99 \
--port 8080python rag_ollama.py "What is Gyrosphere?"Manoo enforces strict guardrails through its system prompt:
- No medical diagnosis or prescriptions.
- No device parameter recommendations.
- Mandatory clinician referral for therapy decisions.
- Domain restriction to Xibotix and rehabilitation devices.
- No speculative clinical claims.
These rules ensure patient-safe, investor-ready responses.
- Humanoid assistant demonstrations
- Rehab device explanation kiosks
- Investor presentations
- Patient-friendly educational interfaces
- Edge AI benchmarking
- RAG experimentation on Apple Silicon
- Context Window: 4096 tokens
- Hardware: Metal GPU acceleration enabled
- Optimization: Prompt caching active in
llama.cpp - Latency: ~1.5–4s on Apple M1 (Q4_K_M)
- Text-only interaction
- Context window limited to 4096 tokens
- Knowledge restricted to indexed documents
- No physical robot control yet
- Multimodal input (speech + vision)
- On-device speech synthesis
- Adaptive context sizing for latency optimization
- Expanded RAG knowledge base
- Integration with physical humanoid control systems
Taher Panbiharwala