Skip to content

Edge-deployed RAG humanoid assistant built with Qwen2.5, llama.cpp, and ChromaDB for low-latency, safety-constrained inference on Apple Silicon.

Notifications You must be signed in to change notification settings

TaherPanbiharwala/xibotix

Repository files navigation

🤖 Manoo — Offline RAG-Powered Humanoid Assistant for Xibotix

Manoo is an offline, domain-specific humanoid assistant designed for Xibotix Private Limited. The system combines local large language model (LLM) inference via llama.cpp with Retrieval-Augmented Generation (RAG) to deliver concise, safety-aware answers about Xibotix, its founders, and its robotic hand–wrist rehabilitation devices (Gyrosphere, ExoFist, ExoCarp).

The project emphasizes edge deployment, low latency, privacy, and controlled generation, making it suitable for real-world humanoid and robotics demonstrations.


✨ Key Features

  • Fully offline LLM inference using llama.cpp
  • Lightweight Model: Qwen2.5-1.5B Instruct (GGUF Q4_K_M) optimized for edge devices
  • RAG Architecture: Retrieval-Augmented Generation using ChromaDB
  • Safety-First: Domain-restricted system prompt for factual consistency
  • API: OpenAI-compatible REST API via llama.cpp server
  • Metrics: Real-time latency measurement (retrieval, generation, end-to-end)
  • Integration Ready: Designed for humanoid robots and embedded systems

🧠 System Architecture

User Query
   |
   v
Python Client (rag_ollama.py)
   |
   +--> ChromaDB (semantic retrieval)
   |
   v
Prompt Assembly
(System Prompt + Retrieved Context + User Query)
   |
   v
llama.cpp Server (Qwen2.5-1.5B GGUF, Metal acceleration)
   |
   v
Generated Response
   |
   v
Latency Metrics + Final Answer

🛠 Technology Stack

LLM & Inference

  • llama.cpp (Metal backend on Apple Silicon)
  • Qwen2.5-1.5B Instruct (GGUF Q4_K_M)

Retrieval

  • ChromaDB (persistent vector database)
  • SentenceTransformers (all-MiniLM-L6-v2)

Backend

  • Python
  • Requests (OpenAI-style API client)

📁 Project Structure

manu_ai_offline/
│
├── rag_ollama.py          # Main RAG pipeline
├── xibotix_db/            # ChromaDB persistent store
├── llama.cpp/             # llama.cpp build + models
└── README.md

🚀 Quick Start

Prerequisites

  • Python 3.9+
  • Apple Silicon Mac (for Metal acceleration)
  • llama.cpp built with Metal support
  • Git LFS (for large model files)

Setup

[git clone https://github.com/your-repo/manu_ai_offline.git](https://github.com/TaherPanbiharwala/xibotix.git)
cd manu_ai_offline

python -m venv venv
source venv/bin/activate
pip install -r requirements.txt

Start llama.cpp Server

cd llama.cpp/build
./bin/llama-server \
  -m ../models/qwen2.5-1.5b-instruct-q4_k_m.gguf \
  -ngl 99 \
  --port 8080

Run RAG Client

python rag_ollama.py "What is Gyrosphere?"

🔒 Safety Design

Manoo enforces strict guardrails through its system prompt:

  1. No medical diagnosis or prescriptions.
  2. No device parameter recommendations.
  3. Mandatory clinician referral for therapy decisions.
  4. Domain restriction to Xibotix and rehabilitation devices.
  5. No speculative clinical claims.

These rules ensure patient-safe, investor-ready responses.


🎯 Use Cases

  • Humanoid assistant demonstrations
  • Rehab device explanation kiosks
  • Investor presentations
  • Patient-friendly educational interfaces
  • Edge AI benchmarking
  • RAG experimentation on Apple Silicon

📊 Performance Notes

  • Context Window: 4096 tokens
  • Hardware: Metal GPU acceleration enabled
  • Optimization: Prompt caching active in llama.cpp
  • Latency: ~1.5–4s on Apple M1 (Q4_K_M)

⚠️ Limitations

  • Text-only interaction
  • Context window limited to 4096 tokens
  • Knowledge restricted to indexed documents
  • No physical robot control yet

🔮 Future Work

  • Multimodal input (speech + vision)
  • On-device speech synthesis
  • Adaptive context sizing for latency optimization
  • Expanded RAG knowledge base
  • Integration with physical humanoid control systems

👨‍💻 Author

Taher Panbiharwala

About

Edge-deployed RAG humanoid assistant built with Qwen2.5, llama.cpp, and ChromaDB for low-latency, safety-constrained inference on Apple Silicon.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published