Aura is a real-time, voice-to-voice AI companion that listens, thinks, and speaks back naturally, like a human.

**Flow:** User speaks → audio is recorded and sent to the server over WebSocket → STT transcribes it → LLM generates a response → TTS streams audio back → the mobile app plays it → the user hears the response
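
The whole round trip fits in a few lines of server code. A minimal sketch, assuming the service modules each export one function (the `ws` usage is real; the imported names and signatures are guesses based on the project structure below):

```ts
// Minimal sketch of the server-side pipeline. The `ws` API is real; the
// imported functions are assumptions about what the service modules
// (stt.ts, llm.ts, tts.ts) export -- real names/signatures may differ.
import { WebSocketServer } from "ws";
import { transcribe } from "./services/stt";   // assumed export
import { chat } from "./services/llm";         // assumed export
import { streamSpeech } from "./services/tts"; // assumed export

const wss = new WebSocketServer({ port: Number(process.env.PORT ?? 5000) });

wss.on("connection", (socket) => {
  socket.on("message", async (data, isBinary) => {
    if (!isBinary) return; // control messages handled elsewhere

    const userText = await transcribe(data as Buffer); // STT
    const reply = await chat(userText);                // LLM (non-streaming)
    for await (const chunk of streamSpeech(reply)) {   // TTS (streaming)
      socket.send(chunk); // binary audio chunk, played as it arrives
    }
  });
});
```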

**Prerequisites:**

- Node.js 18+
- Expo Go mobile app
- API keys: `GROQ_API_KEY`, `ELEVENLABS_API_KEY`

**Backend setup:**

```bash
cd backend
npm install
# Create .env file with:
# GROQ_API_KEY=your_key
# ELEVENLABS_API_KEY=your_key
# PORT=5000
npm run dev
```

**Mobile app setup:**

```bash
cd mobile-app
npm install
# Create .env file with:
# EXPO_PUBLIC_WEBSOCKET_URL=ws://localhost:5000
npm run start
# Open Expo Go app on your phone and scan the QR code from the terminal
# Start using the app!
```
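
Once both sides are running, the client boils down to a plain WebSocket pointed at `EXPO_PUBLIC_WEBSOCKET_URL`. A rough sketch (the helper names and playback queue are illustrative, not the app's actual code):

```ts
// Hypothetical client-side sketch (Expo / React Native).
const url = process.env.EXPO_PUBLIC_WEBSOCKET_URL ?? "ws://localhost:5000";
const socket = new WebSocket(url);
socket.binaryType = "arraybuffer"; // receive TTS chunks as ArrayBuffers

const playbackQueue: ArrayBuffer[] = [];

socket.onmessage = (event) => {
  // Binary frames are streamed TTS audio; queue them for playback.
  if (event.data instanceof ArrayBuffer) playbackQueue.push(event.data);
};

// Send a finished recording to the server as a single binary frame.
async function sendRecording(fileUri: string) {
  const response = await fetch(fileUri); // one way to read the local file
  socket.send(await response.arrayBuffer());
}
```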

**Latency optimizations:**

| Optimization | Why |
|---|---|
| Groq LLM (Llama 3.3 70B) | Fastest inference provider (~200-400ms for short responses) |
| ElevenLabs Scribe v2 | Low-latency STT model optimized for real-time |
| Streaming TTS | Audio chunks sent as they're generated, not waiting for full synthesis |
| WebSocket (persistent) | Eliminates HTTP connection overhead per request |
| Raw WebSocket library (ws) | Avoids the overhead and abstraction of higher-level libraries, e.g. Socket.IO |
| Non-streaming LLM for short responses | For short voice replies, waiting for the full completion is often faster than paying per-chunk streaming overhead |
| Binary audio over WebSocket | Minimal encoding overhead for audio data |
| Low-quality recording preset | 16kHz mono @ 128kbps — fast to encode & transmit |
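
To make the streaming-TTS and binary-audio rows concrete, here is a sketch of forwarding ElevenLabs audio chunks straight to the client socket. It uses the public `/v1/text-to-speech/:voiceId/stream` HTTP endpoint; the function itself is illustrative, not the project's actual `tts.ts`:

```ts
import type WebSocket from "ws";

// Illustrative: stream ElevenLabs TTS straight to the client socket.
async function streamTtsToClient(text: string, voiceId: string, socket: WebSocket) {
  const res = await fetch(`https://api.elevenlabs.io/v1/text-to-speech/${voiceId}/stream`, {
    method: "POST",
    headers: {
      "xi-api-key": process.env.ELEVENLABS_API_KEY!,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ text, model_id: "eleven_multilingual_v2" }),
  });
  if (!res.ok || !res.body) throw new Error(`TTS request failed: ${res.status}`);

  // Forward each chunk the moment it arrives instead of buffering the
  // whole synthesis; raw binary frames avoid base64/JSON overhead.
  for await (const chunk of res.body) {
    socket.send(chunk);
  }
}
```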

**Latency breakdown:**

| Stage | Typical Time |
|---|---|
| Audio Recording + Send | ~100-200ms |
| STT (ElevenLabs Scribe) | ~400-800ms |
| LLM Response (Groq) | ~200-500ms |
| TTS First Chunk | ~500-2000ms |
| Total (Time to First Audio) | ~1.2-3.5s typical (up to ~5s worst case) |

Latency is measured programmatically in the mobile app, end to end from the moment recording stops to the first audio playback, and the result is displayed in the UI after each interaction.
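
A minimal sketch of that measurement (the handler names are illustrative, not the app's actual code):

```ts
// Stamp when recording stops, compute the delta when the first
// binary (audio) frame comes back from the server.
let recordingStoppedAt = 0;
let awaitingFirstChunk = false;

function onRecordingStop(socket: WebSocket, audio: ArrayBuffer) {
  recordingStoppedAt = Date.now();
  awaitingFirstChunk = true;
  socket.send(audio);
}

function onAudioChunk(chunk: ArrayBuffer) {
  if (awaitingFirstChunk) {
    awaitingFirstChunk = false;
    const latencyMs = Date.now() - recordingStoppedAt;
    console.log(`Time to first audio: ${latencyMs}ms`); // surfaced in the UI
  }
  // ...enqueue chunk for playback
}
```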
Given more time, I would:
- Implement a streaming LLM → TTS pipeline — Stream sentences to TTS as the LLM generates them (`groqChatStream` is already in place); see the sketch after this list
- Add VAD (Voice Activity Detection) — Auto-detect end of speech instead of push-to-talk
- Client-side audio chunking — Stream audio during recording for faster STT start
- Optimize latency further — Profile each pipeline stage and attack its largest bottleneck first
- Audio compression — Implement more efficient audio codecs for lower bandwidth
- Preemptive TTS warming — Pre-initialize TTS connection to reduce first-chunk latency
- Edge deployment — Deploy backend closer to user for reduced network latency
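
For the first item, a sentence-chunked pipeline could look roughly like this. `groqChatStream` exists per the note above, but its signature here is an assumption, and `speakSentence` is a hypothetical wrapper around the streaming TTS call:

```ts
// Assumed signatures: groqChatStream is real per the note above, but its
// shape here is a guess; speakSentence is a hypothetical TTS wrapper.
declare function groqChatStream(prompt: string): AsyncIterable<string>;
declare function speakSentence(sentence: string): Promise<void>;

async function streamPipeline(userText: string) {
  let buffer = "";
  for await (const delta of groqChatStream(userText)) {
    buffer += delta;
    // Flush at sentence boundaries so TTS can start speaking before
    // the LLM has finished the whole response.
    const match = buffer.match(/^(.*?[.!?])\s+(.*)$/s);
    if (match) {
      await speakSentence(match[1]);
      buffer = match[2];
    }
  }
  if (buffer.trim()) await speakSentence(buffer); // trailing fragment
}
```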

**Project structure:**

```
├── backend/                  # Node.js WebSocket server
│   └── src/
│       ├── index.ts          # WebSocket server & message handling
│       ├── constant.ts       # System prompt for Aura personality
│       └── services/
│           ├── stt.ts        # ElevenLabs Speech-to-Text
│           ├── llm.ts        # Groq LLM (Llama 3.3)
│           ├── tts.ts        # ElevenLabs Text-to-Speech (streaming)
│           └── context.ts    # Conversation history management
│
└── mobile-app/               # Expo React Native app
    └── app/
        └── index.tsx         # Main voice interface
```

**Tech stack:**

- Mobile: Expo, React Native, expo-audio
- Backend: Node.js, WebSocket (ws), TypeScript
- STT: ElevenLabs Scribe v2
- LLM: Groq (Llama 3.3 70B / GPT OSS)
- TTS: ElevenLabs Multilingual v2 (Streaming)