Conversational AI with Amazon Nova 2 Sonic

A sample implementation of real-time voice assistant using Amazon Nova 2 Sonic, AWS's speech-to-speech foundation model. Talk naturally with an AI that listens, thinks, and responds — all in real-time with human-like conversation flow.

Features

Unified Speech-to-Speech Model

Unlike traditional pipelines (ASR → LLM → TTS), a speech-to-speech model processes audio input and generates audio output directly in a single model, preserving acoustic context like tone, pace, and emotion.

Natural Conversation Flow

Turn-Taking — Detects pauses, hesitations, and non-verbal cues to know when to respond
Barge-In — Gracefully handles interruptions mid-response without losing context
Adaptive Speech — Adjusts response tone and pace based on how you speak, not just what you say
Low Latency — Bidirectional streaming delivers responses as they're generated

Voice & Language

Multiple Voices — Tiffany (feminine) and Matthew (masculine) with native expressivity
Polyglot Support — Voices can switch languages within the same conversation
Text Input Mode — Switch to typing when voice isn't convenient (crossmodal interaction)
Languages — Officially supports English (US/UK), Spanish, French, German, Italian, Portuguese, and Hindi. Other languages like Chinese and Japanese should also work.

Tool Calling

Nova 2 Sonic supports asynchronous function calling — the model can invoke tools while continuing the conversation, then incorporate results naturally when they return. See src/tools for implementation. You can enable each of these tools individually in UI.

Tool	Description
DateTime	Current date and time
Weather	Weather forecasts via Open-Meteo API
Wikipedia	Search and retrieve Wikipedia articles
Location Search	Find places and coordinates
Reasoning	Extended thinking for complex questions
Transcript Correction	Fix ASR transcription errors

Architecture

┌─────────────────┐     Socket.IO     ┌─────────────────┐    AWS Bedrock    ┌─────────────────┐
│   Browser UI    │◄─────────────────►│  Express Server │◄─────────────────►│  Nova 2 Sonic   │
│   (JavaScript)  │   WebSocket       │   (TypeScript)  │   Bidirectional   │  Model          │
└─────────────────┘                   └─────────────────┘   Streaming       └─────────────────┘

Audio is captured via Web Audio API (16kHz mono PCM), streamed through Socket.IO to an Express server, then forwarded to Nova 2 Sonic via AWS Bedrock's bidirectional streaming API. Responses (24kHz audio + text transcripts) flow back in real-time.

Screenshots

Quick Start

Prerequisites

Node.js 18+
AWS account with Amazon Bedrock access to Nova 2 Sonic
AWS credentials configured

Setup

npm install

# Configure AWS credentials
export AWS_PROFILE=your-profile
# or
export AWS_ACCESS_KEY_ID=xxx && export AWS_SECRET_ACCESS_KEY=xxx

# (optional) Listen incoming connections on all available network interfaces
export HOST=0.0.0.0

# Compile and run application
npm run build
npm start

# Alternatively you can run in development mode for easier debugging
npm run dev

Open http://localhost:3000 and click the microphone to start talking.

Configuration

Settings are adjustable from the UI panel:

Setting	Description
Voice	Choose between Tiffany or Matthew personas
Response Timing	Controls voice activity detection sensitivity — faster means quicker responses but may cut off longer pauses
Audio Quality	Output sample rate — higher quality uses more bandwidth
Initial Buffer	Audio buffer before playback starts — lower means faster response, higher means smoother playback
Temperature	Higher values produce more varied responses
Top P	Controls response diversity via nucleus sampling
Max Tokens	Limits response length
AWS Region	Select a closer region to reduce latency

System Prompts

Pre-built personas available in public/prompts:

Friendly Chat — Casual conversational assistant
Insurance Service — Customer service agent
Intelligent Assistant — General-purpose with tool usage
Math Tutor — High school math teacher

Resources

Security

See CONTRIBUTING for more information.

License

This library is licensed under the MIT-0 License. See the LICENSE file.

Name		Name	Last commit message	Last commit date
Latest commit History 8 Commits
dist/tools		dist/tools
public		public
screenshots		screenshots
src		src
.gitignore		.gitignore
CODE_OF_CONDUCT.md		CODE_OF_CONDUCT.md
CONTRIBUTING.md		CONTRIBUTING.md
LICENSE		LICENSE
README.md		README.md
package-lock.json		package-lock.json
package.json		package.json
tsconfig.json		tsconfig.json

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Conversational AI with Amazon Nova 2 Sonic

Features

Unified Speech-to-Speech Model

Natural Conversation Flow

Voice & Language

Tool Calling

Architecture

Screenshots

Quick Start

Prerequisites

Setup

Configuration

System Prompts

Resources

Security

License

About

Uh oh!

Uh oh!

Contributors 2

Uh oh!

Languages

License

aws-samples/sample-voicebot-nova-sonic

Folders and files

Latest commit

History

Repository files navigation

Conversational AI with Amazon Nova 2 Sonic

Features

Unified Speech-to-Speech Model

Natural Conversation Flow

Voice & Language

Tool Calling

Architecture

Screenshots

Quick Start

Prerequisites

Setup

Configuration

System Prompts

Resources

Security

License

About

Topics

Resources

License

Code of conduct

Contributing

Security policy

Uh oh!

Stars

Watchers

Forks

Uh oh!

Contributors 2

Uh oh!

Languages