This repository contains an example Android application that demonstrates how to integrate the Grok Voice Agent API using plain WebSocket communication. No text-to-speech or speech-to-text library is needed, because the Grok Voice Agent API sends and receives base64-encoded audio data directly.
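The plain-WebSocket approach comes down to two steps: converting microphone samples into base64 text frames, and decoding base64 audio from server events back into samples. The Kotlin sketch below shows that round trip under stated assumptions. It assumes 16-bit little-endian PCM audio and an event named `input_audio_buffer.append` with an `audio` field, borrowed from the OpenAI Realtime-style protocol. Check the xAI documentation for the exact event names and audio format. The function names are hypothetical and not taken from this repository.

```kotlin
import java.nio.ByteBuffer
import java.nio.ByteOrder
import java.util.Base64

// Assumption: 16-bit little-endian PCM and an OpenAI-Realtime-style
// "input_audio_buffer.append" event. Verify both against the xAI docs.

/** Converts 16-bit PCM samples (as captured by AudioRecord) to little-endian bytes. */
fun pcm16ToBytes(samples: ShortArray): ByteArray {
    val buffer = ByteBuffer.allocate(samples.size * 2).order(ByteOrder.LITTLE_ENDIAN)
    for (s in samples) buffer.putShort(s)
    return buffer.array()
}

/** Wraps one audio chunk in a JSON text frame for the WebSocket (hypothetical event shape). */
fun buildAudioAppendEvent(samples: ShortArray): String {
    val b64 = Base64.getEncoder().encodeToString(pcm16ToBytes(samples))
    // Base64 output only contains [A-Za-z0-9+/=], so no JSON escaping is needed.
    return """{"type":"input_audio_buffer.append","audio":"$b64"}"""
}

/** Decodes a base64 audio payload from a server event back into PCM samples for AudioTrack. */
fun decodeAudioDelta(b64: String): ShortArray {
    val bytes = Base64.getDecoder().decode(b64)
    val shorts = ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN).asShortBuffer()
    val out = ShortArray(shorts.remaining())
    shorts.get(out)
    return out
}
```

`java.util.Base64` requires Android API level 26 or higher. On older devices, `android.util.Base64` with the `NO_WRAP` flag avoids inserted line breaks. The resulting string would be sent as a WebSocket text frame, for example with OkHttp's `WebSocket.send(String)`.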
The app uses a custom tools configuration that lets the voice model call Android functions for the following use cases:

- Use voice commands to navigate between the app screens (NavGraph):
  - Navigate to the "home" screen.
  - Navigate to the "favorites" screen.
  - Navigate to the "settings" screen.
  - Navigate to the "music" screen.
- Use a voice command to scroll to an item in the list (favorites screen):
  - Go to item 5.
- Use a voice command to analyze the content of the current screen:
  - Analyze the screen.
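Each of these tool calls comes down to mapping a function name and its arguments from the model onto an app action. The following is a minimal Kotlin sketch of that dispatch step. The tool names `navigate`, `scroll_to_item` and `analyze_screen`, and the argument keys `screen` and `item`, are hypothetical; the real names come from the app's tools configuration. Android-specific work, such as calling the navigation controller, scrolling the list, or dumping the UI tree, is left out so the mapping can be tested on its own.

```kotlin
// Hypothetical tool names and argument keys; the app's real tools
// configuration defines the actual names and schema.

sealed interface ToolAction {
    data class Navigate(val route: String) : ToolAction
    data class ScrollToItem(val item: Int) : ToolAction
    object AnalyzeScreen : ToolAction
}

/** Screens reachable through the NavGraph, as listed in this README. */
val knownRoutes = setOf("home", "favorites", "settings", "music")

/**
 * Maps a function call requested by the voice model to an app action.
 * Returns null when the tool name or its arguments are not recognized.
 */
fun dispatchToolCall(name: String, args: Map<String, String>): ToolAction? = when (name) {
    "navigate" -> args["screen"]?.lowercase()
        ?.takeIf { it in knownRoutes }
        ?.let { ToolAction.Navigate(it) }
    "scroll_to_item" -> args["item"]?.toIntOrNull()
        ?.takeIf { it >= 1 } // assumes spoken item numbers start at 1
        ?.let { ToolAction.ScrollToItem(it) }
    "analyze_screen" -> ToolAction.AnalyzeScreen
    else -> null
}
```

Returning `null` for unknown tools or invalid arguments lets the app send an error result back to the model instead of crashing. Restricting routes to a fixed set keeps the model from requesting destinations that are not in the NavGraph.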
Demo videos: `usage-demo.mov`, `usage-demo2.mov`
| Analyze Screen Command | Navigation Command |
|---|---|
| ![]() | ![]() |
To run the app:

- Add an environment variable `XAI_API_KEY` containing your Grok API key.
- Open the project in Android Studio.
- Start the Android Emulator with the microphone enabled and install the app.
- Toggle the "Connect" button to start the Grok Voice Agent session.
- Toggle the "Speak" button and start speaking.
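The first step assumes the API key reaches the app at build time. One common way to do this is to copy the environment variable into a `BuildConfig` field. The following `app/build.gradle.kts` fragment is a hypothetical illustration of that approach, not this project's actual build script:

```kotlin
// app/build.gradle.kts (hypothetical fragment, not this project's build script)
android {
    defaultConfig {
        // Read when Gradle configures the build; empty string if the variable is unset.
        val xaiKey = System.getenv("XAI_API_KEY") ?: ""
        // Generates BuildConfig.XAI_API_KEY as a String constant.
        buildConfigField("String", "XAI_API_KEY", "\"$xaiKey\"")
    }
    buildFeatures {
        buildConfig = true
    }
}
```

The app could then read `BuildConfig.XAI_API_KEY`. There are two caveats. First, on macOS, Android Studio launched from the Dock may not see variables defined only in a shell profile. Second, a key compiled into `BuildConfig` ships inside the APK, which is acceptable for a local demo but not for a published app.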
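The accessibility service `UiTreeDumpService` can also be enabled using adb commands:

```shell
adb shell settings put secure enabled_accessibility_services com.example.voiceapitest/.UiTreeDumpService
adb shell settings put secure accessibility_enabled 1
```

Note that the first command replaces the current list of enabled accessibility services instead of appending to it.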
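To check from code whether the adb commands took effect, the app can read the same secure setting and look for its own component. The following is a hypothetical pure-Kotlin helper, not part of this repository. On a device, the raw value would come from `Settings.Secure.getString(contentResolver, Settings.Secure.ENABLED_ACCESSIBILITY_SERVICES)`. The parsing assumes the colon-separated list of flattened component names that the command above writes.

```kotlin
// Hypothetical helper. On a device, obtain the raw setting with:
//   Settings.Secure.getString(contentResolver, Settings.Secure.ENABLED_ACCESSIBILITY_SERVICES)
// The value is a colon-separated list of "package/class" entries, where the
// class may be abbreviated with a leading "." relative to the package.

/** Expands "pkg/.Cls" to "pkg/pkg.Cls"; leaves fully qualified names unchanged. */
fun normalizeComponent(flat: String): String {
    val pkg = flat.substringBefore('/')
    val cls = flat.substringAfter('/', missingDelimiterValue = "")
    val fullCls = if (cls.startsWith(".")) pkg + cls else cls
    return "$pkg/$fullCls"
}

/** Returns true if [component] appears in the raw enabled-services setting. */
fun isServiceEnabled(enabledServices: String?, component: String): Boolean {
    if (enabledServices.isNullOrBlank()) return false
    val target = normalizeComponent(component)
    return enabledServices.split(':')
        .map { it.trim() }
        .filter { it.isNotEmpty() }
        .any { normalizeComponent(it) == target }
}
```

Normalizing both sides matters because the system may store either the short form (`com.example.voiceapitest/.UiTreeDumpService`) or the fully qualified class name.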

