Grok Voice Agent API Android showcase for screen navigation and screen analysis.

rmh78/grok-voice-api-android-showcase

GROK VOICE API TEST

This repository contains an example Android application that demonstrates how to integrate the Grok Voice Agent API over a plain WebSocket connection. No text-to-speech or speech-to-text library is needed, because the Grok Voice Agent API works with base64-encoded voice data directly.
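To illustrate what "base64-encoded voice data" over a WebSocket can look like, here is a minimal sketch in Java. It is not taken from this repository: the class name `AudioFrames`, the event type `audio.append`, and the JSON field names `type` and `audio` are placeholders chosen for illustration, so check the Grok Voice Agent API reference for the real message schema.

```java
import java.util.Base64;

public class AudioFrames {
    /**
     * Wraps raw audio bytes in a JSON text frame.
     * The "type" and "audio" field names are assumptions, not the confirmed API schema.
     * eventType is inserted as-is, so it must not contain characters that need JSON escaping.
     */
    public static String toAudioMessage(String eventType, byte[] audio) {
        String payload = Base64.getEncoder().encodeToString(audio);
        // Standard base64 output never contains quotes or backslashes,
        // so the payload can be embedded in a JSON string without escaping.
        return "{\"type\":\"" + eventType + "\",\"audio\":\"" + payload + "\"}";
    }

    /** Decodes a base64 audio payload received from the server back into raw bytes. */
    public static byte[] fromAudioPayload(String base64Audio) {
        return Base64.getDecoder().decode(base64Audio);
    }

    public static void main(String[] args) {
        byte[] chunk = {0, 1, 2, 3}; // stand-in for a recorded audio chunk
        // prints {"type":"audio.append","audio":"AAECAw=="}
        System.out.println(toAudioMessage("audio.append", chunk));
    }
}
```

On Android, `java.util.Base64` is available from API level 26; `android.util.Base64` is an alternative on older devices.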

Use cases

The app uses a custom tools configuration so that the voice model can call Android functions for the following use cases:

  • Use a voice command to navigate between the app screens (NavGraph).

    Navigate to the "home" screen.
    
    Navigate to the "favorites" screen.
    
    Navigate to the "settings" screen.
    
    Navigate to the "music" screen.
    
  • Use a voice command to scroll to an item in the list (favorites screen).

    Go to item 5.
    
  • Use a voice command to analyze the content of the current screen.

    Analyze the screen.
    
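The use cases above boil down to routing a tool call from the voice model to a local function. The following Java sketch shows one way such routing could work. It is not the repository's code: the class `ToolDispatcher`, the tool names `navigate_to_screen`, `scroll_to_item`, and `analyze_screen`, and the string results are all hypothetical. In a real app the handlers would trigger navigation, list scrolling, or a UI tree dump instead of returning strings.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

public class ToolDispatcher {
    // Screen names taken from the example voice commands in this README.
    private static final Set<String> SCREENS = Set.of("home", "favorites", "settings", "music");

    // Maps a tool name to a handler that takes one string argument and returns a result.
    private final Map<String, Function<String, String>> handlers = new HashMap<>();

    public void register(String toolName, Function<String, String> handler) {
        handlers.put(toolName, handler);
    }

    /** Runs the handler for toolName, or returns an error string for unknown tools. */
    public String dispatch(String toolName, String argument) {
        Function<String, String> handler = handlers.get(toolName);
        if (handler == null) {
            return "error: unknown tool " + toolName;
        }
        return handler.apply(argument);
    }

    /** Builds a dispatcher with three hypothetical tools; arguments are assumed non-null. */
    public static ToolDispatcher demo() {
        ToolDispatcher dispatcher = new ToolDispatcher();
        dispatcher.register("navigate_to_screen", screen ->
                SCREENS.contains(screen) ? "navigate:" + screen : "error: unknown screen " + screen);
        dispatcher.register("scroll_to_item", index -> {
            try {
                return "scroll:" + Integer.parseInt(index.trim());
            } catch (NumberFormatException e) {
                return "error: invalid index " + index;
            }
        });
        dispatcher.register("analyze_screen", ignored -> "analyze");
        return dispatcher;
    }

    public static void main(String[] args) {
        ToolDispatcher dispatcher = demo();
        System.out.println(dispatcher.dispatch("navigate_to_screen", "favorites")); // navigate:favorites
        System.out.println(dispatcher.dispatch("scroll_to_item", "5"));             // scroll:5
        System.out.println(dispatcher.dispatch("analyze_screen", ""));              // analyze
    }
}
```

Returning an error string for unknown tools or bad arguments, rather than throwing, gives the app something to report back to the model so the conversation can continue.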

Demo Video

usage-demo.mov
usage-demo2.mov

App Screenshots

  • Analyze Screen Command

  • Navigation Command

Installation

  1. Set the environment variable XAI_API_KEY to your Grok API key.
  2. Open the project in Android Studio.
  3. Start an Android emulator with the microphone enabled and install the app.
  4. Toggle the "Connect" button to start the Grok Voice Agent session.
  5. Toggle the "Speak" button and start speaking.

The accessibility service UiTreeDumpService can also be enabled using adb commands:

adb shell settings put secure enabled_accessibility_services com.example.voiceapitest/.UiTreeDumpService
adb shell settings put secure accessibility_enabled 1
