Fast, private speech-to-text for React Native and Expo. Everything runs on your phone, no servers, no waiting, no cloud bills.
- Speech-to-text that runs entirely on-device (no internet required)
- Real-time transcription from your microphone
- Transcribe audio files or raw audio buffers
- 12 models to choose from (75MB tiny model up to 3.1GB for best accuracy)
- Models auto-download from HuggingFace and cache locally
- Works on both iOS and Android
- GPU acceleration when available
- Full TypeScript support
This is a local Expo module in the whisperapp project. It's linked in package.json like this:

```json
"expo-whisper": "file:./modules/expo-whisper"
```

Import it in your code:

```ts
import { useWhisper, downloadModel, getModelPath } from 'expo-whisper';
```

Install the peer dependencies:

```bash
npm install expo-file-system react-native-quick-base64
```

The react-native-quick-base64 package is native code that handles audio encoding/decoding much faster than a pure-JavaScript implementation.
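If you're curious what that buys you, here's a tiny sketch of its API (it mirrors base64-js, so toByteArray/fromByteArray; the payload below is just a placeholder):

```ts
import { toByteArray, fromByteArray } from 'react-native-quick-base64';

const base64Audio = 'AAAA'; // hypothetical placeholder, not real audio
const bytes: Uint8Array = toByteArray(base64Audio); // decode on the native side
const encoded: string = fromByteArray(bytes);       // encode back to base64
```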
Drop this in your app.json:

```json
{
  "expo": {
    "ios": {
      "infoPlist": {
        "NSMicrophoneUsageDescription": "Microphone access is required for speech-to-text transcription"
      }
    }
  }
}
```

Android permissions are requested automatically, so you don't need to do anything.
Since this includes native code (expo-whisper), you need to prebuild:

```bash
npm install
npx expo prebuild --clean
```

Then run it on a device:

```bash
# iOS
npx expo run:ios

# Android
npx expo run:android
```

Important: npx expo start (the dev server) won't work with native modules. You must use prebuild + run:ios/run:android to actually run the app.
Models are stored locally on your device and download from HuggingFace the first time you need them.
```ts
import { Whisper, downloadModel, getModelPath, isModelDownloaded } from 'expo-whisper';

// Check if you already have it
const exists = await isModelDownloaded('tiny');

if (!exists) {
  // Download it (about 75MB for the tiny model)
  await downloadModel('tiny', (progress) => {
    console.log(`Download: ${Math.round(progress * 100)}%`);
  });
}

// Get the local path and fire it up
const modelPath = await getModelPath('tiny');
const whisper = await Whisper.initialize({
  modelPath,
  useGpu: true,
});
```

Available Models:
- tiny (75MB) - fast, works on anything
- base (142MB) - good balance
- small (466MB) - pretty accurate
- medium (1.5GB) - very accurate
- large-v3 (3.1GB) - best possible accuracy
Without React (just plain JavaScript/TypeScript):

```ts
import { Whisper } from 'expo-whisper';

const whisper = await Whisper.initialize({ modelPath });
const task = await whisper.transcribeFile(path);
console.log(task.result?.text);
```

React Hook (way easier for React apps):

```ts
import { useWhisper } from 'expo-whisper';

const { transcribeFile, progress, result } = useWhisper({ modelPath });
```
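Here's a minimal component sketch built on that hook; it assumes modelPath points to a downloaded model, audioPath to a local audio file, and that progress and result update while a transcription runs:

```tsx
import React from 'react';
import { Button, Text, View } from 'react-native';
import { useWhisper } from 'expo-whisper';

// Sketch only, not canonical usage
export function Transcriber({ modelPath, audioPath }: { modelPath: string; audioPath: string }) {
  const { transcribeFile, progress, result } = useWhisper({ modelPath });

  return (
    <View>
      <Button title="Transcribe" onPress={() => transcribeFile(audioPath)} />
      <Text>{progress}%</Text>
      <Text>{result?.text ?? ''}</Text>
    </View>
  );
}
```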
The Whisper class is a singleton. Call initialize() twice and you get the same object back:

```ts
const w1 = await Whisper.initialize({ modelPath });
const w2 = await Whisper.initialize({ modelPath });
// w1 === w2 ✓

// If you want a fresh one, release first
await w1.release();
const w3 = await Whisper.initialize({ modelPath }); // New instance
```

Every transcription method returns a TranscriptionTask that you can track:
```ts
const task = await whisper.transcribeFile(path);

// Check how it's going
console.log(task.progress); // 0-100
console.log(task.status);   // 'processing', 'complete', 'error', etc.

// Get the actual transcription
console.log(task.result?.text);     // Full text
console.log(task.result?.segments); // Individual segments with timings

// Change your mind? Cancel it
await task.cancel();
```

If you're recording from the microphone:
| Method | Best For |
|---|---|
| startRecording() + stopRecording() | When you need full control (track progress, cancel mid-recording) |
| recordAndTranscribe() | Simple case where you just want to record and get text back |

Pick whichever fits your flow.
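Here's a sketch of both flows; note that the options object passed to recordAndTranscribe() is an assumption, since only the method name appears above:

```ts
// Full control: start, then stop to get a TranscriptionTask back
await whisper.startRecording();
// ...user speaks...
const task = await whisper.stopRecording();
console.log(task.result?.text);

// One-shot convenience (assumed signature: same options as transcribe*())
const quick = await whisper.recordAndTranscribe({ language: 'en' });
console.log(quick.result?.text);
```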
Wrap your transcription in a try-catch:
try {
const task = await whisper.transcribeFile(path);
console.log(task.result?.text);
} catch (error) {
console.error('Something went wrong:', error);
}Or check the task status after it finishes:
```ts
const task = await whisper.transcribeFile(path);

if (task.status === 'error') {
  console.error('Transcription failed:', task.error?.message);
}
```

Tasks stick around in memory until you clean them up:
```ts
// See what's currently running
const active = whisper.getActiveTasks();

// Get some stats
const stats = whisper.getStats();
// { active: 0, total: 5, completed: 4, failed: 1 }

// Clean up tasks older than 1 hour (age given in milliseconds)
whisper.cleanupCompletedTasks(3600000);

// Or just clear everything that's done
whisper.clearCompletedTasks();
```

For live transcription from the mic:
```ts
await whisper.startRealtime(300, {
  language: 'en',
  onSegment: (segment) => console.log(segment.text),
});

// Check state whenever you want
console.log(whisper.isRealtimeActive());    // true/false
console.log(whisper.getRealtimeText());     // what you've said so far
console.log(whisper.getRealtimeSegments()); // individual segments

// Stop and get final result
const task = await whisper.stopRealtime();
console.log(task.result?.text);
```

Note on Accuracy: Real-time transcription trades accuracy for responsiveness. It processes audio in chunks to give instant feedback, so it sees less context than batch transcription. For better accuracy, use recordAndTranscribe(), or record audio first and then transcribe with transcribeBuffer().
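For example, here's a sketch of that higher-accuracy flow; the buffer encoding transcribeBuffer() expects isn't shown here, so treat recordedBuffer as a placeholder and check the module README for the real contract:

```ts
// Sketch: batch transcription of already-recorded audio for better accuracy
declare const recordedBuffer: string; // placeholder; exact encoding per module README

const batch = await whisper.transcribeBuffer(recordedBuffer, { language: 'en' });
console.log(batch.result?.text);
```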
For the full, detailed API docs, check out modules/expo-whisper/README.md.
```ts
import { Whisper } from 'expo-whisper';

// Fire it up
const whisper = await Whisper.initialize({ modelPath, useGpu: true });

// Transcribe files or audio buffers
const fileTask = await whisper.transcribeFile(path, options);
const bufferTask = await whisper.transcribeBuffer(buffer, options);

// Record from microphone
await whisper.startRecording();
const recordingTask = await whisper.stopRecording();

// Live transcription
await whisper.startRealtime(300, { onSegment, onAudioLevel });
const realtimeTask = await whisper.stopRealtime();

// Manage what's running
whisper.getTask(taskId);
whisper.getActiveTasks();
whisper.getStats();
whisper.cleanupCompletedTasks();

// Clean up when you're done
await whisper.release();
```
```ts
import { useWhisper } from 'expo-whisper';

const {
  transcribeFile,
  transcribeBuffer,
  progress,
  result,
  error,
  isLoading,
  segments,
  cancel,
  reset,
} = useWhisper({ modelPath, language: 'en' });

// Now use these in your component
await transcribeFile(path);
console.log(result?.text);
```
For live mic transcription in React components:

```ts
const {
  isRecording,
  startRecording,
  stopRecording,
  segments,
  interimText,
  metrics,
} = useRealtimeTranscription({ modelPath });
```
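A minimal component sketch using this hook, assuming the returned startRecording/stopRecording toggle the live session and interimText holds the text so far:

```tsx
import React from 'react';
import { Button, Text, View } from 'react-native';
import { useRealtimeTranscription } from 'expo-whisper';

// Sketch only: assumes modelPath points to an already-downloaded model
export function LiveTranscriber({ modelPath }: { modelPath: string }) {
  const { isRecording, startRecording, stopRecording, interimText } =
    useRealtimeTranscription({ modelPath });

  return (
    <View>
      <Text>{interimText}</Text>
      <Button
        title={isRecording ? 'Stop' : 'Start'}
        onPress={() => (isRecording ? stopRecording() : startRecording())}
      />
    </View>
  );
}
```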
You can tweak how transcription works:

```ts
{
  language: 'en' | 'auto' | ...    // What language to expect
  temperature: number              // How "creative" the model is (0 = literal, 1+ = random)
  translate: boolean               // Translate the result to English (if the audio isn't in English)
  maxTokens: number                // Limit output length
  suppressBlank: boolean           // Filter out silence
  suppressNst: boolean             // Filter out special markers like [noise]
  onProgress: (p: number) => void  // Callback as it processes (0-100)
  onSegment: (s: Segment) => void  // Callback for each word/phrase
}
```

Models come from HuggingFace. They download and cache locally:
```ts
import {
  downloadModel,
  isModelDownloaded,
  getModelPath,
  deleteModel,
  MODEL_SIZES,
} from 'expo-whisper';

// Do you have it?
if (!await isModelDownloaded('tiny')) {
  // Nope, get it
  await downloadModel('tiny', (progress) => {
    console.log(`Downloading: ${(progress * 100).toFixed(0)}%`);
  });
}

// Where is it?
const path = await getModelPath('tiny');

// Need the space back? Delete it
await deleteModel('tiny');

// What's available?
console.log(MODEL_SIZES); // ['tiny', 'base', 'small', ...]
```

| Model | Size | Use When... |
|---|---|---|
| tiny | 75 MB | Speed matters (mobile, low-end devices) |
| tiny.en | 75 MB | Just English, fastest possible |
| base | 142 MB | Good speed + accuracy balance |
| base.en | 142 MB | Just English, balanced |
| small | 466 MB | Want better accuracy |
| small.en | 466 MB | Just English, pretty accurate |
| medium | 1.5 GB | High accuracy (need a decent phone) |
| medium.en | 1.5 GB | Just English, high accuracy |
| large-v1 | 2.9 GB | Best accuracy (flagship phones only) |
| large-v2 | 2.9 GB | Best accuracy (flagship phones only) |
| large-v3 | 3.1 GB | Best accuracy, latest version |
| large-v3-turbo | 1.6 GB | Near large-v3 accuracy, significantly faster |
Models get cached in Documents/whisper-models/ on iOS and your app's documents folder on Android.
```ts
const task = await whisper.transcribeFile(path, {
  language: 'en',
  temperature: 0.3, // 0 = super literal, 1+ = more creative
  beamSize: 5,      // search more possibilities (slower but more accurate)
  translate: true,  // convert the result to English
});
```

VAD filters your output so you get cleaner transcriptions:
suppressBlank (default: on)
- true: Removes silence, keeping only actual speech
- false: Includes empty spots between words

suppressNst (default: on)
- true: Clean, professional output (recommended for apps)
- false: Includes special markers like [silence] and [noise]
What should you use?

```ts
const task = await whisper.transcribeFile(path, {
  suppressBlank: true, // Filter out silence
  suppressNst: true,   // Filter out special markers
});
```

| Use Case | suppressBlank | suppressNst | Result |
|---|---|---|---|
| Production app | true | true | Just the words |
| Debugging | false | false | Everything, including markers |
| Transcription with context | false | true | All words, no markers |
| Aggressive filtering | true | false | Silence removed, markers kept |
Want progress updates or individual segments as they're processed?

```ts
const task = await whisper.transcribeFile(path, {
  language: 'en',
  onProgress: (progress) => console.log(`Done: ${progress}%`),
  onSegment: (segment) => {
    console.log(`${segment.start}s - ${segment.end}s: "${segment.text}"`);
  },
});
```

Check out App.tsx for a working example. It shows how to:
- Transcribe files
- Transcribe audio buffers
- Do real-time streaming
- Cancel transcriptions mid-process
- Handle errors properly
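If you'd rather see the whole flow in one place, here's a sketch stitched together from the APIs shown above; treat it as a starting point, not canonical usage:

```ts
import {
  Whisper,
  downloadModel,
  getModelPath,
  isModelDownloaded,
} from 'expo-whisper';

// Sketch: download-if-missing, initialize, transcribe, release
async function transcribeOnce(audioPath: string): Promise<string | undefined> {
  if (!(await isModelDownloaded('base'))) {
    await downloadModel('base', (p) => console.log(`${Math.round(p * 100)}%`));
  }

  const whisper = await Whisper.initialize({
    modelPath: await getModelPath('base'),
    useGpu: true,
  });

  try {
    const task = await whisper.transcribeFile(audioPath, { language: 'en' });
    return task.result?.text;
  } finally {
    // Free native memory when done
    await whisper.release();
  }
}
```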
iOS:
- Models live in: Documents/whisper-models/
- Audio format: WAV (16-bit PCM, 16kHz, mono)
- Uses Metal GPU acceleration
- Needs permission: NSMicrophoneUsageDescription

Android:
- Models in the app's documents folder
- Audio format: WAV (platform encoding)
- Uses NNAPI GPU acceleration
- Permissions handled automatically
- Pick the right model: tiny and base are blazing fast, larger models are more accurate
- Enable GPU: set useGpu: true when initializing (it's used automatically when available)
- Clean up: call whisper.release() when you're done to free memory
- Models cache locally: after the first download, loading them is fast
- Real-time transcription: use smaller models (tiny/base) for responsive live transcription
"Model won't download"
- Check your internet connection and make sure you have enough space on the device.

"Model won't load"
Check that the model file actually exists:

```ts
const path = await getModelPath('tiny');
const exists = await isModelDownloaded('tiny');
```

"Transcription is slow"
- Try a smaller model (tiny or base are quick)
- Make sure GPU is enabled: useGpu: true
- Check if your device has enough RAM
- Real-time mode might feel faster for live input
"Recording doesn't work"
- Check permissions:
await requestMicrophonePermissions() - Test with the Voice Memos app to verify the mic works
- Check your native logs to see if audio is being captured
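A tiny sketch of that permissions check; the import source for requestMicrophonePermissions() is an assumption here, since the list above doesn't show where it comes from:

```ts
// Assumption: requestMicrophonePermissions is exported by expo-whisper
// and resolves to a boolean grant status
import { requestMicrophonePermissions } from 'expo-whisper';

const granted = await requestMicrophonePermissions();
if (!granted) {
  console.warn('Microphone permission denied; recording will not work.');
}
```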
"Real-time isn't showing words"
- Check that
onSegmentcallback is actually set up - Make sure
suppressBlankandsuppressNstaren't filtering everything out - Look at WhisperContext.swift NSLog statements to see if the callback is firing
If you want to type your code properly:

```ts
// The main stuff
import type {
  Whisper,            // The class
  TranscribeResult,   // Result object
  Segment,            // Individual word/phrase
  TranscriptionTask,  // Task you get back
} from 'expo-whisper';

// Configuration
import type {
  WhisperInitOptions, // Options for initialize()
  TranscribeOptions,  // Options for transcribe*()
  WhisperModelSize,   // 'tiny' | 'base' | etc.
} from 'expo-whisper';

// React
import type {
  UseWhisperOptions,
  UseWhisperReturn,
  UseRealtimeTranscriptionOptions,
  UseRealtimeTranscriptionReturn,
  RealtimeMetrics,
} from 'expo-whisper';
```
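To see these types in action, a small hypothetical helper that uses only the exports listed above:

```ts
import type { Whisper, TranscribeOptions, TranscribeResult } from 'expo-whisper';

// Hypothetical helper: run a transcription and hand back just the result
async function getTranscription(
  whisper: Whisper,
  path: string,
  options?: TranscribeOptions
): Promise<TranscribeResult | undefined> {
  const task = await whisper.transcribeFile(path, options);
  return task.result;
}
```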
License: MIT

- Built on whisper.cpp
- Based on OpenAI's Whisper
- Models from HuggingFace
