WhisperKit CoreML Audio Resource Leak - Complete Analysis
Summary
WhisperKit's CoreML backend causes high CPU usage in macOS's coreaudiod daemon (~10-12%) that persists after transcription completes. This appears to be caused by CoreML audio processing resources that are never fully released.
Environment
- macOS: 26.2 (Tahoe)
- Hardware: MacBook Air (Apple Silicon M4)
- App: Hex (speech-to-text app using WhisperKit)
- WhisperKit: Latest version from main branch
- Date Discovered: January 8, 2026
Reproduction Steps
- Launch any app using WhisperKit for transcription
- Perform one audio transcription
- Wait for transcription to complete
- Observe Activity Monitor (or sample `coreaudiod` programmatically; see the sketch below)
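For step 4, the CPU figure can also be sampled programmatically. A minimal Swift sketch, assuming `/bin/ps` and its `pcpu=,comm=` output format (the helper name is ours, not an existing API):

```swift
import Foundation

// Hypothetical helper: returns coreaudiod's current %CPU by shelling out
// to `ps -axo pcpu=,comm=` and picking the coreaudiod row.
func coreaudiodCPUUsage() throws -> Double {
    let ps = Process()
    ps.executableURL = URL(fileURLWithPath: "/bin/ps")
    ps.arguments = ["-axo", "pcpu=,comm="]
    let pipe = Pipe()
    ps.standardOutput = pipe
    try ps.run()
    ps.waitUntilExit()
    let output = String(data: pipe.fileHandleForReading.readDataToEndOfFile(),
                        encoding: .utf8) ?? ""
    for row in output.split(separator: "\n") where row.hasSuffix("coreaudiod") {
        let fields = row.trimmingCharacters(in: .whitespaces)
            .split(separator: " ", maxSplits: 1)
        if let cpu = Double(fields[0]) { return cpu }
    }
    return 0
}
```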
Expected Behavior
After transcription completes and WhisperKit instance is released:
`coreaudiod` CPU: ~0.5-1% (baseline idle)
Actual Behavior
After transcription completes:
- `coreaudiod` CPU: ~10-12% (persists indefinitely)
- Only returns to normal when the app is quit
- Killing/restarting `coreaudiod` does NOT fix it
- The app itself can be idle - the problem persists
Technical Analysis
Using system diagnostics (`lsof`, `sample`, filesystem comparison), I traced the exact moment the issue occurs:
Files Loaded During First Transcription
```
/Users/.../com.apple.e5rt.e5bundlecache/.../H16G.bundle/main/main_bnns/bnns_program.bnnsir
```
These are BNNS (Basic Neural Network Subroutines) files - WhisperKit's CoreML model caches.
CPU State Change
```
BEFORE first transcription:
_coreaudiod   0.0% CPU (idle)

AFTER transcription completes:
_coreaudiod  11.7% CPU (stays until the app quits)
```
What We Tried (All Failed)
Attempted fixes in Hex app code:
- ❌ Unload WhisperKit after transcription (`whisperKit = nil`)
- ❌ Disable all audio recording warmup/priming
- ❌ Destroy AVAudioRecorder immediately after use
- ❌ Disable audio level metering
- ❌ Prevent AVAudioEngine from staying active
- ❌ Explicit cleanup of all audio resources
- ❌ Force GC delays
None of these helped - the problem persists even when we destroy everything in our Swift code. A condensed sketch of these attempts follows.
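For concreteness, here is roughly what the teardown looked like; the property names are illustrative, not Hex's actual code:

```swift
import AVFoundation
import WhisperKit

// Illustrative container (names are hypothetical, not Hex's real code).
// Running all of this after transcription still left coreaudiod at ~10-12%.
final class AudioSession {
    var recorder: AVAudioRecorder?
    var audioEngine = AVAudioEngine()
    var whisperKit: WhisperKit?

    func tearDownEverything() {
        recorder?.stop()        // destroy AVAudioRecorder immediately
        recorder = nil
        audioEngine.stop()      // prevent AVAudioEngine from staying active
        audioEngine.reset()
        whisperKit = nil        // unload WhisperKit after transcription
    }
}
```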
Proof It's WhisperKit/CoreML
- App has ZERO audio activity in logs after transcription
- No audio files remain open
- No recorder instances active
- `lsof` shows only CoreML `.bnnsir` model files open (see the sketch after this list)
- Problem only occurs after the first transcription (when CoreML models load)
- Quitting app immediately fixes it (releases CoreML resources)
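A sketch of the `lsof` check, assuming `/usr/sbin/lsof` is present; it filters this process's open files down to the CoreML cache entries quoted above:

```swift
import Foundation

// Sketch: list this process's open files that live in CoreML's e5rt
// bundle cache (the .bnnsir files cited above). Assumes /usr/sbin/lsof.
func openCoreMLCacheFiles() throws -> [String] {
    let lsof = Process()
    lsof.executableURL = URL(fileURLWithPath: "/usr/sbin/lsof")
    lsof.arguments = ["-p", "\(ProcessInfo.processInfo.processIdentifier)"]
    let pipe = Pipe()
    lsof.standardOutput = pipe
    try lsof.run()
    lsof.waitUntilExit()
    let text = String(data: pipe.fileHandleForReading.readDataToEndOfFile(),
                      encoding: .utf8) ?? ""
    return text.split(separator: "\n")
        .filter { $0.contains("e5bundlecache") || $0.contains(".bnnsir") }
        .map(String.init)
}
```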
Related Issues
This appears similar to known CoreML audio processing bugs:
- whisper.cpp #1202: "CoreML + calls to whisper_full result in increased memory usage (apparent leak)" (ggml-org/whisper.cpp#1202) - `MLMultiArray` allocations in `whisper_coreml_encode` are never freed
- whisper.cpp #797: "Increasing memory usage over time with CoreML"
- WhisperKit #265: "Memory leak when using ModelComputeOptions `.cpuAndGPU` with Turbo model on m1" - memory leak when repeatedly destroying and re-instantiating WhisperKit
Root Cause Hypothesis
WhisperKit uses CoreML's Audio Feature Print or similar audio processing APIs. When the model is loaded for the first time:
- CoreML initializes an audio processing graph in `coreaudiod`
- This graph processes audio for feature extraction during transcription
- Even after transcription completes and WhisperKit is deallocated, CoreML doesn't release the audio processing graph
- The graph stays active, polling/processing at 10 Hz or similar
- This causes persistent high `coreaudiod` CPU usage (a quick experiment to probe this is sketched below)
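One way to probe this hypothesis without touching WhisperKit internals: load the same CoreML model with CPU-only compute units and watch whether `coreaudiod` still climbs. A minimal sketch; `modelURL` is a hypothetical path to a compiled `.mlmodelc`:

```swift
import CoreML

// Experiment: if the leak disappears with .cpuOnly, the audio/ANE graph
// path is implicated. modelURL is a hypothetical compiled-model location.
let modelURL = URL(fileURLWithPath: "/path/to/AudioEncoder.mlmodelc")
let config = MLModelConfiguration()
config.computeUnits = .cpuOnly       // vs. .all / .cpuAndNeuralEngine
let model = try MLModel(contentsOf: modelURL, configuration: config)
```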
Potential Solutions
Option 1: WhisperKit Framework Fix (Ideal)
WhisperKit should add explicit cleanup:
```swift
deinit {
    // Release CoreML audio processing resources
    // Invalidate MLModel instances
    // Clear audio feature extraction caches
}
```
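A slightly more concrete sketch of what such a cleanup could look like; the type and property names are hypothetical, and WhisperKit's real internals may differ:

```swift
import CoreML

// Hypothetical shape of an explicit cleanup() (not WhisperKit's actual
// internals): drop every strong MLModel reference inside an
// autoreleasepool so autoreleased CoreML objects are drained immediately.
final class ModelContainer {
    var audioEncoder: MLModel?
    var textDecoder: MLModel?

    func cleanup() {
        autoreleasepool {
            audioEncoder = nil
            textDecoder = nil
        }
    }

    deinit { cleanup() }
}
```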
Option 2: Apple CoreML Fix (Long-term)
File a Feedback with Apple about CoreML audio processing resources not being released after model deallocation.
Option 3: App-level Workaround (Pragmatic)
For apps using WhisperKit:
A) Quit and restart after transcription:
```swift
// After transcription: spawn a fresh instance, then terminate this one
// (AppKit has no built-in relaunch API; /usr/bin/open is one workaround).
DispatchQueue.main.asyncAfter(deadline: .now() + 0.5) {
    _ = try? Process.run(URL(fileURLWithPath: "/usr/bin/open"),
                         arguments: ["-n", Bundle.main.bundlePath])
    NSApplication.shared.terminate(nil)
}
```
B) Warn users about CPU usage:
"Note: First transcription may cause increased background CPU usage (~10%).
Restarting the app after use is recommended for battery life."
C) Background task to reset coreaudiod:
```swift
// Periodically check and reset if needed. `coreaudiodCPU` is a sampled
// value (e.g. from the ps-based helper sketched under Reproduction Steps).
// Note: per the observations above, killing coreaudiod alone did NOT clear
// the leak in our tests, and killall may require elevated privileges.
if coreaudiodCPU > 8.0 {
    _ = try? Process.run(URL(fileURLWithPath: "/usr/bin/killall"),
                         arguments: ["coreaudiod"])
}
```
Impact
- Battery life: Significantly reduced on MacBooks (~10-15% battery drain)
- Heat: Fan activity increases
- Performance: One CPU core effectively locked at 100%
- User experience: Poor for battery-powered devices
Request
- To WhisperKit maintainers: Can you add explicit CoreML resource cleanup in `deinit` or provide a `cleanup()` method?
- To Apple: Is this a known CoreML limitation? Should we file a Feedback (FB)?
- Temporary fix: Add documentation warning users about this behavior until it is resolved
Testing
To reproduce in ANY WhisperKit-using app:
```swift
// Before first transcription
// Check: ps aux | grep coreaudiod → 0% CPU
var whisperKit: WhisperKit? = try await WhisperKit()
let result = try await whisperKit?.transcribe(audioPath: "test.wav")

// After transcription
whisperKit = nil // even with explicit deallocation
// Check: ps aux | grep coreaudiod → 10% CPU (stays!)
```
Found by: Benjamin Jacobs
Date: 2026-01-08
Diagnostic files: Available upon request (before/after lsof dumps, sample traces)