CoreML Audio Resource Leak #393

@benjaminfrombe

Description

WhisperKit CoreML Audio Resource Leak - Complete Analysis

Summary

WhisperKit's CoreML backend causes persistent high CPU usage (~10-12%) in macOS's coreaudiod daemon that continues after transcription completes. This appears to be caused by CoreML audio-processing resources that are never fully released.

Environment

  • macOS: 26.2 (Tahoe)
  • Hardware: MacBook Air (Apple Silicon M4)
  • App: Hex (speech-to-text app using WhisperKit)
  • WhisperKit: Latest version from main branch
  • Date Discovered: January 8, 2026

Reproduction Steps

  1. Launch any app using WhisperKit for transcription
  2. Perform one audio transcription
  3. Wait for transcription to complete
  4. Observe Activity Monitor
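Step 4 can be automated from Swift for repeated measurements. A minimal sketch of a hypothetical `cpuPercent` helper (assumes `/usr/bin/env` and a `ps` that accepts the portable `-axo pcpu,comm` columns; returns nil if the process is not found):

```swift
import Foundation

// Sample a named process's CPU% by shelling out to `ps`.
// Matches by suffix because `comm` may be a full path (e.g. /usr/sbin/coreaudiod).
func cpuPercent(ofProcessNamed name: String) -> Double? {
    let ps = Process()
    ps.executableURL = URL(fileURLWithPath: "/usr/bin/env")
    ps.arguments = ["ps", "-axo", "pcpu,comm"]
    let pipe = Pipe()
    ps.standardOutput = pipe
    do { try ps.run() } catch { return nil }
    let data = pipe.fileHandleForReading.readDataToEndOfFile()
    ps.waitUntilExit()
    guard let text = String(data: data, encoding: .utf8) else { return nil }
    for line in text.split(separator: "\n").dropFirst() {   // skip header row
        let cols = line.split(separator: " ")
        guard cols.count >= 2, let cpu = Double(cols[0]),
              cols[1...].joined(separator: " ").hasSuffix(name) else { continue }
        return cpu
    }
    return nil
}
```

Calling `cpuPercent(ofProcessNamed: "coreaudiod")` before and after a transcription reproduces the before/after numbers reported below.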

Expected Behavior

After transcription completes and WhisperKit instance is released:

  • coreaudiod CPU: ~0.5-1% (baseline idle)

Actual Behavior

After transcription completes:

  • coreaudiod CPU: ~10-12% (persists indefinitely)
  • Only returns to normal when app is quit
  • Killing/restarting coreaudiod does NOT fix it
  • The app itself can be idle - problem persists

Technical Analysis

Using system diagnostics (lsof, sample, filesystem comparison), I traced the exact moment the issue occurs:

Files Loaded During First Transcription

/Users/.../com.apple.e5rt.e5bundlecache/.../H16G.bundle/main/main_bnns/bnns_program.bnnsir

These are BNNS (Basic Neural Network Subroutines) files - WhisperKit's CoreML model caches.

CPU State Change

BEFORE first transcription:

_coreaudiod  0.0% CPU  (idle)

AFTER transcription completes:

_coreaudiod  11.7% CPU  (stays forever until app quits)

What We Tried (All Failed)

Attempted fixes in Hex app code:

  1. ❌ Unload WhisperKit after transcription (whisperKit = nil)
  2. ❌ Disable all audio recording warmup/priming
  3. ❌ Destroy AVAudioRecorder immediately after use
  4. ❌ Disable audio level metering
  5. ❌ Prevent AVAudioEngine from staying active
  6. ❌ Explicit cleanup of all audio resources
  7. ❌ Forced delays to let ARC/autorelease cleanup run

None of these helped; the problem persists even after we destroy every audio object in our Swift code.
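Fixes 1, 6, and 7 boiled down to a pattern like this minimal sketch, where `Session` is a hypothetical stand-in for the object holding the WhisperKit instance and the AVAudioRecorder; wrapping the teardown in `autoreleasepool` drains any Objective-C temporaries immediately instead of at the next run-loop pass:

```swift
import Foundation

// Sketch of attempted fixes 1, 6, and 7: drop every retained
// audio/ML reference inside an autoreleasepool so ObjC-level
// temporaries die now, not at the end of the run-loop tick.
final class Session {
    var retained: [AnyObject] = []   // stand-ins for WhisperKit, recorder, etc.

    func teardown() {
        autoreleasepool {
            retained.removeAll()     // release everything we hold
        }
    }
}
```

Even with this deterministic release point, coreaudiod stayed pinned, which is what points the finger at CoreML-owned state rather than anything the app retains.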

Proof It's WhisperKit/CoreML

  1. App has ZERO audio activity in logs after transcription
  2. No audio files remain open
  3. No recorder instances active
  4. lsof shows only CoreML .bnnsir model files
  5. Problem only occurs after first transcription (when CoreML models load)
  6. Quitting app immediately fixes it (releases CoreML resources)

Related Issues

This appears similar to known CoreML audio processing bugs:

  1. whisper.cpp #1202: "CoreML + calls to whisper_full result in increased memory usage"

  2. whisper.cpp #797: "Increasing memory usage over time with CoreML"

  3. WhisperKit #265: "Memory leak when using ModelComputeOptions .cpuAndGPU with Turbo model on M1" (memory leak when repeatedly destroying and re-instantiating WhisperKit)

Root Cause Hypothesis

WhisperKit uses CoreML's Audio Feature Print or similar audio processing APIs. When the model is loaded for the first time:

  1. CoreML initializes an audio processing graph in coreaudiod
  2. This graph processes audio for feature extraction during transcription
  3. Even after transcription completes and WhisperKit is deallocated, CoreML doesn't release the audio processing graph
  4. The graph stays active, polling/processing at 10Hz or similar
  5. This causes persistent high coreaudiod CPU usage

Potential Solutions

Option 1: WhisperKit Framework Fix (Ideal)

WhisperKit should add explicit cleanup:

deinit {
    // Release CoreML audio processing resources
    // Invalidate MLModel instances
    // Clear audio feature extraction caches
}
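For comparison, here is a hedged sketch of what a caller-driven `cleanup()` API could look like. Every name below is invented for illustration, not actual WhisperKit code; the point is giving callers a deterministic release point instead of relying on `deinit` timing:

```swift
import Foundation

// Hypothetical shape for an explicit cleanup API (names invented here;
// not actual WhisperKit code).
final class TranscriptionEngine {
    private var models: [AnyObject] = []   // stand-ins for MLModel instances
    var isClean: Bool { models.isEmpty }

    func load() { models.append(NSObject()) }

    func cleanup() {
        // Drop every retained model before deallocation; in the real
        // framework this is where CoreML audio/feature-extraction
        // resources would be invalidated as well.
        models.removeAll()
    }
}
```

An explicit method also lets apps call cleanup between transcriptions while keeping the WhisperKit object alive, something `deinit` alone cannot offer.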

Option 2: Apple CoreML Fix (Long-term)

File a Feedback with Apple about CoreML audio processing resources not being released after model deallocation.

Option 3: App-level Workaround (Pragmatic)

For apps using WhisperKit:

A) Quit and restart after transcription:

// Relaunch via `open -n`, then quit (AppKit has no public relaunch API)
DispatchQueue.main.asyncAfter(deadline: .now() + 0.5) {
    let task = Process()
    task.executableURL = URL(fileURLWithPath: "/usr/bin/open")
    task.arguments = ["-n", Bundle.main.bundlePath]
    try? task.run()
    NSApplication.shared.terminate(nil)
}

B) Warn users about CPU usage:

"Note: First transcription may cause increased background CPU usage (~10%).  
Restarting the app after use is recommended for battery life."

C) Background task to reset coreaudiod:

// Periodically check and reset if needed (requires privileges, and
// killing coreaudiod briefly interrupts all system audio).
// Caveat: the analysis above found restarting coreaudiod did not help.
if coreaudiodCPU > 8.0 {
    let task = Process()
    task.executableURL = URL(fileURLWithPath: "/usr/bin/killall")
    task.arguments = ["coreaudiod"]
    try? task.run()
}

Impact

  • Battery life: Significantly reduced on MacBooks (~10-15% battery drain)
  • Heat: Fan activity increases
  • Performance: One CPU core effectively locked at 100%
  • User experience: Poor for battery-powered devices

Request

  1. To WhisperKit maintainers: Can you add explicit CoreML resource cleanup in deinit or provide a cleanup() method?

  2. To Apple: Is this a known CoreML limitation? Should we file a FB?

  3. Temporary fix: Add documentation warning users about this behavior until resolved

Testing

To reproduce in ANY WhisperKit-using app:

// Before first transcription
// Check: ps aux | grep coreaudiod  → ~0% CPU

var whisperKit: WhisperKit? = try await WhisperKit()
let result = try await whisperKit?.transcribe(audioPath: "test.wav")

// After transcription
whisperKit = nil  // Explicitly release the instance
// Check: ps aux | grep coreaudiod  → ~10% CPU (stays!)

Found by: Benjamin Jacobs
Date: 2026-01-08
Diagnostic files: Available upon request (before/after lsof dumps, sample traces)
