
Unified test suite#13

Merged
akshatvasisht merged 13 commits into main from unified-test-suite
Feb 14, 2026

Conversation


akshatvasisht (Owner) commented Jan 29, 2026

Summary by CodeRabbit

  • New Features

    • Added state synchronization between frontend and backend for consistent mode, emotion override, and transmission status display.
    • Enhanced audio pre-roll buffering for improved speech capture during transmission.
    • Push-to-talk button now indicates both recording and streaming states.
  • Bug Fixes

    • Improved silence detection and audio buffering handling.
    • Fixed state initialization on application startup.
  • Documentation

    • Added audio utility scripts for device enumeration and parameter verification.
    • Expanded setup documentation with logging and troubleshooting guidance.
  • Chores

    • Updated audio processing to 48kHz sample rate.
    • Enhanced test infrastructure with mock audio services and end-to-end pipeline tests.


coderabbitai bot commented Jan 29, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

📝 Walkthrough

This pull request expands the Janus audio pipeline with production-ready infrastructure: upgrading audio processing to 48kHz with larger chunks, adding testable threading controls via stop_event parameters, introducing ControlStateMessage for full engine state synchronization, implementing comprehensive mock-based test suites including E2E loopback validation, and enhancing UI state management with streaming awareness alongside recording states.

Changes

Cohort / File(s) Summary
Audio Configuration
backend/services/audio_io.py, backend/services/prosody.py, backend/services/synthesizer.py, backend/services/vad.py
Updated default sample rate from 44100 Hz to 48000 Hz and chunk size from 512 to 1536 samples across audio services. Added real-time simulation delay in mock audio read path. Adjusted VAD to handle 48kHz as primary path with 44100 Hz fallback.
Message Types & Synchronization
backend/api/types.py, frontend/types/janus.ts
Added new ControlStateMessage type containing is_streaming, is_recording, mode, and emotion_override fields; extended JanusOutboundMessage union to include this new message for full engine state synchronization with frontend.
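For reference, the backend side of this message can be sketched as a plain dataclass; the field names come from the summary above, while the `type` discriminator, the `Optional` emotion override, and the defaults are assumptions about `backend/api/types.py` (the real code may well use Pydantic models):

```python
# Hypothetical sketch of ControlStateMessage; field names from the PR
# summary, everything else (base class, discriminator, defaults) assumed.
from dataclasses import dataclass
from typing import Literal, Optional

@dataclass
class ControlStateMessage:
    is_streaming: bool
    is_recording: bool
    mode: str                       # e.g. "VOICE", "MORSE", "TEXT_ONLY"
    emotion_override: Optional[str] = None
    type: Literal["control_state"] = "control_state"

# Emitted on WebSocket connect so the UI starts from the engine's state.
initial = ControlStateMessage(is_streaming=False, is_recording=False, mode="VOICE")
```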
Threading & Control Flow
backend/scripts/receiver_main.py, backend/scripts/sender_main.py
Added optional stop_event parameter to both receiver and sender loops, enabling external shutdown signaling for testing. Introduced ThreadPoolExecutor-based background processing in sender for transcription/prosody work. Added pre-roll buffering (320ms) and cooperative shutdown paths while preserving KeyboardInterrupt behavior in production mode.
Engine Audio Buffering & Logging
backend/services/engine.py
Integrated pre-roll buffering (deque of max 10 chunks), adjusted silence threshold from 16 to 15 chunks, added mode-specific VAD gating for MORSE/TEXT_ONLY modes, implemented short-buffer guard (skip if fewer than 1536 × 6 = 9216 samples), and enhanced logging for transmission start, processing duration, and skip reasons.
Test Infrastructure & Fixtures
backend/tests/conftest.py
Introduced centralized pytest conftest with MockAudioService mimicking AudioService interface, supporting configurable input patterns and write capture. Added autouse fixtures for mock injection via monkeypatch and engine state reset, enabling hardware-independent testing.
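A hardware-free mock in the spirit of this conftest might look like the following; the method and attribute names (`read_chunk`, `write_chunk`, `written_audio_chunks`) mirror usage elsewhere in this review, but the exact `AudioService` surface and the monkeypatch target are assumptions:

```python
# Sketch of a MockAudioService mimicking a plausible AudioService interface.
class MockAudioService:
    SAMPLE_RATE = 48000
    CHUNK_SIZE = 1536

    def __init__(self, input_pattern=None):
        # Pre-seeded chunks returned by read_chunk(); when exhausted,
        # the mock falls back to 16-bit silence.
        self.input_pattern = list(input_pattern or [])
        self.written_audio_chunks = []  # captured output for assertions

    def read_chunk(self) -> bytes:
        if self.input_pattern:
            return self.input_pattern.pop(0)
        return b"\x00" * (self.CHUNK_SIZE * 2)  # mono 16-bit silence

    def write_chunk(self, chunk: bytes) -> None:
        self.written_audio_chunks.append(chunk)

# An autouse fixture would inject it roughly like this (target path assumed):
#
# @pytest.fixture(autouse=True)
# def mock_audio_service(monkeypatch):
#     svc = MockAudioService()
#     monkeypatch.setattr("backend.services.audio_io.AudioService",
#                         lambda *a, **k: svc)
#     yield svc
```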
End-to-End Integration Test
backend/tests/test_e2e_local.py
Added comprehensive E2E loopback test with MockTranscriber, MockSynthesizer, MockVAD, and MockProsodyExtractor; includes utility functions for synthetic speech/silence/conversation patterns; validates multi-turn sender→receiver pipeline without physical hardware or network.
Test Suite Updates
backend/tests/test_engine.py, backend/tests/test_input_processing.py, backend/tests/test_api_flow.py, backend/tests/test_synthesis.py
Updated test sample rates and chunk sizes (44100→48000 Hz, 512→1536), removed mock_pyaudio fixture in favor of global mock_audio_service, added audio ducking test coverage with state isolation fixture, replaced string literals with EmotionOverride enum, removed unused import (ReferenceAudio).
WebSocket State Management
backend/api/socket_manager.py
Integrated ControlStateMessage emission on WebSocket connection start, added _reset_control_state() helper to reset streaming/recording/talking flags on disconnect, enabling UI state synchronization.
Frontend Hooks & State
frontend/hooks/useJanusWebSocket.ts, frontend/hooks/useJanusSocket.ts
Extended useJanusWebSocket with control_state message routing, localStorage persistence for transcripts (100-item limit) and packet history (200-item limit), improved error handling with lastError exposure. Added useJanusSocket integration of useQuery for remoteState syncing of mode, emotion_override, is_recording, is_streaming from cache.
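The 100-item transcript and 200-item packet caps boil down to an append-and-trim pattern; a language-neutral sketch in Python (the frontend does the equivalent in TypeScript before persisting to localStorage):

```python
# Append an item, then keep only the newest max_items entries.
def append_capped(history: list, item, max_items: int) -> list:
    history.append(item)
    return history[-max_items:]

packet_log = []
for i in range(250):
    packet_log = append_capped(packet_log, i, 200)  # 200-item packet cap
```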
UI Components
frontend/components/PushToTalk.tsx, frontend/components/ConversationPanel.tsx, frontend/components/NetworkLog.tsx
Updated button active styling and labels to reflect both isRecording and isStreaming states (previously recording-only). Simplified mouse event handlers to direct debounced calls. Updated empty-state and placeholder messages ("No voice data received yet. Press PTT to speak or S to Stream", "No packets in history. Transmissions appear here").
Configuration & Utilities
.gitignore, backend/scripts/list_audio_devices.py, backend/scripts/verify_audio_params.py
Updated .gitignore to use **/*.wav recursive pattern and added .hf/, agentcontext/ directories. Added device enumeration and audio parameter verification scripts for PyAudio and Qwen3-TTS model config inspection.
Logging & Server Setup
backend/server.py
Introduced RotatingFileHandler-based logging to logs/janus.log (5 MB max, 5 backups) with INFO level, dual stream+file output, and module-level logger configuration.
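A sketch of such a setup using only the stdlib `logging.handlers` module; the path `logs/janus.log`, the 5 MB / 5-backup rotation, and the INFO level are taken from the description above, while the format string and function name are illustrative:

```python
# Rotating file + stream logging sketch (format string illustrative).
import logging
from logging.handlers import RotatingFileHandler
from pathlib import Path

def configure_logging(log_dir: str = "logs") -> logging.Logger:
    Path(log_dir).mkdir(parents=True, exist_ok=True)
    fmt = logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s")

    # Rotate at 5 MB, keeping 5 backups (janus.log.1 .. janus.log.5).
    file_handler = RotatingFileHandler(
        f"{log_dir}/janus.log", maxBytes=5 * 1024 * 1024, backupCount=5
    )
    file_handler.setFormatter(fmt)

    stream_handler = logging.StreamHandler()
    stream_handler.setFormatter(fmt)

    root = logging.getLogger()
    root.setLevel(logging.INFO)
    root.addHandler(file_handler)
    root.addHandler(stream_handler)
    return root
```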
Documentation
docs/ARCHITECTURE.md, docs/SETUP.md, docs/STYLE.md, docs/TESTING.md, README.md, backend/scripts/__init__.py, backend/tests/__init__.py
Added comprehensive ARCHITECTURE.md documenting test suites and frontend telemetry structure, expanded SETUP.md with logging configuration and audio utility usage, updated references from projectdocs.md to ARCHITECTURE.md/SETUP.md, added E2E test documentation, updated badge references (Whisper→STT-Faster--Whisper, added TTS/TypeScript/Next.js/Tailwind), added module docstrings to backend/scripts and backend/tests packages.

Sequence Diagram(s)

sequenceDiagram
    participant Client as Client (Web)
    participant WS as WebSocket Server
    participant Engine as Engine
    participant Sender as Sender Process
    participant Executor as ThreadPoolExecutor

    Client->>WS: Connect
    WS->>Engine: Read initial state
    WS-->>Client: ControlStateMessage (mode, is_streaming, is_recording, emotion_override)
    
    Client->>WS: Start Stream
    WS->>Engine: Update state (is_streaming=true)
    WS-->>Client: ControlStateMessage (updated state)
    
    Sender->>Sender: Read audio chunk
    Sender->>Sender: Check pre-roll buffer & VAD
    Sender->>Sender: Detect speech
    Sender->>Executor: Submit process_and_transmit (async)
    Executor->>Executor: Transcribe audio
    Executor->>Executor: Extract prosody
    Executor->>Executor: Transmit packet
    Executor-->>Sender: Job complete
    
    Sender->>Engine: Detect silence
    Engine->>Engine: Check silence threshold
    Engine->>Sender: Trigger processing if buffered
    
    Client->>WS: Stop Stream
    WS->>Engine: Update state (is_streaming=false)
    WS-->>Client: ControlStateMessage (updated state)
sequenceDiagram
    participant Test as Test Runner
    participant MockAudio as MockAudioService
    participant Receiver as Receiver Loop
    participant Sender as Sender Loop
    participant Executor as ThreadPoolExecutor

    Test->>MockAudio: Set input pattern (conversation)
    Test->>Receiver: Start in thread with stop_event
    Test->>Sender: Start in thread with stop_event
    
    Sender->>MockAudio: read_chunk()
    MockAudio-->>Sender: Audio frame
    Sender->>Executor: process_and_transmit()
    Executor->>Executor: Transcribe + analyze
    Executor->>Receiver: UDP packet
    
    Receiver->>MockAudio: write_chunk() (synthesized)
    
    Sender->>Sender: Detect silence threshold
    Sender->>Executor: Submit next batch
    
    Test->>Test: Wait for multiple outputs
    Test->>Sender: Set stop_event
    Test->>Receiver: Set stop_event
    
    Sender->>Executor: shutdown()
    Receiver-->>Test: Cleanup complete
    
    Test->>Test: Assert ≥2 WAV chunks, no deadlock

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

Possibly related PRs

  • Packet log #12: Introduces audio ducking logic (apply_ducking_if_needed function) and ducking state management with test coverage in backend/services/engine.py and backend/tests/test_engine.py, overlapping with this PR's engine and test infrastructure changes.

Poem

🐰 Hop along, the pipeline flows anew!
48kHz threads that run on cue,
With mocks and buffers, state synced bright,
E2E tests running through the night! 🎵
No hardware needed, just the flow—
Watch those packets dance and glow!

🚥 Pre-merge checks | ✅ 3 | ❌ 1
❌ Failed checks (1 inconclusive)

  • Title Check: ❓ Inconclusive. The title 'Unified test suite' is too vague and generic; it does not convey the specific nature of this extensive multi-system changeset covering backend refactoring, audio configuration updates, new message types, test infrastructure, and frontend improvements. Resolution: consider a more descriptive title that captures the primary objectives, such as 'Add comprehensive test infrastructure and unified 48kHz audio configuration'.

✅ Passed checks (3 passed)

  • Description Check: ✅ Passed. Check skipped, since CodeRabbit's high-level summary is enabled.
  • Docstring Coverage: ✅ Passed. Docstring coverage is 94.52%, which meets the required threshold of 80.00%.
  • Merge Conflict Detection: ✅ Passed. No merge conflicts detected when merging into main.


coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@backend/tests/test_e2e_local.py`:
- Around line 366-378: The test currently prints thread_exceptions but does not
fail when they exist; update the post-thread check to fail the test if
thread_exceptions is non-empty by asserting or calling pytest.fail with a clear
message that includes the thread_exceptions contents (e.g., reference
thread_exceptions and include thread_name/exc), then continue to check
mock_audio_service.written_audio_chunks as before to ensure multiple chunks; use
the existing thread_exceptions variable and
mock_audio_service.written_audio_chunks to locate where to add the failing
assertion.
🧹 Nitpick comments (4)
backend/tests/test_engine.py (1)

88-101: Consider moving fixture definition before the test that uses it.

While pytest resolves fixtures by name regardless of position, placing fixtures before their dependent tests improves readability and follows conventional test file organization.

♻️ Suggested reordering

Move the reset_ducking_state fixture (lines 88-101) to appear before test_audio_ducking (line 53), typically after the existing tests and before the new ducking test, or at the top of the file after imports.

backend/tests/test_e2e_local.py (3)

17-20: Remove unused imports.

These classes are imported but never used directly. The monkeypatch.setattr calls reference them via string paths (e.g., 'backend.services.transcriber.Transcriber'), not the imported symbols.

Proposed fix
-from backend.common.protocol import JanusMode, JanusPacket
+from backend.common.protocol import JanusPacket
-from backend.services.prosody import ProsodyExtractor
-from backend.services.synthesizer import Synthesizer
-from backend.services.transcriber import Transcriber
-from backend.services.vad import VoiceActivityDetector

244-244: Rename unused loop variable to _.

The loop variable turn is not used within the loop body.

Proposed fix
-    for turn in range(num_turns):
+    for _ in range(num_turns):

334-339: Consider using a synchronization primitive instead of fixed sleep.

The time.sleep(0.5) for waiting on the receiver to be ready is a race condition that may cause flaky tests on slower CI environments. A more robust approach would be using a threading.Event signaled by the receiver when it's ready to accept connections.

This is a common pattern in integration tests and may be acceptable given the generous timeouts elsewhere, but worth considering if you observe test flakiness.

Comment on lines +366 to +378
    # Check for thread exceptions
    if thread_exceptions:
        for thread_name, exc in thread_exceptions:
            print(f"Exception in {thread_name} thread: {exc}")
        # Don't fail immediately - check if we got results anyway

    # Verification: Assert multiple audio chunks were received
    written_chunks = mock_audio_service.written_audio_chunks

    # Critical assertion: Must receive at least 2 chunks (proving multiple turns work)
    assert len(written_chunks) >= 2, (
        f"Expected at least 2 audio chunks (multiple turns), but got {len(written_chunks)}. "
        f"This indicates a deadlock bug after the first transmission."

⚠️ Potential issue | 🟡 Minor

Thread exceptions should cause test failure.

Currently, if an exception occurs in a thread but some chunks were still received, the test passes despite the error. The exceptions are only printed but not asserted against, which could mask real issues.

Proposed fix
     # Check for thread exceptions
     if thread_exceptions:
         for thread_name, exc in thread_exceptions:
             print(f"Exception in {thread_name} thread: {exc}")
-        # Don't fail immediately - check if we got results anyway
+        pytest.fail(f"Thread exceptions occurred: {thread_exceptions}")
 
     # Verification: Assert multiple audio chunks were received

Alternatively, if you want to allow partial success with warnings, consider using pytest.warns or explicitly documenting why swallowing exceptions is acceptable.


coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@README.md`:
- Around line 10-18: The README badges for Next.js and Tailwind are out of date;
locate the badge lines containing the alt texts "![Next.js]" and "![Tailwind]"
and update their version labels to match package.json (Next.js => 16.1.6,
Tailwind CSS => 4). Ensure the shield URLs/labels reflect these exact versions
so the displayed badges match the project's current dependencies.

Comment on lines 10 to +18
![Python](https://img.shields.io/badge/Python-3.10%2B-blue?logo=python&logoColor=white)
![FastAPI](https://img.shields.io/badge/FastAPI-0.109-009688?logo=fastapi&logoColor=white)
![PyTorch](https://img.shields.io/badge/PyTorch-2.0-EE4C2C?logo=pytorch&logoColor=white)
![Whisper](https://img.shields.io/badge/AI-Faster--Whisper-violet)
![Whisper](https://img.shields.io/badge/STT-Faster--Whisper-violet)
![TTS](https://img.shields.io/badge/TTS-Fish%20Audio-8B5CF6)
![MsgPack](https://img.shields.io/badge/Protocol-MessagePack-informational)
![TypeScript](https://img.shields.io/badge/TypeScript-5.0-3178C6?logo=typescript&logoColor=white)
![Next.js](https://img.shields.io/badge/Next.js-15.0-black?logo=next.js&logoColor=white)
![Tailwind](https://img.shields.io/badge/Tailwind_CSS-3.4-38B2AC?logo=tailwind-css&logoColor=white)

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

git ls-files | head -20

Repository: akshatvasisht/janus

Length of output: 555


🏁 Script executed:

find . -maxdepth 2 -type f \( -name "package.json" -o -name "pyproject.toml" -o -name "requirements.txt" -o -name "poetry.lock" -o -name "package-lock.json" \) 2>/dev/null

Repository: akshatvasisht/janus

Length of output: 142


🏁 Script executed:

cat backend/requirements.txt

Repository: akshatvasisht/janus

Length of output: 707


🏁 Script executed:

cat frontend/package.json

Repository: akshatvasisht/janus

Length of output: 1021


Update Next.js and Tailwind CSS badges to reflect current dependency versions.

The Next.js badge shows version 15.0, but the project uses 16.1.6 (package.json). The Tailwind CSS badge shows 3.4, but the project uses version 4 (package.json devDependencies). Update both badges to match the current dependencies.



coderabbitai bot left a comment

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (5)
backend/services/synthesizer.py (1)

29-41: ⚠️ Potential issue | 🟡 Minor

Stale docstring still describes removed default-path fallback.

Lines 33–35 say "If no reference audio path is provided, uses a default path in the backend directory", but the code (lines 50–53) and the updated param doc (line 40) confirm that no default is loaded when the path is None. Please align the top paragraph with the new behavior.

📝 Proposed fix
         Sets up the Fish Audio SDK client with the provided API key and loads
-        reference audio for voice cloning. If no reference audio path is provided,
-        uses a default path in the backend directory. Supports hot-reload of
-        reference audio files without restarting the service.
+        reference audio for voice cloning. If no reference audio path is provided,
+        no reference audio is loaded. Supports hot-reload of reference audio
+        files without restarting the service.
backend/services/prosody.py (1)

11-21: ⚠️ Potential issue | 🟡 Minor

Stale docstring: default sample rate documented as 44100 Hz but actual default is 48000 Hz.

Line 20 still reads Default is 44100 Hz while the constructor default on Line 11 is now 48000.

📝 Proposed fix
         Args:
-            sample_rate: Audio sample rate in Hz. Default is 44100 Hz.
+            sample_rate: Audio sample rate in Hz. Default is 48000 Hz.
             hop_size: Analysis hop size in samples. Default is 512 samples.
backend/services/vad.py (1)

11-24: ⚠️ Potential issue | 🟡 Minor

Stale docstring: default sample rate documented as 44100 Hz but actual default is 48000 Hz.

Same issue as in prosody.py — Line 23 still says "Default is 44100 Hz."

📝 Proposed fix
-            sample_rate: Input audio sample rate in Hz. Default is 44100 Hz.
+            sample_rate: Input audio sample rate in Hz. Default is 48000 Hz.
backend/services/audio_io.py (1)

24-25: ⚠️ Potential issue | 🟡 Minor

Stale docstring: still references "44100Hz sample rate, 512 sample chunk size."

Update to reflect the new values (48000 Hz, 1536 samples).

📝 Proposed fix
-        Sets up PyAudio instance and opens input/output streams for microphone capture
-        and speaker playback. Configures audio parameters (44100Hz sample rate, 512 sample
-        chunk size, mono channel) required for AI processing pipelines.
+        Sets up PyAudio instance and opens input/output streams for microphone capture
+        and speaker playback. Configures audio parameters (48000Hz sample rate, 1536 sample
+        chunk size, mono channel) required for AI processing pipelines.
backend/tests/test_input_processing.py (1)

53-67: ⚠️ Potential issue | 🟡 Minor

Stale docstring: says "default 512" but actual default is 1536.

Line 58 still reads (default 512) while the function signature on Line 53 defaults to chunk_size=1536.

📝 Proposed fix
     """
-    Generate a valid audio chunk matching CHUNK_SIZE (1536 samples).
+    Generate a valid audio chunk matching CHUNK_SIZE.
     
     Args:
-        chunk_size: Number of samples (default 512)
+        chunk_size: Number of samples (default 1536)
         sample_rate: Sample rate in Hz
🤖 Fix all issues with AI agents
In `@backend/scripts/verify_audio_params.py`:
- Around line 9-10: DEFAULT_OUTPUT_SR is set to 44100 but
AudioService.SAMPLE_RATE was changed to 48000, so update the verification script
to use the actual service sample rate: replace the hardcoded DEFAULT_OUTPUT_SR
(44_100) with 48_000 or, better, import/read AudioService.SAMPLE_RATE and set
DEFAULT_OUTPUT_SR = AudioService.SAMPLE_RATE so generate()/resampling checks
align with AudioService; ensure references to DEFAULT_OUTPUT_SR in
verify_audio_params.py use the updated value.

In `@backend/services/engine.py`:
- Around line 235-251: The pitch comparisons use 'Low' but
ProsodyExtractor.analyze_buffer() returns 'Deep' (not 'Low'), so replace the
checks for pitch == 'Low' in the prosody-based branch (the block that sets
emotion_tag when packet.override_emotion == "Auto") with pitch == 'Deep'
(including both the branch that maps to 'Panicked' and the branch that maps to
'Serious'); also consider normalizing/casing prosody.get('pitch') beforehand in
the same block to avoid future mismatches.

In `@backend/services/vad.py`:
- Around line 52-60: The code decimates 48000 and 44100 inputs by slicing
(audio_chunk[::3]) but for 44100 this yields ~14700 Hz while still setting
vad_sample_rate=16000, and the else branch passes unsupported rates to Silero
VAD (which only supports 8000/16000). Fix by validating self.sample_rate in the
VAD handling function/class: for 48000 resample/decimate to 16000 and set
vad_sample_rate=16000; for 44100 either perform proper resampling to 16000
(e.g., use librosa.resample or scipy.signal.resample_poly) or raise ValueError
if 44100 support is not desired; for any other sample rates either resample to
8000 or 16000 as appropriate or raise an explicit ValueError instead of passing
arbitrary rates. Ensure that the actual audio_chunk sampling rate matches
vad_sample_rate and update/log any errors accordingly (refer to
variables/methods using self.sample_rate, audio_chunk, and vad_sample_rate).

In `@backend/tests/test_input_processing.py`:
- Line 677: The inline comment on the for loop that iterates "for _ in
range(20)" incorrectly states "SILENCE_THRESHOLD_CHUNKS (6)"; update that
comment to reflect the actual threshold (15) or remove the parenthetical to
avoid misleading readers—locate the test loop in test_input_processing.py (the
for _ in range(20) block) and change the comment to e.g. "More than
SILENCE_THRESHOLD_CHUNKS (15)" referencing the SILENCE_THRESHOLD_CHUNKS
constant.
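For the VAD resampling fix above, a sketch of the suggested approach: decimating 48 kHz by 3 lands exactly on 16 kHz, while 44.1 kHz needs polyphase resampling (44100 × 160 / 441 = 16000) rather than `audio_chunk[::3]`; unsupported rates raise explicitly instead of reaching Silero VAD. The function name is illustrative:

```python
# Sketch: bring audio to the 16 kHz rate Silero VAD supports.
import numpy as np
from scipy.signal import resample_poly

def to_vad_rate(audio_chunk: np.ndarray, sample_rate: int) -> np.ndarray:
    if sample_rate == 48000:
        return audio_chunk[::3]  # integer decimation: 48000 / 3 = 16000
    if sample_rate == 44100:
        # Exact rational resampling: 44100 * 160 / 441 = 16000.
        return resample_poly(audio_chunk, up=160, down=441)
    raise ValueError(f"Unsupported sample rate for Silero VAD: {sample_rate}")
```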
🧹 Nitpick comments (5)
backend/scripts/list_audio_devices.py (2)

7-7: Unused import: sys is imported but never referenced.

🧹 Proposed fix
 import pyaudio
-import sys

9-20: p.terminate() is not called if an exception occurs during device enumeration.

If get_default_input_device_info(), get_default_output_device_info(), or the device iteration loop raises (e.g., IOError when no default device exists), the PyAudio instance is never terminated. Consider using a try/finally block.

🛡️ Proposed fix
 def list_devices():
     p = pyaudio.PyAudio()
-    print(f"PyAudio Version: {pyaudio.__version__}")
-    print(f"Default Input Device: {p.get_default_input_device_info() if p.get_device_count() > 0 else 'None'}")
-    print(f"Default Output Device: {p.get_default_output_device_info() if p.get_device_count() > 0 else 'None'}")
-    
-    print("\nAll Devices:")
-    for i in range(p.get_device_count()):
-        info = p.get_device_info_by_index(i)
-        print(f"Index {i}: {info['name']} (Inputs: {info['maxInputChannels']}, Outputs: {info['maxOutputChannels']})")
-    
-    p.terminate()
+    try:
+        print(f"PyAudio Version: {pyaudio.__version__}")
+        print(f"Default Input Device: {p.get_default_input_device_info() if p.get_device_count() > 0 else 'None'}")
+        print(f"Default Output Device: {p.get_default_output_device_info() if p.get_device_count() > 0 else 'None'}")
+        
+        print("\nAll Devices:")
+        for i in range(p.get_device_count()):
+            info = p.get_device_info_by_index(i)
+            print(f"Index {i}: {info['name']} (Inputs: {info['maxInputChannels']}, Outputs: {info['maxOutputChannels']})")
+    finally:
+        p.terminate()
backend/scripts/verify_audio_params.py (1)

83-86: Directly resetting singleton internals (_instance, _initialized) is fragile.

This couples the script to ModelManager's private implementation. If the singleton pattern is refactored, this will break silently. Consider adding a public reset() class method to ModelManager if this pattern is needed.

backend/services/engine.py (1)

496-531: audio_buffer and silence_counter are reset twice — at Lines 498-499 and again at Lines 530-531.

The second reset at Lines 530-531 is redundant since audio_buffer was already cleared to [] and silence_counter set to 0 before the executor call.

🧹 Proposed fix — remove redundant reset
                 text, meta = await loop.run_in_executor(
                     executor, process_audio_blocking, combined_audio
                 )

-                audio_buffer = []
-                silence_counter = 0
-
                 if text.strip():
backend/scripts/sender_main.py (1)

148-151: Empty if block — commented-out heartbeat log leaves a no-op conditional.

The if chunk_counter % 100 == 0: block body is entirely commented out. Either add pass explicitly, remove the block, or uncomment the log at a DEBUG level.

🧹 Option A: Remove the block
             chunk_counter += 1
-            if chunk_counter % 100 == 0:
-                # logger.info(f"Audio consumer heartbeat: Processed {chunk_counter} chunks (Mock mode: {'Yes' if not audio_service._pyaudio_available else 'No'})")
-            
             trigger_processing = False
🧹 Option B: Uncomment at DEBUG level
             chunk_counter += 1
             if chunk_counter % 100 == 0:
-                # logger.info(f"Audio consumer heartbeat: Processed {chunk_counter} chunks (Mock mode: {'Yes' if not audio_service._pyaudio_available else 'No'})")
+                logger.debug(f"Audio consumer heartbeat: {chunk_counter} chunks processed")
             
             trigger_processing = False

Comment on lines +9 to +10
generate() in model_manager.py resamples output with librosa.resample when
native_sr != target_sr so that playback matches AudioService (44.1 kHz).

⚠️ Potential issue | 🟠 Major

DEFAULT_OUTPUT_SR (44100) is inconsistent with AudioService.SAMPLE_RATE (48000).

Line 10 says "playback matches AudioService (44.1 kHz)" and Line 21 sets DEFAULT_OUTPUT_SR = 44_100, but AudioService.SAMPLE_RATE was updated to 48000 in this same PR. Users running this script with defaults will get misleading verification results.

📝 Proposed fix
-generate() in model_manager.py resamples output with librosa.resample when
-native_sr != target_sr so that playback matches AudioService (44.1 kHz).
+generate() in model_manager.py resamples output with librosa.resample when
+native_sr != target_sr so that playback matches AudioService (48 kHz).
 """
 
 from __future__ import annotations
 
 import argparse
 import json
 import sys
 
 # Default model ID; must match ModelManager default.
 DEFAULT_MODEL_ID = "Qwen/Qwen3-TTS-12Hz-0.6B-Base"
-DEFAULT_OUTPUT_SR = 44_100
+DEFAULT_OUTPUT_SR = 48_000

Also applies to: 21-21

🤖 Prompt for AI Agents
In `@backend/scripts/verify_audio_params.py` around lines 9 - 10,
DEFAULT_OUTPUT_SR is set to 44100 but AudioService.SAMPLE_RATE was changed to
48000, so update the verification script to use the actual service sample rate:
replace the hardcoded DEFAULT_OUTPUT_SR (44_100) with 48_000 or, better,
import/read AudioService.SAMPLE_RATE and set DEFAULT_OUTPUT_SR =
AudioService.SAMPLE_RATE so generate()/resampling checks align with
AudioService; ensure references to DEFAULT_OUTPUT_SR in verify_audio_params.py
use the updated value.
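Following the "better, import/read AudioService.SAMPLE_RATE" suggestion above, a minimal sketch of how the script default could be derived from the service constant so the two cannot drift apart again. The module path `backend.services.audio` is an assumption; the fallback keeps the script usable standalone:

```python
# Sketch: derive the verification script's default output rate from the
# service constant instead of hardcoding it. The import path below is an
# assumption about the repo layout, not a confirmed API.
try:
    from backend.services.audio import AudioService  # assumed module path
    DEFAULT_OUTPUT_SR = AudioService.SAMPLE_RATE
except ImportError:
    # Fallback for standalone use; matches the service rate set in this PR.
    DEFAULT_OUTPUT_SR = 48_000
```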

Comment on lines 235 to 251
if packet.override_emotion != "Auto":
emotion_tag = packet.override_emotion
else:
prosody = packet.prosody
pitch = prosody.get('pitch', 'Normal')
energy = prosody.get('energy', 'Normal')

if pitch == 'High' and energy == 'Loud':
emotion_tag = 'Excited'
elif pitch == 'High' and energy == 'Normal':
emotion_tag = 'Joyful'
elif pitch == 'Low' and energy == 'Loud':
emotion_tag = 'Panicked'
elif pitch == 'Low' and energy in ('Quiet', 'Low'):
emotion_tag = 'Serious'
else:
emotion_tag = 'Neutral'

⚠️ Potential issue | 🔴 Critical

Bug: pitch == 'Low' will never match — ProsodyExtractor returns 'Deep', not 'Low'.

ProsodyExtractor.analyze_buffer() returns 'Deep', 'Normal', or 'High' for the pitch key. Lines 247-249 check for 'Low', which will never match, so the 'Panicked' and 'Serious' emotion tags are unreachable through prosody-based derivation.

🐛 Proposed fix
                     if pitch == 'High' and energy == 'Loud':
                         emotion_tag = 'Excited'
                     elif pitch == 'High' and energy == 'Normal':
                         emotion_tag = 'Joyful'
-                    elif pitch == 'Low' and energy == 'Loud':
+                    elif pitch == 'Deep' and energy == 'Loud':
                         emotion_tag = 'Panicked'
-                    elif pitch == 'Low' and energy in ('Quiet', 'Low'):
+                    elif pitch == 'Deep' and energy in ('Quiet',):
                         emotion_tag = 'Serious'
                     else:
                         emotion_tag = 'Neutral'
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if packet.override_emotion != "Auto":
emotion_tag = packet.override_emotion
else:
prosody = packet.prosody
pitch = prosody.get('pitch', 'Normal')
energy = prosody.get('energy', 'Normal')
if pitch == 'High' and energy == 'Loud':
emotion_tag = 'Excited'
elif pitch == 'High' and energy == 'Normal':
emotion_tag = 'Joyful'
elif pitch == 'Low' and energy == 'Loud':
emotion_tag = 'Panicked'
elif pitch == 'Low' and energy in ('Quiet', 'Low'):
emotion_tag = 'Serious'
else:
emotion_tag = 'Neutral'
if packet.override_emotion != "Auto":
emotion_tag = packet.override_emotion
else:
prosody = packet.prosody
pitch = prosody.get('pitch', 'Normal')
energy = prosody.get('energy', 'Normal')
if pitch == 'High' and energy == 'Loud':
emotion_tag = 'Excited'
elif pitch == 'High' and energy == 'Normal':
emotion_tag = 'Joyful'
elif pitch == 'Deep' and energy == 'Loud':
emotion_tag = 'Panicked'
elif pitch == 'Deep' and energy in ('Quiet',):
emotion_tag = 'Serious'
else:
emotion_tag = 'Neutral'
🤖 Prompt for AI Agents
In `@backend/services/engine.py` around lines 235 - 251, The pitch comparisons use
'Low' but ProsodyExtractor.analyze_buffer() returns 'Deep' (not 'Low'), so
replace the checks for pitch == 'Low' in the prosody-based branch (the block
that sets emotion_tag when packet.override_emotion == "Auto") with pitch ==
'Deep' (including both the branch that maps to 'Panicked' and the branch that
maps to 'Serious'); also consider normalizing/casing prosody.get('pitch')
beforehand in the same block to avoid future mismatches.
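The if/elif chain being fixed here can also be written as a small lookup table keyed on the tags ProsodyExtractor actually emits ('Deep'/'Normal'/'High' for pitch), which makes a tag mismatch like this one harder to introduce. A sketch, with the threshold pairs taken from the corrected mapping above; the helper name `derive_emotion` is hypothetical:

```python
# Table-driven alternative to the if/elif chain. Keys use the pitch tags
# ProsodyExtractor actually returns ('Deep', not 'Low').
EMOTION_MAP = {
    ('High', 'Loud'): 'Excited',
    ('High', 'Normal'): 'Joyful',
    ('Deep', 'Loud'): 'Panicked',
    ('Deep', 'Quiet'): 'Serious',
}

def derive_emotion(prosody: dict) -> str:
    """Map (pitch, energy) tags to an emotion tag, defaulting to Neutral."""
    pitch = prosody.get('pitch', 'Normal')
    energy = prosody.get('energy', 'Normal')
    return EMOTION_MAP.get((pitch, energy), 'Neutral')
```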

Comment on lines +52 to 60
if self.sample_rate == 48000:
audio_chunk = audio_chunk[::3]
vad_sample_rate = 16000 # Silero VAD expects 16k
elif self.sample_rate == 44100:
# Fallback for old rate if used, though it's still slightly inaccurate without proper resampling
audio_chunk = audio_chunk[::3]
vad_sample_rate = 16000
else:
vad_sample_rate = self.sample_rate

⚠️ Potential issue | 🟠 Major

44100 Hz fallback decimates by 3 (→14700 Hz) but tells Silero VAD it's 16000 Hz.

The comment on Line 56 acknowledges the inaccuracy, but the mismatch (14700 actual vs. 16000 declared) can meaningfully degrade speech detection. If 44100 Hz support is needed, consider using proper resampling (e.g., librosa.resample or scipy.signal.resample_poly). If it's not needed, removing the branch and raising a ValueError for unsupported rates would be clearer.

Also, the else branch (Line 59-60) passes arbitrary sample rates directly to Silero VAD, which only supports 8000 and 16000 Hz — any other rate will silently produce incorrect results or error.

🤖 Prompt for AI Agents
In `@backend/services/vad.py` around lines 52 - 60, The code decimates 48000 and
44100 inputs by slicing (audio_chunk[::3]) but for 44100 this yields ~14700 Hz
while still setting vad_sample_rate=16000, and the else branch passes
unsupported rates to Silero VAD (which only supports 8000/16000). Fix by
validating self.sample_rate in the VAD handling function/class: for 48000
resample/decimate to 16000 and set vad_sample_rate=16000; for 44100 either
perform proper resampling to 16000 (e.g., use librosa.resample or
scipy.signal.resample_poly) or raise ValueError if 44100 support is not desired;
for any other sample rates either resample to 8000 or 16000 as appropriate or
raise an explicit ValueError instead of passing arbitrary rates. Ensure that the
actual audio_chunk sampling rate matches vad_sample_rate and update/log any
errors accordingly (refer to variables/methods using self.sample_rate,
audio_chunk, and vad_sample_rate).
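A minimal sketch of the recommendation above, using `scipy.signal.resample_poly` for the 44100 Hz case (44100 × 160/441 = 16000 exactly) and raising on unsupported rates instead of passing them through. The helper name `to_vad_rate` is hypothetical and assumes float32 mono input:

```python
import numpy as np
from scipy.signal import resample_poly

def to_vad_rate(audio_chunk: np.ndarray, sample_rate: int):
    """Resample input to the 16 kHz Silero VAD expects; reject other rates.

    Sketch only: assumes float32 mono input.
    """
    if sample_rate == 16000:
        return audio_chunk, 16000
    if sample_rate == 48000:
        # Exact 3:1 decimation with an anti-aliasing filter.
        return resample_poly(audio_chunk, 1, 3), 16000
    if sample_rate == 44100:
        # 44100 * 160 / 441 == 16000 exactly, unlike audio_chunk[::3].
        return resample_poly(audio_chunk, 160, 441), 16000
    raise ValueError(f"Unsupported sample rate for VAD: {sample_rate}")
```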

mock_vad.is_speech.return_value = False

for _ in range(20): # More than SILENCE_THRESHOLD_CHUNKS (16)
for _ in range(20): # More than SILENCE_THRESHOLD_CHUNKS (6)

⚠️ Potential issue | 🟡 Minor

Stale comment: says "SILENCE_THRESHOLD_CHUNKS (6)" but the actual threshold is 15.

The test behavior is correct (20 > 15), but the parenthetical is misleading.

📝 Proposed fix
-        for _ in range(20):  # More than SILENCE_THRESHOLD_CHUNKS (6)
+        for _ in range(20):  # More than SILENCE_THRESHOLD_CHUNKS (15)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
for _ in range(20): # More than SILENCE_THRESHOLD_CHUNKS (6)
for _ in range(20): # More than SILENCE_THRESHOLD_CHUNKS (15)
🤖 Prompt for AI Agents
In `@backend/tests/test_input_processing.py` at line 677, The inline comment on
the for loop that iterates "for _ in range(20)" incorrectly states
"SILENCE_THRESHOLD_CHUNKS (6)"; update that comment to reflect the actual
threshold (15) or remove the parenthetical to avoid misleading readers—locate
the test loop in test_input_processing.py (the for _ in range(20) block) and
change the comment to e.g. "More than SILENCE_THRESHOLD_CHUNKS (15)" referencing
the SILENCE_THRESHOLD_CHUNKS constant.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
backend/services/engine.py (1)

554-565: ⚠️ Potential issue | 🔴 Critical

avg_pitch_hz and avg_energy are always None, but the proposed fix won't work because meta doesn't contain those keys.

The issue is real: lines 554–555 set both to None without extracting values from the meta dict. However, analyze_buffer() only returns categorical tags ('energy' and 'pitch' strings), not numeric values. The method computes avg_pitch internally (line 90 of prosody.py) but discards it instead of returning the numeric Hz value. The receiver path (lines 232–233) has the same problem—it tries to extract 'avg_pitch_hz' and 'avg_energy' keys that don't exist in packet.prosody.

To fix this properly, analyze_buffer() should be modified to return the numeric average pitch and energy values alongside or instead of the categorical tags. The caller can then extract and use these values.
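One possible shape for that change, sketched here with assumed threshold values and a hypothetical helper name (`summarize_prosody`): return the raw numerics alongside the categorical tags, so the engine and the receiver path can populate `avg_pitch_hz`/`avg_energy` instead of None.

```python
# Sketch: emit numeric prosody values alongside the categorical tags.
# Threshold values below are illustrative assumptions, not the project's.
def summarize_prosody(avg_pitch_hz: float, avg_energy: float) -> dict:
    pitch_tag = 'Deep' if avg_pitch_hz < 120 else 'High' if avg_pitch_hz > 220 else 'Normal'
    energy_tag = 'Quiet' if avg_energy < 0.01 else 'Loud' if avg_energy > 0.1 else 'Normal'
    return {
        'pitch': pitch_tag,
        'energy': energy_tag,
        'avg_pitch_hz': avg_pitch_hz,  # numeric keys the receiver path reads
        'avg_energy': avg_energy,
    }
```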

🤖 Fix all issues with AI agents
In `@frontend/components/PushToTalk.tsx`:
- Around line 65-68: The aria-pressed attribute must reflect the same combined
active state used for styling/labeling: change any aria-pressed={isRecording} to
aria-pressed={isRecording || isStreaming} in the PushToTalk component (update
the primary button element and the secondary occurrence around the code block
that handles the alternate button/label), ensuring both the visual state and
accessibility state use the same boolean expression (isRecording ||
isStreaming).
🧹 Nitpick comments (6)
backend/services/engine.py (1)

499-506: Double reset of audio_buffer and silence_counter; magic threshold.

  1. audio_buffer and silence_counter are reset at lines 501–502 and again at lines 533–534. Since this is single-threaded async, no new chunks can arrive during the await on line 529 within the same iteration, so the second reset is dead code.

  2. The threshold 1536 * 6 is a magic number. Consider extracting it as a named constant with a comment explaining its derivation (e.g., minimum meaningful utterance length at 48 kHz).

Proposed cleanup
+    MIN_PROCESS_SAMPLES = 1536 * 6  # ~192ms at 48kHz – minimum for meaningful transcription
+
     ...
-                if len(combined_audio) < 1536 * 6:
+                if len(combined_audio) < MIN_PROCESS_SAMPLES:
                     logger.info(f"Skipping short audio buffer ({len(combined_audio)} samples)")
                     continue

And remove the redundant reset at lines 533–534.

backend/api/socket_manager.py (1)

75-84: Consider using logger instead of print for consistency.

The new _reset_control_state() uses print() (line 83), while the rest of the backend uses the logging module. This is also true for other print() calls in this file (lines 65, 68, 109, 136, 173), but since those are pre-existing, the new code should at least follow the project-wide logging pattern.

Proposed fix
+import logging
+
+logger = logging.getLogger(__name__)
+
 def _reset_control_state() -> None:
     """
     Reset engine_state.control_state to idle values.
     """
     state = engine_state.control_state
     state.is_streaming = False
     state.is_recording = False
     state.is_talking = False
-    print(f"Control State Reset on Disconnect: {state}")
+    logger.info(f"Control State Reset on Disconnect: {state}")
frontend/hooks/useJanusSocket.ts (1)

49-62: Clever use of React Query as a reactive state bus.

The enabled: false + queryFn: () => null pattern works for reading externally-set query data, but it's non-obvious. A brief comment explaining that the actual data is injected via queryClient.setQueryData from useJanusWebSocket would help future readers understand this isn't a "real" query.

Suggested comment
+  // Reactive reader for control_state messages set via queryClient.setQueryData
+  // in useJanusWebSocket — not a real fetch query.
   const { data: remoteState } = useQuery<ControlStateMessage | null>({
     queryKey: queryKeys.controlState,
     queryFn: async () => null,
     enabled: false,
   });
frontend/hooks/useJanusWebSocket.ts (3)

189-196: In-memory transcripts grow unbounded while persistence is capped at 100.

next (the full list) is returned to the query cache, but only next.slice(0, 100) is persisted. Over a long session the in-memory list can grow without limit, and on refresh the user loses everything beyond 100. If the cap is intentional, consider applying it to the returned value too so memory stays bounded.

Additionally, calling localStorage.setItem inside the setQueryData updater introduces a side effect in what React Query expects to be a pure function. Consider moving persistence to a separate useEffect that watches transcripts.


206-213: localStorage write on every packet may be a hot-path concern.

Packet summaries can arrive at high frequency. Writing up to 200 items to localStorage on every single packet is expensive (serialization + synchronous I/O). Consider debouncing or throttling the persist, or persisting on an interval / on beforeunload instead.


99-118: Persisted data loaded in useEffect will flash empty initial data for one render.

The useQuery hooks set initialData: [] synchronously, then this effect overwrites from localStorage after the first render. This causes a brief flash of empty state. A minor UX nit — if you want to avoid it, you could initialize query data from localStorage synchronously before the first render (e.g., via initialData in the useQuery call itself).

Comment on lines +65 to +68
className={`w-48 h-48 transition-all border-4 border-black rounded-none ${isRecording || isStreaming
? 'bg-red-500 hover:bg-red-600 text-white shadow-none translate-x-[4px] translate-y-[4px]'
: 'bg-white hover:bg-gray-100 text-black shadow-[8px_8px_0px_0px_rgba(0,0,0,1)] hover:shadow-[6px_6px_0px_0px_rgba(0,0,0,1)] hover:translate-x-[2px] hover:translate-y-[2px]'
}`}
}`}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

aria-pressed doesn't reflect the combined active state.

The visual styling and label now treat isRecording || isStreaming as the active state, but aria-pressed on line 70 still only reflects isRecording. Screen-reader users won't be informed that the button is visually "active" when streaming alone is true.

Proposed fix
-        aria-pressed={isRecording}
+        aria-pressed={isRecording || isStreaming}

Also applies to: 94-101

🤖 Prompt for AI Agents
In `@frontend/components/PushToTalk.tsx` around lines 65 - 68, The aria-pressed
attribute must reflect the same combined active state used for styling/labeling:
change any aria-pressed={isRecording} to aria-pressed={isRecording ||
isStreaming} in the PushToTalk component (update the primary button element and
the secondary occurrence around the code block that handles the alternate
button/label), ensuring both the visual state and accessibility state use the
same boolean expression (isRecording || isStreaming).

@akshatvasisht akshatvasisht merged commit 8171c8d into main Feb 14, 2026
1 check passed
@akshatvasisht akshatvasisht deleted the unified-test-suite branch February 14, 2026 09:12