
Trigger Word for End-of-Speech Detection ("Walkie-Talkie Mode") #210

@evbrandy

Description


Feature Request: Trigger Word for End-of-Speech Detection ("Walkie-Talkie Mode")

Problem

When using VoiceMode in noisy environments (walking on streets, cafes, outdoors), the silence-based end-of-speech detection (WebRTC VAD) struggles to determine when the user has finished speaking. Background noise prevents reliable silence detection, leading to:

  • Premature cutoffs (VAD misclassifies ongoing speech in noise and ends the turn early)
  • Long waits (VAD never detects silence and falls back to listen_duration_max)

Proposed Solution

Add support for an optional trigger word (like "over" in walkie-talkie/radio communication) that explicitly signals the end of speech. When the user says "over" at the end of their message, VoiceMode stops listening immediately.

Configuration

```shell
# In voicemode.env or as an environment variable
VOICEMODE_TRIGGER_WORD=over
# Or for non-English, e.g. German: VOICEMODE_TRIGGER_WORD=fertig
```

Behavior

  1. User speaks: "Please help me refactor the authentication module, over"
  2. VoiceMode detects trigger word → stops recording immediately
  3. Trigger word is stripped from transcription → Claude receives: "Please help me refactor the authentication module"

Implementation Options

Option A: Post-transcription Detection (Simple)

After STT completes, check if transcription ends with trigger word and strip it.

Location: tools/converse.py around line 1655

```python
# After: response_text = stt_result.get("text")
if TRIGGER_WORD and response_text:
    # Normalize first so "over." or "over " still matches (case-insensitive)
    stripped = response_text.rstrip().rstrip('.,;:!?').rstrip()
    if stripped.lower().endswith(TRIGGER_WORD.lower()):
        # Strip the trigger word, then any separator left before it ("..., over")
        response_text = stripped[:-len(TRIGGER_WORD)].rstrip().rstrip('.,;:')
```

Pros: Simple, minimal code changes
Cons: Doesn't solve the waiting problem; recording still waits for silence or timeout before transcribing
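Packaged as a standalone helper, the post-transcription check might look like the sketch below. The function name is hypothetical; it additionally guards against matching the trigger inside a longer word (e.g. "moreover" for trigger "over"), which the inline snippet above does not:

```python
def strip_trigger_word(text: str, trigger: str) -> tuple[str, bool]:
    """Remove a trailing trigger word (case-insensitive) from a transcription.

    Hypothetical helper, not part of VoiceMode's current API. Handles trailing
    punctuation the STT engine may add ("..., over.") and refuses to match
    inside a longer word ("moreover"). Returns (cleaned_text, was_detected).
    """
    if not trigger or not text:
        return text, False
    # Drop trailing whitespace/punctuation so "over." still matches.
    stripped = text.rstrip().rstrip(".,;:!?").rstrip()
    if not stripped.lower().endswith(trigger.lower()):
        return text, False
    head = stripped[: -len(trigger)]
    if head and not (head[-1].isspace() or head[-1] in ".,;:!?"):
        return text, False  # e.g. "moreover" should not match trigger "over"
    # Strip the trigger, then any separator left before it.
    return head.rstrip().rstrip(".,;:"), True
```

A helper like this is also easy to unit-test against STT quirks (added periods, capitalization) independently of the recording pipeline.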

Option B: Real-time Keyword Detection (Better UX)

Detect trigger word during recording to stop immediately.

Approaches:

  1. Vosk - Lightweight offline speech recognition, can do streaming
  2. Periodic STT - Send audio chunks every few seconds to Whisper, check for keyword
  3. Porcupine/similar - Wake word detection (would need custom model training)

Location: tools/converse.py in record_audio_with_silence_detection()

Pros: Immediate response when user says trigger word
Cons: More complex, additional dependencies, potential latency
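Approach 2 (periodic STT) could be sketched as below. Everything here is illustrative: `transcribe` stands in for a real Whisper call, and the function name, chunk cadence, and check interval are placeholders, not VoiceMode internals:

```python
from typing import Callable, Iterable

def record_until_trigger(
    chunks: Iterable[bytes],
    transcribe: Callable[[bytes], str],
    trigger: str,
    check_every: int = 3,
) -> bytes:
    """Accumulate audio chunks; every `check_every` chunks, run STT on the
    audio so far and stop as soon as the partial transcript ends with the
    trigger word. Sketch of the periodic-STT idea, not production code.
    """
    audio = b""
    for i, chunk in enumerate(chunks, start=1):
        audio += chunk
        if i % check_every == 0:
            text = transcribe(audio).rstrip().rstrip(".,;:!?").lower()
            if text.endswith(trigger.lower()):
                break  # trigger word heard; stop recording early
    return audio
```

The trade-off is visible in the loop: each periodic STT call adds cost and latency, which is why the streaming options (Vosk, Porcupine) may be preferable despite the extra dependency.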

Recommended Approach

Start with Option A for quick implementation, then consider Option B as an enhancement if users need faster response in noisy environments.

Use Cases

  1. Outdoor/mobile use - Walking, public transport, street noise
  2. Open office - Background chatter interferes with silence detection
  3. Hands-free operation - Clear signal without needing to press a button
  4. International users - Can configure trigger word in their language

Workaround (Current)

Users can currently work around this by:

disable_silence_detection=True, listen_duration_max=30

But this forces every turn to run for the full listen_duration_max, leaving dead air after the user finishes speaking.

Additional Considerations

  • Trigger word should be configurable (different languages, preferences)
  • Should be optional (disabled by default for backward compatibility)
  • Could support multiple trigger words: VOICEMODE_TRIGGER_WORDS=over,done,finished
  • Consider adding a parameter to converse(): trigger_word: Optional[str] = None

Environment:

  • VoiceMode version: 7.4.2
  • Use case: concept elaboration via voice while mobile
