feat: add voice input feature with transcription support #2344
princeaden1 wants to merge 4 commits into dyad-sh:main from
Conversation
@BugBot run
All contributors have signed the CLA ✍️ ✅
Summary of Changes
Hello @princeaden1, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request delivers a significant new voice input capability, enabling users to interact with the application using spoken commands and queries. It encompasses a full-stack implementation, from front-end UI components for recording and real-time audio visualization to a robust backend integration for speech-to-text transcription. The feature is carefully designed with user experience in mind, including clear visual cues for recording and transcription states, and is gated as a Pro-only offering, complete with dedicated end-to-end tests to ensure its reliability.
Multi-Agent Code Review Summary
Review Method: 3 independent Claude agents reviewed this PR with files in different randomized orders. Issues below were identified by 2+ agents with at least one rating MEDIUM or higher severity.
Issues Found
See inline comments for detailed explanations and suggested fixes.
Generated by multi-agent PR review system
Greptile Overview
Greptile Summary
This PR adds a voice input feature to chat inputs by introducing Key integration points:
Issues to fix before merge are mainly around correctness/robustness of transcription request construction and validation (format handling, base64 size limit, and baseURL optionality in the engine helper).
Confidence Score: 3/5
Important Files Changed
Sequence Diagram
sequenceDiagram
participant User
participant Renderer as Renderer (ChatInput/HomeChatInput)
participant Hook as useVoiceInput/useAudioRecorder
participant IPC as ipc.misc.transcribeAudio
participant Main as Main Process (transcription_handlers)
participant Engine as Dyad Engine (/audio/transcriptions)
User->>Renderer: Click mic button
Renderer->>Hook: handleMicClick()
Hook->>Hook: navigator.mediaDevices.getUserMedia(audio)
Hook->>Hook: new MediaRecorder(stream).start()
Hook-->>Renderer: isRecording=true, analyser set
Renderer->>Renderer: Show VoiceWaveform(analyser)
User->>Renderer: Click stop
Renderer->>Hook: handleMicClick()
Hook->>Hook: mediaRecorder.stop()
Hook->>Hook: onstop => Blob(chunks)
Hook->>IPC: transcribeAudio({audioData: base64, format})
IPC->>Main: IPC chat:transcribe
Main->>Main: validate payload + size/format
alt E2E_TEST_BUILD=true
Main-->>IPC: return "E2E transcribed text"
else
Main->>Engine: POST multipart (file, model)
Engine-->>Main: {text}
Main-->>IPC: return text
end
IPC-->>Hook: transcribed text
Hook-->>Renderer: appendText(text)
Renderer->>Renderer: Update Lexical input value
Code Review
This pull request introduces a voice input and transcription feature, a significant enhancement to the user interface, covering frontend components, hooks, backend IPC handlers, and end-to-end tests. However, a critical security weakness was identified in the IPC handler where untrusted input from the renderer is not properly validated, potentially leading to a Denial of Service (DoS) via memory exhaustion. It is recommended to use existing Zod contracts for runtime validation to mitigate this risk. Additionally, the review suggests improvements for UI consistency, performance, and code maintainability, including using the styled tooltip, optimizing waveform drawing logic, and refactoring duplicated code in chat input components.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: abf03bdd80
7 issues found across 10 files
Prompt for AI agents (all issues)
Check if these issues are valid — if so, understand the root cause of each and fix them.
<file name="src/components/chat/VoiceWaveform.tsx">
<violation number="1" location="src/components/chat/VoiceWaveform.tsx:38">
P2: Avoid creating the canvas gradient inside the per-bar loop; it should be created once per frame and reused to prevent unnecessary allocations during animation.</violation>
</file>
<file name="src/ipc/handlers/transcription_handlers.ts">
<violation number="1" location="src/ipc/handlers/transcription_handlers.ts:16">
P2: The IPC handler accepts `audioData` without runtime validation or size limits. Use the Zod schema defined in `src/ipc/types/misc.ts` (`miscContracts.transcribeAudio`) to validate and parse the input. Consider adding a maximum length constraint on `audioData` to prevent memory exhaustion from excessively large payloads.</violation>
<violation number="2" location="src/ipc/handlers/transcription_handlers.ts:30">
P2: Validate the renderer-provided `format` before using it in the filename; otherwise a crafted format can inject path separators or unsupported extensions into the multipart upload filename.</violation>
</file>
<file name="src/components/chat/ChatInput.tsx">
<violation number="1" location="src/components/chat/ChatInput.tsx:489">
P2: Disable sending while voice recording/transcription is active; otherwise users can submit while recording and the mic button disappears once streaming starts, leaving the recording running with no way to stop it.</violation>
</file>
<file name="src/hooks/useAudioRecorder.ts">
<violation number="1" location="src/hooks/useAudioRecorder.ts:46">
P2: Missing `onerror` handler on MediaRecorder. If a recording error occurs mid-session (e.g., microphone disconnected), the MediaRecorder becomes inactive but `isRecording` state remains true and the timer continues. Subsequent calls to `stopRecording()` will throw `InvalidStateError` on the inactive recorder. Add an `onerror` handler to clean up state and notify the user.</violation>
<violation number="2" location="src/hooks/useAudioRecorder.ts:93">
P1: Add cleanup effect to stop active recording resources on component unmount. The hook creates a setInterval timer, MediaRecorder, and AudioContext but has no useEffect cleanup. If a user navigates away while recording, these resources continue running, causing memory leaks and potential React state-update warnings on unmounted components.</violation>
<violation number="3" location="src/hooks/useAudioRecorder.ts:169">
P2: Race condition when double-clicking the mic button. The `handleMicClick` checks `isRecording` state, but `startRecording` contains an async `getUserMedia()` call before `setIsRecording(true)`. Rapid clicks during the permission dialog can start multiple recordings, orphaning the first MediaRecorder/AudioContext and causing resource leaks. Add a guard (e.g., `isStartingRef`) set synchronously before the async call.</violation>
</file>
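The double-click race in violation 3 above comes down to checking a flag that is only set after an await. A minimal sketch of a synchronous guard follows; `StartGuard` and `tryStart` are hypothetical names for illustration and are not part of the PR, which would more likely use a ref such as the suggested `isStartingRef`.

```typescript
// Hypothetical guard, not taken from the PR: it only illustrates setting
// a "starting" flag synchronously before the first await.
class StartGuard {
  private starting = false;
  private active = false;

  // Runs `start` unless a start is already in flight or a session is active.
  // Resolves to true when this call performed the start, false when it was skipped.
  async tryStart(start: () => Promise<void>): Promise<boolean> {
    // This check and the assignment below run synchronously, so a second
    // click that arrives during the permission prompt is rejected.
    if (this.starting || this.active) {
      return false;
    }
    this.starting = true;
    try {
      await start();
      this.active = true;
      return true;
    } finally {
      // Reset even when `start` throws, so a later retry is possible.
      this.starting = false;
    }
  }

  // Marks the session as finished so a new one can be started.
  stop(): void {
    this.active = false;
  }
}
```

In a hook, `start` would wrap the `getUserMedia()` call and recorder setup, and `stop()` would be called from the recorder's stop path.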
@BugBot run
Multi-Agent Code Review Summary
This PR was reviewed by 3 independent Claude sub-agents. Issues are reported below where 2+ agents agreed and at least one rated the issue MEDIUM severity or higher.
Issues Found: 5
🔴 HIGH: Missing IPC client method for transcribeAudio
Consensus: 3/3 agents identified this issue
Description: The code calls
Suggestion: Verify that a
🔴 HIGH: Missing preload allowlist entry for transcription IPC channel
Consensus: 3/3 agents identified this issue
Description: According to CLAUDE.md: "New IPC APIs must be added to src/preload.ts". The new
Suggestion: Add the
🟡 MEDIUM: Request ID tracking never cleans up completed requests
Consensus: 3/3 agents identified this issue
Description: The
Suggestion: Consider removing entries after successful transcription, or implement time-based expiration.
🟡 MEDIUM: MediaRecorder error handler doesn't propagate errors to users
Consensus: 2/3 agents identified this issue
Description: When
Suggestion: Accept an optional
🟡 MEDIUM: FileReader error handling has race conditions
Consensus: 2/3 agents identified this issue
Description: In
Suggestion: Wrap the FileReader logic in a Promise and properly reject on error, or add a flag to prevent
🤖 Generated by Multi-Agent PR Review using 3 independent Claude sub-agents with consensus voting
@BugBot run
Cursor Bugbot has reviewed your changes and found 2 potential issues.
| } catch (err) { | ||
| logger.error("Error starting recording:", err); | ||
| throw err; | ||
| } finally { |
Stream leaked when recording setup fails after getUserMedia
Medium Severity
The catch block in startRecording re-throws without cleaning up resources obtained earlier in the try block. If getUserMedia succeeds (line 64) and streamRef.current is set (line 66), but then new AudioContext() or new MediaRecorder() throws, the stream is never stopped. On retry, a new stream overwrites streamRef.current, orphaning the original. The microphone remains active for the orphaned stream with no way to release it until component unmount or page close.
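One way to close this leak is to release the stream in the same place the failure is caught. The sketch below is illustrative only: `setupOrRelease` is a hypothetical helper, and `StreamLike` / `TrackLike` are minimal stand-ins for the browser's `MediaStream` and `MediaStreamTrack` so the example runs outside a browser.

```typescript
// Minimal structural stand-ins for MediaStreamTrack and MediaStream.
interface TrackLike {
  stop(): void;
}
interface StreamLike {
  getTracks(): TrackLike[];
}

// Hypothetical helper (not in the PR): runs `setup` against an already
// acquired stream and stops every track if `setup` throws, then re-throws.
function setupOrRelease<T>(stream: StreamLike, setup: (s: StreamLike) => T): T {
  try {
    return setup(stream);
  } catch (err) {
    // Release the microphone before surfacing the error to the caller.
    stream.getTracks().forEach((track) => track.stop());
    throw err;
  }
}
```

In `startRecording`, the `new AudioContext()` and `new MediaRecorder(stream)` calls would go inside `setup`, so a failure in either one stops the tracks obtained from `getUserMedia` instead of orphaning them.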
src/ipc/utils/llm_engine_provider.ts
Outdated
| } | ||
|
|
||
| return currentAttempt; | ||
| } |
Request ID attempt tracking serves no purpose
Low Severity
The transcriptionRequestIdAttempts map and trackRequestAttempt function track retry attempts for request IDs, but each transcription call generates a fresh UUID via uuidv4() in transcription_handlers.ts. Since request IDs are never reused, the attempt count will always be 1, making this tracking mechanism dead code that adds complexity without benefit.
🤖 Multi-Agent Code Review Summary
PR: feat: add voice input feature with transcription support
Three independent Claude review agents analyzed this PR. Issues are reported only when 2+ agents identified them and at least one rated it MEDIUM or higher severity.
Issues Found: 3
See inline comments for details and suggested fixes.
Review performed by 3 Claude agents with randomized file ordering to reduce bias.
src/hooks/useAudioRecorder.ts
Outdated
| audioBlob, | ||
| startRecording, | ||
| stopRecording, | ||
| analyser: analyserRef.current, |
🤖 Multi-Agent Review Finding (3/3 agents, MEDIUM severity)
Issue: The analyser property returns analyserRef.current directly, which is captured at render time. When recording starts, analyserRef.current is updated inside startRecording, but the returned analyser in the component will still be null until a re-render occurs. This causes the VoiceWaveform component to receive null initially when recording begins.
Suggestion: Consider storing the analyser in React state instead of only in a ref, or return the ref itself and access .current in the consuming component to ensure the value is always current when isRecording is true.
src/hooks/useAudioRecorder.ts
Outdated
| logger.warn("Failed to stop media recorder", error); | ||
| } | ||
| } | ||
| setIsRecording(false); |
🤖 Multi-Agent Review Finding (2/3 agents, MEDIUM severity)
Issue: Race condition - setIsRecording(false) is called here immediately, but the mediaRecorder.onstop callback (line 112) also calls setIsRecording(false). This creates a duplicate state update where isRecording may be set to false before the recording data is actually finalized and the onRecordingComplete callback is called, potentially causing UI inconsistencies.
Suggestion: Remove setIsRecording(false) from stopRecording and let the onstop handler manage the state transition after the blob is created and onRecordingComplete is called.
src/ipc/utils/llm_engine_provider.ts
Outdated
| } | ||
|
|
||
| const formData = new FormData(); | ||
| const blob = new Blob([audioBuffer as any]); |
🤖 Multi-Agent Review Finding (3/3 agents, MEDIUM severity)
Issue: The Blob is created with audioBuffer as any without specifying a MIME type. This may cause issues with the transcription API as the server expects a properly typed audio file, but the Blob is created without a type property.
Suggestion: Add the MIME type to the Blob constructor based on the filename extension:
| const blob = new Blob([audioBuffer as any]); | |
| const mimeType = filename.endsWith('.webm') ? 'audio/webm' : | |
| filename.endsWith('.mp3') ? 'audio/mpeg' : | |
| filename.endsWith('.wav') ? 'audio/wav' : | |
| filename.endsWith('.m4a') ? 'audio/mp4' : 'audio/webm'; | |
| const blob = new Blob([audioBuffer], { type: mimeType }); |
@BugBot run
1 issue found across 2 files (changes from recent commits).
Prompt for AI agents (all issues)
Check if these issues are valid — if so, understand the root cause of each and fix them.
<file name="src/hooks/useAudioRecorder.ts">
<violation number="1" location="src/hooks/useAudioRecorder.ts:75">
P1: This state update may run after the component has unmounted (e.g. if the user navigates away while the permission prompt is visible). If unmounted, the media resources (stream and audio context) must be cleaned up to prevent the microphone from staying active (resource leak).</violation>
</file>
🤖 Multi-Agent PR Review Summary
Three independent AI agents reviewed this PR, each analyzing the code from different perspectives to reduce ordering bias.
Consensus Issues Found
Issues reported below were identified by 2+ agents with at least one rating the issue as MEDIUM or higher severity.
Issue Details
1. Response JSON validation missing (3/3 agents, all MEDIUM) In
2. State update on unmounted component (2/3 agents, one MEDIUM) In
Review performed by 3 independent Claude agents with consensus voting (2+ agreement required)
| `Dyad Engine transcription failed: ${response.status} ${response.statusText} - ${errorText}`, | ||
| ); | ||
| } | ||
| const data = (await response.json()) as { text: string }; |
🤖 Multi-Agent Review (3/3 agents, all MEDIUM severity)
The response JSON is cast to { text: string } without validation. If the API returns an unexpected format (e.g., error object, empty object, or different structure), data.text could be undefined, leading to silent failures.
| const data = (await response.json()) as { text: string }; | |
| const data = (await response.json()) as { text?: string }; | |
| if (!data || typeof data.text !== 'string') { | |
| throw new Error('Invalid transcription response: missing text field'); | |
| } | |
| return data.text; |
| err instanceof Error ? err.message : "Failed to transcribe audio", | ||
| ); | ||
| } | ||
| setIsTranscribing(false); |
🤖 Multi-Agent Review (2/3 agents, one MEDIUM severity)
setIsTranscribing(false) is called unconditionally after the async transcription completes. If the component unmounts during transcription, this will attempt to update state on an unmounted component.
Consider:
- Adding an isMounted ref to useVoiceInput (similar to how useAudioRecorder uses isMountedRef)
- Or moving this into a finally block with a mount check
| setIsTranscribing(false); | |
| setIsTranscribing(false); | |
| }, | |
| [appendText, onError], | |
| ); |
Note: The suggestion above keeps the current structure. To fix properly, add an isMountedRef to useVoiceInput and check it before calling setIsTranscribing(false).
- Fix stream leak when recording setup fails after getUserMedia
- Fix analyser ref returning stale null by using React state
- Fix race condition with duplicate setIsRecording(false) calls
- Remove dead request ID attempt tracking code
- Add MIME type to Blob in transcription for proper audio typing
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
18c3d9e to fa71433
@BugBot run
| const baseURL = withoutTrailingSlash(options.baseURL); | ||
| const apiKey = loadApiKey({ | ||
| apiKey: options.apiKey, | ||
| environmentVariableName: "DYAD_PRO_API_KEY", | ||
| description: "Dyad Pro API key", | ||
| }); | ||
| logger.info("transcribing with dyad engine with baseURL", baseURL); | ||
|
|
||
| const formData = new FormData(); |
Missing baseURL crashes
transcribeWithDyadEngine calls withoutTrailingSlash(options.baseURL) but baseURL is optional in ExampleProviderSettings (baseURL?: string). If options.baseURL is undefined, this will throw before any network request. This can happen if another caller reuses transcribeWithDyadEngine without explicitly passing baseURL.
Also appears in createDyadEngine at src/ipc/utils/llm_engine_provider.ts:68.
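Whatever the trailing-slash helper does with `undefined`, the request URL cannot be built without a base URL, so an explicit check gives callers a clearer failure. The sketch below uses a hypothetical `requireBaseURL` helper with its own slash trimming; it is not the PR's code and does not call `withoutTrailingSlash`.

```typescript
// Hypothetical helper (not in the PR): fail fast with a descriptive error
// when the optional baseURL setting was not provided.
function requireBaseURL(baseURL: string | undefined): string {
  if (baseURL === undefined || baseURL.trim() === "") {
    throw new Error("Dyad Engine baseURL is required for transcription");
  }
  // Strip trailing slashes so a path can be appended with a single "/".
  return baseURL.trim().replace(/\/+$/, "");
}
```

A caller would then build the endpoint as `requireBaseURL(options.baseURL) + "/audio/transcriptions"`, where the path is the one shown in the sequence diagram earlier in this thread.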
Path: src/ipc/utils/llm_engine_provider.ts
Line: 236:244
| const text = await ipc.misc.transcribeAudio({ | |
| audioData: base64Content, | ||
| format: "webm", | ||
| }); |
Wrong audio format sent
useVoiceInput always passes format: "webm" to ipc.misc.transcribeAudio (src/hooks/useAudioRecorder.ts:202-205), even though the recorded Blob’s actual MIME/type can vary by browser/platform. This can lead to the main-process handler rejecting the request as an unsupported format (or producing a mismatched filename/MIME downstream).
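A possible fix is to derive the format from the recorded Blob's `type` instead of hardcoding it. The sketch below is an assumption-laden illustration: `formatFromMimeType` is a hypothetical helper, the allowed formats mirror the `ALLOWED_AUDIO_FORMATS` set quoted later in this thread, and the extra MIME aliases (`audio/x-wav`, `audio/wave`, `audio/x-m4a`) are common variants added here, not taken from the PR.

```typescript
// Formats mirrored from the handler's ALLOWED_AUDIO_FORMATS set.
type AudioFormat = "webm" | "mp3" | "wav" | "m4a";

// Hypothetical helper (not in the PR): maps a Blob MIME type such as
// "audio/webm;codecs=opus" to a supported format, or null if unsupported.
function formatFromMimeType(mimeType: string): AudioFormat | null {
  // Drop parameters like ";codecs=opus" and normalize case.
  const base = (mimeType.split(";")[0] ?? "").trim().toLowerCase();
  switch (base) {
    case "audio/webm":
      return "webm";
    case "audio/mpeg":
      return "mp3";
    case "audio/wav":
    case "audio/x-wav":
    case "audio/wave":
      return "wav";
    case "audio/mp4":
    case "audio/x-m4a":
      return "m4a";
    default:
      return null;
  }
}
```

The renderer could call `formatFromMimeType(blob.type)` and report an error through `onError` when it returns null, rather than sending a mislabeled payload.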
Path: src/hooks/useAudioRecorder.ts
Line: 202:205
| if (audioData.length > MAX_AUDIO_BASE64_LENGTH) { | |
| throw new Error("Audio payload exceeds maximum allowed size."); | ||
| } | ||
| const normalizedFormat = format.trim().toLowerCase(); |
Base64 size check wrong
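The handler checks audioData.length against MAX_AUDIO_BASE64_LENGTH (src/ipc/handlers/transcription_handlers.ts:21-22), but base64 expands binary by ~33%, so the limit applies to the encoded string rather than to the decoded audio. As written, the decoded Buffer.from(audioData, "base64") is roughly three quarters of audioData.length, so the effective cap on audio bytes differs from the nominal one, and the string and the decoded buffer are both held in memory at the same time.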
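Because base64 encodes every 3 bytes as 4 characters, the decoded size can be computed from the string length before any buffer is allocated. The sketch below assumes a standard base64 string without whitespace or line breaks; `decodedBase64Length` and `isWithinAudioLimit` are hypothetical names, not functions from the PR.

```typescript
// Hypothetical helper: number of bytes a base64 string decodes to.
// Assumes standard base64 with no embedded whitespace.
function decodedBase64Length(base64: string): number {
  // Each "=" at the end stands for one missing byte in the final group.
  const padding = base64.endsWith("==") ? 2 : base64.endsWith("=") ? 1 : 0;
  return Math.floor((base64.length * 3) / 4) - padding;
}

// Hypothetical check: compares the decoded size, not the encoded length,
// against a limit expressed in bytes of audio.
function isWithinAudioLimit(base64: string, maxBytes: number): boolean {
  return decodedBase64Length(base64) <= maxBytes;
}
```

With this shape the constant can be named and sized in audio bytes (for example a 10 MiB cap), and the handler can reject oversized payloads before calling `Buffer.from`.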
Path: src/ipc/handlers/transcription_handlers.ts
Line: 21:24
| <button | |
| data-testid="voice-input-button" | ||
| onClick={proModeTogglable || isRecording ? onClick : undefined} | ||
| className={className} | ||
| aria-label={title} | ||
| disabled={isDisabled} | ||
| /> |
🔴 VoiceInputButton missing title attribute causes E2E test failures
The E2E tests expect the voice input button to have a title attribute, but the component only sets aria-label.
E2E Test Expectations
The E2E tests in e2e-tests/voice_transcription.spec.ts assert on the title attribute:
await expect(voiceBtn).toHaveAttribute("title", "Start voice input");
await expect(voiceBtn).toHaveAttribute("title", "Stop recording");
await expect(voiceBtn).toHaveAttribute("title", "Pro feature only");
Actual Implementation
In src/components/chat/VoiceInputButton.tsx:54-60, the button only sets aria-label:
<button
data-testid="voice-input-button"
onClick={proModeTogglable || isRecording ? onClick : undefined}
className={className}
aria-label={title} // <-- Only aria-label is set
disabled={isDisabled}
/>
Impact
All voice transcription E2E tests will fail because the title attribute is never set on the button element. The tests at lines 49, 57, 60, and 81 of voice_transcription.spec.ts will not find the expected title attribute.
Recommendation: Add title={title} to the button element alongside aria-label={title}:
<button
data-testid="voice-input-button"
onClick={proModeTogglable || isRecording ? onClick : undefined}
className={className}
aria-label={title}
title={title}
disabled={isDisabled}
/>
🔍 Multi-Agent Code Review
Found 2 new issue(s) flagged by 3 independent reviewers using consensus voting.
Summary
New Issues to Address
See inline comments for details.
ℹ️ Issues already addressed by existing review comments
The following issues were identified by our multi-agent review but have already been flagged by other reviewers:
Generated by multi-agent consensus review (3 agents)
| const MAX_AUDIO_BASE64_LENGTH = 10 * 1024 * 1024; | ||
| const ALLOWED_AUDIO_FORMATS = new Set(["webm", "mp3", "wav", "m4a"]); | ||
|
|
||
| export function registerTranscriptionHandlers() { |
🔴 HIGH: Handler bypasses contract-driven IPC architecture (3/3 agents)
This handler uses raw ipcMain.handle directly instead of createTypedHandler from the project's contract-driven IPC architecture. According to the project's CLAUDE.md:
Handlers should use createTypedHandler(contract, handler) which validates inputs at runtime via Zod.
The current implementation manually parses the input with miscContracts.transcribeAudio.input.parse(input) inside a try/catch, but this is inconsistent with all other handlers in src/ipc/handlers/*.ts which use the standardized createTypedHandler pattern.
Suggestion: Refactor to use the established pattern:
| export function registerTranscriptionHandlers() { | |
| export function registerTranscriptionHandlers() { | |
| createTypedHandler( | |
| miscContracts.transcribeAudio, |
Then update the handler signature and move the logic into the typed handler body. This provides:
- Consistent runtime Zod validation
- Standardized error handling and logging
- Type-safe handler implementation
| const content = base64data.split(",")[1]; | ||
| if (!content) { | ||
| reject(new Error("Invalid audio data format")); | ||
| return; |
🟡 MEDIUM: setIsTranscribing(false) should be in finally block (2/3 agents)
The setIsTranscribing(false) call is placed after the try/catch block, which works correctly now. However, this is fragile - if any future modification adds a re-throw in the catch block or if an error occurs in the catch block itself, isTranscribing would remain true indefinitely, leaving the UI in a stuck state.
Suggestion: Move setIsTranscribing(false) to a finally block for robustness:
const handleRecordingComplete = useCallback(
async (blob: Blob) => {
setIsTranscribing(true);
try {
// ... existing logic ...
} catch (err) {
logger.error("Transcription failed", err);
onError?.(
err instanceof Error ? err.message : "Failed to transcribe audio",
);
} finally {
setIsTranscribing(false);
}
},
[appendText, onError],
);
This ensures isTranscribing is always reset regardless of success or failure paths.
🎭 Playwright Test Results
❌ Some tests failed
Summary: 536 passed, 69 failed, 2 flaky, 216 skipped
Failed Tests
🍎 macOS: 52 failures
🪟 Windows: 17 failures
📋 Test Commands (macOS)
Copy and paste to re-run failing tests locally:
export PLAYWRIGHT_HTML_OPEN=never
# annotator.spec.ts > annotator - capture and submit screenshot
# Expected: TimeoutError: locator.fill: Timeout 30000ms exceeded.
npm run e2e e2e-tests/annotator.spec.ts -- -g "annotator - capture and submit screenshot"
# astro.spec.ts > astro
# Expected: TimeoutError: locator.fill: Timeout 30000ms exceeded.
npm run e2e e2e-tests/astro.spec.ts -- -g "astro"
# attach_image.spec.ts > attach image - chat
# Expected: TimeoutError: locator.fill: Timeout 30000ms exceeded.
npm run e2e e2e-tests/attach_image.spec.ts -- -g "attach image - chat"
# attach_image.spec.ts > attach image - chat - upload to codebase
# Expected: TimeoutError: locator.fill: Timeout 30000ms exceeded.
npm run e2e e2e-tests/attach_image.spec.ts -- -g "attach image - chat - upload to codebase"
# attach_image.spec.ts > attach image via drag - chat
# Expected: TimeoutError: locator.fill: Timeout 30000ms exceeded.
npm run e2e e2e-tests/attach_image.spec.ts -- -g "attach image via drag - chat"
# chat_history.spec.ts > should open, navigate, and select from history menu
# Expected: TimeoutError: locator.fill: Timeout 30000ms exceeded.
npm run e2e e2e-tests/chat_history.spec.ts -- -g "should open, navigate, and select from history menu"
# chat_history.spec.ts > should handle edge cases: guards, escape, and sending after cancel
# Expected: TimeoutError: locator.fill: Timeout 30000ms exceeded.
npm run e2e e2e-tests/chat_history.spec.ts -- -g "should handle edge cases: guards, escape, and sending after cancel"
# chat_input.spec.ts > send button disabled during pending proposal
# Expected: TimeoutError: locator.fill: Timeout 30000ms exceeded.
npm run e2e e2e-tests/chat_input.spec.ts -- -g "send button disabled during pending proposal"
# chat_input.spec.ts > send button disabled during pending proposal - reject
# Expected: TimeoutError: locator.fill: Timeout 30000ms exceeded.
npm run e2e e2e-tests/chat_input.spec.ts -- -g "send button disabled during pending proposal - reject"
# chat_mode.spec.ts > chat mode selector - default build mode
# Expected: TimeoutError: locator.fill: Timeout 30000ms exceeded.
npm run e2e e2e-tests/chat_mode.spec.ts -- -g "chat mode selector - default build mode"
# chat_mode.spec.ts > chat mode selector - ask mode
# Expected: TimeoutError: locator.click: Timeout 30000ms exceeded.
npm run e2e e2e-tests/chat_mode.spec.ts -- -g "chat mode selector - ask mode"
# context_window.spec.ts > context window
# Expected: TimeoutError: locator.fill: Timeout 30000ms exceeded.
npm run e2e e2e-tests/context_window.spec.ts -- -g "context window"
# copy_chat.spec.ts > copy message content - basic functionality
# Expected: TimeoutError: locator.fill: Timeout 30000ms exceeded.
npm run e2e e2e-tests/copy_chat.spec.ts -- -g "copy message content - basic functionality"
# copy_chat.spec.ts > copy message content - dyad-write conversion
# Expected: TimeoutError: locator.fill: Timeout 30000ms exceeded.
npm run e2e e2e-tests/copy_chat.spec.ts -- -g "copy message content - dyad-write conversion"
# dyad_tags_parsing.spec.ts > dyad tags handles nested < tags
# Expected: TimeoutError: locator.fill: Timeout 30000ms exceeded.
npm run e2e e2e-tests/dyad_tags_parsing.spec.ts -- -g "dyad tags handles nested < tags"
# free_agent_quota.spec.ts > free agent quota - full flow: mode availability, quota tracking, exceeded banner, switch to build
# Expected: TimeoutError: locator.fill: Timeout 30000ms exceeded.
npm run e2e e2e-tests/free_agent_quota.spec.ts -- -g "free agent quota - full flow: mode availability, quota tracking, exceeded banner, switch to build"
# free_agent_quota.spec.ts > free agent quota - quota resets after 24 hours
# Expected: TimeoutError: locator.fill: Timeout 30000ms exceeded.
npm run e2e e2e-tests/free_agent_quota.spec.ts -- -g "free agent quota - quota resets after 24 hours"
# local_agent_advanced.spec.ts > local-agent - mcp tool call
# Expected: TimeoutError: locator.fill: Timeout 30000ms exceeded.
npm run e2e e2e-tests/local_agent_advanced.spec.ts -- -g "local-agent - mcp tool call"
# local_agent_ask.spec.ts > local-agent ask mode
# Expected: TimeoutError: locator.fill: Timeout 30000ms exceeded.
npm run e2e e2e-tests/local_agent_ask.spec.ts -- -g "local-agent ask mode"
# local_agent_auto.spec.ts > local-agent - auto model
# Expected: TimeoutError: locator.fill: Timeout 30000ms exceeded.
npm run e2e e2e-tests/local_agent_auto.spec.ts -- -g "local-agent - auto model"
# local_agent_basic.spec.ts > local-agent - dump request
# Expected: TimeoutError: locator.fill: Timeout 30000ms exceeded.
npm run e2e e2e-tests/local_agent_basic.spec.ts -- -g "local-agent - dump request"
# local_agent_basic.spec.ts > local-agent - read then edit
# Expected: TimeoutError: locator.fill: Timeout 30000ms exceeded.
npm run e2e e2e-tests/local_agent_basic.spec.ts -- -g "local-agent - read then edit"
# local_agent_basic.spec.ts > local-agent - parallel tool calls
# Expected: TimeoutError: locator.fill: Timeout 30000ms exceeded.
npm run e2e e2e-tests/local_agent_basic.spec.ts -- -g "local-agent - parallel tool calls"
# local_agent_code_search.spec.ts > local-agent - code search
# Expected: TimeoutError: locator.fill: Timeout 30000ms exceeded.
npm run e2e e2e-tests/local_agent_code_search.spec.ts -- -g "local-agent - code search"
# local_agent_consent.spec.ts > local-agent - add_dependency consent: always allow
# Expected: TimeoutError: locator.fill: Timeout 30000ms exceeded.
npm run e2e e2e-tests/local_agent_consent.spec.ts -- -g "local-agent - add_dependency consent: always allow"
# local_agent_consent.spec.ts > local-agent - add_dependency consent: allow once
# Expected: TimeoutError: locator.fill: Timeout 30000ms exceeded.
npm run e2e e2e-tests/local_agent_consent.spec.ts -- -g "local-agent - add_dependency consent: allow once"
# local_agent_consent.spec.ts > local-agent - add_dependency consent: decline
# Expected: TimeoutError: locator.fill: Timeout 30000ms exceeded.
npm run e2e e2e-tests/local_agent_consent.spec.ts -- -g "local-agent - add_dependency consent: decline"
# local_agent_file_upload.spec.ts > local-agent - upload file to codebase
# Expected: TimeoutError: locator.fill: Timeout 30000ms exceeded.
npm run e2e e2e-tests/local_agent_file_upload.spec.ts -- -g "local-agent - upload file to codebase"
# local_agent_grep.spec.ts > local-agent - grep search
# Expected: TimeoutError: locator.fill: Timeout 30000ms exceeded.
npm run e2e e2e-tests/local_agent_grep.spec.ts -- -g "local-agent - grep search"
# local_agent_list_files.spec.ts > local-agent - list_files
# Expected: TimeoutError: locator.fill: Timeout 30000ms exceeded.
npm run e2e e2e-tests/local_agent_list_files.spec.ts -- -g "local-agent - list_files"
# local_agent_read_logs.spec.ts > local-agent - read logs with filters
# Expected: TimeoutError: locator.fill: Timeout 30000ms exceeded.
npm run e2e e2e-tests/local_agent_read_logs.spec.ts -- -g "local-agent - read logs with filters"
# local_agent_run_type_checks.spec.ts > local-agent - run_type_checks updates problems panel
# Expected: TimeoutError: locator.fill: Timeout 30000ms exceeded.
npm run e2e e2e-tests/local_agent_run_type_checks.spec.ts -- -g "local-agent - run_type_checks updates problems panel"
# local_agent_search_replace.spec.ts > local-agent - search_replace edit
# Expected: TimeoutError: locator.fill: Timeout 30000ms exceeded.
npm run e2e e2e-tests/local_agent_search_replace.spec.ts -- -g "local-agent - search_replace edit"
# local_agent_summarize.spec.ts > local-agent - summarize to new chat works
# Expected: TimeoutError: locator.fill: Timeout 30000ms exceeded.
npm run e2e e2e-tests/local_agent_summarize.spec.ts -- -g "local-agent - summarize to new chat works"
# mcp.spec.ts > mcp - call calculator
# Expected: TimeoutError: locator.fill: Timeout 30000ms exceeded.
npm run e2e e2e-tests/mcp.spec.ts -- -g "mcp - call calculator"
# mcp.spec.ts > mcp - call calculator via http
# Expected: TimeoutError: locator.fill: Timeout 30000ms exceeded.
npm run e2e e2e-tests/mcp.spec.ts -- -g "mcp - call calculator via http"
# new_chat.spec.ts > new chat (first button)
# Expected: TimeoutError: locator.fill: Timeout 30000ms exceeded.
npm run e2e e2e-tests/new_chat.spec.ts -- -g "new chat \(first button\)"
# new_chat.spec.ts > new chat (second button)
# Expected: TimeoutError: locator.fill: Timeout 30000ms exceeded.
npm run e2e e2e-tests/new_chat.spec.ts -- -g "new chat \(second button\)"
# partial_response.spec.ts > partial message is resumed
# Expected: TimeoutError: locator.fill: Timeout 30000ms exceeded.
npm run e2e e2e-tests/partial_response.spec.ts -- -g "partial message is resumed"
# rename_edit.spec.ts > rename then edit works
# Expected: TimeoutError: locator.fill: Timeout 30000ms exceeded.
npm run e2e e2e-tests/rename_edit.spec.ts -- -g "rename then edit works"
# select_component.spec.ts > select component
# Expected: TimeoutError: locator.fill: Timeout 30000ms exceeded.
npm run e2e e2e-tests/select_component.spec.ts -- -g "select component"
# select_component.spec.ts > select multiple components
# Expected: TimeoutError: locator.fill: Timeout 30000ms exceeded.
npm run e2e e2e-tests/select_component.spec.ts -- -g "select multiple components"
# select_component.spec.ts > deselect component
# Expected: TimeoutError: locator.fill: Timeout 30000ms exceeded.
npm run e2e e2e-tests/select_component.spec.ts -- -g "deselect component"
# select_component.spec.ts > select component next.js
# Expected: Error: expect(locator).toMatchAriaSnapshot(expected) failed
npm run e2e e2e-tests/select_component.spec.ts -- -g "select component next\.js" --update-snapshots
# smart_context_deep.spec.ts > smart context deep - read write read
# Expected: TimeoutError: locator.fill: Timeout 30000ms exceeded.
npm run e2e e2e-tests/smart_context_deep.spec.ts -- -g "smart context deep - read write read"
# supabase_branch.spec.ts > supabase branch selection works
# Expected: TimeoutError: locator.fill: Timeout 30000ms exceeded.
npm run e2e e2e-tests/supabase_branch.spec.ts -- -g "supabase branch selection works"
# supabase_client.spec.ts > supabase client is generated
# Expected: TimeoutError: locator.fill: Timeout 30000ms exceeded.
npm run e2e e2e-tests/supabase_client.spec.ts -- -g "supabase client is generated"
# undo.spec.ts > undo
# Expected: TimeoutError: locator.fill: Timeout 30000ms exceeded.
npm run e2e e2e-tests/undo.spec.ts -- -g "undo"
# undo.spec.ts > undo with native git
# Expected: TimeoutError: locator.fill: Timeout 30000ms exceeded.
npm run e2e e2e-tests/undo.spec.ts -- -g "undo with native git"
# undo.spec.ts > undo after assistant with no code
# Expected: TimeoutError: locator.fill: Timeout 30000ms exceeded.
npm run e2e e2e-tests/undo.spec.ts -- -g "undo after assistant with no code"
# voice_transcription.spec.ts > voice transcription - home chat: record, transcribe, append to input
# Expected: Error: expect(locator).toHaveAttribute(expected) failed
npm run e2e e2e-tests/voice_transcription.spec.ts -- -g "voice transcription - home chat: record, transcribe, append to input"
# voice_transcription.spec.ts > voice input - non-Pro user sees Pro-only disabled state
# Expected: Error: expect(locator).toHaveAttribute(expected) failed
npm run e2e e2e-tests/voice_transcription.spec.ts -- -g "voice input - non-Pro user sees Pro-only disabled state"
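The rerun commands above pass each test title to `-g` as a regular expression, which is why titles such as "new chat (first button)" and "select component next.js" appear with backslash-escaped parentheses and dots. A minimal sketch of how such commands could be generated is shown here; `escapeForGrep` and `buildRerunCommand` are hypothetical helper names, not part of the repository, and the sketch assumes titles contain no double quotes or `$` that would need additional shell quoting.

```typescript
// Hypothetical helpers (not from the PR): build a rerun command for one test.

// Escape regex metacharacters so the title is matched literally by -g.
export function escapeForGrep(title: string): string {
  return title.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Assemble the command in the same shape as the list above.
// Assumption: the title has no characters that need extra shell quoting.
export function buildRerunCommand(specFile: string, title: string): string {
  return `npm run e2e e2e-tests/${specFile} -- -g "${escapeForGrep(title)}"`;
}
```

Titles without metacharacters, such as "undo with native git", pass through unchanged.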
closes #1804
Note
Introduces voice input across chat inputs with transcription via Dyad Engine and Pro gating.
- LexicalVoiceInputRow in ChatInput and HomeChatInput, adding mic control, waveform (VoiceWaveform), and send/cancel integration
- VoiceInputButton handles Pro-only disabled state, recording/transcribing states, and tooltips
- useAudioRecorder and useVoiceInput to record via MediaRecorder, visualize with AnalyserNode, and call ipc.misc.transcribeAudio
- misc.transcribeAudio contract, registers transcription_handlers that validate input, support E2E mock, and call transcribeWithDyadEngine
- transcribeWithDyadEngine with request-id attempt tracking and multipart upload to /audio/transcriptions
- getUserMedia and asserts transcription append
Written by Cursor Bugbot for commit 7dc1944. This will update automatically on new commits.
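As an illustration of the "visualize with AnalyserNode" step, the sketch below reduces a buffer of byte time-domain samples (the format filled in by `AnalyserNode.getByteTimeDomainData`, where 128 is silence) to a fixed number of normalized bar heights. The function name `toWaveformBars` and the peak-per-chunk approach are assumptions made for this example; the PR's VoiceWaveform component may compute its bars differently.

```typescript
// Hypothetical helper, not the PR's implementation.
// samples: unsigned bytes centered on 128 (silence), as written by
// AnalyserNode.getByteTimeDomainData. Returns one value in [0, 1] per bar.
export function toWaveformBars(samples: Uint8Array, barCount: number): number[] {
  if (barCount <= 0 || samples.length === 0) return [];
  const chunkSize = Math.max(1, Math.floor(samples.length / barCount));
  const bars: number[] = [];
  for (let i = 0; i < barCount; i++) {
    const start = i * chunkSize;
    if (start >= samples.length) break; // fewer samples than requested bars
    const end = Math.min(samples.length, start + chunkSize);
    let peak = 0;
    for (let j = start; j < end; j++) {
      // Distance from the silence level, scaled to the 0..1 range.
      const amplitude = Math.abs((samples[j] ?? 128) - 128) / 128;
      if (amplitude > peak) peak = amplitude;
    }
    bars.push(Math.min(1, peak));
  }
  return bars;
}
```

In a browser, the input would typically come from a `Uint8Array` sized to the analyser's `fftSize` and refreshed once per animation frame while recording is active.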
Summary by cubic
Adds voice input with waveform visualization and transcription for chat, gated to Dyad Pro users. Improves reliability with fixes for recording setup leaks, analyser state, and proper audio MIME typing so IPC-backed Dyad Engine transcription consistently appends text; addresses #1804.
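To make the "proper audio MIME typing" point concrete, here is a small sketch of one way a recorder could choose a container format and derive a matching file name for the multipart upload. The candidate list, the extension map, and both function names are assumptions for illustration only; the support check is injected so the logic can run outside a browser, where `MediaRecorder.isTypeSupported` would normally be passed in.

```typescript
// Illustrative assumptions: candidate order and extension mapping are not from the PR.
const CANDIDATE_MIME_TYPES: readonly string[] = [
  "audio/webm;codecs=opus",
  "audio/webm",
  "audio/mp4",
  "audio/ogg",
];

const EXTENSION_BY_MIME = new Map<string, string>([
  ["audio/webm", "webm"],
  ["audio/mp4", "m4a"],
  ["audio/ogg", "ogg"],
  ["audio/wav", "wav"],
]);

// Return the first candidate the environment reports as supported, if any.
export function pickAudioMimeType(
  isSupported: (mimeType: string) => boolean,
  candidates: readonly string[] = CANDIDATE_MIME_TYPES,
): string | undefined {
  return candidates.find((candidate) => isSupported(candidate));
}

// Derive an upload file name whose extension matches the recorded MIME type.
// Codec parameters (everything after ";") are ignored for the lookup.
export function fileNameForMimeType(mimeType: string): string {
  const [base = ""] = mimeType.split(";");
  const extension = EXTENSION_BY_MIME.get(base.trim().toLowerCase()) ?? "bin";
  return `recording.${extension}`;
}
```

In the renderer this might be called as `pickAudioMimeType((t) => MediaRecorder.isTypeSupported(t))`, with the result used both for the recorder options and for the type of the Blob that is sent for transcription.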
New Features
Migration
Written for commit fa71433. Summary will update on new commits.