Add playwright plugin for Claude Code#6304
Conversation
|
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #6304 +/- ##
=========================================
Coverage 42.58% 42.58%
=========================================
Files 2497 2497
Lines 43372 43372
Branches 10231 9850 -381
=========================================
Hits 18470 18470
- Misses 23576 24865 +1289
+ Partials 1326 37 -1289 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Pull request overview
This PR adds a comprehensive Claude Code plugin for debugging and fixing Playwright E2E test failures. The plugin provides automated analysis of test failures from CI, intelligent grouping of errors, and delegation to specialized subagents for investigation and fixes.
Changes:
- Added a new
/analyze-playwright-failurescommand that accepts PR links, GitHub Action workflow runs, or paths to Playwright results - Implemented automated download, unpacking, and parsing of Playwright test results with semantic error categorization
- Created subagent architecture with exploration agents for codebase analysis and specialized fixer agents for test repairs
Reviewed changes
Copilot reviewed 13 out of 14 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| .gitignore | Excludes plugin internal directories from version control |
| .claude/settings.json | Configures the custom plugin marketplace and enables the analyze-playwright-failures plugin |
| .claude/plugins/analyze-playwright-failures/skills/analyze-playwright-failures/scripts/prepare-report.sh | Main script for downloading, extracting, merging, and parsing Playwright test reports |
| .claude/plugins/analyze-playwright-failures/skills/analyze-playwright-failures/scripts/parse-failures.sh | Parses JSON reports and categorizes failures semantically |
| .claude/plugins/analyze-playwright-failures/skills/analyze-playwright-failures/scripts/get-error-details.sh | Extracts detailed information for specific error categories |
| .claude/plugins/analyze-playwright-failures/skills/analyze-playwright-failures/docs/trace-analysis.md | Documentation for advanced trace analysis techniques |
| .claude/plugins/analyze-playwright-failures/skills/analyze-playwright-failures/docs/environment-restore.md | Documentation for restoring test environments and verifying data |
| .claude/plugins/analyze-playwright-failures/skills/analyze-playwright-failures/SKILL.md | Main skill documentation defining the analysis workflow and agent delegation patterns |
| .claude/plugins/analyze-playwright-failures/scripts/block-trace-analysis.sh | Hook script to prevent direct trace analysis and enforce delegation |
| .claude/plugins/analyze-playwright-failures/scripts/block-direct-edit.sh | Hook script to prevent direct test file edits and enforce subagent usage |
| .claude/plugins/analyze-playwright-failures/hooks/hooks.json | Hook configuration for enforcing workflow patterns |
| .claude/plugins/analyze-playwright-failures/agents/e2e-test-fixer.md | Specialized subagent for fixing Playwright test failures |
| .claude/plugins/analyze-playwright-failures/.claude-plugin/plugin.json | Plugin metadata configuration |
| .claude/plugins/.claude-plugin/marketplace.json | Custom marketplace definition for local plugins |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| echo "Running: npx playwright merge-reports --reporter=json $BLOB_DIR" | ||
|
|
||
| # Try to run from current directory first (should have playwright installed) | ||
| if npx playwright merge-reports --reporter=json "$BLOB_DIR" > "$OUTPUT_JSON" 2>/dev/null; then |
There was a problem hiding this comment.
Error output is being suppressed with 2>/dev/null, which makes debugging difficult when the merge fails. Consider capturing stderr to a variable or log file so diagnostic information is available when troubleshooting merge failures.
| PROJECT_ROOT=$(git rev-parse --show-toplevel 2>/dev/null || echo "") | ||
|
|
||
| if [ -n "$PROJECT_ROOT" ] && [ -f "$PROJECT_ROOT/package.json" ]; then | ||
| (cd "$PROJECT_ROOT" && npx playwright merge-reports --reporter=json "$BLOB_DIR" > "$OUTPUT_JSON" 2>/dev/null) && { |
There was a problem hiding this comment.
Similar to line 43, stderr is being suppressed. This makes it harder to diagnose why the merge failed when the fallback to project root is attempted. Consider preserving error messages for troubleshooting.
| (cd "$PROJECT_ROOT" && npx playwright merge-reports --reporter=json "$BLOB_DIR" > "$OUTPUT_JSON" 2>/dev/null) && { | |
| (cd "$PROJECT_ROOT" && npx playwright merge-reports --reporter=json "$BLOB_DIR" > "$OUTPUT_JSON") && { |
| SUBDIR_COUNT=$(find "$EXTRACT_DIR" -mindepth 1 -maxdepth 1 -type d | wc -l | tr -d ' ') | ||
| if [ "$SUBDIR_COUNT" = "1" ]; then |
There was a problem hiding this comment.
The use of wc -l | tr -d ' ' to get a clean count is fragile and platform-dependent. Consider using $(... | wc -l) without tr and comparing numerically, or use -print -quit with test for a more robust check.
| SUBDIR_COUNT=$(find "$EXTRACT_DIR" -mindepth 1 -maxdepth 1 -type d | wc -l | tr -d ' ') | |
| if [ "$SUBDIR_COUNT" = "1" ]; then | |
| SUBDIR_COUNT=$(find "$EXTRACT_DIR" -mindepth 1 -maxdepth 1 -type d | wc -l) | |
| if [ "$SUBDIR_COUNT" -eq 1 ]; then |
| echo "Merge failed. Looking for alternative..." | ||
|
|
||
| # Maybe there's already a usable JSON | ||
| REPORT_FILE=$(find "$BLOB_DIR" -name "*.json" -not -name "*.tmp" 2>/dev/null | head -1) |
There was a problem hiding this comment.
Using find ... | head -1 is non-deterministic when multiple JSON files exist. The order depends on filesystem implementation. Consider adding -type f for clarity and sorting the results to ensure consistent behavior across different systems.
| REPORT_FILE=$(find "$BLOB_DIR" -name "*.json" -not -name "*.tmp" 2>/dev/null | head -1) | |
| REPORT_FILE=$(find "$BLOB_DIR" -type f -name "*.json" -not -name "*.tmp" 2>/dev/null | sort | head -n 1) |
| cat 0-trace.trace | jq -r 'select(.type == "before") | "\(.startTime) - \(.apiName) \(.params.selector // "")"' 2>/dev/null | ||
|
|
||
| # 2. Get screenshots taken 0-3 seconds AFTER a specific action | ||
| # Example: action was at timestamp 185821, get next few screenshots |
There was a problem hiding this comment.
The hardcoded timestamp value 185821 in the example should be clarified as a placeholder. Add a comment indicating this is an example value that should be replaced with actual timestamp from step 1, to avoid confusion.
| # Example: action was at timestamp 185821, get next few screenshots | |
| # Example: action was at timestamp 185821, get next few screenshots | |
| # NOTE: 185821 is an example value; replace with the actual action timestamp from step 1 |
| .failures | group_by(.tests[0].results[0].category) | | ||
| map({ | ||
| category: .[0].tests[0].results[0].category, | ||
| count: length, | ||
| domains: ([.[].domain] | unique), | ||
| rootCauses: ([.[].tests[0].results[0].rootCause | select(. != null)] | unique), | ||
| tests: [.[] | { | ||
| title: .title, | ||
| file: .file, | ||
| line: .line, | ||
| domain: .domain, | ||
| error: .tests[0].results[0].error.message, | ||
| errorFirstLine: (.tests[0].results[0].error.message | split("\n")[0] | .[0:150]), | ||
| rootCause: .tests[0].results[0].rootCause, | ||
| screenshot: ([.tests[0].results[0].attachments[] | select(.type == "screenshot") | .path] | first), | ||
| errorContext: ([.tests[0].results[0].attachments[] | select(.type == "error-context") | .path] | first) | ||
| }] | ||
| }) |
There was a problem hiding this comment.
Potential null pointer/array access issue. The jq expressions on lines 366-383 and 399-409 assume that failures have at least one test with at least one result. If a failure exists but has empty tests or results arrays, accessing .tests[0].results[0] will fail.
Consider adding null-safe access patterns or filtering out empty entries before grouping. For example:
.failures | map(select(.tests | length > 0) | select(.tests[0].results | length > 0)) | group_by(.tests[0].results[0].category)
| .failures | group_by(.tests[0].results[0].category) | | ||
| map({ | ||
| category: .[0].tests[0].results[0].category, | ||
| count: length, | ||
| domains: ([.[].domain] | unique), | ||
| rootCauses: ([.[].tests[0].results[0].rootCause | select(. != null)] | unique), | ||
| tests: [.[] | { | ||
| title: .title, | ||
| file: .file, | ||
| line: .line, | ||
| domain: .domain, | ||
| error: .tests[0].results[0].error.message, | ||
| errorFirstLine: (.tests[0].results[0].error.message | split("\n")[0] | .[0:150]), | ||
| rootCause: .tests[0].results[0].rootCause, | ||
| screenshot: ([.tests[0].results[0].attachments[] | select(.type == "screenshot") | .path] | first), | ||
| errorContext: ([.tests[0].results[0].attachments[] | select(.type == "error-context") | .path] | first) | ||
| }] | ||
| }) | ||
| ' "$OUTPUT_DIR/failures-full.json" > "$OUTPUT_DIR/failures-by-category.json" |
There was a problem hiding this comment.
Same potential null pointer/array access issue as in prepare-report.sh. The jq expressions on lines 146-164 and 178-184 assume that failures have at least one test with at least one result. If a failure exists but has empty tests or results arrays, accessing .tests[0].results[0] will fail.
Consider adding the same null-safe filtering as suggested for prepare-report.sh.
| gh pr edit $PR_NUM --add-label "run pw-e2e" | ||
| ``` | ||
|
|
||
| Then stop and ask user to come back after test ends run. |
There was a problem hiding this comment.
Grammatical error in the phrase "test ends run". This should be either "test run ends" or "tests have finished running".
| Then stop and ask user to come back after test ends run. | |
| Then stop and ask user to come back after the test run ends. |
| "hooks": [ | ||
| { | ||
| "type": "command", | ||
| "command": "bash -c 'if [[ \"$CLAUDE_FILE_PATH\" == *playwright/*.spec.ts* ]] || [[ \"$CLAUDE_FILE_PATH\" == *playwright/pages/*.ts* ]]; then echo \"🚫 BLOCKED: Use e2e-test-fixer subagent instead\" >&2; exit 2; fi'" |
There was a problem hiding this comment.
Potential command injection vulnerability in the inline bash command. The pattern matching uses wildcards within [[ ]] which is correct, but the entire command is embedded directly in JSON without proper escaping. If environment variables like CLAUDE_FILE_PATH contain special characters or malicious input, this could be exploited.
Consider moving this logic to a separate script file (like block-direct-edit.sh) for better security and maintainability, then reference that script in the hook configuration instead of using inline bash.
| @@ -0,0 +1,448 @@ | |||
| #!/bin/bash | |||
There was a problem hiding this comment.
The shell scripts (prepare-report.sh, parse-failures.sh, get-error-details.sh, block-trace-analysis.sh, block-direct-edit.sh) are missing executable permissions. While the shebang line indicates they should be executable, the git repository should track these files with +x permissions to avoid confusion and ensure they work correctly when checked out.
Run: chmod +x .claude/plugins/dashboard-playwright/skills/analyze-failures/scripts/.sh .claude/plugins/dashboard-playwright/scripts/.sh
| local BLOB_DIR="$1" | ||
| local OUTPUT_JSON="$2" | ||
|
|
||
| echo "Running: npx playwright merge-reports --reporter=json $BLOB_DIR" |
There was a problem hiding this comment.
Potential command injection vulnerability. The BLOB_DIR variable is used directly in the command without proper quoting in the echo statement (line 40). While the variable is quoted in the actual command execution (line 43), the echo on line 40 should also quote the variable to prevent potential issues if BLOB_DIR contains spaces or special characters.
Change line 40 to: echo "Running: npx playwright merge-reports --reporter=json "$BLOB_DIR""
| echo "Running: npx playwright merge-reports --reporter=json $BLOB_DIR" | |
| echo "Running: npx playwright merge-reports --reporter=json \"$BLOB_DIR\"" |
| *.zip) | ||
| echo "Step 1: Extracting zip file..." | ||
| EXTRACT_DIR="$OUTPUT_DIR/extracted" | ||
| mkdir -p "$EXTRACT_DIR" |
There was a problem hiding this comment.
The unzip call extracts a user-supplied report ZIP ($INPUT_PATH) directly into $EXTRACT_DIR without sanitizing archive paths, which can allow a crafted ZIP with ../ segments in filenames to perform path traversal and overwrite arbitrary files relative to the current working directory (classic "Zip Slip" issue). An attacker who can supply the ZIP (e.g., a malicious artifact or shared report) could plant or overwrite files outside playwright-failures/extracted, potentially leading to code execution or data loss on the developer machine. To mitigate, ensure extraction strips or rejects .. and absolute paths (e.g., by pre-filtering with zipinfo/unzip -Z, using a safe extraction helper that normalizes and validates paths, or using tooling that enforces confinement to a specific directory).
| mkdir -p "$EXTRACT_DIR" | |
| mkdir -p "$EXTRACT_DIR" | |
| # Validate ZIP entries to prevent path traversal (Zip Slip) | |
| while IFS= read -r zip_entry; do | |
| # Disallow absolute paths and parent directory references | |
| case "$zip_entry" in | |
| /*|../*|*../*|*/..|..) | |
| echo "Error: Unsafe path in zip entry '$zip_entry' (possible path traversal)." >&2 | |
| exit 1 | |
| ;; | |
| esac | |
| done < <(unzip -Z1 "$INPUT_PATH") |
Added a dedicated plugin for debugging and fixing Playwright e2e tests as a Claude Code plugin
Note
We use plugin, because it can encapsulate all dependencies to make this work: skill itself is not enough because it cannot include hooks and subagents definitions
This plugin adds a new command:
/dashboard-playwright:analyze-failureswhich accepts a PR link, Github Action workflow run, or a path to a playwright resultsIt handles download and unpacking of playwright results so that it's readable by LLM.
Then it:
After that initial pass it calls separate subagent to fix actual playwright tests