Add playwright plugin for Claude Code #6304

Merged
witoszekdev merged 4 commits into main from add-playwright-claude-plugin
Feb 3, 2026

@witoszekdev (Member) commented Jan 30, 2026

Added a dedicated Claude Code plugin for debugging and fixing Playwright e2e tests.

Note

We use a plugin because it can encapsulate all the dependencies needed to make this work; a skill alone is not enough because it cannot include hook and subagent definitions.

This plugin adds a new command, /dashboard-playwright:analyze-failures, which accepts a PR link, a GitHub Actions workflow run, or a path to Playwright results.

It handles downloading and unpacking the Playwright results so that they are readable by an LLM.

Then it:

  1. Explores recent changes to find relevant diffs that might have introduced bugs or changed data-testid values
  2. Explores the codebase to understand its structure

After that initial pass, it calls a separate subagent to fix the actual Playwright tests.
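
The three accepted input kinds could be told apart with a plain pattern match. This is only a sketch: the function name `classify_input`, the labels it prints, and the URL shapes it checks are assumptions for illustration, not the plugin's actual code.

```bash
#!/usr/bin/env bash
# Hypothetical helper: classify the argument passed to the analyze-failures command.
# The function name and the returned labels are assumptions made for this sketch.
classify_input() {
  local input="$1"
  case "$input" in
    https://github.com/*/pull/*)         echo "pr" ;;
    https://github.com/*/actions/runs/*) echo "workflow-run" ;;
    *)
      # Anything else is treated as a local path if it exists on disk.
      if [ -e "$input" ]; then
        echo "local-path"
      else
        echo "unknown"
      fi
      ;;
  esac
}
```

A caller could branch on the printed label to decide whether to download artifacts or go straight to parsing.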

Copilot AI review requested due to automatic review settings January 30, 2026 14:56
@witoszekdev witoszekdev requested a review from a team as a code owner January 30, 2026 14:56

changeset-bot bot commented Jan 30, 2026

⚠️ No Changeset found

Latest commit: 676690b

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

codecov bot commented Jan 30, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 42.58%. Comparing base (76be7b0) to head (676690b).
⚠️ Report is 1 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff            @@
##             main    #6304     +/-   ##
=========================================
  Coverage   42.58%   42.58%             
=========================================
  Files        2497     2497             
  Lines       43372    43372             
  Branches    10231     9850    -381     
=========================================
  Hits        18470    18470             
- Misses      23576    24865   +1289     
+ Partials     1326       37   -1289     

Copilot AI (Contributor) left a comment

Pull request overview

This PR adds a comprehensive Claude Code plugin for debugging and fixing Playwright E2E test failures. The plugin provides automated analysis of test failures from CI, intelligent grouping of errors, and delegation to specialized subagents for investigation and fixes.

Changes:

  • Added a new /analyze-playwright-failures command that accepts PR links, GitHub Action workflow runs, or paths to Playwright results
  • Implemented automated download, unpacking, and parsing of Playwright test results with semantic error categorization
  • Created subagent architecture with exploration agents for codebase analysis and specialized fixer agents for test repairs

Reviewed changes

Copilot reviewed 13 out of 14 changed files in this pull request and generated 5 comments.

Show a summary per file
| File | Description |
| --- | --- |
| .gitignore | Excludes plugin internal directories from version control |
| .claude/settings.json | Configures the custom plugin marketplace and enables the analyze-playwright-failures plugin |
| .claude/plugins/analyze-playwright-failures/skills/analyze-playwright-failures/scripts/prepare-report.sh | Main script for downloading, extracting, merging, and parsing Playwright test reports |
| .claude/plugins/analyze-playwright-failures/skills/analyze-playwright-failures/scripts/parse-failures.sh | Parses JSON reports and categorizes failures semantically |
| .claude/plugins/analyze-playwright-failures/skills/analyze-playwright-failures/scripts/get-error-details.sh | Extracts detailed information for specific error categories |
| .claude/plugins/analyze-playwright-failures/skills/analyze-playwright-failures/docs/trace-analysis.md | Documentation for advanced trace analysis techniques |
| .claude/plugins/analyze-playwright-failures/skills/analyze-playwright-failures/docs/environment-restore.md | Documentation for restoring test environments and verifying data |
| .claude/plugins/analyze-playwright-failures/skills/analyze-playwright-failures/SKILL.md | Main skill documentation defining the analysis workflow and agent delegation patterns |
| .claude/plugins/analyze-playwright-failures/scripts/block-trace-analysis.sh | Hook script to prevent direct trace analysis and enforce delegation |
| .claude/plugins/analyze-playwright-failures/scripts/block-direct-edit.sh | Hook script to prevent direct test file edits and enforce subagent usage |
| .claude/plugins/analyze-playwright-failures/hooks/hooks.json | Hook configuration for enforcing workflow patterns |
| .claude/plugins/analyze-playwright-failures/agents/e2e-test-fixer.md | Specialized subagent for fixing Playwright test failures |
| .claude/plugins/analyze-playwright-failures/.claude-plugin/plugin.json | Plugin metadata configuration |
| .claude/plugins/.claude-plugin/marketplace.json | Custom marketplace definition for local plugins |

echo "Running: npx playwright merge-reports --reporter=json $BLOB_DIR"

# Try to run from current directory first (should have playwright installed)
if npx playwright merge-reports --reporter=json "$BLOB_DIR" > "$OUTPUT_JSON" 2>/dev/null; then

Copilot AI Jan 30, 2026

Error output is being suppressed with 2>/dev/null, which makes debugging difficult when the merge fails. Consider capturing stderr to a variable or log file so diagnostic information is available when troubleshooting merge failures.
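
One way to keep the diagnostics is to redirect stderr to a log file and print it only when the command fails. The sketch below is an assumption about how that could look; `run_with_stderr_log` and the log file location are hypothetical names, not part of the PR.

```bash
#!/usr/bin/env bash
# Sketch: run a command with stdout going to one file and stderr to a log,
# then surface the log only on failure. All names here are hypothetical.
run_with_stderr_log() {
  local out_file="$1" err_log="$2" status=0
  shift 2
  # Capture the exit status without tripping `set -e` in the caller.
  "$@" > "$out_file" 2> "$err_log" || status=$?
  if [ "$status" -ne 0 ]; then
    echo "Command failed (exit $status): $*" >&2
    echo "stderr was saved to $err_log:" >&2
    cat "$err_log" >&2
  fi
  return "$status"
}

# Possible use in the merge step (the log file name is an assumption):
#   run_with_stderr_log "$OUTPUT_JSON" "$OUTPUT_DIR/merge-stderr.log" \
#     npx playwright merge-reports --reporter=json "$BLOB_DIR"
```

Because the function returns the original exit status, it can still be used directly as the condition of the existing `if`.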

Copilot uses AI. Check for mistakes.
PROJECT_ROOT=$(git rev-parse --show-toplevel 2>/dev/null || echo "")

if [ -n "$PROJECT_ROOT" ] && [ -f "$PROJECT_ROOT/package.json" ]; then
(cd "$PROJECT_ROOT" && npx playwright merge-reports --reporter=json "$BLOB_DIR" > "$OUTPUT_JSON" 2>/dev/null) && {

Copilot AI Jan 30, 2026

Similar to line 43, stderr is being suppressed. This makes it harder to diagnose why the merge failed when the fallback to project root is attempted. Consider preserving error messages for troubleshooting.

Suggested change
- (cd "$PROJECT_ROOT" && npx playwright merge-reports --reporter=json "$BLOB_DIR" > "$OUTPUT_JSON" 2>/dev/null) && {
+ (cd "$PROJECT_ROOT" && npx playwright merge-reports --reporter=json "$BLOB_DIR" > "$OUTPUT_JSON") && {

Comment on lines +234 to +235
SUBDIR_COUNT=$(find "$EXTRACT_DIR" -mindepth 1 -maxdepth 1 -type d | wc -l | tr -d ' ')
if [ "$SUBDIR_COUNT" = "1" ]; then

Copilot AI Jan 30, 2026

The use of wc -l | tr -d ' ' to get a clean count is fragile and platform-dependent. Consider using $(... | wc -l) without tr and comparing numerically, or use -print -quit with test for a more robust check.

Suggested change
- SUBDIR_COUNT=$(find "$EXTRACT_DIR" -mindepth 1 -maxdepth 1 -type d | wc -l | tr -d ' ')
- if [ "$SUBDIR_COUNT" = "1" ]; then
+ SUBDIR_COUNT=$(find "$EXTRACT_DIR" -mindepth 1 -maxdepth 1 -type d | wc -l)
+ if [ "$SUBDIR_COUNT" -eq 1 ]; then
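
A third option, not proposed in the review, is to count the directories in the shell itself so that no `wc` output needs cleaning up. The helper below is a hypothetical sketch (`count_subdirs` is an invented name); like `find -type d`, it skips symlinks.

```bash
#!/usr/bin/env bash
# Sketch: count immediate subdirectories without parsing `wc -l` output.
# count_subdirs is a hypothetical helper, not part of prepare-report.sh.
count_subdirs() {
  local dir="$1" entry count=0
  # The three patterns cover regular names, dot-names, and names starting with "..".
  for entry in "$dir"/* "$dir"/.[!.]* "$dir"/..?*; do
    # An unmatched pattern stays literal and fails the -d test, so it is not counted.
    if [ -d "$entry" ] && [ ! -L "$entry" ]; then
      count=$((count + 1))
    fi
  done
  echo "$count"
}
```

The check in the script would then read `[ "$(count_subdirs "$EXTRACT_DIR")" -eq 1 ]`.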

echo "Merge failed. Looking for alternative..."

# Maybe there's already a usable JSON
REPORT_FILE=$(find "$BLOB_DIR" -name "*.json" -not -name "*.tmp" 2>/dev/null | head -1)

Copilot AI Jan 30, 2026

Using find ... | head -1 is non-deterministic when multiple JSON files exist. The order depends on filesystem implementation. Consider adding -type f for clarity and sorting the results to ensure consistent behavior across different systems.

Suggested change
- REPORT_FILE=$(find "$BLOB_DIR" -name "*.json" -not -name "*.tmp" 2>/dev/null | head -1)
+ REPORT_FILE=$(find "$BLOB_DIR" -type f -name "*.json" -not -name "*.tmp" 2>/dev/null | sort | head -n 1)

cat 0-trace.trace | jq -r 'select(.type == "before") | "\(.startTime) - \(.apiName) \(.params.selector // "")"' 2>/dev/null

# 2. Get screenshots taken 0-3 seconds AFTER a specific action
# Example: action was at timestamp 185821, get next few screenshots

Copilot AI Jan 30, 2026

The hardcoded timestamp value 185821 in the example should be clarified as a placeholder. Add a comment indicating this is an example value that should be replaced with actual timestamp from step 1, to avoid confusion.

Suggested change
- # Example: action was at timestamp 185821, get next few screenshots
+ # Example: action was at timestamp 185821, get next few screenshots
+ # NOTE: 185821 is an example value; replace with the actual action timestamp from step 1

@witoszekdev witoszekdev added the skip changeset Use if your changes doesn't need entry in changelog label Jan 30, 2026
Copilot AI review requested due to automatic review settings February 3, 2026 10:19
@witoszekdev witoszekdev enabled auto-merge (squash) February 3, 2026 10:19
@witoszekdev witoszekdev merged commit bbfdd97 into main Feb 3, 2026
18 checks passed
@witoszekdev witoszekdev deleted the add-playwright-claude-plugin branch February 3, 2026 10:23

Copilot AI (Contributor) left a comment

Pull request overview

Copilot reviewed 14 out of 15 changed files in this pull request and generated 7 comments.

Comment on lines +366 to +383
.failures | group_by(.tests[0].results[0].category) |
map({
category: .[0].tests[0].results[0].category,
count: length,
domains: ([.[].domain] | unique),
rootCauses: ([.[].tests[0].results[0].rootCause | select(. != null)] | unique),
tests: [.[] | {
title: .title,
file: .file,
line: .line,
domain: .domain,
error: .tests[0].results[0].error.message,
errorFirstLine: (.tests[0].results[0].error.message | split("\n")[0] | .[0:150]),
rootCause: .tests[0].results[0].rootCause,
screenshot: ([.tests[0].results[0].attachments[] | select(.type == "screenshot") | .path] | first),
errorContext: ([.tests[0].results[0].attachments[] | select(.type == "error-context") | .path] | first)
}]
})

Copilot AI Feb 3, 2026

Potential null pointer/array access issue. The jq expressions on lines 366-383 and 399-409 assume that failures have at least one test with at least one result. If a failure exists but has empty tests or results arrays, accessing .tests[0].results[0] will fail.

Consider adding null-safe access patterns or filtering out empty entries before grouping. For example:
.failures | map(select(.tests | length > 0) | select(.tests[0].results | length > 0)) | group_by(.tests[0].results[0].category)
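
As a rough illustration of that filter in context, the sketch below wraps it in a shell function and reduces the grouping to a category count. The function name `group_failures_safely` and the trimmed-down output shape are assumptions; only the field names come from the filter under review, and `jq` is assumed to be installed.

```bash
#!/usr/bin/env bash
# Sketch: drop failures with no tests or no results before grouping by category.
# group_failures_safely is a hypothetical name; it reads the report JSON on stdin.
group_failures_safely() {
  jq -c '
    .failures
    # Keep only entries where .tests[0].results[0] actually exists.
    | map(select((.tests | length) > 0 and (.tests[0].results | length) > 0))
    | group_by(.tests[0].results[0].category)
    | map({category: .[0].tests[0].results[0].category, count: length})
  '
}

# Possible use (file name taken from the script under review):
#   group_failures_safely < "$OUTPUT_DIR/failures-full.json"
```

Entries filtered out this way are silently dropped, so a real script might also want to log how many were skipped.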

Comment on lines +146 to +164
.failures | group_by(.tests[0].results[0].category) |
map({
category: .[0].tests[0].results[0].category,
count: length,
domains: ([.[].domain] | unique),
rootCauses: ([.[].tests[0].results[0].rootCause | select(. != null)] | unique),
tests: [.[] | {
title: .title,
file: .file,
line: .line,
domain: .domain,
error: .tests[0].results[0].error.message,
errorFirstLine: (.tests[0].results[0].error.message | split("\n")[0] | .[0:150]),
rootCause: .tests[0].results[0].rootCause,
screenshot: ([.tests[0].results[0].attachments[] | select(.type == "screenshot") | .path] | first),
errorContext: ([.tests[0].results[0].attachments[] | select(.type == "error-context") | .path] | first)
}]
})
' "$OUTPUT_DIR/failures-full.json" > "$OUTPUT_DIR/failures-by-category.json"

Copilot AI Feb 3, 2026

Same potential null pointer/array access issue as in prepare-report.sh. The jq expressions on lines 146-164 and 178-184 assume that failures have at least one test with at least one result. If a failure exists but has empty tests or results arrays, accessing .tests[0].results[0] will fail.

Consider adding the same null-safe filtering as suggested for prepare-report.sh.

gh pr edit $PR_NUM --add-label "run pw-e2e"
```

Then stop and ask user to come back after test ends run.

Copilot AI Feb 3, 2026

Grammatical error in the phrase "test ends run". This should be either "test run ends" or "tests have finished running".

Suggested change
- Then stop and ask user to come back after test ends run.
+ Then stop and ask user to come back after the test run ends.

"hooks": [
{
"type": "command",
"command": "bash -c 'if [[ \"$CLAUDE_FILE_PATH\" == *playwright/*.spec.ts* ]] || [[ \"$CLAUDE_FILE_PATH\" == *playwright/pages/*.ts* ]]; then echo \"🚫 BLOCKED: Use e2e-test-fixer subagent instead\" >&2; exit 2; fi'"

Copilot AI Feb 3, 2026

Potential command injection vulnerability in the inline bash command. The pattern matching uses wildcards within [[ ]] which is correct, but the entire command is embedded directly in JSON without proper escaping. If environment variables like CLAUDE_FILE_PATH contain special characters or malicious input, this could be exploited.

Consider moving this logic to a separate script file (like block-direct-edit.sh) for better security and maintainability, then reference that script in the hook configuration instead of using inline bash.
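
Moved into a script, the same check might look roughly like this. It is a sketch only: it reuses the `CLAUDE_FILE_PATH` variable, the two path patterns, and exit code 2 from the inline command, while the function names are hypothetical and the actual `block-direct-edit.sh` in the PR may differ.

```bash
#!/usr/bin/env bash
# Sketch of the inline hook logic as a standalone script.
# should_block_edit and main are hypothetical names; the patterns mirror the inline command.
should_block_edit() {
  case "$1" in
    *playwright/*.spec.ts*|*playwright/pages/*.ts*) return 0 ;;
    *) return 1 ;;
  esac
}

main() {
  # Read the path from the environment, defaulting to empty if it is unset.
  if should_block_edit "${CLAUDE_FILE_PATH:-}"; then
    echo "🚫 BLOCKED: Use e2e-test-fixer subagent instead" >&2
    return 2
  fi
  return 0
}

# As a hook script, the file would end by calling main and exiting with its status.
```

Using `case` keeps the path as plain data that is only ever pattern-matched, and the hook configuration then needs nothing more than the script path.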

@@ -0,0 +1,448 @@
#!/bin/bash

Copilot AI Feb 3, 2026

The shell scripts (prepare-report.sh, parse-failures.sh, get-error-details.sh, block-trace-analysis.sh, block-direct-edit.sh) are missing executable permissions. While the shebang line indicates they should be executable, the git repository should track these files with +x permissions to avoid confusion and ensure they work correctly when checked out.

Run: chmod +x .claude/plugins/dashboard-playwright/skills/analyze-failures/scripts/*.sh .claude/plugins/dashboard-playwright/scripts/*.sh

local BLOB_DIR="$1"
local OUTPUT_JSON="$2"

echo "Running: npx playwright merge-reports --reporter=json $BLOB_DIR"

Copilot AI Feb 3, 2026

Potential command injection vulnerability. The BLOB_DIR variable is used directly in the command without proper quoting in the echo statement (line 40). While the variable is quoted in the actual command execution (line 43), the echo on line 40 should also quote the variable to prevent potential issues if BLOB_DIR contains spaces or special characters.

Change line 40 to: echo "Running: npx playwright merge-reports --reporter=json \"$BLOB_DIR\""

Suggested change
- echo "Running: npx playwright merge-reports --reporter=json $BLOB_DIR"
+ echo "Running: npx playwright merge-reports --reporter=json \"$BLOB_DIR\""

*.zip)
echo "Step 1: Extracting zip file..."
EXTRACT_DIR="$OUTPUT_DIR/extracted"
mkdir -p "$EXTRACT_DIR"

Copilot AI Feb 3, 2026

The unzip call extracts a user-supplied report ZIP ($INPUT_PATH) directly into $EXTRACT_DIR without sanitizing archive paths, which can allow a crafted ZIP with ../ segments in filenames to perform path traversal and overwrite arbitrary files relative to the current working directory (classic "Zip Slip" issue). An attacker who can supply the ZIP (e.g., a malicious artifact or shared report) could plant or overwrite files outside playwright-failures/extracted, potentially leading to code execution or data loss on the developer machine. To mitigate, ensure extraction strips or rejects .. and absolute paths (e.g., by pre-filtering with zipinfo/unzip -Z, using a safe extraction helper that normalizes and validates paths, or using tooling that enforces confinement to a specific directory).

Suggested change
- mkdir -p "$EXTRACT_DIR"
+ mkdir -p "$EXTRACT_DIR"
+ # Validate ZIP entries to prevent path traversal (Zip Slip)
+ while IFS= read -r zip_entry; do
+   # Disallow absolute paths and parent directory references
+   case "$zip_entry" in
+     /*|../*|*../*|*/..|..)
+       echo "Error: Unsafe path in zip entry '$zip_entry' (possible path traversal)." >&2
+       exit 1
+       ;;
+   esac
+ done < <(unzip -Z1 "$INPUT_PATH")
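
The same check can be pulled into a small function that reads entry names from stdin, which makes it easy to exercise without a real archive. This is a sketch under assumptions: `validate_zip_entries` is a hypothetical name, and the `*../*` pattern from the suggestion is narrowed here to `*/../*` so that a name such as `a..b/file` is not rejected.

```bash
#!/usr/bin/env bash
# Sketch: read archive entry names on stdin (one per line) and reject unsafe ones.
# validate_zip_entries is a hypothetical helper name.
validate_zip_entries() {
  local entry
  while IFS= read -r entry; do
    case "$entry" in
      # Absolute paths, or any path component that is exactly "..".
      /*|../*|*/../*|*/..|..)
        echo "Error: unsafe path in zip entry '$entry' (possible path traversal)." >&2
        return 1
        ;;
    esac
  done
  return 0
}

# Possible use before extraction, assuming unzip is installed:
#   unzip -Z1 "$INPUT_PATH" | validate_zip_entries || exit 1
```

The function only validates names; extraction itself still needs to target `$EXTRACT_DIR` explicitly.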
