
refactor: claude-mentions #830

Merged
dknopik merged 5 commits into sigp:stable from shane-moore:claude-mentions-updates
Feb 19, 2026

Conversation

@shane-moore
Member

@shane-moore shane-moore commented Feb 12, 2026

Issue Addressed

N/A — Improvement based on team discussion.

Proposed Changes

Following our team call discussion about the utility of Claude GitHub Actions, I have some ideas:

Additional Info

Anyone who wants a Claude review on a PR can still get one by commenting @claude review this. This gives us the same capability without the CI gate or the noise of auto-reviewing every PR.

@claude-code-actions-sigp

claude-code-actions-sigp bot commented Feb 12, 2026

Claude finished @shane-moore's task in 1m 20s


Code Review Complete

Tasks

  • Read and analyze the modified claude-mentions.yml workflow
  • Review the deleted claude-pr-review.yml changes via git diff
  • Check for any remaining references to the deleted workflow
  • Evaluate prompt improvements against stated objectives
  • Provide comprehensive code review feedback

Review Summary

This PR successfully achieves its stated goals of removing automated Claude CI checks while improving on-demand Claude reviews. The changes are clean, well-motivated, and the new prompt structure is a significant improvement.

Detailed Findings

1. Workflow Deletion — Clean Removal ✅

The removal of claude-pr-review.yml is complete and leaves no dangling references:

  • No references to claude-pr-review found in remaining workflow files
  • No references to claude-recheck label found (used exclusively by the deleted workflow)
  • No references to no-claude-review label found in documentation

Note: Existing PRs may still have claude-recheck or no-claude-review labels, which will now be inert. Consider a one-time cleanup or letting them age out naturally.

2. Prompt Improvements — Well Structured ✅

The new prompt in claude-mentions.yml (lines 85-111) addresses the stated goals effectively:

Strengths:

  • Clear priority ordering (Security → Correctness → Performance → Error Handling → Rust Idioms → Design → Testing)
  • Explicit "Do NOT comment on" rules reduce noise from style nits and pre-existing issues
  • Rust-specific focus areas (ownership patterns, unnecessary clones, trait bounds)
  • Concise directive: "No praise. Focus on actionable issues only."
  • Scope limitation: "Review ONLY the PR diff. Do not flag pre-existing issues."

Minor Observations:

  • The prompt is embedded in the workflow rather than referenced from a separate file. This is fine for maintainability given its size, but consider externalizing if it grows significantly.
  • The instruction "If no significant issues are found, say so briefly and move on" is helpful for avoiding empty/verbose reviews.

3. Consistency — Comment Update ✅

Line 3 changed from:

# Scenario 1: @claude mentions on issues and PR comments

to:

# @claude mentions on issues and PR comments

This is appropriate since there's no longer a "Scenario 2" (the deleted automated review workflow).

4. Security & Correctness — No Issues ✅

  • Permissions remain appropriately scoped
  • Fork support logic unchanged (still handles fork PRs correctly)
  • GitHub App token generation unchanged
  • AWS OIDC configuration unchanged
  • Organization membership check unchanged

5. Potential Considerations

Documentation: Consider updating CONTRIBUTING.md or other docs if they reference:

  • The claude-recheck label workflow
  • Automated Claude reviews on PR open/update
  • The no-claude-review label

No such references were found in this scan, but worth verifying in user-facing docs.

Label Cleanup: The claude-recheck and no-claude-review labels may exist in the repository's label configuration and on existing PRs. These are now unused and could be:

  • Removed from the label configuration
  • Documented as deprecated
  • Left as-is (they're harmless)

Conclusion

The refactoring is sound. The removal of automated CI checks addresses the brittleness issues (#827, #828, #829), and the improved prompt structure should reduce noise while maintaining review value through on-demand @claude mentions.

No code changes required.


    Use a single top-level comment for summary only if there are findings.
    claude_args: |
        --max-turns 50
        --model us.anthropic.claude-sonnet-4-5-20250929-v1:0
Member

Do we want to upgrade the model used?

Member

Yes, Opus 4.6 is waaay better

Member

Great catch

Member Author

agreed, updated in f07886c

@diegomrsantos
Member

What's the problem with the current Claude review? I believe it adds a lot of value.

The problem wasn't caused by the Claude review itself. The problem was that the Claude action wasn't pinned to a fixed version.

@diegomrsantos
Member

@claude wdyt about #830 (comment)

@claude-code-actions-sigp

claude-code-actions-sigp bot commented Feb 13, 2026

Claude finished @diegomrsantos's task in 1m 42s


Analyzing the automated review debate



Response to @diegomrsantos's Question

You raise a valid point about version pinning. Let me address both perspectives:

The Version Pinning Argument

You're correct that the immediate technical issues (#827, #828) were about workflow configuration, not Claude's review quality:

Both workflows currently use anthropics/claude-code-action@v1, which is a floating tag that can change. This was indeed a factor in the brittleness.

Diego's suggestion: Pin to a specific version (e.g., @v1.2.3) to avoid unexpected behavior changes.

The Noise vs. Value Trade-off

However, there's a broader question about whether automated reviews on every PR open/update provide net value:

Arguments for keeping (with version pinning):

  • ✅ Catches issues proactively without requiring manual @claude invocation
  • ✅ Provides consistent review coverage across all PRs
  • ✅ Review value is preserved (as you noted)

Arguments for removing (this PR's approach):

  • ✅ Eliminates CI gate that can block merges on infrastructure issues
  • ✅ Reduces notification noise from automated reviews on every PR update
  • ✅ Maintains same capability via @claude review this (on-demand)
  • ✅ Allows developers to choose when Claude review adds value vs. when it's redundant with human review

The Middle Ground: Pin + Optional Trigger

A compromise approach not explored in this PR:

  1. Pin the version: anthropics/claude-code-action@v1.2.3 (or specific SHA)
  2. Make it opt-in: Use a label trigger (e.g., claude-review) instead of automatic on PR open
  3. Keep it as a check, not a required gate: Set continue-on-error: true or don't mark it as a required check

This would:

  • Eliminate the floating version risk (Diego's concern)
  • Reduce noise by making it opt-in (Shane's concern)
  • Preserve the CI integration for teams that want it
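As a sketch, the middle ground above might look like this in workflow terms. This is an untested illustration, not a proposed diff: the version tag and label name are placeholders taken from the bullets, and the job wiring is an assumption.

```yaml
# Hypothetical opt-in, non-blocking review job (version tag illustrative)
on:
    pull_request_target:
        types: [labeled]

jobs:
    claude-review:
        # run only when someone adds the opt-in label
        if: github.event.label.name == 'claude-review'
        # advisory: a failure here never blocks the merge
        continue-on-error: true
        runs-on: ubuntu-latest
        steps:
            - uses: anthropics/claude-code-action@v1.2.3
```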

Recommendation

Both approaches are valid depending on team preference:

If the team values proactive automated reviews: Merge a variant of this PR that:

  • Restores claude-pr-review.yml with pinned version (@v1.2.3 or SHA)
  • Makes it non-blocking (not a required check)
  • Optionally: make it label-triggered instead of auto-triggered

If the team prefers on-demand reviews: Keep this PR as-is and rely on @claude review this mentions

The real question isn't technical (version pinning solves that) — it's workflow preference: automated coverage vs. on-demand control.

Related Question from @dknopik

Regarding model upgrade (line 114 comment): The current workflow uses us.anthropic.claude-sonnet-4-5-20250929-v1:0 (Sonnet 4.5 via Bedrock). Upgrading to Opus 4.6 would provide better analysis but higher cost. This is orthogonal to the automated-vs-on-demand debate.


@diegomrsantos
Member

@claude I disagree with your arguments.

Eliminates CI gate that can block merges on infrastructure issues

This can happen with any other task, and we won't remove all of them.

Reduces notification noise from automated reviews on every PR update

The automatic review happens only once. More noise can be produced on demand. People who aren't interested in the reviews can ignore them if they believe it's noise.

Maintains same capability via @claude review this (on-demand)

I believe having bigger reviews when it adds value is more common than the opposite.

What could be interesting is if there's a way to configure the automatic review to be less noisy. Maybe adding only comments about issues it found. @shane-moore @dknopik

@claude-code-actions-sigp

claude-code-actions-sigp bot commented Feb 13, 2026

Claude finished @diegomrsantos's task in 1m 7s


Responding to Discussion Points

Tasks

  • Read PR changes to understand current state
  • Address each counterargument
  • Explore the "less noisy" suggestion
  • Provide balanced perspective

Response to @diegomrsantos

You raise valid points that deserve a thoughtful response. Let me address each one:

Your Counterarguments

1. "Any CI task can block merges, we don't remove all of them"

Fair point. The difference is that most CI tasks (tests, lints, builds) are deterministic and under the team's control. The brittleness in #827/#828 came from:

  • Using a floating version tag (@v1) that can change behavior unexpectedly
  • Dependency on external GitHub App infrastructure
  • Complex fork/token handling logic

However, you're right that pinning the version (as you suggested earlier) would address most of this. The real question becomes: is automated coverage worth the occasional infrastructure friction?

2. "Automatic review happens only once, not noisy"

This is actually not quite accurate based on the deleted workflow. Looking at the trigger conditions:

on:
    pull_request_target:
        types: [opened, ready_for_review, reopened, labeled, unlabeled]

The automated review would trigger on:

  • PR opened
  • Ready for review (after draft)
  • Reopened
  • Any label changes (with claude-recheck or no-claude-review)

So it could run multiple times per PR depending on workflow. But you're right that it's not triggered on every push like some CI checks.

The "noise" concern is more about getting automated reviews when they may not add value (e.g., on trivial PRs, or when human reviewers have already caught the issues).

3. "Bigger reviews when they add value is more common"

This is a good empirical question: How often do automated Claude reviews find significant issues vs. provide low-signal feedback?

I don't have data on this for Anchor specifically, but if the team's experience is that automated reviews frequently catch real issues, that's a strong argument for keeping them.

Your "Less Noisy" Suggestion

This is actually a great middle ground:

"What could be interesting is if there's a way to configure the automatic review to be less noisy. Maybe adding only comments about issues it found."

This is technically feasible. The prompt in this PR already includes:

If no significant issues are found, say so briefly and move on.

But you could strengthen this to:

If no significant issues are found, DO NOT post a review comment at all.

This would require the workflow to check Claude's output and only post if issues were found. This could be implemented with a post-processing step that:

  1. Parses Claude's response
  2. Only posts the comment if substantive issues were identified
  3. Silently exits otherwise
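A minimal sketch of such a gate as a workflow step, assuming Claude's response has been captured to a file first. The file name `review.md` and the marker phrase are assumptions for illustration, not the action's real output contract:

```yaml
- name: Post review only when issues were found
  run: |
      # "No actionable issues found." is the assumed no-issues marker
      if grep -q "No actionable issues found." review.md; then
          echo "No substantive issues; exiting silently."
      else
          gh pr comment "$PR_NUMBER" --body-file review.md
      fi
  env:
      GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
      PR_NUMBER: ${{ github.event.pull_request.number }}
```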

Recommendation

Given your feedback, here are three paths forward:

Option A: Keep automated reviews with improvements (addresses your concerns)

  • Revert the deletion of claude-pr-review.yml
  • Pin to specific version (e.g., @v1.2.3 or SHA)
  • Add post-processing to only comment when issues found
  • Keep the improved prompt from this PR

Option B: Hybrid approach

  • Keep automated reviews for larger PRs (based on lines changed threshold)
  • Skip trivial PRs (< 50 lines changed)
  • Pin version and improve prompt
  • Only comment when issues found

Option C: Proceed with this PR (on-demand only)

  • Accept that some value may be lost
  • Rely on developers to use @claude review this when they want feedback
  • Reduce infrastructure maintenance burden
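Option B's size threshold could be gated with a preliminary job along these lines. The 50-line cutoff comes from the bullet above; the job names, version tag, and wiring are assumptions, not a tested configuration:

```yaml
jobs:
    size-gate:
        runs-on: ubuntu-latest
        outputs:
            large: ${{ steps.size.outputs.large }}
        steps:
            - id: size
              env:
                  GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
              run: |
                  # total lines changed in the PR diff
                  changed=$(gh pr view "${{ github.event.pull_request.number }}" \
                      --repo "${{ github.repository }}" \
                      --json additions,deletions \
                      --jq '.additions + .deletions')
                  if [ "$changed" -ge 50 ]; then
                      echo "large=true" >> "$GITHUB_OUTPUT"
                  else
                      echo "large=false" >> "$GITHUB_OUTPUT"
                  fi
    claude-review:
        needs: size-gate
        # skip trivial PRs entirely
        if: needs.size-gate.outputs.large == 'true'
        runs-on: ubuntu-latest
        steps:
            - uses: anthropics/claude-code-action@v1.0.52
```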

My Assessment

Your points are well-taken. The automated review does provide value, and the brittleness issue can be solved with version pinning. The "only comment when issues found" suggestion is a smart way to reduce noise while maintaining coverage.

The real trade-off is: automated coverage with occasional friction vs. manual opt-in with zero friction.

This is ultimately a team decision based on:

  • How often automated reviews catch real issues
  • Team tolerance for CI infrastructure complexity
  • Whether developers remember to use @claude review this when it would help

What do @shane-moore and @dknopik think about Option A (keep automated, but pin version + only comment when issues found)?


@shane-moore shane-moore force-pushed the claude-mentions-updates branch 3 times, most recently from 4328502 to 737cb4c on February 16, 2026 19:23
@shane-moore
Member Author

Maintains same capability via @claude review this (on-demand)

I believe having bigger reviews when it adds value is more common than the opposite.

What could be interesting is if there's a way to configure the automatic review to be less noisy. Maybe adding only comments about issues it found. @shane-moore @dknopik

@diegomrsantos, I like the idea of having the automatic review only comment when it finds an actual issue, and using the claude mentions tool separately to investigate something more deeply.
I updated the PR to pin the version to claude-code-action@v1.0.52, which is from yesterday.

Feel free to let me know what you think about the prompts for claude-mentions and claude-pr-review. Hoping that explicitly instructing claude-pr-review to only leave a comment when it finds a potential issue will help. If it finds no issues, it should just reply with "No actionable issues found." now 🤞

@claude-code-actions-sigp

claude-code-actions-sigp bot commented Feb 16, 2026

Claude finished @shane-moore's task in 1m 16s


Reviewing Updated Workflow Prompts

Tasks

  • Read current state of claude-mentions.yml
  • Read current state of claude-pr-review.yml
  • Evaluate prompt effectiveness and "only comment on issues" implementation
  • Provide comprehensive feedback

Review Summary

Excellent updates! The prompts are now well-structured and the "silent watchdog" approach is properly implemented. Both workflows now use Opus 4.6 and have version pinning in place.


Detailed Feedback

✅ Version Pinning (claude-code-action@v1.0.52)

Both workflows correctly pin to v1.0.52 (released 2026-02-15), addressing the brittleness concerns from #827/#828. This prevents unexpected behavior changes from floating version tags.

Recommendation: Consider using a SHA pin instead of a semantic version tag for maximum stability:

uses: anthropics/claude-code-action@<commit-sha>  # e.g., @abc123def

But @v1.0.52 is a significant improvement over @v1 and is acceptable.


✅ Model Upgrade to Opus 4.6

Both workflows now use:

--model us.anthropic.claude-opus-4-6-v1

This addresses the review comment from @dknopik and @diegomrsantos. Opus 4.6 provides significantly better analysis quality than Sonnet 4.5.


claude-pr-review.yml — "Silent Watchdog" Implementation

Strengths:

  1. Clear directive to remain silent when no issues found (lines 106-109):

    CRITICAL RULE: If you find NO actionable issues, do NOT post any
    PR comments or inline comments. Simply respond with:
    "No actionable issues found."
    and stop. Do not elaborate, summarize, or praise.
    

    This is explicit and unambiguous. The instruction is prominent and uses strong language ("CRITICAL RULE").

  2. Focused scope (lines 111-119):
    The prompt prioritizes real problems (security, correctness, performance, error handling) and explicitly excludes style/formatting/documentation noise.

  3. Concise summary format (lines 129-132):
    When issues ARE found, the prompt asks for inline comments + brief summary without preamble or praise.

  4. Appropriate tool restrictions (line 136):

    --allowedTools "mcp__github_inline_comment__create_inline_comment,Bash(gh pr comment:*),Bash(gh pr diff:*),Bash(gh pr view:*),Read,Grep"

    This gives Claude the tools to post comments when needed, but the prompt instructs it not to use them unless issues are found.

Potential Issue:

The "silent watchdog" behavior relies on Claude following the instruction not to post comments. There's no technical enforcement to prevent comment posting if Claude doesn't follow the prompt correctly.

How it works:

  • If Claude finds no issues → responds "No actionable issues found." → no comment posted ✅
  • If Claude finds issues → uses mcp__github_inline_comment__create_inline_comment or gh pr comment → comments posted ✅
  • If Claude misunderstands → might post a comment anyway ⚠️

This is acceptable since:

  1. Opus 4.6 has strong instruction-following capabilities
  2. The prompt is very explicit
  3. Occasional false-positive comments are better than false-negative silence on real issues

Alternative (if noise becomes a problem):
Add a post-processing step that parses Claude's output and only posts comments if issues were identified. But this adds complexity and may not be necessary.


claude-mentions.yml — On-Demand Review Prompt

Strengths:

  1. Clear prioritization (lines 92-100):
    The 7-tier priority order (Security → Correctness → ... → Testing) guides Claude to focus on important issues first.

  2. Explicit exclusions (lines 102-108):
    The "Do NOT comment on" list reduces noise from style/formatting suggestions.

  3. Concise directive (lines 88-90):

    Be precise and concise. No praise. Focus on actionable issues only.
    Review ONLY the PR diff. Do not flag pre-existing issues.
    If no significant issues are found, say so briefly and move on.
    

    This is appropriate for on-demand reviews where the user explicitly requested feedback.

Key Difference from claude-pr-review.yml:

  • claude-mentions.yml: "If no significant issues are found, say so briefly and move on."
  • claude-pr-review.yml: "If you find NO actionable issues, do NOT post any PR comments."

This distinction is correct:

  • On-demand reviews (claude-mentions) should always respond (user asked for review)
  • Automated reviews (claude-pr-review) should be silent unless issues found

Suggestions for Further Improvement

1. Add concurrency control to claude-mentions.yml

Currently, claude-pr-review.yml has:

concurrency:
    group: claude-review-${{ github.event.pull_request.number }}
    cancel-in-progress: true

But claude-mentions.yml doesn't. If multiple users spam @claude mentions, multiple concurrent runs could occur.

Recommendation:

concurrency:
    group: claude-mention-${{ github.event.issue.number || github.event.pull_request.number }}
    cancel-in-progress: true

2. Consider reducing claude-pr-review triggers

Currently triggers on:

types: [opened, ready_for_review, reopened, labeled, unlabeled]

The labeled/unlabeled triggers are only useful for claude-recheck and no-claude-review labels. If label-triggered reviews are uncommon, consider removing those triggers to reduce unnecessary workflow runs.

Alternatively, keep as-is since the label check (lines 23-26) filters appropriately.
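If the label triggers were dropped as suggested above, the trimmed trigger block would be:

```yaml
on:
    pull_request_target:
        types: [opened, ready_for_review, reopened]
```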

3. Document the no-claude-review and claude-recheck labels

These labels are used by the workflow but may not be documented for contributors. Consider adding to CONTRIBUTING.md:

## Automated Code Review

Claude automatically reviews PRs on open/ready_for_review/reopened. To control this:

- Add `no-claude-review` label to skip automated review on a PR
- Add `claude-recheck` label to manually trigger a review

4. Minor: Align prompt structure between workflows

Both prompts are similar but have slight wording differences. Consider extracting the shared priority list and exclusion list into a shared template file to reduce duplication and ensure consistency.

Example:

prompt: |
    ${{ env.CLAUDE_REVIEW_PRIORITIES }}
    
    CRITICAL RULE: If you find NO actionable issues, do NOT post any comments.
    ...

But this is a minor optimization and not necessary for functionality.


Overall Assessment

The updated workflows successfully address the concerns raised in the discussion:

✅ Version pinning prevents brittleness
✅ Automated review is "silent watchdog" that only comments on issues
✅ On-demand review via @claude provides deep investigation capability
✅ Opus 4.6 upgrade improves review quality
✅ Prompt prioritization reduces noise

The "only comment when issues found" approach is well-implemented through clear prompt instructions. While there's no technical enforcement (Claude could still post if it misunderstands), the prompt is explicit enough that Opus 4.6 should follow it reliably.

Minor suggestions (concurrency control, label documentation) are nice-to-haves but not blockers.


Conclusion

The prompts are well-designed and the "silent watchdog" implementation is solid. The combination of version pinning, model upgrade, and focused prompts should provide high-value automated reviews without the noise or brittleness of the previous approach.

No changes required. Approved! 🎯


@shane-moore shane-moore force-pushed the claude-mentions-updates branch from 737cb4c to 01ce65e on February 17, 2026 20:11
@shane-moore shane-moore changed the base branch from unstable to stable February 17, 2026 20:12
@shane-moore shane-moore force-pushed the claude-mentions-updates branch from 01ce65e to 26e578c on February 17, 2026 20:14
@shane-moore
Member Author

@claude

regarding the CI failures:

@claude-code-actions-sigp
Copy link

claude-code-actions-sigp bot commented Feb 17, 2026

Claude finished @shane-moore's task in 1m 7s


Acknowledging CI Context

Tasks

  • Review current workflow state
  • Check CI status
  • Acknowledge merge strategy context
  • Provide feedback on approach

Understanding the CI Failures

Thanks for the context on the CI failures. Let me confirm what I'm seeing:

Current CI Status:

  1. target-branch-check failure ✅ Expected

    • This PR targets stable branch directly (not unstable)
    • The check is designed to prevent accidental merges to stable
  • Your reasoning: You don't want to merge everything from unstable → stable yet, just this workflow refactor
    • Conclusion: This failure is intentional and requires manual bypass
  2. check-code failure ✅ Expected

Workflow Review

Both workflow files look good in their current state:

claude-mentions.yml (On-demand reviews)

  • Version pinned to @v1.0.52
  • Uses Opus 4.6 model
  • Clear prompt prioritizing Security → Correctness → Performance
  • Appropriate for user-initiated reviews

claude-pr-review.yml (Silent watchdog)

  • Version pinned to @v1.0.52
  • Uses Opus 4.6 model
  • "CRITICAL RULE" directive to only comment when issues found
  • Concurrency control to prevent duplicate runs
  • Proper fork support and token handling

Merge Strategy Context

Your approach makes sense:

Why bypass is acceptable:

  • This PR contains workflow improvements that benefit stable immediately
  • Waiting for a full unstable → stable merge would delay these improvements
  • The workflow changes are isolated and don't depend on other unstable code
  • The cargo audit failure is already addressed in fix: update bytes to 1.11.1 to fix RUSTSEC-2026-0007 #797 and will sync later

Recommended next steps:

  1. Add a note in the PR description explaining the stable merge strategy (for future reference)
  2. Get review approval from team members
  3. Merge with admin override for the target-branch-check (or configure a bypass)
  4. The check-code failure can be ignored since fix: update bytes to 1.11.1 to fix RUSTSEC-2026-0007 #797 will fix it when merged

Overall Assessment

The workflow refactor successfully achieved its goals:

  • ✅ Automated review restored as "silent watchdog"
  • ✅ Version pinning addresses brittleness
  • ✅ Model upgrade to Opus 4.6
  • ✅ Improved prompts reduce noise
  • ✅ On-demand reviews via @claude preserved

The CI failures are understood and acceptable given your merge strategy. No changes needed to the workflows themselves.


@shane-moore
Member Author

shane-moore commented Feb 17, 2026

@dknopik @diegomrsantos, I realized these changes need to be merged to stable, since stable is the default branch and both pull_request_target and issue_comment triggers read workflow files from there. So I updated the PR to point at stable and cherry-picked over the recent commits related to the workflow files.

I ran this flow locally without issue to prove to myself we wouldn't see merge problems later:

  1. merge this PR → stable: fast-forward (no merge needed, stable was behind)
  2. merge stable → unstable: auto-merged cleanly (Git resolved overlapping changes automatically since the diffs were compatible)
  3. merge unstable → stable: fast-forward again
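The three steps can be rehearsed in a throwaway repository. This is an illustrative dry run, not the real sigp repos: branch names mirror the PR, but the commits and file contents are made up.

```shell
# Throwaway-repo rehearsal of the three merge steps above.
set -e
tmp=$(mktemp -d)
cd "$tmp"
git init -q -b stable
git config user.email ci@example.com
git config user.name ci
echo base > workflow.yml
git add workflow.yml
git commit -qm "base"
git branch unstable
# PR branch carrying the workflow refactor
git checkout -qb claude-mentions-updates
echo refactored > workflow.yml
git commit -qam "refactor workflows"
# unrelated work on unstable, so step 2 is a real merge
git checkout -q unstable
echo other > other.rs
git add other.rs
git commit -qm "unstable work"
# 1. merge this PR -> stable: fast-forward (stable was behind)
git checkout -q stable
git merge -q --ff-only claude-mentions-updates
# 2. merge stable -> unstable: auto-merges cleanly (disjoint files)
git checkout -q unstable
git merge -q -m "sync stable" stable
# 3. merge unstable -> stable: fast-forward again
git checkout -q stable
git merge -q --ff-only unstable
echo "all three merges succeeded"
```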

Also, I commented on the CI failures here: #830 (comment)

@dknopik dknopik merged commit d8e5c6c into sigp:stable Feb 19, 2026
13 of 15 checks passed
Comment on lines +106 to +119
CRITICAL RULE: If you find NO actionable issues, do NOT post any
PR comments or inline comments. Simply respond with:
"No actionable issues found."
and stop. Do not elaborate, summarize, or praise.

Only comment on issues you are CONFIDENT are real problems:

1. **Security** — vulnerabilities, unsafe code, input validation, auth logic
2. **Correctness** — logic errors, race conditions, edge cases, off-by-one errors
3. **Performance** — bottlenecks, unnecessary allocations, resource leaks
4. **Error Handling** — missing error paths, unwrap/expect in non-test code, silent failures
5. **Rust Idioms** — non-idiomatic patterns, unnecessary clones, misuse of ownership/borrowing
6. **Design** — incorrect abstractions, module boundary violations, missing trait bounds
7. **Testing** — missing coverage for new code paths, untested edge cases
Member

Why do we have this duplication?

Member Author

When I made the PR, I wasn't sure if we wanted claude-mentions and the CI review to have the same instructions, since claude-mentions didn't previously have any guidelines, and I wasn't sure whether during PR review folks might want different instructions for claude-mentions vs CI.

Seems like we're fine with the same guidelines between the bots. wdyt of something like a shared .github/claude-review-instructions.md that contains this shared prompt?
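One possible shape for the shared-file idea, hedged since it hasn't been tried here: the file path is the one proposed above, and the load-then-interpolate step wiring is an assumption.

```yaml
# Load the shared instructions once, then interpolate into the prompt.
- name: Load shared review instructions
  id: instructions
  run: |
      {
          echo 'body<<INSTRUCTIONS_EOF'
          cat .github/claude-review-instructions.md
          echo 'INSTRUCTIONS_EOF'
      } >> "$GITHUB_OUTPUT"

- uses: anthropics/claude-code-action@v1.0.52
  with:
      prompt: |
          ${{ steps.instructions.outputs.body }}

          CRITICAL RULE: If you find NO actionable issues, do NOT post any comments.
```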
