Submission: Claude Sonnet 4.5 with extended(interleaved) thinking and trajectories #73

Open
Hrithik2212 wants to merge 1 commit into sierra-research:main from shivanibokadia-vl:aide-sonnet-4-5

Conversation

@Hrithik2212

Hi,

This pull request submits results for Claude Sonnet 4.5, run without any changes to the base prompts, and includes the full trajectories.

@Hrithik2212 changed the title from "Submission: Claude Sonnet 4.5 with extended(interleaved-thinking) and trajectories" to "Submission: Claude Sonnet 4.5 with extended(interleaved) thinking and trajectories" on Oct 31, 2025
@victorb-sierra
Collaborator

Thank you! @benshi34 can you take a look at this?

@benshi34
Collaborator

benshi34 commented Nov 4, 2025

Hi @Hrithik2212, thanks for submitting your trajectories! Quick question: Why are the telecom scores much lower than the reported scores? Are you affiliated with Anthropic?

@Hrithik2212
Author

Hrithik2212 commented Nov 5, 2025

Hi @benshi34,

This run used the vanilla prompts; the only change was integrating interleaved thinking in the code. From what I understand, the Anthropic devs mentioned that they added prompt addenda to the telecom agent policy and the user prompt to avoid failure modes caused by the user ending the conversation incorrectly, which would explain the higher reported scores on telecom.

4 participants