
Conversation

@rubenmarcus
Member

Summary

  • Replaces raw fetch calls to Anthropic API with the official @anthropic-ai/sdk
  • Enables prompt caching via cache_control: { type: "ephemeral" } on system messages (see the sketch after this list)
  • Cache reads are 90% cheaper than regular input tokens
  • Adds a system message field to LLMRequest for all providers (Anthropic, OpenAI, OpenRouter)
  • Adds cache-aware pricing tiers to the cost tracker (write: 1.25x, read: 0.1x of the base input price)
  • Displays cache savings in CLI output and activity.md summaries
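
A minimal sketch of the new call shape, assuming the standard @anthropic-ai/sdk Messages API. The model id, prompts, and getClient() helper are illustrative placeholders, not the PR's actual code:

```typescript
import Anthropic from "@anthropic-ai/sdk";

// Singleton client so repeated loop iterations reuse one connection pool.
let client: Anthropic | null = null;
function getClient(): Anthropic {
  return (client ??= new Anthropic()); // reads ANTHROPIC_API_KEY from env
}

const systemPrompt = "...long, repeated agent instructions...";

const response = await getClient().messages.create({
  model: "claude-sonnet-4-20250514", // placeholder model id
  max_tokens: 1024,
  // System prompt as a content block so it can carry cache_control;
  // everything up to the marked block is cached (~5 min for "ephemeral").
  system: [
    {
      type: "text",
      text: systemPrompt,
      cache_control: { type: "ephemeral" },
    },
  ],
  messages: [{ role: "user", content: "Next task, please." }],
});

console.log(response.usage); // includes cache write/read token counts
```

The first call writes the marked prefix to the cache (billed at 1.25x); later calls inside the cache window read it back at 0.1x, which is where the 90% figure comes from.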

Impact

  • Up to 90% reduction in input token costs for repeated system prompts (worked example after this list)
  • Actual API usage metrics instead of text-based estimates (when using Anthropic)
  • Singleton client pattern for connection pooling
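
To make the headline number concrete, here is the arithmetic under an assumed price of $3 per million input tokens (Sonnet-class at the time of writing; check current pricing). The prompt size and iteration count are made up for illustration:

```typescript
// Illustrative only: pricing and sizes are assumptions, not repo values.
const INPUT_PER_TOKEN = 3.0 / 1_000_000; // assumed $3 per million input tokens
const CACHE_WRITE_MULT = 1.25; // first request writes the cache
const CACHE_READ_MULT = 0.1;   // later requests read it

// A 10,000-token system prompt reused across 50 loop iterations:
const uncached = 50 * 10_000 * INPUT_PER_TOKEN; // $1.50
const cached =
  1 * 10_000 * CACHE_WRITE_MULT * INPUT_PER_TOKEN + // $0.0375 cache write
  49 * 10_000 * CACHE_READ_MULT * INPUT_PER_TOKEN;  // $0.147 in cache reads
console.log({ uncached, cached }); // cached ≈ $0.18 vs. $1.50, ~88% saved
```

The full 90% is the read-heavy limit; short runs save less because the one-time 1.25x cache write has to be amortized.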

Files changed

  • src/llm/api.ts — Refactored to use Anthropic SDK, added system message + cache support
  • src/loop/cost-tracker.ts — Added cache pricing, CacheMetrics, recordIterationWithUsage() (see the sketch after this list)
  • package.json — Added @anthropic-ai/sdk dependency
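
For orientation, the Anthropic API reports cached tokens separately in the response's usage block. A plausible mapping into a metrics shape might look like the sketch below; the CacheMetrics interface and toCacheMetrics() helper are hypothetical stand-ins, not the actual cost-tracker code:

```typescript
import type Anthropic from "@anthropic-ai/sdk";

// Hypothetical shape; the real CacheMetrics in cost-tracker.ts may differ.
interface CacheMetrics {
  inputTokens: number;
  outputTokens: number;
  cacheWriteTokens: number; // billed at 1.25x
  cacheReadTokens: number;  // billed at 0.1x
}

function toCacheMetrics(usage: Anthropic.Messages.Usage): CacheMetrics {
  return {
    inputTokens: usage.input_tokens,
    outputTokens: usage.output_tokens,
    cacheWriteTokens: usage.cache_creation_input_tokens ?? 0,
    cacheReadTokens: usage.cache_read_input_tokens ?? 0,
  };
}
```

Something along these lines is presumably what recordIterationWithUsage() consumes before applying the 1.25x/0.1x multipliers.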

Test plan

  • npm run build passes
  • npm run test:run — all 143 tests pass
  • Call tryCallLLM() with a system message and verify cache metrics in the response
  • Run ralph-starter run --track-cost and verify cache savings in the cost summary

🤖 Generated with Claude Code

Replaces raw fetch calls to Anthropic API with the official @anthropic-ai/sdk,
enabling prompt caching via cache_control on system messages. Cache reads are
90% cheaper than regular input tokens. Adds cache-aware pricing to cost tracker
with savings metrics displayed in CLI output and activity summaries. Also adds
system message support and usage tracking for OpenAI/OpenRouter providers.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@coderabbitai

coderabbitai bot commented Feb 6, 2026

Warning

Rate limit exceeded

@rubenmarcus has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 0 minutes and 11 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.


@rubenmarcus
Member Author

Merged into the staging/pre-conference branch for pre-release testing. Will be included in the upcoming stable release.

rubenmarcus added a commit that referenced this pull request Feb 10, 2026
Merge PR #139 - adds prompt caching via Anthropic SDK beta header
for reduced latency and cost on repeated API calls.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
