Skip to content

workflow example#4032

Closed
NathanFlurry wants to merge 2 commits into01-24-feat_frontend_traces_uifrom
01-24-queues_example
Closed

workflow example#4032
NathanFlurry wants to merge 2 commits into01-24-feat_frontend_traces_uifrom
01-24-queues_example

Conversation

@NathanFlurry
Copy link
Member

No description provided.

Copy link
Member Author

NathanFlurry commented Jan 24, 2026

Warning

This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.
Learn more


How to use the Graphite Merge Queue

Add the label merge-queue to this PR to add it to the merge queue.

You must have a Graphite account in order to use the merge queue. Sign up using this link.

An organization admin has enabled the Graphite Merge Queue in this repository.

Please do not merge from GitHub as this will restart CI on PRs being processed by the merge queue.

This stack of pull requests is managed by Graphite. Learn more about stacking.

@NathanFlurry NathanFlurry changed the title queues example workflow example Jan 24, 2026
// Workflow Sandbox - Actor Registry
// Each actor demonstrates a different workflow feature using actor-per-workflow pattern

import { setup } from "rivetkit";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The 'setup' function is imported from 'rivetkit' but the module cannot be found. This is because the root package.json needs to have resolutions for RivetKit packages. Add 'rivetkit': 'workspace:*' to the resolutions in the root package.json.

Spotted by Graphite Agent (based on CI logs)

Fix in Graphite


Is this helpful? React 👍 or 👎 to let us know.

@claude
Copy link

claude bot commented Jan 24, 2026

Pull Request Review: Workflow Example

This PR adds a comprehensive workflow sandbox example and addresses several friction points in the RivetKit workflow API. Overall, this is a valuable contribution that improves developer experience and provides excellent documentation through a working example.


✅ Strengths

1. Excellent Documentation

  • workflow-friction.md is exceptional - documenting pain points during development is invaluable for improving the API
  • The example README follows the template correctly with clear feature sections
  • Good inline comments explaining workflow patterns

2. API Improvements

The PR addresses real friction points:

  • Added missing send() method to ActorQueue
  • Added missing broadcast() method to ActorWorkflowContext
  • Re-exported Loop from rivetkit/workflow to fix package resolution
  • Updated CLAUDE.md with RivetKit package resolution guidelines

3. Comprehensive Example

The workflow-sandbox example demonstrates:

  • Sequential steps (order processing)
  • Sleep/timers (countdown)
  • Loops with state (batch processing)
  • Listen with timeout (approval queue)
  • Join (parallel data fetching)
  • Race (work vs timeout)
  • Rollback/compensating transactions (payment)

Each pattern is implemented as a separate actor with clean separation of concerns.


⚠️ Issues & Recommendations

1. Type System Workaround (Critical)

Issue: The actorCtx() helper in _helpers.ts is a workaround for a fundamental type system problem documented in workflow-friction.md.

Location: examples/workflow-sandbox/src/actors/_helpers.ts:10-12

Problem: Every example needs to cast the loop context to access state and broadcast, which defeats type safety.

Recommendation:

  • Short-term: This workaround is acceptable for the example, but add a TODO comment linking to a tracking issue
  • Long-term: Fix the root cause by making ActorWorkflowContext the declared type for loop callbacks (as noted in friction log)

2. Type Cast in workflow/mod.ts

Issue: The return type is cast to satisfy RunConfig.

Recommendation: Consider:

  1. Adjusting the RunConfig type to accept the actual return type
  2. Ensuring the workflow run function signature matches what RunConfig expects
  3. Adding a comment explaining why the cast is necessary if it's intentional

3. Error Handling Patterns

Issue: All actor files use try-catch with error serialization, but there's no consistent error handling pattern.

Examples:

  • order.ts:85-96 - catches and sets state.error
  • payment.ts:158-170 - same pattern
  • No validation of error types or structured error objects

Recommendation:

  • Consider creating a shared error handling utility
  • Define error types instead of using Error | unknown
  • Follow the custom error system documented in CLAUDE.md (packages/common/error/)

4. Logging Style Inconsistency

Issue: Pick a consistent naming convention for IDs (orderId vs txId vs id)

5. Documentation Completeness

Issue: The workflow-friction.md document is excellent but lives at the repo root instead of in a docs/ directory.

Recommendation:

  • Move to docs/development/workflow-friction.md or similar
  • Add a link from the main README or CONTRIBUTING.md
  • Consider converting this into tracking issues for each friction point

🔍 Security Considerations

✅ No security concerns identified:

  • No sensitive data handling
  • No external API calls
  • Simulated payment/order processing only

📊 Test Coverage

Missing: No test files in tests/ directory for the workflow-sandbox example.

Recommendation: Add tests following the pattern from examples/CLAUDE.md


📝 Code Quality

Positive:

  • Consistent formatting
  • TypeScript strict mode enabled
  • Good separation of concerns
  • Clear naming conventions

Areas for Improvement:

  • Reduce duplication in actor patterns (consider shared utilities)
  • Add JSDoc comments for exported types

📋 Checklist

  • ✅ Follows repository conventions (CLAUDE.md)
  • ✅ Uses pnpm workspace patterns correctly
  • ✅ ESM imports with .ts extensions
  • ✅ Structured logging with tracing
  • ⚠️ Type safety (workarounds present)
  • ❌ Missing test coverage
  • ⚠️ Friction points documented but not fully resolved

🎬 Conclusion

Recommendation: Approve with minor changes

This PR significantly improves the RivetKit developer experience by:

  1. Adding missing API methods that were clearly needed
  2. Providing a comprehensive workflow example
  3. Documenting friction points for future improvement

The type system workarounds are acceptable for now, given they're well-documented in the friction log. However, these should be addressed in follow-up work.

Required before merge:

  • None (approval can proceed)

Recommended follow-ups:

  1. Create tracking issues for each friction point in workflow-friction.md
  2. Add test coverage for the workflow-sandbox example
  3. Fix the type system issues so actorCtx helper isn't needed
  4. Review and potentially refactor the 880-line App.tsx component

Great work on documenting the friction points - this kind of developer feedback is invaluable! 🎉

Comment on lines +44 to +50
import { timer } from "./actors/timer.ts";
import { order } from "./actors/order.ts";
import { batch } from "./actors/batch.ts";
import { approval } from "./actors/approval.ts";
import { dashboard } from "./actors/dashboard.ts";
import { race } from "./actors/race.ts";
import { payment } from "./actors/payment.ts";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These imports need to be sorted alphabetically to pass Biome linting. Run 'pnpm biome check --apply' to automatically fix this issue.

Spotted by Graphite Agent (based on CI logs)

Fix in Graphite


Is this helpful? React 👍 or 👎 to let us know.

Comment on lines +134 to +136
useEffect(() => {
actor.connection?.getOrder().then(setOrder);
}, [actor.connection]);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This creates an unhandled floating Promise. Add a .catch() handler to properly handle potential errors: actor.connection?.getOrder().then(setOrder).catch(error => console.error('Failed to get order:', error));

Spotted by Graphite Agent (based on CI logs)

Fix in Graphite


Is this helpful? React 👍 or 👎 to let us know.

Comment on lines +714 to +721
useEffect(() => {
actor.connection?.getTask().then(setTask);
}, [actor.connection]);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add error handling to this Promise: actor.connection?.getTask().then(setTask).catch(error => console.error('Failed to get task:', error));

Spotted by Graphite Agent (based on CI logs)

Fix in Graphite


Is this helpful? React 👍 or 👎 to let us know.

Comment on lines +527 to +532
<button className="approve" onClick={() => onApprove("Admin")}>
Approve
</button>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If onApprove returns a Promise, handle it properly: onClick={() => { const result = onApprove('Admin'); if (result instanceof Promise) result.catch(err => console.error('Approval failed:', err)); }}

Spotted by Graphite Agent (based on CI logs)

Fix in Graphite


Is this helpful? React 👍 or 👎 to let us know.

This was referenced Jan 26, 2026
@NathanFlurry NathanFlurry changed the base branch from 01-21-feat_rivetkit_integrate_workflows_in_to_actors to graphite-base/4032 January 28, 2026 01:12
@NathanFlurry NathanFlurry changed the base branch from graphite-base/4032 to 01-24-feat_frontend_traces_ui January 28, 2026 01:12
@claude
Copy link

claude bot commented Jan 28, 2026

PR Review: Workflow Example

This PR adds comprehensive workflow sandbox examples and makes important improvements to the workflow engine. Overall, the changes are well-structured and demonstrate proper use of the workflow system. Here are my findings:

✅ Strengths

  1. Excellent Documentation: The new FLUSHING.md doc is extremely valuable - it clearly explains the critical flush behavior and common pitfalls. This will help future developers avoid subtle bugs.

  2. Comprehensive Examples: The workflow-sandbox and queue-sandbox examples cover a wide range of use cases (timers, approvals, batching, race conditions, payments) which will be great learning resources.

  3. Proper Vercel Parity: Both examples have Vercel equivalents generated, maintaining consistency between local and deployed environments.

  4. Logging Integration: Good addition of logger support in workflow engine (options.logger) for better observability.

  5. Test Coverage: The workflow tests have been extended with a 5-second timeout check.

🐛 Potential Issues

1. Indentation Inconsistency in index.ts (Line 878-879)

Lines 878-879 in rivetkit-typescript/packages/workflow-engine/src/index.ts have extra indentation:

// Unrecoverable error
storage.error = extractErrorInfo(error);
	storage.state = "rolling_back";  // Extra tab here
	await flush(storage, driver);     // Extra tab here

This should be:

storage.error = extractErrorInfo(error);
storage.state = "rolling_back";
await flush(storage, driver);

2. Error Handling in executeWithTimeout (Line 487-502)

The executeWithTimeout method correctly notes that it cannot cancel the underlying operation, but there's a subtle issue: if the timeout fires, the promise continues running in the background. This could lead to:

  • Duplicate side effects if the step is retried
  • Resource leaks
  • Confusing logs

Recommendation: Document this more prominently in the StepConfig type definition, not just in an internal method comment. Users need to understand that timeouts don't cancel work.

3. Rollback Handler Registration During Rollback Mode

In executeStepRollback (line 503-534), the code calls registerRollbackAction which could theoretically register new rollback actions during rollback replay. However, looking at the logic:

  • this.rollbackActions is only used in "rollback" mode
  • But new actions shouldn't be registered during rollback replay

The code seems correct (it only registers if metadata.rollbackCompletedAt === undefined), but this is subtle. Consider adding a comment explaining why registration during rollback mode is safe/intentional.

4. Race Condition in executeLiveWorkflow

Lines 516-534 in index.ts use Promise.race([messagePromise, sleepPromise]) when both messages and deadline exist. If the sleep promise wins but a message arrives microseconds later, the workflow will wake, process, and potentially re-sleep. This is probably fine (just inefficient), but worth documenting that this is expected behavior.

5. Loop Iteration Cleanup Edge Case

In forgetOldIterations (line 754-786), the code validates that loop location ends with a number. However, this assumes the location structure is always correct. If there's a bug elsewhere that corrupts the location, this will throw a generic Error instead of a more specific HistoryDivergedError. Consider using a more specific error type.

📝 Code Quality Observations

1. File Organization

The workflow examples follow good structure with clear separation of concerns (actors, server, frontend). However, some actors (like dashboard.ts) are quite large (200+ lines). Consider splitting complex actors into smaller helpers.

2. Error Types Consistency

The error classes in errors.ts are well-designed, but there's inconsistency in whether errors include additional metadata:

  • StepExhaustedError includes lastError
  • StepFailedError includes originalError and attempts
  • RaceError includes errors array
  • CancelledError has no metadata

This is fine, but consider documenting the philosophy: "Include metadata when it helps debugging, omit when error message is self-contained."

3. Magic Numbers

Several constants are used directly in code:

  • Line 423 in index.ts: Date.now() + 100 - this retry delay should be a named constant
  • Line 98 in driver.ts: workerPollInterval = 100 - good, this IS a named constant

4. Type Safety in executeListenNUntilImpl

The listenNWithTimeout method (line 1216-1236) calls executeListenNUntilImpl but doesn't validate that limit is positive. Consider adding validation:

if (limit <= 0) {
  throw new Error("limit must be positive");
}

🔒 Security Considerations

  1. Message Data Serialization: The workflow engine serializes arbitrary user data for messages. Ensure the underlying KV storage properly handles untrusted input (SQL injection, XSS in keys, etc.). This appears to be handled correctly through the binary key/value interface.

  2. Rollback Handlers: Rollback handlers receive user-controlled output values. The docs should warn that rollback handlers must validate their inputs if they perform sensitive operations.

⚡ Performance Considerations

  1. Flush Frequency: The code properly batches writes, but every non-ephemeral step triggers a flush. For workflows with hundreds of steps, this could be slow. The ephemeral flag helps, but consider documenting benchmarks or guidelines for when to use it.

  2. Loop History: The loop iteration cleanup (lines 754-786) is well-designed, but the defaults (DEFAULT_LOOP_HISTORY_KEEP = 20) might not suit all use cases. Document the storage/performance tradeoffs.

  3. Race Cleanup: Lines 1796-1811 delete non-winning branch entries. This is good for storage efficiency, but for long-running races with many branches, this cleanup could be expensive. Consider making it async/deferred.

📚 Documentation Needs

  1. FLUSHING.md is excellent but should be referenced from the main workflow README
  2. Rollback behavior needs more examples - when to use checkpoints, how to write safe rollback handlers
  3. Timeout behavior needs to be documented at the API level, not just in code comments
  4. Loop state management could use examples of when to use commitInterval vs historyEvery

🧪 Testing Observations

The test addition in actor-workflow.ts (5-second timeout) is minimal. Consider adding tests for:

  • Workflow rollback scenarios
  • Race condition edge cases (all branches fail, all branches yield)
  • Loop iteration cleanup
  • Message arrival during sleep

✨ Suggested Improvements

  1. Fix the indentation issue in index.ts line 878-879
  2. Add validation for limit parameter in listen methods
  3. Extract magic number for retry delay (line 423)
  4. Add prominent documentation about timeout non-cancellation
  5. Consider adding debug mode that logs all flush operations to help diagnose persistence issues

📋 Minor Nitpicks

  1. Line 62 in context.ts: DEFAULT_LOOP_COMMIT_INTERVAL is exported but the feature isn't fully implemented (see comment line 652)
  2. The liveRuntime structure could be a class instead of an object with functions
  3. Some log messages use string interpolation while others use structured logging - pick one style

Summary

This is solid work that adds valuable functionality and documentation. The workflow engine changes demonstrate good understanding of distributed systems challenges (persistence, replay, partial failure). The examples are comprehensive and will help users learn the system.

Primary concerns:

  1. Fix indentation bug (must fix)
  2. Document timeout non-cancellation at API level (should fix)
  3. Add input validation for limit parameters (nice to have)
  4. Expand test coverage (nice to have)

The code is production-ready after addressing the indentation issue. Great job overall! 🎉

@jog1t jog1t force-pushed the 01-24-feat_frontend_traces_ui branch from 389f28e to 3c7fabc Compare January 28, 2026 19:55
@NathanFlurry NathanFlurry force-pushed the 01-24-feat_frontend_traces_ui branch from 3c7fabc to 7dc3fdd Compare January 30, 2026 08:19
@jog1t jog1t force-pushed the 01-24-feat_frontend_traces_ui branch from 7dc3fdd to 0232d12 Compare January 30, 2026 21:29
NathanFlurry and others added 2 commits February 3, 2026 12:01
…WithTimeout

When listenWithTimeout/listenUntil was used, workflows would only wake
when the deadline passed, not when a message arrived. This broke
approval workflows and other human-in-the-loop patterns.

Changes:
- Extended SleepError to optionally carry messageNames array
- Modified listen methods to pass message names when throwing SleepError
- Updated execution loop to wait for EITHER messages OR deadline
- Fixed race branches to properly yield on SleepError instead of failing

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@graphite-app
Copy link
Contributor

graphite-app bot commented Feb 4, 2026

Merge activity

  • Feb 4, 8:36 PM UTC: NathanFlurry added this pull request to the Graphite merge queue.
  • Feb 4, 8:37 PM UTC: CI is running for this pull request on a draft pull request (#4114) due to your merge queue CI optimization settings.
  • Feb 4, 8:38 PM UTC: Merged by the Graphite merge queue via draft PR: #4114.

graphite-app bot pushed a commit that referenced this pull request Feb 4, 2026
@graphite-app graphite-app bot closed this Feb 4, 2026
@graphite-app graphite-app bot deleted the 01-24-queues_example branch February 4, 2026 20:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant