Fix race condition causing bookmark queue items to accumulate during rapid-succession resumes #7225

Copilot wants to merge 4 commits into release/3.6.0 from
Conversation
This change ensures that bookmark queue items are always deleted after processing, regardless of whether matching bookmarks were found. This prevents duplicate queue items from accumulating in the database when the same bookmark is queued multiple times in rapid succession. The distributed lock in WorkflowResumer ensures that the actual bookmark is only processed once, so it's safe to always delete the queue item.

Fixes a race condition where:
1. Multiple rapid requests create duplicate queue items
2. The first item processes and deletes the bookmark
3. The second item finds no bookmark but was NOT deleted
4. Queue items remain stuck in the database indefinitely

Co-authored-by: sfmskywalker <938393+sfmskywalker@users.noreply.github.com>
Enhanced the fix to ensure queue items are always deleted even if an exception occurs during workflow resumption. The deletion is now in a finally block with its own try-catch to handle any deletion failures gracefully. This addresses the code review feedback and ensures robustness in all scenarios:
- Normal processing (bookmark found or not)
- Exception during workflow resumption (timeout, network error, etc.)
- Exception during deletion itself (database connectivity issues)

Co-authored-by: sfmskywalker <938393+sfmskywalker@users.noreply.github.com>
Enhanced the fix based on code review feedback:
1. Added error handling at the ProcessPageAsync level to ensure one failing item doesn't block processing of other items in the batch
2. Added explicit exception logging in ProcessItemAsync to surface errors during workflow resumption
3. Changed deletion to use CancellationToken.None to ensure cleanup happens even during application shutdown
4. Added detailed comments explaining the error handling strategy

These improvements ensure the system is robust and provides visibility into any issues while still guaranteeing queue items are cleaned up properly.

Co-authored-by: sfmskywalker <938393+sfmskywalker@users.noreply.github.com>
Pull request overview
Addresses a race condition where rapid successive resume requests can leave duplicate bookmark queue items stuck in the queue, leading to accumulation and unprocessed workflows.
Changes:
- Wrap per-item processing in `ProcessPageAsync` with error handling to keep batch processing moving.
- Update `ProcessItemAsync` to always delete the queue item via a `finally` block, and add logging for resume failures.
```csharp
foreach (var bookmarkQueueItem in page.Items)
{
    try
    {
        await ProcessItemAsync(bookmarkQueueItem, cancellationToken);
    }
    catch (Exception ex)
    {
        // Log the error but continue processing other items in the batch
        logger.LogError(ex, "Error processing bookmark queue item {BookmarkQueueItemId}. Continuing with next item.", bookmarkQueueItem.Id);
    }
}
```
ProcessPageAsync catches all exceptions and continues, which will also catch OperationCanceledException if cancellation is requested (e.g., during shutdown) and ProcessItemAsync starts propagating it. This can lead to noisy error logs and the processor continuing work even though cancellation was requested. Handle OperationCanceledException separately (rethrow) and/or break out of the loop when cancellationToken.IsCancellationRequested is true.
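One way to apply this suggestion, sketched against the loop quoted above (the surrounding method shape is assumed, not taken from the diff):

```csharp
foreach (var bookmarkQueueItem in page.Items)
{
    // Stop promptly once shutdown has been requested instead of logging cancellations as errors.
    cancellationToken.ThrowIfCancellationRequested();

    try
    {
        await ProcessItemAsync(bookmarkQueueItem, cancellationToken);
    }
    catch (OperationCanceledException) when (cancellationToken.IsCancellationRequested)
    {
        // Cancellation is not a failure; let it propagate to the caller.
        throw;
    }
    catch (Exception ex)
    {
        // Log the error but continue processing other items in the batch.
        logger.LogError(ex, "Error processing bookmark queue item {BookmarkQueueItemId}. Continuing with next item.", bookmarkQueueItem.Id);
    }
}
```

The `when` filter matters: it lets an `OperationCanceledException` thrown for an unrelated token still be logged, while a genuine shutdown request short-circuits the loop.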
```diff
-            logger.LogDebug("Successfully resumed {WorkflowCount} workflow instances using stimulus {StimulusHash} for activity type {ActivityType}", responses.Count, item.StimulusHash, item.ActivityTypeName);
-            await store.DeleteAsync(item.Id, cancellationToken);
+        catch (Exception ex)
+        {
+            logger.LogError(ex, "Error resuming workflow for bookmark queue item {BookmarkQueueItemId} for workflow instance {WorkflowInstanceId}. The queue item will still be deleted to prevent accumulation.", item.Id, item.WorkflowInstanceId);
+        }
```
The catch (Exception ex) here will also catch OperationCanceledException when cancellationToken is canceled, log it as an error, and then continue processing more items. This makes shutdown/cancellation noisy and prevents normal cancellation flow. Add an OperationCanceledException catch/when-filter that rethrows when cancellationToken.IsCancellationRequested, and only log unexpected exceptions.
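A minimal sketch of the suggested shape for `ProcessItemAsync` (the resume call body is elided; only the exception-filter pattern is the point):

```csharp
try
{
    // ... resume matching workflows for this queue item ...
}
catch (OperationCanceledException) when (cancellationToken.IsCancellationRequested)
{
    // Normal cancellation flow: rethrow without an error-level log entry.
    throw;
}
catch (Exception ex)
{
    // Only unexpected failures are logged as errors.
    logger.LogError(ex, "Error resuming workflow for bookmark queue item {BookmarkQueueItemId} for workflow instance {WorkflowInstanceId}.", item.Id, item.WorkflowInstanceId);
}
```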
```csharp
        logger.LogDebug("No matching bookmarks found for bookmark queue item {BookmarkQueueItemId} for workflow instance {WorkflowInstanceId} for activity type {ActivityType} with stimulus {StimulusHash}", item.Id, item.WorkflowInstanceId, item.ActivityTypeName, item.StimulusHash);
    }
    finally
    {
        // Always delete the queue item after processing, regardless of whether bookmarks were found or an exception occurred.
        // This prevents duplicate queue items from accumulating when the same bookmark is queued multiple times in rapid succession.
        // The distributed lock in WorkflowResumer ensures that the actual bookmark is only processed once.
        // Use CancellationToken.None to ensure cleanup happens even during application shutdown.
        try
        {
            await store.DeleteAsync(item.Id, CancellationToken.None);
        }
```
Because queue items are now always deleted, the offset-based pagination in ProcessAsync can skip items: after deleting the first page, incrementing offset causes the next query to skip records in the remaining set, potentially leaving items unprocessed until the next trigger (or even until they are purged by TTL). Consider switching the loop to always read from offset 0 until no items remain, or use cursor-based paging (e.g., last CreatedAt/Id) instead of offsets when deleting during iteration.
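The restart-from-zero variant can be sketched as below. The store API names (`PageAsync`, `PageArgs`, `BatchSize`) are assumptions for illustration, not the project's actual signatures:

```csharp
// Because every processed item is deleted, always re-read from the start of the
// remaining set; advancing an offset over a shrinking set would skip records.
while (!cancellationToken.IsCancellationRequested)
{
    // Hypothetical paging call: take the first BatchSize items still in the queue.
    var page = await store.PageAsync(PageArgs.From(0, BatchSize), cancellationToken);

    if (page.Items.Count == 0)
        break;

    await ProcessPageAsync(page, cancellationToken);
}
```

Cursor-based paging (keyed on `CreatedAt`/`Id` of the last processed item) is the alternative when items may survive a pass, since restart-from-zero can loop on an item that repeatedly fails to delete.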
```csharp
// Use CancellationToken.None to ensure cleanup happens even during application shutdown.
try
{
    await store.DeleteAsync(item.Id, CancellationToken.None);
}
```
Using CancellationToken.None for the delete means this await can block indefinitely if the underlying store call hangs (e.g., DB connectivity issues), which is especially risky during shutdown. Prefer using a bounded token (e.g., a short CancelAfter timeout) so cleanup is attempted but can't stall the processor forever.
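A bounded-token sketch of that suggestion; the 5-second budget is an arbitrary illustrative choice:

```csharp
// Attempt cleanup even during shutdown, but cap how long it may block.
using var cleanupCts = new CancellationTokenSource(TimeSpan.FromSeconds(5));
try
{
    await store.DeleteAsync(item.Id, cleanupCts.Token);
}
catch (Exception ex)
{
    // If deletion times out or fails, the TTL purge task remains the fallback.
    logger.LogWarning(ex, "Failed to delete bookmark queue item {BookmarkQueueItemId}.", item.Id);
}
```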
When multiple `/tasks/{taskid}/complete` requests arrive in rapid succession, duplicate queue items are created for the same bookmark. The original implementation only deleted queue items when `responses.Count > 0`, causing the second queue item to remain stuck after the bookmark was already processed.

Changes
Modified `BookmarkQueueProcessor.ProcessItemAsync()`
- Deletion moved into a `finally` block: always executes regardless of bookmark presence or exceptions
- `CancellationToken.None` for deletion to ensure cleanup during shutdown

Modified `BookmarkQueueProcessor.ProcessPageAsync()`
- Per-item error handling so one failing item doesn't block the rest of the batch

Before/After
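The deletion-placement change can be sketched roughly as follows (method bodies abbreviated; `store`, `logger`, and `responses` follow the names visible in the quoted diff):

```csharp
// Before: deletion only ran when matching bookmarks were found.
if (responses.Count > 0)
{
    await store.DeleteAsync(item.Id, cancellationToken);
}

// After: deletion always runs, even on resume failure or shutdown.
try
{
    // ... resume matching workflows ...
}
finally
{
    try
    {
        await store.DeleteAsync(item.Id, CancellationToken.None);
    }
    catch (Exception ex)
    {
        logger.LogWarning(ex, "Failed to delete bookmark queue item {BookmarkQueueItemId}.", item.Id);
    }
}
```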
Safety
The distributed lock in `WorkflowResumer.ResumeAsync()` ensures bookmarks are only processed once. Queue items are ephemeral markers, so they are safe to delete after a processing attempt. The existing 1-minute TTL purge task serves as a fallback for any deletion failures.