Feature/loop for crashed tasks by rakovskij-stanislav · Pull Request #266 · CERT-Polska/karton

rakovskij-stanislav · 2025-01-07T10:19:05Z

Karton already supports debug mode, but sometimes you need to work with "unhealthy" tasks that have already crashed.

Restarting them is not an option: you cannot predict which consumer will receive the task, while you need to feed it to the exact consumer you are running under a debugger, with extra prints, etc. The goal is to debug crashed tasks without interrupting normal processing.

This PR introduces a new method, `Consumer.loop_crached_tasks`, which makes debugging easier by working only with crashed tasks. It has several restrictions:

- It ignores timeouts (for the benefit of debugging).
- It works with `karton.tasks` instead of the `karton.queue` list in Redis.
- It does not register new binds or shut down already running instances.
- It does not increment CRASHED metrics on further crashes.
- It does not immediately change the task's status to STARTED.
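
To make the selection step behind these restrictions concrete, here is a minimal sketch of how crashed tasks addressed to one consumer could be picked out of stored task documents. It is not the PR's code: a plain dict stands in for Redis, and the `karton.task:` key prefix, the JSON document layout and the `TaskState` string values are assumptions modeled on Karton; `iter_crashed_tasks` is a hypothetical helper. It also tolerates tasks that have no `receiver` header yet, the case addressed by one of the later commits in this PR.

```python
import json
from enum import Enum
from typing import Dict, Iterator, Optional


class TaskState(str, Enum):
    # Hypothetical stand-in for Karton's TaskState; the string values are assumed.
    DECLARED = "Declared"
    SPAWNED = "Spawned"
    STARTED = "Started"
    FINISHED = "Finished"
    CRASHED = "Crashed"


# Assumed key layout for stored task documents (one JSON document per task).
TASK_PREFIX = "karton.task:"


def iter_crashed_tasks(store: Dict[str, str], identity: str) -> Iterator[dict]:
    """Yield crashed task documents whose receiver is `identity`.

    `store` is a plain dict standing in for Redis: key -> JSON document.
    Queue keys and other entries are skipped because of the prefix check.
    """
    for key in sorted(store):
        if not key.startswith(TASK_PREFIX):
            continue
        doc = json.loads(store[key])
        if doc.get("status") != TaskState.CRASHED.value:
            continue
        # A task that was never routed may not carry a 'receiver' header yet.
        headers = doc.get("headers") or {}
        receiver: Optional[str] = headers.get("receiver")
        if receiver == identity:
            yield doc
```

Scanning every task document like this is what makes the approach expensive compared to popping from a per-consumer queue, which is one reason the restrictions above matter.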

rakovskij-stanislav · 2025-02-21T07:27:01Z

@psrok1 Hello! Just a gentle ping regarding this PR :)


psrok1 · 2025-08-27T11:58:27Z

Hello! Thanks for the contribution, and sorry for the very late answer. We had some internal discussions about this PR, and then it was put on the shelf.

I don't think this method should be part of the Consumer class if it has so many constraints:

        - It does not rely on `karton.queue`. It finds crashed docs in `karton.task`.
          So RUN ONLY ONE REPLICA to avoid race conditions
          and large resource consumption.
        - It does not rely on task_timeout.
        - It does not register new binds.
        - It does not shut down other instances on binds / version mismatch.
        - It does not listen to the queue in the traditional way / it does not subscribe.
          It looks for tasks in `CRASHED` state.
        - It does not increment `TASK_CRASHED` metrics.
        - It reimplements `Consumer.internal_process` in a simplified way.
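
The simplifications in this list can be illustrated with a hypothetical standalone helper written outside the `Consumer` class. This is a sketch under assumptions, not the PR's simplified `internal_process`: `debug_crashed_pass`, the `uid` field and the `process` callback are illustrative names. It hands each crashed task to `process` without switching it to STARTED, without a timeout and without touching metrics, and it remembers handled UIDs, because a task whose status never leaves CRASHED would otherwise be picked up again on every polling pass.

```python
import logging
from typing import Callable, Iterable, Set

log = logging.getLogger("crashed-task-debug")


def debug_crashed_pass(
    tasks: Iterable[dict],
    process: Callable[[dict], None],
    seen: Set[str],
) -> int:
    """Run `process` once over crashed tasks not handled in an earlier pass.

    Hypothetical simplifications, mirroring the constraints above:
    - the task status is NOT switched to STARTED before processing,
    - no timeout is enforced around `process`,
    - a repeated crash is only logged; no crash metric is incremented.
    Returns how many tasks were handed to `process` in this pass.
    """
    handled = 0
    for doc in tasks:
        uid = doc["uid"]
        if uid in seen:
            # The status stays CRASHED, so without this guard the same
            # task would be reprocessed on every pass.
            continue
        seen.add(uid)
        handled += 1
        try:
            process(doc)
        except Exception:
            # Keep looping: a debugging session should survive repeated crashes.
            log.exception("task %s crashed again", uid)
    return handled
```

Because `seen` lives in the caller, it is per-process state: two replicas running this loop would each keep their own set and could both grab the same crashed task, which is why the list above asks to run only one replica.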

If we add something to one of the main Karton classes, we also commit to not breaking it in future versions. This feature looks really difficult to maintain, and we may well have to add many more points to that list in the future 😃

We thought about implementing it in a different way, but right now we don't have any ideas. I would open an issue for that problem, and maybe someone will come up with a less hacky way to implement such a utility.

rakovskij-stanislav and others added 4 commits January 7, 2025 13:03

- Experimental feature: loop for crashed tasks (033802b)
- Do not make task STARTED for not losing the task in this status (6073d1a)
- linting (944a4cb)
- Merge branch 'master' into feature/loop_for_crashed_tasks (bcc2947)

Stanislav Rakovsky and others added 4 commits February 21, 2025 10:33

- Fix cases when there is no 'receiver' field in headers yet (4d1dbca)
- Merge branch 'CERT-Polska:master' into feature/loop_for_crashed_tasks (cc659e9)
- Add check that task.status is TaskState.CRASHED (ed00175)
- Delete wrong-placed code in system.py & add required condition of TaskState.CRASHED to karton.py (d0912f8)

psrok1 closed this Aug 27, 2025

rakovskij-stanislav wants to merge 8 commits into CERT-Polska:master from rakovskij-stanislav:feature/loop_for_crashed_tasks

2 participants
