Feature/loop for crashed tasks #266

Closed
rakovskij-stanislav wants to merge 8 commits into CERT-Polska:master from rakovskij-stanislav:feature/loop_for_crashed_tasks


@rakovskij-stanislav
Contributor

@rakovskij-stanislav rakovskij-stanislav commented Jan 7, 2025

Karton already supports debug mode, but sometimes you need to work with already "unhealthy" tasks that have crashed.

Restarting them is not an option: you cannot predict which consumer will pick up the task, while you need to feed it to the exact consumer you are running under a debugger, with extra prints, etc. The goal is to debug a crashed task without interrupting normal processing.

This PR introduces a new method, Consumer.loop_crached_tasks, that makes debugging considerably easier by working only with crashed tasks. It has several restrictions:

  • It ignores timeouts (for the benefit of debugging).
  • It works with karton.tasks instead of the karton.queue list in Redis.
  • It does not register new binds or shut down already running instances.
  • It does not increment CRASHED metrics on further crashes.
  • It does not immediately change the task status to STARTED.
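
To make the restrictions listed above more concrete, here is a rough, self-contained sketch of the idea. It is not the code from this PR and it does not use the Karton API or Redis: the `Task` dataclass, the `TaskState` enum, the in-memory `store` dict and the function names are all hypothetical stand-ins. Marking a task as finished after a successful re-run is also an assumption made only for this illustration.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, Dict, Iterator, List


class TaskState(Enum):
    # Hypothetical stand-in for task statuses; not imported from Karton.
    STARTED = "started"
    FINISHED = "finished"
    CRASHED = "crashed"


@dataclass
class Task:
    # Hypothetical stand-in for a stored task record.
    uid: str
    receiver: str  # identity of the consumer the task was routed to
    state: TaskState
    payload: dict = field(default_factory=dict)


def iter_crashed_tasks(store: Dict[str, Task], identity: str) -> Iterator[Task]:
    """Scan the task store directly (no queue) for crashed tasks of one consumer."""
    # Snapshot the values so that state changes during the loop are safe.
    for task in list(store.values()):
        if task.state is TaskState.CRASHED and task.receiver == identity:
            yield task


def loop_crashed_tasks_sketch(
    store: Dict[str, Task],
    identity: str,
    process: Callable[[Task], None],
    metrics: Dict[str, int],
) -> List[str]:
    """Re-run crashed tasks in the current process and return UIDs that crash again."""
    still_crashing: List[str] = []
    for task in iter_crashed_tasks(store, identity):
        # No timeout is applied and the state is not set to STARTED up front.
        try:
            process(task)
        except Exception:
            # Deliberately leave `metrics` untouched on a repeated crash.
            still_crashing.append(task.uid)
        else:
            # Assumption for this sketch: a successful re-run marks the task finished.
            task.state = TaskState.FINISHED
    return still_crashing
```

Because the sketch scans the store instead of popping from a queue, two replicas running it at the same time would both pick up the same tasks, so it only makes sense with a single instance.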

@rakovskij-stanislav
Contributor Author

@psrok1 Hello! Just a gentle ping regarding this PR)

@psrok1
Member

psrok1 commented Aug 27, 2025

Hello! Thanks for the contribution and sorry for the very late answer. We had some internal discussions about this PR and then it was put on the shelf.

I don't think that this method should be a part of Consumer class if it has so many constraints:

        - It does not rely on `karton.queue`. It finds crashed tasks in `karton.task`.
          So RUN ONLY ONE REPLICA to avoid race conditions
          and large resource consumption.
        - It does not rely on task_timeout.
        - It does not register new binds.
        - It does not shut down other instances on binds / version mismatch.
        - It does not listen to the queue in the traditional way / it does not subscribe.
          It looks for tasks in `CRASHED` state.
        - It does not increment `TASK_CRASHED` metrics.
        - It reimplements `Consumer.internal_process` in a simplified way.

If we add something to one of the main Karton classes, we also set a contract that we will not break it in future versions. This feature looks like something really difficult to maintain and it's possible that we're going to add many more points to that list in the future 😃

We thought about implementing it in a different way, but right now we don't have any ideas. I would open an issue for that problem and maybe someone will come up with a less hacky way to implement such a utility.

@psrok1 psrok1 closed this Aug 27, 2025