[BUG] Celery worker crashes when uploading multiple documents concurrently (asyncio Queue bound to a different event loop) #1397

@doubleclip118

Describe the bug

When uploading several documents at once (e.g., 3+), the Celery worker intermittently crashes or logs repeated asyncio errors. The error originates from LiteLLM’s async logging worker: an asyncio.Queue created on one event loop is later awaited from a different loop. This floods the logs with:
RuntimeError: <Queue at 0x... maxsize=50000> is bound to a different event loop
and can coincide with worker instability under load.

To Reproduce

Steps to reproduce the behavior:

1. Start the Aperag stack (API, celeryworker, redis, postgres, qdrant, es) with the default docker-compose.

2. Ensure the Celery worker is running with a multi-concurrency setup (e.g., --pool=threads --concurrency=16) and LiteLLM logging is enabled (the default).

3. Upload 3 or more PDFs simultaneously so that VECTOR / FULLTEXT / GRAPH / SUMMARY indexing tasks are triggered in parallel.

4. Check the celeryworker logs for the error and observe degraded stability or worker restarts.

Screenshots & Logs

[2025-10-28 06:00:03,463: INFO/MainProcess] aperag.tasks.document - Parsing document doc2068cf764beaae64
[2025-10-28 06:00:03,650: ERROR/MainProcess] asyncio - Task exception was never retrieved
future: <Task finished name='Task-16375' coro=<LoggingWorker._worker_loop() done, defined at /opt/venv/lib/python3.11/site-packages/litellm/litellm_core_utils/logging_worker.py:43> exception=RuntimeError('<Queue at 0x72ef9d72e990 maxsize=50000> is bound to a different event loop')>
Traceback (most recent call last):
  File "/opt/venv/lib/python3.11/site-packages/litellm/litellm_core_utils/logging_worker.py", line 51, in _worker_loop
    coroutine = await self._queue.get()
  File "/usr/local/lib/python3.11/asyncio/queues.py", line 155, in get
    getter = self._get_loop().create_future()
  File "/usr/local/lib/python3.11/asyncio/mixins.py", line 20, in _get_loop
    raise RuntimeError(f'{self!r} is bound to a different event loop')
RuntimeError: <Queue at 0x72ef9d72e990 maxsize=50000> is bound to a different event loop
