Description
When an LLM call fails during metadata extraction (e.g. Azure content safety false positive, rate limit, transient network error), the entire ingestion pipeline crashes. This happens because `BaseExtractor.aprocess_nodes()` calls `aextract()` with no error handling at all -- a single failed node kills the whole batch.
This is the scenario described in #20054. The reporter hits this about every 15,000 nodes with Azure OpenAI guardrails.
Root cause
- `aprocess_nodes()` calls `await self.aextract(new_nodes)` on line 129 of `interface.py` with no try/except
- `run_jobs()` in `async_utils.py` uses `asyncio.gather()` without `return_exceptions=True`, so one failed job kills the batch (see the sketch after this list)
- None of the standard extractors (Title, Keyword, QA, Summary) have per-node error handling
- Only `DocumentContextExtractor` has any resilience, but it's hardcoded to 5 retries with a 60s backoff and only catches rate limit errors
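To make the `asyncio.gather()` point concrete, here is a minimal, self-contained demonstration (not llama_index code; the function names are illustrative) of how one raised exception discards an entire batch when `return_exceptions=True` is not set:

```python
import asyncio

async def extract(i: int) -> int:
    # Stand-in for a per-node LLM call that succeeds.
    await asyncio.sleep(0.01)
    return i

async def flaky() -> int:
    # Stand-in for a single Azure content-safety / rate-limit failure.
    raise RuntimeError("content filter tripped")

async def main() -> None:
    try:
        # The first exception propagates out of gather() immediately;
        # the results of every other job in the batch are thrown away.
        results = await asyncio.gather(extract(1), flaky(), extract(2))
        print(results)
    except RuntimeError as exc:
        print(f"whole batch lost: {exc}")

asyncio.run(main())
```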
Proposed fix
Add three configurable fields to `BaseExtractor` that all extractors inherit automatically:

- `max_retries` (default 0 -- current behaviour, no retry)
- `retry_backoff` (default 1.0s, exponential backoff)
- `on_extraction_error` (`"raise"` or `"skip"` -- `"raise"` is current behaviour)
The retry logic lives in a single `_aextract_with_retry()` method called from `aprocess_nodes()`, sketched below. This is fully backwards compatible, since all defaults match existing behaviour.
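A minimal sketch of what `_aextract_with_retry()` could look like. The class here is an illustrative stand-in for `BaseExtractor` (plain attributes instead of pydantic fields); only the three field names and the method name come from the proposal, everything else is an assumption:

```python
import asyncio
import logging
from typing import Any, Dict, List, Sequence

logger = logging.getLogger(__name__)

class ExtractorSketch:
    """Illustrative stand-in for BaseExtractor; not the real class."""

    max_retries: int = 0                # 0 = no retry (current behaviour)
    retry_backoff: float = 1.0          # base delay, doubled per attempt
    on_extraction_error: str = "raise"  # or "skip"

    async def aextract(self, nodes: Sequence[Any]) -> List[Dict[str, Any]]:
        raise NotImplementedError  # implemented by concrete extractors

    async def _aextract_with_retry(
        self, nodes: Sequence[Any]
    ) -> List[Dict[str, Any]]:
        attempt = 0
        while True:
            try:
                return await self.aextract(nodes)
            except Exception as exc:
                if attempt >= self.max_retries:
                    if self.on_extraction_error == "skip":
                        # Log and fall back to empty metadata per node
                        # instead of killing the whole pipeline.
                        logger.warning(
                            "Extraction failed after %d retries: %s",
                            self.max_retries, exc,
                        )
                        return [{} for _ in nodes]
                    raise  # "raise" preserves the current crash behaviour
                # Exponential backoff: retry_backoff * 2**attempt seconds
                # (e.g. retry_backoff=2.0 gives 2s, 4s, 8s over 3 retries).
                await asyncio.sleep(self.retry_backoff * (2 ** attempt))
                attempt += 1
```

With the defaults (`max_retries=0`, `on_extraction_error="raise"`), the first failure raises immediately, matching today's behaviour.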
Example usage for someone hitting the Azure guardrail issue:
```python
from llama_index.core.extractors import TitleExtractor

extractor = TitleExtractor(
    llm=llm,
    max_retries=3,
    retry_backoff=2.0,
    on_extraction_error="skip",
)
```

This would retry up to 3 times with exponential backoff (2s, 4s, 8s), and if all retries fail, log a warning and continue with empty metadata instead of crashing.