[Improvement] Web Read Integration: Performance, Stability, and Alternatives #1147

@earayu

Currently, we use Trafilatura for web content extraction. It works well for static pages, but it does not execute JavaScript, so content that a page renders client-side is missing from the extracted text.

We should investigate integrating alternative web read providers that can handle dynamic content, such as the Jina AI API.
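One way to integrate a second provider without replacing the existing one is a fallback chain: try the static extractor first and only call a rendering-capable provider when the result looks empty. The sketch below is illustrative only. The `Reader` type, `ReadResult`, `read_with_fallback`, and the `min_chars` threshold are hypothetical names and values, not part of the current codebase or of any provider's API.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence, Tuple

# Hypothetical type: a reader takes a URL and returns the extracted text,
# or None / an empty string when nothing useful could be extracted.
Reader = Callable[[str], Optional[str]]


@dataclass
class ReadResult:
    url: str
    provider: str  # name of the reader that produced the text
    text: str


def read_with_fallback(
    url: str,
    readers: Sequence[Tuple[str, Reader]],
    min_chars: int = 200,  # assumed threshold for "the page was really extracted"
) -> Optional[ReadResult]:
    """Try each (name, reader) pair in order and return the first usable result."""
    for name, reader in readers:
        try:
            text = reader(url)
        except Exception:
            # A failing provider (timeout, rate limit, ...) must not block the rest.
            continue
        if text and len(text.strip()) >= min_chars:
            return ReadResult(url=url, provider=name, text=text)
    return None
```

In practice the first reader could wrap the existing Trafilatura call and the second a remote provider, so the slower or rate-limited service is only hit for pages where static extraction comes back nearly empty.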

The main factors to consider for any new provider are: rate limiting, stability, performance, and the quality of content extracted.
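To compare candidates on these factors, a small harness could run each provider over the same URL list and record success rate, latency, and extracted length. This is a minimal sketch under assumptions: `evaluate_reader` and the metric names are hypothetical, rate-limit errors are simply counted as failures, and character count is only a crude proxy for extraction quality.

```python
import statistics
import time
from typing import Callable, Dict, Iterable, Optional

# Hypothetical type: a reader takes a URL and returns extracted text or None.
Reader = Callable[[str], Optional[str]]


def evaluate_reader(reader: Reader, urls: Iterable[str]) -> Dict[str, float]:
    """Run one reader over a set of URLs and summarize stability and speed."""
    latencies = []
    lengths = []
    total = 0
    failures = 0
    for url in urls:
        total += 1
        start = time.perf_counter()
        try:
            text = reader(url)
        except Exception:
            # Timeouts, HTTP errors, and rate-limit responses all count as failures.
            text = None
        latencies.append(time.perf_counter() - start)
        if text and text.strip():
            lengths.append(len(text))
        else:
            failures += 1
    return {
        "requests": total,
        "success_rate": (total - failures) / total if total else 0.0,
        "median_latency_s": statistics.median(latencies) if latencies else 0.0,
        "median_chars": statistics.median(lengths) if lengths else 0.0,
    }
```

Running this for each provider against a fixed mix of static and JavaScript-heavy pages would give comparable numbers; judging content quality beyond raw length would still need manual review or a reference set.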
