Currently, we use Trafilatura for web content extraction. It works well for static pages, but it does not execute JavaScript, so content that is rendered client-side is missing from the extracted text.
We should investigate integrating alternative web read providers that can handle dynamic content, such as the Jina AI API.
The main factors to consider for any new provider are: rate limiting, stability, performance, and the quality of content extracted.
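One possible shape for the integration is a small provider interface with ordered fallback, so that new providers can be trialled without changing callers. The sketch below is only an illustration under stated assumptions: `WebReadProvider`, `CallableProvider`, `ReadResult`, `read_with_fallback`, and the `min_chars` threshold are hypothetical names, not existing code in this project, and the providers are plain callables rather than real API clients.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Protocol, Sequence


@dataclass
class ReadResult:
    """Extracted text plus the name of the provider that produced it."""
    url: str
    text: str
    provider: str


class WebReadProvider(Protocol):
    """Hypothetical interface that every web read provider would implement."""
    name: str

    def read(self, url: str) -> Optional[str]:
        """Return extracted text for the URL, or None if nothing was extracted."""
        ...


class CallableProvider:
    """Adapter that turns any `url -> text` function into a provider."""

    def __init__(self, name: str, fn: Callable[[str], Optional[str]]) -> None:
        self.name = name
        self._fn = fn

    def read(self, url: str) -> Optional[str]:
        return self._fn(url)


def read_with_fallback(
    url: str,
    providers: Sequence[WebReadProvider],
    min_chars: int = 200,  # assumed heuristic for "the page probably rendered"
) -> Optional[ReadResult]:
    """Try providers in order and return the first sufficiently long result.

    A provider that raises (for example on a rate limit or timeout) is
    skipped, so one unstable provider does not fail the whole read.
    """
    for provider in providers:
        try:
            text = provider.read(url)
        except Exception:
            # Assumption: any provider error means "try the next one".
            continue
        if text and len(text.strip()) >= min_chars:
            return ReadResult(url=url, text=text, provider=provider.name)
    return None
```

With this layout, the existing Trafilatura path could be wrapped as the first provider and a JavaScript-capable provider placed after it, so the slower or rate-limited service is only called when static extraction returns too little text. Recording `ReadResult.provider` per request would also give a simple way to compare providers on stability and extraction quality.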