Continuously extract the information you need from specified sources
Updated Feb 6, 2026 - Python
A Python project that extracts data from websites with the option to process the data through @openai's ChatGPT API. The results are either printed to stdout or sent via a POST request.
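The two output modes described above (print to stdout, or send via POST) could look roughly like the following. This is a minimal sketch using only the standard library, not the project's actual code: the function names `build_post_request` and `deliver` and the JSON payload shape are assumptions made for illustration, and the optional ChatGPT processing step is left out.

```python
import json
import urllib.request


def build_post_request(result, url):
    """Build (but do not send) a JSON POST request carrying the result."""
    body = json.dumps(result).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def deliver(result, url=None, timeout=10):
    """Print the result to stdout, or POST it when a URL is given.

    Returns the HTTP status code for POST delivery, otherwise None.
    """
    if url is None:
        print(json.dumps(result, indent=2))
        return None
    request = build_post_request(result, url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

Calling `deliver(result)` prints the extracted data, while `deliver(result, url=...)` sends it to an endpoint of your choosing.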
💎 Extract semantically structured information from any raw HTML or URL. Supported formats: Microdata, RDFa, JSON-LD, Open Graph, and meta tags.
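To give a feel for what this kind of extraction involves, here is a small sketch that pulls JSON-LD blocks, Open Graph properties, and named meta tags out of raw HTML using only Python's standard library. It is not the listed project's implementation: the class `StructuredDataParser` and the function `extract_structured_data` are hypothetical names, and Microdata and RDFa are not handled.

```python
import json
from html.parser import HTMLParser


class StructuredDataParser(HTMLParser):
    """Collects JSON-LD blocks, Open Graph properties, and named meta tags."""

    def __init__(self):
        super().__init__()
        self.json_ld = []
        self.open_graph = {}
        self.meta = {}
        self._in_json_ld = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        script_type = (attrs.get("type") or "").lower()
        if tag == "script" and script_type == "application/ld+json":
            # Start buffering the script body until the closing tag.
            self._in_json_ld = True
            self._buffer = []
        elif tag == "meta":
            content = attrs.get("content")
            if content is None:
                return
            prop = attrs.get("property") or ""
            name = attrs.get("name")
            if prop.startswith("og:"):
                self.open_graph[prop] = content
            elif name:
                self.meta[name] = content

    def handle_data(self, data):
        if self._in_json_ld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_json_ld:
            self._in_json_ld = False
            try:
                self.json_ld.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # Skip malformed JSON-LD instead of failing the whole page.


def extract_structured_data(html):
    """Return JSON-LD, Open Graph, and meta tag data found in an HTML string."""
    parser = StructuredDataParser()
    parser.feed(html)
    parser.close()
    return {
        "json_ld": parser.json_ld,
        "open_graph": parser.open_graph,
        "meta": parser.meta,
    }
```

Passing a page's HTML string to `extract_structured_data` returns a dictionary with one key per format, which keeps the formats separate so a caller can decide which source to trust.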