A robust, automated tool for scraping and aggregating job listings from multiple online platforms. Built with Python and Playwright, this engine handles dynamic web content to extract key job details and export them into a structured format.
- Dynamic Content Handling: Uses Playwright to scrape modern websites with heavy JavaScript loading.
- Structured Data Output: Automatically generates a
jobs.jsonfile containing title, company, location, and direct links. - Headless Operation: Designed to run efficiently in the background without a GUI.
- Extensible Scraper Logic: Easily adaptable to different job board structures.
- Language: Python 3.11+
- Automation: Playwright (Chromium/WebKit/Firefox)
- Testing: Pytest
- Data Format: JSON
- Python 3.11 or higher installed.
pipfor package management.
-
Clone the repository:
git clone [https://github.com/Ahmed-Yusuf-1/job_board_engine.git](https://github.com/Ahmed-Yusuf-1/job_board_engine.git) cd job_board_engine -
Create and activate a virtual environment:
python -m venv venv # Linux/MacOS source venv/bin/activate # Windows .\venv\Scripts\activate
-
Install dependencies:
pip install playwright pytest playwright install
To run the scraper and update the job database:
python test_scraper.py