
Refactor Inetum scraper to streamline pagination and improve job data extraction #671

Merged

lalalaurentiu merged 1 commit into peviitor-ro:main from lalalaurentiu:main on Dec 12, 2025



@lalalaurentiu (Collaborator)

This pull request refactors the Inetum job scraper to improve pagination handling and to send a browser-like User-Agent header with every HTTP request. The static pagination approach, which precomputed the number of pages, is replaced with a dynamic loop, and the custom User-Agent header makes the scraper less likely to be blocked.
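A custom User-Agent header is easiest to keep consistent when every request is built in one place. The sketch below is only an illustration under stated assumptions: the PR description does not show the header value, the target URL, or which HTTP library sites/inetum.py uses, so `HEADERS`, `build_request`, the browser string and the example.com URL are all hypothetical. It uses the standard library's `urllib.request.Request` to stay dependency-free; with the `requests` library the same dict would be passed as `requests.get(url, headers=HEADERS)`.

```python
from urllib.request import Request

# Hypothetical browser-like User-Agent; the actual string used in
# sites/inetum.py is not shown in this PR.
HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
        "AppleWebKit/537.36 (KHTML, like Gecko) "
        "Chrome/120.0.0.0 Safari/537.36"
    )
}


def build_request(url: str) -> Request:
    """Return a GET request carrying the shared browser-like headers.

    Routing every request through this (hypothetical) helper means no
    call site can forget to send the User-Agent header.
    """
    return Request(url, headers=HEADERS)


# Building the request does not touch the network.
req = build_request("https://example.com/careers?page=1")
# urllib normalizes header names with str.capitalize(), so the stored
# key is "User-agent" rather than "User-Agent".
print(req.get_header("User-agent"))
```

Centralizing the headers in one constant also makes it a one-line change if the site starts rejecting a particular browser string.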

Scraper improvements:

  • Added a custom User-Agent header to all HTTP requests to mimic a real browser and reduce the risk of being blocked by the target site (sites/inetum.py).
  • Refactored pagination logic to use a while loop that fetches jobs page by page until no more jobs are found, instead of relying on a precomputed number of pages (sites/inetum.py).
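The "fetch until a page comes back empty" loop described above can be sketched as follows. This is a minimal sketch, not the code merged in this PR: `scrape_all_jobs`, the `fetch_page` callback, the `job_title` field, the 1-based `start_page` default and the optional `max_pages` cap are all assumptions made for illustration. Injecting the page fetcher keeps the loop testable without network access.

```python
from typing import Callable, Dict, List, Optional

Job = Dict[str, str]


def scrape_all_jobs(
    fetch_page: Callable[[int], List[Job]],
    start_page: int = 1,
    max_pages: Optional[int] = None,
) -> List[Job]:
    """Collect jobs page by page until a page returns no jobs.

    fetch_page is a hypothetical callback returning the jobs listed on a
    single results page; an empty list means there are no more pages.
    """
    jobs: List[Job] = []
    page = start_page
    while True:
        # Optional safety cap: stop if the site never returns an empty page.
        if max_pages is not None and page - start_page >= max_pages:
            break
        page_jobs = fetch_page(page)
        if not page_jobs:
            # No precomputed page count is needed; an empty page ends the loop.
            break
        jobs.extend(page_jobs)
        page += 1
    return jobs


# Usage with a fake, in-memory "site" (hypothetical data):
pages = {1: [{"job_title": "Java Developer"}], 2: [{"job_title": "QA Engineer"}]}
print(len(scrape_all_jobs(lambda p: pages.get(p, []))))  # → 2
```

Compared with computing the page count from a total such as totalJobs, this approach costs one extra request (the empty page) but keeps working if the site stops reporting a total or changes its page size.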

Code cleanup:

  • Removed unused imports and variables, such as json, ceil, and totalJobs, to simplify the code (sites/inetum.py).

lalalaurentiu merged commit 2dc5a53 into peviitor-ro:main on Dec 12, 2025
1 check passed
