Refactor Inetum scraper to streamline pagination and improve job data extraction by lalalaurentiu · Pull Request #671 · peviitor-ro/based_scraper_py

lalalaurentiu · 2025-12-12T17:17:56Z

This pull request refactors the job scraping logic for the Inetum site to improve pagination handling and to set proper HTTP headers on requests. The changes replace a static pagination approach (a precomputed page count) with a dynamic loop, and set a custom User-Agent header to make scraping more reliable.

Scraper improvements:

- Added a custom User-Agent header to all HTTP requests to mimic a real browser and reduce the risk of being blocked by the target site (sites/inetum.py).
- Refactored pagination logic to use a while loop that fetches jobs page by page until no more jobs are found, instead of relying on a precomputed number of pages (sites/inetum.py). [1] [2]
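
The two scraper changes above can be sketched together as follows. This is a minimal illustration under stated assumptions, not the code from sites/inetum.py: the User-Agent string, the `build_request` helper, the `fetch` callback, and the `max_pages` cap are hypothetical, and the sketch uses the standard library's `urllib.request` because the PR description does not say which HTTP client the scraper uses.

```python
from __future__ import annotations

import urllib.request
from typing import Callable

# Hypothetical browser-like User-Agent; the exact string used in
# sites/inetum.py is not given in the PR description.
HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
        "AppleWebKit/537.36 (KHTML, like Gecko) "
        "Chrome/120.0 Safari/537.36"
    )
}


def build_request(url: str) -> urllib.request.Request:
    """Build a request that carries the custom User-Agent header."""
    return urllib.request.Request(url, headers=HEADERS)


def scrape_all(
    fetch: Callable[[int], list[dict]],
    start_page: int = 1,
    max_pages: int = 500,
) -> list[dict]:
    """Fetch jobs page by page until a page comes back empty.

    `fetch(page)` is a hypothetical callback that downloads and parses one
    page of results; `max_pages` is a safety cap not described in the PR.
    """
    jobs: list[dict] = []
    page = start_page
    while page < start_page + max_pages:
        batch = fetch(page)
        if not batch:  # no jobs on this page: we have reached the end
            break
        jobs.extend(batch)
        page += 1
    return jobs
```

In practice, `fetch` would open a request built with `build_request` and apply whatever parsing sites/inetum.py does for one page of results. Passing it in as a callback keeps the loop testable without network access. The `max_pages` cap is an extra guard against a site that never returns an empty page; the PR itself only describes stopping when no more jobs are found.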

Code cleanup:

- Removed unused imports and variables, such as json, ceil, and totalJobs, to simplify the code (sites/inetum.py).

Commit 7415807: Refactor Inetum scraper to streamline pagination and improve job data extraction

lalalaurentiu merged commit 2dc5a53 into peviitor-ro:main Dec 12, 2025
1 check passed

lalalaurentiu merged 1 commit into peviitor-ro:main from lalalaurentiu:main

lalalaurentiu commented Dec 12, 2025
