Refactor SiemensHealthineers scraper to use updated job listing URL and improve job data extraction #660

Merged
lalalaurentiu merged 1 commit into peviitor-ro:main from lalalaurentiu:main
Nov 4, 2025
Conversation

@lalalaurentiu
Collaborator

This pull request refactors the Siemens Healthineers job scraper in sites/siemenshealthineers.py to adapt to changes in the target website's structure and improve data extraction. The main changes involve switching from a JSON API to HTML parsing, updating pagination logic, and modifying how job details are collected.

Adaptation to new website structure:

  • Changed the url and data fetching logic from a JSON API endpoint to an HTML page, and updated the scraper to parse HTML elements instead of JSON objects.
  • Updated job extraction to find job elements using find_all("article", class_="article") and extract job details (title, link, city) from HTML tags.
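
The card-based extraction described above can be sketched as follows. This is a minimal illustration, not the code merged in this PR: the description points to a `find_all("article", class_="article")` call, while this sketch swaps in the standard library's `html.parser` so it runs without extra dependencies. The card layout (one `<a>` carrying the title and link, one `<span>` carrying the city) and the names `JobCardParser` and `extract_jobs` are assumptions made for the example.

```python
from html.parser import HTMLParser


class JobCardParser(HTMLParser):
    """Collects title, link and city from each <article class="article"> card.

    Assumed (hypothetical) card layout: one <a href=...> holding the title
    and one <span> holding the city.
    """

    def __init__(self):
        super().__init__()
        self.jobs = []
        self._job = None    # card currently being read, or None outside a card
        self._field = None  # field that the next text node should fill

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "article" and "article" in classes:
            self._job = {"title": "", "link": "", "city": ""}
        elif self._job is not None and tag == "a":
            self._job["link"] = attrs.get("href") or ""
            self._field = "title"
        elif self._job is not None and tag == "span":
            self._field = "city"

    def handle_data(self, data):
        # Text is only recorded while inside an <a> or <span> of a job card.
        if self._job is not None and self._field:
            self._job[self._field] += data.strip()

    def handle_endtag(self, tag):
        if tag in ("a", "span"):
            self._field = None
        elif tag == "article" and self._job is not None:
            self.jobs.append(self._job)
            self._job = None


def extract_jobs(html):
    """Return a list of {"title", "link", "city"} dicts, one per job card."""
    parser = JobCardParser()
    parser.feed(html)
    parser.close()
    return parser.jobs
```

Cards whose class list does not contain `article` are skipped, which mirrors the class filter named in the description.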

Pagination and job count calculation:

  • Modified how the total number of jobs is calculated, now parsing the count from an HTML element (div.list-controls__text__legend) instead of a JSON field.
  • Adjusted pagination logic to match the new page size (6 jobs per page) and updated the URL for fetching subsequent pages with the correct offset parameter.
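
A rough sketch of the count parsing and offset arithmetic, under stated assumptions: the exact wording of the `div.list-controls__text__legend` text is not given here, so the sketch assumes the total is the last integer in that text, written without thousands separators. The query parameter name `offset`, the `example.com` base URL, and the function names are placeholders, not values taken from the scraper.

```python
import re
from urllib.parse import urlencode

PAGE_SIZE = 6  # page size stated in this PR description


def parse_total(legend_text):
    """Return the job count from the legend text.

    Assumption: the total is the last integer in the text.
    """
    numbers = re.findall(r"\d+", legend_text)
    return int(numbers[-1]) if numbers else 0


def page_offsets(total, page_size=PAGE_SIZE):
    """Offsets of every page needed to cover `total` jobs."""
    return list(range(0, total, page_size))


def page_urls(base_url, total, param="offset"):
    """Build one URL per page; `param` is a hypothetical parameter name."""
    return [f"{base_url}?{urlencode({param: off})}" for off in page_offsets(total)]
```

For a total of 20 jobs this yields the offsets 0, 6, 12 and 18, so the last page carries the remaining two jobs.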

Job location and county extraction:

  • Changed city extraction to read a single HTML span per job, and updated the county lookup to work with the city value extracted this way.
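
To illustrate how a county lookup can be keyed on a single city string, here is a small sketch. It is not the lookup used by the scraper: the `COUNTY_BY_CITY` table, its three entries, and the `normalize` and `county_for` helpers are hypothetical, chosen only to show why normalizing case, whitespace and diacritics helps when the city comes straight from page text.

```python
import unicodedata

# Hypothetical lookup table, keyed by lowercase city names without diacritics.
COUNTY_BY_CITY = {
    "bucuresti": "București",
    "cluj-napoca": "Cluj",
    "brasov": "Brașov",
}


def normalize(city):
    """Lowercase, trim, and strip diacritics so 'Brașov' matches 'brasov'."""
    decomposed = unicodedata.normalize("NFKD", city.strip().lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def county_for(city):
    """Return the county for a city, or None when the city is unknown."""
    return COUNTY_BY_CITY.get(normalize(city))
```

Returning `None` for unknown cities leaves the caller free to decide whether to skip the job or record it without a county.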

@lalalaurentiu lalalaurentiu merged commit da6ed9b into peviitor-ro:main Nov 4, 2025
1 check passed