
JobStreet Jobs Scraper

JobStreet Jobs Scraper collects structured job listing data so you can analyze hiring demand, track roles by region, and build clean datasets fast. It focuses on essential job posting fields and keeps output predictable for dashboards, research, and recruitment workflows. Use this JobStreet scraper to turn search results into a dataset you can query and build automation on.


Telegram   WhatsApp   Gmail   Website

Created by Bitbash, built to showcase our approach to Scraping and Automation!
If you are looking for jobstreet-scraper, you've just found your team. Let's Chat. 👆👆

Introduction

This project extracts job listings from JobStreet search results and returns a consistent JSON dataset containing key job posting details. It solves the problem of manually copying listings across pages by providing a repeatable, filterable workflow driven by simple inputs. It’s built for recruiters, analysts, founders, and developers who want reliable job market data for reporting, alerts, and trend analysis.

Built for search-driven job research

  • Supports keyword-first job discovery with optional location targeting
  • Adds posted-date filters to keep datasets fresh (24h / 7d / 30d / anytime)
  • Handles pagination to collect results across multiple listing pages
  • Exports clean, minimal columns for fast post-processing
  • Includes proxy + cookie options for stability in different regions (see the sample input below)
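
A minimal input sketch tying those options together. The keyword, location, posted_date, maxJobs, and maxPages field names are the ones used in the FAQs below; the proxyConfiguration shape shown here is an assumption for illustration, not a documented schema:

{
  "keyword": "data analyst",
  "location": "Kuala Lumpur",
  "posted_date": "7d",
  "maxJobs": 200,
  "maxPages": 10,
  "proxyConfiguration": { "useProxy": true }
}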

Features

Feature                     | Description
Keyword-based search        | Extracts job listings using a required keyword input for targeted role discovery.
Location filtering          | Optional location filter to focus on specific cities, countries, or regions.
Posted-date filtering       | Limits results to recent jobs (24h, 7d, 30d, anytime) for fresher datasets.
Pagination handling         | Automatically walks through result pages to collect more listings.
Max jobs / max pages limits | Safety controls to cap scraping scope and runtime.
Cookie support              | Accepts raw cookies or JSON cookie objects to handle consent or restricted views.
Proxy configuration         | Works with proxy settings to improve reliability and reduce blocking.
Dual description formats    | Captures both HTML and clean plain-text job descriptions.
JSON output dataset         | Produces structured JSON items ready for pipelines, DB inserts, or spreadsheets.
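
Cookie support accepts either of the two shapes sketched below. The cookies and cookiesJson field names come from the FAQs; the individual object keys (name, value, domain) are assumptions based on common cookie formats, not a documented schema:

{
  "cookies": "sessionId=abc123; consent=true",
  "cookiesJson": [
    { "name": "sessionId", "value": "abc123", "domain": ".jobstreet.com" }
  ]
}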

What Data This Scraper Extracts

Field Name       | Field Description
title            | Job title shown on the listing.
company          | Hiring company name associated with the job.
location         | Job location (city, region, or country as displayed).
date_posted      | Posting date or relative age of the listing.
description_html | Job description in HTML format for rich formatting preservation.
description_text | Plain-text job description for indexing, NLP, and analytics.
url              | Direct link to the job posting page.

Example Output

[
      {
            "title": "Data Analyst",
            "company": "Example Tech Sdn Bhd",
            "location": "Kuala Lumpur",
            "date_posted": "7d",
            "description_html": "<p>We are looking for a Data Analyst to build dashboards and reports...</p>",
            "description_text": "We are looking for a Data Analyst to build dashboards and reports...",
            "url": "https://www.jobstreet.com/job/12345678"
      },
      {
            "title": "Software Developer",
            "company": "Sample Solutions Pte Ltd",
            "location": "Singapore",
            "date_posted": "24h",
            "description_html": "<p>Join our engineering team to ship scalable services...</p>",
            "description_text": "Join our engineering team to ship scalable services...",
            "url": "https://www.jobstreet.com/job/87654321"
      }
]
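
Because the output is plain JSON, it drops straight into standard tooling. A minimal post-processing sketch in Python, assuming a run was saved to data/sample_output.json (the sample file in the directory tree below); everything here is standard library:

import json
from collections import Counter

# Load the scraped dataset (same shape as the example above).
with open("data/sample_output.json", encoding="utf-8") as f:
    jobs = json.load(f)

# Count listings per location to spot regional demand.
by_location = Counter(job["location"] for job in jobs)
print(by_location.most_common(5))

# Keep only the freshest postings for an alerting workflow.
fresh = [job for job in jobs if job["date_posted"] == "24h"]
print(f"{len(fresh)} jobs posted in the last 24 hours")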

Directory Structure Tree

JobStreet Scraper/
├── src/
│   ├── main.py
│   ├── runner.py
│   ├── clients/
│   │   ├── http_client.py
│   │   └── browser_client.py
│   ├── extractors/
│   │   ├── listing_parser.py
│   │   ├── detail_parser.py
│   │   └── text_cleaner.py
│   ├── pipelines/
│   │   ├── paginate.py
│   │   ├── dedupe.py
│   │   └── validate.py
│   ├── outputs/
│   │   ├── dataset_writer.py
│   │   └── exporters.py
│   └── config/
│       ├── input_schema.json
│       └── settings.example.json
├── data/
│   ├── input.example.json
│   └── sample_output.json
├── tests/
│   ├── test_parsers.py
│   └── test_validation.py
├── .env.example
├── .gitignore
├── requirements.txt
├── pyproject.toml
└── README.md

Use Cases

  • Recruitment teams use it to monitor new roles by keyword and location, so they can respond faster to hiring trends and competitor postings.
  • Job market analysts use it to collect weekly datasets, so they can measure demand shifts across regions and job titles.
  • HR tech builders use it to feed job listings into search and recommendation systems, so they can improve matching quality and coverage.
  • Data teams use it to create a structured job dataset for NLP, so they can extract skills, seniority, and role taxonomy at scale.
  • Founders and operators use it to track hiring signals in specific industries, so they can validate market direction and expansion opportunities.

FAQs

Q1: What inputs are required to run the scraper? Only keyword is required; everything else is optional. Add location to narrow the search, use posted_date to restrict results to recent jobs (24h/7d/30d/anytime), and set maxJobs and maxPages to control runtime and scope.

Q2: Why do I see fewer results than expected? This usually happens when filters are too strict (tight location, short posted_date, or low maxPages). Try widening your location, setting posted_date to anytime, or increasing maxPages. If access restrictions appear, provide cookies and enable proxy configuration.

Q3: What’s the difference between description_html and description_text? description_html preserves formatting like bullet lists and links for storage or rendering. description_text is cleaned and easier to index in search engines, store in databases, or use for NLP tasks like skill extraction.
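
To make the distinction concrete, here is a minimal sketch of the kind of cleaning that produces description_text from description_html. It is an illustration only, not the project's actual text_cleaner.py logic, and uses just Python's standard library:

import re
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collects text content while dropping all HTML tags."""
    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)

def html_to_text(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    # Collapse the whitespace runs left behind by removed tags.
    return re.sub(r"\s+", " ", " ".join(parser.chunks)).strip()

print(html_to_text("<p>We are looking for a <b>Data Analyst</b> to build dashboards...</p>"))
# -> We are looking for a Data Analyst to build dashboards...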

Q4: How do I keep the scraper stable for larger runs? Use a proxy configuration, keep maxPages reasonable, and favor multiple smaller runs (e.g., per location or per keyword cluster). If you face consent prompts or restricted content, pass cookies or cookiesJson to maintain consistent access.
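
One way to structure those smaller runs is a simple batch loop per keyword cluster. This sketch assumes you launch runs by invoking src/main.py with a JSON input argument, which is a hypothetical interface; the batching pattern is the point:

import json
import subprocess

# Hypothetical keyword clusters; one small run per keyword keeps paging conservative.
KEYWORD_CLUSTERS = {
    "data": ["data analyst", "data engineer"],
    "software": ["software developer", "backend engineer"],
}

for cluster, keywords in KEYWORD_CLUSTERS.items():
    for keyword in keywords:
        run_input = {"keyword": keyword, "posted_date": "7d", "maxPages": 5}
        # Assumes src/main.py accepts its input as a JSON string; adjust to your setup.
        subprocess.run(["python", "src/main.py", json.dumps(run_input)], check=True)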


Performance Benchmarks and Results

Primary Metric: Averages 20–45 job listings per minute on typical searches, depending on page size, filters, and network conditions.

Reliability Metric: 95–99% successful page retrieval rate when using proxies and conservative page limits; lower stability is commonly tied to aggressive paging without proxies.

Efficiency Metric: Collects ~500 listings within 300–500 MB of RAM in headless mode with capped pagination and deduplication enabled.

Quality Metric: Typically achieves 90–98% field completeness on core fields (title/company/location/url), with description coverage varying by listing structure and access restrictions.
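
The deduplication mentioned above collapses listings that reappear across result pages. A minimal sketch of the idea, keying on the url field from the output schema (the project's actual logic lives in src/pipelines/dedupe.py and may differ):

def dedupe_by_url(jobs):
    """Keep the first occurrence of each job url, preserving input order."""
    seen = set()
    unique = []
    for job in jobs:
        if job["url"] not in seen:
            seen.add(job["url"])
            unique.append(job)
    return unique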

Book a Call | Watch on YouTube

Review 1

"Bitbash is a top-tier automation partner, innovative, reliable, and dedicated to delivering real results every time."

Nathan Pennington
Marketer
★★★★★

Review 2

"Bitbash delivers outstanding quality, speed, and professionalism, truly a team you can rely on."

Eliza
SEO Affiliate Expert
★★★★★

Review 3

"Exceptional results, clear communication, and flawless delivery.
Bitbash nailed it."

Syed
Digital Strategist
★★★★★
