
JobStreet Jobs Scraper

JobStreet Jobs Scraper collects structured job listing data so you can analyze hiring demand, track roles by region, and build clean datasets fast. It focuses on essential job posting fields and keeps output predictable for dashboards, research, and recruitment workflows. Use this JobStreet scraper to turn search results into a dataset you can query and build automation on.


Telegram   WhatsApp   Gmail   Website

Created by Bitbash, built to showcase our approach to Scraping and Automation!
If you are looking for jobstreet-scraper, you've just found your team. Let's Chat. 👆👆

Introduction

This project extracts job listings from JobStreet search results and returns a consistent JSON dataset containing key job posting details. It solves the problem of manually copying listings across pages by providing a repeatable, filterable workflow driven by simple inputs. It’s built for recruiters, analysts, founders, and developers who want reliable job market data for reporting, alerts, and trend analysis.

Built for search-driven job research

  • Supports keyword-first job discovery with optional location targeting
  • Adds posted-date filters to keep datasets fresh (24h / 7d / 30d / anytime)
  • Handles pagination to collect results across multiple listing pages
  • Exports clean, minimal columns for fast post-processing
  • Includes proxy + cookie options for stability in different regions (see the sample input below)
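
A minimal input sketch tying those options together. The keyword, location, posted_date, maxJobs, and maxPages field names are the ones used in the FAQs below; the proxyConfiguration shape shown here is an assumption for illustration, not a documented schema:

{
  "keyword": "data analyst",
  "location": "Kuala Lumpur",
  "posted_date": "7d",
  "maxJobs": 200,
  "maxPages": 10,
  "proxyConfiguration": { "useProxy": true }
}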

Features

Feature                     | Description
Keyword-based search        | Extracts job listings using a required keyword input for targeted role discovery.
Location filtering          | Optional location filter to focus on specific cities, countries, or regions.
Posted-date filtering       | Limits results to recent jobs (24h, 7d, 30d, anytime) for fresher datasets.
Pagination handling         | Automatically walks through result pages to collect more listings.
Max jobs / max pages limits | Safety controls to cap scraping scope and runtime.
Cookie support              | Accepts raw cookies or JSON cookie objects to handle consent or restricted views.
Proxy configuration         | Works with proxy settings to improve reliability and reduce blocking.
Dual description formats    | Captures both HTML and clean plain-text job descriptions.
JSON output dataset         | Produces structured JSON items ready for pipelines, DB inserts, or spreadsheets.
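
Cookie support accepts either of the two shapes sketched below. The cookies and cookiesJson field names come from the FAQs; the individual object keys (name, value, domain) are assumptions based on common cookie formats, not a documented schema:

{
  "cookies": "sessionId=abc123; consent=true",
  "cookiesJson": [
    { "name": "sessionId", "value": "abc123", "domain": ".jobstreet.com" }
  ]
}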

What Data This Scraper Extracts

Field Name       | Field Description
title            | Job title shown on the listing.
company          | Hiring company name associated with the job.
location         | Job location (city, region, or country as displayed).
date_posted      | Posting date or relative age of the listing.
description_html | Job description in HTML format for rich formatting preservation.
description_text | Plain-text job description for indexing, NLP, and analytics.
url              | Direct link to the job posting page.

Example Output

[
      {
            "title": "Data Analyst",
            "company": "Example Tech Sdn Bhd",
            "location": "Kuala Lumpur",
            "date_posted": "7d",
            "description_html": "<p>We are looking for a Data Analyst to build dashboards and reports...</p>",
            "description_text": "We are looking for a Data Analyst to build dashboards and reports...",
            "url": "https://www.jobstreet.com/job/12345678"
      },
      {
            "title": "Software Developer",
            "company": "Sample Solutions Pte Ltd",
            "location": "Singapore",
            "date_posted": "24h",
            "description_html": "<p>Join our engineering team to ship scalable services...</p>",
            "description_text": "Join our engineering team to ship scalable services...",
            "url": "https://www.jobstreet.com/job/87654321"
      }
]
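
Because the output is plain JSON, it drops straight into standard tooling. A minimal post-processing sketch in Python, assuming a run was saved to data/sample_output.json (the sample file in the directory tree below); everything here is standard library:

import json
from collections import Counter

# Load the scraped dataset (same shape as the example above).
with open("data/sample_output.json", encoding="utf-8") as f:
    jobs = json.load(f)

# Count listings per location to spot regional demand.
by_location = Counter(job["location"] for job in jobs)
print(by_location.most_common(5))

# Keep only the freshest postings for an alerting workflow.
fresh = [job for job in jobs if job["date_posted"] == "24h"]
print(f"{len(fresh)} jobs posted in the last 24 hours")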

Directory Structure Tree

JobStreet Scraper/
├── src/
│   ├── main.py
│   ├── runner.py
│   ├── clients/
│   │   ├── http_client.py
│   │   └── browser_client.py
│   ├── extractors/
│   │   ├── listing_parser.py
│   │   ├── detail_parser.py
│   │   └── text_cleaner.py
│   ├── pipelines/
│   │   ├── paginate.py
│   │   ├── dedupe.py
│   │   └── validate.py
│   ├── outputs/
│   │   ├── dataset_writer.py
│   │   └── exporters.py
│   └── config/
│       ├── input_schema.json
│       └── settings.example.json
├── data/
│   ├── input.example.json
│   └── sample_output.json
├── tests/
│   ├── test_parsers.py
│   └── test_validation.py
├── .env.example
├── .gitignore
├── requirements.txt
├── pyproject.toml
└── README.md

Use Cases

  • Recruitment teams use it to monitor new roles by keyword and location, so they can respond faster to hiring trends and competitor postings.
  • Job market analysts use it to collect weekly datasets, so they can measure demand shifts across regions and job titles.
  • HR tech builders use it to feed job listings into search and recommendation systems, so they can improve matching quality and coverage.
  • Data teams use it to create a structured job dataset for NLP, so they can extract skills, seniority, and role taxonomy at scale.
  • Founders and operators use it to track hiring signals in specific industries, so they can validate market direction and expansion opportunities.

FAQs

Q1: What inputs are required to run the scraper? Only keyword is required; everything else is optional. Add location to narrow the search, use posted_date to restrict results to recent jobs (24h/7d/30d/anytime), and set maxJobs and maxPages to control runtime and scope.

Q2: Why do I see fewer results than expected? This usually happens when filters are too strict (tight location, short posted_date, or low maxPages). Try widening your location, setting posted_date to anytime, or increasing maxPages. If access restrictions appear, provide cookies and enable proxy configuration.

Q3: What’s the difference between description_html and description_text? description_html preserves formatting like bullet lists and links for storage or rendering. description_text is cleaned and easier to index in search engines, store in databases, or use for NLP tasks like skill extraction.
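
To make the distinction concrete, here is a minimal sketch of the kind of cleaning that produces description_text from description_html. It is an illustration only, not the project's actual text_cleaner.py logic, and uses just Python's standard library:

import re
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collects text content while dropping all HTML tags."""
    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)

def html_to_text(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    # Collapse the whitespace runs left behind by removed tags.
    return re.sub(r"\s+", " ", " ".join(parser.chunks)).strip()

print(html_to_text("<p>We are looking for a <b>Data Analyst</b> to build dashboards...</p>"))
# -> We are looking for a Data Analyst to build dashboards...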

Q4: How do I keep the scraper stable for larger runs? Use a proxy configuration, keep maxPages reasonable, and favor multiple smaller runs (e.g., per location or per keyword cluster). If you face consent prompts or restricted content, pass cookies or cookiesJson to maintain consistent access.
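
One way to structure those smaller runs is a simple batch loop per keyword cluster. This sketch assumes you launch runs by invoking src/main.py with a JSON input argument, which is a hypothetical interface; the batching pattern is the point:

import json
import subprocess

# Hypothetical keyword clusters; one small run per keyword keeps paging conservative.
KEYWORD_CLUSTERS = {
    "data": ["data analyst", "data engineer"],
    "software": ["software developer", "backend engineer"],
}

for cluster, keywords in KEYWORD_CLUSTERS.items():
    for keyword in keywords:
        run_input = {"keyword": keyword, "posted_date": "7d", "maxPages": 5}
        # Assumes src/main.py accepts its input as a JSON string; adjust to your setup.
        subprocess.run(["python", "src/main.py", json.dumps(run_input)], check=True)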


Performance Benchmarks and Results

Primary Metric: Averages 20–45 job listings per minute on typical searches, depending on page size, filters, and network conditions.

Reliability Metric: 95–99% successful page retrieval rate when using proxies and conservative page limits; lower stability is commonly tied to aggressive paging without proxies.

Efficiency Metric: Collects ~500 listings within 300–500 MB of RAM in headless mode with capped pagination and deduplication enabled.

Quality Metric: Typically achieves 90–98% field completeness on core fields (title/company/location/url), with description coverage varying by listing structure and access restrictions.
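
The deduplication mentioned above collapses listings that reappear across result pages. A minimal sketch of the idea, keying on the url field from the output schema (the project's actual logic lives in src/pipelines/dedupe.py and may differ):

def dedupe_by_url(jobs):
    """Keep the first occurrence of each job url, preserving input order."""
    seen = set()
    unique = []
    for job in jobs:
        if job["url"] not in seen:
            seen.add(job["url"])
            unique.append(job)
    return unique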

Book a Call | Watch on YouTube

Review 1

"Bitbash is a top-tier automation partner, innovative, reliable, and dedicated to delivering real results every time."

Nathan Pennington
Marketer
★★★★★

Review 2

"Bitbash delivers outstanding quality, speed, and professionalism, truly a team you can rely on."

Eliza
SEO Affiliate Expert
★★★★★

Review 3

"Exceptional results, clear communication, and flawless delivery.
Bitbash nailed it."

Syed
Digital Strategist
★★★★★
