JobStreet Jobs Scraper collects structured job listing data so you can analyze hiring demand, track roles by region, and build clean datasets fast. It focuses on essential job posting fields and keeps output predictable for dashboards, research, and recruitment workflows. Use this JobStreet scraper to turn search results into data you can query and plug into automations.
Created by Bitbash, built to showcase our approach to Scraping and Automation!
If you are looking for a jobstreet-scraper, you've just found your team. Let's Chat. 👆👆
This project extracts job listings from JobStreet search results and returns a consistent JSON dataset containing key job posting details. It solves the problem of manually copying listings across pages by providing a repeatable, filterable workflow driven by simple inputs. It’s built for recruiters, analysts, founders, and developers who want reliable job market data for reporting, alerts, and trend analysis.
- Supports keyword-first job discovery with optional location targeting
- Adds posted-date filters to keep datasets fresh (24h / 7d / 30d / anytime)
- Handles pagination to collect results across multiple listing pages
- Exports clean, minimal columns for fast post-processing
- Includes proxy + cookie options for stability in different regions
| Feature | Description |
|---|---|
| Keyword-based search | Extracts job listings using a required keyword input for targeted role discovery. |
| Location filtering | Optional location filter to focus on specific cities/countries/regions. |
| Posted-date filtering | Limits results to recent jobs (24h, 7d, 30d, anytime) for fresher datasets. |
| Pagination handling | Automatically walks through result pages to collect more listings. |
| Max jobs / max pages limits | Safety controls to cap scraping scope and runtime. |
| Cookie support | Accepts raw cookies or JSON cookie objects to handle consent/restricted views. |
| Proxy configuration | Works with proxy settings to improve reliability and reduce blocking. |
| Dual description formats | Captures both HTML and clean plain-text job descriptions. |
| JSON output dataset | Produces structured JSON items ready for pipelines, DB inserts, or spreadsheets. |
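For orientation, here is a minimal input sketch using the option names referenced in this README (keyword, location, posted_date, maxJobs, maxPages). The authoritative shape lives in src/config/input_schema.json and data/input.example.json, so treat this as illustrative rather than exact:

```json
{
  "keyword": "data analyst",
  "location": "Kuala Lumpur",
  "posted_date": "7d",
  "maxJobs": 200,
  "maxPages": 10
}
```

Cookie options (cookies / cookiesJson) and proxy settings layer on top of this when you need to get past consent prompts or regional blocking.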
| Field Name | Field Description |
|---|---|
| title | Job title shown on the listing. |
| company | Hiring company name associated with the job. |
| location | Job location (city/region/country as displayed). |
| date_posted | Posting date or relative age of the listing. |
| description_html | Job description in HTML format for rich formatting preservation. |
| description_text | Plain-text job description for indexing, NLP, and analytics. |
| url | Direct link to the job posting page. |
```json
[
  {
    "title": "Data Analyst",
    "company": "Example Tech Sdn Bhd",
    "location": "Kuala Lumpur",
    "date_posted": "7d",
    "description_html": "<p>We are looking for a Data Analyst to build dashboards and reports...</p>",
    "description_text": "We are looking for a Data Analyst to build dashboards and reports...",
    "url": "https://www.jobstreet.com/job/12345678"
  },
  {
    "title": "Software Developer",
    "company": "Sample Solutions Pte Ltd",
    "location": "Singapore",
    "date_posted": "24h",
    "description_html": "<p>Join our engineering team to ship scalable services...</p>",
    "description_text": "Join our engineering team to ship scalable services...",
    "url": "https://www.jobstreet.com/job/87654321"
  }
]
```
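Because each run yields a flat JSON array like the one above, downstream analysis stays simple. A minimal sketch, assuming the dataset has been saved to data/sample_output.json and that pandas is installed:

```python
import pandas as pd

# Load the scraped listings (a JSON array of job objects) into a DataFrame.
df = pd.read_json("data/sample_output.json")

# Count roles per location to see where hiring demand concentrates.
demand_by_location = df.groupby("location")["title"].count().sort_values(ascending=False)
print(demand_by_location.head(10))

# Keep only the freshest postings, e.g. anything listed within the last day.
recent = df[df["date_posted"] == "24h"]
print(f"{len(recent)} listings posted in the last 24 hours")
```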
```
JobStreet Scraper/
├── src/
│   ├── main.py
│   ├── runner.py
│   ├── clients/
│   │   ├── http_client.py
│   │   └── browser_client.py
│   ├── extractors/
│   │   ├── listing_parser.py
│   │   ├── detail_parser.py
│   │   └── text_cleaner.py
│   ├── pipelines/
│   │   ├── paginate.py
│   │   ├── dedupe.py
│   │   └── validate.py
│   ├── outputs/
│   │   ├── dataset_writer.py
│   │   └── exporters.py
│   └── config/
│       ├── input_schema.json
│       └── settings.example.json
├── data/
│   ├── input.example.json
│   └── sample_output.json
├── tests/
│   ├── test_parsers.py
│   └── test_validation.py
├── .env.example
├── .gitignore
├── requirements.txt
├── pyproject.toml
└── README.md
```
- Recruitment teams use it to monitor new roles by keyword and location, so they can respond faster to hiring trends and competitor postings.
- Job market analysts use it to collect weekly datasets, so they can measure demand shifts across regions and job titles.
- HR tech builders use it to feed job listings into search and recommendation systems, so they can improve matching quality and coverage.
- Data teams use it to create a structured job dataset for NLP, so they can extract skills, seniority, and role taxonomy at scale.
- Founders and operators use it to track hiring signals in specific industries, so they can validate market direction and expansion opportunities.
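For the NLP-oriented use cases above, description_text is the natural starting point. A deliberately naive sketch of keyword-based skill tagging; the file path and the skill vocabulary are placeholder choices, not part of the scraper:

```python
import json

# Tiny hand-picked skill vocabulary; a real pipeline would use a proper taxonomy.
SKILLS = {"python", "sql", "tableau", "power bi", "excel", "aws"}

with open("data/sample_output.json", encoding="utf-8") as f:
    jobs = json.load(f)

for job in jobs:
    text = job.get("description_text", "").lower()
    found = sorted(skill for skill in SKILLS if skill in text)
    print(job["title"], "->", found or "no matched skills")
```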
Q1: What inputs are required to run the scraper?
Only keyword is required; everything else is optional. Add location to narrow the search, and use posted_date to restrict results to recent jobs (24h/7d/30d/anytime). Use maxJobs and maxPages to control runtime and scope.
Q2: Why do I see fewer results than expected?
This usually happens when filters are too strict (tight location, short posted_date, or low maxPages). Try widening your location, setting posted_date to anytime, or increasing maxPages. If access restrictions appear, provide cookies and enable proxy configuration.
Q3: What’s the difference between description_html and description_text?
description_html preserves formatting like bullet lists and links for storage or rendering. description_text is cleaned and easier to index in search engines, store in databases, or use for NLP tasks like skill extraction.
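If you only store description_html and later need the text form, the conversion is cheap. The project's own cleaning logic lives in src/extractors/text_cleaner.py; the snippet below is just an illustrative equivalent using BeautifulSoup, not the actual implementation:

```python
from bs4 import BeautifulSoup

def html_to_text(description_html: str) -> str:
    """Strip tags and collapse whitespace, roughly mirroring description_text."""
    soup = BeautifulSoup(description_html, "html.parser")
    return soup.get_text(separator=" ", strip=True)

print(html_to_text("<p>We are looking for a Data Analyst to build dashboards...</p>"))
```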
Q4: How do I keep the scraper stable for larger runs?
Use a proxy configuration, keep maxPages reasonable, and favor multiple smaller runs (e.g., per location or per keyword cluster). If you face consent prompts or restricted content, pass cookies or cookiesJson to maintain consistent access.
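One way to apply the "multiple smaller runs" advice is to generate one capped input payload per keyword/location pair and launch them separately. A hedged sketch; run_scraper is a placeholder for however you actually trigger the scraper:

```python
import itertools

keywords = ["data analyst", "data engineer"]
locations = ["Kuala Lumpur", "Singapore"]

def run_scraper(payload: dict) -> None:
    # Placeholder: swap in your real entry point (CLI call, API trigger, etc.).
    print("would run with:", payload)

# One small, capped run per (keyword, location) pair keeps paging conservative.
for keyword, location in itertools.product(keywords, locations):
    run_scraper({
        "keyword": keyword,
        "location": location,
        "posted_date": "7d",
        "maxPages": 3,
    })
```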
Primary Metric: Averages 20–45 job listings per minute on typical searches, depending on page size, filters, and network conditions.
Reliability Metric: 95–99% successful page retrieval rate when using proxies and conservative page limits; lower stability is commonly tied to aggressive paging without proxies.
Efficiency Metric: Collects ~500 listings while staying within roughly 300–500 MB of RAM in headless mode with capped pagination and deduplication enabled.
Quality Metric: Typically achieves 90–98% field completeness on core fields (title/company/location/url), with description coverage varying by listing structure and access restrictions.
