Chemistwarehouse Reviews Scraper collects structured customer review data from Chemist Warehouse product pages to support analysis, insights, and decision-making. It helps teams turn raw customer feedback into actionable intelligence using clean, consistent review data.
Created by Bitbash to showcase our approach to scraping and automation.
If you are looking for chemistwarehouse-reviews-spider, you've just found your team. Let's chat.
This project extracts detailed customer reviews from Chemist Warehouse product pages and converts them into structured datasets. It solves the problem of manually collecting and normalizing large volumes of customer feedback. It is built for analysts, researchers, and teams working with product reviews and consumer sentiment.
- Processes multiple product URLs in a single run for efficient batch analysis
- Normalizes ratings, titles, dates, and product metadata into consistent fields
- Supports sentiment analysis and downstream analytics workflows
- Designed for reliability on dynamic, content-heavy product pages
| Feature | Description |
|---|---|
| Review data capture | Extracts review IDs, ratings, titles, and full review text where available. |
| Product metadata mapping | Associates each review with product name, segment, target gender, country, and source URL. |
| Batch URL processing | Handles multiple product pages in one execution for scalability. |
| Structured JSON output | Produces clean, analysis-ready data suitable for pipelines and dashboards. |
| Error resilience | Keeps the output schema stable when optional fields are missing by setting them to null. |
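The error-resilience behaviour described above (missing fields kept as null rather than dropped) can be illustrated with a short Python sketch. This is not the project's actual code: `OUTPUT_FIELDS` mirrors the keys of the sample record in this README, while `normalize_review` and the shape of its `raw` input dict are hypothetical.

```python
from typing import Any, Optional

# Field order copied from the README's sample output record.
OUTPUT_FIELDS = [
    "Product_Id", "Review_Id", "Rating", "Title", "Body", "Source",
    "Full_Review", "Product_Name", "Product_Segment", "Gender",
    "Country", "Date", "URL", "Crawled_Date",
]


def normalize_review(raw: dict[str, Any]) -> dict[str, Optional[Any]]:
    """Hypothetical helper: map a raw review dict onto the fixed schema.

    Every output record gets exactly the keys in OUTPUT_FIELDS.
    Missing keys and empty strings become None (JSON null), and keys
    outside the schema are dropped.
    """
    record: dict[str, Optional[Any]] = {}
    for field in OUTPUT_FIELDS:
        value = raw.get(field)
        if isinstance(value, str):
            # Trim whitespace; an empty string counts as missing.
            value = value.strip() or None
        record[field] = value
    return record
```

Because every record carries the same keys in the same order, downstream tools can rely on the schema even when a page omits, for example, the review body.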
| Field Name | Field Description |
|---|---|
| Product_Id | Unique identifier of the reviewed product. |
| Review_Id | Unique identifier of the customer review. |
| Rating | Numeric rating score given by the customer. |
| Title | Short headline or title of the review. |
| Body | Full review text when available. |
| Source | Site the review was collected from (for example, `ChemistWarehouse`). |
| Full_Review | Title and body combined into a single string for analysis. |
| Product_Name | Name of the product being reviewed. |
| Product_Segment | High-level product category. |
| Gender | Target gender segment if available. |
| Country | Market or country associated with the review. |
| Date | Original review publication date. |
| URL | Source product page URL. |
| Crawled_Date | Date when the data was collected. |
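In the sample record, `Product_Id` and `URL` are linked: the ID `143831` also appears as the path segment after `/buy/` in the product URL. Assuming that pattern holds for other product pages (an inference from a single sample, not a documented guarantee), a hypothetical helper could recover or cross-check the ID from the URL.

```python
from typing import Optional
from urllib.parse import urlparse


def product_id_from_url(url: str) -> Optional[str]:
    """Hypothetical helper: return the numeric ID from a /buy/<id>/<slug> URL.

    The /buy/<id>/ layout is assumed from the README's sample URL.
    Returns None when the path does not match that layout.
    """
    parts = [p for p in urlparse(url).path.split("/") if p]
    if len(parts) >= 2 and parts[0] == "buy" and parts[1].isdigit():
        return parts[1]
    return None
```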
```json
[
  {
    "Product_Id": "143831",
    "Review_Id": "1184204469",
    "Rating": 5,
    "Title": "Love that its sugar free and neutral taste",
    "Body": null,
    "Source": "ChemistWarehouse",
    "Full_Review": "Love that its sugar free and neutral taste: None",
    "Product_Name": "Mineralyte Hydrate Hydration Drops Unflavoured 125Ml 124 Serves 2Pack",
    "Product_Segment": "Vitamins",
    "Gender": "Unisex",
    "Country": "Australia",
    "Date": "04-05-2025",
    "URL": "https://www.chemistwarehouse.com.au/buy/143831/mineralyte-hydrate-hydration-drops-unflavoured-125ml-124-serves-2pack",
    "Crawled_Date": "07-09-2025"
  }
]
```
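As a sketch of consuming this output, the hypothetical helpers below load a JSON array shaped like the sample record and average `Rating` per `Product_Id`, skipping null ratings. The function names are illustrative only and are not part of the project.

```python
import json
from collections import defaultdict
from typing import Any


def load_records(path: str) -> list[dict[str, Any]]:
    """Hypothetical helper: read a JSON array of review records."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def average_rating_by_product(records: list[dict[str, Any]]) -> dict[str, float]:
    """Hypothetical helper: mean Rating per Product_Id, ignoring null ratings."""
    ratings: dict[str, list[float]] = defaultdict(list)
    for rec in records:
        if rec.get("Rating") is not None:
            ratings[rec["Product_Id"]].append(float(rec["Rating"]))
    return {pid: sum(vals) / len(vals) for pid, vals in ratings.items()}
```

For a file containing only the record above, `average_rating_by_product(load_records("data/sample_output.json"))` would return `{"143831": 5.0}`.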
```
Chemistwarehouse Reviews Spider/
├── src/
│   ├── runner.py
│   ├── extractors/
│   │   ├── reviews_parser.py
│   │   └── product_mapper.py
│   ├── outputs/
│   │   └── json_exporter.py
│   └── config/
│       └── settings.example.json
├── data/
│   ├── inputs.sample.txt
│   └── sample_output.json
├── requirements.txt
└── README.md
```
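This README does not document how `runner.py` connects these modules, so the following is only a hypothetical sketch of the batch flow it describes: read URLs one per line (as in `inputs.sample.txt`), process each URL independently, merge the results, and write a JSON array like `sample_output.json`. The `extract` callable stands in for whatever `reviews_parser.py` and `product_mapper.py` actually do; all function names here are assumptions.

```python
import json
from typing import Any, Callable, Iterable

# Hypothetical extractor signature: one product URL in, a list of review
# records out. The real extractors may be structured differently.
Extractor = Callable[[str], list[dict[str, Any]]]


def read_urls(lines: Iterable[str]) -> list[str]:
    """Collect one URL per line, skipping blank lines and duplicates."""
    seen: set[str] = set()
    urls: list[str] = []
    for line in lines:
        url = line.strip()
        if url and url not in seen:
            seen.add(url)
            urls.append(url)
    return urls


def run(urls: list[str], extract: Extractor) -> tuple[list[dict[str, Any]], dict[str, str]]:
    """Process each URL independently and merge the records into one list.

    A failure on one URL is recorded in the error map and does not stop
    the rest of the batch.
    """
    records: list[dict[str, Any]] = []
    errors: dict[str, str] = {}
    for url in urls:
        try:
            records.extend(extract(url))
        except Exception as exc:
            errors[url] = repr(exc)
    return records, errors


def export_json(records: list[dict[str, Any]], path: str) -> None:
    """Write the merged records as a UTF-8 JSON array."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(records, fh, ensure_ascii=False, indent=2)
```

Passing the extractor in as a function keeps the sketch testable without network access, and collecting errors per URL matches the README's claim that each URL is processed independently.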
- Market analysts use it to aggregate customer feedback, so they can identify product strengths and weaknesses.
- E-commerce teams use it to monitor review trends, so they can optimize listings and conversions.
- Product managers use it to study customer sentiment, so they can guide product improvements.
- Data scientists use it to build sentiment models, so they can quantify consumer opinions at scale.
**What inputs are required to run this project?** You only need a list of Chemist Warehouse product page URLs. The system processes each URL independently and merges the results into a single dataset.

**Does it support multiple products at once?** Yes. It is designed for batch processing and can handle multiple product URLs in one run.

**What happens if some reviews are missing fields?** The output remains consistent, with unavailable fields set to null to preserve schema stability.

**Is the output suitable for analytics tools?** Yes. The structured JSON format is designed for direct use in data analysis, dashboards, and machine learning pipelines.
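Some dashboard and spreadsheet tools prefer CSV to JSON. A minimal sketch using only Python's standard `csv` module could flatten the records, assuming all records share the same keys (which the stable schema is meant to guarantee). The function name is hypothetical, and null values become empty cells.

```python
import csv
import io
from typing import Any


def records_to_csv(records: list[dict[str, Any]]) -> str:
    """Hypothetical helper: render review records as CSV text.

    Column order follows the keys of the first record; None values are
    written as empty cells by csv.DictWriter.
    """
    if not records:
        return ""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(records[0].keys()), lineterminator="\n")
    writer.writeheader()
    for rec in records:
        writer.writerow(rec)
    return buf.getvalue()
```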
- Primary Metric: Processes an average product page with reviews in under 3 seconds.
- Reliability Metric: Maintains a success rate above 98% across tested product categories.
- Efficiency Metric: Supports high-throughput batch runs with minimal memory overhead.
- Quality Metric: Delivers consistently structured records with high field completeness for core review attributes.
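The benchmark figures above are reported without a measurement method. One plausible way to compute the success-rate and field-completeness metrics from a run's output is sketched below; the choice of `CORE_FIELDS` and both function names are assumptions for illustration, not the project's own definitions.

```python
from typing import Any

# Assumed set of "core review attributes"; the project may define these differently.
CORE_FIELDS: tuple[str, ...] = ("Review_Id", "Rating", "Title", "Date")


def field_completeness(
    records: list[dict[str, Any]], fields: tuple[str, ...] = CORE_FIELDS
) -> dict[str, float]:
    """Fraction of records in which each field is non-null."""
    if not records:
        return {f: 0.0 for f in fields}
    return {f: sum(r.get(f) is not None for r in records) / len(records) for f in fields}


def success_rate(total_urls: int, failed_urls: int) -> float:
    """Share of input URLs processed without error."""
    if total_urls == 0:
        return 0.0
    return (total_urls - failed_urls) / total_urls
```

For example, 49 successful URLs out of 50 gives `success_rate(50, 1) == 0.98`.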
