This scraper extracts essential product data from an e-commerce website selling climate control products, focusing on key attributes for each product. The data is then formatted into a well-structured markdown file, making it easy to integrate into an AI assistant's knowledge base.
Created by Bitbash, built to showcase our approach to Scraping and Automation!
If you are looking for a Webshop Product Data Scraper, you've just found your team. Let's Chat.
This project scrapes detailed product information from a specified webshop, focusing on relevant data points like product names, descriptions, prices, and categories. It solves the problem of collecting this data in a structured and categorized format, making it suitable for AI training or other automated systems.
This scraper is ideal for anyone in need of structured product data from an e-commerce platform. It simplifies the data extraction process, ensuring that all critical details are collected efficiently and presented in an easily digestible markdown format.
- Enables automated data collection from e-commerce sites, saving time and effort.
- Helps AI assistants access structured product data for improved decision-making or analysis.
- Facilitates better data categorization, which can lead to enhanced user experience and personalization.
| Feature | Description |
|---|---|
| Data Extraction | Scrapes key product data including name, price, description, and category. |
| Markdown Output | Formats extracted data into a structured and readable markdown file. |
| Customizable Fields | Allows for flexible data extraction based on specified product attributes. |
| Automated Crawling | Uses Selenium to automate the web scraping process. |
| Field Name | Field Description |
|---|---|
| product_name | Name of the product being sold. |
| price | Price of the product. |
| description | Detailed description of the product. |
| category | The category under which the product is listed. |
| availability | Stock status of the product. |
| link | URL to the product page for reference. |
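The six fields above could be modeled as a small record type. The following is only a sketch: the `Product` class and its `from_raw` helper are hypothetical names, not part of the project's code, and defaulting missing fields to an empty string is an assumption.

```python
from dataclasses import dataclass, asdict, fields


@dataclass
class Product:
    """One scraped product, mirroring the six fields in the table above."""
    product_name: str
    price: str          # kept as a string, as in the sample output
    description: str
    category: str
    availability: str
    link: str

    @classmethod
    def from_raw(cls, raw: dict) -> "Product":
        """Build a Product from a raw dict.

        Values are converted to str and stripped; fields missing from
        `raw` default to an empty string (an assumption of this sketch).
        """
        return cls(**{f.name: str(raw.get(f.name, "")).strip() for f in fields(cls)})


# Example: a partially filled raw record with stray whitespace.
record = Product.from_raw({
    "product_name": "  Climate Control Air Conditioner ",
    "price": "299.99",
    "category": "Air Conditioners",
})
print(asdict(record))
```

Keeping every field as a string matches the sample output, where `price` is quoted; converting it to a number would be a separate design decision.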
```json
[
  {
    "product_name": "Climate Control Air Conditioner",
    "price": "299.99",
    "description": "High-efficiency air conditioner with modern features.",
    "category": "Air Conditioners",
    "availability": "In Stock",
    "link": "https://naitec.igkuair.eu/termek-lista/collections/minden-termek/products/climate-control-air-conditioner"
  }
]
```
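Records shaped like the sample above could be turned into markdown roughly as follows. This is a minimal sketch: the actual layout of `product_data.md` is not documented here, so the heading structure, the grouping by category, and the function name `products_to_markdown` are all assumptions.

```python
import json

# Sample input, copied from the example output above.
SAMPLE_JSON = """
[
  {
    "product_name": "Climate Control Air Conditioner",
    "price": "299.99",
    "description": "High-efficiency air conditioner with modern features.",
    "category": "Air Conditioners",
    "availability": "In Stock",
    "link": "https://naitec.igkuair.eu/termek-lista/collections/minden-termek/products/climate-control-air-conditioner"
  }
]
"""


def products_to_markdown(products):
    """Group products by category and render one markdown section per product.

    The layout (## category, ### product, bullet per field) is hypothetical.
    """
    by_category = {}
    for product in products:
        category = product.get("category") or "Uncategorized"
        by_category.setdefault(category, []).append(product)

    lines = []
    for category in sorted(by_category):
        lines.append(f"## {category}")
        lines.append("")
        for product in by_category[category]:
            lines.append(f"### {product['product_name']}")
            lines.append("")
            # Emit only the fields that are present and non-empty.
            for field in ("price", "availability", "link", "description"):
                if product.get(field):
                    lines.append(f"- {field}: {product[field]}")
            lines.append("")
    return "\n".join(lines)


sample = json.loads(SAMPLE_JSON)
markdown = products_to_markdown(sample)
print(markdown)
```

Grouping by category keeps related products together, which may make the file easier for an assistant to retrieve from, though a flat list would work equally well.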
```
webshop-product-data-scraper/
├── src/
│   ├── scraper.py
│   ├── extractors/
│   │   ├── product_parser.py
│   │   └── utils.py
│   └── config/
│       └── settings.example.json
├── data/
│   ├── inputs.sample.json
│   └── product_data.md
├── requirements.txt
└── README.md
```
- Retailers use it to extract product data from competitor websites, so they can analyze market trends and pricing strategies.
- AI developers use it to gather structured data for training intelligent systems or virtual assistants.
- Data scientists use it to collect detailed e-commerce data for market research or predictive modeling.
Q: How do I run this scraper?
A: Clone the repository, install the dependencies listed in requirements.txt (for example with pip install -r requirements.txt), and run src/scraper.py with Python.
Q: Can I modify the fields being scraped?
A: Yes, you can customize the fields by modifying the product_parser.py file to fit your needs.
Q: Is this scraper limited to one webshop?
A: Currently, it's set up for the specified webshop, but you can adapt it to other websites with similar structures by adjusting the scraping logic.
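One way to make both field customization and adaptation to another shop manageable is to keep the field-to-selector mapping in a single place. The sketch below illustrates that idea only; it does not reproduce the contents of `product_parser.py`, and `FIELD_SELECTORS`, `extract_fields`, and every CSS selector shown are hypothetical.

```python
# Hypothetical field-to-CSS-selector map. Real selectors depend on the
# target shop's HTML and would need to be looked up in the page source.
FIELD_SELECTORS = {
    "product_name": "h1.product-title",
    "price": "span.price",
    "description": "div.product-description",
}


def extract_fields(find_text, selectors):
    """Return {field: text} for every configured field.

    `find_text(css)` must return the text for a CSS selector, or None
    when nothing matches. Missing fields become empty strings.
    """
    record = {}
    for field, css in selectors.items():
        text = find_text(css)
        record[field] = text.strip() if text else ""
    return record


# With Selenium, find_text could wrap
#   driver.find_element(By.CSS_SELECTOR, css).text
# and catch NoSuchElementException to return None.
# A plain dict stands in for the page here so the sketch runs without a browser.
fake_page = {
    "h1.product-title": " Climate Control Air Conditioner ",
    "span.price": "299.99",
}
record = extract_fields(fake_page.get, FIELD_SELECTORS)
print(record)
```

Under this design, scraping an extra attribute or targeting a different shop would mean editing the selector map rather than the extraction loop.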
- Primary Metric: Average scraping speed of 3 products per second.
- Reliability Metric: 98% success rate in extracting product data without errors.
- Efficiency Metric: Optimized to use minimal memory, running on standard server configurations.
- Quality Metric: Ensures data completeness and accuracy with 99% precision in scraped data fields.
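Throughput and success rate of the kind quoted above could be derived from simple per-run counters. The sketch below shows the arithmetic only; `RunStats` is a hypothetical helper, and the counts in the example are illustrative numbers chosen to match the stated figures, not measurements from this project.

```python
from dataclasses import dataclass


@dataclass
class RunStats:
    attempted: int          # product pages requested
    succeeded: int          # pages that yielded a record without errors
    elapsed_seconds: float  # wall-clock duration of the run

    @property
    def products_per_second(self) -> float:
        """Successful products divided by elapsed time (0.0 if no time elapsed)."""
        if self.elapsed_seconds <= 0:
            return 0.0
        return self.succeeded / self.elapsed_seconds

    @property
    def success_rate(self) -> float:
        """Fraction of attempted pages that succeeded (0.0 if none attempted)."""
        if self.attempted == 0:
            return 0.0
        return self.succeeded / self.attempted


# Illustrative counts only, not real measurements.
stats = RunStats(attempted=300, succeeded=294, elapsed_seconds=98.0)
print(f"{stats.products_per_second:.1f} products/s, {stats.success_rate:.0%} success")
```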
