This project is a Python-based web scraper that automates the process of extracting product details from Amazon's Best Sellers section using Selenium. The script can scrape data from multiple categories, filter products with discounts greater than 50%, and save the results in a structured CSV file.
- Authentication: Logs in to Amazon using user-provided credentials.
- Data Extraction:
  - Scrapes details for products in 10 categories.
  - Filters products with discounts greater than 50%.
  - Handles pagination to extract up to 1500 best-selling products per category.
- Output:
  - Saves scraped data in a CSV file for easy access and analysis.
- Error Handling:
  - Manages missing elements, timeouts, and empty datasets gracefully.
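
The discount filter described above can be sketched in a few lines. This is a minimal illustration, not the script's actual code: the function names `parse_discount` and `filter_discounted`, the `"discount"` dictionary key, and the assumption that discounts arrive as text such as `55%` or `-55%` are all hypothetical.

```python
import re
from typing import Optional

DISCOUNT_THRESHOLD = 50.0  # percent; the cutoff stated in this README


def parse_discount(text: str) -> Optional[float]:
    """Extract a percentage (e.g. 55.0) from text like '-55%' or '55% off'."""
    match = re.search(r"(\d+(?:\.\d+)?)\s*%", text or "")
    return float(match.group(1)) if match else None


def filter_discounted(products, threshold=DISCOUNT_THRESHOLD):
    """Keep products whose discount is strictly greater than the threshold.

    Each product is assumed to be a dict with a 'discount' text field
    (a hypothetical shape; the real script may store this differently).
    Products with a missing or unparseable discount are skipped.
    """
    kept = []
    for product in products:
        discount = parse_discount(product.get("discount", ""))
        if discount is not None and discount > threshold:
            kept.append(product)
    return kept
```

Because the comparison is strict, a product with exactly 50% off is excluded, which matches the "greater than 50%" wording.
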
- Python: Programming language.
- Selenium: For web automation and scraping.
- CSV: For storing the extracted data.
- Python 3.8+ installed on your system.
- Google Chrome browser.
- ChromeDriver: Ensure the version matches your Chrome browser.
  - Download ChromeDriver: https://sites.google.com/chromium.org/driver/
- Install required Python libraries:

      pip install selenium
- Clone the Repository:

      git clone https://github.com/your-username/amazon-best-sellers-scraper.git
      cd amazon-best-sellers-scraper

- Set Up ChromeDriver:
  - Download and extract ChromeDriver.
  - Update the path to chromedriver.exe in the script:

        driver = webdriver.Chrome(service=Service("C:\\path\\to\\chromedriver.exe"))
- Install Dependencies:

      pip install -r requirements.txt

- Run the Script:

      python amazon_scraper.py
- Enter Amazon Credentials:
  - You will be prompted to enter your Amazon username and password.
- Scraper Execution:
  - The script will log in to Amazon and scrape product data from the specified categories.
  - Data is saved in output.csv in the same directory as the script.
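
The credential prompt in the steps above could look roughly like the following. This is only a sketch using the standard library; the function name `prompt_credentials` and the prompt wording are assumptions, and the real script may collect credentials differently. `getpass` is used so the password is not echoed to the terminal.

```python
from getpass import getpass


def prompt_credentials(ask=input, ask_secret=getpass):
    """Return (username, password) read interactively.

    `ask` and `ask_secret` are injectable so the function can be tested
    without a terminal; by default they are input() and getpass().
    """
    username = ask("Amazon username (email or phone): ").strip()
    password = ask_secret("Amazon password: ")
    if not username or not password:
        raise ValueError("Both username and password are required.")
    return username, password


# Typical interactive use (hypothetical):
#   username, password = prompt_credentials()
```

Reading credentials at run time, rather than hard-coding them in the script, keeps them out of version control.
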
amazon-best-sellers-scraper/
├── README.md # Project documentation
├── requirements.txt # List of dependencies
├── amazon_scraper.py # Main Python script
└── output/ # Folder for storing scraped data
    └── output.csv # Example output file
- File: output.csv
- Columns:
  - Product Name
  - Product Price
  - Sale Discount
  - Category Name
  - Additional product details as per scraping requirements.
| Name | Price | Discount | Category |
|---|---|---|---|
| Product 1 | ₹499 | 55% | Electronics |
| Product 2 | ₹999 | 60% | Kitchen |
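
As an illustration of how rows like these might be written, here is a minimal sketch using Python's `csv` module. The header names are taken from the column list above and may not match the script's real header; the function name `save_products` and the default path are assumptions.

```python
import csv
from pathlib import Path

# Column names copied from the list in this README; the real header may differ.
FIELDNAMES = ["Product Name", "Product Price", "Sale Discount", "Category Name"]


def save_products(rows, path="output.csv"):
    """Write a list of dicts keyed by FIELDNAMES to a CSV file; return the row count."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    # utf-8-sig adds a BOM so spreadsheet tools detect UTF-8 (useful for the
    # rupee sign); newline="" prevents blank lines between rows on Windows.
    with path.open("w", newline="", encoding="utf-8-sig") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDNAMES, extrasaction="ignore")
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

Calling `save_products([])` still writes the header row, so an empty result set produces a valid CSV rather than a zero-byte file.
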
- Empty Output:
  - If no products match the filter criteria (e.g., discounts > 50%), the script may produce an empty output.
  - Ensure the specified categories have sufficient products meeting the criteria.
- Two-Factor Authentication (2FA):
  - The script does not handle 2FA. If enabled on your Amazon account, you must manually complete the login process.
- CAPTCHA Challenges:
  - If Amazon presents a CAPTCHA, the script cannot bypass it and will require manual intervention.
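
One way to leave room for the manual steps mentioned above (2FA or a CAPTCHA) is to pause and poll until the user has finished in the browser. The helper below is a hypothetical sketch, not part of the script: `wait_for_manual_step` and its parameters are assumed names, and the caller supplies the check that decides when the manual step is done.

```python
import time


def wait_for_manual_step(is_done, timeout=120.0, poll_interval=2.0,
                         sleep=time.sleep, clock=time.monotonic):
    """Poll is_done() until it returns True or `timeout` seconds elapse.

    Returns True if the manual step finished in time, False on timeout.
    `sleep` and `clock` are injectable so the loop can be tested quickly.
    """
    deadline = clock() + timeout
    while True:
        if is_done():
            return True
        if clock() >= deadline:
            return False
        sleep(poll_interval)
```

With Selenium, the check might be something like `lambda: "signin" not in driver.current_url`, assuming the sign-in page URL contains that fragment; that assumption should be verified against the pages Amazon actually serves.
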
This project is licensed under the MIT License. See the LICENSE file for details.
Contributions are welcome! Please fork the repository and create a pull request for review.
For questions or support, please contact:
- Name: Devendra Mahesh Chaurasiya
- Email: devendrachaurasiya2004@gmail.com
- GitHub: Devendra-Chaurasiya