
Scrapes products from well-known e-commerce sites such as Amazon, Flipkart, and Myntra. This tool lets you scrape and compare products by price, delivery, image, company, reviews, and more.


Aniket-16-S/Product-Scraper


⚡ AI-Powered Multi-Platform Product Search

Compare products from Amazon, Flipkart, Myntra, and Meesho in seconds.
Built for speed, stealth, and scalability.


🚀 Overview

This project is a high-performance, asynchronous web-scraping engine. It scrapes multiple e-commerce platforms in parallel using Playwright (browser automation) and serves results via a FastAPI backend.

Designed for developers who need robust data extraction without the headache of managing proxies, captchas, or basic DOM parsing.


   


✨ Key Features

🧠 Intelligent Core

  • NLP-Powered Search: Uses the all-MiniLM-L6-v2 sentence-embedding model (via Sentence-Transformers) to interpret user queries and match products more accurately.
  • Smart Caching: SQLite-based caching with a configurable Time-To-Live (TTL). Cached queries return in under 50 ms.
  • Session Management: Handles multiple user sessions with query prioritization.
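
The TTL caching idea can be sketched as follows. This is a minimal illustration, not the project's implementation: it uses the synchronous standard-library `sqlite3` module rather than the async driver listed in the tech stack, and the `TTLCache` class, its table layout, and the `clock` parameter are all hypothetical names chosen for the example.

```python
import json
import sqlite3
import time


class TTLCache:
    """Tiny SQLite-backed cache whose entries expire after ttl_seconds.

    Hypothetical sketch: class name, schema, and parameters are assumptions.
    """

    def __init__(self, path=":memory:", ttl_seconds=3600.0, clock=time.time):
        self.ttl_seconds = ttl_seconds
        self.clock = clock  # injectable so expiry can be exercised without sleeping
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS cache ("
            "query TEXT PRIMARY KEY, payload TEXT NOT NULL, stored_at REAL NOT NULL)"
        )

    def set(self, query, results):
        # Store the results as JSON together with the time they were written.
        self.db.execute(
            "INSERT OR REPLACE INTO cache (query, payload, stored_at) VALUES (?, ?, ?)",
            (query, json.dumps(results), self.clock()),
        )
        self.db.commit()

    def get(self, query):
        row = self.db.execute(
            "SELECT payload, stored_at FROM cache WHERE query = ?", (query,)
        ).fetchone()
        if row is None:
            return None  # never cached
        payload, stored_at = row
        if self.clock() - stored_at > self.ttl_seconds:
            # Entry is older than the TTL: drop it and report a miss.
            self.db.execute("DELETE FROM cache WHERE query = ?", (query,))
            self.db.commit()
            return None
        return json.loads(payload)
```

A caller would check `cache.get(query)` first and only run the scrapers, followed by `cache.set(query, results)`, when it returns `None`.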

🚄 High-Performance Architecture

  • Async First: Built on asyncio and aiohttp for non-blocking operations.
  • Concurrency Control: Implements Semaphores to manage load and rate limits, preventing IP bans.
  • Stealth Mode: Uses headless browser behaviors to mimic real users, bypassing standard bot detection.
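
As a rough sketch of how a semaphore can cap the number of scrapers running at once, consider the example below. The `scrape_platform` placeholder, the `scrape_all` helper, and the `max_concurrent` parameter are assumptions made for illustration; the real scrapers drive a browser and are not shown here.

```python
import asyncio

PLATFORMS = ["amazon", "flipkart", "myntra", "meesho"]


async def scrape_platform(platform, query):
    """Placeholder scraper (hypothetical): sleeps instead of driving a browser."""
    await asyncio.sleep(0.01)
    return {"platform": platform, "query": query, "products": []}


async def scrape_all(query, max_concurrent=2, scraper=scrape_platform):
    """Run one scraper per platform, with at most max_concurrent active at a time."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def limited(platform):
        # Each task waits here until a semaphore slot is free.
        async with semaphore:
            return await scraper(platform, query)

    results = await asyncio.gather(
        *(limited(p) for p in PLATFORMS), return_exceptions=True
    )
    # Drop platforms whose scraper raised, so one failure does not sink the search.
    return [r for r in results if not isinstance(r, BaseException)]


if __name__ == "__main__":
    print(asyncio.run(scrape_all("running shoes")))
```

Lowering `max_concurrent` trades speed for a gentler request rate against the target sites.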

🌐 Universal Coverage

  • Supported Platforms:
    • 🛒 Amazon
    • 🛍️ Flipkart
    • 👗 Myntra
    • 📦 Meesho

🎨 Modern UI

  • Responsive Frontend: Clean, dark-themed interface built with Vanilla JS & CSS.
  • Advanced Filtering: Filter by price, platform, and sort options.
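
The filtering and sorting logic can be expressed compactly. The project's frontend is written in JavaScript; the sketch below shows the same idea in Python, and the `filter_and_sort` function and the `platform` and `price` field names are hypothetical, since the actual product schema is not documented here.

```python
def filter_and_sort(products, platform=None, min_price=None, max_price=None,
                    sort_by="price", descending=False):
    """Keep products matching the given filters, then sort by one field.

    Assumes each product is a dict with "platform" and "price" keys.
    """
    selected = [
        p for p in products
        if (platform is None or p["platform"] == platform)
        and (min_price is None or p["price"] >= min_price)
        and (max_price is None or p["price"] <= max_price)
    ]
    return sorted(selected, key=lambda p: p[sort_by], reverse=descending)
```

For example, `filter_and_sort(products, platform="amazon", max_price=600)` would keep only Amazon listings priced at 600 or less, cheapest first.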

🛠️ Tech Stack

| Component | Technology |
| --- | --- |
| Backend | Python 3.10+, FastAPI, Uvicorn |
| Scraping | Playwright, BS4, Selenium, aiohttp |
| Database | aiosqlite3 |
| ML/NLP | Sentence-Transformers |
| Frontend | HTML5, CSS3, JavaScript |

⚡ Quick Start

Installation

  1. Clone the repository

    git clone https://github.com/Aniket-16-S/Product-Scraper.git
    cd Product-Scraper
  2. Install Python Dependencies

    pip install -r requirements.txt
  3. Install Playwright Browsers

    playwright install chromium

Running the Application

Start the API server (Backend + Frontend served statically):

    python api.py

  • Frontend: Open http://localhost:8000 in your browser.
  • API Docs: Explore endpoints at http://localhost:8000/docs.

🔌 API Endpoints

| Method | Endpoint | Description |
| --- | --- | --- |
| GET | /api/search?q={query} | Search for products across all platforms. |
| GET | /api/admin/stats | View cache hit rates and stored product counts. |
| POST | /api/admin/clear | Flush all cached data. |
| POST | /api/admin/ttl | Set cache Time-To-Live (TTL). |
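
A small client for the search endpoint might look like the sketch below. It uses only the standard library and assumes the server is running locally on port 8000 and returns JSON; the response schema is not documented here, so the result is returned as-is. The `build_search_url` and `search` helper names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "http://localhost:8000"  # assumes the default local server address


def build_search_url(query, base_url=BASE_URL):
    """Return the /api/search URL with the query safely URL-encoded."""
    params = urlencode({"q": query})
    return f"{base_url}/api/search?{params}"


def search(query, base_url=BASE_URL, timeout=60):
    """Call the search endpoint and decode the body (assumed to be JSON)."""
    with urlopen(build_search_url(query, base_url), timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    # Requires the API server to be running (python api.py).
    print(search("running shoes"))
```

The generous timeout is a guess: an uncached search has to wait for several browser-driven scrapers to finish.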

⚠️ Legal Disclaimer

This tool is created for educational and research purposes only.

  • Respect the robots.txt of all target websites.
  • Do not use this tool for high-frequency scraping that degrades service for others.
  • The authors are not responsible for any misuse of this software.

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.


Crafted with ❤️ by Aniket
