Compare products from Amazon, Flipkart, Myntra, and Meesho in seconds.
Built for speed, stealth, and scalability.
This project is a high-performance, asynchronous web-scraping engine. It scrapes multiple e-commerce platforms in parallel using Playwright (browser automation) and serves results via a FastAPI backend.
Designed for developers who need robust data extraction without the headache of managing proxies, captchas, or basic DOM parsing.
- NLP-Powered Search: Uses `all-MiniLM-L6-v2` to understand user queries and match products more accurately.
- Smart Caching: SQLite-based caching with Time-To-Live (TTL) mechanisms. Cached queries return instantly (<50ms).
- Session Management: Handles multiple user sessions with query prioritization.
- Async First: Built on `asyncio` and `aiohttp` for non-blocking operations.
- Concurrency Control: Implements Semaphores to manage load and rate limits, preventing IP bans.
- Stealth Mode: Uses headless browser behaviors to mimic real users, bypassing standard bot detection.
- Supported Platforms:
- 🛒 Amazon
- 🛍️ Flipkart
- 👗 Myntra
- 📦 Meesho
- Responsive Frontend: Clean, dark-themed interface built with Vanilla JS & CSS.
- Advanced Filtering: Filter by price, platform, and sort options.
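
The Concurrency Control feature above can be illustrated with a minimal sketch. It assumes an `asyncio.Semaphore` is what caps the number of scrapers running at once; `scrape_platform`, `scrape_all`, and the `max_concurrent` parameter are hypothetical names for illustration, not the project's actual functions, and the `sleep` call stands in for real browser work.

```python
import asyncio

async def scrape_platform(name, semaphore, state):
    """Hypothetical stand-in for a per-platform scraper."""
    async with semaphore:  # wait here if too many scrapers are already running
        state["active"] += 1
        state["peak"] = max(state["peak"], state["active"])
        await asyncio.sleep(0.01)  # placeholder for real Playwright work
        state["active"] -= 1
        return {"platform": name, "products": []}

async def scrape_all(platforms, max_concurrent=2):
    """Scrape all platforms in parallel, at most max_concurrent at a time."""
    semaphore = asyncio.Semaphore(max_concurrent)
    state = {"active": 0, "peak": 0}  # tracks how many scrapers ran at once
    results = await asyncio.gather(
        *(scrape_platform(p, semaphore, state) for p in platforms)
    )
    return results, state["peak"]

if __name__ == "__main__":
    results, peak = asyncio.run(
        scrape_all(["amazon", "flipkart", "myntra", "meesho"])
    )
    print([r["platform"] for r in results], "peak concurrency:", peak)
```

`asyncio.gather` returns results in the order the tasks were passed in, so the output order matches the platform list even though the scrapers finish at different times.
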
| Component | Technology |
|---|---|
| Backend | Python 3.10+, FastAPI, Uvicorn |
| Scraping | Playwright, BeautifulSoup4, Selenium, aiohttp |
| Database | aiosqlite3 |
| ML/NLP | Sentence-Transformers |
| Frontend | HTML5, CSS3, JavaScript |
1. Clone the repository:

       git clone https://github.com/Aniket-16-S/product-Sraper.git
       cd product-Sraper

2. Install Python dependencies:

       pip install -r requirements.txt

3. Install Playwright browsers:

       playwright install chromium

Start the API server (backend, with the frontend served statically):

    python api.py

- Frontend: Open http://localhost:8000 in your browser.
- API Docs: Explore endpoints at http://localhost:8000/docs.
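
Once the server is running, the search endpoint can also be called from a script. The sketch below uses only the standard library and assumes the server listens on `localhost:8000` and that `/api/search` returns JSON (FastAPI's default); the response structure is not documented here, so the result is returned unparsed beyond JSON decoding. `build_search_url` and `search` are hypothetical helper names.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "http://localhost:8000"  # assumed default address of the API server

def build_search_url(query, base_url=BASE_URL):
    """Build the /api/search URL with the query safely URL-encoded."""
    return f"{base_url}/api/search?{urlencode({'q': query})}"

def search(query, base_url=BASE_URL, timeout=60):
    """Call the search endpoint and decode the body, assuming it is JSON."""
    with urlopen(build_search_url(query, base_url), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))

if __name__ == "__main__":
    # Requires the API server to be running locally.
    print(search("running shoes"))
```

The generous timeout is a guess: an uncached query has to wait for live scraping, so it may take far longer than a cached one.
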
| Method | Endpoint | Description |
|---|---|---|
| GET | `/api/search?q={query}` | Search for products across all platforms. |
| GET | `/api/admin/stats` | View cache hit rates and stored product counts. |
| POST | `/api/admin/clear` | Flush all cached data. |
| POST | `/api/admin/ttl` | Set cache Time-To-Live (TTL). |
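
The admin endpoints operate on the SQLite-based TTL cache. As a rough illustration of how such a cache might behave, here is a minimal sketch using the standard-library `sqlite3` module (the project itself lists an async SQLite driver). The `TTLCache` class, its table schema, and its method names are assumptions for illustration, not the project's actual implementation.

```python
import json
import sqlite3
import time

class TTLCache:
    """Hypothetical query cache: entries older than ttl_seconds count as misses."""

    def __init__(self, path=":memory:", ttl_seconds=3600):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS cache ("
            "query TEXT PRIMARY KEY, payload TEXT NOT NULL, stored_at REAL NOT NULL)"
        )
        self.ttl_seconds = ttl_seconds
        self.hits = 0
        self.misses = 0

    def set(self, query, products, now=None):
        now = time.time() if now is None else now
        self.conn.execute(
            "INSERT OR REPLACE INTO cache VALUES (?, ?, ?)",
            (query, json.dumps(products), now),
        )
        self.conn.commit()

    def get(self, query, now=None):
        now = time.time() if now is None else now
        row = self.conn.execute(
            "SELECT payload, stored_at FROM cache WHERE query = ?", (query,)
        ).fetchone()
        if row is None or now - row[1] > self.ttl_seconds:
            self.misses += 1  # absent or expired entry
            return None
        self.hits += 1
        return json.loads(row[0])

    def clear(self):
        """Flush all cached data."""
        self.conn.execute("DELETE FROM cache")
        self.conn.commit()

    def stats(self):
        total = self.hits + self.misses
        entries = self.conn.execute("SELECT COUNT(*) FROM cache").fetchone()[0]
        return {"entries": entries, "hit_rate": self.hits / total if total else 0.0}
```

In this sketch an expired row is treated as a miss but is not deleted until it is overwritten or the cache is cleared, which keeps reads simple at the cost of some stale rows on disk.
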
This tool is created for educational and research purposes only.
- Respect the `robots.txt` of all target websites.
- Do not use this tool for high-frequency scraping that degrades service for others.
- The authors are not responsible for any misuse of this software.
Contributions are welcome! Please feel free to submit a Pull Request.
Crafted with ❤️ by Aniket


