This project contains a collection of scrapers designed to fetch GPU pricing information from various cloud providers. The scrapers extract data from web pages and store it in a MongoDB database for further analysis.
- main.py: The main script to run all the scrapers.
- scraper/: Directory containing the base scraper class and individual scraper implementations for different cloud providers.
- __init__.py: Initializes the scraper module.
- scraper.py: Contains the base Scraper class for MongoDB interaction and web driver setup.
- aws.py: Scraper for AWS EC2 pricing.
- azure.py: Scraper for Azure GPU pricing.
- coreweave.py: Scraper for Coreweave GPU pricing.
- lambda.py: Scraper for Lambda Labs GPU pricing.
- lightning.py: Scraper for Lightning GPU pricing.
- nebius.py: Scraper for Nebius GPU pricing.
- runpod.py: Scraper for Runpod GPU pricing.
- tencent.py: Scraper for Tencent GPU pricing.
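The layout above (one base class, one subclass per provider, one script that runs them all) can be sketched roughly as follows. This is a minimal illustration and not the project's actual code: `BaseScraper`, `ExampleScraper`, `fetch`, `run`, and `run_all` are hypothetical names, the record fields are placeholders, and a plain list stands in for the MongoDB collection.

```python
from abc import ABC, abstractmethod


class BaseScraper(ABC):
    """Hypothetical stand-in for the base class in scraper/scraper.py."""

    provider = "unknown"

    @abstractmethod
    def fetch(self):
        """Return a list of price records (dicts) for one provider."""

    def run(self, sink):
        # The real base class stores data in MongoDB; here `sink` is any
        # list-like store, which keeps the sketch dependency-free.
        records = [dict(record, provider=self.provider) for record in self.fetch()]
        sink.extend(records)
        return len(records)


class ExampleScraper(BaseScraper):
    """Hypothetical provider scraper returning placeholder data."""

    provider = "example"

    def fetch(self):
        # Placeholder record, not real pricing data.
        return [{"gpu": "example-gpu", "usd_per_hour": 1.0}]


def run_all(scrapers, sink):
    """Run every scraper; a failure in one does not stop the others."""
    summary = {}
    for scraper in scrapers:
        try:
            summary[scraper.provider] = scraper.run(sink)
        except Exception as exc:
            summary[scraper.provider] = f"failed: {exc}"
    return summary
```

Catching exceptions per scraper is a design choice assumed here: provider pages change often, so one broken page should not prevent the remaining providers from being stored.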
- Install Dependencies: Ensure you have the necessary dependencies installed. You can use pip to install them: pip install -r requirements.txt
- Setup MongoDB: Make sure you have MongoDB running and accessible at mongodb://localhost:27017/.
- Run the Scrapers: Execute the main script to run all the scrapers and store the data in MongoDB: python main.py
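Before running the scrapers, it can help to confirm that something is listening at the MongoDB address given above. The sketch below uses only the standard library; `mongo_reachable` is a hypothetical helper, not part of this project. It checks that a TCP connection can be opened and does not verify that the server is actually MongoDB.

```python
import socket
from urllib.parse import urlparse


def mongo_reachable(uri="mongodb://localhost:27017/", timeout=2.0):
    """Return True if a TCP connection to the host and port in `uri` succeeds."""
    parsed = urlparse(uri)
    host = parsed.hostname or "localhost"
    port = parsed.port or 27017  # 27017 is MongoDB's default port
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name resolution failures.
        return False


if __name__ == "__main__":
    print("MongoDB reachable:", mongo_reachable())
```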
- AWS Credentials: Ensure you have AWS credentials set up if you are using the AWS EC2 scraper.
- Chrome Driver: The base Scraper class sets up a headless Chrome driver. Make sure you have Chrome installed and the ChromeDriver properly set up.
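The two prerequisites above can be checked with a small preflight script. This is a sketch under stated assumptions: `find_executable` and `preflight` are hypothetical helpers, the browser binary names vary by platform and are guesses, and the AWS check only looks at the standard environment variables, so credentials supplied another way (for example a shared credentials file) would show up as missing here.

```python
import os
import shutil


def find_executable(candidates):
    """Return the path of the first candidate found on PATH, or None."""
    for name in candidates:
        path = shutil.which(name)
        if path:
            return path
    return None


def preflight(env=None):
    """Report which prerequisites appear to be in place."""
    env = os.environ if env is None else env
    return {
        # Assumed binary names; adjust for your platform.
        "chrome": find_executable(
            ["google-chrome", "chromium", "chromium-browser", "chrome"]
        ),
        "chromedriver": find_executable(["chromedriver"]),
        # Only environment-variable credentials are detected by this check.
        "aws_env_credentials": bool(
            env.get("AWS_ACCESS_KEY_ID") and env.get("AWS_SECRET_ACCESS_KEY")
        ),
    }


if __name__ == "__main__":
    for name, value in preflight().items():
        print(f"{name}: {value if value else 'not found'}")
```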