Disclaimer: This is a personal project used for educational/didactic purposes only.
This project leverages Python's Scrapy library to perform web scraping on Mercado Livre, specifically collecting information about 5-string bass guitars.
If you'd like to scrape data for a different item, it is totally possible!
In extraction/spiders/mercadolivre.py, set the URL the spider starts from to the listing for the item you wish to scrape. For example, if you wish to scrape prices for Acer notebooks, the URL would be

    https://lista.mercadolivre.com.br/notebook-acer

Click the "Next Page" button and observe the new URL. It should look like

    https://lista.mercadolivre.com.br/informatica/portateis-acessorios/notebooks/acer/notebook-acer_Desde_49_NoIndex_True

Set this URL as the next page attribute in the MercadoLivreSpider class, but change 49 to {offset}.
This ensures that the crawler advances through the subsequent result pages.
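To illustrate how the {offset} placeholder drives pagination, here is a minimal sketch that is independent of the spider itself. The names PAGE_SIZE, NEXT_PAGE_TEMPLATE and next_page_url are hypothetical and are not part of this project. It also assumes that each results page lists 48 items, which is inferred from the second page starting at 49; check this against the site before relying on it.

```python
# Assumption: each results page lists 48 items, so page 2 starts at item 49,
# page 3 at item 97, and so on. Adjust PAGE_SIZE if the site behaves differently.
PAGE_SIZE = 48

# URL template from the Acer notebook example, with 49 replaced by {offset}.
NEXT_PAGE_TEMPLATE = (
    "https://lista.mercadolivre.com.br/informatica/portateis-acessorios/"
    "notebooks/acer/notebook-acer_Desde_{offset}_NoIndex_True"
)


def next_page_url(page_number: int) -> str:
    """Build the URL for a 1-based results page (page 2 and onwards)."""
    offset = (page_number - 1) * PAGE_SIZE + 1
    return NEXT_PAGE_TEMPLATE.format(offset=offset)


# Show the URLs for pages 2, 3 and 4.
for page in range(2, 5):
    print(next_page_url(page))
```

The sketch uses str.format on a plain template, while the spider uses an f-string; both fill the same {offset} slot.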
In the end, the code for the next page attribute should look like

    next_page = f"https://lista.mercadolivre.com.br/instrumentos-musicais/instrumentos-corda/baixos/baixo-5-cordas_Desde_{offset}_NoIndex_True_STRINGS*NUMBER_5-5"

The dashboard currently looks like this:
You may search through all items and apply filters
I have refactored the project to offer Docker support
You may build and run it with the following commands:

    docker build -t mlscrape .
    docker run -p 8501:8501 mlscrape

This maps port 8501 on your machine to the port exposed in the Dockerfile.
You will be able to access the dashboard by navigating to localhost:8501
It is my personal recommendation that you create a new Python virtual environment for every project you run. To set up this project, open your preferred terminal and run the commands:

    git clone https://github.com/heitornolla/mercadolivre-scraping.git
    cd mercadolivre-scraping
    pyenv local 3.12.1

You can create the virtual environment with Venv using the command

    python -m venv .venv

or use other environment managers, such as Conda.

If you opted for Venv, activate the environment and install the dependencies with

    source .venv/Scripts/activate
    pip install -r requirements.txt

The Scripts path applies to Windows shells such as Git Bash; on Linux and macOS, the activation script is at .venv/bin/activate.

Run the crawl.py file to crawl Mercado Livre.
To generate the dashboard based on your data, run

    streamlit run dashboard/dashboard.py