- Install MongoDB & MySQL
Script: load_data.py
Scrape information about all products on the Tiki website
- Send requests to the Tiki web APIs to get product information
- Use `time.sleep` to alternate pauses after 50 and 100 requests, avoiding IP blockage
- Insert scraped data directly into the `product` collection within the `tiki` MongoDB database
- Output: sample_output
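
The pause-and-insert loop could look roughly like the sketch below. It is not the code in load_data.py: the pause lengths, the reading of "after 50 and 100 requests" as a short pause at every 50th request and a longer one at every 100th, and the `fetch` callable are all assumptions made for illustration. The `collection` argument only needs an `insert_one` method, which a pymongo collection provides.

```python
import time

SHORT_PAUSE = 5    # seconds; hypothetical value
LONG_PAUSE = 15    # seconds; hypothetical value


def pause_for(request_count):
    """Return how many seconds to sleep after the given number of requests.

    Assumed schedule: long pause at every 100th request, short pause at
    every other 50th request, no pause otherwise.
    """
    if request_count % 100 == 0:
        return LONG_PAUSE
    if request_count % 50 == 0:
        return SHORT_PAUSE
    return 0


def scrape(product_ids, fetch, collection, sleep=time.sleep):
    """Fetch each product and insert it into a MongoDB-style collection.

    `fetch` is a hypothetical callable that performs the HTTP request and
    returns a dict, or None when the product could not be retrieved.
    Returns the number of inserted documents.
    """
    inserted = 0
    for count, product_id in enumerate(product_ids, start=1):
        product = fetch(product_id)
        if product is not None:
            collection.insert_one(product)
            inserted += 1
        delay = pause_for(count)
        if delay:
            sleep(delay)
    return inserted
```

Passing `sleep` as a parameter keeps the throttling testable without real waiting; in normal use the default `time.sleep` applies.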
Script: migrate_data.py
Migrate specific data fields from MongoDB to MySQL for further use and analysis
- Create the `product_data` table within the `tiki_product` database
- Set up the metadata for the `product_data` table
- Get these fields from each document in the `product` collection and insert them into the `product_data` table in MySQL: `id`, `name`, `category_id`, `category_name`, `subcategory_id`, `subcategory_name`, `short_description`, `description`, `url`, `price`, `rating`, `quantity_sold`, `origin`
- Use `BeautifulSoup` to remove HTML tags from the `description` field before inserting into MySQL
- Output: sample_output
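
A minimal sketch of the field mapping and HTML clean-up follows. It assumes each MongoDB document stores the listed fields as top-level keys (the real documents may nest them) and that `cursor` is a DB-API cursor from a MySQL driver that uses `%s` placeholders. The helper names `strip_html`, `to_row`, and `migrate` are hypothetical and not taken from migrate_data.py.

```python
from bs4 import BeautifulSoup

FIELDS = [
    "id", "name", "category_id", "category_name",
    "subcategory_id", "subcategory_name", "short_description",
    "description", "url", "price", "rating", "quantity_sold", "origin",
]

# Parameterized INSERT with one %s placeholder per field.
INSERT_SQL = "INSERT INTO product_data ({}) VALUES ({})".format(
    ", ".join(FIELDS), ", ".join(["%s"] * len(FIELDS))
)


def strip_html(html):
    """Return the visible text of an HTML fragment with whitespace collapsed."""
    text = BeautifulSoup(html or "", "html.parser").get_text(separator=" ")
    return " ".join(text.split())


def to_row(doc):
    """Build one row tuple in FIELDS order; missing fields become None."""
    row = {field: doc.get(field) for field in FIELDS}
    if row["description"] is not None:
        row["description"] = strip_html(row["description"])
    return tuple(row[field] for field in FIELDS)


def migrate(documents, cursor):
    """Insert all documents through a DB-API cursor; return the row count."""
    rows = [to_row(doc) for doc in documents]
    if rows:
        cursor.executemany(INSERT_SQL, rows)
    return len(rows)
```

Committing the transaction is left to the caller, since that happens on the connection rather than the cursor.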
Script: extract_data.py
Extract product_id and ingredient information from each product's description for the product development team to use
- Find all documents that contain the string pattern `thành phần:` (Vietnamese for "ingredients:") and extract the ingredient data after `thành phần:`
- Use `BeautifulSoup` to remove HTML tags from `description`
- Output: sample_output
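
The pattern matching could be sketched as below, operating on descriptions whose HTML tags have already been removed. Several choices here are assumptions and are not stated in the source: the match is case-insensitive, the ingredient text is taken to run to the end of the line, the text is normalized to NFC so that composed and decomposed Vietnamese diacritics compare equal, and the document keys `id` and `description` are top-level.

```python
import re
import unicodedata

KEYWORD = unicodedata.normalize("NFC", "thành phần:")

# Capture the rest of the line after the keyword (assumed boundary).
INGREDIENT_PATTERN = re.compile(re.escape(KEYWORD) + r"\s*(.+)", re.IGNORECASE)


def extract_ingredients(text):
    """Return the text following 'thành phần:' or None if it is absent."""
    normalized = unicodedata.normalize("NFC", text or "")
    match = INGREDIENT_PATTERN.search(normalized)
    return match.group(1).strip() if match else None


def extract_all(documents):
    """Collect product_id and ingredients for documents that match."""
    results = []
    for doc in documents:
        ingredients = extract_ingredients(doc.get("description"))
        if ingredients:
            results.append(
                {"product_id": doc.get("id"), "ingredients": ingredients}
            )
    return results
```

Documents without the keyword are skipped, so the result contains only products for which ingredient text was found.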
Script: analyze_data.py
Create visualizations for a better understanding of the product data
