abdoujamiinq/massimo-dutti

Massimo Dutti Scraper

A focused tool for collecting structured product data from the Massimo Dutti website across countries and languages. It helps teams turn complex product pages into clean, usable datasets for analysis, monitoring, and automation. Built for reliability, speed, and clarity around real-world product information.

Created by Bitbash, built to showcase our approach to Scraping and Automation!
If you are looking for massimo-dutti, you've just found your team. Let's chat.

Introduction

This project extracts detailed product information from Massimo Dutti's online catalog and organizes it into a consistent, machine-readable format. It removes the manual effort of browsing categories, variants, and product pages one by one. It's designed for developers, analysts, and e-commerce teams who need accurate fashion product data at scale.

Designed for product-level accuracy

  • Handles full site, category-level, or single product extraction
  • Normalizes color, size, and image variants under one product
  • Works across different country storefronts and languages
  • Produces structured data suitable for JSON, CSV, or analytics pipelines

Features

| Feature | Description |
| --- | --- |
| Full catalog scraping | Collects products from the entire website or selected sections |
| Category targeting | Scrapes specific product categories with controlled depth |
| Product-level detail | Extracts rich attributes from individual product pages |
| Variant aggregation | Groups colors and sizes into a single product record |
| Deduplication logic | Reduces duplicate products across overlapping categories |
| Structured output | Returns clean, nested data ready for further processing |
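
The variant aggregation and deduplication rows above can be illustrated with a minimal sketch. It is not the project's actual implementation: the function name `merge_products`, the `categories` list, and the assumption that `colors` and `sizes` arrive as lists are all hypothetical, chosen only to show how records sharing an `id` might be folded into one.

```python
def merge_products(records):
    """Merge records that share an id, unioning colors, sizes and categories.

    Assumes (hypothetically) that colors and sizes are lists and that the
    same product can be seen once per category it is listed under.
    """
    merged = {}  # dicts keep insertion order, so first-seen order is preserved
    for rec in records:
        key = rec["id"]
        if key not in merged:
            merged[key] = {
                **rec,
                "colors": list(rec.get("colors", [])),
                "sizes": list(rec.get("sizes", [])),
                "categories": [rec["category"]] if rec.get("category") else [],
            }
            continue
        target = merged[key]
        # Union the variant lists without reordering existing values.
        for field in ("colors", "sizes"):
            for value in rec.get(field, []):
                if value not in target[field]:
                    target[field].append(value)
        category = rec.get("category")
        if category and category not in target["categories"]:
            target["categories"].append(category)
    return list(merged.values())


if __name__ == "__main__":
    seen = [
        {"id": 1, "name": "Jacket", "colors": ["Red"], "sizes": ["10"],
         "category": "women/jackets"},
        {"id": 1, "name": "Jacket", "colors": ["Russet"], "sizes": ["10", "12"],
         "category": "women/new-in"},
    ]
    print(merge_products(seen))
```

Keying on `id` is the simplest choice; if identifiers differ per storefront, a key such as `reference` may be a better fit.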

What Data This Scraper Extracts

| Field Name | Field Description |
| --- | --- |
| id | Unique product identifier |
| name | Product name |
| description | Short product description |
| reference | Internal product reference code |
| price | Current product price |
| oldPrice | Previous price if discounted |
| colors | Available color names |
| sizes | Available size labels |
| category | Product category path |
| images | Product image URLs |
| availabilityDate | First availability timestamp |
| composition | Material composition details |
| care | Care and washing instructions |
| sustainability | Sustainability-related attributes |
| traceability | Production and sourcing countries |
| productPage | Direct product page URL |

Example Output

[
  {
    "id": 46503392,
    "name": "Russet cotton jacket with pocket details",
    "reference": "06736991-V2025",
    "price": 14900,
    "colors": "Red, Russet",
    "sizes": "10, 12, 14",
    "category": "women/jackets-n1450",
    "productPage": "https://www.massimodutti.com/gb/russet-cotton-jacket-with-pocket-details-l06736991",
    "composition": "100% cotton",
    "availabilityDate": "2025-01-13"
  }
]
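
In the example above, `colors` and `sizes` are comma-separated strings and `availabilityDate` is an ISO date string. The sketch below shows one way a consumer might turn them into lists and a `date` object. The helper names `split_list` and `normalize` are illustrative and not part of the project.

```python
import json
from datetime import date

# A trimmed copy of the example record shown above.
SAMPLE = """
[
  {
    "id": 46503392,
    "name": "Russet cotton jacket with pocket details",
    "colors": "Red, Russet",
    "sizes": "10, 12, 14",
    "availabilityDate": "2025-01-13"
  }
]
"""


def split_list(value):
    """Split a comma-separated string into trimmed, non-empty items."""
    return [part.strip() for part in value.split(",") if part.strip()]


def normalize(record):
    """Return a copy of the record with list and date fields parsed."""
    out = dict(record)
    out["colors"] = split_list(record.get("colors", ""))
    out["sizes"] = split_list(record.get("sizes", ""))
    if record.get("availabilityDate"):
        out["availabilityDate"] = date.fromisoformat(record["availabilityDate"])
    return out


products = [normalize(r) for r in json.loads(SAMPLE)]
print(products[0]["colors"], products[0]["sizes"], products[0]["availabilityDate"])
```

The `price` field is left untouched here because the README does not state its unit.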

Directory Structure Tree

Massimo Dutti/
├── src/
│   ├── runner.py
│   ├── extractors/
│   │   ├── product_parser.py
│   │   ├── category_parser.py
│   │   └── utils_normalize.py
│   ├── outputs/
│   │   ├── json_writer.py
│   │   └── csv_writer.py
│   └── config/
│       └── settings.example.json
├── data/
│   ├── sample_input.json
│   └── sample_output.json
├── requirements.txt
└── README.md

Use Cases

  • E-commerce analysts use it to monitor product availability and pricing, so they can track assortment changes over time.
  • Market researchers use it to study fashion trends, so they can analyze materials, colors, and categories at scale.
  • Developers use it to feed product data into internal tools, so they can automate catalog updates.
  • Retail teams use it to audit online listings, so they can ensure consistency across regions.

FAQs

Can I scrape only one product or category? Yes. You can target a single product page, a category page, or the entire site depending on your input configuration.

How are color and size variants handled? All variants are grouped under one product record, making it easier to work with complete product bundles instead of fragmented listings.

Why might the result count be lower than expected? Some pages temporarily expose placeholder products with incomplete data. These are filtered out to maintain data quality.
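
As a rough illustration of this filtering, the sketch below drops records that lack a few core fields and reports how many were removed. The README does not say which fields the scraper actually checks, so `REQUIRED_FIELDS` and both function names are assumptions.

```python
# Assumed set of fields a record must have to count as complete.
REQUIRED_FIELDS = ("id", "name", "price", "productPage")


def is_complete(record, required=REQUIRED_FIELDS):
    """True when every required field is present and not None or empty."""
    for field in required:
        value = record.get(field)
        if value is None or value == "":
            return False
    return True


def drop_placeholders(records):
    """Return (kept_records, number_dropped)."""
    kept = [r for r in records if is_complete(r)]
    return kept, len(records) - len(kept)


if __name__ == "__main__":
    raw = [
        {"id": 1, "name": "Jacket", "price": 14900, "productPage": "https://example.com/p/1"},
        {"id": 2, "name": "", "price": None, "productPage": ""},  # placeholder
    ]
    kept, dropped = drop_placeholders(raw)
    print(len(kept), "kept,", dropped, "dropped")
```

Logging the dropped count makes it easier to tell a filtered run from a failed one.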

Is the output suitable for spreadsheets? Yes. Key fields are flattened for easy export, while detailed variant data remains available in structured form.
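
A minimal sketch of such flattening, using only the Python standard library, might look like the following. The column selection and the choice to join list values with a comma are assumptions and do not describe the project's `csv_writer.py`.

```python
import csv
import io

# Assumed subset of fields to export as spreadsheet columns.
COLUMNS = ["id", "name", "reference", "price", "colors", "sizes",
           "category", "productPage"]


def flatten(record):
    """Build one flat row; list values are joined into a single cell."""
    row = {}
    for col in COLUMNS:
        value = record.get(col, "")
        if isinstance(value, (list, tuple)):
            value = ", ".join(str(v) for v in value)
        row[col] = value
    return row


def to_csv(records):
    """Serialize records to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    for record in records:
        writer.writerow(flatten(record))
    return buffer.getvalue()


if __name__ == "__main__":
    print(to_csv([{"id": 46503392,
                   "name": "Russet cotton jacket with pocket details",
                   "colors": ["Red", "Russet"],
                   "sizes": ["10", "12", "14"]}]))
```

Cells that contain commas are quoted by the `csv` module, so joined variant lists stay in one column when opened in a spreadsheet.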


Performance Benchmarks and Results

Primary Metric: Processes roughly 1,000 products in about 5 minutes under normal conditions.

Reliability Metric: High completion rate with stable retries when temporary access blocks occur.
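
One common way to implement this kind of retry is exponential backoff with jitter. The sketch below is a generic illustration under stated assumptions, not the scraper's own code: `TemporaryBlock`, `fetch_with_retries`, and the default delays are hypothetical, and `fetch` stands in for whatever function performs the request.

```python
import random
import time


class TemporaryBlock(Exception):
    """Hypothetical error raised by `fetch` when access is temporarily blocked."""


def fetch_with_retries(fetch, url, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call fetch(url), retrying on TemporaryBlock with exponential backoff.

    The wait doubles after each failed attempt and gets random jitter added,
    so parallel workers do not all retry at the same moment.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch(url)
        except TemporaryBlock:
            if attempt == max_attempts:
                raise  # give up and let the caller record the failure
            delay = base_delay * 2 ** (attempt - 1)
            sleep(delay + random.uniform(0, delay / 2))


if __name__ == "__main__":
    attempts = {"count": 0}

    def flaky_fetch(url):
        # Fails twice, then succeeds, to demonstrate the retry path.
        attempts["count"] += 1
        if attempts["count"] < 3:
            raise TemporaryBlock("blocked")
        return "page for " + url

    print(fetch_with_retries(flaky_fetch, "https://example.com", base_delay=0.01))
```

Passing `sleep` as a parameter keeps the function easy to test without real waiting.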

Efficiency Metric: Optimized data transfer minimizes bandwidth usage and runtime costs.

Quality Metric: Returns clean, deduplicated product records with consistent field naming and structure.

Review 1

"Bitbash is a top-tier automation partner, innovative, reliable, and dedicated to delivering real results every time."

Nathan Pennington
Marketer
★★★★★

Review 2

"Bitbash delivers outstanding quality, speed, and professionalism, truly a team you can rely on."

Eliza
SEO Affiliate Expert
★★★★★

Review 3

"Exceptional results, clear communication, and flawless delivery. Bitbash nailed it."

Syed
Digital Strategist
★★★★★
