Extract rich real estate listing data from LeFigaro Immobilier using direct ad URLs (or ad IDs) and turn it into clean, structured datasets for analysis and workflows. This tool focuses on fast, repeatable real estate listings extraction—ideal for price tracking, market research, and building reliable property data pipelines.
Created by Bitbash, built to showcase our approach to Scraping and Automation!
If you are looking for lefigaro-immobilier-mass-products-scraper-by-ads-urls you've just found your team — Let’s Chat. 👆👆
This project takes a list of direct LeFigaro Immobilier listing URLs (or listing IDs) and returns structured information for each property. It solves the common problem of manually collecting listing details at scale by automating extraction into consistent, machine-readable output. It’s built for analysts, growth teams, real-estate researchers, and developers who need dependable property data for dashboards, audits, or enrichment.
- Accepts direct listing links or numeric listing IDs as input.
- Extracts text, pricing, media, and contact-ready details in a consistent schema.
- Handles large input batches with resilient retries and pacing controls.
- Produces clean outputs suitable for spreadsheets, BI tools, and data warehouses.
- Designed to support scalable real estate listings extraction pipelines.
| Feature | Description |
|---|---|
| Direct URL batch input | Provide a list of listing URLs and process them in a single run. |
| ID-based support | Pass listing IDs instead of full URLs for faster input preparation. |
| Rich listing details | Collect titles, descriptions, pricing, media, energy info, and more. |
| Contact & publisher extraction | Capture publisher/agent details and available phone/contact metadata. |
| Nearby transport parsing | Extract nearby transportation details when present on the listing. |
| Export-ready dataset output | Outputs structured data that’s easy to convert to JSON/CSV/HTML. |
| Resilient crawling controls | Built-in retries, throttling, and timeouts for stability. |
| Proxy-ready networking | Supports proxy configuration for improved reliability on high volumes. |
| Field Name | Field Description |
|---|---|
| listingId | Unique identifier of the property listing. |
| url | Canonical URL of the listing page. |
| title | Listing headline/title shown on the page. |
| description | Full textual description of the property. |
| price | Displayed price value (normalized where possible). |
| currency | Currency symbol/code associated with the price. |
| location | Location string (city/area) shown in the listing. |
| address | Address or partial address if available publicly. |
| propertyType | Type of property (apartment, house, studio, etc.). |
| transactionType | Sale/rent classification when available. |
| surfaceArea | Total area (m²) if present. |
| rooms | Number of rooms if available. |
| bedrooms | Number of bedrooms if available. |
| bathrooms | Number of bathrooms if available. |
| floor | Floor number and/or total floors if present. |
| constructionYear | Construction date/year if provided. |
| energyRating | Energy performance rating (e.g., DPE class) when present. |
| emissionsRating | Emissions rating (e.g., GES class) when present. |
| photos | Array of image URLs for the listing gallery. |
| publisherName | Name of the publisher/agent/agency. |
| publisherType | Publisher category (agency/private/other) if detectable. |
| phone | Phone number if publicly displayed. |
| contactMethods | Available contact options (phone/form/email when present). |
| nearbyTransport | Nearby transportation lines/stops parsed from the page. |
| features | Key features/amenities list (elevator, parking, balcony, etc.) when present. |
| scrapedAt | ISO timestamp for when the listing was extracted. |
| raw | Optional raw blocks for debugging/parity checks (disabled by default). |
[
{
"listingId": "75220030",
"url": "https://immobilier.lefigaro.fr/annonces/annonce-75220030.html",
"title": "Appartement 3 pièces — 68 m² — Paris 11e",
"description": "Appartement lumineux avec séjour, cuisine équipée, deux chambres, proche métro et commerces...",
"price": 649000,
"currency": "EUR",
"location": "Paris (75011)",
"address": "Paris 11e (adresse partielle selon disponibilité)",
"propertyType": "apartment",
"transactionType": "sale",
"surfaceArea": 68,
"rooms": 3,
"bedrooms": 2,
"bathrooms": 1,
"floor": "3/6",
"constructionYear": 1978,
"energyRating": "D",
"emissionsRating": "B",
"photos": [
"https://.../photo1.jpg",
"https://.../photo2.jpg",
"https://.../photo3.jpg"
],
"publisherName": "Agence Exemple Immobilier",
"publisherType": "agency",
"phone": "+33XXXXXXXXX",
"contactMethods": ["phone", "contact_form"],
"nearbyTransport": [
{ "type": "metro", "name": "Ligne 9", "stop": "Voltaire" },
{ "type": "bus", "name": "Bus 46", "stop": "Roquette" }
],
"features": ["balcony", "elevator", "cellar"],
"scrapedAt": "2025-12-13T18:10:44.219Z"
}
]
Lefigaro immobilier mass products scraper (by ads URLs)/
├── src/
│ ├── main.py
│ ├── cli.py
│ ├── runner/
│ │ ├── __init__.py
│ │ ├── run_batch.py
│ │ └── validate_input.py
│ ├── crawlers/
│ │ ├── __init__.py
│ │ ├── browser_crawler.py
│ │ └── request_queue.py
│ ├── extractors/
│ │ ├── __init__.py
│ │ ├── listing_parser.py
│ │ ├── media_parser.py
│ │ ├── energy_parser.py
│ │ ├── contact_parser.py
│ │ └── transport_parser.py
│ ├── normalizers/
│ │ ├── __init__.py
│ │ ├── normalize_price.py
│ │ ├── normalize_text.py
│ │ └── normalize_location.py
│ ├── exporters/
│ │ ├── __init__.py
│ │ ├── to_json.py
│ │ ├── to_csv.py
│ │ └── to_html.py
│ ├── config/
│ │ ├── settings.py
│ │ └── settings.example.json
│ └── utils/
│ ├── __init__.py
│ ├── http.py
│ ├── retry.py
│ ├── logger.py
│ └── dates.py
├── data/
│ ├── input.startUrls.sample.json
│ ├── input.ids.sample.txt
│ └── sample.output.json
├── tests/
│ ├── test_validate_input.py
│ ├── test_listing_parser.py
│ ├── test_normalize_price.py
│ └── fixtures/
│ └── listing.sample.html
├── .env.example
├── .gitignore
├── pyproject.toml
├── requirements.txt
├── LICENSE
└── README.md
- Real estate analysts use it to track listing price changes, so they can spot trends and build market reports faster.
- Growth teams use it to compile publisher and listing inventories, so they can identify agencies and prioritize outreach.
- Researchers use it to collect housing data at scale, so they can run statistical studies with consistent inputs.
- Developers use it to feed structured listing data into dashboards, so they can monitor regions and property types in near real-time.
- Investors use it to compare similar listings across areas, so they can validate pricing and evaluate opportunities.
How do I provide inputs—URLs or IDs? You can provide direct listing URLs or numeric listing IDs. If an ID is provided, the tool will build the corresponding listing URL internally and fetch the page the same way.
Does it scrape search results pages too? This project is designed for direct listing pages (items) provided via URLs/IDs. If you need search results extraction, use a separate workflow that first collects item URLs from search pages, then passes those item URLs into this tool.
What happens if some listings are missing fields (phone, energy rating, transport)? The output schema is stable, but optional fields may be null/empty when not publicly displayed. This keeps downstream pipelines reliable without breaking on missing data.
How do I improve stability when processing very large batches? Use conservative concurrency, enable proxy support, and keep retry limits reasonable. For long-running jobs, split inputs into smaller batches and merge outputs afterward for better fault isolation.
Primary Metric: Typical extraction throughput of ~800–1,500 listings/hour depending on media weight and network conditions.
Reliability Metric: 97–99% successful listing completion on clean inputs when retries and pacing are enabled.
Efficiency Metric: Average page processing time ~2.4–4.8 seconds/listing with browser caching and request deduplication enabled.
Quality Metric: 90–98% field completeness on standard listings, with highest variance on optional publisher contact and nearby transport fields.
