[BACK] Download the siren by cgoudet · Pull Request #49 · dataforgoodfr/13_eclaireur_public

cgoudet · 2025-02-22T12:30:32Z

No description provided.

back/scripts/communities/communities_selector.py

acornet

Looking good, but I think we should hold off on using polars until we have a very strong reason to do so, and here I don't think we do.

I am a bit worried that mixing pandas and polars is going to make it harder for people to contribute, since they would now have to master both.
Here I think we could do equally well in pandas, so let's stick to it?

back/scripts/communities/loaders/odf.py

acornet · 2025-02-24T14:41:25Z

back/scripts/communities/loaders/ofgl.py

        if data_file.exists():
            self._logger.info("Found OFGL data on disk, loading it.")
-            return pd.read_csv(data_file, sep=";")
+            return pd.read_csv(data_file, sep=";", dtype={"siren": "str"})


Shouldn't we do this before writing the parquet instead?
We could probably add this to the config under communities.ofgl.dtype.

This is legacy code, and it is precisely why I prefer parquet over CSV.

Are you OK with me changing this format?
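
The concern in this thread is that a SIREN is a 9-digit identifier that may start with a zero, so reading it back from CSV without an explicit dtype turns it into an integer. A minimal sketch of the reviewer's suggestion follows. The `config` dictionary is hypothetical: it only mirrors the proposed `communities.ofgl.dtype` key path, and the actual config layout of the repository is not shown in this PR. The sample rows are made up.

```python
import io

import pandas as pd

# Hypothetical config fragment: the key path follows the reviewer's
# suggestion (communities.ofgl.dtype); the real layout is an assumption.
config = {"communities": {"ofgl": {"dtype": {"siren": "str"}}}}

# Small in-memory stand-in for the semicolon-separated OFGL CSV.
raw_csv = "siren;nom\n012345678;Commune A\n200054781;Commune B\n"

# Without a dtype, pandas infers an integer column and drops the leading zero.
inferred = pd.read_csv(io.StringIO(raw_csv), sep=";")

# With the dtype taken from config, the identifier stays a 9-character string.
typed = pd.read_csv(
    io.StringIO(raw_csv),
    sep=";",
    dtype=config["communities"]["ofgl"]["dtype"],
)

print(inferred["siren"].tolist())
print(typed["siren"].tolist())
```

Parquet stores the column types alongside the data, so with that format the cast would only be needed once, before writing.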

acornet · 2025-02-24T14:42:28Z

back/scripts/communities/loaders/ofgl.py

+            pd.concat(dataframes, axis=0, ignore_index=True)
+            .astype({"SIREN": str})
+            .assign(
+                SIREN=lambda df: df["SIREN"].str.replace(".0", "").str.zfill(9),


are those .0 really present in the raw data?
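
Whether the raw OFGL files contain these suffixes is not answered in the thread. A trailing `.0` commonly appears when an integer-like column has been parsed as float and then cast to string. The sketch below, on made-up values, shows a stricter normalization that only strips a literal `.0` at the end of the value, and why the pattern is worth anchoring: with `regex=True`, an unescaped `.0` matches any character followed by a zero.

```python
import pandas as pd

# Made-up values: two carry a float-style suffix, one is already clean.
raw = pd.Series(["12345678.0", "200054781.0", "210100012"])

# Escape the dot and anchor at the end so only a trailing ".0" is removed,
# then left-pad to the 9 digits of a SIREN.
cleaned = raw.str.replace(r"\.0$", "", regex=True).str.zfill(9)

# For comparison: the unescaped pattern interpreted as a regex also eats
# digits inside the identifier ("210100012" becomes "212").
too_greedy = raw.str.replace(".0", "", regex=True)

print(cleaned.tolist())
print(too_greedy.tolist())
```

Passing `regex=` explicitly avoids relying on the default of `Series.str.replace`, which differs between pandas versions.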

back/scripts/datasets/sirene.py

cgoudet · 2025-02-24T22:43:28Z

> looking good, but I think we should hold off on using polars until we have a very strong reason to do so, and here I don't think we do.
>
> I am a bit worried that mixing pandas and polars is going to make it harder for people to contribute, since they would now have to master both. Here I think we could do equally well in pandas, so let's stick to it?

The Sirene dataset has 27M lines and the final version already takes 1.5 GB in memory. I'm not sure that people with low memory will be able to pass this step without the memory optimizations polars provides.
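
For reference when weighing the memory argument, here is a rough sketch of how peak memory can be bounded while staying in pandas: read only the needed columns, process the file in chunks, and filter each chunk before concatenating. It is not what the PR does and does not settle the pandas versus polars question. The sample rows, the filter on the `"00"` bracket, and the chunk size are illustrative assumptions.

```python
import io

import pandas as pd

# Tiny stand-in for the Sirene stock file; rows and the extra column are made up.
raw_csv = (
    "siren,denominationUniteLegale,trancheEffectifsUniteLegale,unused\n"
    "012345678,ALPHA,11,x\n"
    "200054781,BETA,00,y\n"
    "210100012,GAMMA,11,z\n"
)

# Only the listed columns are parsed, and rows arrive a few at a time.
chunks = pd.read_csv(
    io.StringIO(raw_csv),
    usecols=["siren", "trancheEffectifsUniteLegale"],
    dtype={"siren": "str", "trancheEffectifsUniteLegale": "str"},
    chunksize=2,
)

# Filter each chunk before concatenating (illustrative filter: drop bracket "00"),
# so the unfiltered table is never held in memory at once.
kept = pd.concat(
    [chunk[chunk["trancheEffectifsUniteLegale"] != "00"] for chunk in chunks],
    ignore_index=True,
)

# Low-cardinality text columns are usually smaller as categoricals.
kept["trancheEffectifsUniteLegale"] = kept["trancheEffectifsUniteLegale"].astype("category")

print(kept.memory_usage(deep=True).sum())
```

Measuring `memory_usage(deep=True)` on the real 27M-line file, with and without these steps, would give a concrete number to compare against the 1.5 GB figure quoted above.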

cgoudet added 27 commits February 19, 2025 22:42

v0

1e223e6

working combined

8e30c3a

include elus

1baaa8f

adapt datagouv searcher to new interface

a4deede

add docstring

3de0890

docstring

7c0445e

force type and properties

4d035a3

add elsu to workflow manager

b2a0e95

Merge branch 'main' into add_elus

5d56be5

migrat datapath

8547e87

Merge branch 'main' into add_elus

0dd8e4d

move and rename next page

6953a74

PR comments

0b731c7

keep elus in memory

bb9361f

use loader factory

788d612

DataGouPAI

d826e12

raise utility class

3ec903a

Merge branch 'main' into add_elus

044fe92

v0 sirene worflow

7ca0875

remove csv and separate download and extraction

969bfc5

add trancheeffectif

6b20642

transform effectif

033aaf9

reformat and correct siren in ofgl

87dbf2c

reformat all_data in communities

bfcd9a2

remove siren loader

2b3b0d9

correct code region

09a510c

rename column

a6dc1cf

cgoudet commented Feb 22, 2025

cgoudet added 2 commits February 22, 2025 13:33

move siren correction

150b305

read back siren as str

5d646ab

cgoudet marked this pull request as draft February 22, 2025 12:44

acornet changed the base branch from main to add_elus February 24, 2025 14:30

acornet reviewed Feb 24, 2025

Base automatically changed from add_elus to main February 24, 2025 21:05

cgoudet added 8 commits February 24, 2025 22:52

Merge branch 'main' into back_add_sirene

8a64802

corrected geoloc

6a18a95

adapt siren to test

e9e72b4

revert sup50

548cdf4

remove politique elu.

14f1cb2

simple config

7ce234f

environent

676dc60

use config for sirene

3977e86

cgoudet and others added 3 commits February 25, 2025 21:53

update stock unitelegal zip

dab0745

force siren typing

1af1f6a

Merge branch 'main' into back_add_sirene

c61a981

cgoudet marked this pull request as ready for review February 25, 2025 21:03

Merge branch 'main' into back_add_sirene

6e0a323

FVarlet approved these changes Feb 27, 2025

Merge branch 'main' into back_add_sirene

b10999f

cgoudet merged commit 17e3dc8 into main Feb 27, 2025
2 checks passed

cgoudet deleted the back_add_sirene branch February 27, 2025 15:35

cgoudet merged 42 commits into main from back_add_sirene

4 participants
