
[BACK] Add the elected officials (élus) dataset #43

Merged
cgoudet merged 26 commits into main from add_elus on Feb 24, 2025

Conversation

@cgoudet (Collaborator) commented Feb 19, 2025

https://noco.services.dataforgood.fr/dashboard/#/nc/p6lbzxq9ra31no8/m6jcs2djf6sm3ee/Vue%20publique?rowId=36

  • Each resource of this dataset contains a different type of elected official (mayor, senator, deputy, ...).

  • The dataset page provides the links to all the resources, which are then downloaded individually and converted to parquet.

  • The individual datasets are finally combined into a single dataset.

  • The interaction with the data.gouv API was reworked to reuse the datagouvsearcher logic in ElusWorkflow.
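The combination step described above can be sketched as follows. This is a minimal illustration, not the merged code: `build_elus_dataset` and the `type_elu` column name are assumptions, standing in for the per-resource frames after download and parquet conversion.

```python
import pandas as pd

def build_elus_dataset(resources: dict[str, pd.DataFrame]) -> pd.DataFrame:
    """Combine one DataFrame per elected-official type (mayor, senator, ...)
    into a single dataset, tagging each row with its source type."""
    # Hypothetical column name 'type_elu'; the PR's actual schema may differ.
    tagged = [df.assign(type_elu=label) for label, df in resources.items()]
    return pd.concat(tagged, ignore_index=True)
```

For example, two one-row frames keyed `"maire"` and `"senateur"` combine into a single two-row frame with a `type_elu` column identifying each source.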

@acornet (Contributor) left a comment

Thanks a lot :) a few remarks.

"""
Fetch information about all resources of a given dataset.
"""
save_filename = (savedir or Path(".")) / f"dataset_{dataset_id}.parquet"
Contributor

Open question for later: how do we handle updates (new files)?

@cgoudet (Collaborator, Author) replied Feb 21, 2025

I think we'll need a system of flags, both global and step-specific, to force the download again.

For this file it's easy: we can just remove the old files. But for some other content we may want to merge the old and new content. This will indeed need a discussion later on.
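The flag idea floated here could look like the sketch below. Everything in it is an assumption for illustration (`needs_download` and both flag names are not in the project); it only shows the gating logic: re-download when forced, or when the cached file is missing.

```python
from pathlib import Path

def needs_download(target: Path, force_all: bool = False, force_step: bool = False) -> bool:
    """Re-download when a global or step-specific force flag is set,
    or when the cached file is absent."""
    return force_all or force_step or not target.exists()
```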


def _run_elus(self):
    elus = ElusWorkflow(self.source_folder)
    elus.fetch_raw_datasets()
Contributor

Strange that the client needs to worry about this, IMO.

Collaborator Author

Do you mean that this should not be the responsibility of the workflow manager, or that it should only call .run() and hide the 2 steps in there?

Collaborator Author

I corrected for point 2.
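"Point 2" above (expose a single `.run()` and hide the individual steps) could be sketched like this; the class body is illustrative only, with a `steps` list standing in for the real fetch and combine work.

```python
class ElusWorkflow:
    """Toy stand-in for the workflow: run() hides the two internal steps."""

    def __init__(self, source_folder):
        self.source_folder = source_folder
        self.steps = []  # records executed steps, for illustration only

    def fetch_raw_datasets(self):
        self.steps.append("fetch")

    def combine(self):
        self.steps.append("combine")

    def run(self):
        # The client calls only run(); the step ordering stays internal.
        self.fetch_raw_datasets()
        self.combine()
```

With this shape, `_run_elus` reduces to `ElusWorkflow(self.source_folder).run()`.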

resources = dataset_resources(self.dataset_id, savedir=self.data_folder)
combined = []
for _, resource in tqdm(resources.iterrows()):
    df = pd.read_parquet(self.raw_data_folder / f"{resource['resource_id']}.parquet")
Contributor

I think the path of the resource is part of fetch_raw_datasets's API, so it looks like it should return an array of file paths? Having both fetch_raw_datasets and its client re-compute the same path seems wrong.
Also, it seems wrong to read again something that we had in memory, unless we are dealing with a crazy scale. Did you do this to reduce the memory footprint? What's the scale here?

Collaborator Author

In this case it was a bit overkill to write and re-read, as we concat through pandas anyway.
I reformatted the code to keep the sub-datasets in memory.
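The rework described in this reply can be sketched as: the fetch step returns the parsed DataFrames, so the combine step concatenates them directly instead of re-reading parquet files from disk. Function names here are assumptions based on the thread, not the merged code.

```python
import pandas as pd

def fetch_raw_datasets(raw_tables):
    # Stand-in for downloading and parsing each resource; the parsed
    # frames are returned in memory rather than round-tripped to disk.
    return [pd.DataFrame(t) for t in raw_tables]

def combine(frames):
    # Concatenate the in-memory frames directly; no read_parquet needed.
    return pd.concat(frames, ignore_index=True)
```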

@acornet (Contributor) left a comment

LGTM, thanks! Let's just add paths to config, plus some minor nits.

@cgoudet cgoudet merged commit 3bdd915 into main Feb 24, 2025
2 checks passed
@cgoudet cgoudet deleted the add_elus branch February 24, 2025 21:05