fix(module 3 web_to_gcs.py): add type enforcement in csv reading (like in module 1) #789

Open
MichaelG-create wants to merge 1 commit into DataTalksClub:main from MichaelG-create:main

@MichaelG-create

primary:

  • add type enforcement in web_to_gcs.py so the Parquet columns have the correct types before upload to GCS (this avoids having to fix them in BQ)
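
To illustrate the idea, here is a minimal sketch of type enforcement at CSV read time with pandas. The column names, the dtype mapping, and the helper name `read_typed_csv` are assumptions chosen for a yellow-taxi style file; the mapping actually used in this PR may differ.

```python
import io

import pandas as pd

# Assumed schema for illustration only; adjust to the real dataset.
DTYPES = {
    "VendorID": "Int64",          # nullable integer, tolerates missing values
    "passenger_count": "Int64",
    "trip_distance": "float64",
    "store_and_fwd_flag": "string",
}
PARSE_DATES = ["tpep_pickup_datetime", "tpep_dropoff_datetime"]


def read_typed_csv(source) -> pd.DataFrame:
    """Read a CSV with explicit dtypes instead of relying on type inference."""
    return pd.read_csv(source, dtype=DTYPES, parse_dates=PARSE_DATES)


# Small in-memory sample; the second row has no passenger_count.
sample = io.StringIO(
    "VendorID,tpep_pickup_datetime,tpep_dropoff_datetime,"
    "passenger_count,trip_distance,store_and_fwd_flag\n"
    "1,2019-01-01 00:46:40,2019-01-01 00:53:20,1,1.5,N\n"
    "2,2019-01-01 00:59:47,2019-01-01 01:18:59,,2.6,N\n"
)
df = read_typed_csv(sample)
print(df.dtypes)
```

Without the explicit mapping, a column such as `passenger_count` is inferred as an integer in files with no missing values and as a float in files that have some, so the Parquet files written from those frames can end up with mismatched schemas.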

secondary:

  • add a .env config file to avoid hardcoding environment variables, and load it with the dotenv library

Create a variant, web_to_gcs_with_progrers_bar.py, with:

  • a progress bar for the download, the Parquet conversion, and the upload to GCS
  • no re-download, re-conversion, or re-upload if a file has already been processed (so a run can be restarted if something goes wrong partway through)
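
The skip-if-already-processed behaviour can be sketched as follows. This is a simplified illustration, not the code in this PR: `process_pending` and its parameters are hypothetical, and the bucket is simulated with a set. With the google-cloud-storage client, the existence check would typically be something like `bucket.blob(name).exists()`.

```python
from typing import Callable, Iterable


def process_pending(
    file_names: Iterable[str],
    already_uploaded: Callable[[str], bool],
    process: Callable[[str], None],
) -> list[str]:
    """Run `process` only for files that are not in the destination yet."""
    processed = []
    for name in file_names:
        if already_uploaded(name):
            # Already in the bucket: skip download, conversion, and upload.
            continue
        process(name)
        processed.append(name)
    return processed


# Stand-in for the bucket contents; January was uploaded by an earlier run.
uploaded = {"yellow_tripdata_2019-01.parquet"}
wanted = ["yellow_tripdata_2019-01.parquet", "yellow_tripdata_2019-02.parquet"]

first_run = process_pending(wanted, lambda n: n in uploaded, uploaded.add)
second_run = process_pending(wanted, lambda n: n in uploaded, uploaded.add)
print(first_run, second_run)
```

Because the check is made per file, a run that fails partway can simply be restarted and will only redo the files that never reached the bucket.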

(This could also have been done as a Kestra flow.)

…d progress bar, and no reupload try if present

- add type enforcement to web_to_gcs.py (as in ingest.py in module 1)
- add .env variable handling to avoid manual export
- create a new version of web_to_gcs with a progress bar and no download or re-upload if already done (in case the internet connection or anything else fails)
- fix types not being enforced on the Parquet columns, which meant having to fix them later in BQ when creating materialized tables
@MichaelG-create
Author

(The force push was to amend the commit: I forgot to update the README before opening the PR.)
