feat: Improve reproducibility and templating using mise #13
base: main
…in.tf and variables.tf)
…der + create Prefect blocks)
…infra:sync_vars task
…st flow to use new block names
…le in favor of scripts
Pull request overview
This PR introduces mise as a zero-friction automation tool to standardize development workflows and improve reproducibility across the repository. It replaces the previous Python-based infrastructure/setup_profiles/ module with simpler, more maintainable scripts that leverage Jinja2 templating for dynamic configuration generation.
Key Changes:
- Implemented `mise` task runner with comprehensive task definitions for infrastructure, configuration rendering, and GCP operations
- Added template-based configuration system using Jinja2 for `dbt` profiles and `prefect` configs
- Created automation scripts for environment synchronization, template rendering, Prefect block setup, and Git metadata extraction
Reviewed changes
Copilot reviewed 27 out of 29 changed files in this pull request and generated 9 comments.
| File | Description |
|---|---|
| `mise.toml` | Defines tools, tasks, and environment variables for the mise-based workflow |
| `scripts/sync_env` | Python script to synchronize .env files from .env.example, preserving user values |
| `scripts/render_template` | Jinja2 template renderer that injects env vars and Terraform outputs |
| `scripts/setup_prefect_blocks.py` | Automated Prefect block creation from rendered dbt profiles |
| `scripts/get_git_env.sh` | Bash script to export Git metadata as environment variables |
| `dbt/profiles.tpl.yml` | Jinja2 template for dbt profiles with environment-based configuration |
| `prefect.tpl.yml` | Jinja2 template for Prefect deployment configuration |
| `dbt/profiles.yml` | Generated dbt profiles file (should be git-ignored) |
| `prefect.yml` | Generated Prefect configuration (should be git-ignored) |
| `infrastructure/variables.tf` | Updated Terraform variables to use suffix-based naming |
| `infrastructure/main.tf` | Refactored to use local variables for dataset IDs |
| `infrastructure/providers.tf` | Extracted provider configuration to separate file |
| `infrastructure/outputs.tf` | Simplified outputs, removed redundant values |
| `.env.example` | Added template for required environment variables |
| `README.md` | Updated with mise-based workflow instructions |
| `infrastructure/README.md` | Comprehensive documentation of automated and manual workflows |
Files not reviewed (1)
- infrastructure/.terraform.lock.hcl: Language not supported
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
… from environment variables
…modify save flag default
Pull request overview
Copilot reviewed 26 out of 28 changed files in this pull request and generated 13 comments.
Files not reviewed (1)
- infrastructure/.terraform.lock.hcl: Language not supported

    @@ -0,0 +1,204 @@
    #!/usr/bin/env -S uv run --script

Copilot AI, Nov 29, 2025

The `requires-python` line is missing the closing triple-slash comment marker `# ///`. The inline PEP 723 script metadata format requires both the opening `# /// script` and the closing `# ///` markers.

Suggested change:

    - #!/usr/bin/env -S uv run --script
    + #!/usr/bin/env -S uv run --script
    + # /// script

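The marker requirement raised in this comment can be checked mechanically. The following is a minimal sketch, not the official PEP 723 reference parser: the function name `has_complete_script_block` is hypothetical, and the check only confirms that an opening `# /// script` line is followed by comment lines and a closing `# ///` line.

```python
def has_complete_script_block(source: str) -> bool:
    """Return True if `source` has a `# /// script` block closed by `# ///`.

    Simplified, line-based check (an assumption of this sketch): every line
    between the two markers must be `#` alone or start with `# `.
    """
    lines = source.splitlines()
    try:
        start = lines.index("# /// script")
    except ValueError:
        return False  # no opening marker at all
    for line in lines[start + 1:]:
        if line == "# ///":
            return True  # closing marker found
        if not (line == "#" or line.startswith("# ")):
            return False  # block interrupted before it was closed
    return False  # reached end of file without a closing marker


GOOD = (
    "#!/usr/bin/env -S uv run --script\n"
    "# /// script\n"
    '# requires-python = ">=3.10"\n'
    "# dependencies = [\n"
    '#   "python-dotenv",\n'
    "# ]\n"
    "# ///\n"
    "import os\n"
)
# Same script with the closing marker dropped.
BAD = GOOD.replace("# ///\nimport", "import")
```

With these inputs, `has_complete_script_block(GOOD)` is true and `has_complete_script_block(BAD)` is false, which is the situation the comment describes.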

    # branch or short ref
    GIT_BRANCH=$(_safe git rev-parse --abbrev-ref HEAD 2>/dev/null || echo "")
    if [ "$GIT_BRANCH" = "HEAD" ]; then
      GIT_BRANCH="$(git describe --all --always 2>/dev/null || echo "${GIT_COMMIT_SHORT}")"

Copilot AI, Nov 29, 2025

On line 63, `git describe` is not wrapped with the `_safe` function, unlike other git commands. If this command fails (e.g., no git history), it could cause the script to exit due to `set -euo pipefail`. Wrap this with `_safe` for consistency: `GIT_BRANCH="$(_safe git describe --all --always || echo "${GIT_COMMIT_SHORT}")"`.

Suggested change:

    - GIT_BRANCH="$(git describe --all --always 2>/dev/null || echo "${GIT_COMMIT_SHORT}")"
    + GIT_BRANCH="$(_safe git describe --all --always || echo "${GIT_COMMIT_SHORT}")"

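For comparison, the same "never fail, fall back to a default" pattern can be sketched in Python. This is only an illustration of the idea behind `_safe`, whose actual definition is not shown in this excerpt; the helper names `run_or_default` and `git_branch` are hypothetical.

```python
import subprocess


def run_or_default(args: list[str], default: str = "") -> str:
    """Return the stripped stdout of a command, or `default` if it fails."""
    try:
        result = subprocess.run(args, capture_output=True, text=True, check=True)
    except (OSError, subprocess.CalledProcessError):
        # Missing executable or non-zero exit status: do not propagate.
        return default
    return result.stdout.strip()


def git_branch(short_sha: str) -> str:
    """Mirror the shell logic: branch name, or a described ref on detached HEAD."""
    branch = run_or_default(["git", "rev-parse", "--abbrev-ref", "HEAD"])
    if branch == "HEAD":
        # Detached HEAD: fall back to `git describe`, then to the short SHA.
        branch = run_or_default(
            ["git", "describe", "--all", "--always"], default=short_sha
        )
    return branch
```

Every git call goes through the same wrapper, so a failure in any of them yields a default value instead of aborting, which is the consistency the review comment asks for.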

    }
    }

    provider "google" {} No newline at end of file

Copilot AI, Nov 29, 2025

The `provider "google"` block is missing required configuration. Without explicit `project` and `region` arguments, Terraform will rely on environment variables or default credentials, which may lead to unexpected behavior. Consider adding: `project = var.project_id` and `region = var.region` to ensure explicit configuration.

Suggested change:

    - provider "google" {}
    + provider "google" {
    +   project = var.project_id
    +   region  = var.region
    + }


    5. **Plan Infrastructure:** `mise run infra:plan` shows the changes that will be applied.
    6. **Apply Infrastructure:**
    ```bash
    tofu -chdir=infrastructure apply

Copilot AI, Nov 29, 2025

The documentation instructs users to run `tofu -chdir=infrastructure apply` directly in step 6, but this bypasses the mise task that sets up the required environment variables (`TF_VAR_region`, `TF_VAR_project_id`, `TF_VAR_sa_dbt_id`). This should be `mise run infra:apply` to ensure consistency with the automated approach.

Suggested change:

    - tofu -chdir=infrastructure apply
    + mise run infra:apply


    if value is None:
        value = None

Copilot AI, Nov 29, 2025

The condition `if value is None: value = None` on lines 90-91 is redundant and has no effect. This should be removed or the logic should be clarified if a different check was intended.

Suggested change:

    - if value is None:
    -     value = None


    @@ -6,24 +6,4 @@ output "bq_dev_dataset_id" {
    output "bq_prod_dataset_id" {
      description = "BigQuery prod dataset ID"
      value = google_bigquery_dataset.prod_dataset.dataset_id

Copilot AI, Nov 29, 2025

Several Terraform outputs have been removed (`sa_email`, `project_id`, `region`, `sa_key_path`) that are still referenced in the template files (e.g., `dbt/profiles.tpl.yml` uses `tf.project_id`, `tf.region`, `tf.sa_key_path`). While these have fallback environment variables, the templates may fail to render correctly if both Terraform outputs and environment variables are missing. Consider adding these outputs back or documenting the required environment variables.

Suggested change:

    -   value = google_bigquery_dataset.prod_dataset.dataset_id
    +   value = google_bigquery_dataset.prod_dataset.dataset_id
    + }
    + output "sa_email" {
    +   description = "Service account email"
    +   value = google_service_account.sa.email
    + }
    + output "project_id" {
    +   description = "GCP project ID"
    +   value = var.project_id
    + }
    + output "region" {
    +   description = "GCP region"
    +   value = var.region
    + }
    + output "sa_key_path" {
    +   description = "Path to the service account key file"
    +   value = local.sa_key_path

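The failure mode described in this comment (neither a Terraform output nor a fallback environment variable is available) can be made explicit in the renderer. The sketch below is an assumption about how such a lookup might be written, not the actual code in `scripts/render_template`; the function `resolve_setting` and the pairing of `project_id` with `GCLOUD_PROJECT` are hypothetical.

```python
import os
from typing import Mapping, Optional


def resolve_setting(
    name: str,
    tf_outputs: Mapping[str, str],
    env_var: str,
    environ: Optional[Mapping[str, str]] = None,
) -> str:
    """Prefer a Terraform output, fall back to an env var, else fail loudly."""
    env = os.environ if environ is None else environ
    value = tf_outputs.get(name) or env.get(env_var)
    if not value:
        # Failing here gives a clearer message than a half-rendered template.
        raise LookupError(
            f"{name!r} not found in Terraform outputs and {env_var} is not set"
        )
    return value


# Terraform output present: it wins over the environment.
project = resolve_setting(
    "project_id", {"project_id": "from-tf"}, "GCLOUD_PROJECT", {"GCLOUD_PROJECT": "from-env"}
)
```

Raising when both sources are empty turns a silently broken `profiles.yml` into an immediate, named error.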

    GCLOUD_PROJECT=
    # list for available regions: https://cloud.google.com/about/locations ; auto : https://cloud.withgoogle.com/region-picker/
    GCLOUD_REGION=
    PREFECT_API_URL=
    PREFECT_API_KEY= No newline at end of file

Copilot AI, Nov 29, 2025

The `.env.example` file is missing several environment variables that are referenced in the templates and scripts, including `GCP_SA_KEY_PATH`, `BQ_DEV_DATASET`, `BQ_PROD_DATASET`, and potentially others. Add these missing variables with appropriate comments to guide users in setting up their environment correctly.
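
A gap like this can be caught automatically by comparing the keys declared in `.env.example` against the variables the templates expect. The following is a minimal sketch under stated assumptions: the helper names `env_keys` and `missing_keys` are hypothetical, and the `REQUIRED` list simply combines one key from the file above with the three variables named in the comment.

```python
def env_keys(text: str) -> set[str]:
    """Collect variable names from dotenv-style text, skipping comments."""
    keys = set()
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        keys.add(line.split("=", 1)[0].strip())
    return keys


def missing_keys(example_text: str, required: list[str]) -> list[str]:
    """Return the required names that the example file does not declare."""
    present = env_keys(example_text)
    return [key for key in required if key not in present]


EXAMPLE = (
    "GCLOUD_PROJECT=\n"
    "# list for available regions: https://cloud.google.com/about/locations\n"
    "GCLOUD_REGION=\n"
    "PREFECT_API_URL=\n"
    "PREFECT_API_KEY=\n"
)
REQUIRED = ["GCLOUD_PROJECT", "GCP_SA_KEY_PATH", "BQ_DEV_DATASET", "BQ_PROD_DATASET"]

print(missing_keys(EXAMPLE, REQUIRED))
# prints ['GCP_SA_KEY_PATH', 'BQ_DEV_DATASET', 'BQ_PROD_DATASET']
```

A check of this kind could run as part of a mise task or CI step so that new template variables are not added without a matching `.env.example` entry.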

    # TF_LOG = { value = "DEBUG" }
    PREFECT_PROFILES_PATH = { value = "profiles.toml" }

    GCLOUD_SA_DBT_ID = { value = "dbt-sa" }

Copilot AI, Nov 29, 2025

The default value for `GCLOUD_SA_DBT_ID` is `"dbt-sa"` in `mise.toml`, but the corresponding Terraform variable `sa_dbt_id` has a default of `"projet-bi-dbt-sa"` in `infrastructure/variables.tf`. This inconsistency could lead to mismatches between the service account created by Terraform and the one expected by other scripts. Align these default values or document the intentional difference.

Suggested change:

    - GCLOUD_SA_DBT_ID = { value = "dbt-sa" }
    + GCLOUD_SA_DBT_ID = { value = "projet-bi-dbt-sa" }


    # requires-python = ">=3.10"
    # dependencies = [
    # "python-dotenv",
    # "click",

Copilot AI, Nov 29, 2025

The dependency `click` is listed in the script dependencies but is never imported or used. This should be removed to avoid unnecessarily installing unused packages.

Suggested change:

    - # "click",


    @@ -0,0 +1,262 @@
    #!/usr/bin/env -S uv run --script

Copilot AI, Nov 29, 2025

The `requires-python` line is missing the closing triple-slash comment marker `# ///`. The inline PEP 723 script metadata format requires both the opening `# /// script` and the closing `# ///` markers.

Suggested change:

    - #!/usr/bin/env -S uv run --script
    + #!/usr/bin/env -S uv run --script
    + # /// script

* feat: Setup Prefect and IaC config to support prod environment
* hotfix: change Prefect deployment shell script
…profiles and pipeline flows
…iptions in dbt YAML files
* chore: update asset_base path in dbt configuration and adjust file copy commands in GitHub Actions workflow
* chore: update asset paths in dbt documentation and GitHub Actions workflow for consistency
* chore: standardize asset paths in dbt documentation workflow and update configuration for clarity
* chore: correct asset path in fct__viewings
* chore: minor changes
* chore: update asset_base path in erd_config.yml for consistency
* chore: update IAM role for storage admin and adjust seed script parameters for data generation
* feat: enhance data ingestion process by adding ingestion_date to models and updating GCS paths for Hive partitioning
* fix: correct release date calculation in seed_script to use a 30-day interval for seasons
* feat: implement snapshot strategy in models and update external table definitions for ingestion_date partitioning
…solete test image
…raphs, references)
Description

This PR implements a "zero-friction" repository setup using `mise` to automate dependency management, environment configuration, and infrastructure tasks. It addresses the need for reproducible environments and dynamic configuration for `dbt` and `prefect`.

Related Issue

Closes #11

Key Changes

Mise Integration (`mise.toml`):
- Tools (`uv`, `opentofu`, `gcloud`) and tasks to standardize development workflows.
- `infra:init`, `infra:plan`, `infra:outputs`.
- `dbt:render_profiles`, `prefect:render_configs`.
- `gcloud:auth`, `gcloud:create_sa_key`.

Scripts & Automation (`scripts/`):
- `sync_env`: Automates `.env` creation from `.env.example`, preserving existing values.
- `render_template`: A Jinja2-based renderer that injects environment variables and Terraform outputs into configuration files.
- `setup_prefect_blocks.py`: Automates the creation of Prefect blocks (GCP credentials, BigQuery targets).
- `get_git_env.sh`: Exports git metadata as environment variables for use in templates.

Templating:
- `dbt/profiles.tpl.yml`: Dynamic dbt profile template supporting dev/prod environments via env vars or Terraform outputs.
- `prefect.tpl.yml`: Dynamic Prefect configuration template.

Infrastructure:
- `infrastructure/`: aligned with the new directory structure and outputs.
- `profiles.toml` for Prefect profile management.

Documentation:
- `README.md` with instructions on using `mise` and `uv` for local development.

Verification Steps

Impact
- `mise` must be installed to work with the repo effectively.
- `dbt/profiles.yml` and `prefect.yml` are now generated artifacts and should not be manually edited (they are git-ignored).
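
The `sync_env` behavior described above (build `.env` from `.env.example` while keeping values the user already set) can be illustrated with a short sketch. This is not the script from the PR: the function names are hypothetical, the parsing is deliberately simple (no quoting or `export` handling), and appending keys that exist only in the current `.env` is an assumption about what "preserving existing values" should mean.

```python
def parse_env(text: str) -> dict[str, str]:
    """Parse dotenv-style text into a dict, ignoring blanks and comments."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        values[key.strip()] = value.strip()
    return values


def sync_env(example_text: str, current_text: str) -> str:
    """Follow the example's layout, but keep values already set by the user."""
    current = parse_env(current_text)
    example_keys: set[str] = set()
    out: list[str] = []
    for raw in example_text.splitlines():
        line = raw.strip()
        if line and not line.startswith("#") and "=" in line:
            key = line.split("=", 1)[0].strip()
            example_keys.add(key)
            if key in current:
                out.append(f"{key}={current[key]}")  # user value wins
                continue
        out.append(raw)  # comments, blanks, and new keys pass through
    # Assumption: keys present only in the user's file are kept at the end.
    for key, value in current.items():
        if key not in example_keys:
            out.append(f"{key}={value}")
    return "\n".join(out) + "\n"


merged = sync_env("# comment\nA=\nB=default\n", "A=mine\nC=old\n")
```

Here `merged` keeps the comment, takes `A=mine` from the existing file, adds the new `B=default` from the example, and retains the user-only `C=old`.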