
Bafbi commented Nov 28, 2025

Description

This PR implements a "zero-friction" repository setup using mise to automate dependency management, environment configuration, and infrastructure tasks. It addresses the need for reproducible environments and dynamic configuration for dbt and Prefect.

Related Issue

Closes #11

Key Changes

  • Mise Integration (mise.toml):

    • Defined tools (uv, opentofu, gcloud) and tasks to standardize development workflows.
    • Added tasks for infrastructure management: infra:init, infra:plan, infra:outputs.
    • Added tasks for configuration rendering: dbt:render_profiles, prefect:render_configs.
    • Added tasks for GCP auth and setup: gcloud:auth, gcloud:create_sa_key.
  • Scripts & Automation (scripts/):

    • sync_env: Automates .env creation from .env.example, preserving existing values.
    • render_template: A Jinja2-based renderer that injects environment variables and Terraform outputs into configuration files.
    • setup_prefect_blocks.py: Automates the creation of Prefect blocks (GCP credentials, BigQuery targets).
    • get_git_env.sh: Exports git metadata as environment variables for use in templates.
  • Templating:

    • dbt/profiles.tpl.yml: Dynamic dbt profile template supporting dev/prod environments via env vars or Terraform outputs.
    • prefect.tpl.yml: Dynamic Prefect configuration template.
  • Infrastructure:

    • Updated Terraform configuration in infrastructure/ to align with the new directory structure and outputs.
    • Added profiles.toml for Prefect profile management.
  • Documentation:

    • Updated README.md with instructions on using mise and uv for local development.
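A minimal sketch of the sync_env merge described above, for reviewers who want to check the intended behaviour. It assumes a plain KEY=VALUE .env format (no quoting, no multi-line values), and the function names are illustrative rather than taken from the PR's script. It follows the layout of .env.example, keeps any value already set in .env, and appends keys that exist only in .env so nothing the user set is silently dropped.

```python
from pathlib import Path


def parse_env(text: str) -> dict:
    """Parse simple KEY=VALUE lines, skipping blanks and '#' comments."""
    values = {}
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#") or "=" not in stripped:
            continue
        key, _, value = stripped.partition("=")
        values[key.strip()] = value.strip()
    return values


def merge_env(example_text: str, existing_text: str) -> str:
    """Rebuild .env from .env.example, keeping values the user already set."""
    existing = parse_env(existing_text)
    seen = set()
    out = []
    for line in example_text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#") or "=" not in stripped:
            out.append(line)  # comments and blank lines are copied verbatim
            continue
        key, _, default = stripped.partition("=")
        key = key.strip()
        seen.add(key)
        # An existing value wins over the example's default.
        out.append(f"{key}={existing.get(key, default.strip())}")
    extras = [f"{k}={v}" for k, v in existing.items() if k not in seen]
    if extras:
        # Keep user-only keys instead of silently dropping them.
        out.append("# Keys not present in .env.example")
        out.extend(extras)
    return "\n".join(out) + "\n"


def sync_env(example: Path = Path(".env.example"), target: Path = Path(".env")) -> None:
    """Create or refresh .env from .env.example (hypothetical entry point)."""
    existing = target.read_text() if target.exists() else ""
    target.write_text(merge_env(example.read_text(), existing))
```

Appending unknown keys rather than discarding them is a design choice; the actual script may handle them differently.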

Verification Steps

  1. Setup Environment:
    mise install
    mise run sync_env
  2. Infrastructure (Optional/Dry-run):
    mise run infra:init
    mise run infra:plan
  3. Render Configurations:
    mise run dbt:render_profiles
    # Check dbt/profiles.yml content
    mise run prefect:render_configs
    # Check prefect.yml content
  4. Run Tests:
    uv run prefect_flows/test.py

Impact

  • Developers now need mise installed to work with the repo effectively.
  • dbt/profiles.yml and prefect.yml are now generated artifacts and should not be manually edited (they are git-ignored).

Bafbi added 18 commits November 17, 2025 12:28
Bafbi linked an issue Nov 28, 2025 that may be closed by this pull request
Bafbi added the documentation and enhancement labels Nov 28, 2025
Copilot AI left a comment

Pull request overview

This PR introduces mise as a zero-friction automation tool to standardize development workflows and improve reproducibility across the repository. It replaces the previous Python-based infrastructure/setup_profiles/ module with simpler, more maintainable scripts that leverage Jinja2 templating for dynamic configuration generation.

Key Changes:

  • Implemented mise task runner with comprehensive task definitions for infrastructure, configuration rendering, and GCP operations
  • Added template-based configuration system using Jinja2 for dbt profiles and prefect configs
  • Created automation scripts for environment synchronization, template rendering, Prefect block setup, and Git metadata extraction

Reviewed changes

Copilot reviewed 27 out of 29 changed files in this pull request and generated 9 comments.

| File | Description |
| --- | --- |
| mise.toml | Defines tools, tasks, and environment variables for the mise-based workflow |
| scripts/sync_env | Python script to synchronize .env files from .env.example, preserving user values |
| scripts/render_template | Jinja2 template renderer that injects env vars and Terraform outputs |
| scripts/setup_prefect_blocks.py | Automated Prefect block creation from rendered dbt profiles |
| scripts/get_git_env.sh | Bash script to export Git metadata as environment variables |
| dbt/profiles.tpl.yml | Jinja2 template for dbt profiles with environment-based configuration |
| prefect.tpl.yml | Jinja2 template for Prefect deployment configuration |
| dbt/profiles.yml | Generated dbt profiles file (should be git-ignored) |
| prefect.yml | Generated Prefect configuration (should be git-ignored) |
| infrastructure/variables.tf | Updated Terraform variables to use suffix-based naming |
| infrastructure/main.tf | Refactored to use local variables for dataset IDs |
| infrastructure/providers.tf | Extracted provider configuration to separate file |
| infrastructure/outputs.tf | Simplified outputs, removed redundant values |
| .env.example | Added template for required environment variables |
| README.md | Updated with mise-based workflow instructions |
| infrastructure/README.md | Comprehensive documentation of automated and manual workflows |
Files not reviewed (1)
  • infrastructure/.terraform.lock.hcl: Language not supported


Bafbi and others added 2 commits November 28, 2025 22:20
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Copilot AI left a comment

Pull request overview

Copilot reviewed 26 out of 28 changed files in this pull request and generated 13 comments.

Files not reviewed (1)
  • infrastructure/.terraform.lock.hcl: Language not supported


@@ -0,0 +1,204 @@
#!/usr/bin/env -S uv run --script
Copilot AI Nov 29, 2025

The inline PEP 723 script metadata block is incomplete: the format requires an opening # /// script marker and a closing # /// marker around the metadata lines (requires-python, dependencies). The suggested change adds the missing opening marker right after the shebang.

Suggested change
#!/usr/bin/env -S uv run --script
#!/usr/bin/env -S uv run --script
# /// script

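For reference, a well-formed PEP 723 header and a small checker adapted from the reference regex in the inline script metadata specification. The dependency list in the sample header is illustrative, not copied from the PR's scripts, and `read_script_metadata` is a hypothetical helper: it returns the TOML text of the `script` block, or None when the opening or closing marker is missing.

```python
import re
from typing import Optional

# Adapted from the reference regex in the PEP 723 inline script metadata spec.
PEP723_REGEX = r"(?m)^# /// (?P<type>[a-zA-Z0-9-]+)$\s(?P<content>(^#(| .*)$\s)+)^# ///$"

# Illustrative header: the dependency list is an example, not the PR's.
GOOD_HEADER = """#!/usr/bin/env -S uv run --script
# /// script
# requires-python = ">=3.10"
# dependencies = [
#   "python-dotenv",
# ]
# ///
"""

# The shape the review flags: metadata lines but no opening "# /// script".
MISSING_OPENER = """#!/usr/bin/env -S uv run --script
# requires-python = ">=3.10"
# ///
"""


def read_script_metadata(script: str) -> Optional[str]:
    """Return the TOML body of the `script` block, or None if no complete block exists."""
    matches = [m for m in re.finditer(PEP723_REGEX, script) if m.group("type") == "script"]
    if len(matches) > 1:
        raise ValueError("Multiple script metadata blocks found")
    if not matches:
        return None
    # Strip the leading "# " (or a bare "#") from each metadata line.
    return "".join(
        line[2:] if line.startswith("# ") else line[1:]
        for line in matches[0].group("content").splitlines(keepends=True)
    )
```

The returned text can then be fed to a TOML parser; a None result means uv would not see any inline metadata for the script.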
# branch or short ref
GIT_BRANCH=$(_safe git rev-parse --abbrev-ref HEAD 2>/dev/null || echo "")
if [ "$GIT_BRANCH" = "HEAD" ]; then
GIT_BRANCH="$(git describe --all --always 2>/dev/null || echo "${GIT_COMMIT_SHORT}")"
Copilot AI Nov 29, 2025

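On line 63, git describe is not wrapped with the _safe function, unlike the other git commands. Because the || echo fallback sits inside the command substitution, a failing git describe should not by itself abort the script under set -euo pipefail, so this is mainly a consistency fix. It also assumes _safe returns a non-zero status when the command fails; otherwise the fallback to GIT_COMMIT_SHORT would never run. Suggested form: GIT_BRANCH="$(_safe git describe --all --always || echo "${GIT_COMMIT_SHORT}")".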

Suggested change
GIT_BRANCH="$(git describe --all --always 2>/dev/null || echo "${GIT_COMMIT_SHORT}")"
GIT_BRANCH="$(_safe git describe --all --always || echo "${GIT_COMMIT_SHORT}")"

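A Python rendering of the same fallback chain, useful for checking the intended behaviour: the branch name, then git describe --all --always on a detached HEAD, then the short commit SHA. The `run` parameter and the way the short SHA is computed are assumptions for illustration (the snippet does not show how GIT_COMMIT_SHORT is set); `run_git` plays the role the `_safe` wrapper appears to play, returning None instead of raising when git fails.

```python
import subprocess
from typing import Callable, List, Optional

# A runner takes git arguments and returns stripped stdout, or None on failure.
Runner = Callable[[List[str]], Optional[str]]


def run_git(args: List[str]) -> Optional[str]:
    """Run git, returning None instead of raising when the command fails."""
    try:
        result = subprocess.run(["git", *args], capture_output=True, text=True, check=True)
    except (OSError, subprocess.CalledProcessError):
        return None
    return result.stdout.strip()


def git_branch(run: Runner = run_git) -> str:
    """Branch name, else `git describe --all --always` on a detached HEAD, else the short SHA."""
    commit_short = run(["rev-parse", "--short", "HEAD"]) or ""
    branch = run(["rev-parse", "--abbrev-ref", "HEAD"]) or ""
    if branch == "HEAD":  # detached HEAD: rev-parse cannot give a branch name
        branch = run(["describe", "--all", "--always"]) or commit_short
    return branch
```

Injecting the runner keeps the fallback logic testable without a real repository.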
}
}

provider "google" {} No newline at end of file
Copilot AI Nov 29, 2025

The provider "google" block is missing required configuration. Without explicit project and region arguments, Terraform will rely on environment variables or default credentials, which may lead to unexpected behavior. Consider adding: project = var.project_id and region = var.region to ensure explicit configuration.

Suggested change
provider "google" {}
provider "google" {
project = var.project_id
region = var.region
}

5. **Plan Infrastructure:** `mise run infra:plan` shows the changes that will be applied.
6. **Apply Infrastructure:**
```bash
tofu -chdir=infrastructure apply
```
Copilot AI Nov 29, 2025

The documentation tells users to run tofu -chdir=infrastructure apply directly in step 6, which bypasses the mise task that sets the required environment variables (TF_VAR_region, TF_VAR_project_id, TF_VAR_sa_dbt_id). Use mise run infra:apply instead so this step matches the automated workflow. Note that the PR description only lists infra:init, infra:plan, and infra:outputs, so an infra:apply task must exist in mise.toml for this suggestion to work.

Suggested change
tofu -chdir=infrastructure apply
mise run infra:apply

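A sketch of what an infra:apply wrapper needs to do, assuming the Terraform variables are fed from the repo's GCLOUD_* environment variables. The mapping below is hypothetical (it pairs the .env.example and mise.toml names with the variables listed in this comment); the underlying mechanism is standard, since Terraform and OpenTofu read any TF_VAR_<name> environment variable as the value of input variable <name>.

```python
import os
import subprocess
from typing import Dict, Mapping

# Hypothetical mapping: Terraform input variable -> repo environment variable.
TF_VAR_SOURCES = {
    "project_id": "GCLOUD_PROJECT",
    "region": "GCLOUD_REGION",
    "sa_dbt_id": "GCLOUD_SA_DBT_ID",
}


def tf_env(base: Mapping[str, str]) -> Dict[str, str]:
    """Copy `base` and add TF_VAR_<name> for every source variable that is set."""
    env = dict(base)
    for tf_name, source in TF_VAR_SOURCES.items():
        value = base.get(source)
        if value:  # unset or empty: let Terraform use its default or prompt
            env[f"TF_VAR_{tf_name}"] = value
    return env


def infra_apply(chdir: str = "infrastructure") -> None:
    """Run `tofu -chdir=<dir> apply` with the TF_VAR_* variables exported."""
    subprocess.run(["tofu", f"-chdir={chdir}", "apply"], env=tf_env(os.environ), check=True)
```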
Comment on lines +90 to +91
if value is None:
value = None
Copilot AI Nov 29, 2025

The condition if value is None: value = None on lines 90-91 is redundant and has no effect. This should be removed or the logic should be clarified if a different check was intended.

Suggested change
if value is None:
value = None

@@ -6,24 +6,4 @@ output "bq_dev_dataset_id" {
output "bq_prod_dataset_id" {
description = "BigQuery prod dataset ID"
value = google_bigquery_dataset.prod_dataset.dataset_id
Copilot AI Nov 29, 2025

Several Terraform outputs have been removed (sa_email, project_id, region, sa_key_path), but the template files still reference some of them; for example, dbt/profiles.tpl.yml uses tf.project_id, tf.region, and tf.sa_key_path. These have environment-variable fallbacks, but the templates may not render correctly if both the Terraform output and the environment variable are missing. Consider restoring these outputs or documenting the required environment variables.

Suggested change
value = google_bigquery_dataset.prod_dataset.dataset_id
value = google_bigquery_dataset.prod_dataset.dataset_id
}
output "sa_email" {
description = "Service account email"
value = google_service_account.sa.email
}
output "project_id" {
description = "GCP project ID"
value = var.project_id
}
output "region" {
description = "GCP region"
value = var.region
}
output "sa_key_path" {
description = "Path to the service account key file"
value = local.sa_key_path

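A sketch of the lookup order the templates appear to rely on: the Terraform output first, then an environment variable, failing with a clear message when both are missing instead of rendering an empty value. It reads outputs with tofu -chdir=infrastructure output -json, whose JSON maps each output name to an object with sensitive, type, and value fields. The helper names and the env-var paired with each output (for example GCP_SA_KEY_PATH for sa_key_path) are assumptions, not the PR's render_template code.

```python
import json
import os
import subprocess
from typing import Any, Dict, Mapping


def flatten_outputs(doc: Mapping[str, Any]) -> Dict[str, Any]:
    """`output -json` maps each name to {"sensitive", "type", "value"}; keep the values."""
    return {name: entry["value"] for name, entry in doc.items()}


def read_tf_outputs(chdir: str = "infrastructure") -> Dict[str, Any]:
    """Read outputs via `tofu -chdir=<dir> output -json`; {} if they cannot be read."""
    try:
        raw = subprocess.run(
            ["tofu", f"-chdir={chdir}", "output", "-json"],
            capture_output=True, text=True, check=True,
        ).stdout
    except (OSError, subprocess.CalledProcessError):
        return {}
    return flatten_outputs(json.loads(raw or "{}"))


def resolve(tf: Mapping[str, Any], key: str, env_var: str,
            env: Mapping[str, str] = os.environ) -> Any:
    """Prefer the Terraform output, then the env var; fail loudly if both are missing."""
    value = tf.get(key) or env.get(env_var)
    if not value:
        raise KeyError(f"neither Terraform output {key!r} nor ${env_var} is set")
    return value
```

Failing at render time points the user at the exact missing value, which is easier to debug than a profiles.yml with an empty project field.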
Comment on lines +1 to +5
GCLOUD_PROJECT=
# list for available regions: https://cloud.google.com/about/locations ; auto : https://cloud.withgoogle.com/region-picker/
GCLOUD_REGION=
PREFECT_API_URL=
PREFECT_API_KEY= No newline at end of file
Copilot AI Nov 29, 2025

The .env.example file is missing several environment variables that are referenced in the templates and scripts, including GCP_SA_KEY_PATH, BQ_DEV_DATASET, BQ_PROD_DATASET, and potentially others. Add these missing variables with appropriate comments to guide users in setting up their environment correctly.

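A small guard that could run in CI to catch this kind of drift, sketched under the assumption that the required keys are the ones already in .env.example plus the three named in this comment. The comment says there may be others, so REQUIRED_KEYS is a starting point, not an exhaustive list; the helper name is hypothetical.

```python
import re
from typing import Iterable, List

# Keys already in .env.example plus those this comment flags as missing.
REQUIRED_KEYS = (
    "GCLOUD_PROJECT", "GCLOUD_REGION", "PREFECT_API_URL", "PREFECT_API_KEY",
    "GCP_SA_KEY_PATH", "BQ_DEV_DATASET", "BQ_PROD_DATASET",
)

# A declared key is an identifier at the start of a line followed by "=".
KEY_RE = re.compile(r"^[ \t]*([A-Za-z_][A-Za-z0-9_]*)[ \t]*=", re.MULTILINE)


def undeclared_keys(example_text: str, required: Iterable[str] = REQUIRED_KEYS) -> List[str]:
    """Return the required keys that .env.example does not declare."""
    declared = set(KEY_RE.findall(example_text))
    return [key for key in required if key not in declared]
```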
# TF_LOG = { value = "DEBUG" }
PREFECT_PROFILES_PATH = { value = "profiles.toml" }

GCLOUD_SA_DBT_ID = { value = "dbt-sa" }
Copilot AI Nov 29, 2025

The default value for GCLOUD_SA_DBT_ID is "dbt-sa" in mise.toml, but the corresponding Terraform variable sa_dbt_id has a default of "projet-bi-dbt-sa" in infrastructure/variables.tf. This inconsistency could lead to mismatches between the service account created by Terraform and the one expected by other scripts. Align these default values or document the intentional difference.

Suggested change
GCLOUD_SA_DBT_ID = { value = "dbt-sa" }
GCLOUD_SA_DBT_ID = { value = "projet-bi-dbt-sa" }

# requires-python = ">=3.10"
# dependencies = [
# "python-dotenv",
# "click",
Copilot AI Nov 29, 2025

The dependency 'click' is listed in the script dependencies but is never imported or used. This should be removed to avoid unnecessarily installing unused packages.

Suggested change
# "click",

@@ -0,0 +1,262 @@
#!/usr/bin/env -S uv run --script
Copilot AI Nov 29, 2025

The inline PEP 723 script metadata block is incomplete: the format requires an opening # /// script marker and a closing # /// marker around the metadata lines (requires-python, dependencies). The suggested change adds the missing opening marker right after the shebang.

Suggested change
#!/usr/bin/env -S uv run --script
#!/usr/bin/env -S uv run --script
# /// script

Bafbi moved this from Todo to In Progress in Summer Media Nov 29, 2025
Bafbi self-assigned this Nov 29, 2025
Bafbi removed this from Summer Media Nov 29, 2025
CyprienKelma and others added 13 commits December 13, 2025 17:22
* feat: Setup Prefect and IaC config to support prod environment

* hotfix: change Prefect deployment shell script
* chore: update asset_base path in dbt configuration and adjust file copy commands in GitHub Actions workflow

* chore: update asset paths in dbt documentation and GitHub Actions workflow for consistency

* chore: standardize asset paths in dbt documentation workflow and update configuration for clarity

* chore: correct asset path in fct__viewings

* chore: minor changes

* chore: update asset_base path in erd_config.yml for consistency

* chore: update IAM role for storage admin and adjust seed script parameters for data generation

* feat: enhance data ingestion process by adding ingestion_date to models and updating GCS paths for Hive partitioning

* fix: correct release date calculation in seed_script to use a 30-day interval for seasons

* feat: implement snapshot strategy in models and update external table definitions for ingestion_date partitioning

Labels

documentation, enhancement

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Improve reproducibility and templating of the repo using mise

2 participants