MartinHofpower/GenerateDBT

GenerateDBT

A Python tool for generating dbt (data build tool) projects with configurable scale and complexity for testing purposes.

Disclaimer: This is a community-driven project initially set up with the help of GitHub Copilot and is not officially affiliated with dbt Labs. The tool is designed to help users create dbt projects for testing and learning purposes. The project is tested mainly with dbt-fabric but should work with other dbt adapters as well.

Overview

GenerateDBT creates complete dbt projects with models, macros, and seed data that can be used to test dbt functionality across different data platforms (Snowflake, BigQuery, Postgres, Microsoft Fabric, Databricks, etc.). The generated code is platform-agnostic and follows dbt best practices.

Features

  • 🎯 Configurable Scale: Choose the number of models and macros to generate
  • 📊 Complexity Levels: Simple, medium, or complex model patterns
  • 🏗️ Model Layers: Staging, intermediate, and mart models following dbt conventions
  • 🔧 Utility Macros: String, date, aggregation, and data quality macros
  • 📝 Documentation: Auto-generated schema.yml and README files
  • 🌱 Seed Data: Sample CSV files for testing
  • 🚀 Platform-Agnostic: Works with any dbt-supported database

Installation

Clone Repo

git clone https://github.com/MartinHofpower/GenerateDBT.git

From Source

cd GenerateDBT
pip install -e .

Quick Start

Generate a default dbt project:

generate-dbt

This creates a project with:

  • 10 models (staging, intermediate, and marts)
  • 5 macros
  • 3 seed data files
  • Medium complexity
  • Output to ./generated_dbt_project

Usage

Basic Examples

Generate a simple project:

generate-dbt --complexity simple

Generate a complex project with more models:

generate-dbt --num-models 20 --num-macros 10 --complexity complex

Generate to a specific directory:

generate-dbt --output-dir ./my_test_project --project-name my_dbt_test

Advanced Options

generate-dbt \
  --num-models 15 \
  --num-macros 8 \
  --num-seeds 5 \
  --complexity complex \
  --output-dir ./test_project \
  --project-name fabric_test \
  --max-dependencies 4 \
  --no-intermediate
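When many test projects are needed (for example, one per complexity level), the command line above can be assembled programmatically. The following is a minimal sketch, not part of GenerateDBT: the helper name `build_generate_dbt_command` and its `skip_layers` parameter are hypothetical, and only the flags documented in this README are used.

```python
import shlex

LAYERS = ("staging", "intermediate", "marts")
COMPLEXITIES = ("simple", "medium", "complex")


def build_generate_dbt_command(
    num_models=10,
    num_macros=5,
    num_seeds=3,
    complexity="medium",
    output_dir="./generated_dbt_project",
    project_name="test_dbt_project",
    max_dependencies=3,
    skip_layers=(),
):
    """Return the argv list for one generate-dbt invocation (hypothetical helper)."""
    if complexity not in COMPLEXITIES:
        raise ValueError(f"unknown complexity: {complexity!r}")
    argv = [
        "generate-dbt",
        "--num-models", str(num_models),
        "--num-macros", str(num_macros),
        "--num-seeds", str(num_seeds),
        "--complexity", complexity,
        "--output-dir", output_dir,
        "--project-name", project_name,
        "--max-dependencies", str(max_dependencies),
    ]
    # Each skipped layer maps to one of the documented --no-<layer> flags.
    for layer in skip_layers:
        if layer not in LAYERS:
            raise ValueError(f"unknown layer: {layer!r}")
        argv.append(f"--no-{layer}")
    return argv


# Rebuild the "Advanced Options" invocation shown above.
cmd = build_generate_dbt_command(
    num_models=15,
    num_macros=8,
    num_seeds=5,
    complexity="complex",
    output_dir="./test_project",
    project_name="fabric_test",
    max_dependencies=4,
    skip_layers=("intermediate",),
)
print(" ".join(shlex.quote(part) for part in cmd))
```

To execute the command rather than print it, the list can be passed to `subprocess.run(cmd, check=True)`, assuming `generate-dbt` is on the PATH.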

Command-Line Options

| Option | Default | Description |
| --- | --- | --- |
| --num-models | 10 | Number of models to generate |
| --num-macros | 5 | Number of macros to generate |
| --complexity | medium | Complexity level (simple, medium, complex) |
| --output-dir | ./generated_dbt_project | Output directory |
| --project-name | test_dbt_project | Name of the dbt project |
| --num-seeds | 3 | Number of seed data files |
| --max-dependencies | 3 | Max dependencies per model |
| --no-staging | False | Skip staging models |
| --no-intermediate | False | Skip intermediate models |
| --no-marts | False | Skip mart models |
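To make the defaults and flag semantics concrete, here is a sketch of a parser that mirrors the options listed above. It uses the standard-library `argparse` purely for illustration; the project lists Click as a dependency, so the real CLI is presumably defined differently, and `build_parser` is a hypothetical name.

```python
import argparse


def build_parser():
    """Illustrative parser mirroring the documented options (not the real CLI)."""
    parser = argparse.ArgumentParser(prog="generate-dbt")
    parser.add_argument("--num-models", type=int, default=10,
                        help="Number of models to generate")
    parser.add_argument("--num-macros", type=int, default=5,
                        help="Number of macros to generate")
    parser.add_argument("--complexity", default="medium",
                        choices=["simple", "medium", "complex"],
                        help="Complexity level")
    parser.add_argument("--output-dir", default="./generated_dbt_project",
                        help="Output directory")
    parser.add_argument("--project-name", default="test_dbt_project",
                        help="Name of the dbt project")
    parser.add_argument("--num-seeds", type=int, default=3,
                        help="Number of seed data files")
    parser.add_argument("--max-dependencies", type=int, default=3,
                        help="Max dependencies per model")
    # The three --no-* options are boolean switches that default to False.
    for layer in ("staging", "intermediate", "marts"):
        parser.add_argument(f"--no-{layer}", action="store_true",
                            help=f"Skip {layer} models")
    return parser


args = build_parser().parse_args(["--complexity", "complex", "--no-intermediate"])
print(args.num_models, args.complexity, args.no_intermediate)  # 10 complex True
```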

Generated Project Structure

generated_dbt_project/
├── dbt_project.yml          # Project configuration
├── README.md                # Generated project documentation
├── models/
│   ├── schema.yml           # Model documentation and tests
│   ├── staging/             # Staging models (light transformations)
│   │   └── stg_*.sql
│   ├── intermediate/        # Intermediate transformations
│   │   └── int_*.sql
│   └── marts/               # Business-level models
│       └── (fct_*.sql, dim_*.sql)
├── macros/                  # Reusable SQL macros
│   ├── string_utils.sql
│   ├── date_utils.sql
│   └── ...
├── seeds/                   # Sample CSV data
│   ├── raw_data_1.csv
│   └── ...
└── tests/                   # Custom test directory
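After generating a project, it can be useful to confirm that the output matches the layout above before pointing dbt at it. The sketch below is one way to do that with `pathlib`; the function names and the `EXPECTED_PATHS` list are illustrative assumptions derived from the tree above, and a run with `--no-staging`, `--no-intermediate`, or `--no-marts` would presumably omit the corresponding folder.

```python
from pathlib import Path

# Top-level entries taken from the documented tree (layer folders are counted separately).
EXPECTED_PATHS = [
    "dbt_project.yml",
    "README.md",
    "models/schema.yml",
    "macros",
    "seeds",
    "tests",
]


def missing_paths(project_dir, expected=EXPECTED_PATHS):
    """Return the expected relative paths that do not exist under project_dir."""
    root = Path(project_dir)
    return [rel for rel in expected if not (root / rel).exists()]


def count_models(project_dir):
    """Count .sql files per model layer; a missing layer folder counts as 0."""
    models = Path(project_dir) / "models"
    counts = {}
    for layer in ("staging", "intermediate", "marts"):
        layer_dir = models / layer
        counts[layer] = len(list(layer_dir.glob("*.sql"))) if layer_dir.is_dir() else 0
    return counts
```

For example, `missing_paths("./generated_dbt_project")` returns an empty list when every expected entry is present, and `count_models(...)` gives a quick per-layer tally to compare against `--num-models`.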

Complexity Levels

Simple

  • Basic SELECT statements
  • Minimal transformations
  • Simple macros with single operations

Medium

  • CTEs (Common Table Expressions)
  • Basic joins and aggregations
  • Macros with conditional logic
  • Data quality checks

Complex

  • Multiple CTEs and complex joins
  • Window functions
  • Incremental materializations
  • Advanced macros with loops
  • Comprehensive data quality frameworks

Using the Generated Project

  1. Navigate to the project:

    cd generated_dbt_project
  2. Install dbt and adapter:

    pip install dbt-core dbt-<your-adapter>

    Replace dbt-<your-adapter> with the adapter package for your platform (e.g., dbt-snowflake, dbt-bigquery, dbt-fabric, dbt-postgres)

  3. Configure profiles.yml: Create or update ~/.dbt/profiles.yml with your database credentials:

    test_dbt_project:
      outputs:
        dev:
          type: <adapter_type>
          # Add your connection details
      target: dev
  4. Test connection:

    dbt debug
  5. Load seed data:

    dbt seed
  6. Run models:

    dbt run
  7. Run tests:

    dbt test
  8. Generate and view docs:

    dbt docs generate
    dbt docs serve
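The dbt steps above can also be scripted, which helps when the same generated project is exercised repeatedly. This is a minimal sketch, assuming `dbt` is installed and on the PATH and that profiles.yml is already configured; `run_dbt_workflow` and its `runner` parameter are hypothetical names, and `dbt docs serve` is left out because it starts a server and does not return.

```python
import subprocess

# The non-interactive steps from the list above, in order.
DBT_STEPS = [
    ["dbt", "debug"],
    ["dbt", "seed"],
    ["dbt", "run"],
    ["dbt", "test"],
    ["dbt", "docs", "generate"],
]


def run_dbt_workflow(project_dir="generated_dbt_project",
                     steps=DBT_STEPS,
                     runner=subprocess.run):
    """Run each dbt step inside project_dir and return the commands that completed.

    With the default runner, check=True makes subprocess.run raise
    CalledProcessError on a non-zero exit code, so later steps are skipped
    once one step fails.
    """
    completed = []
    for argv in steps:
        runner(argv, cwd=project_dir, check=True)
        completed.append(" ".join(argv))
    return completed
```

Calling `run_dbt_workflow("./generated_dbt_project")` runs the steps for real; the `runner` argument exists so the sequence can be dry-run or tested with a stand-in function.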

Testing on Different Platforms

This tool generates platform-agnostic dbt code that is intended to work with any supported adapter (it has been tested mainly with dbt-fabric):

  • Snowflake: pip install dbt-snowflake
  • BigQuery: pip install dbt-bigquery
  • Postgres: pip install dbt-postgres
  • Redshift: pip install dbt-redshift
  • Microsoft Fabric: pip install dbt-fabric
  • Databricks: pip install dbt-databricks

Simply install the appropriate adapter and configure your profiles.yml accordingly.
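As an illustration of what "configure your profiles.yml" can look like for one adapter, the sketch below renders a Postgres profile as text. Everything specific here is an assumption: the host, user, database, and schema values are placeholders, the `DBT_PASSWORD` environment variable name is hypothetical, and `render_postgres_profile` is not part of GenerateDBT. The profile name defaults to `test_dbt_project`, matching the example profile earlier in this README; other adapters use different connection keys.

```python
def render_postgres_profile(profile_name="test_dbt_project", schema="dbt_test"):
    """Return profiles.yml text for a local Postgres target (placeholder values)."""
    lines = [
        f"{profile_name}:",
        "  target: dev",
        "  outputs:",
        "    dev:",
        "      type: postgres",
        "      host: localhost",        # placeholder host
        "      port: 5432",
        "      user: dbt_user",         # placeholder user
        # dbt's env_var() reads the password from the environment at run time.
        "      password: \"{{ env_var('DBT_PASSWORD') }}\"",
        "      dbname: analytics",      # placeholder database
        f"      schema: {schema}",
        "      threads: 4",
    ]
    # Keep the trailing YAML-style comments out of the rendered file.
    return "\n".join(line.split("  #")[0].rstrip() for line in lines) + "\n"


profile_text = render_postgres_profile()
print(profile_text)
```

The rendered text can be written to `~/.dbt/profiles.yml` (or merged into an existing file by hand), after which `dbt debug` is the quickest way to check the connection.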

Use Cases

  • 🧪 Testing dbt on new platforms: Quickly generate test projects for Microsoft Fabric, Databricks, or other platforms
  • 📚 Learning dbt: Study example projects with various patterns
  • 🎓 Training: Create sample projects for teaching dbt concepts
  • 🔬 Performance testing: Generate large projects to test performance
  • 🐛 Debugging: Create reproducible test cases for dbt issues

Development

Requirements

  • Python 3.7+
  • PyYAML
  • Click

Setup Development Environment

git clone https://github.com/MartinHofpower/GenerateDBT.git
cd GenerateDBT
pip install -e .

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

  • Built for testing dbt projects across various data platforms
  • Follows dbt best practices and conventions
  • Inspired by the need for flexible, scalable dbt testing scenarios
