Componentize DBT cloud integration#33361

Open
michalcabir-ui wants to merge 14 commits intodagster-io:masterfrom
michalcabir-ui:componentizeDBTcloudIntegration
Conversation

@michalcabir-ui
Contributor

@michalcabir-ui michalcabir-ui commented Jan 24, 2026

Summary & Motivation

This PR introduces the DbtCloudComponent, enabling users to load dbt Cloud projects as Dagster assets using the new Components API. This aligns the dbt Cloud experience with the existing local DbtProjectComponent.

Key changes:

New Component: Implemented DbtCloudComponent in dagster_dbt/cloud_v2/component/. It handles manifest synchronization, state management, and remote job execution via the CLI.

Refactoring: Created a shared BaseDbtComponent (in dagster_dbt/components/base.py) to consolidate common logic between local and cloud components (e.g., get_asset_spec, translator initialization).

Refactored DbtProjectComponent to inherit from BaseDbtComponent. Moved shared fields (e.g., op, select, exclude, translation_settings) to the base class to avoid duplication.

CLI Templates: Updated dagster-dg-cli test snapshots (expected_schema, expected_example) to reflect the new field ordering resulting from the inheritance change.

Cloud Compatibility: Explicitly disabled enable_code_references in the translator for dbt Cloud assets, as local source file linking is not applicable for remote projects.
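The shape of the refactor described above can be sketched roughly as follows. This is only an illustration built from stand-in dataclasses: the class names ending in `Sketch`, the `effective_settings` helper, the `project_dir` and `job_id` fields, and the default values are all assumptions made for the example, not the actual dagster-dbt code in this PR.

```python
from dataclasses import dataclass, field, replace
from typing import Optional


@dataclass
class TranslatorSettingsSketch:
    # Stand-in for the translator settings; only the flag discussed above is modeled.
    enable_code_references: bool = True


@dataclass
class BaseDbtComponentSketch:
    # Shared fields are declared once on the base class (illustrative defaults).
    select: str = "fqn:*"
    exclude: str = ""
    translation_settings: TranslatorSettingsSketch = field(
        default_factory=TranslatorSettingsSketch
    )

    def effective_settings(self) -> TranslatorSettingsSketch:
        # Hypothetical helper: by default, use the settings as configured.
        return self.translation_settings


@dataclass
class LocalDbtComponentSketch(BaseDbtComponentSketch):
    # Hypothetical field for a project that lives on the local file system.
    project_dir: str = "."


@dataclass
class CloudDbtComponentSketch(BaseDbtComponentSketch):
    # Hypothetical field identifying a remote job.
    job_id: Optional[int] = None

    def effective_settings(self) -> TranslatorSettingsSketch:
        # A remote project has no local source files to link to,
        # so code references are always switched off.
        return replace(self.translation_settings, enable_code_references=False)
```

With this layout, `select` and `exclude` behave identically on both subclasses, and only the cloud variant overrides the code-reference flag.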

How I Tested These Changes

Ran the existing test file ython_modules/libraries/dagsterdbt/dagster_dbt_tests/components/test_dbt_project_component.py to verify that I didn't break existing logic.
Ran and updated python_modules/libraries/dagster-dg-cli/dagster_dg_cli_tests/yaml_template/test_yaml_template_generator.py to verify that the generated YAML templates for DbtProjectComponent are correct after the refactor.
Added a new test suite in python_modules/libraries/dagster-dbt/dagster_dbt_tests/cloud_v2/test_component.py covering the following scenarios:

State Management: Verified that the component can serialize workspace data (manifest, jobs) to a local state file and reload it to build definitions (simulating a deployment cycle).

Execution: Verified that the component correctly invokes the underlying CLI with the run command.

Asset Selection: Verified that get_asset_selection correctly builds a DbtManifestAssetSelection object.

Data Handling: Verified robust handling of job data serialization (supporting both objects and dictionaries).
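The state round trip and the object-or-dictionary job handling covered by these scenarios can be sketched as below. This is a simplified illustration under stated assumptions: `JobSketch`, `job_to_dict`, `write_state`, `load_state`, and `round_trip` are hypothetical names, and the JSON layout is invented for the example rather than taken from the component's real state file format.

```python
import json
import tempfile
from dataclasses import asdict, dataclass, is_dataclass
from pathlib import Path
from typing import Any, Dict, List


@dataclass
class JobSketch:
    # Hypothetical stand-in for a job record returned by a remote API.
    id: int
    name: str


def job_to_dict(job: Any) -> Dict[str, Any]:
    """Accept either a job object or an already-serialized dictionary."""
    if isinstance(job, dict):
        return dict(job)
    if is_dataclass(job) and not isinstance(job, type):
        return asdict(job)
    raise TypeError(f"Unsupported job type: {type(job).__name__}")


def write_state(path: Path, manifest: Dict[str, Any], jobs: List[Any]) -> None:
    # Normalize every job before writing so the file contains plain JSON only.
    payload = {"manifest": manifest, "jobs": [job_to_dict(job) for job in jobs]}
    path.write_text(json.dumps(payload), encoding="utf-8")


def load_state(path: Path) -> Dict[str, Any]:
    return json.loads(path.read_text(encoding="utf-8"))


def round_trip(manifest: Dict[str, Any], jobs: List[Any]) -> Dict[str, Any]:
    # Write to a temporary state file and read it back, as a deployment cycle would.
    with tempfile.TemporaryDirectory() as tmp:
        state_path = Path(tmp) / "state.json"
        write_state(state_path, manifest, jobs)
        return load_state(state_path)
```

Calling `round_trip` with a mix of `JobSketch` instances and plain dictionaries should return the same job list in dictionary form, which is the property the Data Handling scenario checks.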

Changelog

  • Added DbtCloudComponent to support loading dbt Cloud projects using the Components API.
  • Refactored shared logic into BaseDbtComponent to unify implementation between local and cloud dbt components and remove duplicate field definitions. Updated dagster-dg-cli templates to align with the new component structure.
  • test: Added comprehensive unit tests covering state management, execution, and asset selection.
  • docs: Updated the dbt Cloud integration docs in docs/integrations/libraries/dbt/dbt-cloud.md and sphinx/sections/integrations/libaries/dbt/dagster-dbt.rst

@michalcabir-ui
Contributor Author

During the refactoring process to introduce BaseDbtComponent and DbtCloudComponent, I noticed that DbtProjectComponent became increasingly verbose. This complexity primarily stems from strictly maintaining backward compatibility with the existing test suite.

The legacy tests pass loosely-typed data (e.g., raw strings for deps and keys) instead of the expected AssetKey or AssetDep objects defined in the new strict type hints. To keep all existing tests passing without modification (a green build), I implemented runtime adapter logic within get_asset_spec to normalize these inputs.

I think the cleaner approach would be to refactor the legacy tests to strictly adhere to the component's type signature. This would allow us to remove the defensive type-checking logic from the production code, significantly simplifying DbtProjectComponent and making it more readable and Pythonic.
Let me know if you want me to refactor it and I will!
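For context, the kind of adapter logic in question looks roughly like the sketch below. It uses stand-in classes (`AssetKeySketch`, `AssetDepSketch`) instead of Dagster's real AssetKey and AssetDep, and the helper names and the "/" separator are assumptions for illustration; the actual code in get_asset_spec may differ.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple, Union


@dataclass(frozen=True)
class AssetKeySketch:
    # Stand-in for an asset key: an ordered path of string segments.
    path: Tuple[str, ...]


@dataclass(frozen=True)
class AssetDepSketch:
    # Stand-in for a dependency that wraps the upstream asset key.
    asset: AssetKeySketch


KeyLike = Union[str, AssetKeySketch]
DepLike = Union[str, AssetKeySketch, AssetDepSketch]


def coerce_key(value: KeyLike) -> AssetKeySketch:
    """Return typed keys unchanged and parse raw strings such as "raw/orders"."""
    if isinstance(value, AssetKeySketch):
        return value
    return AssetKeySketch(tuple(value.split("/")))


def coerce_dep(value: DepLike) -> AssetDepSketch:
    """Return typed deps unchanged and wrap anything key-like in a dep."""
    if isinstance(value, AssetDepSketch):
        return value
    return AssetDepSketch(coerce_key(value))


def coerce_deps(values: Iterable[DepLike]) -> List[AssetDepSketch]:
    return [coerce_dep(value) for value in values]
```

If the legacy tests passed typed objects directly, `coerce_key` and `coerce_dep` would reduce to identity functions and could be deleted from the production path.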

@michalcabir-ui michalcabir-ui marked this pull request as ready for review January 26, 2026 00:36
@michalcabir-ui michalcabir-ui requested a review from a team as a code owner January 26, 2026 00:36
@xionon xionon requested a review from OwenKephart January 28, 2026 19:44
Contributor

@OwenKephart OwenKephart left a comment

Left comments inline, but there seem to be a huge number of changes to dbt_project/component.py, which should essentially all be reverted. We're just factoring out a shared base class here, so the main net result should be removing duplicated properties / functions, not changing how the existing abstractions work.

@@ -19,6 +19,8 @@ dg list components
│ dagster.UvRunComponent │ Represents a Python script, alongside the set of assets or asset │
│ │ checks that it is responsible for executing. │
├───────────────────────────────────────────────────┼──────────────────────────────────────────────────────────────────┤
Contributor Author

@OwenKephart I attempted to manually update this snippet (I use Windows, not Linux) to match the expected CLI output, but the CI continues to report mismatches.

It still fails; it would be better to resolve these by running the automated snippet update in a tox environment.
