Componentize DBT cloud integration #33361
michalcabir-ui wants to merge 14 commits into dagster-io:master
During the refactoring to introduce BaseDbtComponent and DbtCloudComponent, I noticed that DbtProjectComponent became increasingly verbose. This complexity stems primarily from strict backward compatibility with the existing test suite: the legacy tests pass loosely-typed data (e.g., raw strings for deps and keys) instead of the AssetKey or AssetDep objects required by the new, stricter type hints. To keep all existing tests passing without modification (a green build), I implemented runtime adapter logic within get_asset_spec to normalize these inputs. I think the cleaner approach would be to refactor the legacy tests to adhere strictly to the component's type signature. That would allow us to remove the defensive type-checking logic from the production code, significantly simplifying DbtProjectComponent and making it more readable and Pythonic.
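The kind of runtime adapter described above boils down to coercing each loosely-typed input into the strict type before it is used. The following is a minimal sketch under stated assumptions, not the code in get_asset_spec: `AssetKeyStub`, `AssetDepStub`, `normalize_key` and `normalize_dep` are hypothetical stand-ins (so the example runs without dagster installed) for dagster's AssetKey/AssetDep, and it assumes string keys use `/` to separate path components.

```python
from dataclasses import dataclass
from typing import Sequence, Union


# Hypothetical stand-ins for dagster's AssetKey / AssetDep, used only so this
# sketch runs without dagster installed.
@dataclass(frozen=True)
class AssetKeyStub:
    path: tuple[str, ...]


@dataclass(frozen=True)
class AssetDepStub:
    asset_key: AssetKeyStub


KeyLike = Union[str, Sequence[str], AssetKeyStub]
DepLike = Union[KeyLike, AssetDepStub]


def normalize_key(value: KeyLike) -> AssetKeyStub:
    """Coerce a loosely-typed key (string, list of parts, or key) into a key."""
    if isinstance(value, AssetKeyStub):
        return value  # already strict: pass through unchanged
    if isinstance(value, str):
        # Assumption: "/" separates path components, e.g. "analytics/orders".
        return AssetKeyStub(tuple(value.split("/")))
    return AssetKeyStub(tuple(value))  # any other sequence of path parts


def normalize_dep(value: DepLike) -> AssetDepStub:
    """Coerce a loosely-typed dep into a dep, normalizing its key first."""
    if isinstance(value, AssetDepStub):
        return value
    return AssetDepStub(normalize_key(value))
```

Every `isinstance` branch in such a helper exists only to tolerate legacy inputs; if the tests passed strict types, the helpers would reduce to the pass-through case, which is the simplification proposed above.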
OwenKephart left a comment:
Left comments inline, but there seem to be a huge number of changes to dbt_project/component.py, which should essentially all be reverted. We're just factoring out a shared base class here, so the main net result should be removing duplicated properties/functions, not changing how the existing abstractions work.
Resolved inline review threads:
python_modules/libraries/dagster-dbt/dagster_dbt/cloud_v2/component/dbt_cloud_component.py (6 threads, 5 outdated)
python_modules/libraries/dagster-dbt/dagster_dbt/components/base.py (1 thread, outdated)
python_modules/libraries/dagster-dbt/dagster_dbt/components/dbt_project/component.py (5 threads, 4 outdated)
@@ -19,6 +19,8 @@ dg list components
│ dagster.UvRunComponent │ Represents a Python script, alongside the set of assets or asset │
│ │ checks that it is responsible for executing. │
├───────────────────────────────────────────────────┼──────────────────────────────────────────────────────────────────┤
@OwenKephart I attempted to update this manually (I use Windows, not Linux) to match the expected CLI output, but CI continues to report mismatches.
It still fails. It would be better to resolve these in the tox environment and run the automated snippet update.
Summary & Motivation
This PR introduces the DbtCloudComponent, enabling users to load dbt Cloud projects as Dagster assets using the new Components API. This aligns the dbt Cloud experience with the existing local DbtProjectComponent.
Key changes:
New Component: Implemented DbtCloudComponent in dagster_dbt/cloud_v2/component/. It handles manifest synchronization, state management, and remote job execution via the CLI.
Refactoring: Created a shared BaseDbtComponent (in dagster_dbt/components/base.py) to consolidate common logic between local and cloud components (e.g., get_asset_spec, translator initialization).
Refactored DbtProjectComponent to inherit from BaseDbtComponent. Moved shared fields (e.g., op, select, exclude, translation_settings) to the base class to avoid duplication.
CLI Templates: Updated dagster-dg-cli test snapshots (expected_schema, expected_example) to reflect the new field ordering resulting from the inheritance change.
Cloud Compatibility: Explicitly disabled enable_code_references in the translator for dbt Cloud assets, as local source file linking is not applicable for remote projects.
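The inheritance layout described in the key changes can be sketched with plain dataclasses: shared fields are declared once on the base class, and each subclass adds only its own fields and overrides only what differs. This is a hypothetical illustration, not the real dagster-dbt classes. The `*Sketch` class names, the `project` and `workspace_id` fields, and the simplified field types are assumptions, and `enable_code_references` appears here as a method only for brevity (per the description above, the real code sets it on the translator).

```python
from dataclasses import dataclass, fields
from typing import Optional


# Hypothetical sketch of the inheritance layout; these are NOT the real
# dagster-dbt classes, and field types are simplified to plain Python types.
@dataclass
class BaseDbtComponentSketch:
    # Shared fields, declared once on the base class.
    select: str = "fqn:*"
    exclude: str = ""
    op: Optional[dict] = None
    translation_settings: Optional[dict] = None

    def enable_code_references(self) -> bool:
        # Local projects can link assets back to their source files.
        return True


@dataclass
class DbtProjectComponentSketch(BaseDbtComponentSketch):
    project: str = "."  # local-only field (hypothetical name)


@dataclass
class DbtCloudComponentSketch(BaseDbtComponentSketch):
    workspace_id: Optional[int] = None  # cloud-only field (hypothetical name)

    def enable_code_references(self) -> bool:
        # A remote dbt Cloud project has no local source files to link to.
        return False
```

With dataclass inheritance, inherited fields are listed first: `[f.name for f in fields(DbtProjectComponentSketch)]` yields `['select', 'exclude', 'op', 'translation_settings', 'project']`. If the real components' schemas are generated in a similar base-first order, that would explain the field reordering in the dagster-dg-cli snapshots.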
How I Tested These Changes
Ran the existing test file python_modules/libraries/dagster-dbt/dagster_dbt_tests/components/test_dbt_project_component.py to verify I didn't break existing logic.
Added a new test suite in python_modules/libraries/dagster-dbt/dagster_dbt_tests/cloud_v2/test_component.py covering the following scenarios:
State Management: Verified that the component can serialize workspace data (manifest, jobs) to a local state file and reload it to build definitions (simulating a deployment cycle).
Execution: Verified that the component correctly invokes the underlying CLI with the run command.
Asset Selection: Verified that get_asset_selection correctly builds a DbtManifestAssetSelection object.
Data Handling: Verified robust handling of job data serialization (supporting both objects and dictionaries).
Ran and updated python_modules/libraries/dagster-dg-cli/dagster_dg_cli_tests/yaml_template/test_yaml_template_generator.py to verify that the generated YAML templates for DbtProjectComponent are correct after the refactor.
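The state-management and data-handling scenarios above amount to a write-then-reload round trip in which job records may arrive either as objects or as plain dictionaries. The sketch below is a hypothetical illustration of that round trip, not the component's actual state format. `JobStub`, `job_to_dict`, `write_state`, `read_state` and `round_trip` are invented names, and a JSON file with `manifest` and `jobs` keys is assumed.

```python
import json
import tempfile
from dataclasses import asdict, dataclass, is_dataclass
from pathlib import Path
from typing import Any, Union


# Hypothetical job record; the real workspace data types are not shown here.
@dataclass
class JobStub:
    id: int
    name: str


def job_to_dict(job: Union[JobStub, dict]) -> dict:
    # Accept either a dataclass instance or an already-serialized dict.
    if is_dataclass(job):
        return asdict(job)
    return dict(job)


def write_state(path: Path, manifest: dict, jobs: list) -> None:
    # Persist manifest and normalized job records as one JSON document.
    payload = {"manifest": manifest, "jobs": [job_to_dict(j) for j in jobs]}
    path.write_text(json.dumps(payload))


def read_state(path: Path) -> dict[str, Any]:
    return json.loads(path.read_text())


def round_trip(manifest: dict, jobs: list) -> dict[str, Any]:
    # Simulate a deployment cycle: persist state, then reload it.
    with tempfile.TemporaryDirectory() as tmp:
        state_file = Path(tmp) / "state.json"
        write_state(state_file, manifest, jobs)
        return read_state(state_file)
```

Because jobs are normalized to dictionaries before they are written, a test can pass a mix of `JobStub(1, "daily")` and `{"id": 2, "name": "hourly"}` and assert that both reload as plain dictionaries.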
Changelog
Added DbtCloudComponent to support loading dbt Cloud projects using the Components API.
Added BaseDbtComponent to unify the implementation of the local and cloud dbt components and remove duplicate field definitions.
Updated dagster-dg-cli templates to align with the new component structure.
docs/integrations/libraries/dbt/dbt-cloud.md, sphinx/sections/integrations/libaries/dbt/dagster-dbt.rst