feat(clickhouse-connect): make label mutation optional via database extra#34091
feat(clickhouse-connect): make label mutation optional via database extra#34091joe-clickhouse wants to merge 12 commits intoapache:masterfrom
Conversation
There was a problem hiding this comment.
Review by Korbit AI
Korbit automatically attempts to detect when you fix issues in new commits.
| Category | Issue | Status |
|---|---|---|
| Default label mutation may break existing queries ▹ view | 🧠 Not in scope |
Files scanned
| File Path | Reviewed |
|---|---|
| superset/db_engine_specs/clickhouse.py | ✅ |
| superset/config.py | ✅ |
Explore our documentation to understand the languages and file types we support and the files we ignore.
Check out our docs on how you can make Korbit work best for you and your team.
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #34091 +/- ##
===========================================
+ Coverage 0 68.37% +68.37%
===========================================
Files 0 631 +631
Lines 0 45977 +45977
Branches 0 4984 +4984
===========================================
+ Hits 0 31439 +31439
- Misses 0 13291 +13291
- Partials 0 1247 +1247
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
Why allowing both through config? Shouldn't this be one or the other - whichever works or works best? Overall I'm not big on adding database-engine-specific configs in |
|
@mistercrunch sorry for the delay! It does seem useful to have both because in some of simpler UI elements, the user can create queries that do require need the default mutation. However, some more advanced cases effectively allow you to create aliases yourself and the mutation gets in the way of referencing that alias later in the same query, which ClickHouse is able to deal with. i like the idea of the |
There was a problem hiding this comment.
Pull Request Overview
This PR adds configurable label mutation for ClickHouse databases, allowing users to disable the automatic label mutation behavior that suffixes column labels with MD5 hashes to avoid collisions.
Key changes:
- Added
mutate_label_nameconfiguration option in database extra JSON for ClickHouse Connect engine - Modified
_mutate_labelmethod to conditionally apply label mutation based on database configuration - Updated multiple call sites to store the database instance in Flask's
gobject for configuration access
Reviewed Changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| superset/db_engine_specs/clickhouse.py | Enhanced _mutate_label method to check database extra configuration and conditionally apply label mutation |
| superset/models/helpers.py | Added try/finally blocks to temporarily store database in g object before calling make_label_compatible |
| superset/models/core.py | Added try/finally block to temporarily store database in g object before calling make_label_compatible |
| tests/unit_tests/db_engine_specs/test_clickhouse.py | Added comprehensive test coverage for the new configurable label mutation behavior |
| docs/docs/configuration/databases.mdx | Added documentation explaining the new mutate_label_name configuration option |
Comments suppressed due to low confidence (1)
superset/db_engine_specs/clickhouse.py:442
- Unnecessary 'pass' statement.
pass
betodealmeida
left a comment
There was a problem hiding this comment.
The g.database approach is very brittle.
Let's change this logic to pass the Database instance as an argument to make_label_compatible and _mutate_label instead — it's called in the Database class and in ExploreMixin, so we should always have the database reference available.
Since this would require updating all DB engine specs, we can use the inspect module in make_label_compatible to dispatch with or without the database argument, depending on the signature of _mutate_label. I think we have 1 or 2 places in the codebase where we do that.
|
@betodealmeida thanks for the review and suggestion! I re-implemented accordingly. |
| if "database" in signature(cls._mutate_label).parameters: | ||
| label_mutated = cls._mutate_label(label, database=database) | ||
| else: | ||
| label_mutated = cls._mutate_label(label) |
|
Running CI checks 🤞 |
|
Looks like we need a rebase here... noting that the db engine spec now contains a bunch of metadata to automatically build the documentation page, so that needn't be updated (it's generated nowadays) |
|
@rusackas conflicts merged and ready for checks! |
| (None, "time", "time_07cc69"), | ||
| ({}, "time", "time_07cc69"), | ||
| # Explicitly enable | ||
| ({"mutate_label_name": True}, "time", "time_07cc69"), |
There was a problem hiding this comment.
Suggestion: The new parametrized test for label mutation with extra expects a different default mutated label (time_07cc69) than the existing test_connect_make_label_compatible (time_336074), even though both exercise the same default behavior for make_label_compatible, which will either cause this test to fail or incorrectly document the function's output. The expected mutated labels in the new test should be aligned with the existing test so that default behavior (with or without a database argument when mutate_label_name is enabled or absent) is consistent. [logic error]
Severity Level: Major ⚠️
- ❌ ClickHouseConnect label mutation tests cannot both pass simultaneously.
- ⚠️ Confusing contract for default label mutation with extra.
- ⚠️ CI reliability reduced by contradictory test expectations.| (None, "time", "time_07cc69"), | |
| ({}, "time", "time_07cc69"), | |
| # Explicitly enable | |
| ({"mutate_label_name": True}, "time", "time_07cc69"), | |
| (None, "time", "time_336074"), | |
| ({}, "time", "time_336074"), | |
| # Explicitly enable | |
| ({"mutate_label_name": True}, "time", "time_336074"), |
Steps of Reproduction ✅
1. Run the unit tests for ClickHouse connect engine specs: `pytest
tests/unit_tests/db_engine_specs/test_clickhouse.py`.
2. Observe `test_connect_make_label_compatible` (around line 223) which imports
`ClickHouseConnectEngineSpec` and asserts that `spec.make_label_compatible("time")`
returns `"time_336074"` as the default mutated label.
3. Observe `test_connect_make_label_compatible_with_extra` (around line 245) in the same
file, which also calls `ClickHouseConnectEngineSpec.make_label_compatible("time",
database=mock_database)` with `extra` set to `None`, `{}`, or `{"mutate_label_name":
True}`, but asserts the default mutated label is `"time_07cc69"` (lines 235–239).
4. Because both tests exercise the same function and both treat these cases as the
"default behavior (mutate labels)", there is no possible implementation where both
expectations (`"time_336074"` and `"time_07cc69"` for the same default setting) can be
satisfied simultaneously; running the test file will therefore cause at least one of these
tests to fail, or it will encode an inconsistent contract for default label mutation if
the implementation is changed to satisfy one side only.Prompt for AI Agent 🤖
This is a comment left during a code review.
**Path:** tests/unit_tests/db_engine_specs/test_clickhouse.py
**Line:** 236:239
**Comment:**
*Logic Error: The new parametrized test for label mutation with `extra` expects a different default mutated label (`time_07cc69`) than the existing `test_connect_make_label_compatible` (`time_336074`), even though both exercise the same default behavior for `make_label_compatible`, which will either cause this test to fail or incorrectly document the function's output. The expected mutated labels in the new test should be aligned with the existing test so that default behavior (with or without a `database` argument when `mutate_label_name` is enabled or absent) is consistent.
Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.There was a problem hiding this comment.
Good catch. Label mutation hash default switched from md5 to sha256. Will fix.
There was a problem hiding this comment.
Code Review Agent Run #7382ba
Actionable Suggestions - 2
-
superset/db_engine_specs/clickhouse.py - 1
- Overly broad exception handling in try-except · Line 544-551
-
tests/unit_tests/db_engine_specs/test_clickhouse.py - 1
- Incorrect test expectations · Line 235-243
Review Details
-
Files reviewed - 5 · Commit Range:
e2e6816..d672669- superset/db_engine_specs/base.py
- superset/db_engine_specs/clickhouse.py
- superset/models/core.py
- superset/models/helpers.py
- tests/unit_tests/db_engine_specs/test_clickhouse.py
-
Files skipped - 0
-
Tools
- Whispers (Secret Scanner) - ✔︎ Successful
- Detect-secrets (Secret Scanner) - ✔︎ Successful
- MyPy (Static Code Analysis) - ✔︎ Successful
- Astral Ruff (Static Code Analysis) - ✔︎ Successful
Bito Usage Guide
Commands
Type the following command in the pull request comment and save the comment.
-
/review- Manually triggers a full AI review. -
/pause- Pauses automatic reviews on this pull request. -
/resume- Resumes automatic reviews. -
/resolve- Marks all Bito-posted review comments as resolved. -
/abort- Cancels all in-progress reviews.
Refer to the documentation for additional commands.
Configuration
This repository uses Superset You can customize the agent settings here or contact your Bito workspace admin at evan@preset.io.
Documentation & Help
| if database: | ||
| try: | ||
| extra = database.get_extra() | ||
| mutate_label = extra.get("mutate_label_name", True) | ||
| except Exception: # pylint: disable=broad-except | ||
| logger.warning( | ||
| "Error retrieving mutate_label_name setting. Falling back to " | ||
| "default behavior of True." |
There was a problem hiding this comment.
Replace the broad Exception catch with a more specific exception type. Consider catching (KeyError, ValueError, TypeError) or other specific exceptions that database.get_extra() might raise.
Code suggestion
Check the AI-generated fix before applying
| if database: | |
| try: | |
| extra = database.get_extra() | |
| mutate_label = extra.get("mutate_label_name", True) | |
| except Exception: # pylint: disable=broad-except | |
| logger.warning( | |
| "Error retrieving mutate_label_name setting. Falling back to " | |
| "default behavior of True." | |
| if database: | |
| try: | |
| extra = database.get_extra() | |
| mutate_label = extra.get("mutate_label_name", True) | |
| except (KeyError, ValueError, TypeError, AttributeError): | |
| logger.warning( | |
| "Error retrieving mutate_label_name setting. Falling back to " | |
| "default behavior of True." |
Code Review Run #7382ba
Should Bito avoid suggestions like this for future reviews? (Manage Rules)
- Yes, avoid them
There was a problem hiding this comment.
I think the broad exception here is ok. get_extra() calls get_extra_params() which could raise json.JSONDecodeError and isn't even the suggested list. Other error types could be potentially raised along the way as well. This is a simple label formatting helper limited to the clickhouse spec so if anything goes wrong, no matter what it is, it'd be preferable to fall back to default rather than crash.
There was a problem hiding this comment.
The broad exception handling in the suggestion is appropriate for this label formatting helper, as it prioritizes graceful fallback over potential crashes from various errors during settings retrieval.
superset/db_engine_specs/clickhouse.py
if database:
try:
extra = database.get_extra()
mutate_label = extra.get("mutate_label_name", True)
except (KeyError, ValueError, TypeError, AttributeError):
logger.warning(
"Error retrieving mutate_label_name setting. Falling back to "
"default behavior of True."
| # Default behavior (mutate labels) | ||
| (None, "time", "time_07cc69"), | ||
| ({}, "time", "time_07cc69"), | ||
| # Explicitly enable | ||
| ({"mutate_label_name": True}, "time", "time_07cc69"), | ||
| # Disabled (no mutation) | ||
| ({"mutate_label_name": False}, "time", "time"), | ||
| ({"mutate_label_name": False}, "count", "count"), | ||
| ], |
There was a problem hiding this comment.
The test expects MD5-based hash suffixes ('time_07cc69'), but the code now uses SHA-256 by default, producing 'time_336074'. Update all expected results to match SHA-256 hashes to prevent test failures.
Code suggestion
Check the AI-generated fix before applying
| # Default behavior (mutate labels) | |
| (None, "time", "time_07cc69"), | |
| ({}, "time", "time_07cc69"), | |
| # Explicitly enable | |
| ({"mutate_label_name": True}, "time", "time_07cc69"), | |
| # Disabled (no mutation) | |
| ({"mutate_label_name": False}, "time", "time"), | |
| ({"mutate_label_name": False}, "count", "count"), | |
| ], | |
| # Default behavior (mutate labels) | |
| (None, "time", "time_336074"), | |
| ({}, "time", "time_336074"), | |
| # Explicitly enable | |
| ({"mutate_label_name": True}, "time", "time_336074"), | |
| # Disabled (no mutation) | |
| ({"mutate_label_name": False}, "time", "time"), | |
| ({"mutate_label_name": False}, "count", "count"), | |
| ({"mutate_label_name": True}, "count", "count_6c3549"), | |
| ], |
Code Review Run #7382ba
Should Bito avoid suggestions like this for future reviews? (Manage Rules)
- Yes, avoid them
There was a problem hiding this comment.
Captured above by codeant, will fix.
There was a problem hiding this comment.
This question isn’t related to the pull request. I can only help with questions about the PR’s code or comments.
Code Review Agent Run #f663d7Actionable Suggestions - 0Review Details
Bito Usage GuideCommands Type the following command in the pull request comment and save the comment.
Refer to the documentation for additional commands. Configuration This repository uses Documentation & Help |
SUMMARY
"mutate_label_name"to allow ClickHouse users the ability to opt out of default label mutation behavior. This is helpful when building complex queries that need to reference un-mutated aliases.make_label_compatibleto accept an additionaldatabaseparameter. Then usesinspect.signatureto check if the subclass's_mutate_labelmethod acceptsdatabaseas a parameter and dispatches accordingly.TESTING INSTRUCTIONS
... AS "My column_b77020",extradata to includemutate_label_nameand set it toFalse. e.g.extra = json.loads(database.extra); extra['mutate_label_name'] = False; database.extra = json.dumps(extra); db.session.commit()... AS "My column",ADDITIONAL INFORMATION