Skip to content

feat(clickhouse-connect): make label mutation optional via database extra#34091

Open
joe-clickhouse wants to merge 12 commits intoapache:masterfrom
joe-clickhouse:joe/no-label-mutation
Open

feat(clickhouse-connect): make label mutation optional via database extra#34091
joe-clickhouse wants to merge 12 commits intoapache:masterfrom
joe-clickhouse:joe/no-label-mutation

Conversation

@joe-clickhouse
Copy link

@joe-clickhouse joe-clickhouse commented Jul 7, 2025

SUMMARY

  • Introduces an "extra" field boolean option called "mutate_label_name" to allow ClickHouse users the ability to opt out of default label mutation behavior. This is helpful when building complex queries that need to reference un-mutated aliases.
  • Works by modifying make_label_compatible to accept an additional database parameter. Then uses inspect.signature to check if the subclass's _mutate_label method accepts database as a parameter and dispatches accordingly.
  • Documentation updated with usage example.
  • Unit tests added for both default (mutation enabled) and disabled cases.

TESTING INSTRUCTIONS

  1. Start superset and connect to a clickhouse server via clickhouse-connect.
  2. Create a simple chart, view the query and notice columns are aliased with mutated labels e.g. ... AS "My column_b77020",
  3. Drop into flask shell, get a handle to the database of interest, update its extra data to include mutate_label_name and set it to False. e.g. extra = json.loads(database.extra); extra['mutate_label_name'] = False; database.extra = json.dumps(extra); db.session.commit()
  4. Rerun the query in the UI and notice the label is no longer muted: e.g. ... AS "My column",

ADDITIONAL INFORMATION

Copy link
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Congrats on making your first PR and thank you for contributing to Superset! 🎉 ❤️

We hope to see you in our Slack community too! Not signed up? Use our Slack App to self-register.

Copy link

@korbit-ai korbit-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review by Korbit AI

Korbit automatically attempts to detect when you fix issues in new commits.
Category Issue Status
Functionality Default label mutation may break existing queries ▹ view 🧠 Not in scope
Files scanned
File Path Reviewed
superset/db_engine_specs/clickhouse.py
superset/config.py

Explore our documentation to understand the languages and file types we support and the files we ignore.

Check out our docs on how you can make Korbit work best for you and your team.

Loving Korbit!? Share us on LinkedIn Reddit and X

@codecov
Copy link

codecov bot commented Jul 7, 2025

Codecov Report

❌ Patch coverage is 33.33333% with 12 lines in your changes missing coverage. Please review.
✅ Project coverage is 68.37%. Comparing base (c244e7f) to head (0667c9d).
⚠️ Report is 37 commits behind head on master.

Files with missing lines Patch % Lines
superset/db_engine_specs/clickhouse.py 9.09% 10 Missing ⚠️
superset/db_engine_specs/base.py 50.00% 1 Missing and 1 partial ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           master   #34091       +/-   ##
===========================================
+ Coverage        0   68.37%   +68.37%     
===========================================
  Files           0      631      +631     
  Lines           0    45977    +45977     
  Branches        0     4984     +4984     
===========================================
+ Hits            0    31439    +31439     
- Misses          0    13291    +13291     
- Partials        0     1247     +1247     
Flag Coverage Δ
hive 43.96% <11.11%> (?)
mysql 67.46% <33.33%> (?)
postgres 67.51% <33.33%> (?)
presto 47.49% <33.33%> (?)
python 68.34% <33.33%> (?)
sqlite 67.12% <33.33%> (?)
unit 100.00% <ø> (?)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@mistercrunch
Copy link
Member

mistercrunch commented Jul 8, 2025

Why allowing both through config? Shouldn't this be one or the other - whichever works or works best?

Overall I'm not big on adding database-engine-specific configs in superset/config.py. If indeed we need both, maybe this could be done as a database connection flag, maybe something in the connection's extra field (which is used to cover the long tail of database connection attributes)?

@joe-clickhouse
Copy link
Author

@mistercrunch sorry for the delay! It does seem useful to have both because in some of simpler UI elements, the user can create queries that do require need the default mutation. However, some more advanced cases effectively allow you to create aliases yourself and the mutation gets in the way of referencing that alias later in the same query, which ClickHouse is able to deal with. i like the idea of the extra field and will run with that. Thanks!

@pull-request-size pull-request-size bot added size/L and removed size/S labels Oct 6, 2025
@github-actions github-actions bot added the doc Namespace | Anything related to documentation label Oct 6, 2025
@joe-clickhouse joe-clickhouse changed the title feat(clickhouse-connect): make label mutation optional feat(clickhouse-connect): make label mutation optional via database extra Oct 6, 2025
@sadpandajoe sadpandajoe requested a review from Copilot November 8, 2025 00:08
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR adds configurable label mutation for ClickHouse databases, allowing users to disable the automatic label mutation behavior that suffixes column labels with MD5 hashes to avoid collisions.

Key changes:

  • Added mutate_label_name configuration option in database extra JSON for ClickHouse Connect engine
  • Modified _mutate_label method to conditionally apply label mutation based on database configuration
  • Updated multiple call sites to store the database instance in Flask's g object for configuration access

Reviewed Changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
superset/db_engine_specs/clickhouse.py Enhanced _mutate_label method to check database extra configuration and conditionally apply label mutation
superset/models/helpers.py Added try/finally blocks to temporarily store database in g object before calling make_label_compatible
superset/models/core.py Added try/finally block to temporarily store database in g object before calling make_label_compatible
tests/unit_tests/db_engine_specs/test_clickhouse.py Added comprehensive test coverage for the new configurable label mutation behavior
docs/docs/configuration/databases.mdx Added documentation explaining the new mutate_label_name configuration option
Comments suppressed due to low confidence (1)

superset/db_engine_specs/clickhouse.py:442

  • Unnecessary 'pass' statement.
                pass

Copy link
Member

@betodealmeida betodealmeida left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The g.database approach is very brittle.

Let's change this logic to pass the Database instance as an argument to make_label_compatible and _mutate_label instead — it's called in the Database class and in ExploreMixin, so we should always have the database reference available.

Since this would require updating all DB engine specs, we can use the inspect module in make_label_compatible to dispatch with or without the database argument, depending on the signature of _mutate_label. I think we have 1 or 2 places in the codebase where we do that.

@joe-clickhouse
Copy link
Author

@betodealmeida thanks for the review and suggestion! I re-implemented accordingly.

Comment on lines 1915 to 1918
if "database" in signature(cls._mutate_label).parameters:
label_mutated = cls._mutate_label(label, database=database)
else:
label_mutated = cls._mutate_label(label)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!

@rusackas
Copy link
Member

Running CI checks 🤞

@rusackas
Copy link
Member

Looks like we need a rebase here... noting that the db engine spec now contains a bunch of metadata to automatically build the documentation page, so that needn't be updated (it's generated nowadays)

@pull-request-size pull-request-size bot added size/M and removed size/L labels Feb 17, 2026
@github-actions github-actions bot removed the doc Namespace | Anything related to documentation label Feb 17, 2026
@joe-clickhouse
Copy link
Author

@rusackas conflicts merged and ready for checks!

Comment on lines 236 to 239
(None, "time", "time_07cc69"),
({}, "time", "time_07cc69"),
# Explicitly enable
({"mutate_label_name": True}, "time", "time_07cc69"),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: The new parametrized test for label mutation with extra expects a different default mutated label (time_07cc69) than the existing test_connect_make_label_compatible (time_336074), even though both exercise the same default behavior for make_label_compatible, which will either cause this test to fail or incorrectly document the function's output. The expected mutated labels in the new test should be aligned with the existing test so that default behavior (with or without a database argument when mutate_label_name is enabled or absent) is consistent. [logic error]

Severity Level: Major ⚠️
- ❌ ClickHouseConnect label mutation tests cannot both pass simultaneously.
- ⚠️ Confusing contract for default label mutation with extra.
- ⚠️ CI reliability reduced by contradictory test expectations.
Suggested change
(None, "time", "time_07cc69"),
({}, "time", "time_07cc69"),
# Explicitly enable
({"mutate_label_name": True}, "time", "time_07cc69"),
(None, "time", "time_336074"),
({}, "time", "time_336074"),
# Explicitly enable
({"mutate_label_name": True}, "time", "time_336074"),
Steps of Reproduction ✅
1. Run the unit tests for ClickHouse connect engine specs: `pytest
tests/unit_tests/db_engine_specs/test_clickhouse.py`.

2. Observe `test_connect_make_label_compatible` (around line 223) which imports
`ClickHouseConnectEngineSpec` and asserts that `spec.make_label_compatible("time")`
returns `"time_336074"` as the default mutated label.

3. Observe `test_connect_make_label_compatible_with_extra` (around line 245) in the same
file, which also calls `ClickHouseConnectEngineSpec.make_label_compatible("time",
database=mock_database)` with `extra` set to `None`, `{}`, or `{"mutate_label_name":
True}`, but asserts the default mutated label is `"time_07cc69"` (lines 235–239).

4. Because both tests exercise the same function and both treat these cases as the
"default behavior (mutate labels)", there is no possible implementation where both
expectations (`"time_336074"` and `"time_07cc69"` for the same default setting) can be
satisfied simultaneously; running the test file will therefore cause at least one of these
tests to fail, or it will encode an inconsistent contract for default label mutation if
the implementation is changed to satisfy one side only.
Prompt for AI Agent 🤖
This is a comment left during a code review.

**Path:** tests/unit_tests/db_engine_specs/test_clickhouse.py
**Line:** 236:239
**Comment:**
	*Logic Error: The new parametrized test for label mutation with `extra` expects a different default mutated label (`time_07cc69`) than the existing `test_connect_make_label_compatible` (`time_336074`), even though both exercise the same default behavior for `make_label_compatible`, which will either cause this test to fail or incorrectly document the function's output. The expected mutated labels in the new test should be aligned with the existing test so that default behavior (with or without a `database` argument when `mutate_label_name` is enabled or absent) is consistent.

Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.
👍 | 👎

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. Label mutation hash default switched from md5 to sha256. Will fix.

Copy link
Contributor

@bito-code-review bito-code-review bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review Agent Run #7382ba

Actionable Suggestions - 2
  • superset/db_engine_specs/clickhouse.py - 1
    • Overly broad exception handling in try-except · Line 544-551
  • tests/unit_tests/db_engine_specs/test_clickhouse.py - 1
Review Details
  • Files reviewed - 5 · Commit Range: e2e6816..d672669
    • superset/db_engine_specs/base.py
    • superset/db_engine_specs/clickhouse.py
    • superset/models/core.py
    • superset/models/helpers.py
    • tests/unit_tests/db_engine_specs/test_clickhouse.py
  • Files skipped - 0
  • Tools
    • Whispers (Secret Scanner) - ✔︎ Successful
    • Detect-secrets (Secret Scanner) - ✔︎ Successful
    • MyPy (Static Code Analysis) - ✔︎ Successful
    • Astral Ruff (Static Code Analysis) - ✔︎ Successful

Bito Usage Guide

Commands

Type the following command in the pull request comment and save the comment.

  • /review - Manually triggers a full AI review.

  • /pause - Pauses automatic reviews on this pull request.

  • /resume - Resumes automatic reviews.

  • /resolve - Marks all Bito-posted review comments as resolved.

  • /abort - Cancels all in-progress reviews.

Refer to the documentation for additional commands.

Configuration

This repository uses Superset You can customize the agent settings here or contact your Bito workspace admin at evan@preset.io.

Documentation & Help

AI Code Review powered by Bito Logo

Comment on lines +544 to +551
if database:
try:
extra = database.get_extra()
mutate_label = extra.get("mutate_label_name", True)
except Exception: # pylint: disable=broad-except
logger.warning(
"Error retrieving mutate_label_name setting. Falling back to "
"default behavior of True."
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overly broad exception handling in try-except

Replace the broad Exception catch with a more specific exception type. Consider catching (KeyError, ValueError, TypeError) or other specific exceptions that database.get_extra() might raise.

Code suggestion
Check the AI-generated fix before applying
Suggested change
if database:
try:
extra = database.get_extra()
mutate_label = extra.get("mutate_label_name", True)
except Exception: # pylint: disable=broad-except
logger.warning(
"Error retrieving mutate_label_name setting. Falling back to "
"default behavior of True."
if database:
try:
extra = database.get_extra()
mutate_label = extra.get("mutate_label_name", True)
except (KeyError, ValueError, TypeError, AttributeError):
logger.warning(
"Error retrieving mutate_label_name setting. Falling back to "
"default behavior of True."

Code Review Run #7382ba


Should Bito avoid suggestions like this for future reviews? (Manage Rules)

  • Yes, avoid them

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the broad exception here is ok. get_extra() calls get_extra_params() which could raise json.JSONDecodeError and isn't even the suggested list. Other error types could be potentially raised along the way as well. This is a simple label formatting helper limited to the clickhouse spec so if anything goes wrong, no matter what it is, it'd be preferable to fall back to default rather than crash.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The broad exception handling in the suggestion is appropriate for this label formatting helper, as it prioritizes graceful fallback over potential crashes from various errors during settings retrieval.

superset/db_engine_specs/clickhouse.py

if database:
            try:
                extra = database.get_extra()
                mutate_label = extra.get("mutate_label_name", True)
            except (KeyError, ValueError, TypeError, AttributeError):
                logger.warning(
                    "Error retrieving mutate_label_name setting. Falling back to "
                    "default behavior of True."

Comment on lines 235 to 243
# Default behavior (mutate labels)
(None, "time", "time_07cc69"),
({}, "time", "time_07cc69"),
# Explicitly enable
({"mutate_label_name": True}, "time", "time_07cc69"),
# Disabled (no mutation)
({"mutate_label_name": False}, "time", "time"),
({"mutate_label_name": False}, "count", "count"),
],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Incorrect test expectations

The test expects MD5-based hash suffixes ('time_07cc69'), but the code now uses SHA-256 by default, producing 'time_336074'. Update all expected results to match SHA-256 hashes to prevent test failures.

Code suggestion
Check the AI-generated fix before applying
Suggested change
# Default behavior (mutate labels)
(None, "time", "time_07cc69"),
({}, "time", "time_07cc69"),
# Explicitly enable
({"mutate_label_name": True}, "time", "time_07cc69"),
# Disabled (no mutation)
({"mutate_label_name": False}, "time", "time"),
({"mutate_label_name": False}, "count", "count"),
],
# Default behavior (mutate labels)
(None, "time", "time_336074"),
({}, "time", "time_336074"),
# Explicitly enable
({"mutate_label_name": True}, "time", "time_336074"),
# Disabled (no mutation)
({"mutate_label_name": False}, "time", "time"),
({"mutate_label_name": False}, "count", "count"),
({"mutate_label_name": True}, "count", "count_6c3549"),
],

Code Review Run #7382ba


Should Bito avoid suggestions like this for future reviews? (Manage Rules)

  • Yes, avoid them

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Captured above by codeant, will fix.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This question isn’t related to the pull request. I can only help with questions about the PR’s code or comments.

@bito-code-review
Copy link
Contributor

bito-code-review bot commented Feb 17, 2026

Code Review Agent Run #f663d7

Actionable Suggestions - 0
Review Details
  • Files reviewed - 5 · Commit Range: d672669..c308f30
    • superset/db_engine_specs/base.py
    • superset/db_engine_specs/clickhouse.py
    • superset/models/core.py
    • superset/models/helpers.py
    • tests/unit_tests/db_engine_specs/test_clickhouse.py
  • Files skipped - 0
  • Tools
    • Whispers (Secret Scanner) - ✔︎ Successful
    • Detect-secrets (Secret Scanner) - ✔︎ Successful
    • MyPy (Static Code Analysis) - ✔︎ Successful
    • Astral Ruff (Static Code Analysis) - ✔︎ Successful

Bito Usage Guide

Commands

Type the following command in the pull request comment and save the comment.

  • /review - Manually triggers a full AI review.

  • /pause - Pauses automatic reviews on this pull request.

  • /resume - Resumes automatic reviews.

  • /resolve - Marks all Bito-posted review comments as resolved.

  • /abort - Cancels all in-progress reviews.

Refer to the documentation for additional commands.

Configuration

This repository uses Superset You can customize the agent settings here or contact your Bito workspace admin at evan@preset.io.

Documentation & Help

AI Code Review powered by Bito Logo

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants