Duckdb #75

Merged
tim-band merged 27 commits into main from duckdb
Feb 5, 2026
Conversation

@tim-band (Collaborator) commented Dec 10, 2025

DuckDB does not work without this change; it uses the PostgreSQL dialect with minor changes, but it really needs a couple more.
This change adds DuckDB as a SQLAlchemy plugin and hooks into the SQL compilation process to remove the PostgreSQL code that DuckDB does not understand.

dump-data has also been updated to allow the dumping of all non-ignored non-vocabulary tables in one call, and also to dump the data as Parquet.

So with dump-data for the destination and DuckDB's in-memory database for the source it is now possible to do Parquet-to-Parquet data faking without interacting directly with DuckDB at all! See duckdb.rst for details.

@tim-band
Collaborator Author

Finally this all works! It's actually fairly easy to fake parquet files now; see the duckdb.rst file for details.

@stefpiatek

Ahh nice, I'll pencil in some time next week hopefully to review. Appreciate it Tim

@stefpiatek left a comment

Oooh this is very fun. Thanks for working on this and getting the translation so it works ❤️

Comment on lines 299 to 300
except TypeError:
    pass

Ooh when are we expecting this to happen, and if so do we want to log it?
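One way to address the logging question, as a minimal sketch: the excerpt does not show what the `try` body does, so `to_int_or_none` below is a hypothetical stand-in for it. The idea is to keep swallowing the `TypeError` but record it at debug level so it can be surfaced when needed.

```python
import logging

logger = logging.getLogger(__name__)


def to_int_or_none(value):
    """Hypothetical conversion step; int(None) raises TypeError."""
    try:
        return int(value)
    except TypeError:
        # Still swallowed, but visible when debug logging is enabled.
        logger.debug("Skipping value %r: not convertible", value, exc_info=True)
        return None


print(to_int_or_none("3"))
print(to_int_or_none(None))
```

Debug level keeps normal runs quiet; whether the event deserves a warning instead depends on how often it is expected, which the thread does not say.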

RowCounts = Counter[str]


@compiles(CreateColumn, "duckdb")

ooh this is fun

Collaborator Author

Pretty nasty actually. But yes, fun that this hook exists!
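For readers unfamiliar with the hook, here is a minimal sketch of how `@compiles(CreateColumn, "duckdb")` can rewrite column DDL. It is not the code from this PR. `StandInDuckDBDialect` is a hypothetical PostgreSQL-derived dialect that only reports the name `"duckdb"`, so the example runs without DuckDB installed, and the `SERIAL` rewrite is an assumed example of PostgreSQL-specific DDL rather than a statement of what the PR strips.

```python
from sqlalchemy import Column, Integer, MetaData, String, Table
from sqlalchemy.dialects.postgresql.base import PGDialect
from sqlalchemy.ext.compiler import compiles
from sqlalchemy.schema import CreateColumn, CreateTable


class StandInDuckDBDialect(PGDialect):
    """Hypothetical stand-in: PostgreSQL behaviour, but named "duckdb"."""

    name = "duckdb"


@compiles(CreateColumn, "duckdb")
def create_column_without_serial(element, compiler, **kw):
    # Let the PostgreSQL DDL compiler render the column first.
    text = compiler.visit_create_column(element, **kw)
    if text is None:  # system columns render as nothing
        return None
    # Naive textual rewrite, for illustration only.
    return text.replace("SERIAL", "INTEGER")


metadata = MetaData()
person = Table(
    "person",
    metadata,
    Column("id", Integer, primary_key=True),
    Column("name", String),
)

ddl = str(CreateTable(person).compile(dialect=StandInDuckDBDialect()))
print(ddl)
```

The hook is selected by the dialect's `name`, so other dialects keep their default column rendering.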

if fk_bits[0] not in tables_dict:
    return False
return bool(tables_dict[fk_bits[0]].get("ignore", False))
(table, _column) = split_column_full_name(fk)

Ooh I'm not too sure what was happening before but I think this makes sense

Collaborator Author

Yes, one of those "did this ever work?" moments...
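A rough sketch of the check under discussion, with loud assumptions: the real `split_column_full_name` is not shown in this thread, so the version below is a hypothetical stand-in that splits at the last dot, and `fk_refers_to_ignored_table` is an illustrative name rather than the project's function.

```python
from typing import Any, Mapping, Optional, Tuple


def split_column_full_name(full_name: str) -> Tuple[Optional[str], str]:
    """Hypothetical stand-in: split "table.column" at the last dot."""
    table, _, column = full_name.rpartition(".")
    return (table or None, column)


def fk_refers_to_ignored_table(
    fk: str, tables_dict: Mapping[str, Mapping[str, Any]]
) -> bool:
    """True if the foreign key's target table is configured as ignored."""
    (table, _column) = split_column_full_name(fk)
    if table is None or table not in tables_dict:
        return False
    return bool(tables_dict[table].get("ignore", False))


tables = {"audit_log": {"ignore": True}, "person": {}}
print(fk_refers_to_ignored_table("audit_log.id", tables))
print(fk_refers_to_ignored_table("person.id", tables))
```

Splitting once and naming both parts avoids indexing into a list of fragments, which is where the older `fk_bits[0]` form is easier to get wrong.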

column_types = {
    column: _dtype_to_sql(dtype) for column, dtype in table.dtypes.items()
}
name_pref = name[: name.rfind(".")]

Could we get to a point where the name doesn't have a .?

Collaborator Author

This would be if the file doesn't have an extension such as .parquet, but you are right that this needs some sort of defense.

Collaborator Author

Actually, it's fine. That expression works even if no dot is found.
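For reference, a small sketch of how that slice behaves. `str.rfind` returns -1 when no dot is present, so the expression never raises, but the slice then becomes `name[:-1]` and drops the final character. Whether that matters depends on how `name_pref` is used, which the excerpt does not show. `strip_extension` is a hypothetical alternative, not code from this PR.

```python
def strip_extension(name: str) -> str:
    """Return name without its final ".ext"; unchanged if there is no dot."""
    prefix, dot, _ext = name.rpartition(".")
    return prefix if dot else name


for name in ("person.parquet", "person"):
    sliced = name[: name.rfind(".")]
    print(f"{name!r}: slice gives {sliced!r}, strip_extension gives {strip_extension(name)!r}")
```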

Comment on lines 536 to 537
if last_part in table_names:
    table_names.append(f"{last_part}.")

oh interesting that this has swapped from first to last

Collaborator Author

It's just that previously if there was only one part it was called first_part, now it's called last_part because that's how the new split_column_full_name function does it.

@tim-band tim-band merged commit e1e8992 into main Feb 5, 2026
2 checks passed
2 participants