Description
Is this a new bug in dbt-core?
- I believe this is a new bug in dbt-core
- I have searched the existing issues, and I could not find an existing issue for this bug
Current Behavior
On `dbt run -t target_name -s model_name --single-threaded --full-refresh`, after a long time the run fails with:
`HIVE_PATH_ALREADY_EXISTS: Target directory for table 'database_name.model_name__tmp_not_partitioned' already exists: s3://bucket_name/database_name/model_name__tmp_not_partitioned`
Expected Behavior
`model_name` is created successfully.
Steps To Reproduce
- AWS Athena / Glue / S3
- roughly 5,000,000 records across 35,000 partitions
- input table JSON, output table parquet
- {{ config(materialized="incremental", partitioned_by=["datum", "locale"]) }}
- staging or marts: both layers are affected the same way
- dbt run -t target_name -s model_name --single-threaded --full-refresh
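For reference, a minimal sketch of a model matching the configuration above (the source relation `source('raw', 'input_table')` is a hypothetical name, not taken from this report):

```sql
-- models/model_name.sql
-- Minimal sketch of the afflicted incremental model: JSON input,
-- parquet output partitioned by datum and locale.
{{ config(
    materialized="incremental",
    partitioned_by=["datum", "locale"]
) }}

select *
from {{ source('raw', 'input_table') }}
{% if is_incremental() %}
-- on incremental runs, only load partitions newer than what is already there
where datum > (select max(datum) from {{ this }})
{% endif %}
```

On `--full-refresh` the `is_incremental()` branch is skipped, so the whole history is rebuilt in one CTAS, which is where the large tmp table comes from.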
Relevant log output
No additional relevant logs with `--debug`, just:
`HIVE_PATH_ALREADY_EXISTS: Target directory for table 'database_name.model_name__tmp_not_partitioned' already exists: s3://bucket_name/database_name/model_name__tmp_not_partitioned`
Environment
- OS: Fedora 43
- Python: 3.12.7
- dbt: 1.11.2
- dbt-athena: 1.10.0
Which database adapter are you using with dbt?
other (mention it in "Additional Context")
Additional Context
- use dbt-athena 1.10.0
- tried a lot of things, including moving dbt outputs to a non-versioned bucket
- possible workaround: use only part of the data in full-refresh, feeding further shards in incremental mode
- so likely a problem of too large data in some way
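The sharding workaround could look roughly like this: cap the data loaded during `--full-refresh` and let normal incremental runs catch up on the rest. This is a sketch, not the exact commands used here; `refresh_cutoff` is a hypothetical var (e.g. `--vars '{refresh_cutoff: "2024-01-01"}'`) and the source name is made up:

```sql
-- Sketch: keep the full-refresh CTAS (and its tmp table) small by only
-- rebuilding a shard of the history, then backfill incrementally.
{{ config(
    materialized="incremental",
    partitioned_by=["datum", "locale"]
) }}

select *
from {{ source('raw', 'input_table') }}
{% if is_incremental() %}
-- normal runs: feed further shards incrementally
where datum > (select max(datum) from {{ this }})
{% else %}
-- full refresh: only recent data, hypothetical cutoff passed via --vars
where datum >= '{{ var("refresh_cutoff") }}'
{% endif %}
```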
Ideas:
- Maybe the temp table just takes too long to be dropped?
- Somewhere along the way something decides to run this multi-threaded, and thread 2 finds the tmp table of thread 1.