
[Bug] full-refresh broken for too large data #12467

@jochen-schuettler

Is this a new bug in dbt-core?

  • I believe this is a new bug in dbt-core
  • I have searched the existing issues, and I could not find an existing issue for this bug

Current Behavior

Running `dbt run -t target_name -s model_name --single-threaded --full-refresh` fails after a long run time with:
`HIVE_PATH_ALREADY_EXISTS: Target directory for table 'database_name.model_name__tmp_not_partitioned' already exists: s3://bucket_name/database_name/model_name__tmp_not_partitioned`

Expected Behavior

model_name is created

Steps To Reproduce

  • AWS Athena / Glue / S3
  • roughly 5_000_000 records (rows) in 35_000 partitions
  • input table is JSON, output table is Parquet
  • {{ config(materialized="incremental", partitioned_by=["datum", "locale"]) }}
  • staging or marts, both are afflicted the same way
  • dbt run -t target_name -s model_name --single-threaded --full-refresh

Relevant log output

No additional relevant logs with `--debug`, just:
`HIVE_PATH_ALREADY_EXISTS: Target directory for table 'database_name.model_name__tmp_not_partitioned' already exists: s3://bucket_name/database_name/model_name__tmp_not_partitioned`

Environment

- OS: Fedora 43
- Python: 3.12.7
- dbt: 1.11.2
- dbt-athena: 1.10.0

Which database adapter are you using with dbt?

other (mention it in "Additional Context")

Additional Context

  • using dbt-athena 1.10.0
  • tried a lot of things, including moving dbt outputs to a non-versioned bucket
  • possible workaround: run the full-refresh on only part of the data, then feed the remaining shards in incremental mode
  • so the problem is likely related to data volume in some way
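
To make the shard workaround concrete, here is a minimal sketch that builds one dbt command per shard. It assumes `datum` is a date column and that the model filters on two dbt vars, `shard_start` and `shard_end`. Both var names are hypothetical, and the model SQL would need to reference them for the shards to take effect. Only the first command carries `--full-refresh`; the rest run in incremental mode.

```python
import json
from datetime import date, timedelta


def shard_ranges(start: date, end: date, days_per_shard: int):
    """Split the half-open range [start, end) into consecutive date windows."""
    if days_per_shard < 1:
        raise ValueError("days_per_shard must be at least 1")
    ranges = []
    lo = start
    while lo < end:
        hi = min(lo + timedelta(days=days_per_shard), end)
        ranges.append((lo, hi))
        lo = hi
    return ranges


def build_dbt_commands(model: str, target: str, ranges):
    """Return one dbt argv list per shard; only the first uses --full-refresh."""
    commands = []
    for i, (lo, hi) in enumerate(ranges):
        # shard_start / shard_end are hypothetical var names (an assumption);
        # the model must filter `datum` on them for this to have any effect.
        dbt_vars = json.dumps(
            {"shard_start": lo.isoformat(), "shard_end": hi.isoformat()}
        )
        cmd = ["dbt", "run", "-t", target, "-s", model,
               "--single-threaded", "--vars", dbt_vars]
        if i == 0:
            cmd.append("--full-refresh")
        commands.append(cmd)
    return commands
```

Each returned list can be passed to `subprocess.run(cmd, check=True)` in order, so a failing shard stops the sequence. The shard size would need tuning until a single run stays below whatever limit triggers the error.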

Ideas:

  • Maybe the temp table just takes too long to be dropped?
  • Somewhere along the way, something decides to do this multi-threaded, and thread 2 finds the tmp table of thread 1
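
One way to narrow down the first idea would be to check, right after the failure, whether objects are still present under the tmp table's S3 prefix. The following is only a diagnostic sketch, not something dbt-athena does itself: `leftover_keys` is a hypothetical helper, and it expects a boto3 S3 client to be passed in.

```python
def leftover_keys(s3_client, bucket: str, prefix: str, limit: int = 20):
    """Return up to `limit` object keys still present under `prefix`."""
    # Add a trailing slash so "x__tmp" does not also match "x__tmp_other".
    if not prefix.endswith("/"):
        prefix += "/"
    response = s3_client.list_objects_v2(
        Bucket=bucket, Prefix=prefix, MaxKeys=limit
    )
    # "Contents" is absent from the response when nothing matches.
    return [obj["Key"] for obj in response.get("Contents", [])]


# Example call (requires boto3 and AWS credentials):
# import boto3
# leftover_keys(boto3.client("s3"), "bucket_name",
#               "database_name/model_name__tmp_not_partitioned")
```

A non-empty result would suggest that data from an earlier attempt is still in place when the next CREATE TABLE runs; an empty result would point more towards the second idea.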

Labels

bug, triage
