Slow building of GTDB database #152

Hi,

Thanks for this tool, this is something I have looked forward to.

I have some questions, though. I downloaded the pre-built GTDB database, and it works fine. However, since it is based on release 214, I want to build the newest one (R226). I started building it, but it takes a very long time!

After the initial checks, Metabuli starts to fill a temporary folder with unzipped FASTA files (the genomes), where each file name is the accession number. This step processes around 6-7 genomes per minute. With 143,000 genomes, it will take weeks! I am using 10 threads.
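
To put a number on "weeks", here is a quick back-of-the-envelope sketch. It assumes the rate stays constant at the 6-7 genomes per minute I observed, which may not hold for the whole run; the function name is just my own for illustration.

```python
def estimated_days(n_genomes: int, genomes_per_minute: float) -> float:
    """Days needed to process n_genomes at a constant rate."""
    minutes = n_genomes / genomes_per_minute
    return minutes / (60 * 24)  # 1440 minutes per day


# Observed rate was between 6 and 7 genomes per minute.
for rate in (6, 7):
    days = estimated_days(143_000, rate)
    print(f"{rate} genomes/min -> {days:.1f} days")
```

At that rate the decompression step alone would take roughly 14 to 17 days.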

What exactly is done during this step? It looks to me like it only copies the genome files and decompresses them. The log file does not say anything. Is there any way I can speed this up? The documentation mentions CDS files. Do these have to be files downloaded from NCBI (RefSeq/GenBank)? Or could I just run Prodigal myself (as many array jobs) to get this done in no time?
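
For reference, this is roughly what I have in mind for the Prodigal route: a minimal sketch that splits the genome list across SLURM array tasks and prints one Prodigal command per genome. The `genomes/` and `cds/` directories, the `.fna` naming, and the helper names are my own assumptions. Whether Metabuli would accept CDS files produced this way is exactly what I am asking.

```python
import os
from pathlib import Path


def chunk_for_task(items, task_id, n_tasks):
    """Return the items assigned to one array task (round-robin split)."""
    return items[task_id::n_tasks]


def prodigal_command(genome: Path, out_dir: Path) -> list[str]:
    """Build a Prodigal call for one uncompressed genome FASTA file."""
    stem = genome.name.removesuffix(".fna")
    return [
        "prodigal",
        "-i", str(genome),                      # input genome
        "-d", str(out_dir / f"{stem}.cds.fna"),  # nucleotide sequences of predicted genes
        "-o", str(out_dir / f"{stem}.gbk"),      # gene coordinates (default output format)
        "-q",                                    # suppress progress messages
    ]


if __name__ == "__main__":
    # SLURM sets these variables inside an array job; the defaults
    # make the script process everything when run outside SLURM.
    task_id = int(os.environ.get("SLURM_ARRAY_TASK_ID", "0"))
    n_tasks = int(os.environ.get("SLURM_ARRAY_TASK_COUNT", "1"))

    # Hypothetical layout: one decompressed genome per file in genomes/.
    genomes = sorted(Path("genomes").glob("*.fna"))
    for genome in chunk_for_task(genomes, task_id, n_tasks):
        print(" ".join(prodigal_command(genome, Path("cds"))))
```

The script only prints the commands, so the output can be inspected before piping it to a shell inside each array task.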

LS


