Slow building of GTDB database #152

@larssnip

Hi,

Thanks for this tool; it is something I have been looking forward to.

I do have some questions, though. I have downloaded the pre-built GTDB database, and it works fine. However, since it is based on release 214, I want the newest one (R226). I started building it, but it takes a very long time!

After the initial checks, Metabuli starts filling a temporary folder with decompressed FASTA files (the genomes), each named by its accession number. This proceeds at around 6-7 genomes per minute. With 143,000 genomes, this will take weeks! I am using 10 threads.
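For reference, here is a rough back-of-the-envelope check of that estimate. It is only a sketch that assumes the observed rate stays constant for the whole run; the function name `estimated_days` is made up for illustration.

```python
def estimated_days(n_genomes: int, genomes_per_minute: float) -> float:
    """Wall-clock days needed to process n_genomes at a constant rate."""
    minutes = n_genomes / genomes_per_minute
    return minutes / (60 * 24)  # 60 min per hour, 24 hours per day


# Observed rate: roughly 6-7 genomes per minute, about 143,000 genomes in total.
for rate in (6, 7):
    print(f"{rate} genomes/min: {estimated_days(143_000, rate):.1f} days")
# 6 genomes/min: 16.6 days
# 7 genomes/min: 14.2 days
```

So this single step would run for roughly two to two and a half weeks before anything else happens.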

What exactly is done during this step? It looks to me like it only copies the genome files and decompresses them. The log file does not say anything. Is there any way I can speed this up? There is a mention of CDS files. Do these have to be files downloaded from NCBI (RefSeq/GenBank)? Or could I just run Prodigal myself (as many array jobs) to get this done in no time?
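To make the last question concrete, this is roughly what I have in mind. It is only a sketch under assumptions: the genomes are already decompressed `.fna` files, the output file names are made up, and I do not know whether Metabuli would accept CDS files produced this way. The Prodigal options used (`-i`, `-d`, `-o`, `-f gff`, `-q`) are its standard command-line flags.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def prodigal_command(fasta: Path, out_dir: Path) -> list[str]:
    """Build the Prodigal call for one decompressed genome FASTA file."""
    stem = fasta.stem  # e.g. the accession, if files are named <accession>.fna
    return [
        "prodigal",
        "-i", str(fasta),                        # input genome
        "-d", str(out_dir / f"{stem}.cds.fna"),  # nucleotide sequences of predicted genes (hypothetical name)
        "-o", str(out_dir / f"{stem}.gff"),      # gene coordinates (hypothetical name)
        "-f", "gff",                             # coordinate output format
        "-q",                                    # suppress progress output
    ]


def run_all(genome_dir: Path, out_dir: Path, workers: int = 10) -> None:
    """Run Prodigal on every .fna file in genome_dir, several at a time."""
    out_dir.mkdir(parents=True, exist_ok=True)
    commands = [prodigal_command(f, out_dir) for f in sorted(genome_dir.glob("*.fna"))]
    # Threads are enough here: each worker only waits on an external process.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() drains the iterator so a failed run raises here.
        list(pool.map(lambda cmd: subprocess.run(cmd, check=True), commands))
```

On a cluster, the same per-genome command could instead be submitted as one array task per genome, e.g. `run_all(Path("genomes"), Path("cds"))` replaced by one `prodigal_command(...)` call per task.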

LS
