Description
Hi,
Thanks for this tool; it is something I have been looking forward to.
I have some questions, though. I have downloaded the pre-built GTDB database, and it works fine. However, since it is based on release 214, I want the newest release (R226). I started building it, but it takes a loooong time!
After the initial checks, Metabuli starts filling a temporary folder with unzipped FASTA files (the genomes), each named by its accession number. It processes around 6-7 genomes per minute. With 143,000 genomes, this will take weeks! I am using 10 threads.
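To put a number on that, here is a quick back-of-the-envelope estimate. It assumes the rate I observed (6-7 genomes per minute) stays constant for the whole run, which may not hold; the function name is just for illustration.

```python
def estimate_days(n_genomes: int, genomes_per_minute: float) -> float:
    """Return the estimated wall-clock time in days at a constant rate."""
    minutes = n_genomes / genomes_per_minute
    return minutes / (60 * 24)  # 1440 minutes per day


# Observed rate range from the run described above (assumed constant).
for rate in (6, 7):
    print(f"{rate} genomes/min: {estimate_days(143_000, rate):.1f} days")
```

So this step alone would take roughly 14 to 17 days at the current speed.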
What exactly is done during this step? It looks to me like it only copies the genome files and decompresses them. The log file does not say anything. Is there any way I can speed this up? There is a mention of CDS files. Do these have to be files downloaded from NCBI (RefSeq/GenBank)? Or could I just run Prodigal myself (with many array jobs) to get this done in no time?
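To make the Prodigal idea concrete, this is roughly what I have in mind: a minimal sketch that runs Prodigal on many genomes in parallel. The directory layout, the `_cds.fna` output naming, and the function names are my own assumptions, the input is assumed to be already decompressed `.fna` files, and I do not know whether Metabuli would accept CDS files produced this way (that is the question). Only the standard Prodigal options `-i`, `-d`, `-o`, and `-q` are used.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def prodigal_command(genome: Path, out_dir: Path) -> list[str]:
    """Build the Prodigal call for one decompressed genome FASTA file."""
    stem = genome.stem  # file name without the final ".fna"
    return [
        "prodigal",
        "-i", str(genome),                       # input genome
        "-d", str(out_dir / f"{stem}_cds.fna"),  # nucleotide sequences of predicted genes
        "-o", str(out_dir / f"{stem}.gbk"),      # gene coordinates (default GenBank-like format)
        "-q",                                    # suppress progress output
    ]


def run_all(genome_dir: Path, out_dir: Path, workers: int = 10) -> int:
    """Run Prodigal on every *.fna file in genome_dir; return the number processed."""
    out_dir.mkdir(parents=True, exist_ok=True)
    genomes = sorted(genome_dir.glob("*.fna"))

    def run_one(genome: Path) -> None:
        # check=True raises if Prodigal exits with an error.
        subprocess.run(prodigal_command(genome, out_dir), check=True)

    # Threads are sufficient here: the work happens in the Prodigal subprocesses.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(run_one, genomes))
    return len(genomes)
```

Something like `run_all(Path("genomes"), Path("cds"), workers=10)` would process one folder; on a cluster, each array job could call it on its own subfolder.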
LS