Mabs 2.29

@shelkmike shelkmike released this 28 May 05:49
· 4 commits to master since this release
1d6d6ca

Main changes:

  1. Fixed a problem that caused Mabs-hifiasm to crash on computers where the command "sort" works in a non-standard way. This is the same problem as described in arq5x/bedtools2#323.
  2. Now, Mabs uses BUSCO datasets from OrthoDB 10 current as of 2024 instead of BUSCO datasets from OrthoDB 10 current as of 2020.
  3. Now, Mabs-hifiasm is based on Hifiasm 0.25.0 instead of Hifiasm 0.19.8.
  4. Now, Mabs-flye is based on Flye 2.9.5 instead of Flye 2.9.3.
  5. Hifiasm recently gained the ability to assemble raw high-accuracy Oxford Nanopore reads made on 10.4.1 flow cells ("raw" here means without error correction by tools such as Dorado Correct or HERRO). To take this into account, the option "--pacbio_hifi_reads" of Mabs-hifiasm was renamed to "--long_reads".
    By default, if the median accuracy of the reads in the FASTQ file (calculated from Phred scores) is below 99.8%, Mabs-hifiasm assembles the genome as if the reads were raw high-accuracy Oxford Nanopore reads. A user can change this threshold with the option "--hifi_accuracy_threshold". Also, a new option "--should_ont_be_used" allows the user to force Mabs-hifiasm to treat the provided reads as raw high-accuracy Oxford Nanopore reads ("--should_ont_be_used true") or as more accurate reads ("--should_ont_be_used false"). By "more accurate reads" here I mean either PacBio HiFi reads, or Oxford Nanopore reads corrected by tools such as Dorado Correct or HERRO.
    For consistency with older versions of Mabs, the option "--pacbio_hifi_reads" has been kept, and is now synonymous with the new option "--long_reads".
    [The above paragraph may sound complex, but Mabs-hifiasm works great with default parameters in most cases]
    When assembling a genome from raw Oxford Nanopore reads, take into account that:
    a) Hifiasm (and, consequently, Mabs-hifiasm) produces poor assemblies when the raw Nanopore reads have low accuracy (median accuracy below 97%). For such reads, I recommend using NextDenovo or Mabs-flye.
    b) Hifiasm (and, consequently, Mabs-hifiasm) cannot use reads in FASTA format as raw Oxford Nanopore reads. In other words, please provide raw Oxford Nanopore reads to Mabs-hifiasm in FASTQ format.
  6. Made several improvements to the script plot_gene_coverage_distribution.py, which is located in the folder ./Additional. This script builds sinaplots with gene coverage.
    a) Now, this script has a user-friendly interface. Run "python3 plot_gene_coverage_distribution.py --help" to see the list of options. A user can customize the appearance of the produced figure, for example by changing the size of points or the maximum value on the vertical axis.
    b) Now, this script can optionally make a per-orthogroup figure alongside a per-gene figure. In the per-gene figure (which is produced by default), every point is a gene, while in the per-orthogroup figure every point is an orthogroup. For orthogroups with several genes ("multicopy orthogroups"), the coverage of the orthogroup is calculated as the arithmetic mean of the coverages of its genes.
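
To illustrate the read-type decision described in item 5, here is a minimal Python sketch of a "median accuracy from Phred scores" check. It is not the code Mabs-hifiasm uses. The function names, the Phred+33 encoding, the definition of per-read accuracy as one minus the mean per-base error probability, and the threshold being written as a fraction (0.998) are all assumptions made for illustration.

```python
import statistics

PHRED_OFFSET = 33  # assumption: standard Phred+33 FASTQ quality encoding


def read_accuracy(quality_string):
    """Estimate one read's accuracy as 1 minus its mean per-base error probability."""
    error_probs = [10 ** (-(ord(ch) - PHRED_OFFSET) / 10) for ch in quality_string]
    return 1.0 - sum(error_probs) / len(error_probs)


def median_read_accuracy(fastq_lines):
    """Median estimated accuracy over all reads.

    Assumes unwrapped FASTQ: four lines per record, quality string on the fourth.
    """
    qualities = [line.strip() for i, line in enumerate(fastq_lines) if i % 4 == 3]
    return statistics.median(read_accuracy(q) for q in qualities if q)


def looks_like_raw_ont(fastq_lines, hifi_accuracy_threshold=0.998):
    """True if the median accuracy falls below the threshold (hypothetical helper)."""
    return median_read_accuracy(fastq_lines) < hifi_accuracy_threshold


# Toy records: '?' encodes Q30 (error 0.001), '5' encodes Q20 (error 0.01).
hifi_like = ["@read1", "ACGT", "+", "????"]
ont_like = ["@read1", "ACGT", "+", "5555"]
print(looks_like_raw_ont(hifi_like))
print(looks_like_raw_ont(ont_like))
```

Under these assumptions, a read set with a median quality around Q30 stays above the 99.8% threshold, while one around Q20 falls below it and would be treated as raw Oxford Nanopore reads.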
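
The per-orthogroup coverage from item 6b can be sketched as follows. This shows only the arithmetic-mean rule and is not taken from plot_gene_coverage_distribution.py. The function name, the dictionary-based inputs, and the gene and orthogroup identifiers are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def orthogroup_coverages(gene_coverage, gene_to_orthogroup):
    """Average gene coverages within each orthogroup.

    gene_coverage: dict mapping gene ID -> coverage (hypothetical input format)
    gene_to_orthogroup: dict mapping gene ID -> orthogroup ID (hypothetical)
    """
    grouped = defaultdict(list)
    for gene, coverage in gene_coverage.items():
        grouped[gene_to_orthogroup[gene]].append(coverage)
    # A single-copy orthogroup keeps its gene's coverage;
    # a multicopy orthogroup gets the arithmetic mean of its genes.
    return {orthogroup: mean(values) for orthogroup, values in grouped.items()}


gene_coverage = {"gene1": 30.0, "gene2": 50.0, "gene3": 42.0}
gene_to_orthogroup = {"gene1": "OG1", "gene2": "OG1", "gene3": "OG2"}
print(orthogroup_coverages(gene_coverage, gene_to_orthogroup))
```

Here "OG1" is a multicopy orthogroup with two genes, so its point in the per-orthogroup figure would sit at the mean of 30 and 50, while "OG2" keeps the coverage of its only gene.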