Abra2 collapses genomic duplication in assembly, leading to false realignments #66

@fbattke

We have run into some cases that present a similar phenotype: a patient has a homozygous SNV upstream of a genomic duplication. Abra uses the reads in the region to perform contig assembly and seems to collapse the genomic duplication in its assembly. It then realigns all reads against the new contig. Since the homozygous SNV is part of the contig sequence, the reads have a lower edit distance when mapped to the contig than when mapped to the reference, provided they don't span the genomic duplication.

One example: (hg19) chr2:170684000-170684500, inside the UBR3 gene.

Our patient has a homozygous SNV at chr2:170,684,224C>T.
A few bp downstream, at chr2:170,684,237, the sequence is GCGGCGGCGG.
Further downstream at chr2:170,684,316 the sequence is GCGGCGGCGG as well.

Abra now creates a contig with sequence TTGGGAGCCTAGCTTGGTCCACAGGCGCCCAGGAGAGAGGGGCGGGAGGAAGGCTCTGCAGCCCGAGGGGGCGTGTGTAGGGGCGGGGCTGCGGGCGGAGGAGCGCGGACGCTCCGGGTATCGCGAGAGTTGGGCGGGCCGAGCAATCGCAGCAGTCTATTCCCTCACTCTCCCTGGAGGAGCCGCTGGCCCTGGACTCTCCAAATTCTGAGCTCTCATCATGGCGGCGGCGGCCGCGGCGGCCGTCGGGGGCCAGCAGCCGTCACAGCCCGAGCTGCCCGCGCCGGGGCTGGCCCTAGACAAGGCGGCCACCGCCGCGCACCTCAAGGCGGCCCTCAGCCGGCCGGACAACCGCGCAGGTGCTGAGGAGCTGCAGGCGCTGCTGGAGCGGGTGCTGAGCGCCGAGCGGCCGCTGGCCGCGGCTGCTGGCGGCGAGGACGCGGCGGCGGCTACGACGAGTTCTGCGCGGCGGTGCGGGCCTACGATCCCGCGGCGCTCTGCGGCCTGGTCTGGACAGCCAACTTCGTGGCCTACCGCTGCCGGACGTGCGGCATCTCGCCCTGCATGTCGCTGTGCGCCGAGTGCTTCCACCAGGGCGACCACACCGGACACGACTTCAACATGTTCCGCAGCCAGGCCGGGGGCGCCTGCGACTGCGGGGACAGCAACGTGATGCGGGAGAGCGGGTGAGTGGAGCCCTCCCCGCGGGCGAGGCGACCCTGGGCCGGGGACGTCGCGGGAGGGCCTGGAGCGGAGCACTGGGAGCCCACTCTGAGCTGTCAAGGGGAGGGTGCGGGGGAGGGTGCAGCCACAGGGGGATGGAGG

and aligns it back to the reference with CIGAR 439M79D386M.

BLAST also aligns that contig to the reference with a deletion (see image).
Image
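
As a quick sanity check on the numbers above, the following sketch parses the reported CIGAR and compares the deleted length with the spacing between the two GCGGCGGCGG copies. The helper `parse_cigar` is hypothetical and not part of Abra2; the coordinates and the CIGAR string are the ones quoted in this report.

```python
import re

def parse_cigar(cigar):
    """Split a CIGAR string into (length, operation) tuples."""
    return [(int(n), op) for n, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar)]

ops = parse_cigar("439M79D386M")

# Bases deleted from the reference in the contig alignment
deleted = sum(n for n, op in ops if op == "D")
# Contig (query) bases consumed by the alignment
contig_len = sum(n for n, op in ops if op in "MIS=X")
# Reference bases spanned by the alignment
ref_span = sum(n for n, op in ops if op in "MDN=X")

# Distance between the start positions of the two GCGGCGGCGG copies (hg19)
repeat_spacing = 170_684_316 - 170_684_237

print(deleted, contig_len, ref_span, repeat_spacing)  # prints: 79 825 904 79
```

The deleted length (79 bp) equals the spacing between the two repeat copies, which is consistent with the assembly joining the first copy directly to the second and dropping the sequence in between.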

Now, all reads are realigned against the contig. The function remapReads finds that originalEditDist > alignment.numMismatches, because the contig sequence includes the homozygous SNV that is not in the reference sequence. Reads that span the duplication are not remapped, as they wouldn't get good alignment scores against the contig.
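
To illustrate the comparison described above, here is a toy model of the decision. It is only a sketch: `hamming` and `should_remap` are hypothetical names, the sequences are made up (not taken from the UBR3 locus), and the real Abra2 logic is written in Java and considers more than this single test.

```python
def hamming(a, b):
    """Count mismatches between two equal-length sequences."""
    assert len(a) == len(b)
    return sum(x != y for x, y in zip(a, b))

def should_remap(read, ref_window, contig_window):
    """Mimic the reported test: remap when the original edit distance
    to the reference exceeds the mismatch count against the contig."""
    orig_edit_dist = hamming(read, ref_window)
    contig_mismatches = hamming(read, contig_window)
    return orig_edit_dist > contig_mismatches

# Toy sequences (assumed for illustration only)
ref_window    = "ACGTCACGTA"   # reference allele C at index 4
contig_window = "ACGTTACGTA"   # assembled contig carries the SNV (C>T)
snv_read      = "ACGTTACGTA"   # read from a patient homozygous for the SNV
ref_read      = "ACGTCACGTA"   # read without the SNV, for contrast

print(should_remap(snv_read, ref_window, contig_window))  # prints: True
print(should_remap(ref_read, ref_window, contig_window))  # prints: False
```

In a homozygous carrier every read covering the SNV looks like `snv_read`, so the contig wins this comparison for all of them, whether or not the deletion it also encodes is real.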

As a result, we get incorrectly realigned reads and call a large deletion at ~10% VAF, which, given that we are analyzing tumor samples, is too high to simply dismiss as an artefact without manual inspection.

Image

Any ideas on how this behaviour could be suppressed, perhaps by changing some assembler parameters?
