Description
We have run into several cases that present a similar phenotype: a patient has a homozygous SNV upstream of a genomic duplication. Abra uses the reads in the region to perform contig assembly and seems to collapse the genomic duplication in its assembly. It then realigns all reads against the new contig. Since the homozygous SNV is part of the contig sequence, reads have a lower edit distance when mapped to the contig than to the reference, provided they don't span the genomic duplication.
One example: (hg19) chr2:170684000-170684500, inside the UBR3 gene.
Our patient has a homozygous SNV at chr2:170,684,224C>T.
A few bp downstream, at chr2:170,684,237, the sequence is GCGGCGGCGG.
Further downstream, at chr2:170,684,316, the sequence is GCGGCGGCGG as well.
Abra now creates a contig with sequence TTGGGAGCCTAGCTTGGTCCACAGGCGCCCAGGAGAGAGGGGCGGGAGGAAGGCTCTGCAGCCCGAGGGGGCGTGTGTAGGGGCGGGGCTGCGGGCGGAGGAGCGCGGACGCTCCGGGTATCGCGAGAGTTGGGCGGGCCGAGCAATCGCAGCAGTCTATTCCCTCACTCTCCCTGGAGGAGCCGCTGGCCCTGGACTCTCCAAATTCTGAGCTCTCATCATGGCGGCGGCGGCCGCGGCGGCCGTCGGGGGCCAGCAGCCGTCACAGCCCGAGCTGCCCGCGCCGGGGCTGGCCCTAGACAAGGCGGCCACCGCCGCGCACCTCAAGGCGGCCCTCAGCCGGCCGGACAACCGCGCAGGTGCTGAGGAGCTGCAGGCGCTGCTGGAGCGGGTGCTGAGCGCCGAGCGGCCGCTGGCCGCGGCTGCTGGCGGCGAGGACGCGGCGGCGGCTACGACGAGTTCTGCGCGGCGGTGCGGGCCTACGATCCCGCGGCGCTCTGCGGCCTGGTCTGGACAGCCAACTTCGTGGCCTACCGCTGCCGGACGTGCGGCATCTCGCCCTGCATGTCGCTGTGCGCCGAGTGCTTCCACCAGGGCGACCACACCGGACACGACTTCAACATGTTCCGCAGCCAGGCCGGGGGCGCCTGCGACTGCGGGGACAGCAACGTGATGCGGGAGAGCGGGTGAGTGGAGCCCTCCCCGCGGGCGAGGCGACCCTGGGCCGGGGACGTCGCGGGAGGGCCTGGAGCGGAGCACTGGGAGCCCACTCTGAGCTGTCAAGGGGAGGGTGCGGGGGAGGGTGCAGCCACAGGGGGATGGAGG
and aligns it back to the reference with CIGAR 439M79D386M.
Blast also aligns that contig to the reference with a deletion (see image)
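To make the link between the CIGAR string and the two repeat copies explicit, here is a minimal Python sketch that parses the CIGAR above and compares the deleted length with the distance between the two GCGGCGGCGG positions. The helper name `parse_cigar` is ours and purely illustrative; it is not part of Abra.

```python
import re

def parse_cigar(cigar):
    """Split a CIGAR string into (length, operation) tuples."""
    return [(int(n), op) for n, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar)]

cigar_ops = parse_cigar("439M79D386M")

# Bases deleted relative to the reference.
deleted = sum(n for n, op in cigar_ops if op == "D")
# Reference bases covered by the alignment (M, D, N, =, X consume the reference).
ref_span = sum(n for n, op in cigar_ops if op in "MDN=X")
# Contig bases consumed by the alignment (M, I, S, =, X consume the query).
contig_len = sum(n for n, op in cigar_ops if op in "MIS=X")

# Distance between the start positions of the two repeat copies reported above.
repeat_distance = 170_684_316 - 170_684_237

print(deleted, ref_span, contig_len, repeat_distance)
```

The deleted length (79 bp) equals the spacing between the two repeat copies, which is consistent with the assembler collapsing one copy onto the other.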

Now all reads are realigned against the contig. The function remapReads finds that originalEditDist > alignment.numMismatches, because the contig sequence includes the homozygous SNV that is not in the reference sequence. Reads that span the duplication are not remapped, as they wouldn't get good alignment scores against the contig.
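The effect can be reproduced on a toy example. The sketch below is not Abra's code: it uses made-up sequences and a simple ungapped mismatch count as a stand-in for the edit-distance comparison, and all names (`best_ungapped`, `remap_upstream`, and so on) are hypothetical.

```python
def mismatches(read, target, offset):
    """Count mismatches of read placed ungapped at target[offset:]."""
    window = target[offset:offset + len(read)]
    return sum(a != b for a, b in zip(read, window))

def best_ungapped(read, target):
    """Lowest mismatch count over all ungapped placements of read in target."""
    return min(mismatches(read, target, i)
               for i in range(len(target) - len(read) + 1))

# Toy sequences (invented for illustration, not the real locus).
upstream_ref = "ACTGACCTGA"
upstream_snv = "ACTGTCCTGA"   # same as upstream_ref with a homozygous A>T SNV
repeat = "GCGG"
between = "TTACATTCAA"
downstream = "CATGTACG"

reference = upstream_ref + repeat + between + repeat + downstream
patient = upstream_snv + repeat + between + repeat + downstream
# Contig that carries the SNV but has the repeat collapsed into one copy.
contig = upstream_snv + repeat + downstream

# Read carrying the SNV that does not reach past the first repeat copy.
read_upstream = patient[0:12]
# Read covering the sequence between the two repeat copies.
read_spanning = patient[12:26]

up_ref = best_ungapped(read_upstream, reference)
up_contig = best_ungapped(read_upstream, contig)
span_ref = best_ungapped(read_spanning, reference)
span_contig = best_ungapped(read_spanning, contig)

# Assumed decision rule: remap only if the contig placement has fewer mismatches.
remap_upstream = up_ref > up_contig
remap_spanning = span_ref > span_contig

print(up_ref, up_contig, remap_upstream)
print(span_ref, span_contig, remap_spanning)
```

In this toy setup, the SNV-carrying read is moved to the collapsed contig solely because of the SNV, while the read covering the sequence between the repeats stays on the reference.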
As a result, we get incorrectly realigned reads and call a large deletion at ~10% VAF, which, given that we are analyzing tumor samples, is too high to dismiss as an artefact without manual inspection.
Any ideas on how this behaviour could be suppressed, perhaps by changing some assembler parameters?
