Order & Orientation of Sequences matters for DeNovo Assembler #42

@robinemig

I've repeatedly seen the length of the final consensus sequence change when calling the following code, depending on whether the sequences are ordered longest to shortest or shortest to longest, or on the orientation of the sequences:
Bio.Algorithms.Assembly.OverlapDeNovoAssembler assem = new Bio.Algorithms.Assembly.OverlapDeNovoAssembler();
assem.OverlapAlgorithm.GapOpenCost = -10;
assem.OverlapAlgorithm.GapExtensionCost = -2;
assem.OverlapAlgorithm.SimilarityMatrix = new SimilarityMatrix(SimilarityMatrix.StandardSimilarityMatrix.AmbiguousDna);
var assembly = assem.Assemble(reads) as Bio.Algorithms.Assembly.OverlapDeNovoAssembly;

I compare the results using assembly.Contigs.First().Consensus.Count.
I've tried to create simulated data for a test case, but haven't found a set that reproduces the problem.
I can confirm, though, that it occurs with as few as two sequences.
