Alignment and reference sequence from CIGAR and MD by ingolia · Pull Request #110 · rust-bio/rust-htslib

ingolia · 2018-09-02T23:24:56Z

This is a module that uses CIGAR and MD fields to construct alignments and reconstruct reference sequences from BAM records. I found myself wanting to do this repeatedly, and it isn't actually straightforward. I think these will be generally useful, based on the number of people requesting this feature in various languages.

This code features an MD field parser, a minimal alignment position type that includes only the information directly present in the CIGAR + MD fields, and more "complete" alignment types that use the read sequence to provide complete read and reference sequences directly in the alignment. These are all structured around iterators that generate individual positions, and can easily be collected into a vector if needed. I also added a function to create a bio-types Alignment.
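For readers unfamiliar with the MD tag, here is a minimal sketch of the idea, not the code in this PR. It assumes the MD grammar from the SAM optional-fields specification (match counts, single mismatched reference bases, and `^`-prefixed deleted reference bases). The names `MdOp`, `parse_md`, and `reference_from_md` are hypothetical, and the reconstruction step assumes the caller has already used the CIGAR to strip inserted and soft-clipped bases from the read.

```rust
/// One element of an MD string (hypothetical type, for illustration only).
#[derive(Debug, Clone, PartialEq, Eq)]
enum MdOp {
    /// A run of read bases identical to the reference.
    Match(u32),
    /// A single position where the reference has this base and the read differs.
    Mismatch(u8),
    /// Reference bases that are deleted from the read.
    Deletion(Vec<u8>),
}

/// Parse an MD string such as "10A5^AC6" into a list of operations.
fn parse_md(md: &str) -> Result<Vec<MdOp>, String> {
    let bytes = md.as_bytes();
    let mut ops = Vec::new();
    let mut i = 0;
    while i < bytes.len() {
        let b = bytes[i];
        if b.is_ascii_digit() {
            let start = i;
            while i < bytes.len() && bytes[i].is_ascii_digit() {
                i += 1;
            }
            let n: u32 = md[start..i]
                .parse()
                .map_err(|e| format!("bad match length: {}", e))?;
            // Zero-length matches only separate adjacent mismatches/deletions.
            if n > 0 {
                ops.push(MdOp::Match(n));
            }
        } else if b == b'^' {
            i += 1;
            let start = i;
            while i < bytes.len() && bytes[i].is_ascii_alphabetic() {
                i += 1;
            }
            if start == i {
                return Err("deletion with no bases".to_string());
            }
            ops.push(MdOp::Deletion(bytes[start..i].to_vec()));
        } else if b.is_ascii_alphabetic() {
            ops.push(MdOp::Mismatch(b));
            i += 1;
        } else {
            return Err(format!("unexpected byte {:?} at offset {}", b as char, i));
        }
    }
    Ok(ops)
}

/// Rebuild the reference from read bases that are aligned to the reference.
/// Assumption: `aligned_read` contains only bases covered by M/=/X CIGAR
/// operations (insertions and soft clips already removed).
fn reference_from_md(aligned_read: &[u8], ops: &[MdOp]) -> Result<Vec<u8>, String> {
    let mut reference = Vec::new();
    let mut pos = 0usize;
    for op in ops {
        match op {
            MdOp::Match(n) => {
                let end = pos + *n as usize;
                let chunk = aligned_read
                    .get(pos..end)
                    .ok_or_else(|| "MD is longer than the read".to_string())?;
                reference.extend_from_slice(chunk);
                pos = end;
            }
            MdOp::Mismatch(base) => {
                if pos >= aligned_read.len() {
                    return Err("MD is longer than the read".to_string());
                }
                // The reference base comes from MD; the read base is skipped.
                reference.push(*base);
                pos += 1;
            }
            // Deleted bases exist only in the reference; the read does not advance.
            MdOp::Deletion(bases) => reference.extend_from_slice(bases),
        }
    }
    if pos != aligned_read.len() {
        return Err("MD is shorter than the read".to_string());
    }
    Ok(reference)
}
```

The hard part in practice is the step this sketch skips: walking the CIGAR and MD in lockstep so that insertions, clips, and skipped regions are handled, which is what the position iterators described above are for.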

johanneskoester

Thanks! I think the API could be simplified, see below.

src/bam/md_align.rs

y9c · 2021-08-09T01:26:02Z

Hi @johanneskoester

I would like to know whether there is now a function to get the alignment pair. In other similar htslib bindings, the reference sequence and the query sequence can be fetched at the same time.

For example, https://github.com/blachlylab/dhtslib/blob/11be3debdce9feda903b59ddab6fb737dfd9d3fa/source/dhtslib/sam/record.d#L527-L531

ArtRand · 2022-12-10T15:25:15Z

Is there any particular blocker for this work? I'd be happy to help get it over the line if necessary.

johanneskoester

Sorry for the long silence and thanks a lot! Looks good to me in principle (see below).

johanneskoester · 2024-11-12T14:05:34Z

src/bam/md_align.rs

+quick_error! {
+    #[derive(Debug,Clone)]
+    pub enum MDAlignError {
+        NoMD {
+            description("no MD aux field")
+        }
+        BadMD {
+            description("bad MD value")
+        }
+        MDvsCIGAR {
+            description("MD inconsistent with CIGAR")
+        }
+        BadSeqLen {
+            description("Sequence/quality length inconsistent with MD/CIGAR")
+        }
+        EmptyAlign {
+            description("Alignment has no positions")
+        }
+        ParseInt(err: ::std::num::ParseIntError) {
+            from()
+        }
+        Utf8(err: ::std::str::Utf8Error) {
+            from()
+        }
+    }
+}


We've moved to thiserror, so this should be adapted and moved into errors.rs.
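As a rough illustration of what that adaptation involves, the sketch below writes out by hand, using only the standard library, the `Display`, `Error`, and `From` impls that the `quick_error!` block declares. The comments note the thiserror attributes (`#[error("...")]`, `#[from]`) that would typically replace each hand-written piece. The messages for the two wrapped errors and the helper `parse_match_len` are assumptions made for the example, and how the variants would actually be folded into errors.rs is not shown in this thread.

```rust
use std::error::Error;
use std::fmt;
use std::num::ParseIntError;
use std::str::Utf8Error;

// With thiserror this would be `#[derive(Debug, Clone, thiserror::Error)]`.
#[derive(Debug, Clone)]
pub enum MDAlignError {
    NoMD,                    // #[error("no MD aux field")]
    BadMD,                   // #[error("bad MD value")]
    MDvsCIGAR,               // #[error("MD inconsistent with CIGAR")]
    BadSeqLen,               // #[error("Sequence/quality length inconsistent with MD/CIGAR")]
    EmptyAlign,              // #[error("Alignment has no positions")]
    ParseInt(ParseIntError), // field would carry #[from]
    Utf8(Utf8Error),         // field would carry #[from]
}

impl fmt::Display for MDAlignError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            MDAlignError::NoMD => write!(f, "no MD aux field"),
            MDAlignError::BadMD => write!(f, "bad MD value"),
            MDAlignError::MDvsCIGAR => write!(f, "MD inconsistent with CIGAR"),
            MDAlignError::BadSeqLen => {
                write!(f, "Sequence/quality length inconsistent with MD/CIGAR")
            }
            MDAlignError::EmptyAlign => write!(f, "Alignment has no positions"),
            // Messages for the wrapped errors are assumptions for this sketch.
            MDAlignError::ParseInt(e) => write!(f, "invalid integer in MD: {}", e),
            MDAlignError::Utf8(e) => write!(f, "invalid UTF-8 in MD: {}", e),
        }
    }
}

impl Error for MDAlignError {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        match self {
            MDAlignError::ParseInt(e) => Some(e),
            MDAlignError::Utf8(e) => Some(e),
            _ => None,
        }
    }
}

// These two impls are what `from()` in quick_error (or `#[from]`) generates.
impl From<ParseIntError> for MDAlignError {
    fn from(e: ParseIntError) -> Self {
        MDAlignError::ParseInt(e)
    }
}

impl From<Utf8Error> for MDAlignError {
    fn from(e: Utf8Error) -> Self {
        MDAlignError::Utf8(e)
    }
}

/// Hypothetical helper showing both conversions happening through `?`.
pub fn parse_match_len(digits: &[u8]) -> Result<u32, MDAlignError> {
    let text = std::str::from_utf8(digits)?; // Utf8Error -> MDAlignError
    Ok(text.parse::<u32>()?) // ParseIntError -> MDAlignError
}
```

The variant names and the five fixed messages are copied from the diff above, so callers matching on the enum would see the same variants after a port.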

ingolia added 3 commits September 2, 2018 16:17

Use CIGAR field and MD aux field to reconstruct alignments from BAM records

07bbab5

rustfmt

2704703

Unmangle table

0e98f8c

johanneskoester requested changes Sep 14, 2018

ingolia added 13 commits September 23, 2018 10:47

Moved MD accessor convenience function onto record

93c268a

Fix up tests and fmt

8e35160

Simplified Cigar/MD reconstruction interface

5d19951

Reference sequence reconstruction method on Record

5ece794

Better MDString newtype using standard traits

1a0ee99

Use Iter over MatchDesc rather than reified vector

32086f0

Generic over MatchDesc iterator type

401c58c

Moved to iterator access to Cigar information

c419e6a

fmt

27939c6

Moved all read sequence lookup functions onto CigarMDPos and removed all other iterators & position representations. Updated record interface accordingly.

c4510e5

fmt

a6d8fcf

Collecting into an Alignment now goes through FromIterator

1d14ba7

Alignment reconstruction on the Record

8a5e3ac

Merge branch 'master' into master

5072011

johanneskoester requested changes Nov 12, 2024

ingolia wants to merge 17 commits into rust-bio:master from ingolia:master

4 participants
