Releases · lilab-bcb/cumulus_feature_barcoding

20 May 07:39

yihming

2.0.0

f96ee53

2.0.0 Latest

Acknowledgment

The work of adding the UMI correction feature was initiated by Jack Kamm, who further demonstrated the usefulness of UMI correction plus PCR chimeric filtering for cleaning up noise in deeply sequenced CRISPR data. We would like to give our huge thanks to Jack for his inspiration and his help in improving this software!

New Features

Add UMI correction with methods introduced in [Smith, et al. 2017]:
- Use the directional method by default. Other available methods: cluster and adjacency. Specify a non-default method via the --umi-correct-method option.
- New structure for the report txt file, which now includes stats after UMI correction.
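As an illustration of the directional method, here is a minimal Python sketch that follows the rule published for UMI-tools: a UMI is merged into a more abundant UMI one mismatch away when count(parent) >= 2 * count(child) - 1. The function names and the toy counts are hypothetical, and this is not the software's own implementation, which may differ in tie-breaking and other details.

```python
from collections import Counter


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def directional_parents(umi_counts):
    """Map every observed UMI to the UMI it is merged into (directional method)."""
    # Visit UMIs from most to least abundant; ties are broken alphabetically
    # so that the result is deterministic.
    order = sorted(umi_counts, key=lambda u: (-umi_counts[u], u))
    parent = {}
    for root in order:
        if root in parent:
            continue  # already absorbed into a more abundant network
        parent[root] = root
        stack = [root]
        while stack:
            cur = stack.pop()
            for other in order:
                if other in parent or len(other) != len(cur):
                    continue
                close = hamming(cur, other) == 1
                dominated = umi_counts[cur] >= 2 * umi_counts[other] - 1
                if close and dominated:
                    parent[other] = root
                    stack.append(other)
    return parent


def collapse(umi_counts):
    """Sum read counts of all UMIs that share the same inferred parent."""
    parent = directional_parents(umi_counts)
    merged = Counter()
    for umi, reads in umi_counts.items():
        merged[parent[umi]] += reads
    return dict(merged)


if __name__ == "__main__":
    toy = {"ACGT": 100, "ACGA": 3, "TCGA": 1, "GGGG": 5, "GGGT": 4}
    print(collapse(toy))
```

In the toy input, ACGA and TCGA are chained into ACGT, while GGGG and GGGT stay separate because 5 is smaller than 2 * 4 - 1.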
For crispr feature type data, additionally perform PCR chimeric filtering:
- No UMI count cutoff by default. Users can specify a non-zero cutoff via the --umi-count-cutoff option.
- Filter chimeras with a read-ratio threshold of 0.5 per Barcode+UMI combination by default. Users can specify a non-default cutoff via --read-ratio-cutoff.
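The ratio-based filter above can be pictured with a small Python sketch. It rests on assumptions the notes do not spell out: that the ratio is a feature's read count divided by the total reads of its Barcode+UMI combination, and that a feature is kept only when the ratio is strictly greater than the cutoff. The function name and the input layout are hypothetical, and the --umi-count-cutoff step is left out because its exact semantics are not described here.

```python
from collections import defaultdict


def chimeric_filter(molecules, read_ratio_cutoff=0.5):
    """Keep (barcode, umi, feature) entries that dominate their barcode+UMI group.

    molecules: dict mapping (barcode, umi, feature) -> read count.
    Assumption: an entry survives when its share of the group's reads is
    strictly greater than read_ratio_cutoff.
    """
    # Total reads observed for each barcode+UMI combination.
    totals = defaultdict(int)
    for (barcode, umi, _feature), reads in molecules.items():
        totals[(barcode, umi)] += reads

    kept = {}
    for (barcode, umi, feature), reads in molecules.items():
        if reads / totals[(barcode, umi)] > read_ratio_cutoff:
            kept[(barcode, umi, feature)] = reads
    return kept


if __name__ == "__main__":
    toy = {
        ("BC1", "AAAA", "sgA"): 9,  # dominant feature, kept
        ("BC1", "AAAA", "sgB"): 1,  # likely chimera, dropped
        ("BC2", "CCCC", "sgA"): 2,  # tie at 0.5, neither passes
        ("BC2", "CCCC", "sgB"): 2,
        ("BC3", "GGGG", "sgB"): 1,  # only feature, kept
    }
    print(chimeric_filter(toy))
```

With a strict comparison and the default cutoff of 0.5, at most one feature can survive per Barcode+UMI combination, which is the motivation for that choice in this sketch.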

Other Important Changes

Ignore UMIs containing N's when processing reads.
If --max-mismatch-feature is non-zero, add mutated indexes in breadth-first (BFS) order (previously depth-first, DFS).
- Because of the BFS order, if the specified --max-mismatch-feature is too high, it is reset to a lower mismatch (i.e. the smallest mismatch that encounters ambiguous mutated feature sequences) instead of causing a failure.
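To make the BFS behaviour concrete, here is a minimal Python sketch of a level-by-level mismatch index. It is an illustration under stated assumptions, not the software's code: only substitutions over the alphabet ACGT are considered, feature sequences are assumed to be distinct, and sequences found to be ambiguous at a level are simply left out of the index. The function name and return values are hypothetical.

```python
def build_mismatch_index(features, max_mismatch, alphabet="ACGT"):
    """Build {sequence: feature index} level by level (BFS over mismatches).

    Returns (index, effective_mismatch). If two different features reach the
    same new sequence at some level, expansion stops at that level and
    effective_mismatch is set to it (assumption: ambiguous sequences are
    excluded from the index).
    """
    index = {seq: i for i, seq in enumerate(features)}  # level 0: exact matches
    frontier = dict(index)
    effective = 0
    for level in range(1, max_mismatch + 1):
        candidates = {}
        ambiguous = set()
        for seq, feat in frontier.items():
            for pos in range(len(seq)):
                for base in alphabet:
                    if base == seq[pos]:
                        continue
                    mutated = seq[:pos] + base + seq[pos + 1:]
                    if mutated in index:
                        continue  # already claimed at a smaller distance
                    previous = candidates.setdefault(mutated, feat)
                    if previous != feat:
                        ambiguous.add(mutated)
        for seq in ambiguous:
            del candidates[seq]
        index.update(candidates)
        effective = level
        if ambiguous:
            break  # do not expand beyond the first ambiguous level
        frontier = candidates
    return index, effective


if __name__ == "__main__":
    index, effective = build_mismatch_index(["AAAA", "TTTT"], max_mismatch=3)
    print(effective, len(index))
```

Because all features are expanded one mismatch at a time, a sequence is always assigned to its closest feature first, and the level at which ambiguity first appears is known without exploring deeper levels.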
Remove --max-mismatch-cell and --umi-length; both are now determined by the chemistry type.
Remove --feature, and make feature type a required input. Available options: hashing, citeseq, cmo, crispr, and adt (when both hashing and citeseq features are in the same sample).
Add --genome option to allow writing the genome reference name to the output count matrices.

Output Format Changes

Count matrices are written in sparse format, as 10x hdf5 files.
UMI tables are in a simplified 10x hdf5 format (.molecule_info.h5), instead of .stat.csv.gz:
- Datasets /barcode_idx and /barcodes: /barcode_idx stores each molecule's cell barcode index, with name found in /barcodes via this index.
- Datasets /feature_idx and /features: /feature_idx stores each molecule's feature index, with name found in /features via this index.
- Dataset /umi: Each molecule's UMI sequence, as a string.
- Dataset /count: Each molecule's read count, as an integer.
For crispr samples, 3 count matrices are generated: .raw.h5 for raw count matrix, .umi_correct.h5 for count matrix after UMI correction, .chimeric_filtered.h5 for count matrix after UMI correction + PCR chimeric filtering.
For other antibody type samples, 2 count matrices are generated: .raw.h5 for raw count matrix, .umi_correct.h5 for count matrix after UMI correction.
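The index-based layout of the UMI table described above can be illustrated with plain Python lists standing in for the datasets. The values below are made up, and the helper names are hypothetical. With a real .molecule_info.h5 file, the same arrays would first be loaded with an HDF5 reader such as h5py. The sketch assumes one row per molecule, so counting rows per feature and barcode pair yields a UMI count.

```python
from collections import Counter

# Toy stand-ins for the datasets (hypothetical values).
barcodes = ["AAACCCAAGAAACACT", "AAACCCAAGAAACCAT"]   # /barcodes
features = ["sgRNA_1", "sgRNA_2"]                      # /features
barcode_idx = [0, 0, 1, 1]                             # /barcode_idx
feature_idx = [0, 0, 1, 0]                             # /feature_idx
umi = ["ACGTACGTACGT", "TTGTACGTACGT",
       "ACGTACGTACGA", "GGGTACGTACGT"]                 # /umi
count = [12, 3, 7, 1]                                  # /count


def umi_counts(barcodes, features, barcode_idx, feature_idx):
    """Number of molecules (rows) per (feature name, barcode name) pair."""
    pairs = Counter(zip(feature_idx, barcode_idx))
    return {(features[f], barcodes[b]): n for (f, b), n in pairs.items()}


def read_counts(barcodes, features, barcode_idx, feature_idx, count):
    """Total reads per (feature name, barcode name) pair."""
    totals = Counter()
    for f, b, reads in zip(feature_idx, barcode_idx, count):
        totals[(features[f], barcodes[b])] += reads
    return dict(totals)


if __name__ == "__main__":
    print(umi_counts(barcodes, features, barcode_idx, feature_idx))
    print(read_counts(barcodes, features, barcode_idx, feature_idx, count))
```

Storing names once and referring to them by index keeps the per-molecule records compact, which is presumably why the format separates /barcode_idx from /barcodes and /feature_idx from /features.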

Bug Fix

Fix an index -1 bug when processing feature barcode files with 3 columns (i.e. files that contain a modality column).
Fix a bug in chemistry auto-detection that, in rare cases, caused the software to fail to detect low-quality samples whose top 2 chemistries have similar numbers of reads matching cell barcodes.

05 Mar 00:20

yihming

1.0.0

af148b2

1.0.0

Auto-detect the chemistry by testing the first 10,000 R1 reads against all cell barcode inclusion lists that are possible given --chemistry:
- All 10x cell barcode files must be placed in one folder, which is specified via the required command argument cell_barcode_dir.
- Use the new lists for SC3Pv3 and SC3Pv4 chemistries, as of Cell Ranger v9.0.
Automatically decide totalseq_type (for antibody assays), umi_len, barcode_pos and max_mismatch_cell accordingly.
Remove the --convert-cell-barcode option, as the need for conversion is now detected automatically.
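The auto-detection idea above can be sketched in a few lines of Python: count how many of the first reads carry a barcode found in each candidate inclusion list, then pick the list with the most matches. This is only an illustration. The function name, the chemistry labels in the example, the exact-match rule, and the assumption that the barcode is the first 16 bases of R1 are all hypothetical and not taken from the software.

```python
from itertools import islice


def detect_chemistry(r1_reads, inclusion_lists, n_reads=10_000, barcode_len=16):
    """Pick the chemistry whose inclusion list matches the most sampled reads.

    r1_reads: iterable of R1 sequences.
    inclusion_lists: dict mapping chemistry name -> set of valid cell barcodes.
    Assumption: the cell barcode is the first barcode_len bases of R1.
    """
    sample = list(islice(r1_reads, n_reads))
    hits = {
        name: sum(read[:barcode_len] in valid for read in sample)
        for name, valid in inclusion_lists.items()
    }
    best = max(hits, key=hits.get)
    return best, hits


if __name__ == "__main__":
    lists = {
        "chem_A": {"AAAACCCCGGGGTTTT", "ACGTACGTACGTACGT"},
        "chem_B": {"TTTTGGGGCCCCAAAA"},
    }
    reads = [
        "AAAACCCCGGGGTTTT" + "ACGTACGTACGT",
        "ACGTACGTACGTACGT" + "TTTTTTTTTTTT",
        "TTTTGGGGCCCCAAAA" + "GGGGGGGGGGGG",
        "NNNNNNNNNNNNNNNN" + "AAAAAAAAAAAA",
    ]
    print(detect_chemistry(reads, lists))
```

A real detector would also need a rule for the case where the top two candidates have similar match counts, which this sketch does not attempt.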

05 Feb 20:37

yihming

0.11.4

0eef6d8

0.11.4

Support UTF-encoded cell and feature barcode files as input (PR #28 by @yihming)
Stop early, with a user-friendly error message, if no FASTQ file is found in the input directory (PR #28 by @yihming)

Contributors

yihming

11 Mar 19:46

yihming

0.11.3

f249353

0.11.3

Fix an issue with parsing feature barcode files that contain multiple modalities (PR #27 by @yihming)

Contributors

yihming

16 Oct 18:50

yihming

0.11.2

a7481b0

0.11.2

Fix a whitespace issue on Windows (PR #26 by @yihming)

Contributors

yihming

16 Nov 06:02

yihming

0.11.1

be61532

0.11.1

Fix a bug in ingesting reads (PR #25 by @bli25)

Contributors

bli25

18 Aug 18:28

yihming

0.11.0

b483047

0.11.0

This release contains the following changes (PR #24 by @bli25):

Add support for writing in BGZF format.
Fix a bug in izlib.h and improve the error message.

Contributors

bli25

22 Jun 19:53

yihming

0.10.0

09cecb1

0.10.0

FASTQ file parser:

Achieve faster multi-threaded FASTQ file reading through a simplified reimplementation of FQFeeder (PR #23 by @bli25)

Contributors

bli25

12 Jun 09:48

bli25

0.9.0

861bbfb

0.9.0

Decompressing:
- Use isa-l instead of zlib for faster decompression
- Use slw287r's izlib.h as the interface to kseq.h (PR #20, PR #21 by @bli25)
Compressing:
- Use libdeflate for faster compression
- Add compress.hpp, which enables single-threaded and multi-threaded compression (PR #22 by @bli25)
In input arguments:
- Accept gzipped cell barcode file again
- Add -p option for multi-threaded compression
In output:
- The sufficient statistics file is gzipped again, i.e. output_name.stat.csv.gz.

Contributors

bli25

30 Apr 23:56

yihming

0.8.0

2c6ed55

0.8.0

On processing gzipped FASTQ files:
- Remove boost library dependency.
- Instead, use zlib and Heng Li's kseq library for fast I/O processing. (PR #17 by @tony-kuo; PR #18 by @bli25)
In input arguments:
- No longer accept gzipped cell barcode file, i.e. only .txt format is accepted.
In output:
- The sufficient statistics file output_name.stat.csv is no longer gzipped; it is written as plain .csv.
- Add output_name.report.txt to report statistics related to number of reads.

Contributors

bli25 and tony-kuo
