Releases: lilab-bcb/cumulus_feature_barcoding
2.0.0
Acknowledgment
The work of adding the UMI correction feature was initiated by Jack Kamm, who also demonstrated the usefulness of UMI correction plus PCR chimeric filtering for cleaning up noise in deeply sequenced CRISPR data. We would like to give our huge thanks to Jack for his inspiration and help in improving this software!
New Features
- Add UMI correction with methods introduced in [Smith, et al. 2017]:
  - Use `directional` method by default. Other methods available: `cluster`, `adjacency`. Specify non-default method via `--umi-correct-method` option.
  - New structure of report txt file to include stats after UMI correction.
- For `crispr` feature type data, further perform PCR chimeric filtering:
  - No UMI count cutoff by default. Users can specify a non-zero cutoff via `--umi-count-cutoff` option.
  - Chimeric filtering by ratio threshold `0.5` per Barcode+UMI combination by default. Users can specify a non-default cutoff via `--read-ratio-cutoff`.
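For readers unfamiliar with the `directional` method, the following is a minimal Python sketch of the rule as described in Smith, et al. 2017: a lower-count UMI is merged into a higher-count UMI when the two are within one mismatch and the higher count is at least twice the lower count minus one. The function names, the `max_dist` parameter, and the tie-breaking order are assumptions made for illustration; this is not the code used by this software.

```python
from collections import deque


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def directional_correct(umi_counts: dict[str, int], max_dist: int = 1) -> dict[str, int]:
    """Collapse UMIs with the directional rule.

    Returns {representative UMI: summed read count of its whole network}.
    """
    # Visit UMIs from most to least abundant (ties broken alphabetically,
    # an assumption made here only to keep the result deterministic).
    order = sorted(umi_counts, key=lambda u: (-umi_counts[u], u))
    assigned = set()
    corrected = {}
    for root in order:
        if root in assigned:
            continue
        assigned.add(root)
        total = umi_counts[root]
        queue = deque([root])
        while queue:
            parent = queue.popleft()
            for child in order:
                if child in assigned:
                    continue
                # Directed edge parent -> child: close in sequence and
                # count(parent) >= 2 * count(child) - 1.
                if (hamming(parent, child) <= max_dist
                        and umi_counts[parent] >= 2 * umi_counts[child] - 1):
                    assigned.add(child)
                    total += umi_counts[child]
                    queue.append(child)
        corrected[root] = total
    return corrected


counts = {"AAAA": 100, "AAAT": 10, "AATT": 3, "CCCC": 50, "CCCG": 40}
corrected = directional_correct(counts)
```

In this toy input, `AAAT` and `AATT` are absorbed into `AAAA`, while `CCCC` and `CCCG` stay separate because their counts are too similar for one to be treated as an error of the other.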
Other Important Changes
- Ignore UMIs containing `N`'s when processing reads.
- If `--max-mismatch-feature` is non-zero, add mutated indexes in breadth-first (BFS) order (previously depth-first, DFS).
  - Because of the BFS order, if the specified `--max-mismatch-feature` is too high, reset it to a lower mismatch (i.e. the smallest mismatch that encounters ambiguous mutated feature sequences), instead of failing.
- Remove `--max-mismatch-cell` and `--umi-length`; both are now decided by the chemistry type.
- Remove `--feature`, and make feature type a required input. Available options: `hashing`, `citeseq`, `cmo`, `crispr`, and `adt` (when both `hashing` and `citeseq` features are in the same sample).
- Add `--genome` option to allow writing the genome reference name to the output count matrices.
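The breadth-first expansion of mutated feature sequences can be sketched in Python as below. This is only an illustration under stated assumptions: the function name, the `ACGT` alphabet, the choice to leave ambiguous sequences out of the index, and the choice to report the level at which ambiguity first appears are all hypothetical and may differ from what the software does.

```python
def build_mismatch_index(
    features: dict[str, str], max_mismatch: int, alphabet: str = "ACGT"
) -> tuple[dict[str, str], int]:
    """Map every sequence within `max_mismatch` substitutions to its feature.

    Sequences are added level by level (distance 1, then 2, ...). Expansion
    stops after the first level at which one mutated sequence is reachable
    from two different features. Returns (index, last level processed).
    """
    index = {seq: name for name, seq in features.items()}  # distance 0
    frontier = dict(index)
    effective = 0
    for dist in range(1, max_mismatch + 1):
        candidates: dict[str, set[str]] = {}
        for seq, name in frontier.items():
            for pos in range(len(seq)):
                for base in alphabet:
                    if base == seq[pos]:
                        continue
                    mutated = seq[:pos] + base + seq[pos + 1:]
                    if mutated in index:
                        continue  # already reached at a smaller distance
                    candidates.setdefault(mutated, set()).add(name)
        ambiguous = {s for s, names in candidates.items() if len(names) > 1}
        # Assumption: ambiguous sequences are simply not indexed.
        frontier = {
            s: next(iter(names))
            for s, names in candidates.items()
            if s not in ambiguous
        }
        index.update(frontier)
        effective = dist
        if ambiguous:
            break  # a higher mismatch level would only add more ambiguity
    return index, effective


index, effective = build_mismatch_index({"f1": "AAAA", "f2": "TTTT"}, max_mismatch=3)
```

Because each level is completed before the next one starts, the level at which two features first collide is known immediately, which is what makes lowering the mismatch setting possible instead of aborting.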
Output Format Changes
- Count matrices are in sparse format and in 10x hdf5 format.
- UMI tables are in a simplified 10x hdf5 format (`.molecule_info.h5`), instead of `.stat.csv.gz`:
  - Datasets `/barcode_idx` and `/barcodes`: `/barcode_idx` stores each molecule's cell barcode index, with name found in `/barcodes` via this index.
  - Datasets `/feature_idx` and `/features`: `/feature_idx` stores each molecule's feature index, with name found in `/features` via this index.
  - Dataset `/umi`: Each molecule's UMI sequence in string.
  - Dataset `/count`: Each molecule's read count in integer.
- For `crispr` samples, 3 count matrices are generated: `.raw.h5` for raw count matrix, `.umi_correct.h5` for count matrix after UMI correction, `.chimeric_filtered.h5` for count matrix after UMI correction + PCR chimeric filtering.
- For other antibody type samples, 2 count matrices are generated: `.raw.h5` for raw count matrix, `.umi_correct.h5` for count matrix after UMI correction.
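To show how the index datasets in `.molecule_info.h5` relate to the name datasets, here is a small Python sketch that resolves each molecule's indexes into names. It operates on a plain dictionary standing in for the file, so the example values and the helper names `iter_molecules` and `umi_counts` are hypothetical. An open HDF5 file object from a reader such as `h5py` could presumably be passed instead, though string datasets may then come back as bytes and need decoding.

```python
from collections import Counter


def iter_molecules(store):
    """Yield (cell barcode, feature name, UMI, read count) for each molecule."""
    barcodes = store["barcodes"]
    features = store["features"]
    for b_idx, f_idx, umi, count in zip(
        store["barcode_idx"], store["feature_idx"], store["umi"], store["count"]
    ):
        # Each index points into the corresponding name dataset.
        yield barcodes[b_idx], features[f_idx], umi, int(count)


def umi_counts(store):
    """Count molecules (UMIs) per (cell barcode, feature) pair."""
    return Counter((bc, feat) for bc, feat, _umi, _count in iter_molecules(store))


# Toy stand-in for the datasets described above (values are made up).
example = {
    "barcodes": ["AAAC", "GGGT"],
    "features": ["sgRNA_1", "sgRNA_2"],
    "barcode_idx": [0, 0, 1],
    "feature_idx": [0, 0, 1],
    "umi": ["ACGT", "TTGA", "CCAT"],
    "count": [5, 2, 7],
}
molecules = list(iter_molecules(example))
per_pair = umi_counts(example)
```

Storing indexes rather than repeated strings keeps the per-molecule datasets compact, since each barcode and feature name is written only once.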
Bug Fix
- Fix a bug of indexing `-1` when processing feature barcode files of 3 columns (i.e. files that contain a modality column).
- Fix a bug in chemistry auto-detection that, in rare cases, caused the software to fail to detect low-quality samples in which the top 2 chemistries have similar numbers of matched reads in cell barcodes.
1.0.0
- Chemistry auto-detection by testing the first 10,000 R1 reads against all possible cell barcode inclusion lists based on `--chemistry`:
  - All 10x cell barcode files need to be placed in one folder, which is specified via the required command argument `cell_barcode_dir`.
  - Use the new lists for `SC3Pv3` and `SC3Pv4` chemistries since Cell Ranger v9.0.
- Automatically decide `totalseq_type` (for antibody assays), `umi_len`, `barcode_pos` and `max_mismatch_cell` accordingly.
- Remove `--convert-cell-barcode` option as it will be automatically detected.
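The idea behind the auto-detection can be sketched in Python as follows: take the leading R1 reads, look up each read's barcode in every candidate inclusion list, and pick the chemistry with the most matches. The function name, the exact-match lookup, and the assumption that the barcode is the first 16 bases of R1 are illustrative choices only; the software's actual matching and tie-handling rules are not described here.

```python
from itertools import islice


def detect_chemistry(r1_reads, inclusion_lists, barcode_len=16, max_reads=10_000):
    """Return (best chemistry, hits per chemistry) from the leading R1 reads.

    `inclusion_lists` maps a chemistry name to a set of valid cell barcodes
    and must not be empty.
    """
    hits = {name: 0 for name in inclusion_lists}
    for read in islice(r1_reads, max_reads):
        # Assumption: the cell barcode occupies the start of the read.
        barcode = read[:barcode_len]
        for name, barcodes in inclusion_lists.items():
            if barcode in barcodes:
                hits[name] += 1
    best = max(hits, key=hits.get)
    return best, hits


# Toy inclusion lists and reads (sequences are made up).
lists = {
    "SC3Pv3": {"AAAACCCCGGGGTTTT"},
    "SC3Pv4": {"TTTTGGGGCCCCAAAA"},
}
reads = ["AAAACCCCGGGGTTTT" + "ACGTACGTACGT"] * 3 + ["TTTTGGGGCCCCAAAA" + "ACGTACGTACGT"]
best, hits = detect_chemistry(reads, lists)
```

When two chemistries score similarly, a simple maximum like this one is not enough, which is the situation the 2.0.0 bug fix above addresses.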
0.11.4
0.11.3
0.11.2
0.11.1
0.11.0
0.10.0
0.9.0
- Decompressing:
- Compressing:
- In input arguments:
  - Accept gzipped cell barcode file again
  - Add `-p` option for multi-threaded compression
- In output:
  - The sufficient statistics file is gzipped again, i.e. `output_name.stat.csv.gz`.
0.8.0
- On processing gzipped FASTQ files:
- In input arguments:
  - No longer accept gzipped cell barcode file, i.e. only `.txt` format is accepted.
- In output:
  - The sufficient statistics file `output_name.stat.csv` is no longer gzipped, but in `.csv` format.
  - Add `output_name.report.txt` to report statistics related to number of reads.