A pipeline for filtering annotated variant call format files

scholl-lab/variantcentrifuge

VariantCentrifuge

A command-line tool for filtering, extracting, and prioritizing genetic variants from VCF files. VariantCentrifuge combines gene-centric region extraction, multi-tier filtering (bcftools, SnpSift, pandas), inheritance analysis, and configurable scoring into a single reproducible pipeline.

Features

  • Gene-centric variant extraction using gene names or BED regions
  • Three-tier filtering: bcftools prefilter, SnpSift expressions, pandas final filter
  • Inheritance pattern analysis (de novo, AD, AR, X-linked, compound het)
  • Configurable variant scoring models
  • Gene burden analysis with Fisher's exact test
  • Interactive HTML reports with sortable tables and IGV.js integration
  • ClinVar, gnomAD, and SpliceAI annotation links
  • Cohort aggregation across multiple samples
  • Field profiles for switching annotation database versions (e.g., dbNSFP v4/v5)
  • Docker image with all bioinformatics dependencies included
  • Stage-based pipeline architecture with parallel execution (--use-new-pipeline)
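To make the gene burden feature above more concrete, here is a minimal sketch of a one-sided Fisher's exact test on a 2x2 carrier table, written with the standard library only. The function name, the table layout, and the choice of a one-sided test are assumptions for illustration; the README does not say how VariantCentrifuge builds its tables or which alternative hypothesis it uses.

```python
from math import comb


def fisher_exact_greater(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher's exact test for a 2x2 table (illustrative only).

    Hypothetical table layout for a single gene:

                    carriers   non-carriers
        cases          a            b
        controls       c            d

    Returns P(X >= a) under the hypergeometric null, i.e. the probability
    of seeing at least `a` carriers among the cases by chance.
    """
    n_cases = a + b          # row total: number of cases
    n_carriers = a + c       # column total: number of carriers
    n_total = a + b + c + d  # all samples

    denominator = comb(n_total, n_cases)
    max_a = min(n_cases, n_carriers)

    # Sum hypergeometric probabilities for every table at least as extreme.
    numerator = sum(
        comb(n_carriers, k) * comb(n_total - n_carriers, n_cases - k)
        for k in range(a, max_a + 1)
    )
    return numerator / denominator


# Example: 3 of 4 cases carry a qualifying variant, versus 1 of 4 controls.
p_value = fisher_exact_greater(3, 1, 1, 3)
print(f"one-sided p-value: {p_value:.4f}")
```

A small p-value would suggest that qualifying variants in the gene are enriched among cases. With real cohorts, a library routine such as SciPy's `fisher_exact` is the more practical choice.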

Installation

Docker (recommended -- all tools included):

docker pull ghcr.io/scholl-lab/variantcentrifuge:latest

pip:

pip install variantcentrifuge

From source:

git clone https://github.com/scholl-lab/variantcentrifuge.git
cd variantcentrifuge && pip install .

External tools (bcftools, snpEff, SnpSift, bedtools) must be in PATH when not using Docker. Install them with conda or mamba, for example:

mamba create -y -n vc bcftools snpsift snpeff bedtools
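When running outside Docker, it can help to confirm that the external tools are discoverable before starting a long run. The sketch below uses `shutil.which` for that check. The executable names are assumptions taken from the tool names listed above; the actual binary names on your system may differ (for example in capitalization), and `missing_tools` is a hypothetical helper, not part of VariantCentrifuge.

```python
import shutil

# Assumed executable names, based on the tool list in this README.
REQUIRED_TOOLS = ["bcftools", "snpEff", "SnpSift", "bedtools"]


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the subset of `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All external tools found.")
```

Run it inside the activated conda environment (here `vc`) so that the environment's `bin` directory is on PATH.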

Quick Start

# Filter rare coding variants in a gene list
variantcentrifuge \
  --gene-file genes.txt \
  --vcf-file input.vcf.gz \
  --preset rare,coding \
  --html-report \
  --xlsx

# Score and filter with a custom model
variantcentrifuge \
  --gene-file genes.txt \
  --vcf-file input.vcf.gz \
  --preset rare,coding \
  --scoring-config-path scoring/nephro_variant_score \
  --final-filter 'score > 0.8 and IMPACT == "HIGH"' \
  --output-file results.tsv
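
For scripted or repeated runs, the same invocations can be assembled from Python. The following is a minimal sketch that only uses flags shown in the Quick Start above; `build_command` is a hypothetical helper and not part of the VariantCentrifuge package.

```python
import shlex


def build_command(gene_file, vcf_file, presets,
                  output_file=None, html_report=False, xlsx=False):
    """Assemble a variantcentrifuge argument list (illustrative helper)."""
    cmd = [
        "variantcentrifuge",
        "--gene-file", gene_file,
        "--vcf-file", vcf_file,
        "--preset", ",".join(presets),  # presets are comma-separated, as in the Quick Start
    ]
    if html_report:
        cmd.append("--html-report")
    if xlsx:
        cmd.append("--xlsx")
    if output_file:
        cmd += ["--output-file", output_file]
    return cmd


cmd = build_command(
    "genes.txt", "input.vcf.gz", ["rare", "coding"],
    html_report=True, xlsx=True,
)
# Print a shell-safe rendering of the command for inspection.
print(shlex.join(cmd))
```

Passing the list to `subprocess.run(cmd, check=True)` would execute it. Keeping the arguments as a list avoids shell-quoting problems with file names that contain spaces.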

Snakemake Workflow

A Snakemake 8+ workflow for batch-processing multiple VCFs on HPC clusters (SLURM, PBS) is included under workflow/, with cluster profiles in profiles/ and sample configuration in config/. See scripts/run_snakemake.sh for the auto-detecting launcher.

Documentation

Full documentation: scholl-lab.github.io/variantcentrifuge

Contributing

Contributions are welcome. Please see the Contributing Guide for details.

Citation

If you use VariantCentrifuge in your research, please cite it. Citation information will be added upon publication.

License

This project is licensed under the MIT License.
