Privacy-first local DNA analysis for AI agents. Comprehensive genetic analysis with 1600+ validated markers across 30 categories covering pharmacogenomics, disease risk, carrier status, haplogroups, ancestry, hereditary cancer, autoimmune conditions, pain sensitivity, lifestyle optimization, and more.
NEW in v5.0.0: Reference Dataset Integration!
9 major genomics reference databases integrated for validated analysis:
| Dataset | Type | Coverage |
|---|---|---|
| 1000 Genomes | Population | 26 populations, ~2,500 individuals |
| HGDP | Population | 51 populations, ~1,000 individuals |
| SGDP | Population | 142 populations, 300 individuals (deep sequencing) |
| Ancient DNA | Ancestry | WHG, Neolithic, Yamnaya, Neanderthal markers |
| gnomAD | Allele Frequencies | 76,000+ genomes |
| ClinVar | Clinical Variants | Pathogenicity classifications |
| PharmGKB | Pharmacogenomics | Drug-gene interactions |
| PGS Catalog | Risk Scores | Polygenic score coefficients |
| GWAS Catalog | Trait Associations | Published GWAS results |
All data downloaded and cached locally for privacy-first analysis.
- 1600+ validated genetic markers across 30 categories
- Polygenic Risk Scores (PRS) for 10+ major conditions with population calibration
- Pharmacogenomics with CPIC Level 1A drug-gene interactions
- Medication Interaction Checker - cross-reference any medication list
- Carrier screening for 35+ recessive diseases including rare diseases
- VCF support for whole genome/exome sequencing
- Agent-friendly JSON output with priority-sorted actionable items
- Zero network requests - all analysis runs locally
- Universal ethnic coverage - works for all ancestries worldwide
🧬 Haplogroup Analysis
- Mitochondrial DNA (mtDNA) haplogroups for maternal lineage
- Y-chromosome haplogroups for paternal lineage
- Migration history context for each haplogroup
- Based on PhyloTree and ISOGG standards
🌍 Ancestry Composition
- Reference population comparisons (EUR, AFR, EAS, SAS, AMR)
- Admixture detection from SNP data
- Ancestry informative markers (AIMs)
- Sub-population resolution where data supports
🎗️ Expanded Hereditary Cancer Panel
- BRCA1/BRCA2 comprehensive coverage
- Lynch syndrome genes (MLH1, MSH2, MSH6, PMS2)
- Other hereditary cancer markers (APC, TP53, CHEK2, PALB2, ATM)
- ACMG-style variant classification
🔬 Autoimmune HLA Associations
- Celiac disease (HLA-DQ2, DQ8) with rule-out capability
- Type 1 Diabetes associations
- Ankylosing spondylitis (HLA-B27)
- Rheumatoid arthritis, lupus, multiple sclerosis markers
💊 Pain Sensitivity
- COMT Val158Met (pain perception, opioid response)
- OPRM1 A118G (opioid receptor function)
- SCN9A (pain signaling)
- TRPV1 (capsaicin/thermal sensitivity)
- Migraine susceptibility markers
📄 PDF Report Generation
- Professional, physician-shareable format
- Executive summary section
- Detailed findings by category
- Actionable recommendations
- Disclaimers and limitations
📊 Data Quality Metrics
- Call rate analysis
- No-call position tracking
- Chromosome coverage analysis
- Platform/chip detection
- Confidence scoring for variants
🔗 Integration & Export
- Genetic counselor clinical export (ACMG-style)
- Apple Health compatible format
- API-ready JSON structure
- Integration hooks for health trackers
💊 Medication Interaction Checker
- Input any list of medications (generic or brand names)
- Cross-references with pharmacogenomics profile
- Critical, serious, moderate, and minor severity levels
- Dosing adjustments and alternative medication suggestions
- PubMed citations for each interaction
- FDA warning flags
🌙 Sleep Optimization Profile
- Chronotype determination (CLOCK, PER2, PER3 genes)
- Caffeine metabolism speed (CYP1A2)
- Adenosine receptor sensitivity (ADORA2A)
- Personalized wake/sleep time recommendations
- Coffee cutoff time based on genetics
- Short sleeper gene detection
🥗 Dietary Interaction Matrix
- Caffeine tolerance (CYP1A2)
- Alcohol metabolism (ADH1B, ALDH2 - flush detection)
- Saturated fat response (APOE-specific recommendations)
- Lactose tolerance (LCT)
- Celiac disease risk (HLA-DQ2/DQ8)
- Bitter taste perception (TAS2R38)
- Omega-3 conversion efficiency (FADS1)
- Iron overload risk (HFE)
🖥️ Interactive Web Dashboard
- Beautiful, responsive HTML visualization
- Auto-generated with every analysis
- No external dependencies - works offline
- Dark mode support
- Sections: Overview, Pharmacogenomics, Health Risks, Traits, Ancestry, Carrier Status, Sleep, Athletic, UV/Skin, Dietary
- Export to PDF (print functionality)
- Drag & drop JSON loading
📊 Reference Dataset Integration
9 major genomics databases integrated for validated, evidence-based analysis:
Population/Ancestry:
- 1000 Genomes - 26 populations from 5 superpopulations (~2,500 individuals)
- HGDP - Human Genome Diversity Project (51 populations, ~1,000 individuals)
- SGDP - Simons Genome Diversity Project (142 populations, 300 individuals, 43x deep sequencing)
- Ancient DNA - Curated markers for ancestral population signals
Health/Clinical:
- gnomAD - Genome Aggregation Database (76,000+ genomes for allele frequencies)
- ClinVar - Clinical variant interpretations with pathogenicity classifications
- PharmGKB - Pharmacogenomics knowledge base with dosing guidelines
- PGS Catalog - Polygenic risk score coefficients
- GWAS Catalog - Published genome-wide association results
🏛️ Ancient Ancestral Signals
Replace misleading "ethnicity percentages" with honest ancient population signals:
- Western Hunter-Gatherers (WHG) - Pre-Neolithic Europeans (~14,000-5,000 BCE)
- Eastern Hunter-Gatherers (EHG) - Eastern European/Siberian
- Neolithic Farmers - Anatolian agricultural migrants
- Yamnaya/Steppe - Bronze Age pastoralists from Pontic-Caspian steppe
- Neanderthal Introgression - Archaic admixture (~2% in non-Africans)
- Denisovan Introgression - Archaic admixture (East Asian/Oceanian)
Each signal includes:
- Marker count and derived allele frequency
- Phenotypic evidence (eye color, skin pigmentation, lactase persistence)
- Confidence level
- Published references (PMIDs)
🛡️ Code Quality Improvements
- Complete type hints throughout codebase
- TypedDict for complex return types
- Comprehensive docstrings (Google style)
- Defensive programming with input validation
- Graceful handling of malformed data
- User-friendly error messages
- 200+ automated tests
📊 Enhanced Visualizations
- PRS percentile bars with color coding
- Power vs Endurance athletic gauge
- Sleep chronotype display
- Skin type estimation visuals
- Dietary interaction matrix
- Collapsible sections
- Search/filter functionality
🏃 Athletic Performance Profiling
- Endurance vs power composite score
- Key markers: ACTN3, ACE, PPARGC1A
- Recovery profile (TNF, IL6, BDNF)
- Injury susceptibility (COL5A1, COL1A1, GDF5)
- VO2max potential indicators
- Sport suitability recommendations
- Personalized training guidance
☀️ UV Sensitivity Calculator
- Estimated Fitzpatrick skin type from genetics
- MC1R, SLC24A5, SLC45A2, IRF4, TYR markers
- SPF recommendations by skin type
- Melanoma risk assessment
- Vitamin D synthesis capacity
- Sun exposure guidelines
📝 Natural Language Explanations
- Plain-English interpretation of all findings
- Jargon-free explanations
- Calibrated uncertainty language
- Practical implications for each finding
- Context for relative risks
🔬 Research Variant Flagging
- Clear separation of "established" vs "emerging" findings
- Evidence level classification for each marker
- Research context for preliminary findings
- Appropriate uncertainty communication
📚 PubMed References
- Direct links to source papers for major findings
- Primary citations for CPIC guidelines
- PMID references throughout
🧬 Runs of Homozygosity Detection
- Heterozygosity rate calculation
- Sensitive handling of consanguinity findings
- Genetic counselor referral recommendations
⏳ Telomere Length Estimation
- TERT, TERC, OBFC1 variants
- Longevity-associated markers (FOXO3, APOE)
- Clear caveats about limitations
- Lifestyle factor context
- 23andMe (v3, v4, v5)
- AncestryDNA
- MyHeritage
- FamilyTreeDNA
- Nebula Genomics
- VCF files (whole genome/exome, gzipped supported)
- Any tab-delimited rsid format
# Install via clawhub (recommended)
clawhub install personal-genomics
# Or clone directly
git clone https://github.com/wkyleg/personal-genomics.git
cd personal-genomics
pip install -r requirements.txtpython comprehensive_analysis.py /path/to/dna_file.txtAnalyze my DNA file at ~/Downloads/genome.txt
python -c "
from comprehensive_analysis import load_dna_file, main
from pdf_report import generate_pdf_report
# Run main analysis, then generate PDF from results
"Reports are saved to ~/dna-analysis/reports/:
dashboard.html- NEW! Interactive visualization dashboardagent_summary.json- AI-optimized output with priority-sorted actionable itemsfull_analysis.json- Complete analysis datareport.txt- Human-readable reportgenetic_report.pdf- Professional PDF report
from exports import export_all_formats, generate_genetic_counselor_export
# Export all formats
paths = export_all_formats(analysis_results)
# Clinical export for genetic counselors
clinical = generate_genetic_counselor_export(analysis_results)| Category | Markers | Description |
|---|---|---|
| Pharmacogenomics | 159 | Drug metabolism (CYP450, DPYD, TPMT, HLA) |
| Polygenic Risk | 277 | Disease risk scores (CAD, T2D, cancer, etc.) |
| Carrier Status | 181 | Recessive disease carriers (CF, sickle cell, Tay-Sachs) |
| Health Risks | 233 | Disease susceptibility (APOE, Factor V, AMD) |
| Traits | 163 | Physical, sensory, behavioral traits |
| Haplogroups | 44 | mtDNA + Y-DNA lineage markers |
| Ancestry AIMs | 124 | Ancestry informative markers |
| Hereditary Cancer | 41 | BRCA, Lynch syndrome, other cancer genes |
| Autoimmune HLA | 31 | HLA disease associations |
| Pain Sensitivity | 20 | Pain perception, opioid response |
| Rare Diseases | 29 | Rare genetic conditions |
| Mental Health | 25 | Psychiatric genetics |
| Dermatology | 37 | Skin conditions |
| Vision & Hearing | 33 | Sensory conditions |
| Fertility | 31 | Reproductive health |
| Nutrition | 34 | Nutrigenomics |
| Fitness | 30 | Athletic performance |
| Neurogenetics | 28 | Cognition, behavior |
| Longevity | 30 | Aging markers |
| Immunity | 43 | HLA, autoimmunity |
The agent_summary.json is designed for AI agents to quickly identify what matters:
{
"critical_alerts": [...],
"high_priority": [...],
"medium_priority": [...],
"pharmacogenomics_alerts": [...],
"apoe_status": {
"genotype": "ε3/ε4",
"risk_level": "elevated",
"recommendations": [...]
},
"polygenic_risk_scores": {
"cad": {"percentile_estimate": 75, "confidence": "moderate"},
"t2d": {"percentile_estimate": 42, "confidence": "moderate"}
},
"haplogroups": {
"mtDNA": {"haplogroup": "H", "confidence": "high", "lineage": "maternal"},
"Y_DNA": {"haplogroup": "R1b", "confidence": "high", "lineage": "paternal"}
},
"ancestry": {
"primary": "European",
"composition": {"European": 85, "Middle Eastern": 10, "Other": 5}
},
"lifestyle_recommendations": {
"diet": [...],
"exercise": [...],
"supplements": [...],
"avoid": [...]
},
"drug_interaction_matrix": {
"critical_interactions": [...],
"warnings": [...],
"dosing_adjustments": [...]
}
}from markers.haplogroups import analyze_haplogroups
result = analyze_haplogroups(genotypes)
# Returns maternal (mtDNA) and paternal (Y-DNA) lineage with historyfrom markers.ancestry_composition import get_ancestry_summary
result = get_ancestry_summary(genotypes)
# Returns population composition estimates and admixture detectionfrom markers.cancer_panel import analyze_cancer_panel
result = analyze_cancer_panel(genotypes)
# Returns pathogenic/likely pathogenic variants with ACMG classificationfrom markers.autoimmune_hla import analyze_autoimmune_risk
result = analyze_autoimmune_risk(genotypes)
# Returns HLA-based autoimmune disease susceptibilityfrom markers.pain_sensitivity import analyze_pain_sensitivity
result = analyze_pain_sensitivity(genotypes)
# Returns pain perception profile and opioid response expectationsfrom data_quality import generate_quality_report
result = generate_quality_report(genotypes)
# Returns call rate, platform detection, quality warnings| Gene | Drugs Affected | Clinical Impact |
|---|---|---|
| DPYD | 5-FU, capecitabine | Fatal toxicity risk |
| HLA-B*5701 | Abacavir | Hypersensitivity |
| HLA-B*1502 | Carbamazepine | Stevens-Johnson Syndrome |
| MT-RNR1 | Aminoglycosides | Irreversible deafness |
| CYP2D6 | Codeine, tramadol | Prodrug activation |
| CYP2C19 | Clopidogrel (Plavix) | Platelet inhibition |
| SLCO1B1 | Simvastatin | Myopathy risk |
| Syndrome | Genes | Key Cancers |
|---|---|---|
| HBOC | BRCA1, BRCA2 | Breast, ovarian, prostate, pancreatic |
| Lynch | MLH1, MSH2, MSH6, PMS2 | Colorectal, endometrial |
| FAP | APC | Colorectal (polyposis) |
| Li-Fraumeni | TP53 | Multiple cancers |
| Cowden | PTEN | Breast, thyroid, endometrial |
| Condition | Key HLA | Odds Ratio |
|---|---|---|
| Celiac disease | DQ2.5 | ~7.0 |
| Ankylosing spondylitis | B27 | ~69 |
| Type 1 diabetes | DR3-DQ2, DR4-DQ8 | 3-4 |
| Rheumatoid arthritis | Shared epitope | ~2.5 |
| Multiple sclerosis | DRB1*15:01 | ~3.0 |
# Install test dependencies
pip install pytest
# Run all tests
pytest tests/ -v119 comprehensive tests covering all marker modules, VCF parsing, new v4.0 features, and edge cases.
- All analysis runs locally - no data leaves your machine
- No network requests - completely offline capable
- No tracking or telemetry
- Your genetic data stays yours
- Not diagnostic - Results are informational, not medical diagnoses
- Array limitations - Consumer arrays capture ~0.1% of genome; rare variants often missed
- Probabilistic - Polygenic scores indicate risk, not certainty
- Environment matters - Most conditions are 50-80% non-genetic
- Population effects - Some markers better validated in European ancestry
- No structural variants - CNVs and large deletions not detected
- Haplogroups - Full resolution requires specialized testing or WGS
- Cancer panel - Negative result does NOT rule out hereditary cancer
Contributions welcome! Please ensure new markers include:
- rsID
- Gene name
- Risk/effect allele
- Evidence level (strong/moderate/preliminary)
- Source citation (PMID or ClinVar ID)
- Actionable recommendations (if applicable)
- PharmGKB - Pharmacogenomics annotations
- ClinVar - Clinical variant interpretations
- NHGRI-EBI GWAS Catalog - Genome-wide associations
- CPIC - Clinical Pharmacogenetics Implementation Consortium
- PGS Catalog - Polygenic Score database
- OMIM - Rare disease genetics
- PhyloTree - mtDNA haplogroup tree
- ISOGG - Y-DNA haplogroup tree
- 1000 Genomes - Ancestry reference populations
MIT License - See LICENSE file for details.
Disclaimer: This tool is for educational and research purposes. It is not a substitute for professional medical advice, diagnosis, or treatment. Always consult qualified healthcare providers for medical decisions. Critical pharmacogenomic findings and hereditary cancer results should be confirmed with clinical-grade testing before making treatment decisions. Genetic counseling is strongly recommended for any significant findings.