Summary
This proposal outlines additional association types that could be generated from phenopacket data to better capture the variant-level relationships and their context.
Current Associations
Currently we generate:
- CaseToPhenotypicFeatureAssociation (168,288 edges)
- CaseToDiseaseAssociation (8,207 edges)
- CaseToGeneAssociation (8,138 edges) - newly fixed
Proposed New Associations
1. CaseToVariantAssociation
- Subject: Case (phenopacket.store:PMID_xxx)
- Object: Variant ID
- Properties: zygosity, interpretation_status
- Impact: ~8,207 associations (all records have variants)
2. VariantToGeneAssociation
- Subject: Variant ID
- Object: Gene ID (HGNC:xxx)
- Properties: interpretation_status
- Impact: ~8,138 associations
3. VariantToDiseaseAssociation
- Subject: Variant ID
- Object: Disease ID (OMIM:xxx)
- Properties: interpretation_status (CAUSATIVE, CONTRIBUTORY)
- Impact: ~8,207 associations
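The three proposed edge types can all come from one pass over each genomic interpretation. Below is a minimal sketch that assumes the GA4GH Phenopacket v2 JSON layout (camelCase keys), uses variationDescriptor.id as the variant ID, and builds case IDs with the "phenopacket.store:" prefix shown above. The function name, the plain-dict edge shape and the "association" key are hypothetical illustrations, not the project's transformer API.

```python
from typing import Any

# Assumption: case nodes use the "phenopacket.store:" prefix shown in this proposal.
CASE_PREFIX = "phenopacket.store:"


def extract_variant_edges(phenopacket: dict[str, Any]) -> list[dict[str, Any]]:
    """Emit CaseToVariant, VariantToGene and VariantToDisease edges as plain dicts.

    Assumes GA4GH Phenopacket v2 JSON (camelCase keys) and treats
    variationDescriptor.id as the variant ID; both are assumptions.
    """
    case_id = CASE_PREFIX + phenopacket["id"]
    edges: list[dict[str, Any]] = []
    for interpretation in phenopacket.get("interpretations", []):
        diagnosis = interpretation.get("diagnosis", {})
        disease_id = diagnosis.get("disease", {}).get("id")
        for gi in diagnosis.get("genomicInterpretations", []):
            status = gi.get("interpretationStatus")
            descriptor = gi.get("variantInterpretation", {}).get("variationDescriptor", {})
            variant_id = descriptor.get("id")
            if variant_id is None:
                continue  # no variant (e.g. a gene-level interpretation): nothing to link
            # 1. CaseToVariantAssociation carries zygosity and interpretation status.
            edges.append({
                "association": "CaseToVariantAssociation",
                "subject": case_id,
                "object": variant_id,
                "zygosity": descriptor.get("allelicState", {}).get("id"),
                "interpretation_status": status,
            })
            # 2. VariantToGeneAssociation only when an HGNC gene ID is present.
            gene_id = descriptor.get("geneContext", {}).get("valueId")
            if gene_id is not None:
                edges.append({
                    "association": "VariantToGeneAssociation",
                    "subject": variant_id,
                    "object": gene_id,
                    "interpretation_status": status,
                })
            # 3. VariantToDiseaseAssociation links the variant to the diagnosed disease.
            if disease_id is not None:
                edges.append({
                    "association": "VariantToDiseaseAssociation",
                    "subject": variant_id,
                    "object": disease_id,
                    "interpretation_status": status,
                })
    return edges
```

Note that the VariantToGene and VariantToDisease edges produced here carry no reference to the case they came from, so a variant reported in several cases yields indistinguishable edges; that is the provenance gap discussed next.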
Capturing Phenopacket Context on Non-Case Associations
For associations that don't directly involve Case nodes (VariantToGene, VariantToDisease), we need to preserve provenance back to the source case/phenopacket.
Existing Biolink Context Qualifiers (pattern)
- disease_context_qualifier: "A context qualifier representing a disease or condition in which a relationship expressed in an association took place"
- anatomical_context_qualifier: for anatomical locations
- species_context_qualifier: for taxonomic species
- population_context_qualifier: for population context
Gap Identified
No case_context_qualifier or individual_context_qualifier exists in biolink-model.
Proposal Options
1. Propose a new slot to biolink-model (Recommended)
- Add case_context_qualifier: Optional[str]
  - Description: "A context qualifier representing a case or individual in which a relationship expressed in an association was observed"
- Would allow:
  case_context_qualifier: "phenopacket.store:PMID_xxx_yyy"
2. Use the generic qualifiers list (fallback)
- Available on all associations
- Less semantic, but works today
  qualifiers: ["case:phenopacket.store:PMID_xxx_yyy"]
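To compare the two options concretely, here is a minimal sketch that attaches case provenance to an already-built edge dict in either form. The helper name with_case_context and the plain-dict edge representation are hypothetical; case_context_qualifier is the slot proposed above and does not exist in biolink-model today.

```python
import copy
from typing import Any


def with_case_context(
    edge: dict[str, Any], case_id: str, use_proposed_slot: bool = True
) -> dict[str, Any]:
    """Return a copy of edge that records which case it was observed in.

    use_proposed_slot=True  -> Option 1: the proposed case_context_qualifier slot
                               (hypothetical until accepted into biolink-model).
    use_proposed_slot=False -> Option 2: the generic qualifiers list, using the
                               "case:" prefix from the fallback example.
    """
    out = copy.deepcopy(edge)  # never mutate the caller's edge
    if use_proposed_slot:
        out["case_context_qualifier"] = case_id
    else:
        # Append rather than overwrite so existing qualifiers are preserved.
        out["qualifiers"] = [*out.get("qualifiers", []), f"case:{case_id}"]
    return out
```

One design consequence: because the proposed slot is single-valued (Optional[str]), a variant reported in several cases would produce one VariantToDisease edge per case, whereas the qualifiers list could accumulate several case: entries on a single shared edge.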
Additional Qualifier Usage
- interpretation_status (CAUSATIVE, CONTRIBUTORY) → use as statement_qualifier or a dedicated property
- zygosity → association property (exists on some variant associations)
- onset_qualifier → already implemented for phenotypes/diseases
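The GA4GH Phenopacket v2 InterpretationStatus enum has five values (UNKNOWN_STATUS, REJECTED, CANDIDATE, CONTRIBUTORY, CAUSATIVE), so the second status named here is spelled CONTRIBUTORY in the source data. The sketch below maps one genomicInterpretation to the two edge properties. Dropping the other three statuses is an assumption for illustration, as are the function name and the choice to keep zygosity as the raw allelicState CURIE rather than a label.

```python
from typing import Any, Optional

# Assumption: only the statuses named in this proposal become edges; the other
# Phenopacket v2 values (UNKNOWN_STATUS, REJECTED, CANDIDATE) are skipped.
KEPT_STATUSES = frozenset({"CAUSATIVE", "CONTRIBUTORY"})


def qualifier_properties(genomic_interpretation: dict[str, Any]) -> Optional[dict[str, str]]:
    """Map one genomicInterpretation to edge properties, or None to skip it."""
    status = genomic_interpretation.get("interpretationStatus")
    if status not in KEPT_STATUSES:
        return None
    props = {"interpretation_status": status}
    descriptor = (
        genomic_interpretation.get("variantInterpretation", {}).get("variationDescriptor", {})
    )
    # Zygosity is kept as the allelicState ontology CURIE when one is recorded.
    zygosity = descriptor.get("allelicState", {}).get("id")
    if zygosity is not None:
        props["zygosity"] = zygosity
    return props
```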
Data Available
From 8,207 phenopacket records:
- 100% have variant data with interpretation status
- 99% of variants have associated gene IDs (HGNC)
- 100% have disease associations
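Figures like these can be rechecked after each data refresh. Below is a minimal sketch that computes the three percentages from parsed phenopacket JSON dicts; it assumes the Phenopacket v2 layout and, as an assumption, counts a record as having a disease when interpretations[].diagnosis.disease.id is set (the top-level diseases list is not consulted). The function name is hypothetical.

```python
from collections.abc import Iterable
from typing import Any


def _pct(numerator: int, denominator: int) -> float:
    """Percentage, guarding against an empty denominator."""
    return 100.0 * numerator / denominator if denominator else 0.0


def coverage(phenopackets: Iterable[dict[str, Any]]) -> dict[str, float]:
    """Summarise variant, HGNC gene and disease coverage across phenopacket records."""
    records = records_with_variant = records_with_disease = 0
    variants = variants_with_hgnc = 0
    for pp in phenopackets:
        records += 1
        has_variant = has_disease = False
        for interpretation in pp.get("interpretations", []):
            diagnosis = interpretation.get("diagnosis", {})
            if diagnosis.get("disease", {}).get("id"):
                has_disease = True
            for gi in diagnosis.get("genomicInterpretations", []):
                descriptor = gi.get("variantInterpretation", {}).get("variationDescriptor")
                if descriptor is None:
                    continue  # not a variant-level interpretation
                has_variant = True
                variants += 1
                gene_id = descriptor.get("geneContext", {}).get("valueId", "")
                if gene_id.startswith("HGNC:"):
                    variants_with_hgnc += 1
        records_with_variant += int(has_variant)
        records_with_disease += int(has_disease)
    return {
        "pct_records_with_variant": _pct(records_with_variant, records),
        "pct_variants_with_hgnc": _pct(variants_with_hgnc, variants),
        "pct_records_with_disease": _pct(records_with_disease, records),
    }
```

Feeding it one dict per phenopacket file (for example, json.load over each file in the source directory) should give numbers comparable to those listed above, provided the layout assumptions hold for the data.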
Next Steps
- Decide on approach for case_context_qualifier (propose to biolink-model or use fallback)
- Implement CaseToVariantAssociation transformer
- Implement VariantToGeneAssociation transformer
- Implement VariantToDiseaseAssociation transformer
- Update transform.yaml with new edge_properties