Proposal: Add Variant-centric Associations to Capture Phenopacket Context #5

@kevinschaper

Summary

This proposal outlines additional association types that could be generated from phenopacket data to capture variant-level relationships and the case context in which they were observed.

Current Associations

Currently we generate:

  • CaseToPhenotypicFeatureAssociation (168,288 edges)
  • CaseToDiseaseAssociation (8,207 edges)
  • CaseToGeneAssociation (8,138 edges) - newly fixed

Proposed New Associations

1. CaseToVariantAssociation

  • Subject: Case (phenopacket.store:PMID_xxx)
  • Object: Variant ID
  • Properties: zygosity, interpretation_status
  • Impact: ~8,207 associations (all records have variants)

2. VariantToGeneAssociation

  • Subject: Variant ID
  • Object: Gene ID (HGNC:xxx)
  • Properties: interpretation_status
  • Impact: ~8,138 associations

3. VariantToDiseaseAssociation

  • Subject: Variant ID
  • Object: Disease ID (OMIM:xxx)
  • Properties: interpretation_status (CAUSATIVE, CONTRIBUTING)
  • Impact: ~8,207 associations
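As a rough illustration of how the three proposed edge types might be derived from a single record, here is a minimal sketch. It assumes a Phenopacket Schema v2 style JSON layout (`interpretations[].diagnosis.genomicInterpretations[].variantInterpretation.variationDescriptor`), and it assumes the case ID is formed by prefixing the phenopacket `id` with `phenopacket.store:`. The `Edge` dataclass and `variant_edges` function are hypothetical names for illustration, not existing transformer code, and all IDs in the example are placeholders.

```python
from dataclasses import dataclass
from typing import Iterator, Optional


@dataclass(frozen=True)
class Edge:
    """Hypothetical stand-in for a Biolink association object."""
    category: str
    subject: str
    object: str
    interpretation_status: Optional[str] = None
    zygosity: Optional[str] = None


def variant_edges(packet: dict) -> Iterator[Edge]:
    """Yield the three proposed edge types for one phenopacket-like dict."""
    # Assumption: case IDs are the phenopacket id with this prefix.
    case_id = f"phenopacket.store:{packet['id']}"
    for interp in packet.get("interpretations", []):
        diagnosis = interp.get("diagnosis", {})
        disease_id = diagnosis.get("disease", {}).get("id")
        for gi in diagnosis.get("genomicInterpretations", []):
            status = gi.get("interpretationStatus")
            descriptor = gi.get("variantInterpretation", {}).get("variationDescriptor", {})
            variant_id = descriptor.get("id")
            if variant_id is None:
                continue  # no variant, so none of the three edges apply
            zygosity = descriptor.get("allelicState", {}).get("label")
            yield Edge("CaseToVariantAssociation", case_id, variant_id, status, zygosity)
            gene_id = descriptor.get("geneContext", {}).get("valueId")
            if gene_id:
                yield Edge("VariantToGeneAssociation", variant_id, gene_id, status)
            if disease_id:
                yield Edge("VariantToDiseaseAssociation", variant_id, disease_id, status)


# Placeholder record; none of these identifiers are real.
example = {
    "id": "PMID_0000_example",
    "interpretations": [{
        "diagnosis": {
            "disease": {"id": "OMIM:000000"},
            "genomicInterpretations": [{
                "interpretationStatus": "CAUSATIVE",
                "variantInterpretation": {"variationDescriptor": {
                    "id": "var_example_1",
                    "geneContext": {"valueId": "HGNC:0000"},
                    "allelicState": {"label": "heterozygous"},
                }},
            }],
        }
    }],
}
edges = list(variant_edges(example))
```

In this sketch a variant without a gene ID still produces the case and disease edges, which would be consistent with the gene-based count being slightly lower than the other two.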

Capturing Phenopacket Context on Non-Case Associations

For associations that don't directly involve Case nodes (VariantToGene, VariantToDisease), we need to preserve provenance back to the source case/phenopacket.

Existing Biolink Context Qualifiers (pattern)

  • disease_context_qualifier - "A context qualifier representing a disease or condition in which a relationship expressed in an association took place"
  • anatomical_context_qualifier - for anatomical locations
  • species_context_qualifier - for taxonomic species
  • population_context_qualifier - for population context

Gap Identified

No case_context_qualifier or individual_context_qualifier exists in biolink-model.

Proposal Options

  1. Propose a new slot to biolink-model (Recommended)

    • Add case_context_qualifier: Optional[str]
    • Description: "A context qualifier representing a case or individual in which a relationship expressed in an association was observed"
    • Would allow: case_context_qualifier: "phenopacket.store:PMID_xxx_yyy"
  2. Use generic qualifiers list (fallback)

    • Available on all associations
    • Less semantic but works today
    • qualifiers: ["case:phenopacket.store:PMID_xxx_yyy"]
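The two options could sit behind one helper so that the transformers do not need to change if the slot is later accepted. The following is a sketch only: `VariantAssociation` and `attach_case_context` are hypothetical names, the dataclass is a stand-in rather than a real biolink-model class, and `case_context_qualifier` does not exist in biolink-model today.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class VariantAssociation:
    """Hypothetical stand-in for VariantToGene / VariantToDisease associations."""
    subject: str
    object: str
    # Option 1: the proposed slot (not part of biolink-model at present).
    case_context_qualifier: Optional[str] = None
    # Option 2: the generic qualifiers list used as a fallback.
    qualifiers: List[str] = field(default_factory=list)


def attach_case_context(
    assoc: VariantAssociation, case_id: str, slot_available: bool
) -> VariantAssociation:
    """Record the source case on an association using whichever option is in effect."""
    if slot_available:
        assoc.case_context_qualifier = case_id
    else:
        # Same "case:" prefix convention as in the fallback option above.
        assoc.qualifiers.append(f"case:{case_id}")
    return assoc


# Placeholder IDs, kept in the elided form used in this proposal.
with_slot = attach_case_context(
    VariantAssociation("var_example_1", "HGNC:xxx"),
    "phenopacket.store:PMID_xxx_yyy",
    slot_available=True,
)
fallback = attach_case_context(
    VariantAssociation("var_example_1", "OMIM:xxx"),
    "phenopacket.store:PMID_xxx_yyy",
    slot_available=False,
)
```

Keeping the choice in a single function means switching from the fallback to the dedicated slot would be a one-line change.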

Additional Qualifier Usage

  • interpretation_status (CAUSATIVE, CONTRIBUTING) → Use as statement_qualifier or dedicated property
  • zygosity → association property (exists on some variant associations)
  • onset_qualifier → already implemented for phenotypes/diseases

Data Available

From 8,207 phenopacket records:

  • 100% have variant data with interpretation status
  • 99% of variants have associated gene IDs (HGNC)
  • 100% have disease associations
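Coverage figures like these could be recomputed whenever the source data changes. The sketch below assumes each record has already been flattened into a dict with hypothetical keys `variant_id`, `gene_id`, and `disease_id`; the toy records are placeholders and do not reproduce the real counts.

```python
from typing import Callable, Iterable


def coverage_pct(records: Iterable[dict], has_field: Callable[[dict], bool]) -> float:
    """Return the percentage of records for which has_field(record) is true."""
    records = list(records)
    if not records:
        return 0.0  # avoid dividing by zero on an empty input
    return 100.0 * sum(1 for r in records if has_field(r)) / len(records)


# Toy, flattened records with hypothetical keys and placeholder IDs.
toy = [
    {"variant_id": "var_1", "gene_id": "HGNC:0000", "disease_id": "OMIM:000000"},
    {"variant_id": "var_2", "gene_id": None, "disease_id": "OMIM:000000"},
]
gene_cov = coverage_pct(toy, lambda r: bool(r.get("gene_id")))
disease_cov = coverage_pct(toy, lambda r: bool(r.get("disease_id")))
```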

Next Steps

  1. Decide on approach for case_context_qualifier (propose to biolink-model or use fallback)
  2. Implement CaseToVariantAssociation transformer
  3. Implement VariantToGeneAssociation transformer
  4. Implement VariantToDiseaseAssociation transformer
  5. Update transform.yaml with new edge_properties
