dieterich-lab/CardioGuidelinesGraph

CardioGuidelinesGraph

Comprehensive Knowledge Graph Construction and Reasoning for Cardiovascular Guidelines
Integrating SNOMED CT, logic, and LLMs for patient-centric clinical decision support

🚀 Project Vision

CardioGuidelinesGraph is a research framework that transforms cardiovascular guidelines into a computable, queryable, and explainable knowledge graph. It enables:

  • Semantic interoperability via SNOMED CT integration
  • Logic-aware reasoning for complex clinical recommendations
  • Patient-specific question answering and evidence tracing
  • Rapid extension to new guidelines, domains, and research questions

Why is this important?

Clinical guidelines are the backbone of evidence-based medicine, but their logic is often buried in prose and tables. CardioGuidelinesGraph makes this knowledge explicit, computable, and accessible for both humans and machines.



1. High-Level Pipeline

flowchart TD
  A[Guideline Documents PDF or Markdown] --> B[Parsing and Chunking]
  B --> C[Statement Extraction and Logic Mapping]
  C --> D[Entity Grounding NER and SNOMED CT]
  D --> E[Ontology Construction OWL or RDF]
  E --> F[Knowledge Graph Construction Neo4j]
  F --> G[Querying and Reasoning]
  G --> H[Patient Specific Answers and Evidence]


2. Architecture & Main Components

Ontology & Entity Grounding

Knowledge Extraction & Graph Construction

  • Markdown/PDF Parsing (parsing_utils/): Extracts structured statements and tables from guideline documents.
  • Statement Extraction & Embedding (extraction_utils/): Converts parsed text into logical statements, embeds them, and prepares them for graph construction.
  • Graph Construction (extraction_utils/new_graph_construction.py): Builds the Neo4j knowledge graph, representing statements, entities, and logical junctions.
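
The graph described above holds statements, entities, and logical junctions. As a rough illustration of how such nodes could be written to Neo4j, the sketch below builds parameterized Cypher queries as `(query, params)` pairs. The labels (`Statement`, `Entity`, `Junction`), the relationship types (`MENTIONS`, `HAS_OPERAND`), and the helper names are assumptions made for this example; the schema actually used in new_graph_construction.py may differ.

```python
from dataclasses import dataclass, field


@dataclass
class Junction:
    """Hypothetical model of a logical junction combining statements."""
    junction_id: str
    operator: str  # expected to be "AND", "OR", or "NOT"
    statement_ids: list = field(default_factory=list)


def statement_query(statement_id, text, entity_ids):
    """Return (cypher, params) that upserts a statement and links its entities."""
    cypher = (
        "MERGE (s:Statement {id: $sid}) SET s.text = $text "
        "WITH s UNWIND $entity_ids AS eid "
        "MERGE (e:Entity {id: eid}) "
        "MERGE (s)-[:MENTIONS]->(e)"
    )
    params = {"sid": statement_id, "text": text, "entity_ids": entity_ids}
    return cypher, params


def junction_query(junction):
    """Return (cypher, params) that upserts a junction and links its operands."""
    if junction.operator not in {"AND", "OR", "NOT"}:
        raise ValueError(f"unsupported operator: {junction.operator}")
    cypher = (
        "MERGE (j:Junction {id: $jid}) SET j.operator = $op "
        "WITH j UNWIND $statement_ids AS sid "
        "MATCH (s:Statement {id: sid}) "
        "MERGE (j)-[:HAS_OPERAND]->(s)"
    )
    params = {
        "jid": junction.junction_id,
        "op": junction.operator,
        "statement_ids": junction.statement_ids,
    }
    return cypher, params
```

With the official Neo4j Python driver, each pair could be passed to `session.run(cypher, params)`. Using `MERGE` rather than `CREATE` keeps repeated ingestion runs from duplicating nodes.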

Querying & Reasoning

  • Query Interpreter (extraction_utils/query_interpreter.py): Accepts natural language or structured queries, extracts relevant subgraphs, and resolves logical junctions to answer clinical questions.
  • Logic Handling (extraction_utils/query_copy.py): Implements logic for traversing AND/OR/NOT nodes and extracting relevant evidence paths.
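
To make the AND/OR/NOT traversal concrete, here is a minimal sketch that evaluates a junction tree against a set of patient facts and collects the matched leaf conditions as a simple evidence list. The tuple-based tree encoding, the function names, and the example rule are all hypothetical; they are not taken from query_copy.py, and the rule is not a clinical recommendation.

```python
def evaluate(node, facts):
    """Evaluate a junction tree against a set of patient facts.

    A node is either a string (a leaf condition) or a tuple
    (operator, children) with operator in {"AND", "OR", "NOT"}.
    """
    if isinstance(node, str):
        return node in facts
    operator, children = node
    results = [evaluate(child, facts) for child in children]
    if operator == "AND":
        return all(results)
    if operator == "OR":
        return any(results)
    if operator == "NOT":
        if len(results) != 1:
            raise ValueError("NOT expects exactly one child")
        return not results[0]
    raise ValueError(f"unsupported operator: {operator}")


def matched_conditions(node, facts):
    """Return the leaf conditions in the tree that are present in facts."""
    if isinstance(node, str):
        return [node] if node in facts else []
    _, children = node
    found = []
    for child in children:
        found.extend(matched_conditions(child, facts))
    return found


# Made-up rule for illustration only.
rule = ("AND", ["HFrEF", ("OR", ["diabetes", "hypertension"]), ("NOT", ["contraindication"])])

patient = {"HFrEF", "diabetes"}
print(evaluate(rule, patient))            # whether the rule applies to this patient
print(matched_conditions(rule, patient))  # leaf conditions found in the patient facts
```

Keeping evaluation and evidence collection separate means the same tree can answer both "does this recommendation apply?" and "which patient facts were involved?".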

Integration & RAG

  • RAG Utilities (rag_utils/): Supports retrieval-augmented generation and embedding-based search over the KG.
  • Neo4j Utilities (neo4j_utils/): Handles Cypher generation, database feeding, and graph utilities.

3. Detailed Pipeline

flowchart TD
  A1[Load ontology_config.yaml] --> A2[Connect to SNOMED CT DB]
  A2 --> A3[Extract concepts using search terms]
  A3 --> A4[LLM based categorization]
  A4 --> A5[Build OWL or RDF ontology]
  A5 --> B3
  B1[Parse guidelines PDF or Markdown] --> B2[Extract statements and tables]
  B2 --> B3[Ground entities to ontology]
  B3 --> B4[Map logic AND OR NOT]
  B4 --> B5[Build Neo4j graph]
  B5 --> C2
  C1[User or system query] --> C2[Subgraph extraction]
  C2 --> C3[Logic resolution]
  C3 --> C4[Answer and evidence]


4. Example Use Cases & Research Scenarios

  • Patient-Specific Recommendations: “Should a patient with HFrEF and diabetes receive a beta blocker?”
  • Guideline Comparison: “What are the differences in antiplatelet therapy recommendations between ESC and ACC/AHA guidelines?”
  • Evidence Tracing: “Show all evidence supporting CABG in patients with left main disease.”
  • Logic Pathways: “What logical conditions must be met for PCI to be recommended in NSTEMI?”
  • Ontology Auditing: “Which SNOMED CT concepts are not mapped to any core class?”

5. File & Module Structure

  • src/cardio_graph/snomedct_utils/: Ontology generation, SNOMED CT integration.
  • src/cardio_graph/extraction_utils/: Entity grounding, statement extraction, graph construction, querying.
  • src/cardio_graph/parsing_utils/: Markdown/PDF parsing.
  • src/cardio_graph/neo4j_utils/: Neo4j database utilities.
  • src/cardio_graph/rag_utils/: Retrieval-augmented generation and embedding search.

6. Getting Started

This project uses Poetry for dependency management. Before using any scripts, set up your environment:

# Install project with dependencies
poetry install

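# Activate the virtual environment (Poetry 1.x; in Poetry 2.x, `poetry shell` is no longer part of core, so use `poetry env activate` or install poetry-plugin-shell)
poetry shell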

# Download the spaCy model for Named Entity Recognition
poetry run python -m spacy download en_core_web_sm

# Download the scispaCy biomedical models for sentence splitting and entity grounding
poetry run pip install https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/releases/v0.5.4/en_core_sci_lg-0.5.4.tar.gz
poetry run pip install https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/releases/v0.5.4/en_ner_bc5cdr_md-0.5.4.tar.gz
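
After running the commands above, it can help to confirm that the three models are importable from the active environment. The following is a small sketch using only the standard library; the script and the `missing_packages` helper are hypothetical and not part of the repository.

```python
import importlib.util

# Model package names taken from the install commands above.
REQUIRED_MODELS = ["en_core_web_sm", "en_core_sci_lg", "en_ner_bc5cdr_md"]


def missing_packages(names):
    """Return the top-level package names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED_MODELS)
    if missing:
        print("Missing models:", ", ".join(missing))
    else:
        print("All required models are installed.")
```

Saved to a file, it can be run with `poetry run python` followed by the file name so that the check uses the project's virtual environment.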

7. Advanced Topics & Extensibility

  • Extending the Ontology: Add new classes or properties in ontology_config.yaml.
  • Custom Query Logic: Implement new logic in query_copy.py or query_interpreter.py.
  • Integration with LLMs: Use BAML and Ollama for advanced categorization and reasoning.

8. Glossary & References

Glossary

  • Ontology: A formal representation of knowledge as a set of concepts and relationships.
  • SNOMED CT: A comprehensive clinical terminology standard for health data.
  • Entity Grounding: Linking text mentions to canonical ontology concepts.
  • Logic Junctions: Logical operators (AND/OR/NOT) used to combine clinical statements.
  • RAG (Retrieval-Augmented Generation): Combining retrieval from a knowledge base with generative models for answering queries.
  • Neo4j: A graph database platform used for storing and querying the knowledge graph.


9. How to Contribute

We welcome contributions from the research and clinical informatics community! Please:

  • Open issues for bugs, feature requests, or questions
  • Submit pull requests for improvements or new modules
  • Add tests and documentation for new features

CardioGuidelinesGraph: Making Clinical Knowledge Computable