HL7 FHIR FAIR Data Point (FDP) adapter that is developed within the scope of the STAGE Project (https://stage-healthyageing.eu/)

srdc/stage-fhir-fdp-adapter

Spark-FHIR Healthy Aging Dataset Extractor

This project leverages Apache Spark and the spark-on-fhir toolkit to flatten complex clinical data (Observations, QuestionnaireResponses) into analytics-ready CSVs. Simultaneously, it generates rich metadata (DCAT, CSVW, SKOS) and publishes it to a FAIR Data Point (FDP).


Project Overview

The pipeline performs the following key operations:

  1. Extracts raw FHIR resources (Patient, Observation, QuestionnaireResponse, Questionnaire).
  2. Resolves terminology by joining patient answers with full Questionnaire definitions.
  3. Transforms data into a wide-format patient profile with one row per patient.
  4. Generates FAIR metadata:
       • DCAT: Catalog, Dataset, and Distribution descriptions.
       • CSVW: Schema definitions for the output data.
       • SKOS: Concept schemes mapping survey codes to human-readable displays.
  5. Publishes metadata directly to a FAIR Data Point (FDP) and/or saves locally as Turtle (.ttl) files.
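
Steps 2 and 3 can be illustrated with a small, Spark-free sketch. It is not the project's implementation: the record layout, the question identifiers, the answer codes, and the `to_wide` helper are all hypothetical, and the real pipeline works on FHIR resources through Spark. The sketch only shows the idea of resolving answer codes against Questionnaire definitions and pivoting to one row per patient.

```python
from collections import defaultdict

# Hypothetical long-format rows: one per (patient, question) answer code.
answers = [
    {"patient": "p1", "question": "q-mobility", "code": "LA1"},
    {"patient": "p1", "question": "q-sleep", "code": "LA2"},
    {"patient": "p2", "question": "q-mobility", "code": "LA2"},
]

# Hypothetical lookup built from Questionnaire definitions:
# (question, answer code) -> human-readable display.
displays = {
    ("q-mobility", "LA1"): "No difficulty",
    ("q-mobility", "LA2"): "Some difficulty",
    ("q-sleep", "LA2"): "Poor",
}


def to_wide(rows, lookup):
    """Resolve answer codes to displays and pivot to one row per patient."""
    columns = sorted({r["question"] for r in rows})
    by_patient = defaultdict(dict)
    for r in rows:
        key = (r["question"], r["code"])
        # Fall back to the raw code when the definition has no display for it.
        by_patient[r["patient"]][r["question"]] = lookup.get(key, r["code"])
    # Every patient gets every column; unanswered questions become None.
    return [
        {"patient": pid, **{c: vals.get(c) for c in columns}}
        for pid, vals in sorted(by_patient.items())
    ]


wide = to_wide(answers, displays)
for row in wide:
    print(row)
```

Patients who did not answer a question still get that column, filled with an empty value, which keeps the output rectangular.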

Prerequisites

Before running the application, ensure you have the following environment set up:

  • Java 11+
  • Apache Spark 3.5.x
  • FHIR Server: A running R4 FHIR Server (e.g., OnFhir) containing your source data.
  • FDP Server (Optional): A running FAIR Data Point instance if you intend to publish metadata remotely.
  • Dependencies:
       • spark-on-fhir-sdk (Ensure this is available in your Maven repo)
       • onfhir-feast (Used for metadata component extraction)

Building the Project

Clone the repository and build the "fat JAR" using Maven. This will bundle all necessary Scala dependencies.

cd stage-fhir-fdp-adapter
mvn -DskipTests clean package

Configuration

You can configure the pipeline using either a JSON file or an Excel spreadsheet. This file controls the inputs (FHIR URL), outputs (DCAT metadata), and behavior.

1. Configure Data Source & Metadata

Edit config.json (or config.xlsx) to define your target FHIR server and metadata properties.

| Parameter | Description |
|---|---|
| fhirUrl | Base URL of the source FHIR server. |
| fdpUrl | (Optional) URL of the target FAIR Data Point. |
| catalogTitle | Title of the Data Catalog to be created. |
| datasetTitle | Title of the specific Dataset. |
| outputDir | Local directory to save CSV and TTL files. |
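
As a rough illustration, the sketch below assembles a configuration with the parameters listed above and prints it as JSON. The parameter names come from this README. Everything else is an assumption: the URLs and titles are placeholders, the flat key layout may not match the actual config.json schema, and the `REQUIRED` tuple and `missing_keys` helper are hypothetical (only fdpUrl is documented as optional).

```python
import json

# Parameter names are taken from the table above; every value is a placeholder.
config = {
    "fhirUrl": "http://localhost:8080/fhir",
    "fdpUrl": "http://localhost:8081",  # optional: only needed for remote publishing
    "catalogTitle": "Example Catalog",
    "datasetTitle": "Example Dataset",
    "outputDir": "./output",
}

# Assumption: everything except fdpUrl must be provided.
REQUIRED = ("fhirUrl", "catalogTitle", "datasetTitle", "outputDir")


def missing_keys(cfg):
    """Return the required parameters that are absent or empty."""
    return [key for key in REQUIRED if not cfg.get(key)]


problems = missing_keys(config)
config_text = json.dumps(config, indent=2)
print(config_text if not problems else f"Missing parameters: {problems}")
```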

2. Configure Run Mode

The application supports different ways to load these configurations:

  • browser (Default): Loads a web form for entering the configuration at runtime.
  • json: Automatically loads a standard config.json from the classpath/working dir.
  • excel: Automatically loads a standard config.xlsx from the classpath/working dir.

Usage

Use the provided shell script to submit the Spark job. You can customize the job type, output format, and configuration mode via CLI arguments.

./run-cli.sh --job survey --format csv --runMode browser

CLI Arguments

| Argument | Default | Description |
|---|---|---|
| --job | survey | The ETL pipeline to run. Currently supports survey (Healthy Aging). Can be extended for other cohorts. |
| --format | csv | The output format for the patient data (csv or parquet). |
| --runMode | browser | How the app loads configuration (json, excel, or browser). |
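
The defaults and allowed values above can be modelled with a short argparse sketch. This only mirrors the documented flags for illustration. It is an assumption that the application validates values this way, and the `build_parser` helper is hypothetical rather than part of run-cli.sh.

```python
import argparse


def build_parser():
    """Model the documented flags, their defaults, and their allowed values."""
    parser = argparse.ArgumentParser(prog="run-cli.sh")
    parser.add_argument("--job", default="survey", choices=["survey"])
    parser.add_argument("--format", default="csv", choices=["csv", "parquet"])
    parser.add_argument(
        "--runMode", default="browser", choices=["json", "excel", "browser"]
    )
    return parser


# Omitted flags fall back to their defaults (here, --job becomes "survey").
args = build_parser().parse_args(["--format", "parquet", "--runMode", "json"])
print(args.job, args.format, args.runMode)
```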

Output

After a successful run, the outputDir will contain:

  1. patient_profiles/: A folder containing the extracted data in CSV format (one row per patient, columns for every Observation and Survey Question).
  2. Catalog.ttl: RDF description of the Data Catalog.
  3. Dataset.ttl: RDF description of the Dataset, linked to the Catalog.
  4. Distribution.ttl: RDF description of the CSV file, linked to the Dataset.
  5. CSVW.ttl: W3C CSV-on-the-Web schema describing columns and data types.
  6. Vocabularies.ttl: SKOS concepts defining the questions and answer options found in the survey.
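
To make the links between the Catalog, Dataset, and Distribution descriptions concrete, here is a minimal sketch that builds one Turtle document with the standard DCAT properties `dcat:dataset` and `dcat:distribution`. The actual generated files are likely richer and are split across separate .ttl files. The base URI, the resource paths, the titles, and the `dcat_turtle` helper are hypothetical, and the titles are inserted without escaping, which is acceptable only for an illustration.

```python
def dcat_turtle(base, catalog_title, dataset_title):
    """Build a minimal Turtle document linking Catalog -> Dataset -> Distribution."""
    return f"""@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct: <http://purl.org/dc/terms/> .

<{base}/catalog> a dcat:Catalog ;
    dct:title "{catalog_title}" ;
    dcat:dataset <{base}/dataset> .

<{base}/dataset> a dcat:Dataset ;
    dct:title "{dataset_title}" ;
    dcat:distribution <{base}/distribution> .

<{base}/distribution> a dcat:Distribution ;
    dcat:mediaType <https://www.iana.org/assignments/media-types/text/csv> .
"""


# Placeholder base URI and titles, chosen only for the example.
ttl = dcat_turtle("http://example.org", "Example Catalog", "Example Dataset")
print(ttl)
```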
