Dfp pclr yoda by kilonzi · Pull Request #596 · broadinstitute/ml4h

kilonzi · 2025-04-30T19:37:55Z

This pull request introduces a deployment pipeline for the PCLR model, including schema definitions, Docker containerization, and preprocessing/postprocessing scripts. The changes focus on defining input/output formats, creating a Dockerized environment for processing, and implementing scripts for preparing and finalizing data.

Deployment Pipeline Setup:

Model Schema Definition:
Added a JSON schema in pclr_model_schema.json to specify the model's input (ecg_tensor) and output (embed) formats, including their shapes and data types.
Dockerfile for Processing:
Created a Dockerfile to set up a lightweight Python 3.9-based environment for running the preprocessing (prepare.py) and postprocessing (finalize.py) scripts. It installs dependencies from requirements.txt and sets the entry point to Python.
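
To make the schema's role concrete, here is a minimal sketch of how a schema of this kind could be expressed and checked. The key names (`inputs`, `outputs`, `shape`, `dtype`) and the shape values are illustrative assumptions, not copied from pclr_model_schema.json. Only the tensor names ecg_tensor and embed come from the description above.

```python
import json

# Hypothetical layout: key names and shape values are assumptions for
# illustration only; consult pclr_model_schema.json for the real ones.
SCHEMA = {
    "inputs": {"ecg_tensor": {"shape": [4096, 12], "dtype": "float32"}},
    "outputs": {"embed": {"shape": [320], "dtype": "float32"}},
}


def nested_shape(value):
    """Return the shape of a rectangular nested list, outermost axis first."""
    shape = []
    while isinstance(value, list):
        shape.append(len(value))
        value = value[0] if value else None
    return shape


def matches(schema, section, name, value):
    """Check that `value` has the shape the schema declares for `name`."""
    spec = schema[section][name]
    return nested_shape(value) == spec["shape"]


# Round-trip through JSON to confirm the structure serialises cleanly.
schema = json.loads(json.dumps(SCHEMA))
```

A check like `matches(schema, "outputs", "embed", prediction)` could run before postprocessing to reject predictions whose length disagrees with the declared shape.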

Data Processing Scripts:

Preprocessing Script (prepare.py):
Added a script to process raw ECG files into HDF5 tensor format. It reads input CSVs, extracts ECG data from files, interpolates and normalizes the data, and saves it in a structured HDF5 format.
Postprocessing Script (finalize.py):
Added a script to merge model predictions with input metadata. It reads a CSV of metadata and a JSON of predictions, validates dimensions, and outputs a combined CSV with embeddings appended.
Dependencies:
Added a requirements.txt file listing the necessary Python libraries (pandas, numpy, h5py, smart-open[gcs]) for preprocessing and postprocessing.
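
The two processing steps described above can be sketched as follows. This is a dependency-free illustration, not the code in prepare.py or finalize.py: it assumes linear interpolation, z-score normalization, predictions stored as a JSON list of equal-length lists in the same order as the metadata rows, and hypothetical `embed_<i>` column names. The HDF5 write is omitted because it requires h5py.

```python
import csv
import io
import json
import statistics


def resample(signal, target_len):
    """Linearly interpolate a 1-D signal onto target_len evenly spaced points."""
    if target_len == 1 or len(signal) == 1:
        return [float(signal[0])] * target_len
    step = (len(signal) - 1) / (target_len - 1)
    out = []
    for i in range(target_len):
        pos = i * step
        lo = min(int(pos), len(signal) - 2)  # keep lo + 1 inside the signal
        frac = pos - lo
        out.append(signal[lo] * (1 - frac) + signal[lo + 1] * frac)
    return out


def normalize(signal):
    """Z-score a signal (assumed scheme); a constant signal maps to all zeros."""
    mean = statistics.fmean(signal)
    std = statistics.pstdev(signal)
    if std == 0:
        return [0.0] * len(signal)
    return [(x - mean) / std for x in signal]


def merge_predictions(metadata_csv, predictions_json):
    """Append one embedding per metadata row and return the combined CSV text."""
    rows = list(csv.DictReader(io.StringIO(metadata_csv)))
    embeddings = json.loads(predictions_json)
    if len(rows) != len(embeddings):
        raise ValueError(
            f"{len(rows)} metadata rows but {len(embeddings)} predictions"
        )
    width = len(embeddings[0]) if embeddings else 0
    if any(len(e) != width for e in embeddings):
        raise ValueError("embeddings have inconsistent lengths")
    embed_cols = [f"embed_{i}" for i in range(width)]  # hypothetical names
    fieldnames = (list(rows[0].keys()) if rows else []) + embed_cols
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for row, emb in zip(rows, embeddings):
        row.update(zip(embed_cols, emb))
        writer.writerow(row)
    return out.getvalue()
```

Validating the row count before writing anything means a truncated prediction file fails loudly instead of silently misaligning embeddings with metadata.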

model_zoo/PCLR/deployment/v1/pclr_model_schema.json

model_zoo/PCLR/deployment/v1/processing_image/finalize.py

Co-authored-by: John Kitonyo <johnkilonzi@outlook.com>

daniellepace added 11 commits April 24, 2025 16:19

ENH: Add C3PO PCLR models

03862c1

ENH: edit get_model() and get_representations() to add the c3po pclr models

49c0869

STYLE: Update README

c69bb99

FIX: Upload the correct c3po_pclr model

cf0be8f

COMP: Add deployment and version folders

74b23d6

COMP: Add docker file

20e0998

COMP: Add requirements.txt

c49d7e4

ENH: Add prepare script

1897e7c

ENH: add PCLR model schema

b4e3540

ENH: Add finalize script

8929d2b

FIX: fix warnings in finalize script

e84687e

kilonzi commented Apr 30, 2025

Reviewed changes: model_zoo/PCLR/deployment/v1/pclr_model_schema.json (outdated)

kilonzi commented May 1, 2025

Reviewed changes: model_zoo/PCLR/deployment/v1/pclr_model_schema.json (outdated)

Reviewed changes: model_zoo/PCLR/deployment/v1/processing_image/finalize.py (outdated)

daniellepace and others added 6 commits May 7, 2025 11:52

Update model_zoo/PCLR/deployment/v1/pclr_model_schema.json

7f9e23f

Co-authored-by: John Kitonyo <johnkilonzi@outlook.com>

Update model_zoo/PCLR/deployment/v1/pclr_model_schema.json

dd8af30

Co-authored-by: John Kitonyo <johnkilonzi@outlook.com>

Update model_zoo/PCLR/deployment/v1/processing_image/finalize.py

d1513d0

Co-authored-by: John Kitonyo <johnkilonzi@outlook.com>

COMP: Setup for C3PO-PCLR model to yoda

9dc2a8f

Merge branch 'dfp_pclr_zoo' into dfp_pclr_yoda

0fca9ca

ENH: Add files for C3PO-PCLR in YODA

d55ad43

kilonzi wants to merge 17 commits into master from dfp_pclr_yoda

2 participants