Add BulkRNABert model for cancer prognosis #713
Open
gwicho38 wants to merge 1 commit into sunlabuiuc:master
- Add BulkRNABertLayer encoder for gene expression data
- Add BulkRNABert model for cancer type classification
- Add BulkRNABertForSurvival for survival prediction with Cox loss
- Add compute_c_index utility for survival evaluation
- Add comprehensive unit tests

Based on: Gélard et al. (2025) "BulkRNABert: Cancer prognosis from bulk RNA-seq based language models"
Paper: https://www.biorxiv.org/content/10.1101/2024.06.13.598798
Model: https://huggingface.co/InstaDeepAI/BulkRNABert
Summary
This PR adds the BulkRNABert model for cancer prognosis tasks based on the paper by Gélard et al. (2025).
Key additions:
- BulkRNABertLayer: Transformer encoder layer for gene expression data
- BulkRNABert: Main model for cancer type classification (33 TCGA cancer types)
- BulkRNABertForSurvival: Model for survival prediction using Cox proportional hazards loss
- compute_c_index: Utility function for computing concordance index

Paper reference:
Gélard et al. (2025) "BulkRNABert: Cancer prognosis from bulk RNA-seq based language models"
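For readers unfamiliar with the two survival pieces named above, the Cox proportional hazards loss and the concordance index can be sketched as follows. This is an illustrative sketch only and not the code in this PR: the function names `cox_neg_log_partial_likelihood` and `concordance_index` are hypothetical, the inputs are plain Python lists rather than tensors, ties in event time are handled Breslow-style, and the c-index is Harrell's pairwise definition. The PR's own `compute_c_index` and loss may differ in signature and tie handling.

```python
import math


def cox_neg_log_partial_likelihood(risk_scores, times, events):
    """Negative log partial likelihood of a Cox model, averaged over events.

    risk_scores: predicted log-hazard per subject (higher = higher risk)
    times:       observed time per subject (event or censoring time)
    events:      1 if the event was observed, 0 if the subject was censored
    """
    n = len(risk_scores)
    total = 0.0
    n_events = 0
    for i in range(n):
        if not events[i]:
            continue  # censored subjects only contribute through risk sets
        # Risk set: everyone still under observation at subject i's event time.
        risk_set = [risk_scores[j] for j in range(n) if times[j] >= times[i]]
        # Log-sum-exp with the max subtracted for numerical stability.
        m = max(risk_set)
        log_sum = m + math.log(sum(math.exp(r - m) for r in risk_set))
        total += log_sum - risk_scores[i]
        n_events += 1
    return total / n_events if n_events else 0.0


def concordance_index(risk_scores, times, events):
    """Harrell's C: fraction of comparable pairs ranked correctly by risk.

    A pair (i, j) is comparable when subject i had an observed event and
    subject j was observed for strictly longer. Tied risk scores count 0.5.
    """
    n = len(times)
    concordant = 0.0
    comparable = 0
    for i in range(n):
        if not events[i]:
            continue
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    return concordant / comparable if comparable else float("nan")
```

A model whose risk scores decrease as survival time increases scores a c-index near 1, a random ranking scores about 0.5, and the Cox loss is lower for the better-ordered predictions. In a training loop the same loss would be written with differentiable tensor operations instead of Python lists.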
Implementation Details
The implementation follows PyHealth's BaseModel pattern and provides:
Test Plan
Context
This contribution was developed as part of a CS 598 DLH (Deep Learning for Healthcare) course project reproducing the BulkRNABert paper results.