simeonhebrew/MicroFactual

MicroFactual

Python 3.10+ | License: MIT | CI

A Python framework for interpretable microbiome machine learning with sklearn-compatible APIs.

Features

  • 🧬 Microbiome-optimized preprocessing: abundance filtering, prevalence filtering, CLR transformation
  • 📊 Rich visualization: ROC curves, confusion matrices, feature importance plots
  • 🧠 Explainable AI: counterfactual explanations via DiCE integration
  • 🤖 sklearn-compatible: works with cross_val_score, Pipeline, GridSearchCV
  • 📈 One-liner API: run complete workflows in a single function call
  • 🔬 Built for researchers: sensible defaults, minimal boilerplate

Architecture

graph TB
    subgraph "User-Facing Layer"
        API["High-Level API<br/>mf.classify(), mf.explain()"]
    end

    subgraph "Core Abstractions"
        Dataset["MicrobiomeDataset<br/>• X, y properties"]
        Pipeline["Preprocessing<br/>sklearn Pipeline"]
        Models["Models<br/>• MicrobiomeClassifier"]
    end

    subgraph "Interpretation Features"
        Viz["Visualization<br/>• Plots & ROC"]
        Explain["Explainability<br/>• Counterfactuals (DiCE)"]
    end

    API --> Dataset
    Dataset --> Pipeline
    Pipeline --> Models
    Models --> Viz
    Models --> Explain

    style API fill:#e3f2fd
    style Viz fill:#e8f5e9
    style Explain fill:#fff3e0

Installation

# Using uv (recommended)
uv pip install -e .

# Or using pip
pip install -e .

Requires Python 3.10+

Quick Start

One-Line Classification

import microfactual as mf

results = mf.classify(
    "data/abundance.tsv",
    "data/metadata.tsv",
    target_column="disease"
)

print(f"CV Accuracy: {results['cv_scores']['test_accuracy']:.3f}")

sklearn-Compatible API

from microfactual import MicrobiomeClassifier, MicrobiomeDataset
from sklearn.model_selection import cross_val_score

# Load data
dataset = MicrobiomeDataset.from_files(
    "data/abundance.tsv",
    "data/metadata.tsv",
    target_column="disease"
)

# Train classifier
clf = MicrobiomeClassifier(algorithm="random_forest")
scores = cross_val_score(clf, dataset.X, dataset.y, cv=5)

Custom Preprocessing

from microfactual import (
    MicrobiomeClassifier,
    AbundanceFilter,
    PrevalenceFilter,
    CLRTransform
)

clf = MicrobiomeClassifier(
    algorithm="logistic",
    preprocessing=[
        AbundanceFilter(min_abundance=0.01),
        PrevalenceFilter(min_prevalence=0.1),
        CLRTransform()
    ]
)
# X and y are the feature table and labels, e.g. dataset.X and dataset.y
clf.fit(X, y)
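
To illustrate what a filter step in such a preprocessing list might do, here is a minimal sketch of a prevalence filter that follows the sklearn fit/transform contract. The class name `SimplePrevalenceFilter` and its `keep_mask_` attribute are hypothetical and are not MicroFactual's implementation; the sketch only assumes that prevalence means the fraction of samples in which a feature is non-zero.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin


class SimplePrevalenceFilter(BaseEstimator, TransformerMixin):
    """Hypothetical stand-in: keep features present in enough samples."""

    def __init__(self, min_prevalence=0.1):
        self.min_prevalence = min_prevalence

    def fit(self, X, y=None):
        X = np.asarray(X, dtype=float)
        # Prevalence = fraction of samples in which a feature is non-zero.
        prevalence = (X > 0).mean(axis=0)
        # Learn the column mask on the training data only.
        self.keep_mask_ = prevalence >= self.min_prevalence
        return self

    def transform(self, X):
        X = np.asarray(X, dtype=float)
        # Apply the mask learned in fit, so train and test keep the same columns.
        return X[:, self.keep_mask_]


# Toy abundance table: 4 samples x 3 features.
X_train = np.array([
    [5.0, 0.0, 1.0],
    [3.0, 0.0, 0.0],
    [8.0, 0.0, 2.0],
    [1.0, 4.0, 0.0],
])
flt = SimplePrevalenceFilter(min_prevalence=0.5).fit(X_train)
X_filtered = flt.transform(X_train)  # the middle feature (prevalence 0.25) is dropped
```

Learning the mask in `fit` and reusing it in `transform` is what lets a filter like this sit inside cross-validation without leaking information from held-out samples.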

CLI Usage

microfactual \
    --abundance data/abundance.tsv \
    --metadata data/metadata.tsv \
    --target disease \
    --output_dir results/

API Reference

High-Level

| Function | Description |
|---|---|
| mf.classify() | One-liner classification pipeline |

Core Classes

| Class | Description |
|---|---|
| MicrobiomeDataset | Data container with X, y properties |
| MicrobiomeClassifier | Classifier with built-in preprocessing |

Preprocessing Transforms

All transforms are sklearn-compatible (fit/transform):

| Transform | Description |
|---|---|
| AbundanceFilter | Remove low-abundance features |
| PrevalenceFilter | Remove rare features |
| CLRTransform | Centered log-ratio transformation |
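
For readers unfamiliar with the centered log-ratio transformation, the sketch below shows the standard definition in NumPy: each value is log-transformed and the per-sample mean of the logs is subtracted. The function name `clr` and the `pseudocount` parameter are assumptions for illustration; how `CLRTransform` handles zeros is not stated here and may differ.

```python
import numpy as np


def clr(X, pseudocount=1e-6):
    """Centered log-ratio transform, applied row-wise (one row per sample)."""
    # A small pseudocount avoids log(0) for features absent from a sample.
    X = np.asarray(X, dtype=float) + pseudocount
    log_X = np.log(X)
    # Subtracting the row mean of the logs equals dividing by the geometric mean.
    return log_X - log_X.mean(axis=1, keepdims=True)


# Two samples, three taxa (relative abundances in percent).
counts = np.array([
    [10.0, 20.0, 70.0],
    [0.0, 5.0, 95.0],
])
clr_values = clr(counts)
```

By construction, the CLR values of each sample sum to zero, which removes the unit-sum constraint of compositional data before it reaches a classifier.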

Visualization

| Function | Description |
|---|---|
| mf.plot_roc() | Plot ROC curve with AUC score |
| mf.plot_confusion_matrix() | Plot confusion matrix with labels |
| mf.plot_feature_importance() | Plot top feature importances |
| mf.launch_dashboard() | Launch interactive ExplainerDashboard |
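
As background for the ROC plot, the following sketch computes the curve and its AUC directly with scikit-learn. It does not call `mf.plot_roc()`, whose signature is not documented here; the labels and scores are made-up toy values.

```python
import numpy as np
from sklearn.metrics import auc, roc_curve

# Toy data: true binary labels and predicted probabilities for the positive class.
y_true = np.array([0, 0, 1, 1])
y_score = np.array([0.1, 0.4, 0.35, 0.8])

# False/true positive rates at each decision threshold.
fpr, tpr, thresholds = roc_curve(y_true, y_score)

# Area under the curve via the trapezoidal rule.
roc_auc = auc(fpr, tpr)
print(f"AUC = {roc_auc:.2f}")  # prints "AUC = 0.75"
```

The `fpr` and `tpr` arrays are the x and y coordinates a ROC plot would draw; three of the four positive/negative pairs are ranked correctly, hence the AUC of 0.75.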

Explainability

| Class/Function | Description |
|---|---|
| DiCEExplainer | Generate counterfactual explanations |
| BaseExplainer | Abstract base class for custom explainers |
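
To make the idea of a counterfactual explanation concrete, here is a deliberately simple sketch that is not DiCE and does not use `DiCEExplainer`: for a linear classifier with decision score `w @ x + b`, the smallest (L2) change that flips the prediction is a step along `w` across the decision boundary. The function name `linear_counterfactual`, the weights, and the sample are all illustrative assumptions.

```python
import numpy as np


def linear_counterfactual(x, w, b, margin=1e-3):
    """Smallest L2 change to x that crosses the boundary w @ x + b = 0."""
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    score = w @ x + b
    # Project x onto the boundary, then step slightly past it (by `margin`).
    step = (score / (w @ w)) * (1.0 + margin)
    return x - step * w


# Hypothetical linear model over two features.
w = np.array([2.0, -1.0])
b = -0.5

x = np.array([1.0, 0.5])  # score = 2*1.0 - 0.5 - 0.5 = 1.0 -> positive class
x_cf = linear_counterfactual(x, w, b)

print("original score:      ", w @ x + b)
print("counterfactual score:", w @ x_cf + b)
print("feature changes:     ", x_cf - x)
```

The feature changes answer the question "what would have to be different for this sample to be classified the other way?". DiCE addresses the same question for arbitrary models and returns several diverse candidates rather than one closed-form point.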

Development

# Install dev dependencies
uv pip install -e ".[dev]"

# Run tests
make test

# Run linting
ruff check src/

Roadmap

  • XGBoost, SVM support
  • BIOM file format
  • SHAP integration

License

MIT License - see LICENSE for details.

Citation

If you use MicroFactual in your research, please cite:

@software{microfactual,
  title = {MicroFactual: Interpretable Microbiome ML},
  author = {Hebrew, Simeon and Adu-Gyamfi, Lawrence},
  year = {2025},
  url = {https://github.com/simeonhebrew/ML_Microbiome_Package}
}
