A Python framework for interpretable microbiome machine learning with sklearn-compatible APIs.
- Microbiome-optimized preprocessing: abundance filtering, prevalence filtering, CLR transformation
- Rich visualization: ROC curves, confusion matrices, feature importance plots
- Explainable AI: counterfactual explanations via DiCE integration
- sklearn-compatible: works with `cross_val_score`, `Pipeline`, `GridSearchCV`
- One-liner API: run complete workflows in a single function call
- Built for researchers: sensible defaults, minimal boilerplate

```mermaid
graph TB
    subgraph "User-Facing Layer"
        API["High-Level API<br/>mf.classify(), mf.explain()"]
    end
    subgraph "Core Abstractions"
        Dataset["MicrobiomeDataset<br/>• X, y properties"]
        Pipeline["Preprocessing<br/>sklearn Pipeline"]
        Models["Models<br/>• MicrobiomeClassifier"]
    end
    subgraph "Interpretation Features"
        Viz["Visualization<br/>• Plots & ROC"]
        Explain["Explainability<br/>• Counterfactuals (DiCE)"]
    end
    API --> Dataset
    Dataset --> Pipeline
    Pipeline --> Models
    Models --> Viz
    Models --> Explain
    style API fill:#e3f2fd
    style Viz fill:#e8f5e9
    style Explain fill:#fff3e0
```

```bash
# Using uv (recommended)
uv pip install -e .

# Or using pip
pip install -e .
```

Requires Python 3.10+.

```python
import microfactual as mf

# Run the complete workflow in a single function call
results = mf.classify(
    "data/abundance.tsv",
    "data/metadata.tsv",
    target_column="disease",
)
print(f"CV Accuracy: {results['cv_scores']['test_accuracy']:.3f}")
```

```python
from microfactual import MicrobiomeClassifier, MicrobiomeDataset
from sklearn.model_selection import cross_val_score

# Load data
dataset = MicrobiomeDataset.from_files(
    "data/abundance.tsv",
    "data/metadata.tsv",
    target_column="disease",
)

# Evaluate the classifier with 5-fold cross-validation
clf = MicrobiomeClassifier(algorithm="random_forest")
scores = cross_val_score(clf, dataset.X, dataset.y, cv=5)
```

```python
from microfactual import (
    MicrobiomeClassifier,
    AbundanceFilter,
    PrevalenceFilter,
    CLRTransform,
)

# Preprocessing steps are applied in the order listed
clf = MicrobiomeClassifier(
    algorithm="logistic",
    preprocessing=[
        AbundanceFilter(min_abundance=0.01),
        PrevalenceFilter(min_prevalence=0.1),
        CLRTransform(),
    ],
)

# X, y: feature matrix and labels, e.g. dataset.X and dataset.y from the previous example
clf.fit(X, y)
```

```bash
microfactual \
  --abundance data/abundance.tsv \
  --metadata data/metadata.tsv \
  --target disease \
  --output_dir results/
```

| Function | Description |
|---|---|
| `mf.classify()` | One-liner classification pipeline |


| Class | Description |
|---|---|
| `MicrobiomeDataset` | Data container with `X`, `y` properties |
| `MicrobiomeClassifier` | Classifier with built-in preprocessing |

All transforms are sklearn-compatible (fit/transform):

| Transform | Description |
|---|---|
| `AbundanceFilter` | Remove low-abundance features |
| `PrevalenceFilter` | Remove rare features |
| `CLRTransform` | Centered log-ratio transformation |

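
To make the fit/transform contract and the CLR step concrete, here is a minimal, dependency-free sketch. It is not MicroFactual's implementation: `SketchPrevalenceFilter`, `clr`, the pseudocount value, and the toy data are all assumptions made for illustration. The filter learns which columns to keep in `fit` and applies that selection in `transform`; `clr` subtracts the mean log-abundance of a sample from each log-abundance.

```python
import math


class SketchPrevalenceFilter:
    """Hypothetical stand-in: keep features that are non-zero in enough samples."""

    def __init__(self, min_prevalence=0.1):
        self.min_prevalence = min_prevalence

    def fit(self, X, y=None):
        n_samples = len(X)
        n_features = len(X[0])
        # Prevalence = fraction of samples in which the feature is non-zero.
        self.keep_ = [
            j
            for j in range(n_features)
            if sum(1 for row in X if row[j] > 0) / n_samples >= self.min_prevalence
        ]
        return self

    def transform(self, X):
        # Apply the column selection learned during fit.
        return [[row[j] for j in self.keep_] for row in X]


def clr(row, pseudocount=1e-6):
    """Centered log-ratio of one sample.

    The pseudocount is an assumption of this sketch; it avoids log(0).
    """
    logs = [math.log(value + pseudocount) for value in row]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]


# Toy relative abundances: 4 samples x 3 taxa (made-up values).
X_train = [
    [0.6, 0.4, 0.0],
    [0.5, 0.5, 0.0],
    [0.7, 0.3, 0.0],
    [0.2, 0.7, 0.1],
]

flt = SketchPrevalenceFilter(min_prevalence=0.5).fit(X_train)
X_filtered = flt.transform(X_train)      # third taxon is dropped (prevalence 0.25)
X_clr = [clr(row) for row in X_filtered]  # each transformed row sums to ~0
```

Learning the kept columns in `fit` rather than in `transform` matters under cross-validation: the selection is derived from the training fold only and then reused unchanged on the held-out fold.
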

| Function | Description |
|---|---|
| `mf.plot_roc()` | Plot ROC curve with AUC score |
| `mf.plot_confusion_matrix()` | Plot confusion matrix with labels |
| `mf.plot_feature_importance()` | Plot top feature importances |
| `mf.launch_dashboard()` | Launch interactive ExplainerDashboard |

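
For readers unfamiliar with the AUC score reported alongside a ROC curve, the sketch below computes it from scratch. It says nothing about how `mf.plot_roc()` works internally; the function name `roc_auc` and the toy labels and scores are assumptions for illustration. It uses the rank interpretation of AUC: the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one, with ties counted as one half.

```python
def roc_auc(y_true, y_score):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Made-up labels and predicted probabilities for four samples.
y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]

auc = roc_auc(y_true, y_score)
print(f"AUC = {auc:.2f}")  # AUC = 0.75
```

The pairwise loop is quadratic in the number of samples, which is fine for a demonstration but not for large cohorts.
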

| Class/Function | Description |
|---|---|
| `DiCEExplainer` | Generate counterfactual explanations |
| `BaseExplainer` | Abstract base class for custom explainers |

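
A counterfactual explanation answers the question "what is the smallest change to this sample that would flip the prediction?". The toy sketch below illustrates that idea only. It is neither DiCE's algorithm nor the `DiCEExplainer` API: the linear classifier, its weights, the step size, and all function names are hypothetical, and the search changes a single feature at a time while keeping abundances non-negative.

```python
def predict(x, weights, bias):
    """Toy linear classifier: 1 if the weighted sum is positive, else 0."""
    score = sum(w * v for w, v in zip(weights, x)) + bias
    return 1 if score > 0 else 0


def steps_to_flip(x, j, weights, bias, step, max_steps):
    """Smallest number of steps on feature j that flips the prediction, if any."""
    original = predict(x, weights, bias)
    for k in range(1, max_steps + 1):
        for direction in (1, -1):
            candidate = list(x)
            candidate[j] = x[j] + direction * k * step
            if candidate[j] < 0:
                continue  # abundances cannot be negative
            if predict(candidate, weights, bias) != original:
                return k, candidate
    return None


def single_feature_counterfactual(x, weights, bias, step=0.1, max_steps=50):
    """Return (feature_index, counterfactual) needing the fewest steps, or None."""
    best = None
    for j in range(len(x)):
        found = steps_to_flip(x, j, weights, bias, step, max_steps)
        if found is not None and (best is None or found[0] < best[0]):
            best = (found[0], j, found[1])
    return None if best is None else (best[1], best[2])


# Hypothetical model and sample (three taxa).
weights = [2.0, -1.0, 0.5]
bias = -0.5
sample = [0.2, 0.6, 0.4]  # predicted class 0

feature, counterfactual = single_feature_counterfactual(sample, weights, bias)
# Raising the first taxon from 0.2 to about 0.5 flips the prediction to class 1.
```

Real counterfactual generators typically search over several features at once and return multiple, diverse candidates; this sketch returns a single one to keep the mechanics visible.
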

```bash
# Install dev dependencies
uv pip install -e ".[dev]"

# Run tests
make test

# Run linting
ruff check src/
```

- XGBoost, SVM support
- BIOM file format
- SHAP integration

MIT License - see LICENSE for details.
If you use MicroFactual in your research, please cite:

```bibtex
@software{microfactual,
  title = {MicroFactual: Interpretable Microbiome ML},
  author = {Hebrew, Simeon and Adu-Gyamfi, Lawrence},
  year = {2025},
  url = {https://github.com/simeonhebrew/ML_Microbiome_Package}
}
```