Merged
13 changes: 7 additions & 6 deletions .github/workflows/tests.yml
@@ -13,13 +13,14 @@ jobs:
runs-on: ubuntu-latest

steps:
- uses: actions/checkout@v5
- uses: actions/checkout@v6

- name: install micromamba
uses: mamba-org/setup-micromamba@v2
- name: Install miniforge
uses: conda-incubator/setup-miniconda@v3
with:
miniforge-version: latest
environment-file: docs/environment.yml
environment-name: sphinx
activate-environment: sphinx

- name: Run sphinx
shell: bash -l {0}
@@ -38,7 +39,7 @@ jobs:

steps:
- name: Checkout code
uses: actions/checkout@v5
uses: actions/checkout@v6
with:
# super-linter needs the full git history to get the
# list of files that changed across commits
@@ -70,7 +71,7 @@ jobs:
contents: write

steps:
- uses: actions/checkout@v5
- uses: actions/checkout@v6

- name: Set up Python
uses: actions/setup-python@v6
Binary file added docs/_static/example_correlation_plot.png
Binary file added docs/_static/mpralib.png
1 change: 1 addition & 0 deletions docs/conf.py
@@ -27,6 +27,7 @@
"sphinx.ext.viewcode",
"sphinx.ext.todo",
"sphinx.ext.napoleon", # for Google/Numpy style docstrings
"myst_parser",
]

templates_path = ["_templates"]
14 changes: 14 additions & 0 deletions docs/doc/overview.rst
@@ -5,3 +5,17 @@ Overview
=====================

MPRAlib is a library for analyzing sequencing data from Massively Parallel Reporter Assays (MPRAs), starting from count tables for the candidate sequences tested in the experiment.

Here is a schematic overview of MPRAlib:

.. image:: ../_static/mpralib.png

The main input consists of count tables (primary data) containing DNA and RNA counts from MPRA experiments. These counts are assigned at either the oligo level or the barcode level and are stored in an efficient data structure using `AnnData <https://anndata.readthedocs.io>`_. Count data from the MPRA pipeline `MPRAsnakeflow <https://doi.org/10.5281/zenodo.18163777>`_ can be used directly, and the pipeline is recommended for pre-processing the data.

The MPRAlib data structure supports several operations. It can aggregate barcode-level counts to oligo-level counts, and it can normalize and filter the data (barcode outlier detection or sampling) without losing the original input. QC metrics such as correlation across replicates or sample complexity can be computed, and several plot types are available to visualize the data.

When the data is paired with other metadata, such as a design table and quantification outputs from tools like BCalm or mpralm, the library can generate browsable genome tracks (BED files) to visualize the MPRA results.

MPRAlib can be used as a library within your Python code, and some commonly used functionality is available through a command line interface (CLI).

For more information on how to install and use MPRAlib, please refer to the :doc:`getting-started` guide. To learn all command line options, please refer to the :doc:`cli`. For the API, we recommend looking at the :doc:`../tutorial/tutorial` and the :doc:`../mpralib`.
156 changes: 155 additions & 1 deletion docs/doc/quickstart.rst
@@ -5,4 +5,158 @@ Getting Started
=====================


TODO
After :doc:`install` is complete, check that the installation was successful by running the help command of the command line interface (CLI):


.. code-block:: console

mpralib --help

It should show the help message with all available commands and options, like this:

.. code-block:: text

Usage: mpralib [OPTIONS] COMMAND [ARGS]...

Command line interface of MPRAlib, a library for MPRA data analysis.

Options:
--help Show this message and exit.

Commands:
combine Combine counts with other outputs.
functional General functionality.
plot Plotting functions.
validate-file Validate standardized MPRA reporter formats.

If you see this message, the installation was successful. You can now use MPRAlib either via the command line interface or as a library within your Python code. We recommend looking at the :doc:`../tutorial/tutorial` and the :doc:`../mpralib` for the API, or at the :doc:`cli` for the command line interface. For a quick start, we provide one CLI and one API example below.


As a quick example, we will read the example barcode count file, compute the correlation across replicates, and plot it. We will do this with the command line interface as well as through the Python API.

Preparing Example Data
-----------------------

First, we download an example barcode count file from the MPRAlib repository on GitHub using ``wget``:

.. code-block:: console

wget https://github.com/kircherlab/MPRAlib/raw/refs/tags/v0.9.0/resources/barcode_counts.tsv.gz -O example_barcode_counts.tsv.gz
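
Before running any analysis, it can help to glance at the first rows of the downloaded file. The following is a minimal sketch using only the Python standard library. It assumes nothing about the column layout, which is not described here, and the helper name ``peek_tsv`` is made up for illustration and is not part of MPRAlib.

```python
import csv
import gzip
import os


def peek_tsv(path, n_rows=5):
    """Return the first n_rows rows of a gzip-compressed, tab-separated file."""
    rows = []
    with gzip.open(path, "rt", newline="") as handle:
        for row in csv.reader(handle, delimiter="\t"):
            rows.append(row)
            if len(rows) >= n_rows:
                break
    return rows


if __name__ == "__main__":
    # File name taken from the wget command above; skipped if not downloaded yet.
    example_path = "example_barcode_counts.tsv.gz"
    if os.path.exists(example_path):
        for row in peek_tsv(example_path):
            print(row)
```

Reading only a few rows keeps the check fast even for large count files.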


Command Line Interface Example
-------------------------------

Now we can use the command line interface to compute the correlation across replicates and plot it. We will use the ``functional compute-correlation`` command for this. The input is the barcode count file we just downloaded. We want to compute the correlation for the activity (the log2 ratio of normalized RNA over normalized DNA) using ``--correlation-on activity``.

.. code-block:: console

mpralib functional compute-correlation \
--input example_barcode_counts.tsv.gz \
--correlation-on activity

This computes the Pearson and Spearman correlation for each pair of the 3 replicates. The result should look like this:

.. code-block:: text

pearson correlation on Modality.ACTIVITY: [0.967308 0.9596891 0.97339666]
spearman correlation on Modality.ACTIVITY: [0.9279497 0.92303765 0.94871825]

We can also set a minimum number of required barcodes per oligo to remove noisy oligos using ``--bc-threshold 10`` and rerun the command:

.. code-block:: console

mpralib functional compute-correlation \
--input example_barcode_counts.tsv.gz \
--correlation-on activity \
--bc-threshold 10


We should see a slight increase in the correlation values:

.. code-block:: text

pearson correlation on Modality.ACTIVITY: [0.97747856 0.9760033 0.98485214]
spearman correlation on Modality.ACTIVITY: [0.9380415 0.9349714 0.9591882]
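
The idea behind a barcode threshold can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the row layout ``(oligo, barcode, dna_count, rna_count)``, the function name ``aggregate_to_oligos``, and the rule "count every listed barcode" are hypothetical, and MPRAlib's own aggregation and filtering may differ in detail (for example, by working per replicate).

```python
from collections import defaultdict


def aggregate_to_oligos(barcode_rows, bc_threshold=0):
    """Sum barcode-level counts per oligo and drop oligos with too few barcodes.

    barcode_rows: iterable of (oligo, barcode, dna_count, rna_count) tuples
    (hypothetical layout, chosen for this sketch).
    """
    totals = defaultdict(lambda: {"dna": 0, "rna": 0, "n_barcodes": 0})
    for oligo, _barcode, dna, rna in barcode_rows:
        entry = totals[oligo]
        entry["dna"] += dna
        entry["rna"] += rna
        entry["n_barcodes"] += 1
    # Keep only oligos supported by at least bc_threshold barcodes.
    return {o: e for o, e in totals.items() if e["n_barcodes"] >= bc_threshold}


# Made-up barcode rows: oligo_A has three barcodes, oligo_B only one.
rows = [
    ("oligo_A", "AAAC", 12, 30),
    ("oligo_A", "AAAG", 8, 22),
    ("oligo_A", "AACT", 10, 25),
    ("oligo_B", "CCGT", 3, 1),
]
all_oligos = aggregate_to_oligos(rows)
filtered = aggregate_to_oligos(rows, bc_threshold=2)
print(sorted(all_oligos))
print(filtered)
```

Oligos backed by few barcodes rest on fewer independent measurements, so removing them tends to reduce noise, which is consistent with the higher correlation values shown above.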

To plot the correlation across replicates use the ``plot correlation`` command:

.. code-block:: console

mpralib plot correlation \
--input example_barcode_counts.tsv.gz \
--modality activity \
--output correlation_plot.png


The image ``correlation_plot.png`` should look similar to this:

.. image:: ../_static/example_correlation_plot.png


Python API Example
-------------------

We can do the same using the Python API. Start a Python console, create a Python file, or use a notebook. First, we import the library and read in the barcode count file:

.. code-block:: python

import mpralib

# Read in barcode count file
mpra_barcode_data = mpralib.mpradata.MPRABarcodeData.from_file("example_barcode_counts.tsv.gz")


Now we compute the correlation of the oligo data. Because the data is at the barcode level, we first have to aggregate it to the oligo level. This is done by accessing the ``oligo_data`` getter, which generates an ``MPRAOligoLevelData`` object. Then we can use the ``correlation`` method to compute the correlation across replicates on the activity level.

.. code-block:: python

# Aggregate to oligo level
mpra_oligo_data = mpra_barcode_data.oligo_data

# Compute correlation on activity
print("🔗 Pairwise Pearson correlation (activity, log2 RNA/DNA ratio):")
activity_corr = mpra_oligo_data.correlation()
print(activity_corr)

The output should be:

.. code-block:: text

🔗 Pairwise Pearson correlation (activity, log2 RNA/DNA ratio):
[[1. 0.967308 0.9596891 ]
[0.967308 1. 0.97339666]
[0.9596891 0.97339666 1. ]]
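
The API returns the full symmetric correlation matrix, whereas the CLI run without a barcode threshold printed three values per method. Those three Pearson values are the entries above the diagonal of this matrix, read row by row. A small sketch of that relationship, using the numbers printed above (the helper ``upper_triangle`` is illustrative and not part of MPRAlib):

```python
from itertools import combinations


def upper_triangle(matrix):
    """Return the entries above the diagonal, row by row."""
    n = len(matrix)
    return [matrix[i][j] for i, j in combinations(range(n), 2)]


# Pearson matrix as printed by the API example above.
corr = [
    [1.0, 0.967308, 0.9596891],
    [0.967308, 1.0, 0.97339666],
    [0.9596891, 0.97339666, 1.0],
]
pairwise_values = upper_triangle(corr)
print(pairwise_values)  # [0.967308, 0.9596891, 0.97339666]
```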


We can also set a barcode threshold and recompute the correlation:

.. code-block:: python

# Compute correlation on activity with barcode threshold
print("🔗 Pairwise Pearson correlation (activity, log2 RNA/DNA ratio) with barcode threshold 10:")
mpra_oligo_data.barcode_threshold = 10
activity_corr_bc_thresh = mpra_oligo_data.correlation()
print(activity_corr_bc_thresh)

The output should be:

.. code-block:: text

🔗 Pairwise Pearson correlation (activity, log2 RNA/DNA ratio) with barcode threshold 10:
[[1. 0.97747856 0.9760033 ]
[0.97747856 1. 0.98485214]
[0.9760033 0.98485214 1. ]]


Now let's plot it. To get the same plot as before, we have to reset the barcode threshold to ``None`` (or zero).

.. code-block:: python

# Plot pairwise correlation heatmap for oligo activities
from mpralib.utils.plot import correlation

# Reset the barcode threshold so the plot matches the CLI plot above
mpra_oligo_data.barcode_threshold = None
# Keep the returned plot object under its own name and display it
corr_plot = correlation(mpra_oligo_data, mpralib.mpradata.Modality.ACTIVITY)
corr_plot.show()
22 changes: 6 additions & 16 deletions docs/project/contributing.rst
@@ -117,8 +117,8 @@ Use the following steps for installing Sphinx and the dependencies for building
.. code-block:: bash

cd MPRAlib/docs
mamba env create -f environment.yml -n sphinx
mamba activate sphinx
conda env create -f environment.yml -n sphinx
conda activate sphinx

Use the following commands for building the documentation.
The first two lines are only required for loading the virtual environment.
@@ -128,8 +128,8 @@ Afterwards, you can always use ``make html`` for building.

cd MPRAlib/docs
conda activate sphinx
make html # rebuild for changed files only
make clean && make html # force rebuild
conda html # rebuild for changed files only
conda clean && make html # force rebuild

------------
Get Started!
@@ -149,23 +149,13 @@ First, create your development setup.

Now you can make your changes locally.

4. When you're done making your changes, make sure that Snakemake runs properly by using a dry-run.
For Snakemake::

snakemake --sdm conda --configfile config.yml -p -n

For documentation::

cd docs
make clean && make html

5. Commit your changes and push your branch to GitHub::
4. Commit your changes and push your branch to GitHub::

git add <your_new_file> # or git stage <your_edited_file>
git commit -m "Your detailed description of your changes."
git push origin name-of-your-bugfix-or-feature

6. Submit a pull request through the GitHub website.
5. Submit a pull request through the GitHub website.

-----------------------
Pull Request Guidelines
4 changes: 2 additions & 2 deletions docs/project/history.rst
@@ -6,5 +6,5 @@ History

The changelog for MPRAsnakeflow is included below. It provides a detailed history of changes, updates, and improvements made to the project.

.. literalinclude:: ../../CHANGELOG.md
:language: text
.. include:: ../../CHANGELOG.md
:parser: myst_parser.sphinx_