Create conda environment:
mamba env create -f environment.yml
mamba activate amino-acid-shift
pip install ete3 pytest PyYAML click legacy-cgi
pip install --no-deps pymutspec # temporary solution until pymutspec is on conda-forge

- SARS-CoV-2 amino acid substitutions analysis
- RNA viruses mutational spectra derivation and analysis
| Notebook | Short description | Outline |
|---|---|---|
| 1sars-cov-2/1Neutral_scenario.ipynb | Neutral-model analyses: compute equilibrium freqs and compare models. | |
| 1sars-cov-2/2.1observed_sbs_fit.ipynb | Fit observed substitution patterns, compute fit metrics and plots. | |
| 1sars-cov-2/2.2fitness_analysis.ipynb | Analyze Δfitness of substitutions across clades using Bloom et al. data. | |
| 1sars-cov-2/2.3changes_during_COVID-19.ipynb | Analyze genotype data and changes during COVID-19 (genotypes2025). | |
| 1sars-cov-2/absorb_r2.ipynb | Analyze absorption metrics and R² relationships for model fits. | |
| 1sars-cov-2/model.ipynb | Model linking nucleotide mutation rates to expected AA substitutions. | |
| 1sars-cov-2/verify_model.ipynb | Verify equilibrium calculations (codon/AA level) and test stop-codon handling. | |
| 2other_viruses/1.1get_viruses_linages.ipynb | Extract viral taxonomy/lineages and select species for analyses. | |
| 2other_viruses/1get_viral_spectra_dataset.ipynb | Assemble viral spectra dataset (NEMU + Bloom) and tidy data. | |
| 2other_viruses/2spectra_eda_plus_eq_freq_be.ipynb | EDA of mutation spectra and equilibrium-frequency computations. | |
| 2other_viruses/2get_plot_ms12grouped_all_viruses.ipynb | Plot ms12 mutational spectra grouped by virus type/family. | |
| 2other_viruses/2get_cossim_between_nemu_Bloom.ipynb | Compute cosine similarity between NEMU dataset and Bloom et al. spectra. | |
| 2other_viruses/3pca.ipynb | PCA and classification of viral spectra; feature importance and clustering. | |
| 2other_viruses/3.2model_fit_quality.ipynb | Assess neutral-model fit quality and mutation summaries across viruses. | |
| 2other_viruses/4compare_aa_freqs_diff.ipynb | Compare AA frequency differences between viral groups and plot results. | |
| 2other_viruses/prepare_table_of_vir_info.ipynb | Build table of virus metadata and compute summary statistics for spectra. | |
| 2other_viruses/get_viruses_nucl_freq_aa_freq.ipynb | Compute amino-acid and nucleotide frequencies from viral reference sequences. | |
| 2other_viruses/get_viruses_nuc_and_aa_equilibrium_freq.ipynb | Compute nucleotide and AA equilibrium frequencies across viruses. | |
| 2other_viruses/get_plot_aa_observed_freq_vs_eq_freq.ipynb | Plot observed vs equilibrium amino-acid frequencies (per virus/group). | |
| extra/human_germline.ipynb | Analyze human germline mutation spectra and per-gene AA equilibrium. | |
| extra/nuc_eq_example.ipynb | Examples and calculations of nucleotide-equilibrium probabilities. | |
| extra/random_spectra_thrend.ipynb | Random-spectrum simulations and their effect on equilibrium AA frequencies. | |
| extra/rate_vs_freq.ipynb | Investigate relationships between mutation rates and observed frequencies. | |
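
Several notebooks in the table above compute nucleotide equilibrium frequencies from a mutational spectrum. The sketch below illustrates one common way to do that: treat the 12-component spectrum as a continuous-time Markov rate matrix and find its stationary distribution. It is not the code used in the notebooks or in pymutspec; the function name `equilibrium_frequencies`, the `"X>Y"` key format, and the example rates are assumptions made for illustration, and the result is only meaningful when every nucleotide can be reached from every other one.

```python
NUCLEOTIDES = "ACGT"


def equilibrium_frequencies(spectrum, tol=1e-12, max_iter=100_000):
    """Stationary nucleotide frequencies for a 12-component spectrum.

    spectrum: dict mapping 'X>Y' (hypothetical key format) to a
    non-negative substitution rate; missing keys count as 0.
    """
    n = len(NUCLEOTIDES)
    # Off-diagonal rates q[i][j] for the substitution i -> j.
    q = [[0.0] * n for _ in range(n)]
    for i, src in enumerate(NUCLEOTIDES):
        for j, dst in enumerate(NUCLEOTIDES):
            if i != j:
                q[i][j] = float(spectrum.get(f"{src}>{dst}", 0.0))

    exit_rates = [sum(row) for row in q]
    scale = 2.0 * max(exit_rates)
    if scale == 0.0:
        raise ValueError("spectrum has no non-zero rates")

    # Uniformization: P = I + Q / scale is a stochastic matrix with the
    # same stationary distribution as the rate matrix Q.
    p = [
        [q[i][j] / scale if i != j else 1.0 - exit_rates[i] / scale
         for j in range(n)]
        for i in range(n)
    ]

    # Power iteration pi <- pi P, starting from uniform frequencies.
    pi = [1.0 / n] * n
    for _ in range(max_iter):
        new = [sum(pi[i] * p[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(new, pi)) < tol:
            return dict(zip(NUCLEOTIDES, new))
        pi = new
    raise RuntimeError("power iteration did not converge")


# Made-up example: all rates equal except an elevated C>T rate.
example = {f"{a}>{b}": 1.0 for a in NUCLEOTIDES for b in NUCLEOTIDES if a != b}
example["C>T"] = 4.0
print(equilibrium_frequencies(example))
```

With these made-up rates the equilibrium is depleted in C (1/7) and enriched in T (5/14), while A and G stay at 0.25. A fully symmetric spectrum gives 0.25 for every nucleotide.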
- Data from Jesse D Bloom, Annabel C Beichman, Richard A Neher, Kelley Harris, Evolution of the SARS-CoV-2 Mutational Spectrum, Molecular Biology and Evolution, Volume 40, Issue 4, April 2023, msad085, https://doi.org/10.1093/molbev/msad085
  - image of spectra: https://jbloomlab.github.io/SARS2-mut-spectrum/rates-by-clade.html
  - spectra table: https://github.com/jbloomlab/SARS2-mut-spectrum/blob/main/results/synonymous_mut_rates/rates_by_clade.csv
  - other viruses spectra: https://github.com/jbloomlab/SARS2-mut-spectrum/blob/main/results/other_virus_spectra/other_virus_spectra.json
- Serratus database
- NAR paper about SARS-CoV-2 genome secondary structure
- SARS-CoV-2 replication paper 1 and 2
- SARS-CoV-2 life cycle paper
- statistics decision tree pingouin
- SARS-CoV-2 variants
- /refseq/release/viral
- NCBI Virus db