Classa: Uncovering Class Pollution in Python -- Measuring Class Pollution Vulnerabilities of 3000 Real-World Python Projects

KTH-LangSec/classa

Diogo's Master Thesis: Classa

  • Title: Classa: Uncovering Class Pollution in Python
  • Subtitle: Measuring Class Pollution Vulnerabilities of 3000 Real-World Python Projects
  • Abstract:

    Over the past few decades, code reuse attacks have shown how malicious actors can alter a program's intended execution flow by taking advantage of benign code already present in the application. Class Pollution in the Python programming language is a novel variant of a code reuse attack, which can enable a malicious party to surgically mutate a variable in any part of the application in order to trigger a change in its execution flow.

    However, until now, little to no research has explored class pollution in detail, and no tool is readily available to detect it. For this reason, as part of this degree project, a literature review on the causes and consequences of class pollution has been conducted, in addition to the methodical development of Classa, a tool capable of detecting class pollution.

    Additionally, an empirical study on the prevalence of class pollution in real-world Python code has been performed by running Classa against a dataset of 3000 Python projects, revealing, most notably, a critical vulnerability in a popular PyPI package with more than 30 million downloads. This vulnerability allowed for Denial of Service and Remote Code Execution, having since been responsibly disclosed and patched.

    Altogether, the results revealed that while not many real-world Python projects are susceptible to class pollution, it is a vulnerability that must be accounted for when building a secure application due to the serious consequences it can lead to.
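To make the attack described in the abstract concrete, below is a minimal sketch of class pollution. It is an illustration only and is not taken from the thesis or from any project in the dataset: the `Employee` class, the `merge` helper, and the payload are hypothetical names chosen for this example. It assumes the common pattern where untrusted, dictionary-shaped input is recursively merged into an object using `getattr` and `setattr`.

```python
class Employee:
    # Class attribute shared by every instance (hypothetical example class).
    role = "user"

    def __init__(self, name):
        self.name = name


def merge(src, dst):
    """Recursively copy the keys of a dict onto an object's attributes.

    Hypothetical helper: the pattern is vulnerable because it follows any
    attribute name found in the input, including dunder attributes such as
    __class__.
    """
    for key, value in src.items():
        if isinstance(value, dict) and hasattr(dst, key):
            # Descend into the existing attribute (e.g. alice.__class__).
            merge(value, getattr(dst, key))
        else:
            setattr(dst, key, value)


alice = Employee("alice")
bob = Employee("bob")

# Attacker-controlled input: the "__class__" key walks from the instance
# to the Employee class itself.
payload = {"name": "mallory", "__class__": {"role": "admin"}}
merge(payload, alice)

# The class attribute was overwritten, so an unrelated instance is affected.
print(bob.role)
```

Only `alice` was passed to `merge`, yet `bob.role` now reads the polluted class attribute. Longer attribute chains can reach further into the application state, which is what makes this kind of mutation dangerous.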

Building

Building this project and its associated tool (Classa) requires Nix to be installed. You can download Nix at the NixOS website.

Once Nix is installed, run the following command in this project's root directory to enter a development shell with all required dependencies:

nix develop -f shell.nix

Expect this command to take longer the first time it is run, since Nix may need to compile Pysa and other required dependencies. On subsequent runs, entering the development shell should be significantly faster.

Once inside the development shell, you can compile Classa using cargo:

cd tool
cargo build --release

This will create a binary at ./target/release/classa. If you prefer, you can also run it using cargo, which will (re)build the binary when needed:

cargo run --release

Usage

You can get an overview of Classa's functionality by running classa --help:

$ ./target/release/classa --help
Find class pollution in Python programs

Usage: classa [OPTIONS] <COMMAND>

Commands:
  analyse  Run analysis on a Python program and try to find class pollution
  e2e      Run an end-to-end pipeline, analysing the projects declared in the given dataset
  results  Parse results from a pysa run and summarise them
  label    Parse reports from a previous e2e run, show issues, and ask for appropriate labels
  summary  Parse reports from a previous e2e run, and compile it into a JSON file that be used for charts
  help     Print this message or the help of the given subcommand(s)

Options:
      --pyre-path <PYRE_PATH>  Path to the pyre (Python) program. If not provided, tries to find it in PATH [env: PYRE_PATH=]
      --workdir <WORKDIR>      A path to the work directory to use, for storing files during the analysis and also final reports, when applicable [env: WORKDIR=]
      --keep-workdir           Whether to keep the work directory after exiting, instead of deleting it. This is implicitly true if the --workdir option is given [env: KEEP_WORKDIR=]
  -h, --help                   Print help
  -V, --version                Print version

You can get further help about each command by running classa <COMMAND> --help.

Below are some common commands you might want to run. For simplicity, it is assumed the binary is named classa, but you might want to use ./target/release/classa or cargo run --release -- instead.

You will almost certainly also want to set the WORKDIR environment variable or pass the --workdir argument. Otherwise, the work directory is deleted when Classa exits, taking the detailed results with it.

  • Analyse a single Python program that is already present in the file system:

    classa analyse /path/to/project
  • Analyse projects in bulk from a dataset:
    Note: To generate a dataset, see the data directory of this repository.

    classa e2e /path/to/dataset.toml

    After the analysis is completed, detailed information about the results of each project can be found in their individual reports in the <workdir>/reports directory.

  • Label results present in an existing workdir:
    Note: Ensure the workdir is the same one generated by an e2e run.

    classa label
  • Generate a summary of an e2e analysis:
    Note: Ensure the workdir is the same one generated by an e2e run.

    classa summary

    After running this command, a summary.json file inside the workdir will contain an overview of all analysed projects along with the issues found.
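The layout of summary.json is not documented in this README, so a cautious first step is to inspect its top-level shape before writing any charting code. The following is a minimal sketch under that assumption: `describe_summary` is a hypothetical helper, not part of Classa, and it deliberately assumes nothing about the schema beyond the file being valid JSON located at <workdir>/summary.json.

```python
import json
from pathlib import Path


def describe_summary(workdir):
    """Load <workdir>/summary.json and describe its top-level shape.

    Hypothetical helper: it makes no assumption about the schema and only
    reports whether the document is an object, an array, or a scalar.
    """
    path = Path(workdir) / "summary.json"
    with path.open(encoding="utf-8") as handle:
        data = json.load(handle)

    if isinstance(data, dict):
        return {"type": "object", "keys": sorted(data)}
    if isinstance(data, list):
        return {"type": "array", "length": len(data)}
    return {"type": type(data).__name__}
```

For example, `describe_summary("/path/to/workdir")` can be called after `classa summary` has finished, using the same directory that was given via WORKDIR or --workdir.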

Documents

All deliverables of this project, including compiled Typst documents, can be found in the releases tab of this repository. Each file is accompanied by a GPG signature (an .asc file), which can be verified using my public key at gpg.diogotc.com.

Verification Instructions
  1. Download my public key and import it into the GPG keystore:

    curl https://gpg.diogotc.com | gpg --import
  2. (Optional) Validate that the fingerprint matches 111F 91B7 5F61 99D8 985B 4C70 12CF 31FD FF17 2B77. You can view the fingerprint of all keys in your keyring using gpg -k.

  3. Download the file and its signature from the releases tab.

  4. Validate the signature using gpg --verify <filename>.asc.

License

All the code is licensed under the GPL-3.0-or-later license, while the documents in the docs directory are licensed under CC-BY-SA-4.0.
