[New Standard]: Package-level venv #18

@RussellManser-NCO

What new standard is being proposed?

Allow a Python virtual environment (venv) to be used within production packages, in the directory venv/.

Why is the new standard necessary?

The current limitations of venvs on WCOSS2 are as follows:

  • Developers must request a module installation for a venv, which can (a) slow down development and (b) make changes during SPA review more difficult because the venv cannot be quickly modified or upgraded
  • These venvs are installed in a shared space despite only being used by a single package

Looking forward, models which use artificial intelligence and machine learning algorithms will need to keep pace with rapid advancements in algorithm development. Allowing venvs to be installed at the package level will enable developers to quickly adapt to upstream changes and apply them in their package.

How will the new standard be enforced?

The following standards should be considered for package-level venvs:

  • A requirements.txt file must be included with the package, which must contain all venv dependencies
  • All Python libraries in requirements.txt must be on the approved software list
    • Dependencies will be checked against the approved software list for new packages
    • New dependencies for upgraded packages will be approved individually
  • A build script must be included and must do the following:
    • Replace the value of VIRTUAL_ENV in venv/bin/activate with ${HOME<model>}/venv
    • Load the newest system python version during the build process
    • Load any other libraries required to build the venv
    • Have options for building, cleaning, and debugging
    • Use --always-copy when creating a venv so that no links are created
    • Use --require-virtualenv when installing packages with pip
    • Use --no-cache-dir so that each build is reproducible
  • All jobs which use the venv must use source ${HOME<model>}/venv/bin/activate
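
The VIRTUAL_ENV replacement step could be sketched in Python as follows. This is a minimal illustration, not the example build script referenced in this issue. The function name rewrite_virtual_env is hypothetical, and the sketch assumes the activate script assigns the variable on a single line of the form VIRTUAL_ENV=... (optionally prefixed by export), which may differ between Python versions. HOMEexample stands in for the package's ${HOME<model>} variable, and the new value is assumed to be the venv root, since activate builds PATH from "$VIRTUAL_ENV/bin".

```python
import re

# Matches a line such as:  VIRTUAL_ENV="/build/dir/venv"
# or:                      export VIRTUAL_ENV='/build/dir/venv'
# A bare "export VIRTUAL_ENV" line (no "=") is left untouched.
_ASSIGNMENT = re.compile(r"^([ \t]*(?:export[ \t]+)?VIRTUAL_ENV=).*$", re.MULTILINE)


def rewrite_virtual_env(activate_text: str, new_value: str) -> str:
    """Return activate_text with every VIRTUAL_ENV assignment set to new_value."""
    # A function is used as the replacement so that any backslashes in
    # new_value are inserted literally rather than treated as escapes.
    return _ASSIGNMENT.sub(lambda m: m.group(1) + new_value, activate_text)


# Assumption: "HOMEexample" is a placeholder for ${HOME<model>}.
sample = 'VIRTUAL_ENV="/tmp/build/venv"\nexport VIRTUAL_ENV\n'
patched = rewrite_virtual_env(sample, '"${HOMEexample}/venv"')
```

Because the new value is written unexpanded, the shell resolves it each time a job sources the script, so the venv should keep working after the package is moved to its production location.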

Example build script with the proposed standards.

Impact of this standard?

Who is affected?
Considerable effort will be required from developers to adhere to the standards proposed above.

Considerable effort will be required from SPA team members to check venv standards, especially for new packages with large requirements.txt files. We may want to develop automated tools to check standards.
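
As a starting point for such a tool, here is a rough sketch of checking requirements.txt against an approved list. The function names, the representation of the approved list as a set of project names, and the package names in the usage example are all assumptions made for illustration. The parser is deliberately simple: it skips option lines such as -r or --no-binary, and it does not handle URL requirements or environment markers specially.

```python
import re
from typing import Iterable, List

# Leading project name of a requirement line, e.g. "numpy" in "numpy==1.26.4".
_NAME = re.compile(r"^[A-Za-z0-9][A-Za-z0-9._-]*")


def normalize(name: str) -> str:
    """Normalize a project name so 'Foo_Bar' and 'foo-bar' compare equal."""
    return re.sub(r"[-_.]+", "-", name).lower()


def unapproved_requirements(requirements_text: str, approved: Iterable[str]) -> List[str]:
    """Return the requirement names that are not in the approved list."""
    approved_names = {normalize(name) for name in approved}
    rejected = []
    for raw_line in requirements_text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop trailing comments
        if not line or line.startswith("-"):
            continue  # blank line or pip option; not checked by this sketch
        match = _NAME.match(line)
        if match is None:
            rejected.append(line)  # could not parse a name: flag for manual review
        elif normalize(match.group(0)) not in approved_names:
            rejected.append(match.group(0))
    return rejected


# Hypothetical inputs for illustration only.
reqs = "numpy==1.26.4\n# pinned\nscikit_learn>=1.4\nleftpad==0.1  # new\n--no-binary :all:\n"
flagged = unapproved_requirements(reqs, {"NumPy", "scikit-learn"})
```

A check like this only covers the packages named in requirements.txt. Transitive dependencies pulled in by pip would need a separate check, for example against the output of pip freeze inside the built venv.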

The packages that will need modification according to this standard include

  • aigfs
  • aigefs

Other packages which have venvs in shared spaces that may adopt this standard include

  • cpci
  • evs
  • hysplit
  • lmp
  • mag
  • nbm
  • nosofs
  • petss
  • spcpost
  • stofs
  • hafs
  • nhcg

Downsides or Trade-offs
Please consider and discuss security issues that I may have missed.

I do not see a clear way to keep track of whether or not the venv has been loaded. The current approach is that a module is created in a shared space, then the desired module is loaded. Is it possible to use a similar approach for a package-level venv?
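
One partial option, for jobs whose entry point is Python, is to have the code itself verify which environment it is running in. The sketch below relies on the documented behavior that sys.prefix differs from sys.base_prefix inside a virtual environment. The function name in_expected_venv and the example paths are hypothetical, and this does not replace module-style tracking at the shell level.

```python
import sys
from pathlib import Path


def in_expected_venv(expected_dir, prefix=None, base_prefix=None) -> bool:
    """Return True if the interpreter is running inside the venv at expected_dir.

    prefix and base_prefix default to the running interpreter's values; they
    are parameters only so the check can be exercised without a real venv.
    """
    prefix = sys.prefix if prefix is None else prefix
    base_prefix = sys.base_prefix if base_prefix is None else base_prefix
    if prefix == base_prefix:
        return False  # not running inside any virtual environment
    # Compare resolved paths so symlinked locations of the same venv match.
    return Path(prefix).resolve() == Path(expected_dir).resolve()
```

A job script could call this at startup with the package's own venv path and exit with an error if it returns False, which would catch both a missing activation and a venv left over from a different package.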

Similarly, running module reset does not remove VIRTUAL_ENV from the environment. This means that other packages could accidentally use a venv loaded by a previous job. We could require that all jobs which use a venv must run deactivate as a cleanup step.
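
For job drivers written in Python, a defensive alternative is to launch each child process with an environment from which any inherited venv has been removed. The sketch below only approximates what deactivate does (the real function restores the saved PATH instead of filtering it). The function name scrub_venv and the paths shown are hypothetical.

```python
import os


def scrub_venv(environ):
    """Return a copy of environ with any active venv settings removed."""
    env = dict(environ)  # never modify the caller's mapping
    venv = env.pop("VIRTUAL_ENV", None)
    env.pop("VIRTUAL_ENV_PROMPT", None)
    if venv and "PATH" in env:
        venv_bin = os.path.join(venv, "bin").rstrip("/")
        # Drop the venv's bin directory from PATH and keep everything else.
        kept = [p for p in env["PATH"].split(os.pathsep) if p.rstrip("/") != venv_bin]
        env["PATH"] = os.pathsep.join(kept)
    return env


# Hypothetical inherited environment for illustration.
inherited = {
    "VIRTUAL_ENV": "/opt/pkg/venv",
    "PATH": "/opt/pkg/venv/bin:/usr/bin:/bin",
    "HOME": "/home/user",
}
clean = scrub_venv(inherited)
```

The result can be passed as the env argument of subprocess.run so that a job starts from a known state even if an earlier job did not run deactivate.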

All production code is publicly available here. Does this mean that venv/ will be indexed publicly as well? Should we prevent venv/ being publicly displayed?
