Description
What new standard is being proposed?
Allow production packages to include a Python virtual environment (venv) in a package-level venv/ directory.
Why is the new standard necessary?
The current limitations of venvs on WCOSS2 are as follows:
- Developers must request a module installation for a venv, which can (a) slow down development and (b) make changes during SPA review more difficult because the venv cannot be quickly modified or upgraded
- These venvs are installed in a shared space despite only being used by a single package
Looking forward, models which use artificial intelligence and machine learning algorithms will need to keep pace with rapid advancements in algorithm development. Allowing venvs to be installed at the package level will enable developers to quickly adapt to upstream changes and apply them in their package.
How will the new standard be enforced?
The following standards should be considered for package-level venvs:
- A `requirements.txt` file listing all venv dependencies must be included with the package
- All Python libraries in `requirements.txt` must be on the approved software list
  - For new packages, all dependencies will be checked against the approved software list
  - For upgraded packages, new dependencies will be approved individually
- A build script must be included and must do the following
  - Replace the value of `VIRTUAL_ENV` in `venv/bin/activate` with `${HOME<model>}/venv`
  - Load the newest system Python version during the build process
  - Load any other libraries required to build the venv
  - Have options for building, cleaning, and debugging
  - Use `--always-copy` when creating a venv so that no links are created
  - Use `--require-virtualenv` when installing packages with `pip`
  - Use `--no-cache-dir` so that each build is reproducible
- All jobs which use the venv must use `source ${HOME<model>}/venv/bin/activate`
Example build script with the proposed standards.
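The build requirements listed above could be sketched roughly as follows. This is not the linked example script, only an illustration under stated assumptions: the `virtualenv` tool is assumed to be installed (it provides `--always-copy`; the standard-library `venv` module spells the equivalent option `--copies`), `${HOMEexample}` is a hypothetical stand-in for `${HOME<model>}`, and `VIRTUAL_ENV` is assumed to hold the venv root directory. The function names are hypothetical, and the layout of `activate` varies between tool versions, so the pattern may need adjusting.

```python
import re

# Hypothetical stand-in for ${HOME<model>}; each package would use its own variable.
HOME_VAR = "${HOMEexample}"

def build_commands(venv_dir="venv", requirements="requirements.txt"):
    """Return the commands a build step might run, as argument lists."""
    return [
        # virtualenv's --always-copy copies files instead of symlinking them.
        ["python3", "-m", "virtualenv", "--always-copy", venv_dir],
        # --require-virtualenv makes pip refuse to run outside a venv;
        # --no-cache-dir keeps pip from reusing previously cached downloads.
        [f"{venv_dir}/bin/pip", "install", "--require-virtualenv",
         "--no-cache-dir", "-r", requirements],
    ]

# Matches lines such as "VIRTUAL_ENV='/build/path/venv'" or
# "export VIRTUAL_ENV=/build/path/venv", but not "export VIRTUAL_ENV" alone.
_ASSIGNMENT = re.compile(r"^([ \t]*(?:export[ \t]+)?)VIRTUAL_ENV=.*$", re.MULTILINE)

def rewrite_activate(text, new_root=HOME_VAR + "/venv"):
    """Replace the hard-coded build path in an activate script with new_root."""
    # Double quotes let the shell expand the variable when the script is sourced.
    return _ASSIGNMENT.sub(lambda m: f'{m.group(1)}VIRTUAL_ENV="{new_root}"', text)
```

A build script would run the commands from `build_commands()` in order, then pass the contents of `venv/bin/activate` through `rewrite_activate()` and write the result back, so that the venv still resolves after the package is moved to its production location.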
Impact of this standard?
Who is affected?
Considerable effort will be required from developers to adhere to the standards proposed above.
Considerable effort will be required from SPA team members to check venv standards, especially for new packages with large requirements.txt files. We may want to develop automated tools to check standards.
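As one possible starting point for such a tool, the sketch below compares the package names in a `requirements.txt` against an approved list. The function names and the idea of holding the approved list as a plain set of names are assumptions made for illustration; the real approved software list may be stored and versioned differently. Version pins are ignored here; only names are compared.

```python
import re

def _normalize(name):
    """Normalize a package name: lowercase, runs of '-', '_', '.' become '-'."""
    return re.sub(r"[-_.]+", "-", name).lower()

def parse_requirements(text):
    """Return normalized package names from requirements.txt-style text."""
    names = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if not line or line.startswith("-"):  # skip blanks and pip options
            continue
        # The name is whatever precedes any extras, version specifier, or marker.
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if match:
            names.append(_normalize(match.group(0)))
    return names

def unapproved(requirements_text, approved):
    """Return the sorted names in the requirements that are not approved."""
    approved_names = {_normalize(a) for a in approved}
    return sorted(n for n in parse_requirements(requirements_text)
                  if n not in approved_names)
```

An empty result from `unapproved()` would mean every listed dependency is on the approved list; anything returned would need individual review.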
The packages that will need modification according to this standard include
- aigfs
- aigefs
Other packages which have venvs in shared spaces that may adopt this standard include
- cpci
- evs
- hysplit
- lmp
- mag
- nbm
- nosofs
- petss
- spcpost
- stofs
- hafs
- nhcg
Downsides or Trade-offs
Please consider and discuss security issues that I may have missed.
I do not see a clear way to keep track of whether or not the venv has been loaded. The current approach is that a module is created in a shared space, then the desired module is loaded. Is it possible to use a similar approach for a package-level venv?
Similarly, running module reset does not remove VIRTUAL_ENV from the environment. This means that other packages could accidentally use a venv loaded by a previous job. We could require that all jobs which use a venv must run deactivate as a cleanup step.
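One way a job could guard against both concerns is to inspect `VIRTUAL_ENV` at startup. The sketch below is an assumption-laden illustration, not an existing tool: the function name is hypothetical, and it assumes the package's venv always lives at `venv/` under the package home directory.

```python
import os

def venv_status(package_home, environ=None):
    """Classify the active venv as 'none', 'own', or 'foreign'."""
    env = os.environ if environ is None else environ
    active = env.get("VIRTUAL_ENV")
    if not active:
        return "none"  # no venv is active in this environment
    expected = os.path.normpath(os.path.join(package_home, "venv"))
    # 'foreign' means a venv from some other location is still active.
    return "own" if os.path.normpath(active) == expected else "foreign"
```

A job could abort when the result is `"foreign"` before activating its own venv, and a cleanup step could confirm the result is `"none"` after running deactivate.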
All production code is publicly available here. Does this mean that venv/ will be indexed publicly as well? Should we prevent venv/ being publicly displayed?