Skip to content

Comments

feat: Install system openmpi for usecases that don't need GPUs#19

Merged
spetrosi merged 16 commits intolinux-system-roles:mainfrom
spetrosi:install-system-openmpi
Oct 2, 2025
Merged

feat: Install system openmpi for usecases that don't need GPUs#19
spetrosi merged 16 commits intolinux-system-roles:mainfrom
spetrosi:install-system-openmpi

Conversation

@spetrosi
Copy link
Collaborator

@spetrosi spetrosi commented Sep 30, 2025

Enhancement: Install system openmpi for usecases that don't need GPUs

Reason: OpenMPI that the role builds with NVidia GPU support doesn't work on systems that do not have GPU. Simply do not fall back to CPU.

Result: The role installs both openmpi alongside, users can select the one to use with environment modules.

Authored with assistance from Cursor AI.

Summary by Sourcery

Introduce a dual-path OpenMPI installation supporting both OS packages and NVIDIA-enabled builds, add optional firewall trusted zone configuration, unify configuration templates, update file modes, and enhance documentation and tests.

New Features:

  • Allow installation of system-provided OpenMPI via hpc_install_system_openmpi flag
  • Add optional firewall management to set the default zone to trusted via hpc_manage_firewall flag

Enhancements:

  • Consolidate sysctl and sunrpc tuning into dedicated templates and adjust file permission modes
  • Rename and refactor tasks for building NVIDIA-enabled OpenMPI and common package installation

Documentation:

  • Update README with new variables, defaults, and lmod module load instructions

Tests:

  • Add test cases to verify ulimits and sysctl settings

* Install mpitest-opnmpi and lmod anyway
* Add documentation about modules
* Add variables hpc_install_system_openmpi and hpc_build_openmpi
* Use HPC-X wording for consisstency
* Use 0644 where executing perms is not required
* Apply sunrpc configuration via template
* Add tests for tuning
@spetrosi spetrosi requested a review from richm as a code owner September 30, 2025 09:19
@sourcery-ai
Copy link

sourcery-ai bot commented Sep 30, 2025

Reviewer's Guide

This PR enhances the HPC role by installing a system-provided OpenMPI alongside the GPU-enabled build, refactors build flags, adds firewall configuration support, standardizes file modes and handler ordering, and enriches testing for ulimits and sysctl tunings.

Class diagram for OpenMPI installation options

classDiagram
    class HPC_Role {
        +hpc_install_system_openmpi: bool
        +hpc_build_openmpi: bool
        +hpc_manage_firewall: bool
        +hpc_manage_storage: bool
        +hpc_tuning: bool
    }
    class OpenMPI {
        +system_openmpi
        +gpu_enabled_openmpi
    }
    HPC_Role --> OpenMPI: installs
    OpenMPI : system_openmpi from AppStream
    OpenMPI : gpu_enabled_openmpi built from source
Loading

File-Level Changes

Change Details Files
Introduce system OpenMPI installation alongside source build
  • Add hpc_install_system_openmpi variable with default true
  • Implement tasks to install common and system OpenMPI packages
  • Update __hpc_system_openmpi_packages and __hpc_openmpi_common_packages in vars
tasks/main.yml
vars/main.yml
defaults/main.yml
Refactor NVIDIA GPU OpenMPI build flags and naming
  • Rename hpc_install_openmpi to hpc_build_nvidia_openmpi across tasks and defaults
  • Update when conditions and task names for building HPC-X/OpenMPI
  • Adjust README to document new build vs system install flags
tasks/main.yml
defaults/main.yml
README.md
Add firewall management support
  • Introduce hpc_manage_firewall variable and default
  • Add task to set default firewall zone to trusted
  • Update example playbook and README for firewall flag
tasks/main.yml
defaults/main.yml
examples/default_full_install.yml
README.md
Standardize file permissions and reorder handlers
  • Normalize mode settings from 0755 to 0644 (and vice versa) on module and template installs
  • Move Reboot handler after other handlers to maintain ordering
tasks/main.yml
handlers/main.yml
templates/90-hpc-sysctl.conf
templates/90-sunrpc.conf
Enhance automated testing for system tunings
  • Add tests to verify ulimits from 90-hpc-limits.conf
  • Add tests to validate sysctl values and new sunrpc.conf template
tests/tests_skip_toolkit.yml
templates/90-sunrpc.conf
templates/90-hpc-sysctl.conf

Tips and commands

Interacting with Sourcery

  • Trigger a new review: Comment @sourcery-ai review on the pull request.
  • Continue discussions: Reply directly to Sourcery's review comments.
  • Generate a GitHub issue from a review comment: Ask Sourcery to create an
    issue from a review comment by replying to it. You can also reply to a
    review comment with @sourcery-ai issue to create an issue from it.
  • Generate a pull request title: Write @sourcery-ai anywhere in the pull
    request title to generate a title at any time. You can also comment
    @sourcery-ai title on the pull request to (re-)generate the title at any time.
  • Generate a pull request summary: Write @sourcery-ai summary anywhere in
    the pull request body to generate a PR summary at any time exactly where you
    want it. You can also comment @sourcery-ai summary on the pull request to
    (re-)generate the summary at any time.
  • Generate reviewer's guide: Comment @sourcery-ai guide on the pull
    request to (re-)generate the reviewer's guide at any time.
  • Resolve all Sourcery comments: Comment @sourcery-ai resolve on the
    pull request to resolve all Sourcery comments. Useful if you've already
    addressed all the comments and don't want to see them anymore.
  • Dismiss all Sourcery reviews: Comment @sourcery-ai dismiss on the pull
    request to dismiss all existing Sourcery reviews. Especially useful if you
    want to start fresh with a new review - don't forget to comment
    @sourcery-ai review to trigger a new review!

Customizing Your Experience

Access your dashboard to:

  • Enable or disable review features such as the Sourcery-generated pull request
    summary, the reviewer's guide, and others.
  • Change the review language.
  • Add, remove or edit custom review instructions.
  • Adjust other review settings.

Getting Help

Copy link

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey there - I've reviewed your changes - here's some feedback:

  • The firewall task is including the storage role instead of a firewall role and even has a typo in its name; please correct it to include the proper firewall role and fix the 'firewll' typo.
  • Variable names for building OpenMPI are inconsistent—defaults and README use hpc_build_openmpi while tasks use hpc_build_nvidia_openmpi—please unify them across the role.
  • You've changed several Lua module files and templates to 0755 (executable) where 0644 would likely suffice; please review and tighten permissions to avoid unnecessary executable bits.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- The firewall task is including the storage role instead of a firewall role and even has a typo in its name; please correct it to include the proper firewall role and fix the 'firewll' typo.
- Variable names for building OpenMPI are inconsistent—defaults and README use hpc_build_openmpi while tasks use hpc_build_nvidia_openmpi—please unify them across the role.
- You've changed several Lua module files and templates to 0755 (executable) where 0644 would likely suffice; please review and tighten permissions to avoid unnecessary executable bits.

## Individual Comments

### Comment 1
<location> `tasks/main.yml:45` </location>
<code_context>
     - "{{ __hpc_nvidia_cuda_repo }}"
     - "{{ __hpc_microsoft_prod_repo }}"

+- name: Configure firewll to use trusted zone as default
+  when: hpc_manage_firewall
+  include_role:
</code_context>

<issue_to_address>
**nitpick (typo):** Typo in task name: 'firewll' should be 'firewall'.

Please update the task name to 'firewall' for consistency.

```suggestion
- name: Configure firewall to use trusted zone as default
```
</issue_to_address>

### Comment 2
<location> `tasks/main.yml:342` </location>
<code_context>
         owner: root
         group: root
-        mode: '0644'
+        mode: '0755'

     - name: Install GDRCopy packages
</code_context>

<issue_to_address>
**question:** Changed file mode from '0644' to '0755' for module files; review necessity of executable permissions.

Executable permissions are unnecessary for these module files; '0644' is standard unless a specific need for '0755' exists.
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

owner: root
group: root
mode: '0644'
mode: '0755'
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question: Changed file mode from '0644' to '0755' for module files; review necessity of executable permissions.

Executable permissions are unnecessary for these module files; '0644' is standard unless a specific need for '0755' exists.

@spetrosi spetrosi force-pushed the install-system-openmpi branch from ec5720b to cf7a403 Compare September 30, 2025 09:42
@spetrosi spetrosi force-pushed the install-system-openmpi branch 4 times, most recently from 981a02f to bcff17f Compare September 30, 2025 12:13
@spetrosi spetrosi force-pushed the install-system-openmpi branch from bcff17f to a528c4e Compare September 30, 2025 13:13
@spetrosi spetrosi force-pushed the install-system-openmpi branch from ce479c7 to 2fda79e Compare September 30, 2025 16:37
@spetrosi spetrosi force-pushed the install-system-openmpi branch from 2fda79e to e010790 Compare September 30, 2025 16:58
or __hpc_versionlock_content.stdout is not search(item + "-[0-9]")
loop: "{{ __hpc_versionlock_rpms }}"

- name: Update all packages to bring system to the latest state
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this needs to happen at the very beginning of the process, before you do any version lock or any installation of anything related to HPC.

@spetrosi spetrosi force-pushed the install-system-openmpi branch 2 times, most recently from 1619576 to 147ef03 Compare October 1, 2025 09:19
@spetrosi spetrosi force-pushed the install-system-openmpi branch from 147ef03 to 3bd5fd4 Compare October 1, 2025 10:30
@spetrosi
Copy link
Collaborator Author

spetrosi commented Oct 1, 2025

[citest]

@spetrosi spetrosi force-pushed the install-system-openmpi branch from 9082c57 to 3e5f2a0 Compare October 1, 2025 13:34
@spetrosi
Copy link
Collaborator Author

spetrosi commented Oct 1, 2025

[citest]

@spetrosi spetrosi merged commit 4a9a047 into linux-system-roles:main Oct 2, 2025
21 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants