Demo script, file mapping, wi4mpi fixes and mpi-confusion fixes #127

wspear · 2026-01-27T22:56:11Z

This started as a run at a clean-start demo runner script plus file mapping support, but now includes 5 commits addressing issues I ran into while testing the demo. I need to test these changes on additional systems before they can be merged.

Reduce MPI environment confusion. Wi4MPI Fixes (2635a96)
Core infrastructure improvements addressing environment confusion

Problem: Artifacts from conflicting MPI stacks (e.g., Intel oneAPI libraries when using MPICH) are inadvertently captured during profile detection, requiring manual pruning or root-bind workarounds.

Changes:

MPI Artifact Filtering (detect.py)

Implemented _filter_mpi_artifacts() to discard shared objects not matching the MPI launcher's installation prefix
Heuristic: retains libraries sharing the launcher's prefix or system paths (usr, /)
Fail-open approach: conservatively retains all libraries if no match found (supports divergent installs)
New --mpi-filter argument allows activation/deactivation or manual library exclusion
Added comprehensive filtering tests (test_filter_mpi.py) with 223 lines of test coverage
Wi4MPI Robustness (__execute.py, init.py)

Added _alias_wi4mpi_fake_libs_for_guest() to create SONAME aliases (e.g., libmpich.so → libmpi.so) for Wi4MPI's fake libraries
Improved robustness of wi4mpi_root detection
Timeout Support (util.py)

Added timeout parameter to run_e4scl_subprocess() to prevent hangs during detect operations
Template Improvements (template.py)

Support for injecting extra environment variables into container entrypoint (used for Wi4MPI configuration)
Container Execution (launch.py, trace.py)

Explicitly filter root (/) directory from file bindings to prevent permission errors in Apptainer/Singularity
Updated opened_files() to correctly handle multi-process traces (e.g., MPI), preventing premature exit or hangs
Environment Variables (init.py)

Improved E4S_CL_SCRIPT resolution
Added E4S_CL_CONTAINER_DIR environment variable override for internal container directory
Impact: Eliminates manual profile pruning workarounds; vastly reduces environment confusion in multi-MPI setups.

Support file aliasing with host:container syntax in profiles (0188982)
Adds file path mapping capabilities

Problem: Users cannot map host files to different container paths.

Changes:

File Aliasing Parser (__execute.py)

Extended file binding logic to parse host:container syntax
Splits paths on : to detect source/destination pairs
Updated Container.bind_file() calls to use dest argument for aliased paths
Improved Validation

Added access checks for both source and destination paths
Falls back gracefully with error logging if paths not found
Documentation Updates

Updated CLI help for profile edit and execute commands to document host:container syntax
Added regression test test_execute_alias to ensure bindings are correctly applied
Example Usage: --files /host/lib/libfoo.so:/container/lib/libbar.so

Added step by step demo runner script (ed0afec)
Comprehensive validation and demonstration tool

Changes:

New Script (demo.sh) — 717 lines
High-level e4s-cl validation script starting from clean local build
Uses Apptainer + MPI-enabled container image
Primary focus: run container-built MPI app using host MPI
Captures host baseline, e4s-cl launch, and optional in-container baseline
Compares performance (host vs e4s-cl vs container-only)
Includes automatic environment detection and setup
Usage:

./scripts/demo.sh [--container-cmd apptainer|podman|...] [--container-image ] [--mpi-impl mpich|openmpi] [--np ]

Fix Wi4MPI target library binding in container (551b07f)
Critical fix for Wi4MPI library preloading

Problem: Environment variables WI4MPI_RUN_MPI_C_LIB and WI4MPI_RUN_MPI_F_LIB were updated to point to container paths without ensuring the underlying host libraries were actually bound, causing loader errors ("file too short").

Changes:

Library Import (__execute.py)
Explicitly call import_library() for WI4MPI_RUN_MPI_C_LIB before updating the environment variable
Explicitly call import_library() for WI4MPI_RUN_MPI_F_LIB before updating
Ensures host libraries are bound to container before Wi4MPI attempts preloading
Impact: Resolves Wi4MPI preload failures; enables reliable MPI API translation.

Detect and warn about OpenMPI CMA fallback (222348b)
Potential performance issue advisory

This set of changes addresses "environment confusion" issues where artifacts from conflicting MPI stacks (e.g., Intel oneAPI artifacts when using MPICH) are inadvertently captured. It also improves robustness for Wi4MPI translation and container execution. The following changes should obviate the need for manual profile pruning or root-bind workarounds 1. MPI Artifact Filtering & Profile Detection: - e4s_cl/cli/commands/profile/detect.py: Implemented `_filter_mpi_artifacts` to discard shared objects that do not match the MPI launcher's installation prefix. - Heuristic: Keeps libraries sharing the launcher's prefix or system paths (/usr, /). - Fail-Open: Conservatively retains all libraries if no match is found (supports divergent installs). - New --mpi-filter argument allows activation/deactivation of filter or manual specification of libraries to exclude - e4s_cl/util.py: Added timeout support to `run_e4scl_subprocess` to prevent hangs during `detect` (e.g., if mpirun hangs). 2. Wi4MPI Support Improvements: - e4s_cl/cli/commands/__execute.py: Added `_alias_wi4mpi_fake_libs_for_guest` to create SONAME aliases (e.g., libmpich.so → libmpi.so) for Wi4MPI's fake libraries, ensuring guest binaries link correctly to the translation layer. - e4s_cl/cf/wi4mpi/__init__.py: Improved robustness of `wi4mpi_root` detection. - e4s_cl/cf/template.py: Added support for injecting extra environment variables into the container entrypoint (used for Wi4MPI configuration). 3. Container Execution & Stability: - e4s_cl/cli/commands/launch.py & __execute.py: Explicitly filter root (`/`) directory from file bindings. This prevents permission errors in Apptainer/Singularity runtimes. - e4s_cl/cf/trace.py: Updated `opened_files` to correctly handle process exits in multi-process traces (e.g., MPI), ensuring the tracer doesn't exit prematurely or hang. - e4s_cl/__init__.py: improved `E4S_CL_SCRIPT` resolution and added `E4S_CL_CONTAINER_DIR` environment variable override for the internal container directory.

Added support for mapping host files to different locations inside the container using host:container syntax in the --files argument (and via profile edit --add-files). Parsed file paths in execute command to detect colons and split into source/destination. Leveraged Container.bind_file with valid dest argument for aliased paths. Updated CLI help documentation for profile edit and execute. Added regression test test_execute_alias ensuring bindings are correctly applied.

Explicitly import `WI4MPI_RUN_MPI_C_LIB` and `WI4MPI_RUN_MPI_F_LIB` into the container. Previously, these environment variables were updated to point to container paths without ensuring the underlying host libraries were actually bound, causing loader errors (e.g., "file too short") when Wi4MPI attempted to preload them.

Copilot

Pull request overview

This pull request introduces significant improvements to e4s-cl's MPI environment handling, particularly addressing multi-MPI stack confusion and improving Wi4MPI robustness. The changes span 5 main commits covering MPI artifact filtering, file path aliasing, Wi4MPI fixes, a comprehensive demo script, and OpenMPI CMA fallback detection.

Changes:

Added MPI artifact filtering to discard libraries from conflicting MPI stacks during profile detection, with comprehensive test coverage
Implemented file path aliasing (host:container syntax) to map host files to different container paths
Added a 717-line demo/validation script for comprehensive e4s-cl testing
Fixed Wi4MPI target library binding and improved preload order
Added timeout support to prevent detection operations from hanging

Reviewed changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 6 comments.

Show a summary per file

File	Description
tests/test_filter_mpi.py	New comprehensive test suite for MPI artifact filtering with 223 lines covering various scenarios
tests/test_execute.py	Added test for file aliasing functionality
scripts/demo.sh	New 717-line demonstration and validation script for e4s-cl
e4s_cl/util.py	Added timeout parameter to run_e4scl_subprocess to prevent hangs
e4s_cl/cli/commands/profile/edit.py	Updated help text to document file aliasing syntax
e4s_cl/cli/commands/profile/detect.py	Implemented comprehensive MPI artifact filtering with --mpi-filter options
e4s_cl/cli/commands/launch.py	Fixed root directory binding issues and improved Wi4MPI target library handling
e4s_cl/cli/commands/__execute.py	Implemented file aliasing parser and Wi4MPI environment variable overrides
e4s_cl/cf/wi4mpi/init.py	Improved Wi4MPI preload order and debugging output
e4s_cl/cf/trace.py	Fixed multi-process trace handling to prevent premature exit
e4s_cl/cf/template.py	Added support for extra environment variables in container entrypoint
e4s_cl/init.py	Improved E4S_CL_SCRIPT resolution and added E4S_CL_CONTAINER_DIR override

Comments suppressed due to low confidence (1)

e4s_cl/util.py:238

The return type annotation for run_e4scl_subprocess is incorrect. The function signature declares -> int but the function always returns a tuple (returncode, output) on line 237 when capture_output=True, or (returncode, '') on line 238 when capture_output=False. The return type should be -> Tuple[int, str] or a union type to accurately reflect the actual return value.

def run_e4scl_subprocess(cmd, cwd=None, env=None, capture_output=False, timeout=None) -> int:
    """
    cmd: list[str],
    env: Optional[dict],
    capture_output: bool
    timeout: Optional[float], timeout in seconds
    Run a subprocess, tailored for recursive e4s-cl processes
    """
    with ParentStatus():
        subproc_env = os.environ
        if env:
            for key, val in env.items():
                if val is None:
                    subproc_env.pop(key, None)
                    _heavy_debug("unset %s", key)
                else:
                    subproc_env[key] = val
                    _heavy_debug("%s=%s", key, val)

        LOGGER.debug("Running with parent status: %s", cmd)
        with subprocess.Popen(
                cmd,
                cwd=cwd,
                env=subproc_env,
                stdout=subprocess.PIPE if capture_output else sys.stdout,
                stderr=sys.stderr,
                close_fds=False,
                universal_newlines=True,
                bufsize=1) as proc:

            try:
                output, _ = proc.communicate(timeout=timeout)
                returncode = proc.returncode
            except subprocess.TimeoutExpired:
                LOGGER.warning("Subprocess timed out after %s seconds: %s", timeout, cmd)
                proc.kill()
                proc.wait()
                returncode = -1
                output = ""

    if capture_output:
        return returncode, output
    return returncode, ''

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

scripts/demo.sh

e4s_cl/cli/commands/__execute.py

scripts/demo.sh

e4s_cl/__init__.py

scripts/demo.sh

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Copilot

Pull request overview

Copilot reviewed 12 out of 12 changed files in this pull request and generated 8 comments.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-01-27T23:56:18Z

e4s_cl/cli/commands/__execute.py

+            if ':' in path:
+                parts = path.split(':')
+                if len(parts) == 2:
+                    source, dest = parts


The colon-based splitting for file aliasing is fragile and will fail on Windows paths (e.g., "C:\path") or paths that legitimately contain colons. Consider using a more robust delimiter or escaping mechanism. The common convention is to use "=" or to require exactly one unescaped colon. Additionally, the current implementation doesn't validate that the destination path is absolute, which could lead to unexpected behavior based on how bind_file handles relative destination paths.

Copilot · 2026-01-27T23:56:19Z

e4s_cl/cli/commands/__execute.py

+                if host_mpi_lib:
+                    params.extra_env["WI4MPI_RUN_MPIIO_C_LIB"] = container_mpi_lib.as_posix()
+                params.extra_env["WI4MPI_RUN_MPIIO_F_LIB"] = container_mpi_f_lib.as_posix()


At line 537, the code sets WI4MPI_RUN_MPIIO_C_LIB to container_mpi_lib only if host_mpi_lib is set, but host_mpi_lib was processed earlier (lines 525-530) and is not in scope for the condition at line 537. This condition should check if host_mpi_lib was non-empty earlier, but the variable is in scope. However, the logic is confusing: line 538 sets WI4MPI_RUN_MPIIO_C_LIB = container_mpi_lib which was defined on line 528, but line 539 sets WI4MPI_RUN_MPIIO_F_LIB = container_mpi_f_lib. The comment suggests MPIIO libraries should follow the same pattern as the regular libraries, so line 538 should likely use the F library for MPIIO_F_LIB, not the C library.

Copilot · 2026-01-27T23:56:19Z

e4s_cl/cf/template.py

+        # Build extra environment exports
+        extra_env_lines = []
+        for name, value in self.extra_env.items():
+            extra_env_lines.append(f'export {name}="{value}"')
+        extra_env_str = '\n'.join(extra_env_lines)


The extra_env feature injects arbitrary environment variable exports into the container entrypoint script. However, the current implementation doesn't properly escape the values when building the export statements. If a value contains special characters like quotes, backticks, or dollar signs, it could break the script or allow command injection. The values should be properly shell-escaped before being inserted into the template.

Copilot · 2026-01-27T23:56:19Z

tests/test_execute.py

+        with patch('e4s_cl.cf.containers.Container.bind_file') as bind_mock:
+            with tempfile.NamedTemporaryFile() as temp_file:
+                self.assertCommandReturnValue(0, COMMAND, [
+                    '--backend',
+                    'containerless',
+                    '--image',
+                    '',
+                    '--libraries',
+                    libmpi,
+                    '--files',
+                    f"{temp_file.name}:/tmp/target",
+                    'ls',
+                ])
+
+                bind_mock.assert_any_call(temp_file.name, dest='/tmp/target', option=FileOptions.READ_WRITE)


The test uses patch to mock Container.bind_file, but it doesn't verify that the source file actually gets bound. The test only checks that bind_file was called with the correct destination parameter. However, if the aliasing logic in __execute.py has a bug (as identified in another comment), this test might pass even though the feature doesn't work correctly. Consider adding assertions to verify both the source and destination are correctly passed, or use a more comprehensive test that exercises the actual file binding.

e4s_cl/cf/wi4mpi/__init__.py

Copilot · 2026-01-27T23:56:20Z

e4s_cl/cli/commands/__execute.py

+        for path in files:
+            source, dest = path, None
+            if ':' in path:
+                parts = path.split(':')
+                if len(parts) == 2:
+                    source, dest = parts
+
+            if dest and _check_access(source):
+                container.bind_file(source, dest=dest, option=FileOptions.READ_WRITE)
+            elif _check_access(path):
+                container.bind_file(path, option=FileOptions.READ_WRITE)
+            else:
+                LOGGER.error("File '%s' not found.", source)


The file aliasing feature doesn't validate the destination path for path traversal attacks. A malicious user could provide a destination like "/tmp/../../etc/passwd" which, after path resolution, could bind host files to sensitive container locations. Consider validating that the destination path, after normalization, is within expected container directories (e.g., starts with CONTAINER_DIR or other allowed prefixes).

Copilot · 2026-01-27T23:56:20Z

e4s_cl/cli/commands/__execute.py

+    # Mapping from generic MPI library names to MPICH-specific names
+    # Wi4MPI provides libmpi.so.*, libmpifort.so.*, etc.
+    # MPICH uses libmpich.so.*, libmpichfort.so.*, etc.
+    name_mappings = {
+        'libmpi.so': ['libmpich.so', 'libmpi.so'],
+        'libmpifort.so': ['libmpichfort.so', 'libmpifort.so'],
+        'libmpicxx.so': ['libmpichcxx.so', 'libmpicxx.so'],
+    }


The function _alias_wi4mpi_fake_libs_for_guest uses a hardcoded name_mappings dictionary that maps generic MPI library names to MPICH-specific names. However, the comment on line 324 states it needs to handle "MPICH-specific names (e.g., libmpich.so.12, libmpichfort.so.12) so that container binaries linked to MPICH can find them." But the mapping includes both 'libmpich.so' and 'libmpi.so' as targets for 'libmpi.so'. This seems redundant - if the fake library is already named libmpi.so.X, creating an alias to libmpi.so.X is unnecessary. Consider removing the redundant self-mapping or documenting why it's needed.

e4s_cl/cli/commands/__execute.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

…lags more carefully and warns about invalid python.

…mand - Wi4MPI: Added installation verification (ELF arch check, binary check), support for ppc64le, and better handling of missing or invalid install directories. - Launch: Added automatic binding of driver/config directories for graphics/interconnect libraries (ibverbs, OpenCL, Vulkan, etc.) to prevent driver loader errors. - CLI: Added new `config` command to list, get, and set configuration values. - Demo: Added arguments for launcher/OSU control, enabled container baseline, and forced local Wi4MPI usage.

Copilot

Pull request overview

Copilot reviewed 13 out of 13 changed files in this pull request and generated 14 comments.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-01-29T03:11:47Z

e4s_cl/cli/commands/__execute.py

+            source, dest = path, None
+            if ':' in path:
+                parts = path.split(':')
+                if len(parts) == 2:
+                    source, dest = parts
+                else:
+                    LOGGER.error("Invalid file binding specification '%s'. Expected 'source:dest'.", path)
+                    continue
+
+            if dest is not None:
+                if _check_access(source):
+                    container.bind_file(source, dest=dest, option=FileOptions.READ_WRITE)
+                else:
+                    LOGGER.error("File '%s' not found.", source)
+            else:
+                if _check_access(path):
+                    container.bind_file(path, option=FileOptions.READ_WRITE)
+                else:
+                    LOGGER.error("File '%s' not found.", path)


The file binding code treats paths as strings instead of Path objects, which could lead to inconsistent path handling. The variable 'path' is split as a string, but then 'source' is passed to '_check_access' which likely expects a Path object based on the pattern used elsewhere in the codebase. This could cause issues with path resolution and validation.

Additionally, when 'dest' is None, the code passes 'path' (the original string) to 'bind_file', but when 'dest' is set, it passes 'source' (also a string). For consistency and proper path handling, these should be converted to Path objects before being passed to Container.bind_file.

Suggested change

source, dest = path, None

if ':' in path:

parts = path.split(':')

if len(parts) == 2:

source, dest = parts

else:

LOGGER.error("Invalid file binding specification '%s'. Expected 'source:dest'.", path)

continue

if dest is not None:

if _check_access(source):

container.bind_file(source, dest=dest, option=FileOptions.READ_WRITE)

else:

LOGGER.error("File '%s' not found.", source)

else:

if _check_access(path):

container.bind_file(path, option=FileOptions.READ_WRITE)

else:

LOGGER.error("File '%s' not found.", path)

source_str, dest = path, None

if ':' in path:

parts = path.split(':')

if len(parts) == 2:

source_str, dest = parts

else:

LOGGER.error("Invalid file binding specification '%s'. Expected 'source:dest'.", path)

continue

source = Path(source_str)

if dest is not None:

if _check_access(source):

container.bind_file(source, dest=dest, option=FileOptions.READ_WRITE)

else:

LOGGER.error("File '%s' not found.", str(source))

else:

if _check_access(source):

container.bind_file(source, option=FileOptions.READ_WRITE)

else:

LOGGER.error("File '%s' not found.", str(source))

Copilot · 2026-01-29T03:11:47Z

scripts/demo.sh

+E4S_CL_VERBOSE="0"
+
+log() { printf "[e4s-cl-test] %s\n" "$*"; }
+fail() { printf "[e4s-cl-test] ERROR: %s\n" "$*" >&2; exit 1; }


The fail function uses printf with a format string that contains user-supplied content from "$*", which could interpret format specifiers unintentionally. If the error message contains format specifiers like '%s', '%d', etc., they will be interpreted as format directives rather than literal text. This could lead to unexpected behavior, crashes, or in rare cases, format string vulnerabilities.

For example, if fail is called with a message like "File %s not found", printf will look for additional arguments to substitute for %s, which don't exist, leading to undefined behavior.

Use printf '%s\n' "$*" instead to treat the entire message as a literal string.

Suggested change

fail() { printf "[e4s-cl-test] ERROR: %s\n" "$*" >&2; exit 1; }

fail() { printf '%s\n' "[e4s-cl-test] ERROR: $*" >&2; exit 1; }

Copilot · 2026-01-29T03:11:48Z

e4s_cl/cli/commands/profile/detect.py

+        for marker in ['lib', 'lib64']:
+            if marker in path.parts:
+                return Path(*path.parts[:path.parts.index(marker)])
+        return path.parent
+


The _get_lib_prefix function doesn't handle paths correctly when the library is in a non-standard location without 'lib' or 'lib64' in the path. For example, a library at '/opt/custom/special/libmpi.so' would return '/opt/custom/special' (the parent directory) rather than failing or using a different heuristic. This could lead to incorrect prefix matching when comparing launcher and library prefixes.

Additionally, the function looks for markers in path.parts but doesn't verify the marker is actually a directory name (it could be part of a file name or parent directory name). Consider adding validation or using a more robust approach to determine the installation prefix.

Suggested change

for marker in ['lib', 'lib64']:

if marker in path.parts:

return Path(*path.parts[:path.parts.index(marker)])

return path.parent

"""

Infer an installation prefix from a library path.

Heuristic:

- If the path refers to a file (e.g., /opt/pkg/lib/libmpi.so), start

from its parent directory.

- Walk up the directory tree looking for a directory named 'lib' or

'lib64'. If found, return its parent directory as the prefix.

- If no such directory is found, fall back to the directory containing

the library (or the path itself if it is already a directory).

"""

# Start from the directory that contains the library, if applicable

search_path = path

if search_path.suffix:

# Heuristic: a non-empty suffix usually indicates a file such as

# 'libmpi.so', so we use its parent directory for prefix discovery.

search_path = search_path.parent

# Walk up the directory hierarchy, looking for a 'lib' or 'lib64' dir

for parent in (search_path,) + tuple(search_path.parents):

if parent.name in ('lib', 'lib64'):

# The installation prefix is the parent of the lib directory

return parent.parent

# Fallback: no standard lib directory in the ancestry; use the

# directory we started from (or the original path if it was a dir).

return search_path

Copilot · 2026-01-29T03:11:48Z

e4s_cl/cf/wi4mpi/__init__.py

+    # 2. Add the target MPI F library (provides Fortran symbols)
+    target_mpi_f_lib = os.environ.get("WI4MPI_RUN_MPI_F_LIB", "")
+    if target_mpi_f_lib and Path(target_mpi_f_lib).exists():
+        if container_library_dir:
+            container_path = Path(container_library_dir) / Path(target_mpi_f_lib).name
+            to_preload.append(container_path.as_posix())
+            LOGGER.debug("Wi4MPI: Adding target MPI F library to preload (container path): %s", container_path)
+        else:
+            to_preload.append(target_mpi_f_lib)
+            LOGGER.debug("Wi4MPI: Adding target MPI F library to preload: %s", target_mpi_f_lib)
+
+    # 3. Add the target MPI C library (provides C symbols like ompi_mpi_char)
+    target_mpi_c_lib = os.environ.get("WI4MPI_RUN_MPI_C_LIB", "")
+    if target_mpi_c_lib and Path(target_mpi_c_lib).exists():
+        if container_library_dir:
+            container_path = Path(container_library_dir) / Path(target_mpi_c_lib).name
+            to_preload.append(container_path.as_posix())
+            LOGGER.debug("Wi4MPI: Adding target MPI C library to preload (container path): %s", container_path)
+        else:
+            to_preload.append(target_mpi_c_lib)
+            LOGGER.debug("Wi4MPI: Adding target MPI C library to preload: %s", target_mpi_c_lib)


The wi4mpi_preload function uses container_library_dir to construct container paths for the target MPI libraries, but it doesn't verify that these libraries were actually imported into the container at those locations. The function assumes that if a host library exists and container_library_dir is provided, the library will be available at the constructed path. However, this import happens elsewhere (in __execute.py lines 534 and 541), and there's no guarantee that the import succeeded or that the paths match the expected pattern.

This could lead to LD_PRELOAD failures at runtime if the library wasn't successfully imported. Consider adding validation or ensuring the imports happen before calling wi4mpi_preload, or at minimum adding a comment warning about this dependency.

scripts/demo.sh

Copilot · 2026-01-29T03:11:49Z

e4s_cl/cf/trace.py

+                # When a process exits (especially in MPI with multiple processes),
+                # the debugger may still have other processes. Check if we should continue.
+                if len(debugger.dict) > 0:
+                    continue
+                else:
+                    break


The trace.py fix checks if len(debugger.dict) > 0 to determine whether to continue tracing, but this may not correctly handle all edge cases with multi-process MPI programs. The 'debugger.dict' contains all attached processes, but checking its length alone doesn't account for processes that have been detached or are in intermediate states.

Additionally, when a ProcessExit event occurs, the process that exited is still in debugger.dict at that moment - it's only removed later. This means the check if len(debugger.dict) > 0 will almost always be true when a process exits (because the exiting process is still counted), causing the loop to continue indefinitely until all processes exit, which may not be the intended behavior.

Consider checking if the exiting process is the last one, or using a more robust method to track active processes.

Suggested change

# When a process exits (especially in MPI with multiple processes),

# the debugger may still have other processes. Check if we should continue.

if len(debugger.dict) > 0:

continue

else:

break

# Explicitly remove the exited process from the debugger, then

# decide whether to continue tracing based on remaining processes.

try:

debugger.deleteProcess(event.process)

except KeyError:

# Process might already have been removed; ignore in that case.

pass

if not debugger:

break

continue

Copilot · 2026-01-29T03:11:50Z

e4s_cl/cli/commands/__execute.py

+            if ':' in path:
+                parts = path.split(':')
+                if len(parts) == 2:
+                    source, dest = parts
+                else:
+                    LOGGER.error("Invalid file binding specification '%s'. Expected 'source:dest'.", path)


The file path parsing logic doesn't handle colons in file paths correctly. On Unix systems, file paths can legitimately contain colons (e.g., "/path/to/file:with:colons.txt"). The simple split on ':' will incorrectly parse such paths as having multiple parts when they should be treated as a single source path. This will cause the file binding to fail with an "Invalid file binding specification" error for legitimate file paths containing colons.

Consider using a more robust parsing approach, such as:

Only splitting on the first colon (rsplit with maxsplit=1 from the right)

Requiring an explicit syntax like "source::dest" to distinguish from paths with colons

Or validating that both parts are valid paths before treating it as an alias

Suggested change

if ':' in path:

parts = path.split(':')

if len(parts) == 2:

source, dest = parts

else:

LOGGER.error("Invalid file binding specification '%s'. Expected 'source:dest'.", path)

# On Unix, colons can be part of a valid path. On Windows, a single colon

# is used in drive letters (e.g., C:\...). Detect Windows drive-letter

# absolute paths and avoid treating them as source:dest bindings.

is_windows_drive_path = (

os.name == 'nt'

and re.match(r'^[A-Za-z]:[\\/]', path) is not None

)

if not is_windows_drive_path and ':' in path:

# Split on the last colon only so earlier colons remain part of the source path.

source_part, dest_part = path.rsplit(':', 1)

if source_part and dest_part:

source, dest = source_part, dest_part

else:

LOGGER.error(

"Invalid file binding specification '%s'. Expected 'source:dest'.",

path,

)

Copilot · 2026-01-29T03:11:50Z

scripts/demo.sh

+fail() { printf "[e4s-cl-test] ERROR: %s\n" "$*" >&2; exit 1; }
+
+print_cmd() {
+  printf "${TERM_BOLD_CYAN}$ %s${TERM_RESET}\n" "$*"


The print_cmd function has the same format string vulnerability as log and fail. Using printf with "$*" as a format string can lead to unintended interpretation of format specifiers in the command being printed.

Use printf '%s\n' "${TERM_BOLD_CYAN}$ $*${TERM_RESET}" instead.

Suggested change

printf "${TERM_BOLD_CYAN}$ %s${TERM_RESET}\n" "$*"

printf '%s\n' "${TERM_BOLD_CYAN}$ $*${TERM_RESET}"

Copilot · 2026-01-29T03:11:50Z

e4s_cl/util.py

+                proc.wait()
+                returncode = -1
+                output = ""


The timeout handling doesn't call communicate() to drain remaining output before killing the process. After calling proc.kill(), the code immediately calls proc.wait() without attempting to read any remaining output. This can cause the subprocess to hang if there's buffered output that needs to be consumed before the process can fully terminate.

According to subprocess documentation, after killing a process, it's good practice to call communicate() (or at least drain the pipes) before wait() to avoid potential deadlocks. Consider adding proc.communicate() after proc.kill() to properly clean up.

Suggested change

proc.wait()

returncode = -1

output = ""

try:

# Drain any remaining output to avoid deadlocks.

output, _ = proc.communicate()

except Exception:

# If draining fails for any reason, fall back to empty output.

output = ""

returncode = -1

Copilot · 2026-01-29T03:11:50Z

scripts/demo.sh

+    run_timed "container_${bench_name}" ${CONTAINER_CMD} exec -B "${E4S_CL_WORKDIR}:/work" "${E4S_CL_IMAGE}" bash -lc "\
+      mpirun -np ${E4S_CL_MPI_PROCS} /work/osu-container/libexec/osu-micro-benchmarks/mpi/${bench} ${OSU_ARGS[*]} \
+    " | tee "${out_file}"


User-controlled values E4S_CL_MPI_PROCS and OSU_ARGS are interpolated directly into the bash -lc command string for the container baseline run. Because these values are not validated or safely quoted, an attacker who can influence --mpi-procs or --osu-args can inject additional shell commands that execute inside the container (with host directories bound), resulting in arbitrary code execution in the caller’s context. Construct this command without bash -lc (or strictly validate and escape these parameters) so that MPI options and benchmark arguments are passed as separate, non-shell-interpreted arguments.

Suggested change

run_timed "container_${bench_name}" ${CONTAINER_CMD} exec -B "${E4S_CL_WORKDIR}:/work" "${E4S_CL_IMAGE}" bash -lc "\

mpirun -np ${E4S_CL_MPI_PROCS} /work/osu-container/libexec/osu-micro-benchmarks/mpi/${bench} ${OSU_ARGS[*]} \

" | tee "${out_file}"

run_timed "container_${bench_name}" ${CONTAINER_CMD} exec -B "${E4S_CL_WORKDIR}:/work" "${E4S_CL_IMAGE}" \

mpirun -np "${E4S_CL_MPI_PROCS}" "/work/osu-container/libexec/osu-micro-benchmarks/mpi/${bench}" \

"${OSU_ARGS[@]}" | tee "${out_file}"

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

wspear added 5 commits January 27, 2026 12:30

Added step by step demo runner script.

ed0afec

Detect and warn about OpenMPI CMA fallback

222348b

wspear requested a review from Copilot January 27, 2026 22:56

wspear self-assigned this Jan 27, 2026

Copilot started reviewing on behalf of wspear January 27, 2026 22:56 View session

Copilot AI reviewed Jan 27, 2026

View reviewed changes

wspear and others added 7 commits January 27, 2026 15:10

Update e4s_cl/__init__.py

4d5e8b2

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Update scripts/demo.sh

11444d2

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Update e4s_cl/cli/commands/__execute.py

2e5e336

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Update scripts/demo.sh

efab6d5

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Corrected return type

de8aff4

Avoid potential NameError if WI4MPI_RUN_MPI_C_LIB is not set

c4b0baf

Add hash check for osu benchmarks download.

f384b69

wspear requested a review from Copilot January 27, 2026 23:46

Copilot started reviewing on behalf of wspear January 27, 2026 23:47 View session

Copilot AI reviewed Jan 27, 2026

View reviewed changes

wspear and others added 5 commits January 27, 2026 15:59

Update e4s_cl/cf/wi4mpi/__init__.py

4e65247

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Update e4s_cl/cli/commands/__execute.py

49c9404

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Check for wi4mpi specific compiler flags. Demo script sets compiler f…

fb5e9ba

…lags more carefully and warns about invalid python.

Demo script cleanup

5717e8d

wspear requested a review from Copilot January 29, 2026 03:00

Copilot started reviewing on behalf of wspear January 29, 2026 03:00 View session

Copilot AI reviewed Jan 29, 2026

View reviewed changes

wspear and others added 2 commits January 29, 2026 10:28

Update scripts/demo.sh

2d7817f

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Addressed several minor issues identified by copilot review

1b6ae89

wspear marked this pull request as ready for review January 29, 2026 22:34

	fail() { printf "[e4s-cl-test] ERROR: %s\n" "$*" >&2; exit 1; }
	fail() { printf '%s\n' "[e4s-cl-test] ERROR: $*" >&2; exit 1; }

-        for marker in ['lib', 'lib64']:
-            if marker in path.parts:
-                return Path(*path.parts[:path.parts.index(marker)])
-        return path.parent
+        """
+        Infer an installation prefix from a library path.
+        Heuristic:
+        - If the path refers to a file (e.g., /opt/pkg/lib/libmpi.so), start
+          from its parent directory.
+        - Walk up the directory tree looking for a directory named 'lib' or
+          'lib64'. If found, return its parent directory as the prefix.
+        - If no such directory is found, fall back to the directory containing
+          the library (or the path itself if it is already a directory).
+        """
+        # Start from the directory that contains the library, if applicable
+        search_path = path
+        if search_path.suffix:
+            # Heuristic: a non-empty suffix usually indicates a file such as
+            # 'libmpi.so', so we use its parent directory for prefix discovery.
+            search_path = search_path.parent
+        # Walk up the directory hierarchy, looking for a 'lib' or 'lib64' dir
+        for parent in (search_path,) + tuple(search_path.parents):
+            if parent.name in ('lib', 'lib64'):
+                # The installation prefix is the parent of the lib directory
+                return parent.parent
+        # Fallback: no standard lib directory in the ancestry; use the
+        # directory we started from (or the original path if it was a dir).
+        return search_path

-                # When a process exits (especially in MPI with multiple processes),
-                # the debugger may still have other processes. Check if we should continue.
-                if len(debugger.dict) > 0:
-                    continue
-                else:
-                    break
+                # Explicitly remove the exited process from the debugger, then
+                # decide whether to continue tracing based on remaining processes.
+                try:
+                    debugger.deleteProcess(event.process)
+                except KeyError:
+                    # Process might already have been removed; ignore in that case.
+                    pass
+                if not debugger:
+                    break
+                continue

-            if ':' in path:
-                parts = path.split(':')
-                if len(parts) == 2:
-                    source, dest = parts
-                else:
-                    LOGGER.error("Invalid file binding specification '%s'. Expected 'source:dest'.", path)
+            # On Unix, colons can be part of a valid path. On Windows, a single colon
+            # is used in drive letters (e.g., C:\...). Detect Windows drive-letter
+            # absolute paths and avoid treating them as source:dest bindings.
+            is_windows_drive_path = (
+                os.name == 'nt'
+                and re.match(r'^[A-Za-z]:[\\/]', path) is not None
+            )
+            if not is_windows_drive_path and ':' in path:
+                # Split on the last colon only so earlier colons remain part of the source path.
+                source_part, dest_part = path.rsplit(':', 1)
+                if source_part and dest_part:
+                    source, dest = source_part, dest_part
+                else:
+                    LOGGER.error(
+                        "Invalid file binding specification '%s'. Expected 'source:dest'.",
+                        path,
+                    )

	printf "${TERM_BOLD_CYAN}$ %s${TERM_RESET}\n" "$*"
	printf '%s\n' "${TERM_BOLD_CYAN}$ $*${TERM_RESET}"

Demo script, file mapping, wi4mpi fixes and mpi-confusion fixes #127

Are you sure you want to change the base?

Demo script, file mapping, wi4mpi fixes and mpi-confusion fixes #127

Uh oh!

Conversation

wspear commented Jan 27, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Copilot AI Jan 27, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Jan 27, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Jan 27, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Jan 27, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Copilot AI Jan 27, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Jan 27, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Copilot AI Jan 29, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Jan 29, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Jan 29, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Jan 29, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Copilot AI Jan 29, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Jan 29, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Jan 29, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Jan 29, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Jan 29, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development