Skip to content

Conversation

@wspear
Copy link
Collaborator

@wspear wspear commented Jan 27, 2026

This started as a run at a clean-start demo runner script plus file mapping support, but now includes 5 commits addressing issues I ran into while testing the demo. I need to test these changes on additional systems before they can be merged.

  1. Reduce MPI environment confusion. Wi4MPI Fixes (2635a96)
    Core infrastructure improvements addressing environment confusion

Problem: Artifacts from conflicting MPI stacks (e.g., Intel oneAPI libraries when using MPICH) are inadvertently captured during profile detection, requiring manual pruning or root-bind workarounds.

Changes:

MPI Artifact Filtering (detect.py)

Implemented _filter_mpi_artifacts() to discard shared objects not matching the MPI launcher's installation prefix
Heuristic: retains libraries sharing the launcher's prefix or system paths (usr, /)
Fail-open approach: conservatively retains all libraries if no match found (supports divergent installs)
New --mpi-filter argument allows activation/deactivation or manual library exclusion
Added comprehensive filtering tests (test_filter_mpi.py) with 223 lines of test coverage
Wi4MPI Robustness (__execute.py, init.py)

Added _alias_wi4mpi_fake_libs_for_guest() to create SONAME aliases (e.g., libmpich.so → libmpi.so) for Wi4MPI's fake libraries
Improved robustness of wi4mpi_root detection
Timeout Support (util.py)

Added timeout parameter to run_e4scl_subprocess() to prevent hangs during detect operations
Template Improvements (template.py)

Support for injecting extra environment variables into container entrypoint (used for Wi4MPI configuration)
Container Execution (launch.py, trace.py)

Explicitly filter root (/) directory from file bindings to prevent permission errors in Apptainer/Singularity
Updated opened_files() to correctly handle multi-process traces (e.g., MPI), preventing premature exit or hangs
Environment Variables (init.py)

Improved E4S_CL_SCRIPT resolution
Added E4S_CL_CONTAINER_DIR environment variable override for internal container directory
Impact: Eliminates manual profile pruning workarounds; vastly reduces environment confusion in multi-MPI setups.

  1. Support file aliasing with host:container syntax in profiles (0188982)
    Adds file path mapping capabilities

Problem: Users cannot map host files to different container paths.

Changes:

File Aliasing Parser (__execute.py)

Extended file binding logic to parse host:container syntax
Splits paths on : to detect source/destination pairs
Updated Container.bind_file() calls to use dest argument for aliased paths
Improved Validation

Added access checks for both source and destination paths
Falls back gracefully with error logging if paths not found
Documentation Updates

Updated CLI help for profile edit and execute commands to document host:container syntax
Added regression test test_execute_alias to ensure bindings are correctly applied
Example Usage: --files /host/lib/libfoo.so:/container/lib/libbar.so

  1. Added step by step demo runner script (ed0afec)
    Comprehensive validation and demonstration tool

Changes:

New Script (demo.sh) — 717 lines
High-level e4s-cl validation script starting from clean local build
Uses Apptainer + MPI-enabled container image
Primary focus: run container-built MPI app using host MPI
Captures host baseline, e4s-cl launch, and optional in-container baseline
Compares performance (host vs e4s-cl vs container-only)
Includes automatic environment detection and setup
Usage:

./scripts/demo.sh [--container-cmd apptainer|podman|...] [--container-image ] [--mpi-impl mpich|openmpi] [--np ]

  1. Fix Wi4MPI target library binding in container (551b07f)
    Critical fix for Wi4MPI library preloading

Problem: Environment variables WI4MPI_RUN_MPI_C_LIB and WI4MPI_RUN_MPI_F_LIB were updated to point to container paths without ensuring the underlying host libraries were actually bound, causing loader errors ("file too short").

Changes:

Library Import (__execute.py)
Explicitly call import_library() for WI4MPI_RUN_MPI_C_LIB before updating the environment variable
Explicitly call import_library() for WI4MPI_RUN_MPI_F_LIB before updating
Ensures host libraries are bound to container before Wi4MPI attempts preloading
Impact: Resolves Wi4MPI preload failures; enables reliable MPI API translation.

  1. Detect and warn about OpenMPI CMA fallback (222348b)
    Potential performance issue advisory

This set of changes addresses "environment confusion" issues where artifacts from conflicting MPI stacks (e.g., Intel oneAPI artifacts when using MPICH) are inadvertently captured. It also improves robustness for Wi4MPI translation and container execution.

The following changes should obviate the need for manual profile pruning or root-bind workarounds

1. MPI Artifact Filtering & Profile Detection:
   - e4s_cl/cli/commands/profile/detect.py: Implemented `_filter_mpi_artifacts` to discard shared objects that do not match the MPI launcher's installation prefix.
     - Heuristic: Keeps libraries sharing the launcher's prefix or system paths (/usr, /).
     - Fail-Open: Conservatively retains all libraries if no match is found (supports divergent installs).
     - New --mpi-filter argument allows activation/deactivation of filter or manual specification of libraries to exclude
   - e4s_cl/util.py: Added timeout support to `run_e4scl_subprocess` to prevent hangs during `detect` (e.g., if mpirun hangs).

2. Wi4MPI Support Improvements:
   - e4s_cl/cli/commands/__execute.py: Added `_alias_wi4mpi_fake_libs_for_guest` to create SONAME aliases (e.g., libmpich.so → libmpi.so) for Wi4MPI's fake libraries, ensuring guest binaries link correctly to the translation layer.
   - e4s_cl/cf/wi4mpi/__init__.py: Improved robustness of `wi4mpi_root` detection.
   - e4s_cl/cf/template.py: Added support for injecting extra environment variables into the container entrypoint (used for Wi4MPI configuration).

3. Container Execution & Stability:
   - e4s_cl/cli/commands/launch.py & __execute.py: Explicitly filter root (`/`) directory from file bindings. This prevents permission errors in Apptainer/Singularity runtimes.
   - e4s_cl/cf/trace.py: Updated `opened_files` to correctly handle process exits in multi-process traces (e.g., MPI), ensuring the tracer doesn't exit prematurely or hang.
   - e4s_cl/__init__.py: improved `E4S_CL_SCRIPT` resolution and added `E4S_CL_CONTAINER_DIR` environment variable override for the internal container directory.
Added support for mapping host files to different locations inside the container using host:container syntax in the --files argument (and via profile edit --add-files).

Parsed file paths in execute command to detect colons and split into source/destination.
Leveraged Container.bind_file with valid dest argument for aliased paths.
Updated CLI help documentation for profile edit and execute.
Added regression test test_execute_alias ensuring bindings are correctly applied.
Explicitly import `WI4MPI_RUN_MPI_C_LIB` and `WI4MPI_RUN_MPI_F_LIB` into the container. Previously, these environment variables were updated to point to container paths without ensuring the underlying host libraries were actually bound, causing loader errors (e.g., "file too short") when Wi4MPI attempted to preload them.
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This pull request introduces significant improvements to e4s-cl's MPI environment handling, particularly addressing multi-MPI stack confusion and improving Wi4MPI robustness. The changes span 5 main commits covering MPI artifact filtering, file path aliasing, Wi4MPI fixes, a comprehensive demo script, and OpenMPI CMA fallback detection.

Changes:

  • Added MPI artifact filtering to discard libraries from conflicting MPI stacks during profile detection, with comprehensive test coverage
  • Implemented file path aliasing (host:container syntax) to map host files to different container paths
  • Added a 717-line demo/validation script for comprehensive e4s-cl testing
  • Fixed Wi4MPI target library binding and improved preload order
  • Added timeout support to prevent detection operations from hanging

Reviewed changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 6 comments.

Show a summary per file
File Description
tests/test_filter_mpi.py New comprehensive test suite for MPI artifact filtering with 223 lines covering various scenarios
tests/test_execute.py Added test for file aliasing functionality
scripts/demo.sh New 717-line demonstration and validation script for e4s-cl
e4s_cl/util.py Added timeout parameter to run_e4scl_subprocess to prevent hangs
e4s_cl/cli/commands/profile/edit.py Updated help text to document file aliasing syntax
e4s_cl/cli/commands/profile/detect.py Implemented comprehensive MPI artifact filtering with --mpi-filter options
e4s_cl/cli/commands/launch.py Fixed root directory binding issues and improved Wi4MPI target library handling
e4s_cl/cli/commands/__execute.py Implemented file aliasing parser and Wi4MPI environment variable overrides
e4s_cl/cf/wi4mpi/init.py Improved Wi4MPI preload order and debugging output
e4s_cl/cf/trace.py Fixed multi-process trace handling to prevent premature exit
e4s_cl/cf/template.py Added support for extra environment variables in container entrypoint
e4s_cl/init.py Improved E4S_CL_SCRIPT resolution and added E4S_CL_CONTAINER_DIR override
Comments suppressed due to low confidence (1)

e4s_cl/util.py:238

  • The return type annotation for run_e4scl_subprocess is incorrect. The function signature declares -> int but the function always returns a tuple (returncode, output) on line 237 when capture_output=True, or (returncode, '') on line 238 when capture_output=False. The return type should be -> Tuple[int, str] or a union type to accurately reflect the actual return value.
def run_e4scl_subprocess(cmd, cwd=None, env=None, capture_output=False, timeout=None) -> int:
    """
    cmd: list[str],
    env: Optional[dict],
    capture_output: bool
    timeout: Optional[float], timeout in seconds
    Run a subprocess, tailored for recursive e4s-cl processes
    """
    with ParentStatus():
        subproc_env = os.environ
        if env:
            for key, val in env.items():
                if val is None:
                    subproc_env.pop(key, None)
                    _heavy_debug("unset %s", key)
                else:
                    subproc_env[key] = val
                    _heavy_debug("%s=%s", key, val)

        LOGGER.debug("Running with parent status: %s", cmd)
        with subprocess.Popen(
                cmd,
                cwd=cwd,
                env=subproc_env,
                stdout=subprocess.PIPE if capture_output else sys.stdout,
                stderr=sys.stderr,
                close_fds=False,
                universal_newlines=True,
                bufsize=1) as proc:

            try:
                output, _ = proc.communicate(timeout=timeout)
                returncode = proc.returncode
            except subprocess.TimeoutExpired:
                LOGGER.warning("Subprocess timed out after %s seconds: %s", timeout, cmd)
                proc.kill()
                proc.wait()
                returncode = -1
                output = ""

    if capture_output:
        return returncode, output
    return returncode, ''

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

wspear and others added 7 commits January 27, 2026 15:10
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 12 out of 12 changed files in this pull request and generated 8 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines 466 to 469
if ':' in path:
parts = path.split(':')
if len(parts) == 2:
source, dest = parts
Copy link

Copilot AI Jan 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The colon-based splitting for file aliasing is fragile and will fail on Windows paths (e.g., "C:\path") or paths that legitimately contain colons. Consider using a more robust delimiter or escaping mechanism. The common convention is to use "=" or to require exactly one unescaped colon. Additionally, the current implementation doesn't validate that the destination path is absolute, which could lead to unexpected behavior based on how bind_file handles relative destination paths.

Copilot uses AI. Check for mistakes.
Comment on lines +537 to +539
if host_mpi_lib:
params.extra_env["WI4MPI_RUN_MPIIO_C_LIB"] = container_mpi_lib.as_posix()
params.extra_env["WI4MPI_RUN_MPIIO_F_LIB"] = container_mpi_f_lib.as_posix()
Copy link

Copilot AI Jan 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At line 537, the code sets WI4MPI_RUN_MPIIO_C_LIB to container_mpi_lib only if host_mpi_lib is set, but host_mpi_lib was processed earlier (lines 525-530) and is not in scope for the condition at line 537. This condition should check if host_mpi_lib was non-empty earlier, but the variable is in scope. However, the logic is confusing: line 538 sets WI4MPI_RUN_MPIIO_C_LIB = container_mpi_lib which was defined on line 528, but line 539 sets WI4MPI_RUN_MPIIO_F_LIB = container_mpi_f_lib. The comment suggests MPIIO libraries should follow the same pattern as the regular libraries, so line 538 should likely use the F library for MPIIO_F_LIB, not the C library.

Copilot uses AI. Check for mistakes.
Comment on lines +122 to +126
# Build extra environment exports
extra_env_lines = []
for name, value in self.extra_env.items():
extra_env_lines.append(f'export {name}="{value}"')
extra_env_str = '\n'.join(extra_env_lines)
Copy link

Copilot AI Jan 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The extra_env feature injects arbitrary environment variable exports into the container entrypoint script. However, the current implementation doesn't properly escape the values when building the export statements. If a value contains special characters like quotes, backticks, or dollar signs, it could break the script or allow command injection. The values should be properly shell-escaped before being inserted into the template.

Copilot uses AI. Check for mistakes.
Comment on lines +115 to +129
with patch('e4s_cl.cf.containers.Container.bind_file') as bind_mock:
with tempfile.NamedTemporaryFile() as temp_file:
self.assertCommandReturnValue(0, COMMAND, [
'--backend',
'containerless',
'--image',
'',
'--libraries',
libmpi,
'--files',
f"{temp_file.name}:/tmp/target",
'ls',
])

bind_mock.assert_any_call(temp_file.name, dest='/tmp/target', option=FileOptions.READ_WRITE)
Copy link

Copilot AI Jan 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test uses patch to mock Container.bind_file, but it doesn't verify that the source file actually gets bound. The test only checks that bind_file was called with the correct destination parameter. However, if the aliasing logic in __execute.py has a bug (as identified in another comment), this test might pass even though the feature doesn't work correctly. Consider adding assertions to verify both the source and destination are correctly passed, or use a more comprehensive test that exercises the actual file binding.

Copilot uses AI. Check for mistakes.
Comment on lines 464 to 476
for path in files:
source, dest = path, None
if ':' in path:
parts = path.split(':')
if len(parts) == 2:
source, dest = parts

if dest and _check_access(source):
container.bind_file(source, dest=dest, option=FileOptions.READ_WRITE)
elif _check_access(path):
container.bind_file(path, option=FileOptions.READ_WRITE)
else:
LOGGER.error("File '%s' not found.", source)
Copy link

Copilot AI Jan 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The file aliasing feature doesn't validate the destination path for path traversal attacks. A malicious user could provide a destination like "/tmp/../../etc/passwd" which, after path resolution, could bind host files to sensitive container locations. Consider validating that the destination path, after normalization, is within expected container directories (e.g., starts with CONTAINER_DIR or other allowed prefixes).

Copilot uses AI. Check for mistakes.
Comment on lines +344 to +351
# Mapping from generic MPI library names to MPICH-specific names
# Wi4MPI provides libmpi.so.*, libmpifort.so.*, etc.
# MPICH uses libmpich.so.*, libmpichfort.so.*, etc.
name_mappings = {
'libmpi.so': ['libmpich.so', 'libmpi.so'],
'libmpifort.so': ['libmpichfort.so', 'libmpifort.so'],
'libmpicxx.so': ['libmpichcxx.so', 'libmpicxx.so'],
}
Copy link

Copilot AI Jan 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The function _alias_wi4mpi_fake_libs_for_guest uses a hardcoded name_mappings dictionary that maps generic MPI library names to MPICH-specific names. However, the comment on line 324 states it needs to handle "MPICH-specific names (e.g., libmpich.so.12, libmpichfort.so.12) so that container binaries linked to MPICH can find them." But the mapping includes both 'libmpich.so' and 'libmpi.so' as targets for 'libmpi.so'. This seems redundant - if the fake library is already named libmpi.so.X, creating an alias to libmpi.so.X is unnecessary. Consider removing the redundant self-mapping or documenting why it's needed.

Copilot uses AI. Check for mistakes.
wspear and others added 5 commits January 27, 2026 15:59
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…lags more carefully and warns about invalid python.
…mand

- Wi4MPI: Added installation verification (ELF arch check, binary check), support for ppc64le, and better handling of missing or invalid install directories.
- Launch: Added automatic binding of driver/config directories for graphics/interconnect libraries (ibverbs, OpenCL, Vulkan, etc.) to prevent driver loader errors.
- CLI: Added new `config` command to list, get, and set configuration values.
- Demo: Added arguments for launcher/OSU control, enabled container baseline, and forced local Wi4MPI usage.
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 13 out of 13 changed files in this pull request and generated 14 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines 465 to 483
source, dest = path, None
if ':' in path:
parts = path.split(':')
if len(parts) == 2:
source, dest = parts
else:
LOGGER.error("Invalid file binding specification '%s'. Expected 'source:dest'.", path)
continue

if dest is not None:
if _check_access(source):
container.bind_file(source, dest=dest, option=FileOptions.READ_WRITE)
else:
LOGGER.error("File '%s' not found.", source)
else:
if _check_access(path):
container.bind_file(path, option=FileOptions.READ_WRITE)
else:
LOGGER.error("File '%s' not found.", path)
Copy link

Copilot AI Jan 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The file binding code treats paths as strings instead of Path objects, which could lead to inconsistent path handling. The variable 'path' is split as a string, but then 'source' is passed to '_check_access' which likely expects a Path object based on the pattern used elsewhere in the codebase. This could cause issues with path resolution and validation.

Additionally, when 'dest' is None, the code passes 'path' (the original string) to 'bind_file', but when 'dest' is set, it passes 'source' (also a string). For consistency and proper path handling, these should be converted to Path objects before being passed to Container.bind_file.

Suggested change
source, dest = path, None
if ':' in path:
parts = path.split(':')
if len(parts) == 2:
source, dest = parts
else:
LOGGER.error("Invalid file binding specification '%s'. Expected 'source:dest'.", path)
continue
if dest is not None:
if _check_access(source):
container.bind_file(source, dest=dest, option=FileOptions.READ_WRITE)
else:
LOGGER.error("File '%s' not found.", source)
else:
if _check_access(path):
container.bind_file(path, option=FileOptions.READ_WRITE)
else:
LOGGER.error("File '%s' not found.", path)
source_str, dest = path, None
if ':' in path:
parts = path.split(':')
if len(parts) == 2:
source_str, dest = parts
else:
LOGGER.error("Invalid file binding specification '%s'. Expected 'source:dest'.", path)
continue
source = Path(source_str)
if dest is not None:
if _check_access(source):
container.bind_file(source, dest=dest, option=FileOptions.READ_WRITE)
else:
LOGGER.error("File '%s' not found.", str(source))
else:
if _check_access(source):
container.bind_file(source, option=FileOptions.READ_WRITE)
else:
LOGGER.error("File '%s' not found.", str(source))

Copilot uses AI. Check for mistakes.
E4S_CL_VERBOSE="0"

log() { printf "[e4s-cl-test] %s\n" "$*"; }
fail() { printf "[e4s-cl-test] ERROR: %s\n" "$*" >&2; exit 1; }
Copy link

Copilot AI Jan 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fail function uses printf with a format string that contains user-supplied content from "$*", which could interpret format specifiers unintentionally. If the error message contains format specifiers like '%s', '%d', etc., they will be interpreted as format directives rather than literal text. This could lead to unexpected behavior, crashes, or in rare cases, format string vulnerabilities.

For example, if fail is called with a message like "File %s not found", printf will look for additional arguments to substitute for %s, which don't exist, leading to undefined behavior.

Use printf '%s\n' "$*" instead to treat the entire message as a literal string.

Suggested change
fail() { printf "[e4s-cl-test] ERROR: %s\n" "$*" >&2; exit 1; }
fail() { printf '%s\n' "[e4s-cl-test] ERROR: $*" >&2; exit 1; }

Copilot uses AI. Check for mistakes.
Comment on lines 289 to 293
for marker in ['lib', 'lib64']:
if marker in path.parts:
return Path(*path.parts[:path.parts.index(marker)])
return path.parent

Copy link

Copilot AI Jan 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The _get_lib_prefix function doesn't handle paths correctly when the library is in a non-standard location without 'lib' or 'lib64' in the path. For example, a library at '/opt/custom/special/libmpi.so' would return '/opt/custom/special' (the parent directory) rather than failing or using a different heuristic. This could lead to incorrect prefix matching when comparing launcher and library prefixes.

Additionally, the function looks for markers in path.parts but doesn't verify the marker is actually a directory name (it could be part of a file name or parent directory name). Consider adding validation or using a more robust approach to determine the installation prefix.

Suggested change
for marker in ['lib', 'lib64']:
if marker in path.parts:
return Path(*path.parts[:path.parts.index(marker)])
return path.parent
"""
Infer an installation prefix from a library path.
Heuristic:
- If the path refers to a file (e.g., /opt/pkg/lib/libmpi.so), start
from its parent directory.
- Walk up the directory tree looking for a directory named 'lib' or
'lib64'. If found, return its parent directory as the prefix.
- If no such directory is found, fall back to the directory containing
the library (or the path itself if it is already a directory).
"""
# Start from the directory that contains the library, if applicable
search_path = path
if search_path.suffix:
# Heuristic: a non-empty suffix usually indicates a file such as
# 'libmpi.so', so we use its parent directory for prefix discovery.
search_path = search_path.parent
# Walk up the directory hierarchy, looking for a 'lib' or 'lib64' dir
for parent in (search_path,) + tuple(search_path.parents):
if parent.name in ('lib', 'lib64'):
# The installation prefix is the parent of the lib directory
return parent.parent
# Fallback: no standard lib directory in the ancestry; use the
# directory we started from (or the original path if it was a dir).
return search_path

Copilot uses AI. Check for mistakes.
Comment on lines +308 to +328
# 2. Add the target MPI F library (provides Fortran symbols)
target_mpi_f_lib = os.environ.get("WI4MPI_RUN_MPI_F_LIB", "")
if target_mpi_f_lib and Path(target_mpi_f_lib).exists():
if container_library_dir:
container_path = Path(container_library_dir) / Path(target_mpi_f_lib).name
to_preload.append(container_path.as_posix())
LOGGER.debug("Wi4MPI: Adding target MPI F library to preload (container path): %s", container_path)
else:
to_preload.append(target_mpi_f_lib)
LOGGER.debug("Wi4MPI: Adding target MPI F library to preload: %s", target_mpi_f_lib)

# 3. Add the target MPI C library (provides C symbols like ompi_mpi_char)
target_mpi_c_lib = os.environ.get("WI4MPI_RUN_MPI_C_LIB", "")
if target_mpi_c_lib and Path(target_mpi_c_lib).exists():
if container_library_dir:
container_path = Path(container_library_dir) / Path(target_mpi_c_lib).name
to_preload.append(container_path.as_posix())
LOGGER.debug("Wi4MPI: Adding target MPI C library to preload (container path): %s", container_path)
else:
to_preload.append(target_mpi_c_lib)
LOGGER.debug("Wi4MPI: Adding target MPI C library to preload: %s", target_mpi_c_lib)
Copy link

Copilot AI Jan 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The wi4mpi_preload function uses container_library_dir to construct container paths for the target MPI libraries, but it doesn't verify that these libraries were actually imported into the container at those locations. The function assumes that if a host library exists and container_library_dir is provided, the library will be available at the constructed path. However, this import happens elsewhere (in __execute.py lines 534 and 541), and there's no guarantee that the import succeeded or that the paths match the expected pattern.

This could lead to LD_PRELOAD failures at runtime if the library wasn't successfully imported. Consider adding validation or ensuring the imports happen before calling wi4mpi_preload, or at minimum adding a comment warning about this dependency.

Copilot uses AI. Check for mistakes.
Comment on lines 52 to 57
# When a process exits (especially in MPI with multiple processes),
# the debugger may still have other processes. Check if we should continue.
if len(debugger.dict) > 0:
continue
else:
break
Copy link

Copilot AI Jan 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The trace.py fix checks if len(debugger.dict) > 0 to determine whether to continue tracing, but this may not correctly handle all edge cases with multi-process MPI programs. The 'debugger.dict' contains all attached processes, but checking its length alone doesn't account for processes that have been detached or are in intermediate states.

Additionally, when a ProcessExit event occurs, the process that exited is still in debugger.dict at that moment - it's only removed later. This means the check if len(debugger.dict) > 0 will almost always be true when a process exits (because the exiting process is still counted), causing the loop to continue indefinitely until all processes exit, which may not be the intended behavior.

Consider checking if the exiting process is the last one, or using a more robust method to track active processes.

Suggested change
# When a process exits (especially in MPI with multiple processes),
# the debugger may still have other processes. Check if we should continue.
if len(debugger.dict) > 0:
continue
else:
break
# Explicitly remove the exited process from the debugger, then
# decide whether to continue tracing based on remaining processes.
try:
debugger.deleteProcess(event.process)
except KeyError:
# Process might already have been removed; ignore in that case.
pass
if not debugger:
break
continue

Copilot uses AI. Check for mistakes.
Comment on lines 466 to 471
if ':' in path:
parts = path.split(':')
if len(parts) == 2:
source, dest = parts
else:
LOGGER.error("Invalid file binding specification '%s'. Expected 'source:dest'.", path)
Copy link

Copilot AI Jan 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The file path parsing logic doesn't handle colons in file paths correctly. On Unix systems, file paths can legitimately contain colons (e.g., "/path/to/file:with:colons.txt"). The simple split on ':' will incorrectly parse such paths as having multiple parts when they should be treated as a single source path. This will cause the file binding to fail with an "Invalid file binding specification" error for legitimate file paths containing colons.

Consider using a more robust parsing approach, such as:

  • Only splitting on the first colon (rsplit with maxsplit=1 from the right)
  • Requiring an explicit syntax like "source::dest" to distinguish from paths with colons
  • Or validating that both parts are valid paths before treating it as an alias
Suggested change
if ':' in path:
parts = path.split(':')
if len(parts) == 2:
source, dest = parts
else:
LOGGER.error("Invalid file binding specification '%s'. Expected 'source:dest'.", path)
# On Unix, colons can be part of a valid path. On Windows, a single colon
# is used in drive letters (e.g., C:\...). Detect Windows drive-letter
# absolute paths and avoid treating them as source:dest bindings.
is_windows_drive_path = (
os.name == 'nt'
and re.match(r'^[A-Za-z]:[\\/]', path) is not None
)
if not is_windows_drive_path and ':' in path:
# Split on the last colon only so earlier colons remain part of the source path.
source_part, dest_part = path.rsplit(':', 1)
if source_part and dest_part:
source, dest = source_part, dest_part
else:
LOGGER.error(
"Invalid file binding specification '%s'. Expected 'source:dest'.",
path,
)

Copilot uses AI. Check for mistakes.
fail() { printf "[e4s-cl-test] ERROR: %s\n" "$*" >&2; exit 1; }

print_cmd() {
printf "${TERM_BOLD_CYAN}$ %s${TERM_RESET}\n" "$*"
Copy link

Copilot AI Jan 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The print_cmd function has the same format string vulnerability as log and fail. Using printf with "$*" as a format string can lead to unintended interpretation of format specifiers in the command being printed.

Use printf '%s\n' "${TERM_BOLD_CYAN}$ $*${TERM_RESET}" instead.

Suggested change
printf "${TERM_BOLD_CYAN}$ %s${TERM_RESET}\n" "$*"
printf '%s\n' "${TERM_BOLD_CYAN}$ $*${TERM_RESET}"

Copilot uses AI. Check for mistakes.
e4s_cl/util.py Outdated
Comment on lines 233 to 235
proc.wait()
returncode = -1
output = ""
Copy link

Copilot AI Jan 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The timeout handling doesn't call communicate() to drain remaining output before killing the process. After calling proc.kill(), the code immediately calls proc.wait() without attempting to read any remaining output. This can cause the subprocess to hang if there's buffered output that needs to be consumed before the process can fully terminate.

According to subprocess documentation, after killing a process, it's good practice to call communicate() (or at least drain the pipes) before wait() to avoid potential deadlocks. Consider adding proc.communicate() after proc.kill() to properly clean up.

Suggested change
proc.wait()
returncode = -1
output = ""
try:
# Drain any remaining output to avoid deadlocks.
output, _ = proc.communicate()
except Exception:
# If draining fails for any reason, fall back to empty output.
output = ""
returncode = -1

Copilot uses AI. Check for mistakes.
scripts/demo.sh Outdated
Comment on lines 864 to 866
run_timed "container_${bench_name}" ${CONTAINER_CMD} exec -B "${E4S_CL_WORKDIR}:/work" "${E4S_CL_IMAGE}" bash -lc "\
mpirun -np ${E4S_CL_MPI_PROCS} /work/osu-container/libexec/osu-micro-benchmarks/mpi/${bench} ${OSU_ARGS[*]} \
" | tee "${out_file}"
Copy link

Copilot AI Jan 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

User-controlled values E4S_CL_MPI_PROCS and OSU_ARGS are interpolated directly into the bash -lc command string for the container baseline run. Because these values are not validated or safely quoted, an attacker who can influence --mpi-procs or --osu-args can inject additional shell commands that execute inside the container (with host directories bound), resulting in arbitrary code execution in the caller’s context. Construct this command without bash -lc (or strictly validate and escape these parameters) so that MPI options and benchmark arguments are passed as separate, non-shell-interpreted arguments.

Suggested change
run_timed "container_${bench_name}" ${CONTAINER_CMD} exec -B "${E4S_CL_WORKDIR}:/work" "${E4S_CL_IMAGE}" bash -lc "\
mpirun -np ${E4S_CL_MPI_PROCS} /work/osu-container/libexec/osu-micro-benchmarks/mpi/${bench} ${OSU_ARGS[*]} \
" | tee "${out_file}"
run_timed "container_${bench_name}" ${CONTAINER_CMD} exec -B "${E4S_CL_WORKDIR}:/work" "${E4S_CL_IMAGE}" \
mpirun -np "${E4S_CL_MPI_PROCS}" "/work/osu-container/libexec/osu-micro-benchmarks/mpi/${bench}" \
"${OSU_ARGS[@]}" | tee "${out_file}"

Copilot uses AI. Check for mistakes.
wspear and others added 2 commits January 29, 2026 10:28
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@wspear wspear marked this pull request as ready for review January 29, 2026 22:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant