Skip to content

Conversation

@mschwoer
Copy link
Contributor

@mschwoer mschwoer commented Dec 17, 2024

Remove logic on module import:

  • initialize temporary mmap dir only on demand
  • do not initialize temporary mmap dir on reset

This caused slowing down unit tests of AlphaDIA as well as many log messages.

solves #279

@mschwoer mschwoer requested review from GeorgWa and swillems December 17, 2024 10:43
return _temp_dir, temp_dir_name


_TEMP_DIR, TEMP_DIR_NAME = make_temp_dir()
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is the culprit

"""Initialize the temporary directory for the temp mmap arrays if not already done."""
global _TEMP_DIR, TEMP_DIR_NAME

if _TEMP_DIR is None:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is the fix

import alphatims.utils


_TEMP_DIR, TEMP_DIR_NAME = None, None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The initialization function lacks a proper return value when _TEMP_DIR already exists. Always return TEMP_DIR_NAME for consistency regardless of whether we're initializing a new directory or using an existing one.

_TEMP_DIR, TEMP_DIR_NAME = None, None
ARRAYS = {}
CLOSED = False
ALLOW_NDARRAY_SUBCLASS = False

def _init_temp_dir(prefix: str = "temp_mmap_"):
    "Initialize the temporary directory for the temp mmap arrays if not already done."
    global _TEMP_DIR, TEMP_DIR_NAME

    if _TEMP_DIR is None:
        _TEMP_DIR, TEMP_DIR_NAME = make_temp_dir(prefix)
        return TEMP_DIR_NAME
    return TEMP_DIR_NAME

temp_dir_name = _temp_dir.name

logging.info(
f"Memory-mapped arrays are written to temporary directory {temp_dir_name}. "
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The logging statement is using TEMP_DIR_NAME which hasn't been assigned yet at this point. It should use temp_dir_name instead, which is the local variable containing the path to the newly created temporary directory.

    logging.info(
        f"Memory-mapped arrays are written to temporary directory {temp_dir_name}. "
        "Cleanup of this folder is OS dependent and might need to be triggered manually! "
        f"Current space: {shutil.disk_usage(temp_dir_name)[-1]:,}"
    )

np.ndarray
A writable temporary mmapped array.
"""
temp_dir_name = _init_temp_dir()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The function should store the return value of _init_temp_dir() to ensure it's using the correct directory path.

    temp_dir_name = _init_temp_dir()

_memmap.close()
del _memmap
del _TEMP_DIR

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The docstring is incorrectly formatted. The " should be placed around the entire docstring including the "WARNING" section to properly document the function.

def reset() -> str:
    "Reset the temporary folder containing temp mmapped arrays and create a new temporary folder."

    WARNING: All existing temp mmapp arrays will become unusable!

    Returns
    -------
    str
        The name of the new temporary folder.
    "
    global _TEMP_DIR
    global TEMP_DIR_NAME
    global ARRAYS

    _reset()

    _TEMP_DIR, TEMP_DIR_NAME = make_temp_dir()
    ARRAYS = {}

    return TEMP_DIR_NAME

- uses: actions/checkout@v2
- uses: actions/checkout@v4
- uses: conda-incubator/setup-miniconda@v3
with:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a duplicate miniconda-version line in the GitHub workflow. The redundant line should be removed to avoid confusion and potential conflicts.

      - uses: conda-incubator/setup-miniconda@v3
        with:
          auto-update-conda: true
          miniconda-version: "latest"
          python-version: ${{ matrix.python-version }}

README.md Outdated
## How to contribute

If you like AlphaTims you can give us a [star](stargazers) to boost our visibility! All direct contributions are also welcome. Feel free to post a new [issue](https://github.com/MannLabs/alphabase/issues) or clone the repository and create a [pull request](https://github.com/MannLabs/alphabase/pulls) with a new branch. For an even more interactive participation, check out the [discussions](https://github.com/MannLabs/alphabase/discussions).
If you like AlphaTims you can give us a [star](stargazers) to boost our visibility! All direct contributions are also welcome. Feel free to post a new [issue](https://github.com/MannLabs/alphatims/issues) or clone the repository and create a [pull request](https://github.com/MannLabs/alphabase/pulls) with a new branch. For an even more interactive participation, check out the [discussions](https://github.com/MannLabs/alphabase/discussions).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The URL for pull requests is incorrect. It currently points to alphabase/pulls instead of alphatims/pulls. This should be corrected to maintain consistency with the issues link which was properly updated.

If you like AlphaTims you can give us a [star](stargazers) to boost our visibility! All direct contributions are also welcome. Feel free to post a new [issue](https://github.com/MannLabs/alphatims/issues) or clone the repository and create a [pull request](https://github.com/MannLabs/alphatims/pulls) with a new branch. For an even more interactive participation, check out the [discussions](https://github.com/MannLabs/alphatims/discussions).

@github-actions
Copy link
Contributor

github-actions bot commented Mar 6, 2025

Number of tokens: input_tokens=15717 output_tokens=1269 MAX_NUM_OUTPUT_TOKENS=4096

Copy link
Collaborator

@sander-willems-bruker sander-willems-bruker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Been over it quickly. Have not tested it or scrutinized the code, but looks good and if it still works as expected with a quick local test I am happy to give a (mostly rubber stamp) approval

README.md Outdated
## How to contribute

If you like AlphaTims you can give us a [star](stargazers) to boost our visibility! All direct contributions are also welcome. Feel free to post a new [issue](https://github.com/MannLabs/alphabase/issues) or clone the repository and create a [pull request](https://github.com/MannLabs/alphabase/pulls) with a new branch. For an even more interactive participation, check out the [discussions](https://github.com/MannLabs/alphabase/discussions).
If you like AlphaTims you can give us a [star](stargazers) to boost our visibility! All direct contributions are also welcome. Feel free to post a new [issue](https://github.com/MannLabs/alphatims/issues) or clone the repository and create a [pull request](https://github.com/MannLabs/alphatims/pulls) with a new branch. For an even more interactive participation, check out the [discussions](https://github.com/MannLabs/alphabase/discussions).
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

discussions also still point to alphabase

logging.info(f"cpu frequency - {psutil.cpu_freq().current:.2f} Mhz")
except AttributeError:
logging.info("Unable to log cpu frequency")
except Exception as e:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why use generic errors than specific?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is to avoid any interference of this non-critical logic (logging cpu frequency) with the remaning code execution.

More generally:
While narrow exception clauses have their merits in many cases, I feel that broad exception handling should be frowned upon less (it's even flagged by ruff!)

One reason being the above one: I want to continue regardless of the error.

Another one is that narrow exception handling couples the code quite tightly to an API: if someone chooses to raise some CustomError instead of a ValueError in a new version of their package, the narrow exception handling breaks. And I guess exceptions are an often overlooked part of an API, so spurious changes are more likely there.

@mschwoer
Copy link
Contributor Author

Been over it quickly. Have not tested it or scrutinized the code, but looks good and if it still works as expected with a quick local test I am happy to give a (mostly rubber stamp) approval

yup, gave it a quick test and still worked.. also, all automated tests pass

@mschwoer mschwoer merged commit 7a0edb2 into develop Mar 13, 2025
6 checks passed
@mschwoer mschwoer deleted the improve_mmap branch March 13, 2025 09:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants