Skip to content

Conversation

@lochhh
Copy link
Member

@lochhh lochhh commented Dec 11, 2025

Description

What is this PR

  • Bug fix
  • Addition of a new feature
  • Other

Why is this PR needed?
Closes #199 , part of #667

What does this PR do?

  • introduces the load module that unifies the interface to loading different kinds of datasets (bboxes, poses)
    • load.from_file replaces load_poses.from_file and load_bboxes.from_file
    • load.from_multiview_files replaces load_poses.from_multiview_files and now supports both bboxes and poses
    • @register_loader decorator, e.g. @register_loader("SLEAP", file_validators=[ValidSleapAnalysis, ValidSleapLabels], used to
      • "register" a from_<source_software>_file loader function in _LOADER_REGISTRY (essentially a source_software-loader mapping, used internally by load functions to dispatch to the respective from_<source_software>_file in either load_bboxes or load_poses based on the supplied source_software
      • link a loader to its specific file validator class(es)
      • wrap loader functions such that they accept str | Path as file input, validate the file via one of the file validator classes supplied, and passes this ValidFile as input to the underlying from_<source_software>_file
  • refactors the file validators
    • previously we use 2 validators (ValidFile - for validating file permissions, suffix; Valid<source_software><file format> - for validating file contents, e.g. expected headers, columns)
    • this has now been changed, such that we have one Validator per source_software, per file format
    • there are now also "composable" attrs.validators, e.g. _file_validator (a validator composed of multiple validators to check for permissions, access, suffix, etc.), _hdf_validator (a validator that checks for expected dataset in an h5 file), _if_instance_of (only run validator if it's an instance of cls): these can be "stacked"/composed as a single validator for the required file attribute in a ValidFile class, e.g. file = field(validator=validators.and_(A,B,C) or equivalently file = field(validator=[A,B,C])
    • as before, custom file validation logic can be additionally implemented and added using attrs syntax @file.validator
  • updates examples that are still using removed functions load_poses.from_file, load_bboxes.from_file, load_poses.from_multiview_files
  • adds guide on adding new loaders (and validators)

References

#199 #667

How has this PR been tested?

Tests have been added accordingly

Is this a breaking change?

Yes the functions load_poses.from_file, load_bboxes.from_file, load_poses.from_multiview_files have been removed. I figured since we're still in v0 breaking changes are acceptable.

Does this PR require an update to the documentation?

Added guide to adding new loaders in Contributing guide

Checklist:

  • The code has been tested locally
  • Tests have been added to cover all new functionality
  • The documentation has been updated to reflect any changes
  • The code has been formatted with pre-commit

@sonarqubecloud
Copy link

@codecov
Copy link

codecov bot commented Dec 11, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 100.00%. Comparing base (5ad4b72) to head (5e566e9).
⚠️ Report is 1 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff            @@
##              main      #722   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files           34        36    +2     
  Lines         2111      2170   +59     
=========================================
+ Hits          2111      2170   +59     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@niksirbi
Copy link
Member

niksirbi commented Dec 19, 2025

I’ll just leave a quick thought here before I forget it over the holidays.

I love the unified loader! This is going to become the first function most people use in movement, and in many ways it’s the most important entry point into the package.

Based on the draft PR, the new interface would be:

from movement.io import load

ds = load.from_file("/path/to/file", source_software="DeepLabCut", fps=30)

I’m wondering whether this is a good moment to simplify the import path even further. For example:

from movement import load_tracks

ds = load_tracks("/path/to/file", source_software="DeepLabCut", fps=30)

The module structure could stay as is; we would just expose this convenience function in __init__.py.

Pros

  • It’s intuitive and consistent with patterns in other scientific Python libraries (e.g. pandas.read_csv).
  • “(Motion) tracks” is already an established term throughout our documentation.

Cons

The syntax would diverge from what we currently have. If we adopt this, the high-level function (movement.load_tracks) will have a different signature from the existing low-level functions (movement.io.load_poses.from_file).

I’m not sure how problematic that is: most new users would likely rely on the high-level function (and we can steer them toward it in the docs), while existing users can continue to use the lower-level functions if they prefer.

UPDATE
Just discovered this old issue, which basically suggests the same thing: #361

@sonarqubecloud
Copy link

@lochhh lochhh marked this pull request as ready for review January 27, 2026 11:13
@lochhh lochhh requested a review from a team January 27, 2026 11:13
Copy link
Member

@niksirbi niksirbi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for this heroic effort @lochhh! You've done it with your hallmark attention to detail. We definitely owe you a lot for undertaking this dry but extremely valuable work. The new section of the contributing guide is very clear now, and will be incredibly useful to us and to future contributors.

I have 3 main points (see inline comments for details and justifications), but I only consider the first one as 100% necessary.

  1. We should raise deprecation warnings for load_poses.from_file() and load_bboxes.from_file() and keep them around for at least 1 more release (with redirection to the new unified loader).
  2. In the Input/Output user guide, I'd suggest leading with the unified loader, and then expanding on software-specific loaders, adhering to the progressive disclosure principle.
  3. Now that I've seen the whole work with somewhat fresh eyes, I got the idea of naming the new unified loader as load_dataset() instead of from_file() (for reasons given in the corresponding inline comment). This would also imply renaming from_multiview_files() to load_multiview_dataset(). I'm not dead set on this (I see the merits of sticking to the from_* pattern), so happy to be overruled or persuaded otherwise.

The PR will also need to be rebased to main.

return decorator


def from_file(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's worth re-considering the from_file name. from_file is accurate and consistent with the software-specific loaders, which I appreciate. But it describes the input, not what you get back. from_file is generic enough that it could return anything—it doesn't immediately signal this is the main entry point for loading motion tracking data. This was somewhat less of a problem with the previous general loaders, because, e.g., load_poses.from_file at least signals that we are loading poses.

I personally think load_dataset would be clearer here than from_file. It immediately tells you what you're getting, it reads naturally (from movement.io import load_dataset), and it aligns with xarray's open_dataset and our own sample_data.fetch_dataset. In the long-term, that consistency may matter more than matching the from_* pattern internally.

To clarify, I'm only 65-35 % in favour of load_dataset over from_file, and I'm happy to be over-ruled by majority, or persuaded to change my mind if there are aspects I haven't thought of.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I realised my proposal for load_dataset has repercussion for from_multiview_files().

The two public functions from movement.iofrom_file and from_multiview_files — form a consistent pair with the from_* pattern.
Renaming from_file to load_dataset would force us to also rename from_multiview_files to load_multiview_dataset (or load_dataset_multiview), adding to the deprecation burden. But I think that wouldn't be terrible (the new name makes sense, and I don't think from_multiview_files was much used.)

return valid_bboxes_inputs.to_dataset()


def from_file(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fro this function, as well as for load_poses.from_file(), I would raise deprecation warnings, and re-direct to the new unified loader under the hood. I suspect that load_poses.from_file() and load_bbloxes.from_file() are some of our most used interfaces, so if there ever was a time to warn about the deprecation, it's now.

return valid_poses_inputs.to_dataset()


def from_file(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would keep this function for now, redirect it to the unified loader and raise a deprecation warning (see my comment on load_bboxes.from_file() for my reasons).

Comment on lines 69 to +70
FileExistsError
If the file exists when ``expected_permission`` is "w".
If the file exists when ``permission`` is "w".
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This may be a good time to revisit a question that has been bugging me. I'd originally implemented this to raise a FileExistsError if a file already exists when you are trying to save it. When I use movement as a user, this annoys me a lot, because I often re-run the same code snippet and hit this error every time after the first save. Most other libraries don't do this, i.e. they allow you to overwrite a file, so movement violates my implicit expectations.

What's your opinion on this? I'm also happy to raise this as a separate issue to discuss and solve independently of this PR.

# --- Helper functions --- #


def validate_file_path(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is now a proper public function, what do you think of adding Parameters and Returns to its docstring? They're kind of obvious from the argument names, but that hasn't stopped us elsewhere.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Consider renaming load_poses and save_poses to load_dataset and save_dataset

2 participants