Unify loaders #722

lochhh · 2025-12-11T19:17:28Z

Description

What is this PR

Bug fix
Addition of a new feature
Other

Why is this PR needed?
Closes #199 , part of #667

What does this PR do?

introduces the load module, which unifies the interface for loading different kinds of datasets (bboxes, poses)
- load.from_file replaces load_poses.from_file and load_bboxes.from_file
- load.from_multiview_files replaces load_poses.from_multiview_files and now supports both bboxes and poses
- @register_loader decorator, e.g. @register_loader("SLEAP", file_validators=[ValidSleapAnalysis, ValidSleapLabels]), used to
  - "register" a from_<source_software>_file loader function in _LOADER_REGISTRY (essentially a source_software-to-loader mapping), which the load functions use internally to dispatch to the respective from_<source_software>_file in either load_bboxes or load_poses, based on the supplied source_software
  - link a loader to its specific file validator class(es)
  - wrap loader functions such that they accept str | Path as file input, validate the file via one of the supplied file validator classes, and pass the resulting ValidFile as input to the underlying from_<source_software>_file
refactors the file validators
- previously we used 2 validators (ValidFile, for validating file permissions and suffix; Valid<source_software><file format>, for validating file contents, e.g. expected headers, columns)
- this has now been changed, such that we have one Validator per source_software, per file format
- there are now also "composable" attrs.validators, e.g. _file_validator (a validator composed of multiple validators to check for permissions, access, suffix, etc.), _hdf_validator (a validator that checks for an expected dataset in an h5 file), _if_instance_of (only runs a validator if the value is an instance of cls): these can be "stacked"/composed as a single validator for the required file attribute in a ValidFile class, e.g. file = field(validator=validators.and_(A,B,C)) or equivalently file = field(validator=[A,B,C])
- as before, custom file validation logic can be additionally implemented and added using attrs syntax @file.validator
updates examples that are still using removed functions load_poses.from_file, load_bboxes.from_file, load_poses.from_multiview_files
adds guide on adding new loaders (and validators)

References

#199 #667

How has this PR been tested?

Tests have been added accordingly

Is this a breaking change?

Yes, the functions load_poses.from_file, load_bboxes.from_file and load_poses.from_multiview_files have been removed. I figured that since we're still in v0, breaking changes are acceptable.

Does this PR require an update to the documentation?

Added guide to adding new loaders in Contributing guide

Checklist:

The code has been tested locally
Tests have been added to cover all new functionality
The documentation has been updated to reflect any changes
The code has been formatted with pre-commit

sonarqubecloud · 2025-12-11T19:18:12Z

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

codecov · 2025-12-11T19:22:09Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 100.00%. Comparing base (5ad4b72) to head (5e566e9).
⚠️ Report is 1 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff            @@
##              main      #722   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files           34        36    +2     
  Lines         2111      2170   +59     
=========================================
+ Hits          2111      2170   +59


niksirbi · 2025-12-19T14:10:50Z

I’ll just leave a quick thought here before I forget it over the holidays.

I love the unified loader! This is going to become the first function most people use in movement, and in many ways it’s the most important entry point into the package.

Based on the draft PR, the new interface would be:

from movement.io import load

ds = load.from_file("/path/to/file", source_software="DeepLabCut", fps=30)

I’m wondering whether this is a good moment to simplify the import path even further. For example:

from movement import load_tracks

ds = load_tracks("/path/to/file", source_software="DeepLabCut", fps=30)

The module structure could stay as is; we would just expose this convenience function in __init__.py.

Pros

It’s intuitive and consistent with patterns in other scientific Python libraries (e.g. pandas.read_csv).
“(Motion) tracks” is already an established term throughout our documentation.

Cons

The syntax would diverge from what we currently have. If we adopt this, the high-level function (movement.load_tracks) will have a different signature from the existing low-level functions (movement.io.load_poses.from_file).

I’m not sure how problematic that is: most new users would likely rely on the high-level function (and we can steer them toward it in the docs), while existing users can continue to use the lower-level functions if they prefer.

UPDATE
Just discovered this old issue, which basically suggests the same thing: #361

sonarqubecloud · 2026-01-26T19:27:04Z

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

niksirbi

Thanks a lot for this heroic effort @lochhh! You've done it with your hallmark attention to detail. We definitely owe you a lot for undertaking this dry but extremely valuable work. The new section of the contributing guide is very clear now, and will be incredibly useful to us and to future contributors.

I have 3 main points (see inline comments for details and justifications), but I only consider the first one as 100% necessary.

1. We should raise deprecation warnings for load_poses.from_file() and load_bboxes.from_file() and keep them around for at least 1 more release (with redirection to the new unified loader).
2. In the Input/Output user guide, I'd suggest leading with the unified loader, and then expanding on software-specific loaders, adhering to the progressive disclosure principle.
3. Now that I've seen the whole work with somewhat fresh eyes, I got the idea of naming the new unified loader load_dataset() instead of from_file() (for reasons given in the corresponding inline comment). This would also imply renaming from_multiview_files() to load_multiview_dataset(). I'm not dead set on this (I see the merits of sticking to the from_* pattern), so happy to be overruled or persuaded otherwise.

The PR will also need to be rebased to main.

CONTRIBUTING.md

docs/source/user_guide/input_output.md

niksirbi · 2026-02-09T19:17:36Z

movement/io/load.py

+    return decorator
+
+
+def from_file(


I think it's worth re-considering the from_file name. from_file is accurate and consistent with the software-specific loaders, which I appreciate. But it describes the input, not what you get back. from_file is generic enough that it could return anything—it doesn't immediately signal this is the main entry point for loading motion tracking data. This was somewhat less of a problem with the previous general loaders, because, e.g., load_poses.from_file at least signals that we are loading poses.

I personally think load_dataset would be clearer here than from_file. It immediately tells you what you're getting, it reads naturally (from movement.io import load_dataset), and it aligns with xarray's open_dataset and our own sample_data.fetch_dataset. In the long-term, that consistency may matter more than matching the from_* pattern internally.

To clarify, I'm only 65-35 % in favour of load_dataset over from_file, and I'm happy to be overruled by the majority, or persuaded to change my mind if there are aspects I haven't thought of.

I realised my proposal for load_dataset has repercussions for from_multiview_files().

The two public functions from movement.io— from_file and from_multiview_files — form a consistent pair with the from_* pattern.
Renaming from_file to load_dataset would force us to also rename from_multiview_files to load_multiview_dataset (or load_dataset_multiview), adding to the deprecation burden. But I think that wouldn't be terrible (the new name makes sense, and I don't think from_multiview_files was much used.)

niksirbi · 2026-02-09T19:27:42Z

movement/io/load_bboxes.py

    return valid_bboxes_inputs.to_dataset()


-def from_file(


For this function, as well as for load_poses.from_file(), I would raise deprecation warnings and redirect to the new unified loader under the hood. I suspect that load_poses.from_file() and load_bboxes.from_file() are some of our most used interfaces, so if there ever was a time to warn about the deprecation, it's now.
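
A rough sketch of what such a shim could look like, assuming the standard warnings module is used. The _unified_from_file function below is only a stand-in for the unified loader (load.from_file in this PR), and the warning text and arguments are illustrative rather than taken from the codebase.

```python
import warnings


def _unified_from_file(file, source_software, fps=None):
    """Stand-in for the unified loader (load.from_file in this PR)."""
    return {"file": str(file), "source_software": source_software, "fps": fps}


def from_file(file, source_software, fps=None):
    """Deprecated entry point that forwards to the unified loader."""
    warnings.warn(
        "load_bboxes.from_file is deprecated and will be removed in a "
        "future release; use load.from_file instead.",
        DeprecationWarning,
        # stacklevel=2 attributes the warning to the caller's line,
        # not to this shim.
        stacklevel=2,
    )
    return _unified_from_file(file, source_software=source_software, fps=fps)
```

Existing scripts keep working for another release, while users see where to migrate each time the old function is called.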

niksirbi · 2026-02-09T20:24:06Z

movement/io/load_poses.py

    return valid_poses_inputs.to_dataset()


-def from_file(


I would keep this function for now, redirect it to the unified loader and raise a deprecation warning (see my comment on load_bboxes.from_file() for my reasons).

niksirbi · 2026-02-09T20:36:27Z

movement/validators/files.py

    FileExistsError
-        If the file exists when ``expected_permission`` is "w".
+        If the file exists when ``permission`` is "w".


This may be a good time to revisit a question that has been bugging me. I'd originally implemented this to raise a FileExistsError if a file already exists when you are trying to save it. When I use movement as a user, this annoys me a lot, because I often re-run the same code snippet and hit this error every time after the first save. Most other libraries don't do this, i.e. they allow you to overwrite a file, so movement violates my implicit expectations.

What's your opinion on this? I'm also happy to raise this as a separate issue to discuss and solve independently of this PR.
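
To make the question concrete, here is one possible shape for a more permissive check. It is purely a sketch for discussion: check_writable_path and its overwrite flag are hypothetical names, not existing movement API, and the default is kept strict so the current behaviour is unchanged unless the caller opts in.

```python
from pathlib import Path


def check_writable_path(file, overwrite=False):
    """Return the path if it may be written to.

    Hypothetical helper: with overwrite=False it mirrors the current
    behaviour (FileExistsError on an existing file); with
    overwrite=True an existing file is accepted.
    """
    path = Path(file)
    if path.exists() and not overwrite:
        raise FileExistsError(
            f"File {path} already exists. Pass overwrite=True to replace it."
        )
    return path
```

Whether overwriting should be opt-in (as here) or the default is exactly the open point; flipping the default would match the behaviour of most other libraries mentioned above.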

niksirbi · 2026-02-09T20:42:06Z

movement/validators/files.py

+# --- Helper functions --- #
+
+
+def validate_file_path(


Since this is now a proper public function, what do you think of adding Parameters and Returns to its docstring? They're kind of obvious from the argument names, but that hasn't stopped us elsewhere.
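
As a starting point, a numpydoc-style docstring could look something like the sketch below. The signature (file, permission, suffixes) and the body are assumptions made for illustration; the real validate_file_path in movement/validators/files.py may take different arguments and perform different checks.

```python
import os
from pathlib import Path


def validate_file_path(file, permission="r", suffixes=None):
    """Check that a file path is usable for reading or writing.

    NOTE: signature and behaviour are assumed for illustration only.

    Parameters
    ----------
    file : str or pathlib.Path
        Path to the file to validate.
    permission : {"r", "w"}, optional
        Expected access permission. Default is "r".
    suffixes : set of str, optional
        Allowed file suffixes, e.g. ``{".csv", ".h5"}``.
        If None (default), the suffix is not checked.

    Returns
    -------
    pathlib.Path
        The validated file path.

    Raises
    ------
    FileNotFoundError
        If the file does not exist when ``permission`` is "r".
    PermissionError
        If the file is not readable when ``permission`` is "r".
    FileExistsError
        If the file exists when ``permission`` is "w".
    ValueError
        If ``permission`` is invalid or the suffix is not allowed.

    """
    path = Path(file)
    if permission == "r":
        if not path.is_file():
            raise FileNotFoundError(f"File {path} does not exist.")
        if not os.access(path, os.R_OK):
            raise PermissionError(f"File {path} is not readable.")
    elif permission == "w":
        if path.exists():
            raise FileExistsError(f"File {path} already exists.")
    else:
        raise ValueError(f"Unknown permission: {permission!r}")
    if suffixes and path.suffix not in suffixes:
        raise ValueError(f"Expected one of {sorted(suffixes)}, got {path.suffix!r}")
    return path
```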

niksirbi mentioned this pull request Dec 19, 2025

feat: support dynamic kwargs in loader widget #720

Draft

7 tasks

This was referenced Dec 19, 2025

Added functions to support IO for Parquet files. #562

Closed

Implement I/O for parquet files #307

Open

lochhh mentioned this pull request Jan 6, 2026

Refactor dataset validators #706

Merged

7 tasks

niksirbi mentioned this pull request Jan 9, 2026

Feature/eks loader #670

Draft

4 tasks

lochhh force-pushed the unify-loaders branch from 912237d to 9e248e4 Compare January 13, 2026 19:50

This was referenced Jan 20, 2026

Add I/O support for Motion-BIDS #581

Open

Add support for COCO formats #182

Open

lochhh force-pushed the unify-loaders branch from 9e248e4 to 5face53 Compare January 20, 2026 18:52

niksirbi mentioned this pull request Jan 21, 2026

Automatically infer file format without requiring source_software #422

Open

lochhh force-pushed the unify-loaders branch from 046924b to afa3bfe Compare January 22, 2026 19:40

lochhh added 16 commits January 26, 2026 19:26

Draft load module

1d680ec

Make from_multiview_files general

d7fa5fc

Clarify behaviour of from_multiview_files

4934f2d

Cover one of poses and bboxes in test_from_multiview_files

5682d35

Add loader registration with file validation

9182c04

Rename file_path to file

75d7b86

Draft file validators + register_loader refactor

6d0404f

Remove unused dataset-specific from_file and from_multiview_files

2fd6889

Fix order of file checks

7c3c0c6

Update file validators tests and simplify file fixtures

da8b76b

Add missing path converters

e1ee65e

Import load modules to trigger decorator registration

e4932e2

Use load.from_file in napari widget

4457f7c

Add file validator tests

57aa363

Add from_file tests

b593d81

Use TypeAlias for source_software type

d66af83

lochhh added 14 commits January 26, 2026 19:26

Clean up and revert to str | Path

84945de

Allow registering loaders without validators

6aa3977

Refactor _file_is_accessible

d386f34

Refactor register_decorator

512ed0a

Expose unified load functions in movement.io

507b417

Update ref to from_multiview_files in Datasets guide

a4f2a4b

Use consistent valid_path naming

64285a7

Add from_file section in IO guide

0386098

Simplify suffixes check in file validator

886e9ec

Include instance in _if_instance_of definition

67a0dc8

Add guide for implementing new loaders and file validators

3eb07ae

Improve docstrings for LoaderProtocol and ValidFile classes

9a1ec1c

Restructure implementing loaders guide

a1315c4

Move loader guide to CONTRIBUTING.md

5e566e9

lochhh force-pushed the unify-loaders branch from afa3bfe to 5e566e9 Compare January 26, 2026 19:26

lochhh marked this pull request as ready for review January 27, 2026 11:13

lochhh requested a review from a team January 27, 2026 11:13

lochhh mentioned this pull request Jan 29, 2026

Add ROI save/load functionality via GeoJSON #773

Open

10 tasks

niksirbi requested changes Feb 9, 2026


2 participants
