-
Notifications
You must be signed in to change notification settings - Fork 100
Unify loaders #722
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Unify loaders #722
Conversation
|
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #722 +/- ##
=========================================
Coverage 100.00% 100.00%
=========================================
Files 34 36 +2
Lines 2111 2170 +59
=========================================
+ Hits 2111 2170 +59 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
I’ll just leave a quick thought here before I forget it over the holidays. I love the unified loader! This is going to become the first function most people use in Based on the draft PR, the new interface would be: from movement.io import load
ds = load.from_file("/path/to/file", source_software="DeepLabCut", fps=30)I’m wondering whether this is a good moment to simplify the import path even further. For example: from movement import load_tracks
ds = load_tracks("/path/to/file", source_software="DeepLabCut", fps=30)The module structure could stay as is; we would just expose this convenience function in Pros
Cons The syntax would diverge from what we currently have. If we adopt this, the high-level function ( I’m not sure how problematic that is: most new users would likely rely on the high-level function (and we can steer them toward it in the docs), while existing users can continue to use the lower-level functions if they prefer. UPDATE |
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot for this heroic effort @lochhh! You've done it with your hallmark attention to detail. We definitely owe you a lot for undertaking this dry but extremely valuable work. The new section of the contributing guide is very clear now, and will be incredibly useful to us and to future contributors.
I have 3 main points (see inline comments for details and justifications), but I only consider the first one as 100% necessary.
- We should raise deprecation warnings for
load_poses.from_file()andload_bboxes.from_file()and keep them around for at least 1 more release (with redirection to the new unified loader). - In the Input/Output user guide, I'd suggest leading with the unified loader, and then expanding on software-specific loaders, adhering to the progressive disclosure principle.
- Now that I've seen the whole work with somewhat fresh eyes, I got the idea of naming the new unified loader as
load_dataset()instead offrom_file()(for reasons given in the corresponding inline comment). This would also imply renamingfrom_multiview_files()toload_multiview_dataset(). I'm not dead set on this (I see the merits of sticking to thefrom_*pattern), so happy to be overruled or persuaded otherwise.
The PR will also need to be rebased to main.
| return decorator | ||
|
|
||
|
|
||
| def from_file( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's worth re-considering the from_file name. from_file is accurate and consistent with the software-specific loaders, which I appreciate. But it describes the input, not what you get back. from_file is generic enough that it could return anything—it doesn't immediately signal this is the main entry point for loading motion tracking data. This was somewhat less of a problem with the previous general loaders, because, e.g., load_poses.from_file at least signals that we are loading poses.
I personally think load_dataset would be clearer here than from_file. It immediately tells you what you're getting, it reads naturally (from movement.io import load_dataset), and it aligns with xarray's open_dataset and our own sample_data.fetch_dataset. In the long-term, that consistency may matter more than matching the from_* pattern internally.
To clarify, I'm only 65-35 % in favour of load_dataset over from_file, and I'm happy to be over-ruled by majority, or persuaded to change my mind if there are aspects I haven't thought of.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I realised my proposal for load_dataset has repercussion for from_multiview_files().
The two public functions from movement.io— from_file and from_multiview_files — form a consistent pair with the from_* pattern.
Renaming from_file to load_dataset would force us to also rename from_multiview_files to load_multiview_dataset (or load_dataset_multiview), adding to the deprecation burden. But I think that wouldn't be terrible (the new name makes sense, and I don't think from_multiview_files was much used.)
| return valid_bboxes_inputs.to_dataset() | ||
|
|
||
|
|
||
| def from_file( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fro this function, as well as for load_poses.from_file(), I would raise deprecation warnings, and re-direct to the new unified loader under the hood. I suspect that load_poses.from_file() and load_bbloxes.from_file() are some of our most used interfaces, so if there ever was a time to warn about the deprecation, it's now.
| return valid_poses_inputs.to_dataset() | ||
|
|
||
|
|
||
| def from_file( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would keep this function for now, redirect it to the unified loader and raise a deprecation warning (see my comment on load_bboxes.from_file() for my reasons).
| FileExistsError | ||
| If the file exists when ``expected_permission`` is "w". | ||
| If the file exists when ``permission`` is "w". |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This may be a good time to revisit a question that has been bugging me. I'd originally implemented this to raise a FileExistsError if a file already exists when you are trying to save it. When I use movement as a user, this annoys me a lot, because I often re-run the same code snippet and hit this error every time after the first save. Most other libraries don't do this, i.e. they allow you to overwrite a file, so movement violates my implicit expectations.
What's your opinion on this? I'm also happy to raise this as a separate issue to discuss and solve independently of this PR.
| # --- Helper functions --- # | ||
|
|
||
|
|
||
| def validate_file_path( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this is now a proper public function, what do you think of adding Parameters and Returns to its docstring? They're kind of obvious from the argument names, but that hasn't stopped us elsewhere.



Description
What is this PR
Why is this PR needed?
Closes #199 , part of #667
What does this PR do?
loadmodule that unifies the interface to loading different kinds of datasets (bboxes, poses)load.from_filereplacesload_poses.from_fileandload_bboxes.from_fileload.from_multiview_filesreplacesload_poses.from_multiview_filesand now supports both bboxes and poses@register_loaderdecorator, e.g.@register_loader("SLEAP", file_validators=[ValidSleapAnalysis, ValidSleapLabels], used tofrom_<source_software>_fileloader function in_LOADER_REGISTRY(essentially asource_software-loader mapping, used internally by load functions to dispatch to the respectivefrom_<source_software>_filein eitherload_bboxesorload_posesbased on the suppliedsource_softwarefrom_<source_software>_fileValidFile- for validating file permissions, suffix;Valid<source_software><file format>- for validating file contents, e.g. expected headers, columns)_file_validator(a validator composed of multiple validators to check for permissions, access, suffix, etc.),_hdf_validator(a validator that checks for expected dataset in an h5 file),_if_instance_of(only runvalidatorif it's an instance ofcls): these can be "stacked"/composed as a single validator for the requiredfileattribute in aValidFileclass, e.g.file = field(validator=validators.and_(A,B,C)or equivalentlyfile = field(validator=[A,B,C])@file.validatorload_poses.from_file,load_bboxes.from_file,load_poses.from_multiview_filesReferences
#199 #667
How has this PR been tested?
Tests have been added accordingly
Is this a breaking change?
Yes the functions
load_poses.from_file,load_bboxes.from_file,load_poses.from_multiview_fileshave been removed. I figured since we're still in v0 breaking changes are acceptable.Does this PR require an update to the documentation?
Added guide to adding new loaders in Contributing guide
Checklist: