Merged

47 commits
fe22254
Draft load module
lochhh Dec 1, 2025
51bf536
Make from_multiview_files general
lochhh Dec 2, 2025
0c9415d
Clarify behaviour of from_multiview_files
lochhh Dec 4, 2025
d0ed731
Cover one of poses and bboxes in test_from_multiview_files
lochhh Dec 4, 2025
8fd9775
Add loader registration with file validation
lochhh Dec 10, 2025
864e2de
Rename file_path to file
lochhh Jan 7, 2026
8e787c6
Draft file validators + register_loader refactor
lochhh Jan 12, 2026
75c0a2f
Remove unused dataset-specific from_file and from_multiview_files
lochhh Jan 13, 2026
4e072ab
Fix order of file checks
lochhh Jan 13, 2026
c9b4110
Update file validators tests and simplify file fixtures
lochhh Jan 13, 2026
de43bca
Add missing path converters
lochhh Jan 13, 2026
306af86
Import load modules to trigger decorator registration
lochhh Jan 13, 2026
52ab208
Use load.from_file in napari widget
lochhh Jan 13, 2026
d5ba660
Add file validator tests
lochhh Jan 13, 2026
294e592
Add from_file tests
lochhh Jan 13, 2026
bc774d6
Use TypeAlias for source_software type
lochhh Jan 20, 2026
0fad48e
Clean up and revert to `str | Path`
lochhh Jan 20, 2026
8e2e61e
Allow registering loaders without validators
lochhh Jan 20, 2026
aa3befb
Refactor `_file_is_accessible`
lochhh Jan 20, 2026
1817c15
Refactor `register_decorator`
lochhh Jan 20, 2026
169f4b1
Expose unified `load` functions in `movement.io`
lochhh Jan 22, 2026
2d8fa6e
Update ref to `from_multiview_files` in Datasets guide
lochhh Jan 22, 2026
bd6046d
Use consistent `valid_path` naming
lochhh Jan 22, 2026
2186a33
Add `from_file` section in IO guide
lochhh Jan 22, 2026
f649cee
Simplify suffixes check in file validator
lochhh Jan 22, 2026
bc55729
Include `instance` in `_if_instance_of` definition
lochhh Jan 22, 2026
8122841
Add guide for implementing new loaders and file validators
lochhh Jan 22, 2026
b70d873
Improve docstrings for LoaderProtocol and ValidFile classes
lochhh Jan 26, 2026
922c07b
Restructure implementing loaders guide
lochhh Jan 26, 2026
cf8da2b
Move loader guide to CONTRIBUTING.md
lochhh Jan 26, 2026
22e1910
Ensure consistent use of period in list items
lochhh Feb 10, 2026
48054bc
Log and raise error in validator example
lochhh Feb 10, 2026
89d94e1
Include VGG save in IO guide
lochhh Feb 10, 2026
320f939
Introduce from_file before software-specific functions
lochhh Feb 10, 2026
9e1e58d
Rename `from_file` to `load_dataset`
lochhh Feb 10, 2026
f0b2559
Rename `from_multiview_files` to `load_multiview_dataset`
lochhh Feb 10, 2026
0522742
Add docstring for `validate_file_path`
lochhh Feb 10, 2026
2588740
Revert "Remove unused dataset-specific from_file and from_multiview_f…
lochhh Feb 10, 2026
363a664
Update type casting in example
lochhh Feb 10, 2026
db028f8
Deprecate `from_file` and `from_multiview_files` with warnings
lochhh Feb 10, 2026
d6cda1e
Restore `from_file` and `from_multiview_files` tests
lochhh Feb 11, 2026
146f8da
Ignore deprecation warnings for `from_file` and `from_multiview_files`…
lochhh Feb 11, 2026
77c7596
Fix `from_multiview_files` docstring
lochhh Feb 11, 2026
dafddbb
Clarify load_dataset docstring
lochhh Feb 12, 2026
30fa87b
Standardise heading in contributing.md
lochhh Feb 12, 2026
baec90d
Add step for updating `load_dataset()` docstring with new loader
lochhh Feb 12, 2026
13e33c2
Mention load_poses and load_bboxes modules only
lochhh Feb 12, 2026
199 changes: 199 additions & 0 deletions docs/source/community/contributing.md
@@ -271,6 +271,205 @@ In general:
* Use {func}`warnings.warn` for user input issues that are non-critical and can be addressed within `movement`, e.g. deprecated function calls that are redirected, invalid `fps` number in {class}`ValidPosesInputs<movement.validators.datasets.ValidPosesInputs>` that is implicitly set to `None`; or when processing data containing excessive NaNs, which the user can potentially address using appropriate methods, e.g. {func}`interpolate_over_time()<movement.filtering.interpolate_over_time>`
* Use {meth}`logger.info()<loguru._logger.Logger.info>` for informational messages about expected behaviours that do not indicate problems, e.g. where default values are assigned to optional parameters.

### Implementing new loaders
Implementing a new loader to support additional [file formats](target-supported-formats) in `movement` involves the following steps:

1. Create validator classes for the file format (recommended).
2. Implement the loader function.
3. Update the `SourceSoftware` type alias.

#### Create file validators
`movement` enforces separation of concerns by decoupling file validation from data loading, so that loaders can focus solely on reading and parsing data, while validation logic is encapsulated in dedicated file validator classes.
Besides allowing users to get early feedback on file issues, this also makes it easier to reuse validation logic across different loaders that may support the same file format.

All file validators are [`attrs`](attrs:)-based classes and live in {mod}`movement.validators.files`.
They define the rules an input file must satisfy before it can be loaded, and they conform to the {class}`ValidFile<movement.validators.files.ValidFile>` protocol.
At minimum, this requires defining:

- `suffixes`: The expected file extensions for the format.
- `file`: The path to the file or an {class}`NWBFile<pynwb.file.NWBFile>` object, depending on the loader.

Additional attributes can also be defined to store pre-parsed information that the loader may need later.

Using a hypothetical format "MySoftware" that produces CSV files containing the columns `scorer`, `bodyparts`, and `coords`, we illustrate the full pattern file validators follow:

- Declare expected file suffixes.
- Normalise the input file and apply reusable validators.
- Implement custom, format-specific validation.

```python
from pathlib import Path
from typing import ClassVar

from attrs import define, field

# `_file_validator` is one of the reusable validators in
# movement.validators.files; `logger` is movement's loguru-based logger.


@define
class ValidMySoftwareCSV:
    """Validator for MySoftware .csv output files."""

    suffixes: ClassVar[set[str]] = {".csv"}
    file: Path = field(
        converter=Path,
        validator=_file_validator(permission="r", suffixes=suffixes),
    )
    col_names: list[str] = field(init=False, factory=list)

    @file.validator
    def _file_contains_expected_header(self, attribute, value):
        """Ensure that the .csv file contains the expected header row."""
        expected_cols = ["scorer", "bodyparts", "coords"]
        with open(value) as f:
            col_names = f.readline().strip().split(",")[:3]
        if col_names != expected_cols:
            raise logger.error(
                ValueError(
                    ".csv header row does not match the known format for "
                    "MySoftware output files."
                )
            )
        self.col_names = col_names
```

##### Declare expected file suffixes
The `suffixes` class variable restricts the validator to only accept files with the specified extensions.
If a suffix check is not required, this can be set to an empty set (`set()`).
In the `ValidMySoftwareCSV` example, only files with a `.csv` extension are accepted.
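As a minimal sketch, a hypothetical validator that accepts any readable file, regardless of extension, would declare an empty `suffixes` set:

```python
@define
class ValidAnyReadableFile:
    """Hypothetical validator with no extension restriction."""

    suffixes: ClassVar[set[str]] = set()  # no suffix check is applied
    file: Path = field(
        converter=Path,
        validator=_file_validator(permission="r", suffixes=suffixes),
    )
```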

##### Normalise input file and apply reusable validators
An `attrs` {ref}`converter<attrs:converters>` is typically used to normalise input files into {class}`Path<pathlib.Path>` objects, along with one or more validators to ensure the file meets the expected criteria.

In addition to the built-in `attrs` {mod}`validators<attrs.validators>`, `movement` provides several reusable file-specific validators (as callables) in {mod}`movement.validators.files`:

- `_file_validator`: A composite validator that ensures `file` is a {class}`Path<pathlib.Path>`, is not a directory, is accessible with the required permission, and has one of the expected `suffixes` (if any).
- `_hdf5_validator`: Checks that an HDF5 `file` contains the expected dataset(s).
- `_if_instance_of`: Conditionally applies a validator only when `file` is an instance of a given class.

In the current example, the `_file_validator` is used to ensure that the input `file` is a readable CSV file.

:::{dropdown} Combining reusable validators
:color: success
:icon: light-bulb

Reusable validators can be combined using either {func}`attrs.validators.and_` or by passing a list of validators to the `validator` parameter of {func}`field()<attrs.field>`.
The `file` attribute in {class}`ValidDeepLabCutH5<movement.validators.files.ValidDeepLabCutH5>` combines both `_file_validator` and `_hdf5_validator` to ensure the input file is a readable HDF5 file containing the expected dataset `df_with_missing`:

```python
@define
class ValidDeepLabCutH5:
    """Class for validating DeepLabCut-style .h5 files."""

    suffixes: ClassVar[set[str]] = {".h5"}
    file: Path = field(
        converter=Path,
        validator=validators.and_(
            _file_validator(permission="r", suffixes=suffixes),
            _hdf5_validator(datasets={"df_with_missing"}),
        ),
    )
```
:::

##### Implement format-specific validation
Most formats require custom validation logic beyond basic file checks.
In the current example, the `_file_contains_expected_header` method is registered as an additional validator for the `file` attribute via the `@file.validator` decorator, and checks that the first line of the CSV file matches the expected header row for MySoftware output files.
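Instantiating the validator runs all checks at once, so problematic files are rejected before any parsing takes place. A hypothetical usage sketch (file names are illustrative):

```python
# A file with the expected header passes, and the column names are pre-parsed
valid_csv = ValidMySoftwareCSV(file="mysoftware_output.csv")
print(valid_csv.col_names)  # ["scorer", "bodyparts", "coords"]

# A file with an unexpected header logs an error and raises ValueError
ValidMySoftwareCSV(file="unrelated_table.csv")
```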

:::{seealso}
- {external+attrs:std:doc}`examples`: Overview of writing `attrs` classes.
- {ref}`attrs Validators<attrs:validators>`: Details on writing custom validators for attributes.
:::

#### Implement loader function
Once the file validator is defined, the next step is to implement the loader function that reads the validated file and constructs the movement dataset.
Continuing from the hypothetical "MySoftware" example, the loader function `from_mysoftware_file` would look like this:

```python
# Assumes the imports used in movement's loader modules are available
# (e.g. `cast` from typing, `xarray as xr`, `logger`, `register_loader`,
# and the `ValidFile` protocol).
@register_loader(
    source_software="MySoftware",
    file_validators=ValidMySoftwareCSV,
)
def from_mysoftware_file(file: str | Path) -> xr.Dataset:
    """Load data from MySoftware files."""
    # The decorator passes in an instance of ValidMySoftwareCSV,
    # which conforms to the ValidFile protocol,
    # so we let the type checker know this
    valid_file = cast("ValidFile", file)
    file_path = valid_file.file  # Path to the validated file
    # The _parse_* functions are pseudocode for format-specific parsing
    ds = load_poses.from_numpy(
        position_array=_parse_positions(file_path),
        confidence_array=_parse_confidences(file_path),
        individual_names=_parse_individual_names(file_path),
        keypoint_names=_parse_keypoint_names(file_path),
        fps=_parse_fps(file_path),
        source_software="MySoftware",
    )
    logger.info(f"Loaded poses from {file_path.name}")
    return ds
```

Loader functions live in {mod}`movement.io.load_poses` or {mod}`movement.io.load_bboxes`, depending on the data type (poses or bounding boxes).

A loader function must conform to the {class}`LoaderProtocol<movement.io.load.LoaderProtocol>`, which requires the loader to:

- Accept `file` as its first parameter, which may be:
- A `str` or a {class}`Path<pathlib.Path>`.
- An {class}`NWBFile<pynwb.file.NWBFile>` object (for NWB-based formats).
- Return an {class}`xarray.Dataset<xarray.Dataset>` object containing the [movement dataset](target-poses-and-bboxes-dataset).

##### Decorate the loader with `@register_loader`
The {func}`@register_loader()<movement.io.load.register_loader>` decorator associates a loader function with a `source_software` name so that users can load files from that software via the unified {func}`load_dataset()<movement.io.load.load_dataset>` interface:
```python
from movement.io import load_dataset
ds = load_dataset("path/to/mysoftware_output.csv", source_software="MySoftware")
```

which is equivalent to calling the loader function directly:
```python
from movement.io.load_poses import from_mysoftware_file
ds = from_mysoftware_file("path/to/mysoftware_output.csv")
```

If a `file_validators` argument is supplied to the {func}`@register_loader()<movement.io.load.register_loader>` decorator, the decorator selects the appropriate validator&mdash;based on its declared `suffixes`&mdash;and uses it to normalise and validate the input `file` before invoking the loader.
As a result, the loader receives the validated file object instead of the raw path or handle.

If no validator is provided, the loader is passed the raw `file` argument as-is.
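As a minimal, hypothetical sketch of this case (the software name and loader are illustrative), such a loader is responsible for normalising the input itself:

```python
@register_loader(source_software="MyOtherSoftware")
def from_myothersoftware_file(file: str | Path) -> xr.Dataset:
    """Load data from MyOtherSoftware files (no file validators registered)."""
    # `file` arrives exactly as the user supplied it,
    # so normalise and check it here as needed.
    file_path = Path(file)
    ...
```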

:::{dropdown} Handling multiple file formats for the same software
:color: success
:icon: light-bulb

Many software packages produce multiple file formats (e.g. DeepLabCut outputs both CSV and HDF5).
In that case, we recommend **one loader per source software**, which internally dispatches to per-format parsing functions, to ensure a consistent entry point for each supported source software.
If formats require very different validation logic, you may pass multiple validators to `file_validators=[...]`.
The decorator will select the appropriate validator based on file suffix and the validator's `suffixes` attribute.

```python
@register_loader(
    source_software="MySoftware",
    file_validators=[ValidMySoftwareCSV, ValidMySoftwareH5],
)
def from_mysoftware_file(file: str | Path) -> xr.Dataset:
    """Load data from MySoftware files (CSV or HDF5)."""
    ...
```
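
A hedged sketch of what such internal dispatch might look like, assuming hypothetical `_parse_csv_file` and `_parse_h5_file` helpers that each return a complete dataset:

```python
# Inside from_mysoftware_file, after the decorator has validated `file`:
valid_file = cast("ValidFile", file)
file_path = valid_file.file
# Dispatch on the validated file's suffix to the matching per-format parser
if file_path.suffix == ".csv":
    return _parse_csv_file(file_path)
return _parse_h5_file(file_path)
```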
:::

##### Construct the dataset
After parsing the input file, the loader function should construct the movement dataset using:

- {func}`movement.io.load_poses.from_numpy` for pose tracks.
- {func}`movement.io.load_bboxes.from_numpy` for bounding box tracks.

These helper functions create the {class}`xarray.Dataset<xarray.Dataset>` object from numpy arrays and metadata, ensuring that the dataset conforms to the [movement dataset specification](target-poses-and-bboxes-dataset).

#### Update SourceSoftware type alias
The `SourceSoftware` type alias is defined in {mod}`movement.io.load` as a `Literal` containing all supported source software names.
When adding a new loader, update this type alias to include the new software name to maintain type safety across the codebase:

```python
SourceSoftware: TypeAlias = Literal[
    "DeepLabCut",
    "SLEAP",
    ...,
    "MySoftware",  # Newly added software
]
```

### Continuous integration
All pushes and pull requests will be built by [GitHub actions](github-docs:actions).
This will usually include linting, testing and deployment.
2 changes: 2 additions & 0 deletions docs/source/conf.py
@@ -249,6 +249,7 @@
"anipose": "https://anipose.readthedocs.io/en/latest/",
"TRex": "https://trex.run/docs/",
"uv": "https://docs.astral.sh/uv/{{path}}#{{fragment}}",
"attrs": "https://www.attrs.org/en/stable/{{path}}#{{fragment}}",
"pytest-benchmark": "https://pytest-benchmark.readthedocs.io/en/latest/{{path}}#{{fragment}}",
}

@@ -261,6 +262,7 @@
"pynwb": ("https://pynwb.readthedocs.io/en/stable/", None),
"matplotlib": ("https://matplotlib.org/stable/", None),
"numpy": ("https://numpy.org/doc/stable/", None),
"attrs": ("https://www.attrs.org/en/stable/", None),
}

# What to show on the 404 page
4 changes: 2 additions & 2 deletions docs/source/user_guide/gui.md
@@ -155,10 +155,10 @@ Below is a small example showing how to save a GUI-compatible
netCDF file with `movement`:

```python
from movement.io import load_poses
from movement.io import load_dataset
from movement.filtering import rolling_filter

ds_orig = load_poses.from_file(
ds_orig = load_dataset(
"path/to/my_data.h5", source_software="DeepLabCut", fps=30
)
