Transpose ufunc parameters #107

@aazuspan

The current ufunc API roughly follows a simplified version of xr.apply_ufunc and dask.apply_gufunc, i.e. parameters are specified as lists and/or dicts that are mapped positionally to outputs. For example, a ufunc that returns three single-band outputs is defined as (using the new FeaturewiseUfunc from #103):

ufunc = FeaturewiseUfunc(
    calculate_ogsi,
    output_dims=[["band"], ["band"], ["band"]],
    output_dtypes=[np.float32, np.uint8, np.uint8],
    output_sizes={"band": 1},
    output_coords=[{"band": ["OGSI"]}, {"band": ["LTQ"]}, {"band": ["OGSI_CLS"]}],
)

While this is reasonably familiar for Xarray ufunc users, it requires reading positionally across parameters to understand each output (i.e. the first output is composed of the first output dimension, the first dtype, and the first set of output coordinates). It's also easy to specify incorrectly by nesting improperly or omitting an element, and difficult to type check.
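As a rough illustration of the omission problem, here is a sketch in plain Python. It does not use Xarray or Dask, and `describe_outputs` is a hypothetical helper rather than anything in the package; it only mimics how a reader (or naive code) pairs up the i-th entry of each parameter. It says nothing about how `FeaturewiseUfunc` itself reacts to a mismatch.

```python
def describe_outputs(output_dims, output_dtypes, output_coords):
    """Pair up the i-th entry of each parameter list (hypothetical helper)."""
    # zip() stops at the shortest input, so a missing element silently
    # drops an output instead of raising an error.
    return [
        {"dims": dims, "dtype": dtype, "coords": coords}
        for dims, dtype, coords in zip(output_dims, output_dtypes, output_coords)
    ]


outputs = describe_outputs(
    output_dims=[["band"], ["band"], ["band"]],
    output_dtypes=["float32", "uint8"],  # third dtype accidentally omitted
    output_coords=[{"band": ["OGSI"]}, {"band": ["LTQ"]}, {"band": ["OGSI_CLS"]}],
)

# Only two outputs are described, even though three were intended.
print(len(outputs))
```

Catching this requires comparing lengths across every parameter, which is the kind of cross-parameter check that becomes unnecessary when each output carries its own settings.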

@grovduck and I discussed a more intuitive API here which would transpose from parameters as groups of outputs to outputs as groups of parameters. For example, the same function could be defined using some new dataclasses as:

from dataclasses import dataclass

import numpy as np


@dataclass
class Dimension:
    name: str
    size: int | None = None
    coords: list[str | int] | None = None


@dataclass
class Output:
    dims: list[Dimension]
    dtype: type[np.generic] | None = None

ufunc = FeaturewiseUfunc(
    calculate_ogsi,
    outputs=[
        Output(dims=[Dimension(name="band", size=1, coords=["OGSI"])], dtype=np.float32),
        Output(dims=[Dimension(name="band", size=1, coords=["LTQ"])], dtype=np.uint8),
        Output(dims=[Dimension(name="band", size=1, coords=["OGSI_CLS"])], dtype=np.uint8),
    ]
)

This change means that each output is defined independently, making them easier to validate, type, and reuse.
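To make the validation point concrete, here is a minimal sketch of a per-dimension check. The `__post_init__` rule (coords must match size, and size is inferred from coords when omitted) is an assumption for illustration, not an agreed design, and `dtype` is typed as `Any` only to keep the sketch free of a NumPy dependency.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any


@dataclass
class Dimension:
    name: str
    size: int | None = None
    coords: list[str | int] | None = None

    def __post_init__(self) -> None:
        # Hypothetical rule: coords, when given, must agree with size.
        if self.coords is None:
            return
        if self.size is None:
            # Infer the size from the coordinate labels.
            self.size = len(self.coords)
        elif self.size != len(self.coords):
            raise ValueError(
                f"Dimension {self.name!r} has size {self.size} "
                f"but {len(self.coords)} coords."
            )


@dataclass
class Output:
    dims: list[Dimension]
    dtype: Any = None  # stands in for type[np.generic] | None


# Each output is checked on its own, at construction time.
ogsi = Output(dims=[Dimension(name="band", coords=["OGSI"])], dtype="float32")
print(ogsi.dims[0].size)
```

Because the check lives on `Dimension`, an invalid output fails where it is written rather than when the ufunc is applied.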

One drawback (aside from losing the familiarity of the Xarray API) is that Dask ufuncs don't support different sizes for the same dimension across multiple outputs. Because each output would now declare its own dimension sizes, conflicting sizes could be specified, so this case would require post hoc validation.
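One possible shape for that validation is sketched below: a hypothetical `to_positional` helper that transposes a list of outputs back into the four existing parameters and rejects conflicting sizes for a shared dimension name. The helper name and its return format are assumptions, and `dtype` is typed as `Any` with string values only to avoid depending on NumPy in the sketch.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any


@dataclass
class Dimension:
    name: str
    size: int | None = None
    coords: list[str | int] | None = None


@dataclass
class Output:
    dims: list[Dimension]
    dtype: Any = None  # stands in for type[np.generic] | None


def to_positional(outputs: list[Output]) -> dict[str, Any]:
    """Transpose per-output definitions into per-parameter lists (hypothetical)."""
    output_sizes: dict[str, int] = {}
    for out in outputs:
        for dim in out.dims:
            if dim.size is None:
                continue
            # Record the first size seen for a name; later sizes must match it.
            known = output_sizes.setdefault(dim.name, dim.size)
            if known != dim.size:
                raise ValueError(
                    f"Dimension {dim.name!r} has conflicting sizes "
                    f"{known} and {dim.size} across outputs."
                )
    return {
        "output_dims": [[d.name for d in o.dims] for o in outputs],
        "output_dtypes": [o.dtype for o in outputs],
        "output_sizes": output_sizes,
        "output_coords": [
            {d.name: d.coords for d in o.dims if d.coords is not None}
            for o in outputs
        ],
    }


params = to_positional(
    [
        Output(dims=[Dimension("band", size=1, coords=["OGSI"])], dtype="float32"),
        Output(dims=[Dimension("band", size=1, coords=["LTQ"])], dtype="uint8"),
        Output(dims=[Dimension("band", size=1, coords=["OGSI_CLS"])], dtype="uint8"),
    ]
)
print(params["output_sizes"])
```

A helper along these lines would let the public API take `outputs` while the internals keep passing positional parameters to the underlying ufunc machinery.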

Labels: enhancement (New feature or request)
