Consider removing `ensure_min_samples` and directly returning fully masked arrays

## Context

When applying ufuncs to chunked feature arrays with `skip_nodata=True`, you could easily end up calling the ufunc with an empty array if a chunk contains only NoData. That's a problem for `sklearn` estimator methods that [validate their inputs](https://github.com/scikit-learn/scikit-learn/blob/3c5f668eb1131499e3db2fc50c1f99ee0b670756/sklearn/utils/validation.py#L738) with `ensure_min_samples=1` by default and fail with empty inputs. To avoid that issue, we added a corresponding `ensure_min_samples` parameter that temporarily fills input arrays with dummy values up to `ensure_min_samples` before passing them to the ufunc. That works, but the logic is somewhat complex, requires a handful of different validation checks, and is currently constrained to reshaped 2D data. 

## Proposed change

While `ensure_min_samples` supports arbitrary minimum samples, the only practical usage that I'm aware of is `ensure_min_samples=1` (i.e. don't pass empty arrays), and that case *could* be handled with a simpler, faster approach of skipping the ufunc call entirely and just returning a fully masked output array. To do that, we would need to use `output_sizes`, `output_dims`, `output_dtypes`, and `nodata_output` to construct a correctly shaped and filled output (currently we use the partial result of the ufunc call to construct the output array), but I *think* that should be feasible. 

Since the current solution works, this is probably only worth pursuing if there's a meaningful performance improvement and/or the current 2D dimensionality constraint becomes a limitation elsewhere, *and* if we can't think of other cases where a ufunc would require >1 sample (although you could always set `skip_nodata=False` as a last resort).

@grovduck, let me know if you see any limitations I'm not thinking of if we were to ditch `ensure_min_samples` in favor of this approach. This should probably be pretty low priority since the current solution works, but I wanted to record it as a possibility while I was thinking about it.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Consider removing `ensure_min_samples` and directly returning fully masked arrays #104

Context

Proposed change

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Consider removing ensure_min_samples and directly returning fully masked arrays #104

Description

Context

Proposed change

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions

Consider removing `ensure_min_samples` and directly returning fully masked arrays #104