ENH: Determination of result "flavor" when based on multiple input flavors #64005

@rhshadrach

In various places, we may want to determine the dtype flavor* for newly created columns when based on multiple input flavors. A list of areas is at the bottom.

The implementation (a function in core.dtypes) could take an array, Series, or DataFrame and determine the proper flavor. If the input is a single flavor, then this is trivial. We'd need to decide what the rule is in case of multiple flavors.

Option 1: pyarrow -> nullable -> numpy (or any of the 5 other permutations as suboptions)

If any input is pyarrow, go with pyarrow; if there is no pyarrow but some input is nullable, go with nullable; go with numpy only if all columns are numpy. I think we'd want to consider str dtype (whether backed by pyarrow or numpy) as being numpy for the purposes of this. This might be problematic when the data has only str columns, but I don't see a better alternative.
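A rough sketch of how the Option 1 rule could look, just to make the precedence concrete. Everything here is hypothetical: `Flavor`, `classify`, and `result_flavor` are illustrative names and not existing pandas API, and classification is done on dtype name strings (assuming names like `int64[pyarrow]`, `Int64`, and `str`) purely so the example is self-contained. A real implementation would presumably inspect the dtype objects instead.

```python
from enum import IntEnum
from typing import Iterable


class Flavor(IntEnum):
    # Higher value wins under Option 1 (pyarrow -> nullable -> numpy).
    NUMPY = 0
    NULLABLE = 1
    PYARROW = 2


# Assumed names of the nullable (masked) dtypes; illustrative only.
_NULLABLE_NAMES = (
    {"boolean", "Float32", "Float64"}
    | {f"{prefix}{bits}" for prefix in ("Int", "UInt") for bits in (8, 16, 32, 64)}
)


def classify(dtype_name: str) -> Flavor:
    """Map a dtype name to a flavor (hypothetical helper)."""
    # Per the proposal, str dtype counts as numpy regardless of its backing.
    if dtype_name == "str":
        return Flavor.NUMPY
    if dtype_name.endswith("[pyarrow]"):
        return Flavor.PYARROW
    if dtype_name in _NULLABLE_NAMES:
        return Flavor.NULLABLE
    return Flavor.NUMPY


def result_flavor(dtype_names: Iterable[str]) -> Flavor:
    """Pick the result flavor from the input dtype names using Option 1."""
    flavors = [classify(name) for name in dtype_names]
    if not flavors:
        raise ValueError("need at least one input dtype")
    # Any pyarrow wins, then any nullable; numpy only if all inputs are numpy.
    return max(flavors)


mixed = result_flavor(["int64", "Int64", "float64[pyarrow]"])
only_str = result_flavor(["str"])
```

With these assumptions, `mixed` is `Flavor.PYARROW`, and `only_str` is `Flavor.NUMPY`, which is the all-str case flagged above as possibly problematic. Any of the other five permutations would only change the ordering of the enum values.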

*flavor may not be the appropriate term here; I recall some objections to its usage in the past.

Areas
  • In pivot_table ref
  • In apply/agg/transform/map when a UDF is returning all scalars. Here, the flavor would be determined from the input columns, of which there can in some cases be more than one.
