Description
In various places, we may want to determine the dtype flavor* for newly created columns when they are derived from inputs of multiple flavors. A list of such areas is given at the bottom.
The implementation (a function in core.dtypes) could take an array, Series, or DataFrame and determine the proper flavor. If the input has a single flavor, this is trivial. We would need to decide on a rule for the case of multiple flavors.
Option 1: pyarrow -> nullable -> numpy (or any of the 5 other permutations as suboptions)
If any input is pyarrow-backed, use pyarrow; otherwise, if any input is nullable, use nullable; only use numpy when all columns are numpy. I think we'd want to treat the str dtype (whether backed by pyarrow or numpy) as numpy for this purpose. This might be problematic when the data has only str columns, but I don't see a better alternative.
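A minimal sketch of the Option 1 precedence rule, assuming a hypothetical helper `resolve_flavor` (not an existing pandas API). It collects the dtypes of an array, Series, or DataFrame, classifies each one, and returns the highest-precedence flavor. The classification is an assumption for illustration: `pd.StringDtype` counts as numpy (as proposed above), `pd.ArrowDtype` counts as pyarrow, any other extension dtype whose missing-value sentinel is `pd.NA` counts as nullable, and everything else (including categorical and datetime-tz) counts as numpy.

```python
import numpy as np
import pandas as pd
from pandas.api.extensions import ExtensionDtype

# Option 1 ordering; earlier entries win. Reordering this tuple gives the
# other permutations mentioned as suboptions.
PRECEDENCE = ("pyarrow", "nullable", "numpy")


def _flavor_of(dtype) -> str:
    """Classify one dtype (hypothetical heuristic, not pandas internals)."""
    if isinstance(dtype, pd.StringDtype):
        # str dtype is treated as numpy regardless of its storage backend.
        return "numpy"
    if isinstance(dtype, pd.ArrowDtype):
        return "pyarrow"
    if isinstance(dtype, ExtensionDtype) and dtype.na_value is pd.NA:
        # Masked dtypes such as Int64, Float64, and boolean use pd.NA.
        return "nullable"
    return "numpy"


def _dtypes_of(obj) -> list:
    """Return the dtypes of a DataFrame, or the single dtype of anything else."""
    if isinstance(obj, pd.DataFrame):
        return list(obj.dtypes)
    return [obj.dtype]  # Series, Index, ndarray, or ExtensionArray


def resolve_flavor(obj) -> str:
    """Pick the result flavor for obj using the PRECEDENCE ordering."""
    flavors = {_flavor_of(dt) for dt in _dtypes_of(obj)}
    for flavor in PRECEDENCE:
        if flavor in flavors:
            return flavor
    # No columns at all (e.g. an empty DataFrame): fall back to numpy.
    return "numpy"
```

For example, a DataFrame with one `int64` column and one `Int64` column resolves to `"nullable"`, while a DataFrame whose only columns are str resolves to `"numpy"`, which is the edge case flagged above.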
*flavor may not be the appropriate term here, I recall some objections to its usage in the past.
Areas
- In pivot_table (ref)
- In apply/agg/transform/map when a UDF returns all scalars. Here, the input columns are to be used, and in some cases there can be more than one.