[wip] Support arrow-backed pandas categorical columns. #11801
trivialfis wants to merge 3 commits into dmlc:master
c_typ = pa.DictionaryArray.from_arrays(
    pa.array([0, 1, 2]),
    pa.array(["cdef", "abc", "def"], type=pa.large_utf8()),
)
c_ser = pd.Series(c_typ, dtype=pd.ArrowDtype(c_typ.type))
When constructing c_ser from c_typ, I would suggest doing something like:

    c_array = pd.arrays.ArrowExtensionArray(c_typ)
    c_ser = pd.Series(c_array)
Thank you for pointing that out. I need to create a new test; this one doesn't work because arrow dictionary-backed columns have no cat attribute (the existing test relies on this interface).
Yeah, there currently isn't great support for Arrow-based categorical data in pandas. I think pandas should be able to hold the data, and some common APIs across all types may be supported, but categorical-specific APIs are not currently supported.
Thank you for sharing, noted. I will watch for pandas' future updates.
Don't merge. I'm not sure if we need to support it at the moment, as arrow-backed data is still experimental in pandas, and the documentation on how to create a categorical feature is limited.