Remove duplicate pd.StringDtype handling in _process_pandas_column #647
base: main
Co-authored-by: paulbkoch <46825734+paulbkoch@users.noreply.github.com>
Codecov Report
✅ All modified and coverable lines are covered by tests.

@@ Coverage Diff @@
## main #647 +/- ##
==========================================
+ Coverage 72.76% 72.77% +0.01%
==========================================
Files 75 75
Lines 10447 10451 +4
==========================================
+ Hits 7602 7606 +4
Misses 2845 2845
@copilot, please re-implement with the latest changes. I think the changes to handle pd.CategoricalDtype should help.
Co-authored-by: paulbkoch <46825734+paulbkoch@users.noreply.github.com>
Re-implemented the pd.StringDtype handling in commit fdf23c3. The implementation follows the same pattern as object dtype and nullable integer/boolean types (Int64Dtype, BooleanDtype).

Could you clarify if there are specific changes to the CategoricalDtype handling that I should incorporate? I don't see any recent commits beyond the grafted base that would indicate what changes you're referring to.
The _process_pandas_column function contained two elif isinstance(dt, pd.StringDtype): blocks, one at lines 1063-1081 and another at 1102-1149.

Changes
- _process_ndarray consistently for both missing and non-missing value cases
- (is_predict → not is_initial to match function signature)

The consolidated implementation handles StringDtype conversion to numpy strings with proper NA handling via .dropna() and .notna() when missing values are present.

Net impact: -48 lines, single code path for StringDtype processing
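For reviewers who want to see the NA-handling idea in isolation, here is a minimal sketch of converting a pd.StringDtype column to numpy strings using .dropna() and .notna(). It is not the code from this PR: the helper name string_series_to_numpy and its return shape (values plus an optional mask) are assumptions made for illustration, and it does not call _process_ndarray.

```python
import numpy as np
import pandas as pd


def string_series_to_numpy(col):
    """Hypothetical helper (not part of the PR).

    Returns (values, not_missing):
      values      - numpy unicode array of the non-missing entries
      not_missing - boolean mask over the original rows, or None
                    when the column has no missing values
    """
    if col.hasnans:
        # Record which rows hold real values before dropping the NA rows.
        not_missing = col.notna().to_numpy(dtype=np.bool_)
        # Go through object dtype first so both string storage backends
        # convert the same way, then cast to numpy unicode strings.
        values = col.dropna().to_numpy(dtype=object).astype(np.str_)
        return values, not_missing
    # No missing values: convert directly, no mask needed.
    return col.to_numpy(dtype=object).astype(np.str_), None


col = pd.Series(["red", None, "blue"], dtype=pd.StringDtype())
values, not_missing = string_series_to_numpy(col)
print(values, not_missing)
```

Returning the mask separately keeps the string array free of NA sentinels, so a single downstream path can handle both the missing and non-missing cases.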