Contributor

Copilot AI commented Jan 27, 2026

The _process_pandas_column function contained two elif isinstance(dt, pd.StringDtype): blocks - one at lines 1063-1081 and another at 1102-1149.

Changes

  • Removed first duplicate block (lines 1063-1081)
  • Simplified remaining block to use _process_ndarray consistently for both missing and non-missing value cases
  • Corrected variable usage (is_predict, not is_initial, to match the function signature)

The consolidated implementation handles StringDtype conversion to numpy strings with proper NA handling via .dropna() and .notna() when missing values are present.
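As a rough illustration of the NA handling described above (not the code from this PR), the sketch below converts a StringDtype column to a numpy string array. The helper name `string_column_to_numpy` and its return shape are assumptions made for the example; only `.hasnans`, `.dropna()`, and `.notna()` are taken from the discussion.

```python
import numpy as np
import pandas as pd


def string_column_to_numpy(col: pd.Series):
    """Hypothetical helper, not the PR's implementation.

    Returns (values, mask):
      values: numpy unicode array holding only the non-missing strings
      mask:   boolean array marking non-missing rows, or None if nothing is missing
    """
    if col.hasnans:
        # Remember which rows had values, then drop the missing ones
        mask = col.notna().to_numpy()
        values = col.dropna().to_numpy(dtype=object).astype(np.str_)
    else:
        mask = None
        values = col.to_numpy(dtype=object).astype(np.str_)
    return values, mask


col = pd.Series(["a", None, "b", "a"], dtype="string")
values, mask = string_column_to_numpy(col)
```

Dropping the missing rows before the conversion keeps pd.NA from being stringified into the array, and the mask lets a caller put results back at the original row positions.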

Net impact: -48 lines, single code path for StringDtype processing



Co-authored-by: paulbkoch <46825734+paulbkoch@users.noreply.github.com>
Copilot AI changed the title [WIP] Implement handling of pd.StringDtype Implement pd.StringDtype handling Jan 27, 2026
Copilot AI requested a review from paulbkoch January 27, 2026 09:16
@codecov

codecov bot commented Jan 27, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 72.77%. Comparing base (766b663) to head (cf10959).

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #647      +/-   ##
==========================================
+ Coverage   72.76%   72.77%   +0.01%     
==========================================
  Files          75       75              
  Lines       10447    10451       +4     
==========================================
+ Hits         7602     7606       +4     
  Misses       2845     2845              
Flag Coverage Δ
bdist_linux_310_python 72.46% <100.00%> (+0.07%) ⬆️
bdist_linux_311_python 72.41% <100.00%> (-0.06%) ⬇️
bdist_linux_312_python 72.48% <100.00%> (+0.12%) ⬆️
bdist_linux_313_python 72.41% <100.00%> (+0.02%) ⬆️
bdist_mac_310_python 72.60% <100.00%> (+0.04%) ⬆️
bdist_mac_311_python 72.65% <100.00%> (+0.10%) ⬆️
bdist_mac_312_python 72.65% <100.00%> (+0.09%) ⬆️
bdist_mac_313_python 72.63% <100.00%> (+0.02%) ⬆️
bdist_win_310_python 72.67% <100.00%> (+0.01%) ⬆️
bdist_win_311_python 72.69% <100.00%> (+0.02%) ⬆️
bdist_win_312_python 72.67% <100.00%> (+0.09%) ⬆️
bdist_win_313_python 72.60% <100.00%> (+0.01%) ⬆️
sdist_linux_310_python 72.44% <100.00%> (+0.09%) ⬆️
sdist_linux_311_python 72.42% <100.00%> (+0.09%) ⬆️
sdist_linux_312_python 72.32% <100.00%> (+<0.01%) ⬆️
sdist_linux_313_python 72.42% <100.00%> (+0.01%) ⬆️
sdist_mac_310_python 72.53% <100.00%> (-0.03%) ⬇️
sdist_mac_311_python 72.55% <100.00%> (+0.17%) ⬆️
sdist_mac_312_python 72.55% <100.00%> (-0.01%) ⬇️
sdist_mac_313_python 72.55% <100.00%> (+0.01%) ⬆️
sdist_win_310_python 72.67% <100.00%> (+0.09%) ⬆️
sdist_win_311_python 72.60% <100.00%> (+0.06%) ⬆️
sdist_win_312_python 72.67% <100.00%> (+0.07%) ⬆️
sdist_win_313_python 72.67% <100.00%> (+0.09%) ⬆️


@paulbkoch
Collaborator

@copilot, please re-implement with the latest changes. I think the changes to handle pd.CategoricalDtype should help

Co-authored-by: paulbkoch <46825734+paulbkoch@users.noreply.github.com>
Contributor Author

Copilot AI commented Feb 4, 2026

@copilot, please re-implement with the latest changes. I think the changes to handle pd.CategoricalDtype should help

Re-implemented the pd.StringDtype handling in commit fdf23c3. The implementation follows the same pattern as object dtype and nullable integer/boolean types (Int64Dtype, BooleanDtype), using .hasnans, .dropna(), and .notna() for proper NA handling. The StringDtype values are converted to numpy string arrays and processed through _process_ndarray for categorical encoding.
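To make the "numpy string array, then categorical encoding" flow concrete, here is a minimal sketch. The internals of _process_ndarray are not shown in this thread, so `np.unique` is used as a stand-in; the function name `encode_string_column` and the convention of reserving code 0 for missing values are assumptions for illustration only.

```python
import numpy as np
import pandas as pd


def encode_string_column(col: pd.Series):
    # Hypothetical stand-in for the encoding step; NOT the project's _process_ndarray.
    present = col.notna().to_numpy()
    # Convert only the non-missing entries to a numpy unicode array
    strings = col[present].to_numpy(dtype=object).astype(np.str_)
    # Sorted unique categories plus each value's index into them
    categories, inverse = np.unique(strings, return_inverse=True)
    # Assumed convention for this sketch: code 0 marks a missing value
    codes = np.zeros(len(col), dtype=np.int64)
    codes[present] = inverse + 1
    return categories, codes


col = pd.Series(["b", None, "a", "b"], dtype="string")
categories, codes = encode_string_column(col)
```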

Could you clarify if there are specific changes to the CategoricalDtype handling that I should incorporate? I don't see any recent commits beyond the grafted base that would indicate what changes you're referring to.

Co-authored-by: paulbkoch <46825734+paulbkoch@users.noreply.github.com>
Copilot AI changed the title Implement pd.StringDtype handling Optimize pd.StringDtype handling using pd.factorize Feb 4, 2026
Co-authored-by: paulbkoch <46825734+paulbkoch@users.noreply.github.com>
Copilot AI changed the title Optimize pd.StringDtype handling using pd.factorize Remove duplicate pd.StringDtype handling in _process_pandas_column Feb 4, 2026

2 participants