Remove duplicate pd.StringDtype handling in _process_pandas_column #647

Copilot · 2026-01-27T09:02:58Z

The _process_pandas_column function contained two elif isinstance(dt, pd.StringDtype): blocks: one at lines 1063-1081 and another at lines 1102-1149.
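To illustrate why the duplication matters, here is a minimal sketch that assumes both blocks sat in the same if/elif chain. The function `describe_dtype` is a hypothetical stand-in, not the real `_process_pandas_column`; it only shows that a second branch with an identical condition can never run.

```python
import pandas as pd


def describe_dtype(dt):
    # Hypothetical stand-in for the dtype dispatch; not the library's code.
    if isinstance(dt, pd.CategoricalDtype):
        return "categorical"
    elif isinstance(dt, pd.StringDtype):
        return "string (first block)"
    elif isinstance(dt, pd.StringDtype):
        # Never reached: the identical condition above always matches first.
        return "string (second block)"
    return "other"


result = describe_dtype(pd.StringDtype())
```

Under that assumption, any fix made only in the second block would have had no effect until the first was removed.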

Changes

- Removed first duplicate block (lines 1063-1081)
- Simplified remaining block to use _process_ndarray consistently for both missing and non-missing value cases
- Corrected variable usage (is_predict → not is_initial to match function signature)

The consolidated implementation handles StringDtype conversion to numpy strings with proper NA handling via .dropna() and .notna() when missing values are present.
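The NA handling described above can be sketched roughly as follows. This is an illustration under assumptions, not the code from this PR: `string_column_to_numpy` is a hypothetical helper, and the choice to return `None` for the mask when nothing is missing is assumed for the example.

```python
import numpy as np
import pandas as pd


def string_column_to_numpy(col):
    """Return (values, mask) for a StringDtype column.

    values: numpy unicode array holding only the non-missing entries.
    mask:   boolean array marking non-missing rows, or None if none are missing.
    """
    if col.hasnans:
        mask = col.notna().to_numpy(dtype=bool)
        # Drop pd.NA first so the conversion never stringifies a missing value.
        values = col.dropna().to_numpy(dtype=object).astype(np.str_)
    else:
        mask = None
        values = col.to_numpy(dtype=object).astype(np.str_)
    return values, mask


col = pd.Series(["a", None, "b", "a"], dtype="string")
values, mask = string_column_to_numpy(col)
```

Keeping the mask separate from the dense values lets a single downstream routine handle both the missing and non-missing cases.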

Net impact: -48 lines, single code path for StringDtype processing

Co-authored-by: paulbkoch <46825734+paulbkoch@users.noreply.github.com>

codecov · 2026-01-27T10:29:22Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 72.77%. Comparing base (766b663) to head (cf10959).

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #647      +/-   ##
==========================================
+ Coverage   72.76%   72.77%   +0.01%     
==========================================
  Files          75       75              
  Lines       10447    10451       +4     
==========================================
+ Hits         7602     7606       +4     
  Misses       2845     2845

Flag	Coverage Δ
bdist_linux_310_python	`72.46% <100.00%> (+0.07%)`	⬆️
bdist_linux_311_python	`72.41% <100.00%> (-0.06%)`	⬇️
bdist_linux_312_python	`72.48% <100.00%> (+0.12%)`	⬆️
bdist_linux_313_python	`72.41% <100.00%> (+0.02%)`	⬆️
bdist_mac_310_python	`72.60% <100.00%> (+0.04%)`	⬆️
bdist_mac_311_python	`72.65% <100.00%> (+0.10%)`	⬆️
bdist_mac_312_python	`72.65% <100.00%> (+0.09%)`	⬆️
bdist_mac_313_python	`72.63% <100.00%> (+0.02%)`	⬆️
bdist_win_310_python	`72.67% <100.00%> (+0.01%)`	⬆️
bdist_win_311_python	`72.69% <100.00%> (+0.02%)`	⬆️
bdist_win_312_python	`72.67% <100.00%> (+0.09%)`	⬆️
bdist_win_313_python	`72.60% <100.00%> (+0.01%)`	⬆️
sdist_linux_310_python	`72.44% <100.00%> (+0.09%)`	⬆️
sdist_linux_311_python	`72.42% <100.00%> (+0.09%)`	⬆️
sdist_linux_312_python	`72.32% <100.00%> (+<0.01%)`	⬆️
sdist_linux_313_python	`72.42% <100.00%> (+0.01%)`	⬆️
sdist_mac_310_python	`72.53% <100.00%> (-0.03%)`	⬇️
sdist_mac_311_python	`72.55% <100.00%> (+0.17%)`	⬆️
sdist_mac_312_python	`72.55% <100.00%> (-0.01%)`	⬇️
sdist_mac_313_python	`72.55% <100.00%> (+0.01%)`	⬆️
sdist_win_310_python	`72.67% <100.00%> (+0.09%)`	⬆️
sdist_win_311_python	`72.60% <100.00%> (+0.06%)`	⬆️
sdist_win_312_python	`72.67% <100.00%> (+0.07%)`	⬆️
sdist_win_313_python	`72.67% <100.00%> (+0.09%)`	⬆️

paulbkoch · 2026-02-04T04:15:14Z

@copilot, please re-implement with the latest changes. I think the changes to handle pd.CategoricalDtype should help.

Co-authored-by: paulbkoch <46825734+paulbkoch@users.noreply.github.com>

Copilot · 2026-02-04T04:20:09Z

> @copilot, please re-implement with the latest changes. I think the changes to handle pd.CategoricalDtype should help

Re-implemented the pd.StringDtype handling in commit fdf23c3. The implementation follows the same pattern as object dtype and nullable integer/boolean types (Int64Dtype, BooleanDtype), using .hasnans, .dropna(), and .notna() for proper NA handling. The StringDtype values are converted to numpy string arrays and processed through _process_ndarray for categorical encoding.
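As a rough picture of the final step, the sketch below encodes a dense numpy string array as integer categories and scatters the codes back to full length using the presence mask. It is not the library's `_process_ndarray`: `encode_with_mask` is a hypothetical helper, and reserving code 0 for missing values is an assumption made for the example.

```python
import numpy as np


def encode_with_mask(values, mask):
    # np.unique returns the sorted categories and each value's index into them.
    categories, codes = np.unique(values, return_inverse=True)
    codes = codes.astype(np.int64) + 1  # assumption: 0 is reserved for missing
    if mask is None:
        return categories, codes
    # Scatter codes into the rows that had a value; missing rows stay 0.
    full = np.zeros(len(mask), dtype=np.int64)
    full[mask] = codes
    return categories, full


values = np.array(["b", "a", "b"], dtype=np.str_)
mask = np.array([True, False, True, True])
categories, encoded = encode_with_mask(values, mask)
```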

Could you clarify if there are specific changes to the CategoricalDtype handling that I should incorporate? I don't see any recent commits beyond the grafted base that would indicate what changes you're referring to.

Co-authored-by: paulbkoch <46825734+paulbkoch@users.noreply.github.com>

Initial plan

6018cf8

Copilot AI assigned Copilot and paulbkoch Jan 27, 2026

Copilot started work on behalf of paulbkoch January 27, 2026 09:03

Implement pd.StringDtype handling in _clean_x.py and add test

cf10959

Co-authored-by: paulbkoch <46825734+paulbkoch@users.noreply.github.com>

Copilot AI changed the title ~~[WIP] Implement handling of pd.StringDtype~~ Implement pd.StringDtype handling Jan 27, 2026

Copilot AI requested a review from paulbkoch January 27, 2026 09:16

Copilot finished work on behalf of paulbkoch January 27, 2026 09:16

Copilot started work on behalf of paulbkoch February 4, 2026 04:15

Re-implement pd.StringDtype handling following the established pattern

fdf23c3

Co-authored-by: paulbkoch <46825734+paulbkoch@users.noreply.github.com>

Copilot finished work on behalf of paulbkoch February 4, 2026 04:21

Copilot started work on behalf of paulbkoch February 4, 2026 04:27

Add optimized pd.StringDtype handling using pd.factorize

c6ae93d

Co-authored-by: paulbkoch <46825734+paulbkoch@users.noreply.github.com>

Copilot AI changed the title ~~Implement pd.StringDtype handling~~ Optimize pd.StringDtype handling using pd.factorize Feb 4, 2026

Copilot finished work on behalf of paulbkoch February 4, 2026 04:36

Copilot started work on behalf of paulbkoch February 4, 2026 04:42

Remove duplicate pd.StringDtype handling block

c6d25eb

Co-authored-by: paulbkoch <46825734+paulbkoch@users.noreply.github.com>

Copilot AI changed the title ~~Optimize pd.StringDtype handling using pd.factorize~~ Remove duplicate pd.StringDtype handling in _process_pandas_column Feb 4, 2026

Copilot finished work on behalf of paulbkoch February 4, 2026 04:48
