feat: Extend .frame() API with format and query parameters#2322
feat: Extend .frame() API with format and query parameters#2322Premkumar-2004 wants to merge 6 commits intoprobabl-ai:mainfrom
Conversation
f04fdac to
e9b7970
Compare
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
skore/src/skore/_sklearn/_plot/feature_importance/coefficients.py
Outdated
Show resolved
Hide resolved
skore/src/skore/_sklearn/_plot/feature_importance/coefficients.py
Outdated
Show resolved
Hide resolved
skore/src/skore/_sklearn/_plot/feature_importance/coefficients.py
Outdated
Show resolved
Hide resolved
skore/src/skore/_sklearn/_plot/feature_importance/coefficients.py
Outdated
Show resolved
Hide resolved
skore/src/skore/_sklearn/_plot/feature_importance/coefficients.py
Outdated
Show resolved
Hide resolved
This comment was marked as outdated.
This comment was marked as outdated.
skore/tests/unit/displays/coefficients/test_cross_validation.py
Outdated
Show resolved
Hide resolved
skore/tests/unit/displays/coefficients/test_cross_validation.py
Outdated
Show resolved
Hide resolved
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
|
I would prefer that we exclude the Regarding the If we support this case (I would need to test to be sure), then I would prefer to have a |
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
glemaitre
left a comment
There was a problem hiding this comment.
So a couple of thoughts:
- we will strive to not have multi-index in the return frame because it is complex to index
- if we want to have be able to distinguish split/label/output/estimator, we need to prepend some names with what data it is.
- for the comparison, we should further check if the features for each model have the same name (we should have a
_has_same_featuresmethod) and not allow to go further if it is not the case.
So now, my question would be if we really allow wide format when we have more than a single column in the pivot_table because we will end-up with complex name.
@GaetandeCast What do you think.
skore/src/skore/_sklearn/_plot/feature_importance/coefficients.py
Outdated
Show resolved
Hide resolved
skore/src/skore/_sklearn/_plot/feature_importance/coefficients.py
Outdated
Show resolved
Hide resolved
skore/src/skore/_sklearn/_plot/feature_importance/coefficients.py
Outdated
Show resolved
Hide resolved
|
I agree with the points you make @glemaitre and the solutions you propose. To answer your question, the solution might be to use an One could also make a point that in every case, the wide format is easier (but not necessarily easy) to read than long and therefore should be used by default. Let me know what you think. |
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
|
I've completed the changes you requested (removing PR #2346 renamed f Could you please advise on how to proceed? |
|
@Premkumar-2004 I've solved merge conflicts for you. However there still are some unfinished discussions here. Please refrain from making too many changes and asking for reviews until we are settled on those. Let's tackle:
|
|
For the wide format issues: |
|
From a discussion with @GaetandeCast : The most complex case we want to support is a ComparisonReport of CrossValidationReport: from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from skore import train_test_split
from skore import ComparisonReport, CrossValidationReport
X, y = load_iris(return_X_y=True)
split_data = train_test_split(X=X, y=y, random_state=42, as_dict=True)
estimator_1 = LogisticRegression(max_iter=10000, random_state=42)
estimator_2 = LogisticRegression(max_iter=10000, random_state=43)
comparison_report = ComparisonReport(
[CrossValidationReport(estimator_1, X, y), CrossValidationReport(estimator_2, X, y)]
)
comparison_report.inspection.coefficients().frame()with the output as of right now: That table is hard to read because it's so wide, and also hard to query because the columns are a MultiIndex. Some thoughts:
More generally, we need to decide what
I think, along with @GaetandeCast, that we should go for readability. The customizability need is already addressed by So specifically for the Comparison[CV] case, it could look something like There is a case to be made that the As for the default value for Later, the @Premkumar-2004 Thanks for your work on this; I think it would be a bit faster to push on this PR to bring it to the finish line. Hope that's okay with you. Again your work is very much appreciated. |
|
Sure, go ahead! Happy to help and thanks for the guidance on this. Let me know if you need anything else from my side. |
Fix linting issues feat(api): add format, aggregate, and query parameters to CoefficientsDisplay.frame() Merge branch 'main' into issue-2161 feat: Extend .frame() API with format and query parameters feat: Extend .frame() API with format and query parameters Merge branch 'main' into issue-2161 feat: Extend .frame() API with format and query parameters Merge branch 'main' into issue-2161 feat: Extend .frame() API with format and query parameters feat: Extend .frame() API with format and query parameters Merge branch 'main' into issue-2161 Merge branch 'main' into issue-2161 feat: Extend .frame() API with format and query parameters feat: Extend .frame() API with format and query parameters Merge branch 'main' into issue-2161 feat: Extend .frame() API with format and query parameters Merge branch 'main' into issue-2161 feat: Extend .frame() API with format and query parameters Merge branch 'main' into HEAD fix merge fix doctest merge
521278e to
402f219
Compare
|
Caution Some commits in the pull request are not signed, or GitHub is not able to verify the signature. |
Extended
CoefficientsDisplay.frame()with three new parameters:format ("long" | "wide" | "auto")- Controls output shapelong(default): Original format, one row per coefficientwide: Features as rows, labels/splits as columnsauto: Wide for EstimatorReport/CV, long for ComparisonReportaggregate (bool)- For CV reports, showsmean ± stdinstead of individual splitsquery (dict)- Filter by column values, e.g.,{"label": "setosa"}