[BUG] Fixes #301 and some other issues#354
Conversation
* in pygam.GAM._modelmat added selection of features, edge_knots and dtypes by term so only the necessary features are checked and features for factor terms can be set to 0 or 1 if another term is being checked * in pygam.GAM._flatten_mesh added two lines that set by feature to 1 if not None. This fixes issue of partial dependence of tensor terms being 0 everywhere if by feature is present * in pygam.GAM.partial_dependence removed check_X as right after it is checked in self._modelmat * updated docstring of pygam.GAM._linear_predictor * removed line that does nothing in terms.SplineTerm.build_column
|
looks good to me |
kdergachev
left a comment
There was a problem hiding this comment.
lgtm, but I see ruff complains about single quotes in pygam.py. Will fix in the next commit and there should be no more problems.
|
Seems like me deleting an empty line in gen_imgs.py broke something in tests for windows. I have no idea why that happened. Is that the reason it was left there in the first place? |
@kdergachev The windows tests are flaky. |
| modelmat : sparse matrix of len n_samples | ||
| containing model matrix of the spline basis for selected features | ||
| """ | ||
| # take features, dtypes and edge_knots based on supplied term values |
There was a problem hiding this comment.
OK i like this.
the point is that we only check the features that we need
There was a problem hiding this comment.
Yes, conveniently, check_X already allowed for that. Changing arguments was enough to fix the original issue.
Tests case where a factor term is added as a by feature to another term and its categories do not contain 1 (partial_dependence sets it to 1 so that it does not affect final result, but if check_X checks it an exeption will be thrown).
|
In the last commit added test for the same issue, but if it is caused by a factor term that does not have 1 in its domain that is at the same time a by feature in another term. Not entirely sure it is necessary, but that is what I meant in the first comment to this PR. |
This should fix the problem in #301 and some others that #302 might not, e.g. I believe it might break due to factor feature being set to 1.0 if it is a by feature in another term. Along the way found that by feature in tensor term is not set to 1 anywhere so partial dependence returns all 0. As _meshgrid seems to be written specifically for partial_dependence and generate_X_grid made it set by feature to 1 if not None.