[ENH] add feature scaling support for EncoderDecoderDataModule #1983
PranavBhatP wants to merge 40 commits into sktime:main

Conversation
Codecov Report

❌ Patch coverage is

Additional details and impacted files

```
@@           Coverage Diff            @@
##             main    #1983   +/-   ##
========================================
  Coverage        ?   86.63%
========================================
  Files           ?      165
  Lines           ?     9919
  Branches        ?        0
========================================
  Hits            ?     8593
  Misses          ?     1326
  Partials        ?        0
```
Tests are failing on the notebook. It seems to be an issue caused by the training script in the notebook. I will try changing `accelerator = "cpu"`; that might resolve the issue.
I think we could look at the v1 implementation? Maybe that could help us with this issue, how the
Yes, we will need a design doc to analyse the v1 implementation. This is the best way forward.
agree - it seems like one more rabbit hole... Is there perhaps something we can learn from
I've replaced the loss function used in the notebook with
fkiraly
left a comment
May I request an explanation why the notebook is running prior to the changes but now failing?
I did not get your question. Are you referring to why the tests were failing prior to the changes in the most recent commits? It is passing now. For context, the tests were failing on the notebook because training simply does not seem to work when we use in-house PTF metrics. More context on the initial failing test: https://github.com/sktime/pytorch-forecasting/actions/runs/18570677570/job/52943552548 However, the pipeline works perfectly fine when we use
If we want to be able to work with data that cannot fit in memory, the scalers
@agobbifbk I say we merge with whatever we have (after review) and extend scaling via

Ok!
@fkiraly @phoeenniixx @agobbifbk I would appreciate another review; this PR is complete from my side, do take a look whenever you're free. Thanks.
Thanks!
Please add the complete docstrings in the functions, and also remove the TODOs that are met by this PR.
I also have some suggestions:
- I think we should also add support for `GroupNormalizer` and `MultiNormalizer` of ptf (see the comment below).
- I am a little worried about the `test` stage of the datamodule here: if we are NOT using the same datamodule that we used for fitting, the code may break, as the fitted scalers etc. are not present?
  We should also think of saving the scalers and transformers in this case? I think this is where the `pkg` layer comes in? I think we need some discussion about this?

What do you think? Also FYI @fkiraly @agobbifbk
```python
elif stage == "test":
    if not hasattr(self, "test_dataset"):
        self._test_preprocessed = self._preprocess_all_data(self._test_indices)
```
If we are creating a new datamodule for testing (not using the dm that we used for fit), will this break, as we don't have the fitted scalers etc. here?
Yes, it will break, thanks for pointing that out; we need to figure out a way to persist the scalers.
I have fixed it; it works as intended in the latest version, please review it when possible.
Yes, that is a valid concern; we need to globalise the transforms/scalers used. We also need to figure out how to save the state across data modules. For now it throws a
Looks like the lack of fitting will lead to failures; we have to add persistence before we proceed.
@phoeenniixx Could you give a rough mental idea about how the package layer would work on this? I am trying to come up with something and a reference would be great!
I have added saving and loading of scalers. The PR is complete with respect to the current requirements.
Nice one!
Maybe we need a torch version of
Adding MinMaxScaler also makes perfect sense; it would be a good addition in a future PR (good first issue?)
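As a rough illustration of the follow-up idea, here is a minimal torch-native MinMaxScaler sketch (hypothetical, not part of this PR): it mirrors sklearn's `fit`/`transform` interface but keeps everything in tensors, so no conversion to numpy/pandas is needed.

```python
# Hypothetical torch-native MinMaxScaler, scaling each column to [0, 1].
import torch

class TorchMinMaxScaler:
    def fit(self, x: torch.Tensor) -> "TorchMinMaxScaler":
        self.min_ = x.min(dim=0).values
        self.max_ = x.max(dim=0).values
        return self

    def transform(self, x: torch.Tensor) -> torch.Tensor:
        # clamp the denominator to avoid division by zero on constant features
        scale = (self.max_ - self.min_).clamp_min(1e-12)
        return (x - self.min_) / scale

    def inverse_transform(self, x: torch.Tensor) -> torch.Tensor:
        scale = (self.max_ - self.min_).clamp_min(1e-12)
        return x * scale + self.min_

x = torch.tensor([[1.0, 10.0], [3.0, 30.0], [5.0, 50.0]])
scaled = TorchMinMaxScaler().fit(x).transform(x)
print(scaled)  # each column scaled to [0, 1]
```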
@phoeenniixx @fkiraly I would like a review. |
phoeenniixx
left a comment
Thanks! I have added a few suggestions related to loading and saving scalers.
```python
if self.datamodule is not None:
    self.datamodule.time_series_dataset = data
    self.datamodule.setup(stage="predict")
    return self.datamodule.predict_dataloader()
```
Why aren't we using the dm from the next line (line 181) here? I think we could directly use that to set up the predict stage? (See line 193)
The reason for this (according to my interpretation): the dm in line 181 is a new data module, created from scratch using the input data, which in this case is the data you want to generate predictions for. But if the datamodule has already been set up for training and has fitted scalers, there is no necessity to load scalers from a saved state; you simply set the dataset to the data you want to predict on and call setup(stage="predict") on it, right? It also makes the process more efficient, without having to create new dataloaders every time we reuse the same datamodule in the predict stage.
But there is also a tiny concern: since we are replacing the training data entirely, we won't have access to the old data. But I don't see how this would cause any issues.
I hope this logic seems alright; if there is anything wrong, do let me know.
Maybe instead of overriding the data completely, I could create a copy and then pass the scaler states to the new datamodule, without need for a pickle file.
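A minimal sketch of that alternative (class and attribute names are hypothetical): copy the fitted scaler objects from the trained datamodule into a fresh one, so no pickle file is needed.

```python
# Sketch: transfer fitted scalers into a new datamodule instead of
# overriding the trained datamodule's data or writing a pickle file.
import copy

class DataModuleSketch:
    def __init__(self, data):
        self.time_series_dataset = data
        self.scalers = None

def with_transferred_scalers(fitted_dm, predict_data):
    new_dm = DataModuleSketch(predict_data)
    # deepcopy so a later refit on either datamodule cannot affect the other
    new_dm.scalers = copy.deepcopy(fitted_dm.scalers)
    return new_dm

train_dm = DataModuleSketch(data=[1, 2, 3])
train_dm.scalers = {"target": {"mean": 2.0}}
pred_dm = with_transferred_scalers(train_dm, predict_data=[4, 5])
print(pred_dm.scalers)  # {'target': {'mean': 2.0}}
```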
```python
save_ckpt: bool = True,
ckpt_dir: str | Path = "checkpoints",
ckpt_kwargs: dict[str, Any] | None = None,
save_scalers: bool = True,
```
Should we expose this param? I mean if the user is checkpointing the model, they will always save the scalers, no? Are there any cases when the user only uses the models and not the scalers?
I am not 100% sure about whether this param should be exposed. I was thinking that, for some reason, the user might not want to save their scalers after training, due to memory constraints, or simply because they know they are not going to be testing or using the scalers afterwards in a new session, and hence won't require any kind of persistence?
```python
    return windows


def save_scalers(self, path: str | Path):
```
Should we separate the concerns here?
- The datamodule just does the data loading and the preprocessing, as intended.
- The `pkg` layer actually saves and loads these scalers?

What I mean is that the logic of saving and loading should be moved to the pkg layer. The datamodule just returns the scaler instances and accepts scaler instances; the pkg layer saves and loads the scalers and then sends them to the datamodule as instances.
What do you think?
Reference Issues/PRs
fixes #1979
What does this implement/fix? Explain your changes.
Implements feature scaling for continuous targets and features in the D2 layer, using the input `scalers` and `target_normalizer` params inside the `EncoderDecoderDataModule`.

Current state of this PR:

- `preprocess_data` handles scaling of the target and continuous features using `TorchNormalizer`, `EncoderNormalizer`, `StandardScaler` and `RobustScaler`.
- `EncoderNormalizer`, unlike the others, needs to be fit on every encoding sequence and transform the sequence "on-the-fly". The logic for fitting and normalizing here is separate and is done inside `__getitem__` when the D2 dataset feeds data into the model during training.

Issues (EDIT: resolved):

- When converting tensors (from the `TimeSeries` D1 object) into pandas/numpy data, the computation graph is broken. Refer to the comment.
  UPDATE: THIS HAS BEEN FIXED. We move preprocessing to the datamodule level (during `setup()`) instead of calling the transform inside the D2 dataset's `__getitem__`. This ensures we don't break the computation graph, by moving the normalization pipeline completely away from the actual training stage and performing it during the data module's setup stage.
  DRAWBACK: We cannot work with larger-than-memory data formats with this method. The trade-off here is losing compatibility with sklearn scalers.
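The two scaling paths described above can be sketched as follows (a simplified illustration, not the actual D2 dataset code; names like `WindowDataset` and `standardize` are hypothetical): global scalers are applied once up front, as done at datamodule `setup()` time, while an `EncoderNormalizer`-style scaler is refit on each encoder window inside `__getitem__`.

```python
# Sketch: setup()-time global scaling vs. EncoderNormalizer-style
# per-window, on-the-fly scaling inside __getitem__.
import torch
from torch.utils.data import Dataset

def standardize(x: torch.Tensor) -> torch.Tensor:
    return (x - x.mean()) / x.std().clamp_min(1e-8)

class WindowDataset(Dataset):
    def __init__(self, series: torch.Tensor, encoder_length: int, per_window: bool):
        self.series = series
        self.encoder_length = encoder_length
        self.per_window = per_window
        if not per_window:
            # setup()-time path: one global normalization, done once up front
            self.series = standardize(series)

    def __len__(self):
        return len(self.series) - self.encoder_length

    def __getitem__(self, idx):
        window = self.series[idx : idx + self.encoder_length]
        if self.per_window:
            # EncoderNormalizer-style path: fit on this window on-the-fly
            window = standardize(window)
        return window

ds = WindowDataset(torch.arange(10.0), encoder_length=4, per_window=True)
print(ds[0])  # each window standardized independently
```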
What should a reviewer concentrate their feedback on?
PR checklist
`pre-commit install`. To run hooks independent of commit, execute `pre-commit run --all-files`.