Add generic Aggregator processor#462
Conversation
|
Yes, I will still add tests but wanted to hear any comments about feature-design first... |
|
As per bilateral discussion, I'd say this is a good way forward. |
|
Makes sense to me! |
bf621cb to
2e9a6c5
Compare
|
Ready for review @phackstock, I have a few questions from your pedantic wisdom: Apart from that, I'm not sure about a good way to include the file name in a validation-warning - but maybe we can also leave that for a future clean-up and standardization effort across the package, we are probably not 100% consistent. There are also a few overlaps with the region-processing feature, but I wanted to focus on this PR for now. I will also add a "rename"-config alternative to work with what @lisahligono implemented in IAMconsortium/common-definitions#272. I will add documentation in a follow-up PR once #470 is merged. |
phackstock
left a comment
There was a problem hiding this comment.
I would say that there's potential to simplify and shorten the code.
I have left some comments where exactly to do so in line.
| def validate_with_definition(self, dsd: DataStructureDefinition) -> None: | ||
| error = None | ||
| # check for codes that are not defined in the codelists | ||
| codelist = getattr(dsd, self.dimension, None) |
There was a problem hiding this comment.
I'd let the error occur right here if the dimension is not found:
| codelist = getattr(dsd, self.dimension, None) | |
| codelist = getattr(dsd, self.dimension) |
There was a problem hiding this comment.
I like the current implementation better because this way, I can show the name of the offending file directly in the error message, making it easier to debug in repositories that have many files.
There was a problem hiding this comment.
Fair point, but in this case the error does not really have anything to do with the aggregator processor but rather with the missing dimension in the DataStructueDefinition.
| return _codes | ||
|
|
||
| def validate_with_definition(self, dsd: DataStructureDefinition) -> None: | ||
| error = None |
There was a problem hiding this comment.
If you raise the error below directly, you don't need this
| error = None |
| if codelist is None: | ||
| error = f"Dimension '{self.dimension}' not found in DataStructureDefinition" | ||
| elif invalid := codelist.validate_items(self.codes): | ||
| error = ( |
There was a problem hiding this comment.
| error = ( | |
| raise ValueError(f"The following {self.dimension}s are not defined in the " | |
| "DataStructureDefinition:\n - " + "\n - ".join(invalid) + "\nin " + str(self.file) + "") | |
| f"The following {self.dimension}s are not defined in the " | ||
| "DataStructureDefinition:\n - " + "\n - ".join(invalid) | ||
| ) | ||
| if error: |
There was a problem hiding this comment.
If the error is raised directly, you don't need this anymore
| if error: |
| "DataStructureDefinition:\n - " + "\n - ".join(invalid) | ||
| ) | ||
| if error: | ||
| raise ValueError(error + "\nin " + str(self.file) + "") |
There was a problem hiding this comment.
| raise ValueError(error + "\nin " + str(self.file) + "") |
| raise ValueError(error + "\nin " + str(self.file) + "") | ||
|
|
||
| @classmethod | ||
| def from_file(cls, file: Path | str): |
There was a problem hiding this comment.
If you rename the mapping attribute to aggregate and use dict[str, list[str]] in favor of AggregationItem, I think you should be able to remove this function almost entirely.
There was a problem hiding this comment.
Tried that, but this would require ditching the AggregationItem class, I guess?
That might not be possible, I'll play around with the code a bit more to see. |
There was a problem hiding this comment.
Before I forget it, I'd suggest to rename to aggregator.py
|
Thanks for your comments @phackstock! I renamed file names and modules to aggregator and implemented some suggestions, but I'd push larger refactoring like using standardized error messages to a follow-up PR. |
phackstock
left a comment
There was a problem hiding this comment.
Thanks for incorporating the changes @danielhuppmann.
Agreed that we should push for a larger refactor and agreed that this PR is not the time and place for that.
So good to be merged from my side.
This PR is in relation to a request by @jkikstra for a CEDS-emissions-processor that renames and aggregates emissions-variables from IAM results to match the CEDS schema - see the related PR at IAMconsortium/common-definitions#255
The idea is to have a mapping like
that can then be called like
to apply the aggregation-mapping onto an IamDataFrame
df.The method uses the pyam
rename()method because that actually works (with summation) on all dimensions. This could be extended later for different aggregation-methods, etc.This could be useful for @lisahligono as part of the scenario-transfer-routine, where the rename-mapping for regions is implemented as an external yaml file and then just called like shown above as part of the processing-and-transfer.