Skip to content

Add generic Aggregator processor#462

Merged
danielhuppmann merged 15 commits intoIAMconsortium:mainfrom
danielhuppmann:feature/generic-aggregator
Feb 19, 2025
Merged

Add generic Aggregator processor#462
danielhuppmann merged 15 commits intoIAMconsortium:mainfrom
danielhuppmann:feature/generic-aggregator

Conversation

@danielhuppmann
Copy link
Member

@danielhuppmann danielhuppmann commented Jan 22, 2025

This PR is in relation to a request by @jkikstra for a CEDS-emissions-processor that renames and aggregates emissions-variables from IAM results to match the CEDS schema - see the related PR at IAMconsortium/common-definitions#255

The idea is to have a mapping like

- dimension: variable
- aggregate:
  - Target Code:
      - Variable A
      - Variable B

that can then be called like

ceds_mapping = AggregationMapping.from_yaml("ceds-mapping/ceds-variable-mapping.yaml")
ceds_aggregator = Aggregator(mapping=ceds_mapping)

ceds_aggregator.apply(df)

to apply the aggregation-mapping onto an IamDataFrame df.

The method uses the pyam rename() method because that actually works (with summation) on all dimensions. This could be extended later for different aggregation-methods, etc.

This could be useful for @lisahligono as part of the scenario-transfer-routine, where the rename-mapping for regions is implemented as an external yaml file and then just called like shown above as part of the processing-and-transfer.

@danielhuppmann
Copy link
Member Author

Yes, I will still add tests but wanted to hear any comments about feature-design first...

@phackstock
Copy link
Contributor

As per bilateral discussion, I'd say this is a good way forward.
I'd include the dimension attribute in the yaml file.

@dc-almeida
Copy link
Collaborator

Makes sense to me!

@danielhuppmann danielhuppmann force-pushed the feature/generic-aggregator branch from bf621cb to 2e9a6c5 Compare February 7, 2025 08:19
@danielhuppmann danielhuppmann marked this pull request as ready for review February 11, 2025 07:14
@danielhuppmann
Copy link
Member Author

Ready for review @phackstock, I have a few questions from your pedantic wisdom:
I defined an attribue codes for the validate_with_definition() method, but it seems I cannot use codes in the field-validators (even with after-mode)?

Apart from that, I'm not sure about a good way to include the file name in a validation-warning - but maybe we can also leave that for a future clean-up and standardization effort across the package, we are probably not 100% consistent. There are also a few overlaps with the region-processing feature, but I wanted to focus on this PR for now.

I will also add a "rename"-config alternative to work with what @lisahligono implemented in IAMconsortium/common-definitions#272.

I will add documentation in a follow-up PR once #470 is merged.

Copy link
Contributor

@phackstock phackstock left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would say that there's potential to simplify and shorten the code.
I have left some comments where exactly to do so in line.

def validate_with_definition(self, dsd: DataStructureDefinition) -> None:
error = None
# check for codes that are not defined in the codelists
codelist = getattr(dsd, self.dimension, None)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd let the error occur right here if the dimension is not found:

Suggested change
codelist = getattr(dsd, self.dimension, None)
codelist = getattr(dsd, self.dimension)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the current implementation better because this way, I can show the name of the offending file directly in the error message, making it easier to debug in repositories that have many files.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair point, but in this case the error does not really have anything to do with the aggregator processor but rather with the missing dimension in the DataStructueDefinition.

return _codes

def validate_with_definition(self, dsd: DataStructureDefinition) -> None:
error = None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you raise the error below directly, you don't need this

Suggested change
error = None

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As above.

if codelist is None:
error = f"Dimension '{self.dimension}' not found in DataStructureDefinition"
elif invalid := codelist.validate_items(self.codes):
error = (
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
error = (
raise ValueError(f"The following {self.dimension}s are not defined in the "
"DataStructureDefinition:\n - " + "\n - ".join(invalid) + "\nin " + str(self.file) + "")

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As above.

f"The following {self.dimension}s are not defined in the "
"DataStructureDefinition:\n - " + "\n - ".join(invalid)
)
if error:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the error is raised directly, you don't need this anymore

Suggested change
if error:

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As above.

"DataStructureDefinition:\n - " + "\n - ".join(invalid)
)
if error:
raise ValueError(error + "\nin " + str(self.file) + "")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
raise ValueError(error + "\nin " + str(self.file) + "")

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As above.

raise ValueError(error + "\nin " + str(self.file) + "")

@classmethod
def from_file(cls, file: Path | str):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you rename the mapping attribute to aggregate and use dict[str, list[str]] in favor of AggregationItem, I think you should be able to remove this function almost entirely.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tried that, but this would require ditching the AggregationItem class, I guess?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, exactly

@phackstock
Copy link
Contributor

I defined an attribue codes for the validate_with_definition() method, but it seems I cannot use codes in the field-validators (even with after-mode)?

That might not be possible, I'll play around with the code a bit more to see.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before I forget it, I'd suggest to rename to aggregator.py

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, done!

@danielhuppmann
Copy link
Member Author

Thanks for your comments @phackstock!

I renamed file names and modules to aggregator and implemented some suggestions, but I'd push larger refactoring like using standardized error messages to a follow-up PR.

@phackstock phackstock self-requested a review February 17, 2025 11:13
Copy link
Contributor

@phackstock phackstock left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for incorporating the changes @danielhuppmann.
Agreed that we should push for a larger refactor and agreed that this PR is not the time and place for that.
So good to be merged from my side.

@danielhuppmann danielhuppmann merged commit 37979d6 into IAMconsortium:main Feb 19, 2025
11 checks passed
@danielhuppmann danielhuppmann deleted the feature/generic-aggregator branch February 19, 2025 08:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants