Releases: singularity-energy/open-grid-emissions

v0.7.1

22 Jan 18:01
f387a4c

This patch release fixes an issue caused by the renaming of a column in the PUDL CEMS parquet file that OGE reads data from. Specifically, PUDL renamed the steam_load_1000_lbs column to steam_load_lbs, which broke our data loading functions.
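For downstream code that may encounter CEMS files written before or after the rename, one option is to normalize the column label on load. The following is a minimal sketch, not the fix that OGE applied: `normalize_steam_load` is a hypothetical helper, and it changes only the column label, leaving the values untouched.

```python
import pandas as pd

OLD_COL = "steam_load_1000_lbs"  # column name before the PUDL rename
NEW_COL = "steam_load_lbs"       # column name after the PUDL rename


def normalize_steam_load(cems: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of `cems` that always exposes the new column name.

    Hypothetical helper: only the label changes, values are not rescaled.
    """
    if NEW_COL in cems.columns:
        return cems.copy()
    if OLD_COL in cems.columns:
        return cems.rename(columns={OLD_COL: NEW_COL})
    raise KeyError(f"neither {OLD_COL!r} nor {NEW_COL!r} found in CEMS data")


# Toy frames standing in for CEMS data read before and after the rename.
old_style = pd.DataFrame({"plant_id_eia": [1], OLD_COL: [12.5]})
new_style = pd.DataFrame({"plant_id_eia": [1], NEW_COL: [12.5]})
```

Failing loudly when neither name is present surfaces any future rename at load time rather than deep in the pipeline.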

What's Changed

Full Changelog: v0.7.0...v0.7.1

v0.7.0

23 Dec 18:37
9bfd60e

This is a new major release that includes new data for 2024 and several enhancements compared to v0.6.1.

2024 Data Release

OGE now includes data for 2024, based on the final release data from EIA (Forms 860, 923, and 930) and EPA (Continuous Emissions Monitoring System data). Along with the new 2024 data, the existing 2005-2023 OGE data has been updated with the latest methodological improvements.

Improvements

EIA-930 data

In previous years, we used raw data downloads from EIA Form 930 (Hourly and Daily Balancing Authority Operations Report) in our pipeline. This data source has now been integrated into PUDL, so we have switched to PUDL's version of the data. As with our other data inputs, we rely on PUDL to clean and organize EIA-930 data into well-modeled tables that facilitate downstream analysis.

EIA recently updated its Form 930 data to add more detailed fuel and generation categories, especially for renewables and storage. The new categories distinguish between battery storage, solar (with or without integrated battery), wind (with or without integrated battery), geothermal, pumped storage (separate from hydro), and other storage, which makes it easier to track how these resources are integrated into the grid hour by hour. However, to date, only certain balancing areas have started using the new fuel categories. For this reason, in this release we map the new fuel types to the existing ones.
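As a rough illustration of what such a mapping involves, the sketch below collapses detailed categories into broader ones and sums their generation. The category labels and the choice of target for each category are assumptions made for this example. They are not the labels used by EIA or PUDL, and not necessarily the mapping that OGE applies.

```python
# Hypothetical labels and targets, for illustration only.
NEW_TO_EXISTING_FUEL = {
    "solar_with_battery": "solar",
    "solar_without_battery": "solar",
    "wind_with_battery": "wind",
    "wind_without_battery": "wind",
    "pumped_storage": "hydro",
    "battery_storage": "other",
    "other_storage": "other",
    "geothermal": "other",
}


def map_fuel_category(fuel: str) -> str:
    """Map a detailed category to a broader one; unknown labels pass through."""
    return NEW_TO_EXISTING_FUEL.get(fuel, fuel)


def collapse_generation(generation_by_fuel: dict[str, float]) -> dict[str, float]:
    """Sum generation (MWh) after mapping each category to its broader group."""
    totals: dict[str, float] = {}
    for fuel, mwh in generation_by_fuel.items():
        key = map_fuel_category(fuel)
        totals[key] = totals.get(key, 0.0) + mwh
    return totals


example = collapse_generation(
    {"solar": 10.0, "solar_with_battery": 5.0, "pumped_storage": 2.0, "hydro": 3.0}
)
```

Because unmapped labels pass through unchanged, balancing areas that still report the older categories are unaffected.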

Enhancements for non-local data inputs

For projects that use oge as a dependency or use functions relying on PUDL data, it was previously necessary to download a local copy of PUDL's (multi-GB) sqlite database, since dataframes cannot be read directly from remote sqlite databases. However, PUDL now publishes parquet versions of its tables, which can be read directly from the cloud without downloading a local copy. This version of OGE includes options to read these input files from the cloud (#411).

Fix misallocation of generation and fuel to individual generators

Our data pipeline relies on a process to allocate generation and fuel data reported in EIA-923 to individual generators at each plant. We discovered and fixed a bug that misallocated generation and fuel among the generators at plants where a generator retired or came online during the report year. See catalyst-cooperative/pudl#4789 for more details (note: we currently use a forked version of this code to run this pipeline, so while this fix has not yet been merged in pudl, it has been fixed in our fork).

Consumed emission calculation enhancements

In addition to improving the cleaning of the EIA-930 data that is used as an input to the consumed emissions calculation (#430), we also made a small update to the methodology used to calculate monthly and annual consumed emission rates. Previously, we used implied demand (generation minus interchange) to weight the hourly emission rates when calculating monthly and annual aggregations. However, this approach led to more frequent missing data. With this release, we now use the demand data reported directly by each BA in EIA-930 (#422).
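The weighting step can be sketched as a demand-weighted average of hourly rates. This is a minimal illustration with made-up numbers. The column names and the `demand_weighted_rate` helper are assumptions for this example and do not come from the OGE codebase.

```python
import pandas as pd

# Toy hourly data for one BA: emission rate (lb/MWh) and reported demand (MWh).
hourly = pd.DataFrame(
    {
        "month": [1, 1, 1, 2, 2],
        "co2_rate_lb_per_mwh": [800.0, 1000.0, 600.0, 500.0, 700.0],
        "demand_mwh": [100.0, 300.0, 100.0, 200.0, 200.0],
    }
)


def demand_weighted_rate(df: pd.DataFrame, by: str) -> pd.Series:
    """Aggregate hourly rates to the `by` period, weighting each hour by demand."""
    weighted_sum = (df["co2_rate_lb_per_mwh"] * df["demand_mwh"]).groupby(df[by]).sum()
    total_demand = df.groupby(by)["demand_mwh"].sum()
    return weighted_sum / total_demand


monthly = demand_weighted_rate(hourly, "month")
```

In this toy data, the hour with the largest demand dominates the January result, which is the intended effect of weighting by demand instead of taking a simple mean of the hourly rates.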

Expanded subplant crosswalk

Previously, we did not create subplant IDs for proposed generators that were not far enough along in construction. However, we increasingly work with data that requires information about these generators, so we expanded our subplant crosswalk to include more proposed generators (#428). While we currently use a separate pipeline from PUDL for assigning subplant IDs, we hope to harmonize these processes and rely on PUDL's subplant IDs in a future release (catalyst-cooperative/pudl#3691).

Optimized memory use of data pipeline

Each new generator that gets added to the grid increases the amount of hourly data that we work with each year. We found that, for more recent years, the full OGE pipeline was running into memory (RAM) errors on certain computers, so we refactored some of our code to use memory more efficiently (#419). One larger change we implemented was to drop data for hours when generators were not operating (#432). We found that observations where all operational data (fuel consumption, generation, emissions) were zero accounted for over 2.5GB of data in our pipeline! Removing this data required some additional downstream changes to ensure data completeness in our outputs.
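A filter of this kind might look like the sketch below. The column names and the `drop_non_operating_hours` helper are hypothetical, chosen only to illustrate removing rows in which every operational value is zero.

```python
import pandas as pd

# Hypothetical column names for the operational data.
OPERATING_COLS = ["fuel_consumed_mmbtu", "net_generation_mwh", "co2_mass_lb"]


def drop_non_operating_hours(df: pd.DataFrame) -> pd.DataFrame:
    """Drop rows where all operational columns are exactly zero.

    Rows containing missing values are kept, because NaN == 0 is False.
    """
    all_zero = (df[OPERATING_COLS] == 0).all(axis=1)
    return df.loc[~all_zero].reset_index(drop=True)


hourly = pd.DataFrame(
    {
        "hour": [0, 1, 2, 3],
        "fuel_consumed_mmbtu": [50.0, 0.0, 5.0, float("nan")],
        "net_generation_mwh": [10.0, 0.0, 0.0, 0.0],
        "co2_mass_lb": [6000.0, 0.0, 600.0, 0.0],
    }
)
trimmed = drop_non_operating_hours(hourly)
```

Hour 2 is kept even though generation is zero, since the unit was still burning fuel. Only hour 1, where every value is zero, is removed.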

What's Changed

Full Changelog: v0.6.1...v0.7.0

v0.6.1

26 Sep 22:46
71562f9

This is a patch release which fixes circular imports between several of the modules, and updates package dependencies to address security warnings.

What's Changed

Full Changelog: v0.6.0...v0.6.1

v0.6.0

24 Dec 19:32
6820554

v0.6.0 of OGE includes new data for 2023, a major methodological update, and various other enhancements and bug fixes.

2023 Data Release and Early Release Capability

OGE now includes data for 2023, based on the final release data from EIA and EPA.
In addition, OGE can now ingest "Early Release" data from the EIA, which is typically available several months before the final release data published each autumn.

Aggregating Subplant data rather than Plant data

OGE includes data both at the "plant" level, as well as at the "fleet" and region level (see our documentation for more on these aggregations). While all emissions calculations were happening at the generator or "subplant" level, we had previously aggregated subplant data to the plant level, and then used the aggregated plant-level data to further aggregate to the fleet and region level. While this made these latter aggregations more computationally feasible, this could result in some irregularities and inconsistencies in the fleet and region data when a plant burned multiple fuels. Instead, we now use subplant data as the basis of all fleet-level aggregations as well.

Consider the example of the now-retired Meramec plant (ID 2104) in Missouri, which had 2 natural gas steam turbines and 2 conventional coal boilers:

  • In 2022, its final year of operation, this plant burned slightly more natural gas than coal (by heat content), so it was categorized as a natural gas plant.
  • Previously, since we were determining fleets based on plant data, the emissions from the entire plant (including the 2 coal generators) would have been aggregated into the natural gas fleet. However, this means that the average emissions for the natural gas fleet in this region would include some coal emissions, and thus be higher than typical natural gas fleet emissions.
  • Now that we use subplants as the basis for the fleet aggregations, the two natural gas generators at Meramec are aggregated to the natural gas fleet, and the two coal generators at Meramec are aggregated to the coal fleet.
  • This new approach more closely matches, in our understanding, how balancing authorities generally aggregate fleet data, using generators as the basis for these aggregations rather than plants.
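The Meramec example above can be sketched with a toy table. The generation and emissions values, the column names, and the `fleet_rates` helper are all invented for illustration. Only the plant ID and the two-gas, two-coal generator split come from the text.

```python
import pandas as pd

# Four subplants at one plant: two gas, two coal. All numbers are made up.
subplants = pd.DataFrame(
    {
        "plant_id_eia": [2104, 2104, 2104, 2104],
        "subplant_id": [1, 2, 3, 4],
        "subplant_primary_fuel": ["natural_gas", "natural_gas", "coal", "coal"],
        "plant_primary_fuel": ["natural_gas"] * 4,  # plant-level classification
        "net_generation_mwh": [100.0, 100.0, 50.0, 50.0],
        "co2_mass_lb": [120_000.0, 120_000.0, 110_000.0, 110_000.0],
    }
)


def fleet_rates(df: pd.DataFrame, fuel_col: str) -> pd.Series:
    """Fleet emission rate (lb/MWh): total emissions over total generation."""
    totals = df.groupby(fuel_col)[["net_generation_mwh", "co2_mass_lb"]].sum()
    return totals["co2_mass_lb"] / totals["net_generation_mwh"]


plant_based = fleet_rates(subplants, "plant_primary_fuel")
subplant_based = fleet_rates(subplants, "subplant_primary_fuel")
```

With plant-based fleets, the coal emissions inflate the natural gas fleet rate and no coal fleet appears at all. With subplant-based fleets, each fuel gets its own rate.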

In addition to affecting the fleet totals, this change also affects the hourly profile imputation process, since the residual hourly profiles will now be determined based on the updated fleet definitions.

Now that subplant data is being used more extensively through the pipeline, OGE also contains two new data outputs:

  • Subplant-level results data at the annual and monthly resolutions (in addition to the existing plant-level output data)
  • Subplant-specific attributes table that lists the primary fuel, nameplate capacity, and primary prime mover for each subplant.

For more details on these changes, see: #395

Expanded and enhanced EPA-EIA crosswalking

The subplant-level aggregation revealed a number of previously-uncaught issues with our existing mapping between EPA plant/unit IDs and EIA plant/generator IDs:

  1. The EPA-EIA mapping is not static over time: the relationship between an EPA ID and an EIA ID can change from one year to the next, sometimes changing multiple times over the nearly 20-year historical period covered by OGE. In fact, some mappings change one year and then change back to the original mapping several years later! To address this, OGE now includes a "start year" and "end year" for each mapping, and only uses the mapping that is valid for the current year.
  2. The existing power sector data crosswalk published by the EPA is missing a number of newer mappings (since 2018), as well as many mappings for earlier years in the 2000s. We were able to expand these mappings using data that already exists in CAMPD's facility database.
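A year-aware lookup of the kind described in the first item could be sketched as follows. The `CrosswalkEntry` structure, the `lookup` function, and the IDs are hypothetical and do not reflect how OGE stores its crosswalk. An `end_year` of `None` is assumed here to mean that the mapping is still valid.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CrosswalkEntry:
    plant_id_epa: int
    unit_id_epa: str
    plant_id_eia: int
    generator_id: str
    start_year: int
    end_year: Optional[int]  # None means the mapping is still valid (assumption)


def lookup(
    entries: list[CrosswalkEntry], plant_id_epa: int, unit_id_epa: str, year: int
) -> Optional[tuple[int, str]]:
    """Return the (plant_id_eia, generator_id) valid in `year`, or None."""
    for entry in entries:
        if (entry.plant_id_epa, entry.unit_id_epa) != (plant_id_epa, unit_id_epa):
            continue
        if entry.start_year <= year and (entry.end_year is None or year <= entry.end_year):
            return (entry.plant_id_eia, entry.generator_id)
    return None


# A mapping that changes in 2011 and reverts to the original in 2016.
entries = [
    CrosswalkEntry(9999, "1", 9999, "G1", 2005, 2010),
    CrosswalkEntry(9999, "1", 9999, "G2", 2011, 2015),
    CrosswalkEntry(9999, "1", 9999, "G1", 2016, None),
]
```

Storing a validity range on each row lets a mapping that reverts to an earlier value be represented as a separate entry rather than overwriting history.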

Ultimately, this update includes about 350 new mappings between EPA and EIA IDs. Without these mappings, the generation and emissions from a subplant could be double-counted if the unit reports data to both the EPA and EIA, since these would have been previously identified as separate subplants.

We also found that, across various EPA datasets, leading zeros in unit IDs (e.g., "001") were inconsistently removed, sometimes resulting in incomplete matches between datasets. To address this, we now strip all leading zeros from EPA unit IDs to ensure consistent mapping.
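A normalization step along these lines is sketched below. `strip_leading_zeros` is a hypothetical helper, and returning "0" for an ID made up entirely of zeros is an assumption of this example, not documented OGE behavior.

```python
def strip_leading_zeros(unit_id: str) -> str:
    """Remove leading zeros from a unit ID so that "001" and "1" match.

    Assumption: an ID consisting only of zeros becomes "0" rather than "".
    """
    stripped = unit_id.lstrip("0")
    return stripped if stripped else "0"


raw_ids = ["001", "1", "0A", "100", "000"]
normalized = [strip_leading_zeros(u) for u in raw_ids]
```

Applying the same function to both sides of a join makes the match independent of whether a given dataset kept or dropped the zeros.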

Data usability enhancements

In the annual, plant-level results file, we now include plant attributes (such as name, location, capacity, and fuel) so that these files are easier to use and filter in Excel without needing to work with them programmatically.

Other Improvements

The subplant-level aggregations also revealed that a number of subplants only include steam output data from CEMS, but no generation data. Examining these units revealed that these boilers may only be used for steam production (for district steam systems for example) and not power production, so these are once again being dropped from the dataset until we can get further clarification from EPA on how to interpret this data.

What's Changed

Full Changelog: v0.5.0...v0.6.0

v0.5.0

01 Aug 22:18
ea2bf19

OGE v0.5.0 is a new major release that expands the dataset's historical coverage back to 2005, and includes other methodological enhancements that improve data quality in all years.

In addition to the new data, users should expect changes to the existing 2019-2022 data: NOx and SO2 totals may change for some plants, net generation totals may change for some plants, and data may change for CHP plants (see the "Methodological updates" section for more details).

Input data changes

  • Updates to use the most recent data version of PUDL (v2024.5.0). This includes a re-release of the 2022 EIA-923 data, which may change some of the 2022 results.
  • Updates reference tables, including the energy_source_groups and utility_name_ba_code_map files (#374), epa_eia_crosswalk_manual (#372), and emission_factors_for_co2_ch4_n2o (#377)

Output data changes

  • Expands historical coverage of OGE to include monthly and annual data for 2005-2018 (#295 and #362)
  • All output files (those in the outputs/ directory) are now saved as compressed .csv.zip files instead of .csv files. This reduces the disk space used by the outputs folder from approximately 16GB to 2.5GB. (#366)
  • Expands the data in the plant_static_attributes table to include location data (lat/long, address) and nameplate capacity (#364, #382, #385); commercial operation dates and retirement dates (#367). We also screen for and correct erroneous lat/long data (#368)
  • Fixes a bug where the "total" values in the outputs/annual_generation_averages_by_fuel file were not being calculated correctly

Methodological updates

  • We previously calculated the electric allocation factors for combined heat and power (CHP) plants at the generator level, which introduced bugs for certain combined cycle units when fuel and generation are reported for different generators at the same subplant. We now calculate this factor at the subplant level (#363)
  • Fixes several bugs with the gross-to-net generation conversions where anomalous fleet-average ratios were being introduced, and default factors were not being mapped to certain generators. Also fixed a bug where GTN ratios were being calculated where there was missing gross generation or net generation data. (#370, #375, #383)
  • Updates uncontrolled NOx and SO2 factors to align assumptions with those used by the EIA Electric Power Annual, and to fix a bug where we were adjusting the SO2 values for fluidized bed boilers, even though the control efficiencies are already incorporated into the uncontrolled emission factors (#373). In addition, because fuel sulfur content data is not available pre-2008, we use sulfur content values averaged from 2008-2012 to backfill the missing data. When calculating backstop values for missing values in any year, we now use state-specific values (rather than national-average) to reflect differences in the sulfur contents of fuels being delivered in specific parts of the country (#376)

Other minor fixes

  • Remove the option to run the EIA-923 allocation at the plant level. This was an artifact that was no longer used (#361)
  • Clean up function typehints and continue converting docstrings to Google format
  • Updates where files are stored and accessed from in s3 (#384)

Pull Requests in this update

  • Expand historical coverage pre-2019 by @grgmiller in #295
  • Remove add_subplant_id optional argument by @rouille in #361
  • Add 2005, 2006 and 2007 years by @rouille in #362
  • Calculate electric_allocation_factor by subplant by @grgmiller in #363
  • Compress OGE Outputs by @grgmiller in #366
  • Add geographical information to the plant static attributes data frame by @rouille in #364
  • Add operating and retirement dates to plant static attributes by @rouille in #367
  • Update to use most recent version of pudl by @grgmiller in #369
  • Fix issues with anomalous gross to net conversions by @grgmiller in #370
  • Fix and add information to plant static attributes by @rouille in #368
  • Fix function calculating averages of the fuel types by @rouille in #371
  • Update manual epa eia crosswalk reference table by @rouille in #372
  • Update Uncontrolled NOx and SO2 factors by @grgmiller in #373
  • Update Energy Source Codes and Utility Name Map by @grgmiller in #374
  • Correct Gross to Net Generation Bugs by @grgmiller in #375
  • update co2 factors based on manual energy source group updates by @grgmiller in #377
  • Add geopy to pyproject dependencies by @grgmiller in #378
  • Add backstop sulfur content percentage for years 2005, 2006 and 2007 by @rouille in #376
  • Compare plants coordinates from PUDL and EIA-860 by @rouille in #379
  • Update warning message about validated years by @rouille in #381
  • Discard non-operational generators when calculating plant capacity by @rouille in #382
  • Revert removal of GTN shift factors by @grgmiller in #383
  • Update to 0.5.0 and change s3 directory by @grgmiller in #384
  • Fix missing capacity in plant static attributes by @grgmiller in #385
  • Update documentation by @rouille in #380
  • Historical coverage feature / v0.5.0 by @grgmiller in #386
  • Update Citation by @grgmiller in #387

Full Changelog: v0.4.0...v0.5.0

v0.4.0

05 Apr 04:10
83bdce8

This minor release improves current validation checks, adds new validation checks, enforces static sub-plant IDs across years, and allows users to select Global Warming Potential values by the name of the IPCC assessment report in which they were published.

Update sub-plant crosswalk table

As discovered in #351, the subplant_id assigned to each (plant_id_eia, generator_id) does not remain static across each year of OGE data. This is an issue if trying to use subplant_id as a primary key to compare data across multiple years.

This PR updates the process of creating sub-plant IDs to try to enforce static sub-plant IDs. The changes in this PR enforce static sub-plant IDs within a single data release version of OGE, although the sub-plant IDs may still change from version to version. (#353)

Validation Checks

  • For all warnings about plant-level data, adds the balancing area that the flagged plant belongs to, to help identify BAs where data quality is affected. (#348)
  • The validation check that detects mismatches between input and allocated EIA-923 data is now done at the plant and energy source level. (#350)
  • Functions for detecting anomalies in timeseries data have been added to the code base, and we now identify where gross generation, fuel consumption, and CO2 emission timeseries in the reported CEMS data may be anomalous based on a global extreme filter. (#349)

New feature

The function for calculating CO2-equivalent values now allows for the user to specify which IPCC Assessment Report to use for calculating GWP-adjusted CO2-equivalent values. (#352)
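To illustrate the idea, the sketch below selects 100-year GWP values by assessment report name. The function name, signature, and dictionary layout are assumptions for this example and are not OGE's actual API. The GWP values shown are the commonly cited 100-year values from AR4 and AR5 (the latter without climate-carbon feedbacks).

```python
# 100-year GWP values by IPCC assessment report (AR5 without carbon feedbacks).
GWP_100 = {
    "AR4": {"co2": 1.0, "ch4": 25.0, "n2o": 298.0},
    "AR5": {"co2": 1.0, "ch4": 28.0, "n2o": 265.0},
}


def calculate_co2e(
    co2_mass: float, ch4_mass: float, n2o_mass: float, ipcc_version: str = "AR5"
) -> float:
    """Hypothetical helper: GWP-weighted sum of the three gases, in CO2e mass."""
    try:
        gwp = GWP_100[ipcc_version]
    except KeyError:
        raise ValueError(
            f"unknown IPCC version {ipcc_version!r}; choose from {sorted(GWP_100)}"
        ) from None
    return co2_mass * gwp["co2"] + ch4_mass * gwp["ch4"] + n2o_mass * gwp["n2o"]


co2e_ar4 = calculate_co2e(1000.0, 2.0, 1.0, ipcc_version="AR4")
co2e_ar5 = calculate_co2e(1000.0, 2.0, 1.0, ipcc_version="AR5")
```

The same input masses yield different CO2e totals under each report, which is why the choice of report needs to be explicit.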

v0.3.3

27 Feb 17:32
7663993

This patch release addresses two issues that were preventing some users from being able to run the pipeline and use the OGE package:

  • Updates the instructions for using conda to manage the oge code environment and updates the environment.yml file that specifies the conda environment. This had fallen out of date with the pipfile environment files in recent releases. (#345)
  • Fixes an issue where the use of backslashes instead of forward slashes in oge.filepaths was causing errors when attempting to load OGE files from the s3 bucket. (#346)
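The slash issue in the last item arises because Windows-style path joining produces backslashes, while object keys in s3 use forward slashes. The sketch below shows one way to build keys that are independent of the operating system. It is not the change made in oge.filepaths, and the `s3_key` helper, bucket name, and file names are hypothetical.

```python
from pathlib import PurePosixPath, PureWindowsPath


def s3_key(*parts: str) -> str:
    """Join path components with forward slashes on any operating system."""
    return str(PurePosixPath(*parts))


# Windows-style joining yields backslashes, which s3 treats as literal characters.
windows_style = str(PureWindowsPath("outputs", "2022", "plant_data.csv"))
key = s3_key("outputs", "2022", "plant_data.csv")

# Prepend the scheme and bucket separately: passing "s3://bucket" through a
# path class would collapse the double slash.
url = "s3://example-bucket/" + key
```

Using the pure path classes keeps the example runnable on any platform, since they never touch the local filesystem.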

This release does not affect any of the outputs. Thus, there will be no new data release that accompanies this code patch release. The most up to date version of the data is still 0.3.0.

v0.3.2

15 Feb 00:05
f265b3d

This patch release of OGE fixes an issue where the python version specified in pyproject.toml was incompatible with the version of python used in the rest of the package, preventing OGE from being installed in other projects (#344)

This release does not affect any of the outputs. Thus, there will be no new data release that accompanies this code patch release. The most up to date version of the data is still 0.3.0.

v0.3.1

10 Feb 17:43
d21db26

This patch release of OGE makes several updates to OGE's code infrastructure, dependencies, documentation, and file downloads, but does not affect any of the outputs. Thus, there will be no new data release that accompanies this code patch release. The most up to date version of the data is still 0.3.0.

Accessing OGE outputs and results through the cloud (#338)

  • In v0.3.0 we packaged OGE, allowing other projects to import OGE code directly. However, in order to load and use any of the downloads, outputs, or results files, it would still be necessary to run the data pipeline locally to make those files available.
  • This release allows these files to now be read directly from an AWS s3 bucket, eliminating the need for the pipeline to be run locally when importing OGE into another project.
  • Instructions for how to set the s3 bucket as the default data store are now included in the readme
  • We also fixed a bug where a log file was being created whenever an OGE function was called from another project. Now, a log file should only be created whenever the main data pipeline is run (#340)

Updates eGRID downloads to include eGRID2022 (#337)

  • Although eGRID is not used as an input to the OGE data pipeline, these files are downloaded and included in the data store, as the eGRID data can be loaded and explored via several functions in OGE.
  • This release includes the newly-published eGRID2022 file in the set of downloaded files
  • This release also standardizes the downloaded eGRID file names to use consistent capitalization across years.

More transparent conversion factors and constants (#339)

  • In past versions of OGE, some of the standard conversion factors and assumed values were spread across multiple files.
  • This release moves all of these factors and assumed values (if not already included in any of the reference_tables) to a centralized location in constants.py so that they can be easily reviewed.
  • Moving these factors also helped avoid the potential for circular imports between the modules.

Miscellaneous

  • Updates several package dependencies in the pipfile to address security updates (#341)
  • Updates small errors in README file

v0.3.0

29 Dec 19:56
939dd3b

Updates PUDL dependency (#318)

  • Updates pudl dependency from v2022.11.30 to v2023.12.01, which includes a number of updates to the database structure and naming conventions (see pudl release notes)
  • Changes source of PUDL database download to AWS rather than Zenodo, providing faster access to PUDL data releases
  • PUDL’s CEMS database now includes data from AK, HI, and PR, which should improve hourly emissions data coverage for plants in AK and HI
  • A cleaned and standardized version of the EPA-EIA power sector data crosswalk is now included in the pudl database, meaning we no longer have to manually load and standardize this data
  • Emissions control equipment data from EIA-860 is now included in the pudl database, meaning we no longer need to manually load and standardize this data
  • Leading zeros removed from boiler_ids, which should improve mapping between boiler tables
  • The EIA-923 generation and fuel allocation process is now fully integrated into PUDL
  • Fixes an issue where certain plants in NY state were being assigned the wrong BA code.

Adds 2022 data (#322)

  • Integrates Final release input data from the EIA and EPA for 2022
  • Adds 2022 OGE outputs

Manual reference table update (#322)

  • Most reference tables did not require updating
  • NOX and SO2 emissions factors: added new factors for boiler configurations that had not previously been included in the table.
  • Balancing Areas: Added retirement dates for the CFE (July 2018), GLHB (September 2022), GRIF (November 2023) balancing areas
  • Added new EPA-EIA plant and unit crosswalks based on 2022 data
  • Added several new mappings between utilities and balancing areas

Infrastructure Updates

  • Updates Python dependency from 3.10 to 3.11
  • Refactors and packages OGE codebase so that functions, reference tables, and data from OGE can be imported into other projects. This package will go live on PyPi soon. (#323)
  • Re-organizes location of data files. The data/manual files have been renamed to reference_tables and moved to src/oge, while all downloads, output files, and result files will now be saved in the user’s home directory in a folder called open_grid_emissions_data (#324)
  • Adds support for pipenv environment management in addition to conda (#313)
  • Changes PUDL and gridemissions dependencies to forks within the singularity-energy organization, rather than forked versions that lived in individual authors’ github accounts.
  • Moves documentation from separately-maintained repo into the OGE repo (#303)
  • Changes code formatting from black to ruff and adds formatting checks that must pass before merging code (#317)

Other bug/data quality fixes

  • Ensures that the EPA-EIA power sector data crosswalk is as complete as possible by combining the pudl-standardized PSDC, plant code mappings from eGRID, and our own manual crosswalking.
  • Add handling for negative fuel consumption reported in EIA-923
  • Stop dropping missing and zero values to help ensure complete timeseries
  • Previously, we had dropped data from CEMS that reflected units that only reported steam generation but no electricity generation. Based on an updated understanding of this data, we no longer drop this data from OGE.
  • Fixes bug in EIA-923 generation and fuel allocation process that was resulting in certain reported fuel consumption data being dropped for plants that retire mid-year
  • Updates manual timestamp corrections to EIA-930 data: for CAISO data from 2022 onward (#300) and for TEPC data from 2021 onward (#322)

Adds new data validation checks

  • Flags when different plant primary fuel identification methods result in different primary fuel assignments: Exports the primary_fuel_table with all intermediate columns to outputs to help with validation. Adds a new validation check to flag when the plant primary fuel assigned by the pipeline does not match the capacity-based primary fuel assignment. (#296)
  • Flags when subplants only contain a single combined cycle component: Combined cycle generators contain a steam part (CA) and a turbine part (CT) that are linked together. Thus, a subplant group that contains one part of a combined cycle plant should, in theory, always contain the other part as well. This PR adds a test that checks that both parts exist in a subplant if one exists. Besides CT and CA prime movers, there are also CS prime movers, which represent a "single shaft" combined cycle unit where the steam and turbine parts share a single generator. These prime movers are allowed to be by themselves in a subplant, as are CC prime movers, which represent a "total unit." This PR adds a prime_mover_code column to the subplant crosswalk table to help validate this. (#297)
  • Checks for complete monthly data within a single year: Checks that 12 monthly “report_date”s exist for each plant/subplant, and also checks that the number of missing monthly datapoints matches the number of missing datapoints in the input data from CEMS and EIA-923.
  • Checks for complete hourly timestamps within a single year or single month: If the period is a 'year', checks that the length of the timeseries is 8760 (for a non-leap year) or 8784 (for a leap year). If the period is a 'month', checks that the length of the timeseries is equal to the length of the complete date_range between the earliest and latest timestamp in a month. (#299)
  • Exports a new output table that identifies whether input data (and non-zero input data) exists for each plant in EIA-923 and/or CEMS.
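The yearly case of the hourly completeness check in the list above can be sketched with the standard library. The function names are hypothetical, and the example assumes timezone-naive timestamps with no daylight saving shifts (for instance, data already expressed in UTC).

```python
import calendar
from datetime import datetime, timedelta


def expected_hours(year: int) -> int:
    """8784 hours in a leap year, 8760 otherwise."""
    return 8784 if calendar.isleap(year) else 8760


def hourly_range(year: int) -> list[datetime]:
    """Every hourly timestamp in `year`, starting at January 1, 00:00."""
    start = datetime(year, 1, 1)
    return [start + timedelta(hours=h) for h in range(expected_hours(year))]


def is_complete_year(timestamps: list[datetime], year: int) -> bool:
    """True if `timestamps` contains each hour of `year` exactly once."""
    return len(timestamps) == expected_hours(year) and set(timestamps) == set(
        hourly_range(year)
    )


full_2023 = hourly_range(2023)
missing_last_hour = full_2023[:-1]
```

Comparing against the full expected set, not only the length, also catches a series that has the right number of rows but contains a duplicated hour in place of a missing one.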