Fix function calculating averages of the fuel types #371

Merged 1 commit on Jun 25, 2024

Conversation

Collaborator

@rouille rouille commented Jun 25, 2024

Purpose

Fix function that calculates the averages of the fuel types. Closes CAR-4205.

What the code is doing

Add a new row that summarizes all fuel categories as follows:

  • Calculate the overall generated emission rates as $\frac{\sum_f E_f}{\sum_f G_f}$, where $E_f$ is the absolute emission of fuel $f$ and $G_f$ is the corresponding net generation
  • Take the average of the non-rate columns
  • Use average as fuel category for this new summarized row.
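A minimal sketch of the rate formula above, written in plain Python rather than the pandas code in the module. The column names (`co2_mass_lb`, `net_generation_mwh`) and the numbers are hypothetical and only illustrate why the overall rate is a ratio of sums rather than a mean of the per-fuel rates.

```python
# Hypothetical per-fuel totals; names and values are illustrative only.
fuels = {
    "coal": {"co2_mass_lb": 2000.0, "net_generation_mwh": 1.0},
    "natural_gas": {"co2_mass_lb": 900.0, "net_generation_mwh": 3.0},
}


def overall_generated_rate(rows, emission_key, generation_key):
    """Return sum_f E_f / sum_f G_f over all fuel rows."""
    total_emission = sum(r[emission_key] for r in rows.values())
    total_generation = sum(r[generation_key] for r in rows.values())
    return total_emission / total_generation


# Ratio of sums: (2000 + 900) / (1 + 3)
overall = overall_generated_rate(fuels, "co2_mass_lb", "net_generation_mwh")

# Naive mean of the per-fuel rates, shown only for comparison.
mean_of_rates = sum(
    r["co2_mass_lb"] / r["net_generation_mwh"] for r in fuels.values()
) / len(fuels)

print(overall, mean_of_rates)  # 725.0 1150.0
```

The two numbers differ because the ratio of sums weights each fuel by its net generation, while the plain mean treats every fuel equally.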

Testing

Ran the 2005 pipeline

Where to look

Relevant changes are in the write_generated_averages function. Other edits add type hints and docstrings to functions in the oge.output_data module.

Usage Example/Visuals

N/A

Review estimate

5min

Future work

N/A

Checklist

  • Update the documentation to reflect changes made in this PR
  • Format all updated python files using black
  • Clear outputs from all notebooks modified
  • Add docstrings and type hints to any new functions created

@rouille rouille requested a review from grgmiller June 25, 2024 07:06
@rouille rouille self-assigned this Jun 25, 2024
@grgmiller
Collaborator

@rouille could you please post some example screenshots of the new outputs with this fix just to double check that everything is being calculated correctly?

Collaborator

@grgmiller grgmiller left a comment

Looks good but would be nice to see a screenshot of outputs just to confirm

@rouille
Collaborator Author

rouille commented Jun 25, 2024

> @rouille could you please post some example screenshots of the new outputs with this fix just to double check that everything is being calculated correctly?

Here it is for 2005. There is a NaN row coming from plant 13213 in Mississippi; see the plant attributes below:

>>> psa = pd.read_csv("plant_static_attributes_2005.csv.zip")
>>> psa[psa["fuel_category"].isna()]
      plant_id_eia       plant_name_eia  capacity_mw plant_primary_fuel fuel_category fuel_category_eia930 state county city ba_code ba_code_physical  latitude  longitude plant_operating_date plant_retirement_date  distribution_flag         timezone data_availability  shaped_plant_id
3389         13213  BTEC New Albany LLC          NaN                NaN           NaN                  NaN    MS    NaN  NaN     NaN              NaN   34.5411   -88.9422                  NaN                   NaN               True  America/Chicago         cems_only              NaN

It is fixed in #368 and will be taken care of once we rebase this PR. Nevertheless, the calculation is correct. Since the data frame is too wide for a screenshot, I am attaching the file instead.
annual_generation_averages_by_fuel_2005.csv

Collaborator

@grgmiller grgmiller left a comment

Looking at the sample outputs you posted, I think we want to keep the summary row as "total" and not "average". We want to take the sum of all of the numeric columns, not the average, and it looks like the outputs are still averages across all of the other fuels.
image

Could you please change "average" back to "total" and sum the numeric columns instead of averaging them?

I think the other places where we reference this data are looking for the "total" row, so changing the name will break those.

@grgmiller
Collaborator

The "generated" rate columns are now correct, but we just want to also fix the numerical columns

@rouille
Collaborator Author

rouille commented Jun 25, 2024

> The "generated" rate columns are now correct, but we just want to also fix the numerical columns

Should we change the file name then? annual_generation_averages_by_fuel_2005.csv is confusing in my opinion: I would expect to find average values for all columns, including the absolute ones.

@grgmiller
Collaborator

> Should we change the file name then

I think here "averages" refers to fuel-average emission factors, rather than averages across fuels. The total row is a sum (just like each of the fuel rows is a sum over specific plants; we are not calculating the average net generation across all coal plants), and it is the generated_rate columns that are the averages of those sums.

If we change the file name, we would just need to track down where we are using this and change the file name there as well.
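For reference, a rough sketch of the "total" row as described in this comment: absolute columns are summed across fuels, and each generated rate is recomputed from those sums. This is plain Python under assumed names, not the code in write_generated_averages; `add_total_row`, `generated_co2_rate_lb_per_mwh`, and the sample values are hypothetical.

```python
import math


def add_total_row(rows, absolute_cols, rate_specs, label="total"):
    """Append a summary row to a list of per-fuel dicts.

    absolute_cols: columns summed across fuels.
    rate_specs: maps a rate column to its (numerator, denominator) columns;
        the rate is recomputed as sum(numerator) / sum(denominator).
    """
    total = {"fuel_category": label}
    for col in absolute_cols:
        total[col] = sum(r[col] for r in rows)
    for rate_col, (num, den) in rate_specs.items():
        # Guard against zero total generation instead of raising.
        total[rate_col] = total[num] / total[den] if total[den] else math.nan
    return rows + [total]


# Hypothetical per-fuel rows; names and values are illustrative only.
rows = [
    {"fuel_category": "coal", "co2_mass_lb": 2000.0,
     "net_generation_mwh": 1.0, "generated_co2_rate_lb_per_mwh": 2000.0},
    {"fuel_category": "natural_gas", "co2_mass_lb": 900.0,
     "net_generation_mwh": 3.0, "generated_co2_rate_lb_per_mwh": 300.0},
]

result = add_total_row(
    rows,
    absolute_cols=["co2_mass_lb", "net_generation_mwh"],
    rate_specs={
        "generated_co2_rate_lb_per_mwh": ("co2_mass_lb", "net_generation_mwh")
    },
)
print(result[-1])
```

Keeping the label as "total" matches what downstream consumers of this file are said to look for in the discussion above.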

@rouille
Collaborator Author

rouille commented Jun 25, 2024

> The "generated" rate columns are now correct, but we just want to also fix the numerical columns

Done. See the attached file:
annual_generation_averages_by_fuel_2005.csv

Collaborator

@grgmiller grgmiller left a comment

Looks good, thanks!

@rouille rouille merged commit 505d055 into historical_coverage_feature Jun 25, 2024
1 check passed
@rouille rouille deleted the ben/averages branch June 25, 2024 19:22