Transform vceregen renewable generation profiles #3898

Merged: 106 commits, Oct 19, 2024

Commits
9fadfff
Add source metadata for vceregen
aesharpe Sep 30, 2024
4036479
Add profiles to vceregen dataset name
aesharpe Sep 30, 2024
d8b992e
Remove blank line in description
aesharpe Sep 30, 2024
cac5509
Add blank Data Source template for vceregen
aesharpe Sep 30, 2024
a44a399
Add links to download docs section
aesharpe Sep 30, 2024
b8aa4e4
Add availability section
aesharpe Sep 30, 2024
65ddf74
Add respondents section
aesharpe Sep 30, 2024
d8071ed
Add original data section
aesharpe Sep 30, 2024
82be4d4
Stash WIP of extraction
e-belfer Oct 1, 2024
e044e93
Extract VCE tables to raw dask dfs
e-belfer Oct 2, 2024
57ad9a7
Clean up warnings and restore EIA 176
e-belfer Oct 2, 2024
b922328
Revert to pandas concatenation
e-belfer Oct 2, 2024
53934f2
Add latlonfips
e-belfer Oct 2, 2024
3c8cab6
Add blank transform module for vceregen
aesharpe Oct 3, 2024
9f7814b
Fill out the basic vceregen transforms
aesharpe Oct 4, 2024
600b3e2
Add underscores back to function names
aesharpe Oct 5, 2024
e6242f2
Update time col calculation
aesharpe Oct 7, 2024
130f1f0
Update docstrings and comments to reflect new time cols
aesharpe Oct 7, 2024
339d791
Change merge to concat
aesharpe Oct 7, 2024
3677f78
Remove dask, coerce dtypes on read-in
e-belfer Oct 8, 2024
c870b63
override load_column_maps behavior
e-belfer Oct 8, 2024
4a4511d
Merge branch 'main' into extract-vceregen
e-belfer Oct 8, 2024
a91d073
Update addition of county and state name fields
aesharpe Oct 8, 2024
076b113
Merge branch 'extract-vceregen' into transform-vceregen
aesharpe Oct 8, 2024
0b3fa45
Add vceregen to init files and metadata so that it will run on dagste…
aesharpe Oct 8, 2024
79c6016
Add resource metadata for vcregen
aesharpe Oct 8, 2024
54fd155
Clean county strings more
aesharpe Oct 9, 2024
6d14dae
Add release notes
aesharpe Oct 9, 2024
deae1e2
Add function to validate state_county_names and improve performance o…
aesharpe Oct 9, 2024
63d2666
make for loops into dict comp, update loggers, and improve regex
aesharpe Oct 9, 2024
24f4fb5
Add asset checks and remove inline checks
aesharpe Oct 9, 2024
da272f5
Change hour_utc to datetime_utc
aesharpe Oct 10, 2024
7bc1741
Remove incorrect docstring
aesharpe Oct 10, 2024
ae85f64
Update dataset and field metadata
aesharpe Oct 10, 2024
d669c74
Rename county col to county_or_subregion
aesharpe Oct 10, 2024
140a181
Merge branch 'transform-vceregen' into vceregen-docs
aesharpe Oct 11, 2024
7dbe5d2
Update data_source docs page
aesharpe Oct 11, 2024
98fd118
change axis=1 to axis=columns
aesharpe Oct 11, 2024
b6b5e6c
Merge branch 'main' into extract-vceregen
e-belfer Oct 11, 2024
291ba7d
Update DOI to sandbox and temporarily xfail DOI test
e-belfer Oct 11, 2024
44f3ae8
Merge branch 'extract-vceregen' into transform-vceregen
aesharpe Oct 11, 2024
8e6d88a
Change county_or_subregion to county_or_lake_name
aesharpe Oct 14, 2024
6a49f69
Change county_or_subregion to county_or_lake_name
aesharpe Oct 14, 2024
1f3666c
Merge branch 'transform-vceregen' of https://github.com/catalyst-coop…
aesharpe Oct 14, 2024
7319e7f
Update docs to explain solar cap fac
aesharpe Oct 14, 2024
08d7341
Merge branch 'main' into transform-vceregen
aesharpe Oct 15, 2024
3eaebe6
Update regen to rare
e-belfer Oct 16, 2024
b324123
Merge branch 'main' into extract-vceregen
e-belfer Oct 16, 2024
120451d
Merge branch 'extract-vceregen' of https://github.com/catalyst-cooper…
e-belfer Oct 16, 2024
5b98e60
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Oct 16, 2024
9f6204e
Merge branch 'main' into extract-vceregen
e-belfer Oct 16, 2024
77d47a4
Update gsutil in zenodo-cache-sync
e-belfer Oct 16, 2024
adfff81
Merge branch 'extract-vceregen' of https://github.com/catalyst-cooper…
e-belfer Oct 16, 2024
ece9dab
Merge branch 'extract-vceregen' into transform-vceregen
e-belfer Oct 16, 2024
0e3792d
Merge branch 'extract-vceregen' into transform-vceregen
aesharpe Oct 16, 2024
93e4487
Merge branch 'transform-vceregen' of https://github.com/catalyst-coop…
aesharpe Oct 16, 2024
7e3c926
Rename vceregen to vcerare
aesharpe Oct 16, 2024
6dea332
Add back user project
e-belfer Oct 16, 2024
7554a36
Update project path
e-belfer Oct 16, 2024
9ada9f5
Update project to billing project
e-belfer Oct 16, 2024
4158afd
Update dockerfile to replace gsutil with gcloud storage
e-belfer Oct 16, 2024
069c246
Merge branch 'extract-vceregen' into transform-vceregen
e-belfer Oct 16, 2024
2ba9b16
Update docs/release_notes.rst
aesharpe Oct 16, 2024
e0f6524
Update docs/release_notes.rst
aesharpe Oct 16, 2024
f793776
Update docs/templates/vcerare_child.rst.jinja
aesharpe Oct 16, 2024
d41c44d
First batch of little docs fixes
aesharpe Oct 16, 2024
98c2f69
Restructure _combine_city_county_records function
aesharpe Oct 17, 2024
d7b59d5
Add link to zenodo archive to data source page
aesharpe Oct 17, 2024
178f0fb
Clarify 1 vs. 100 in data source page
aesharpe Oct 17, 2024
7a1ebe1
Spread out comments in the _prep_lat_long_fips_df function
aesharpe Oct 17, 2024
782e925
Update docstring for _prep_lat_long_fips_df
aesharpe Oct 17, 2024
58aa99f
Switch order of add_time_cols and make_cap_frac functions
aesharpe Oct 17, 2024
e494482
Update _combine_city_county_records and move assertion to asset checks
aesharpe Oct 17, 2024
c2f3f75
Change all().all() to any().any()
aesharpe Oct 17, 2024
ccaa4ae
Add validations to merges
aesharpe Oct 17, 2024
3913006
Resolve merge conflicts with main
aesharpe Oct 17, 2024
865756a
docs cleanup tidbits
aesharpe Oct 17, 2024
c838e63
Turn _combine_city_county_records function into _drop_city_records an…
aesharpe Oct 17, 2024
05376e8
Make fips columns categorical and narrow scope of regex
aesharpe Oct 17, 2024
ef1a243
data source docs updates
aesharpe Oct 17, 2024
78fe904
Add downloadable docs to vcerare data source and fix data source file…
aesharpe Oct 18, 2024
6becc52
Remove 1.34 from field description for capacity_factor_solar_pv
aesharpe Oct 18, 2024
c2d16ae
Add some logs and a function to null county_id_fips values from lakes…
aesharpe Oct 18, 2024
0d13365
Update solar_pv metadata
aesharpe Oct 18, 2024
6cde307
Update solar_pv metadata
aesharpe Oct 18, 2024
f356336
Merge branch 'main' into transform-vceregen
aesharpe Oct 18, 2024
d03eab3
Rename RARE dataset in the release notes
aesharpe Oct 18, 2024
73b70d9
Add issue number to release notes
aesharpe Oct 18, 2024
15e0f40
Merge branch 'transform-vceregen' of https://github.com/catalyst-coop…
aesharpe Oct 18, 2024
ff23cbe
Update field description for county_or_lake_name
aesharpe Oct 18, 2024
de63b12
Update docstring for transform module
aesharpe Oct 18, 2024
710ead0
Make all references to FIPS uppercase in notes and comments
aesharpe Oct 18, 2024
69b4f71
Correct inline comment in _null_non_county_fips_rows
aesharpe Oct 18, 2024
71af223
Fix asset check
aesharpe Oct 18, 2024
d1074f8
Minor late-night PR fixes
zaneselvans Oct 18, 2024
ae5fd8c
Log during VCE RARE asset checks to see what's slow.
zaneselvans Oct 18, 2024
06db957
Add simple notebook for processing vcerare data
aesharpe Oct 18, 2024
e6904ac
Re-enable Zenodo DOI validation unit test.
zaneselvans Oct 18, 2024
6516d7c
Update docs to use gcloud storage not gsutil
zaneselvans Oct 18, 2024
6f64a29
Try to reduce memory use & concurrency for VCE RARE dataset
zaneselvans Oct 18, 2024
a40b1f3
Retry policy for VCE + highmem use for VCE asset check.
zaneselvans Oct 18, 2024
077a47f
Bump VM RAM and remove very-high memory tag & retry
zaneselvans Oct 18, 2024
71256e4
Bump vCPUs to 16
zaneselvans Oct 18, 2024
cd78e56
Add fancy charts to notebook
aesharpe Oct 18, 2024
81297c9
Merge branch 'transform-vceregen' of https://github.com/catalyst-coop…
aesharpe Oct 18, 2024
bd13830
Add link to VCE data in nightly build outputs. Other docs tweaks.
zaneselvans Oct 19, 2024
4 changes: 2 additions & 2 deletions devtools/generate_batch_config.py
@@ -56,8 +56,8 @@ def to_config(
}
],
"computeResource": {
-"cpuMilli": 8000,
-"memoryMib": int(63 * MIB_PER_GB),
+"cpuMilli": 16000,
+"memoryMib": int(127 * MIB_PER_GB),
"bootDiskMib": 100 * 1024,
},
"maxRunDuration": f"{60 * 60 * 12}s",
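The new `computeResource` values can be checked with a little arithmetic. This is a hypothetical sketch, not code from `generate_batch_config.py`: it assumes `MIB_PER_GB = 1024` (the constant's real value is not shown in this diff) and uses an illustrative helper, `compute_resource`. Because `cpuMilli` counts thousandths of a vCPU, the change moves the batch VM from 8 to 16 vCPUs and from 63 to 127 GB of requested memory.

```python
# Hypothetical sketch of the Batch computeResource arithmetic in this diff.
# ASSUMPTION: MIB_PER_GB = 1024; the real constant in
# devtools/generate_batch_config.py is not shown here.
MIB_PER_GB = 1024


def compute_resource(vcpus: int, memory_gb: int) -> dict[str, int]:
    """Build a computeResource dict from a vCPU count and memory in GB."""
    return {
        # cpuMilli is measured in thousandths of a vCPU.
        "cpuMilli": vcpus * 1000,
        # memoryMib is measured in mebibytes.
        "memoryMib": int(memory_gb * MIB_PER_GB),
        # Boot disk size is unchanged by this PR: 100 GiB expressed in MiB.
        "bootDiskMib": 100 * 1024,
    }


# 60 s * 60 min * 12 h, rendered as a duration string in seconds.
MAX_RUN_DURATION = f"{60 * 60 * 12}s"

if __name__ == "__main__":
    print("before:", compute_resource(8, 63))
    print("after: ", compute_resource(16, 127))
    print("maxRunDuration:", MAX_RUN_DURATION)
```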
2 changes: 2 additions & 0 deletions docs/conf.py
@@ -162,6 +162,7 @@ def data_sources_metadata_to_rst(app):
"epacems",
"phmsagas",
"gridpathratoolkit",
+"vcerare",
]
package = PUDL_PACKAGE
extra_etl_groups = {"eia860": ["entity_eia"], "ferc1": ["glue"]}
@@ -213,6 +214,7 @@ def cleanup_rsts(app, exception):
(DOCS_DIR / "data_sources/epacems.rst").unlink()
(DOCS_DIR / "data_sources/phmsagas.rst").unlink()
(DOCS_DIR / "data_sources/gridpathratoolkit.rst").unlink()
+(DOCS_DIR / "data_sources/vcerare.rst").unlink()


def cleanup_csv_dir(app, exception):
1 change: 1 addition & 0 deletions docs/data_access.rst
@@ -130,6 +130,7 @@ so we have moved to publishing all our hourly tables using the compressed, colum
* `FERC-714 Hourly Estimated State Demand <https://s3.us-west-2.amazonaws.com/pudl.catalyst.coop/nightly/out_ferc714__hourly_estimated_state_demand.parquet>`__
* `FERC-714 Hourly Planning Area Demand <https://s3.us-west-2.amazonaws.com/pudl.catalyst.coop/nightly/out_ferc714__hourly_planning_area_demand.parquet>`__
* `GridPath RA Toolkit Hourly Available Capacity Factors <https://s3.us-west-2.amazonaws.com/pudl.catalyst.coop/nightly/out_gridpathratoolkit__hourly_available_capacity_factor.parquet>`__
+* `VCE Resource Adequacy Renewable Energy Dataset <https://s3.us-west-2.amazonaws.com/pudl.catalyst.coop/nightly/out_vcerare__hourly_available_capacity_factor.parquet>`__
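The nightly Parquet outputs listed above share one URL prefix, so a table can be read by name. A minimal sketch, assuming pandas with a Parquet engine such as pyarrow is installed; `nightly_parquet_url` and `read_nightly_table` are hypothetical helpers, not part of PUDL. Reading over HTTPS may fetch the whole file before `columns` is applied, so expect a large download for the hourly tables.

```python
from __future__ import annotations

import pandas as pd

# Shared prefix of the nightly Parquet links listed above.
NIGHTLY_BASE_URL = "https://s3.us-west-2.amazonaws.com/pudl.catalyst.coop/nightly"


def nightly_parquet_url(table_name: str) -> str:
    """Return the public nightly-build URL for a PUDL Parquet table (hypothetical helper)."""
    return f"{NIGHTLY_BASE_URL}/{table_name}.parquet"


def read_nightly_table(table_name: str, columns: list[str] | None = None) -> pd.DataFrame:
    """Read a nightly Parquet table over HTTPS; needs network access and a Parquet engine."""
    return pd.read_parquet(nightly_parquet_url(table_name), columns=columns)
```

For example, `read_nightly_table("out_vcerare__hourly_available_capacity_factor")` would load the VCE RARE table. Column names such as `capacity_factor_solar_pv` appear in this PR's commit history, but check the published schema before passing them to `columns`.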

Raw FERC DBF & XBRL data converted to SQLite
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
1 change: 1 addition & 0 deletions docs/data_sources/index.rst
@@ -18,6 +18,7 @@ The following data sources serve as the foundation for our data pipeline.
ferc714
phmsagas
gridpathratoolkit
+vcerare
other_data

.. toctree::
22 changes: 12 additions & 10 deletions docs/dev/nightly_data_builds.rst
@@ -265,7 +265,7 @@ ways to install the Google Cloud SDK explained in the link above.

.. code::

-conda install -c conda-forge google-cloud-sdk
+mamba install -c conda-forge google-cloud-sdk

Log into the account you used to create your new project above by running:

@@ -297,16 +297,17 @@ that are available:

.. code::

-gsutil ls -lh gs://builds.catalyst.coop
+gcloud storage ls --long --readable-sizes gs://builds.catalyst.coop

You should see a list of directories with build IDs that have a naming convention:
``<YYYY-MM-DD-HHMM>-<short git commit SHA>-<git branch>``.
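Build IDs follow the ``<YYYY-MM-DD-HHMM>-<short git commit SHA>-<git branch>`` convention, so they can be parsed mechanically, for example to pick the most recent build on a branch. A minimal sketch with a hypothetical `parse_build_id` helper; it assumes the short SHA never contains a hyphen (true for hex SHAs) and leaves the timestamp naive because the convention does not state a time zone.

```python
from datetime import datetime
from typing import NamedTuple


class BuildId(NamedTuple):
    timestamp: datetime  # naive; the naming convention does not state a time zone
    git_sha: str
    branch: str


def parse_build_id(build_id: str) -> BuildId:
    """Split "<YYYY-MM-DD-HHMM>-<short SHA>-<branch>" into its parts."""
    # maxsplit=5 keeps hyphens inside the branch name intact, and strip("/")
    # tolerates the trailing slash shown in bucket listings.
    year, month, day, hhmm, sha, branch = build_id.strip("/").split("-", maxsplit=5)
    timestamp = datetime.strptime(f"{year}-{month}-{day} {hhmm}", "%Y-%m-%d %H%M")
    return BuildId(timestamp, sha, branch)


print(parse_build_id("2024-01-03-0605-e9a91be-dev"))
```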

-To see what the outputs are for a given nightly build, you can use ``gsutil`` like this:
+To see what the outputs are for a given nightly build, you can use ``gcloud storage``
+like this:

.. code::

-gsutil ls -lh gs://builds.catalyst.coop/2024-01-03-0605-e9a91be-dev/
+gcloud storage ls --long --readable-sizes gs://builds.catalyst.coop/2024-01-03-0605-e9a91be-dev/

804.57 MiB 2024-01-03T11:19:15Z gs://builds.catalyst.coop/2024-01-03-0605-e9a91be-dev/censusdp1tract.sqlite
5.01 GiB 2024-01-03T11:20:02Z gs://builds.catalyst.coop/2024-01-03-0605-e9a91be-dev/core_epacems__hourly_emissions.parquet
@@ -337,22 +338,23 @@ To see what the outputs are for a given nightly build, you can use ``gsutil`` li
TOTAL: 25 objects, 23557650395 bytes (21.94 GiB)

If you want to copy these files down directly to your computer, you can use
-the ``gsutil cp`` command, which behaves very much like the Unix ``cp`` command:
+the ``gcloud storage cp`` command, which behaves very much like the Unix ``cp`` command:

.. code::

-gsutil cp gs://builds.catalyst.coop/<build ID>/pudl.sqlite ./
+gcloud storage cp gs://builds.catalyst.coop/<build ID>/pudl.sqlite ./

If you wanted to download all of the build outputs (more than 10GB!) you could use ``cp
-r`` on the whole directory:

.. code::

-gsutil cp -r gs://builds.catalyst.coop/<build ID>/ ./
+gcloud storage cp --recursive gs://builds.catalyst.coop/<build ID>/ ./

-For more details on how to use ``gsutil`` in general see the
-`online documentation <https://cloud.google.com/storage/docs/gsutil>`__ or run:
+For more background on ``gcloud storage`` see the
+`quickstart guide <https://cloud.google.com/storage/docs/discover-object-storage-gcloud>`__
+or check out the CLI documentation with:

.. code::

-gsutil --help
+gcloud storage --help
15 changes: 15 additions & 0 deletions docs/release_notes.rst
@@ -6,6 +6,21 @@ PUDL Release Notes
v2024.X.x (2024-XX-XX)
---------------------------------------------------------------------------------------

+New Data
+^^^^^^^^
+
+Vibrant Clean Energy Resource Adequacy Renewable Energy (RARE) Power Dataset
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+* Integrate the VCE hourly capacity factor data for solar PV, onshore wind, and
+offshore wind from 2019 through 2023. The data in this table were produced by
+Vibrant Clean Energy and are licensed to the public under the Creative Commons
+Attribution 4.0 International license (CC-BY-4.0). This data complements the
+WECC-wide GridPath RA Toolkit data currently incorporated into PUDL, providing
+capacity factor data nationwide with a different set of modeling assumptions and
+a different granularity for the aggregation of outputs.
+See :doc:`data_sources/gridpathratoolkit` and :doc:`data_sources/vcerare` for
+more information. See :issue:`#3872`.
+
New Data Coverage
^^^^^^^^^^^^^^^^^

68 changes: 68 additions & 0 deletions docs/templates/vcerare_child.rst.jinja
@@ -0,0 +1,68 @@
{% extends "data_source_parent.rst.jinja" %}
{% block background %}
The data in the Resource Adequacy Renewable Energy (RARE) Power Dataset was produced by
Vibrant Clean Energy based on outputs from the NOAA HRRR model and is licensed
to the public under the Creative Commons Attribution 4.0 International license
(CC-BY-4.0).

See the `README <https://doi.org/10.5281/zenodo.13937523>`__ archived on Zenodo for more
detailed information.
{% endblock %}

{% block download_docs %}
{% for filename in download_paths %}
* :download:`{{ filename.stem.replace("_", " ").title() }} ({{ filename.suffix.replace('.', '').upper() }}) <{{ filename }}>`
{% endfor %}
* `NOAA HRRR Model Overview <https://rapidrefresh.noaa.gov/hrrr/>`__
{% endblock %}
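The `download_docs` block builds each link's text from the file path with a chain of string methods. The same logic in plain Python, as an illustrative sketch: `download_link_text` is a hypothetical name, and the example filename is invented rather than taken from the actual downloadable docs.

```python
from pathlib import Path


def download_link_text(filename: Path) -> str:
    """Reproduce the link text built by the Jinja expression in download_docs."""
    # "vcerare_readme.pdf" -> stem "vcerare_readme" -> "Vcerare Readme"
    title = filename.stem.replace("_", " ").title()
    # ".pdf" -> "PDF"
    file_type = filename.suffix.replace(".", "").upper()
    return f"{title} ({file_type})"


print(download_link_text(Path("vcerare_readme.pdf")))  # Vcerare Readme (PDF)
```

Note that `str.title()` also lowercases the rest of each word, so an acronym such as `RARE` in a filename would render as `Rare`, and `suffix` only sees the last extension of a name like `data.csv.gz`.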


{% block availability %}
Hourly, county-level data for 2019-2023 is integrated into PUDL. A second release covering
2014-2018 is expected in Q1 2025 and will be integrated into PUDL pending funding
availability.
{% endblock %}

{% block respondents %}
This data does not come from a government agency, and is not the result of compulsory
data reporting.
{% endblock %}

{% block original_data %}
The original CSVs are structured so that Excel can open them without crashing: there is
one file per year per generation type, and each file contains an hour index column
(simply 1, 2, 3, ..., 8760, one value per hour of the year) plus one column per county
containing capacity factor values as percentages from 0 to 100.
{% endblock %}
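Because each raw file is wide (one column per county), a natural first transform step is to reshape it to one row per hour and county. A minimal sketch using `pandas.DataFrame.melt`; the `hour` index column name and the county column headers are hypothetical, since the raw headers are not shown here, and this is not necessarily how PUDL's transform is written.

```python
import pandas as pd


def wide_to_long(raw: pd.DataFrame, value_name: str, index_col: str = "hour") -> pd.DataFrame:
    """Reshape one raw file (hour index plus one column per county) to long format.

    ASSUMPTION: the hour index column is called "hour"; the real header may differ.
    """
    return raw.melt(id_vars=index_col, var_name="county", value_name=value_name)


raw = pd.DataFrame(
    {
        "hour": [1, 2],
        # Hypothetical county column headers.
        "county_a": [10.0, 20.0],
        "county_b": [0.0, 5.0],
    }
)
print(wide_to_long(raw, "capacity_factor_solar_pv"))
```

The long layout makes it straightforward to stack the per-year, per-generation-type files and attach FIPS codes by county name.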

{% block notable_irregularities %}
Non-county regions
------------------

The original data include capacity factors for some non-county areas, including the
Great Lakes and two small cities (Bedford City, VA and Clifton Forge City, VA), and
assign "county" FIPS IDs to those areas. As a result, there was not a 1:1 relationship
between FIPS IDs and named areas, and the geographic region implied by a FIPS ID did
not correspond to the named area. We've dropped the cities (one of which contained no
data) and set the FIPS codes for the Great Lakes to NA. Note that lakes bordering
multiple states appear more than once in the data. VCE used a nearest-neighbor
technique to assign state waters to counties; this applies to coastal areas as well.
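The cleanup described above can be sketched as follows. The column names `county_or_lake_name` and `county_id_fips` come from this PR's commit messages; the normalized city and lake spellings are hypothetical. Matching explicit lake names, rather than any name containing "lake", avoids nulling the FIPS codes of real counties such as Lake County.

```python
import pandas as pd

# ASSUMPTION: hypothetical normalized spellings for the two dropped cities.
CITY_NAMES = {"bedford_city", "clifton_forge_city"}
# ASSUMPTION: hypothetical normalized spellings for the Great Lakes.
GREAT_LAKES = {"lake_erie", "lake_huron", "lake_michigan", "lake_ontario", "lake_superior"}


def clean_non_county_rows(df: pd.DataFrame) -> pd.DataFrame:
    """Drop city records and set county_id_fips to NA for Great Lakes records."""
    names = df["county_or_lake_name"].str.lower()
    out = df.loc[~names.isin(CITY_NAMES)].copy()
    # Exact matches only, so real counties named "Lake" keep their FIPS codes.
    is_lake = out["county_or_lake_name"].str.lower().isin(GREAT_LAKES)
    out.loc[is_lake, "county_id_fips"] = pd.NA
    return out
```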

Capacity factors > 1
--------------------
There are a couple of capacity factor values for the solar PV data that exceed
the maximum value of 1 for capacity factor (or 100 for the raw data; PUDL converts the
data from a percentage to a fraction to match other reported capacity factor data). This
is because PV panel efficiency depends on panel temperature and is higher when panels are
cold. During cold, sunny periods, some solar capacity factor values are greater than 1
(but less than 1.1).
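The percentage-to-fraction conversion and a check matching the bound described above might look like this. A sketch only: `capacity_factor_solar_pv` is the column name used in this PR's commit history, and the 1.1 threshold is taken from the text rather than from PUDL's actual asset checks.

```python
import pandas as pd

SOLAR_COL = "capacity_factor_solar_pv"


def to_fraction(df: pd.DataFrame, columns: list[str]) -> pd.DataFrame:
    """Convert capacity factors from percentages (0-100) to fractions (0-1)."""
    return df.assign(**{col: df[col] / 100 for col in columns})


def check_solar_capacity_factor(df: pd.DataFrame, col: str = SOLAR_COL) -> pd.DataFrame:
    """Return rows where solar exceeds 1, failing if any value reaches 1.1."""
    if (df[col] >= 1.1).any():
        raise ValueError(f"{col} has values >= 1.1")
    return df.loc[df[col] > 1]


raw = pd.DataFrame({SOLAR_COL: [50.0, 105.0, 0.0]})
print(check_solar_capacity_factor(to_fraction(raw, [SOLAR_COL])))
```

Returning the rows above 1 instead of failing on them keeps the cold-weather values described above while still catching anything implausibly large.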

8760-hour years
---------------
This data is primarily used for modeling purposes and follows the 8760-hours-per-year
convention regardless of leap years. As a result, 2020, a leap year, is missing data for
December 31.
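Under the 8760-hour convention, the 1-based hour index can be mapped to timestamps by counting hours from January 1. A minimal sketch, assuming hour 1 is the hour beginning at midnight on January 1 (the text does not say whether the index marks the start or the end of each hour) and leaving time zone handling out; `hour_index_to_datetime` is a hypothetical helper. For 2020, hour 8760 lands on December 30 at 23:00, which is why December 31 has no data.

```python
import pandas as pd

HOURS_PER_MODEL_YEAR = 8760


def hour_index_to_datetime(year: int, hour_index: pd.Series) -> pd.Series:
    """Map a 1-based hour index (1..8760) in a model year to naive timestamps.

    ASSUMPTION: hour 1 is the hour beginning at 00:00 on January 1.
    """
    if not hour_index.between(1, HOURS_PER_MODEL_YEAR).all():
        raise ValueError(f"hour index must be between 1 and {HOURS_PER_MODEL_YEAR}")
    start = pd.Timestamp(year=year, month=1, day=1)
    # Count whole hours forward from midnight on January 1.
    return start + pd.to_timedelta(hour_index - 1, unit="h")


# In leap year 2020, the last modeled hour falls on December 30.
print(hour_index_to_datetime(2020, pd.Series([1, 8760])))
```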

{% endblock %}