catalyst-cooperative · aesharpe · Oct 19, 2024 · Sep 30, 2024
diff --git a/devtools/generate_batch_config.py b/devtools/generate_batch_config.py
@@ -56,8 +56,8 @@ def to_config(
                         }
                     ],
                     "computeResource": {
-                        "cpuMilli": 8000,
-                        "memoryMib": int(63 * MIB_PER_GB),
+                        "cpuMilli": 16000,
+                        "memoryMib": int(127 * MIB_PER_GB),
                         "bootDiskMib": 100 * 1024,
                     },
                     "maxRunDuration": f"{60 * 60 * 12}s",

diff --git a/docs/conf.py b/docs/conf.py
@@ -162,6 +162,7 @@ def data_sources_metadata_to_rst(app):
         "epacems",
         "phmsagas",
         "gridpathratoolkit",
+        "vcerare",
     ]
     package = PUDL_PACKAGE
     extra_etl_groups = {"eia860": ["entity_eia"], "ferc1": ["glue"]}
@@ -213,6 +214,7 @@ def cleanup_rsts(app, exception):
     (DOCS_DIR / "data_sources/epacems.rst").unlink()
     (DOCS_DIR / "data_sources/phmsagas.rst").unlink()
     (DOCS_DIR / "data_sources/gridpathratoolkit.rst").unlink()
+    (DOCS_DIR / "data_sources/vcerare.rst").unlink()
 
 
 def cleanup_csv_dir(app, exception):

diff --git a/docs/data_access.rst b/docs/data_access.rst
@@ -130,6 +130,7 @@ so we have moved to publishing all our hourly tables using the compressed, colum
 * `FERC-714 Hourly Estimated State Demand <https://s3.us-west-2.amazonaws.com/pudl.catalyst.coop/nightly/out_ferc714__hourly_estimated_state_demand.parquet>`__
 * `FERC-714 Hourly Planning Area Demand <https://s3.us-west-2.amazonaws.com/pudl.catalyst.coop/nightly/out_ferc714__hourly_planning_area_demand.parquet>`__
 * `GridPath RA Toolkit Hourly Available Capacity Factors <https://s3.us-west-2.amazonaws.com/pudl.catalyst.coop/nightly/out_gridpathratoolkit__hourly_available_capacity_factor.parquet>`__
+* `VCE Resource Adequacy Renewable Energy Dataset <https://s3.us-west-2.amazonaws.com/pudl.catalyst.coop/nightly/out_vcerare__hourly_available_capacity_factor.parquet>`__
 
 Raw FERC DBF & XBRL data converted to SQLite
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

diff --git a/docs/data_sources/index.rst b/docs/data_sources/index.rst
@@ -18,6 +18,7 @@ The following data sources serve as the foundation for our data pipeline.
    ferc714
    phmsagas
    gridpathratoolkit
+   vcerare
    other_data
 
 .. toctree::

diff --git a/...E-Weather-Dataset-Overview_August2020.pdf → ..._Weather_Dataset_Overview_August_2020.pdf b/...E-Weather-Dataset-Overview_August2020.pdf → ..._Weather_Dataset_Overview_August_2020.pdf
diff --git a/docs/dev/nightly_data_builds.rst b/docs/dev/nightly_data_builds.rst
@@ -265,7 +265,7 @@ ways to install the Google Cloud SDK explained in the link above.
 
 .. code::
 
-  conda install -c conda-forge google-cloud-sdk
+  mamba install -c conda-forge google-cloud-sdk
 
 Log into the account you used to create your new project above by running:
 
@@ -297,16 +297,17 @@ that are available:
 
 .. code::
 
-   gsutil ls -lh gs://builds.catalyst.coop
+   gcloud storage ls --long --readable-sizes gs://builds.catalyst.coop
 
 You should see a list of directories with build IDs that have a naming convention:
 ``<YYYY-MM-DD-HHMM>-<short git commit SHA>-<git branch>``.
 
-To see what the outputs are for a given nightly build, you can use ``gsutil`` like this:
+To see what the outputs are for a given nightly build, you can use ``gcloud storage``
+like this:
 
 .. code::
 
-    gsutil ls -lh gs://builds.catalyst.coop/2024-01-03-0605-e9a91be-dev/
+    gcloud storage ls --long --readable-sizes gs://builds.catalyst.coop/2024-01-03-0605-e9a91be-dev/
 
     804.57 MiB  2024-01-03T11:19:15Z  gs://builds.catalyst.coop/2024-01-03-0605-e9a91be-dev/censusdp1tract.sqlite
       5.01 GiB  2024-01-03T11:20:02Z  gs://builds.catalyst.coop/2024-01-03-0605-e9a91be-dev/core_epacems__hourly_emissions.parquet
@@ -337,22 +338,23 @@ To see what the outputs are for a given nightly build, you can use ``gsutil`` li
     TOTAL: 25 objects, 23557650395 bytes (21.94 GiB)
 
 If you want to copy these files down directly to your computer, you can use
-the ``gsutil cp`` command, which behaves very much like the Unix ``cp`` command:
+the ``gcloud storage cp`` command, which behaves very much like the Unix ``cp`` command:
 
 .. code::
 
-   gsutil cp gs://builds.catalyst.coop/<build ID>/pudl.sqlite ./
+   gcloud storage cp gs://builds.catalyst.coop/<build ID>/pudl.sqlite ./
 
 If you wanted to download all of the build outputs (more than 10GB!) you could use ``cp
 -r`` on the whole directory:
 
 .. code::
 
-   gsutil cp -r gs://builds.catalyst.coop/<build ID>/ ./
+   gcloud storage cp --recursive gs://builds.catalyst.coop/<build ID>/ ./
 
-For more details on how to use ``gsutil`` in general see the
-`online documentation <https://cloud.google.com/storage/docs/gsutil>`__ or run:
+For more background on ``gcloud storage`` see the
+`quickstart guide <https://cloud.google.com/storage/docs/discover-object-storage-gcloud>`__
+or check out the CLI documentation with:
 
 .. code::
 
-   gsutil --help
+   gcloud storage --help
diff --git a/docs/release_notes.rst b/docs/release_notes.rst
@@ -6,6 +6,21 @@ PUDL Release Notes
 v2024.X.x (2024-XX-XX)
 ---------------------------------------------------------------------------------------
 
+New Data
+^^^^^^^^
+
+Vibrant Clean Energy Resource Adequacy Renewable Energy (RARE) Power Dataset
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+* Integrate the VCE hourly capacity factor data for solar PV, onshore wind, and
+  offshore wind from 2019 through 2023. The data in this table were produced by
+  Vibrant Clean Energy, and are licensed to the public under the Creative Commons
+  Attribution 4.0 International license (CC-BY-4.0). This data complements the
+  WECC-wide GridPath RA Toolkit data currently incorporated into PUDL, providing
+  capacity factor data nation-wide with a different set of modeling assumptions and
+  a different granularity for the aggregation of outputs.
+  See :doc:`data_sources/gridpathratoolkit` and :doc:`data_sources/vcerare` for
+  more information.  See :issue:`#3872`.
+
 New Data Coverage
 ^^^^^^^^^^^^^^^^^
 

diff --git a/docs/templates/vcerare_child.rst.jinja b/docs/templates/vcerare_child.rst.jinja
@@ -0,0 +1,68 @@
+{% extends "data_source_parent.rst.jinja" %}
+{% block background %}
+The data in the Resource Adequacy Renewable Energy (RARE) Power Dataset were produced by
+Vibrant Clean Energy based on outputs from the NOAA HRRR model and are licensed
+to the public under the Creative Commons Attribution 4.0 International license
+(CC-BY-4.0).
+
+See the `README <https://doi.org/10.5281/zenodo.13937523>`__ archived on Zenodo for more
+detailed information.
+{% endblock %}
+
+{% block download_docs %}
+{% for filename in download_paths %}
+* :download:`{{ filename.stem.replace("_", " ").title() }} ({{ filename.suffix.replace('.', '').upper() }}) <{{ filename }}>`
+{% endfor %}
+* `NOAA HRRR Model Overview <https://rapidrefresh.noaa.gov/hrrr/>`__
+{% endblock %}
+
+
+{% block availability %}
+Hourly, county-level data from 2019 - 2023 is integrated into PUDL. There is a
+second release of data for the years 2014 - 2018 expected in Q1 of 2025, which will be
+integrated into PUDL pending funding availability.
+{% endblock %}
+
+{% block respondents %}
+This data does not come from a government agency, and is not the result of compulsory
+data reporting.
+{% endblock %}
+
+{% block original_data %}
+The contents of the original CSVs are formatted so that Excel can display the
+data without crashing. There's one file per year per generation type, and each
+file contains an index column for time (simply 1, 2, 3...8760 to
+represent the hours in a year) and columns for each county populated with capacity
+factor values as a percentage from 0-100.
+{% endblock %}
+
+{% block notable_irregularities %}
+Non-county regions
+------------------
+
+The original data include capacity factors for some non-county areas including the Great
+Lakes and two small cities (Bedford City, VA and Clifton Forge City, VA). The data associated
+"county" FIPS IDs with those areas, meaning that there was not a 1:1 relationship
+between the FIPS IDs and the named areas, and the geographic region implied by the
+FIPS IDs did not correspond to the named area. We've dropped the cities -- one of which
+contained no data -- and set the FIPS codes for the Great Lakes to NA. Note that lakes
+bordering multiple states will appear more than once in the data. VCE used a nearest
+neighbor technique to assign the state waters to the counties (this pertains to coastal
+areas as well).
+
+Capacity factors > 1
+--------------------
+There are a couple of capacity factor values for the solar PV data that exceed
+the maximum value of 1 for capacity factor (or 100 for the raw data; PUDL converts the
+data from a percentage to a fraction to match other reported capacity factor data). This
+is due to power production performance being correlated with panel temperatures. During
+cold sunny periods, some solar capacity factor values are greater than 1 (but less than
+1.1).
+
+8760-hour years
+---------------
+This data is primarily used for modeling purposes and conforms to the 8760 hour/year
+standard regardless of leap years. This means that 2020 is missing data for December
+31st.
+
+{% endblock %}
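
Reviewer note: the ``vcerare`` template above describes raw files indexed by hour of year (1 through 8760) with capacity factors given as percentages. The sketch below illustrates those two conventions using only the standard library. It is not PUDL's actual transform; the function names `hour_of_year_to_datetime` and `percent_to_fraction` are hypothetical, and it assumes hour 1 corresponds to midnight on January 1st, which the template does not state.

```python
from datetime import datetime, timedelta

HOURS_PER_MODEL_YEAR = 8760  # the 8760 hour/year convention, leap years included


def hour_of_year_to_datetime(hour_of_year: int, year: int) -> datetime:
    """Map a 1-based hour-of-year index to a timestamp.

    Assumption (not stated in the source): hour 1 is 00:00 on January 1st.
    """
    if not 1 <= hour_of_year <= HOURS_PER_MODEL_YEAR:
        raise ValueError(f"hour_of_year must be in 1..{HOURS_PER_MODEL_YEAR}")
    return datetime(year, 1, 1) + timedelta(hours=hour_of_year - 1)


def percent_to_fraction(capacity_factor_pct: float) -> float:
    """Convert a raw capacity factor percentage (0-100) to a fraction (0-1)."""
    return capacity_factor_pct / 100.0


if __name__ == "__main__":
    # In a leap year the final indexed hour falls on December 30th,
    # so December 31st has no data.
    print(hour_of_year_to_datetime(8760, 2020))
    # In a non-leap year the final indexed hour falls on December 31st.
    print(hour_of_year_to_datetime(8760, 2019))
    # Values slightly above 1 are possible for solar PV in cold, sunny hours.
    print(percent_to_fraction(104.0))
```

Under this assumed mapping, the last hour of 2020 lands on December 30th, which matches the template's statement that 2020 is missing data for December 31st.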