Some FERC filings share a ReportDate or CertifyingOfficialDate, but were published at different times. #2822

Closed
Tracked by #2821
jdangerx opened this issue Sep 1, 2023 · 2 comments
Labels
bug Things that are just plain broken. ferc1 Anything having to do with FERC Form 1 xbrl Related to the FERC XBRL transition

Comments

@jdangerx
Member

jdangerx commented Sep 1, 2023

For FERC forms 1, 2, 6, and 60, we expect each filing to include a ReportDate fact, and for FERC 714 we expect each filing to include a CertifyingOfficialDate fact.

We use these in the ferc-xbrl-extractor to order the filings by recency, so we can merge all the filings and use the most recent data we have for any given fact.

However, these date facts have only day-level granularity, and filings often share the same date fact even though they were published at different times. We can tell because the RSS feed metadata, which we already track, records the publish time at much finer granularity.

To avoid ambiguity, we should use that RSS feed metadata instead of the report's self-reported date to determine which report should take precedence.
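A minimal sketch of the ambiguity, using only the standard library. The filing names, dates, values, and the `merge_facts` helper are all hypothetical; this is not the ferc-xbrl-extractor's actual code or schema, just an illustration of why a day-level date can't break ties while a publish timestamp can.

```python
from datetime import date, datetime

# Hypothetical filings: names, dates, and values are invented for illustration.
FILINGS = [
    {
        "filing_name": "filing_a",
        "report_date": date(2022, 4, 18),
        "published": datetime(2022, 4, 18, 9, 15),
        "facts": {"PlantCapacity": 100.0},
    },
    {
        "filing_name": "filing_b",
        "report_date": date(2022, 4, 18),
        "published": datetime(2022, 4, 18, 16, 42),
        "facts": {"PlantCapacity": 110.0},
    },
]


def merge_facts(filings, order_key):
    """Apply filings oldest-first so later filings overwrite earlier facts."""
    merged = {}
    # sorted() is stable, so filings that tie on order_key keep their input order.
    for filing in sorted(filings, key=lambda f: f[order_key]):
        for fact_name, value in filing["facts"].items():
            merged[fact_name] = {"value": value, "source": filing["filing_name"]}
    return merged


# Ordering by the day-level date: the tie is broken by whatever order
# the filings happened to arrive in, so the winner can flip.
by_date = merge_facts(FILINGS, "report_date")
by_date_reversed = merge_facts(FILINGS[::-1], "report_date")

# Ordering by publish time: the winner does not depend on input order.
by_published = merge_facts(FILINGS, "published")
by_published_reversed = merge_facts(FILINGS[::-1], "published")
```

In this toy case `by_date` and `by_date_reversed` disagree about which filing supplied the fact, while the two publish-time results agree.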

@jdangerx added the bug label Sep 1, 2023
@zaneselvans added the ferc1 and xbrl labels Sep 20, 2023
@jdangerx
Member Author

jdangerx commented Oct 6, 2023

This causes data to be dropped when we read from the FercXbrlSqliteExtractor, since it uses filing_name to join the data table with an ID table. If the ID table and the data table choose different filings for the same report date, the join won't find the data.
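A rough illustration of that failure mode, again with hypothetical rows and helper names (`dedupe`, `join_on_filing_name`) rather than the real FercXbrlSqliteExtractor logic: when each table independently keeps one filing per report date and they pick differently, a join on filing_name matches nothing.

```python
# Hypothetical rows: two filings from one respondent with the same report date.
ID_ROWS = [
    {"filing_name": "filing_a", "entity_id": "C000001", "report_date": "2022-04-18"},
    {"filing_name": "filing_b", "entity_id": "C000001", "report_date": "2022-04-18"},
]
DATA_ROWS = [
    {"filing_name": "filing_a", "entity_id": "C000001",
     "report_date": "2022-04-18", "value": 100.0},
    {"filing_name": "filing_b", "entity_id": "C000001",
     "report_date": "2022-04-18", "value": 110.0},
]


def dedupe(rows, keep):
    """Keep one row per (entity_id, report_date); keep is 'first' or 'last'."""
    chosen = {}
    for row in rows:
        key = (row["entity_id"], row["report_date"])
        if keep == "last" or key not in chosen:
            chosen[key] = row
    return list(chosen.values())


def join_on_filing_name(id_rows, data_rows):
    """Inner join: data rows survive only if their filing_name is in the ID table."""
    ids = {row["filing_name"]: row for row in id_rows}
    return [
        {**ids[row["filing_name"]], **row}
        for row in data_rows
        if row["filing_name"] in ids
    ]


# The two tables break the tie differently, so the join finds no match.
mismatched = join_on_filing_name(dedupe(ID_ROWS, "first"), dedupe(DATA_ROWS, "last"))

# With the same tie-breaker on both sides, the data row survives.
consistent = join_on_filing_name(dedupe(ID_ROWS, "last"), dedupe(DATA_ROWS, "last"))
```

Here `mismatched` comes back empty while `consistent` keeps the single expected row, which is why a shared, unambiguous ordering such as publish time matters on both sides of the join.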

jdangerx added a commit that referenced this issue Oct 6, 2023
There are some missing data due to messy deduplication:
#2822

But we'll do the deduplication better in here:
#2899
jdangerx added a commit that referenced this issue Oct 6, 2023
)

* Update to use new version of ferc-xbrl-extractor

* Fix issues arising from stricter typing used in pandas 2.1

* Use integer transmission circuits.

* Remove obsolete references to ferc1_schema tests.

* Make new extractor compatible with 2021 data

The new extractor added some data to the 2021 XBRL archives, which caused some integration and validation test failures. I added some plants to the pudl_id mapping spreadsheet, all of which are considered totals, i.e., not real plants. We map them only to give them an ID (they are not connected to EIA records), because this is how we treat other total records reported to FERC1.

This also updates the way that values were assigned to a slice of the ferc1_eia_train output spreadsheets. NA values were causing an issue, so I had to change how the values were being converted.

This also updates the test_minmax_rows test to reflect the new rows in the 2021 data.


* Add a few plants to pudl_id_mapping

Totally new:

* 18012: pjm interconnection, llc / total
* 18013: new york state electric & gas corporation / see footnote
* 18014: southwest power pool, inc. / total
* 18015: public service company of colorado / community solar gardens
* 18016: the empire district electric company / n/a
* 18017: wisconsin electric power company / see footnote
* 18018: upper michigan energy resources company (pudl determined) / total
* 18019: new york transco, llc / total
* 18020: wilderness line holdings, llc / total
* 18021: mt. carmel public utility co / total

Mapped to existing PUDL ID:

* 8671: pacific gas & electric company, small hydroelectric generating plants
* 15000: idaho power company / hydro
* 15001: idaho power company / internal combustion
* 15068: public service company of colorado / conventional hydro
* 12926: midamerican energy company / ida grove ii wind farm (8 units at 2.3 mw
  each & 73 units at 2.52 mw each)
* 1287: alaska electric light and power company / salmon creek hyrdo

Note the misspelling of the plant name in 1287.

Changed:

* 15031: mt. carmel public utility co / not applicable -> ameren
  illinois company / not applicable

  This one had a mismatched utility: utility_id_ferc 222 corresponds
  to Ameren, not Mt. Carmel (397).

* Update validation test expectations.

There are some missing data due to messy deduplication:
#2822

But we'll do the deduplication better in here:
#2899

---------

Co-authored-by: zschira <[email protected]>
Co-authored-by: Zane Selvans <[email protected]>
Co-authored-by: Austen Sharpe <[email protected]>