
KeyError: "'codedValues' not defined in section 7" when loading a possibly corrupted file #131

Closed
lkugler opened this issue Jul 22, 2018 · 11 comments


@lkugler

lkugler commented Jul 22, 2018

Hi all,
I got a KeyError: "'codedValues' not defined in section 7" when loading a GRIB2 file. I suspect the file is somehow corrupt, since loading works on some files and not on others.

I uploaded a sample file here, which raises the error for me: http://homepage.univie.ac.at/a1254888/ICON_EU_single_level_elements_CLCM_2018050512.grib2
It originates from a GRIB2 file from opendata.dwd.de, which was loaded with Iris, reduced in size (area) and saved. (The error is raised on about 10-20% of all files.)

It is enough to do
import iris
cube = iris.load('ICON_EU_single_level_elements_CLCM_2018050512.grib2')[0]
cube.data
and the error is raised.

How should I load many files, not knowing which/if a file is corrupted?
For now I am iterating over all files and loading each one inside a try-block. It does not seem to be much slower on ~300 files.

Would be great if you could share your comments on the problem.
Thanks!

@DPeterK
Member

DPeterK commented Aug 20, 2018

Hi @loxn8773 - thanks for raising this, and apologies that it's taken a little while for anyone to get back to you...

In GRIB2, section 7 of each message stores the actual data values of that particular message. As you may be aware, a GRIB2 file is a series of messages, each containing data and metadata. Each message holds a 2D field of (latitude-like, longitude-like) values at a given height, time, or other coordinate value. When all the messages within a GRIB2 file are loaded, the 2D fields are tiled together to make one or more n-dimensional data structures (such as Iris cubes). If the data values are missing from a message, one of the tiles has no data, so part of the cube simply won't have any data.

As it's the codedValues key that stores the 2D data for each message, Iris can't construct a cube from the data if this key is missing. The key must therefore be present in each message, and iris-grib raises an error if it isn't - this is the error you've encountered here.

In effect then, this GRIB2 file is corrupted. It might be worth contacting the data supplier to see if you can work out why some of the messages are missing their data element. You can see some of the missing codedValues keys by using the grib_dump command-line tool and grepping for both the message header and the presence of a codedValues key within each message:

grib_dump -O ICON_EU_single_level_elements_CLCM_2018050512.grib2 | grep "MESSAGE\|codedV"
...
#==============   MESSAGE 22 ( length=179 )                ==============
#==============   MESSAGE 23 ( length=2525 )               ==============
6-2351    codedValues = (782,2346) {
} # data_g2simple_packing codedValues 
#==============   MESSAGE 24 ( length=2525 )               ==============
6-2351    codedValues = (782,2346) {
...

You can see from this snip that message 22 (among others) is missing a codedValues key.
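The same check can be scripted. The following is a minimal sketch that scans the text printed by the grib_dump command above and lists the message numbers with no codedValues line. The function name `messages_missing_coded_values` is made up for illustration, and the parsing assumes the header and key layout shown in the snip.

```python
import re

# Matches header lines such as
# "#==============   MESSAGE 22 ( length=179 )   ==============".
MESSAGE_HEADER = re.compile(r"MESSAGE\s+(\d+)")


def messages_missing_coded_values(dump_text):
    """Return the numbers of messages whose dump shows no codedValues key."""
    has_values = {}  # message number -> True once a codedValues line is seen
    current = None
    for line in dump_text.splitlines():
        header = MESSAGE_HEADER.search(line)
        if header and line.lstrip().startswith("#="):
            current = int(header.group(1))
            has_values.setdefault(current, False)
        elif current is not None and "codedValues" in line:
            has_values[current] = True
    return [number for number, present in has_values.items() if not present]


# The excerpt from the grib_dump output quoted above.
sample = """\
#==============   MESSAGE 22 ( length=179 )                ==============
#==============   MESSAGE 23 ( length=2525 )               ==============
6-2351    codedValues = (782,2346) {
} # data_g2simple_packing codedValues
#==============   MESSAGE 24 ( length=2525 )               ==============
6-2351    codedValues = (782,2346) {
"""
print(messages_missing_coded_values(sample))  # [22]
```

In practice the text would come from running grib_dump on the real file and capturing its output, rather than from a pasted excerpt.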

In terms of working around this in Python, it sounds like your existing try-except is a very reasonable solution, especially as it isn't too slow in your case. One danger is that you're loading a lot of data into memory, as you only encounter the error on data loading. This means the try-except solution may break down with more significant data volumes.
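For reference, the try-except approach might look like the sketch below. The helper names `load_readable` and `load_first_cube` are hypothetical, and the loader mirrors the two lines from the original report (`iris.load(...)[0]` followed by touching `cube.data`). The KeyError is only raised once the data is realised, which is why the loader touches `cube.data`.

```python
def load_readable(paths, load_one):
    """Split paths into those that load cleanly and those that raise KeyError."""
    loaded, failed = {}, {}
    for path in paths:
        try:
            loaded[path] = load_one(path)
        except KeyError as err:
            # iris-grib reports the missing codedValues key as a KeyError.
            failed[path] = str(err)
    return loaded, failed


def load_first_cube(path):
    """Load the first cube of a file and touch its data to force the read."""
    import iris  # imported here so load_readable itself does not need Iris

    cube = iris.load(path)[0]
    cube.data  # the error only appears once the data is realised
    return cube
```

A call such as `loaded, failed = load_readable(paths, load_first_cube)` then gives the usable cubes plus a record of which files failed and why. Because every cube that loads has its data in memory, this sketch shares the memory concern described above. Keeping only the paths of good files would avoid holding everything at once.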

Another option is to make use of some of the underlying iris_grib functionality to check each message for the presence of a codedValues key, and only load the resultant cube if all messages contributing to the cube have a codedValues key. I've put together an example of this, which I've attached, but note that this does not take account of differing phenomena in the input fields, so if you have a single file that loads to more than one cube, this code may not behave correctly.
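Independently of the attached file, the idea can be sketched roughly as follows. This is not the attached code. It assumes that `iris_grib.message.GribMessage.messages_from_filename` yields the messages of a file and that each section object exposes `keys()`, so check both against the installed iris-grib version. The function names are hypothetical.

```python
def all_messages_have_data(messages, data_key="codedValues", section=7):
    """Return True if every message lists data_key among its section keys."""
    return all(
        data_key in message.sections[section].keys() for message in messages
    )


def file_is_loadable(path):
    """Check one GRIB2 file without realising any of its data."""
    # Lazy import so the checker above can be used and tested without iris-grib.
    # Assumption: messages_from_filename exists with this signature.
    from iris_grib.message import GribMessage

    return all_messages_have_data(GribMessage.messages_from_filename(path))
```

Like the attached example, this treats the file as a whole and does not distinguish between phenomena, so one bad message rejects a file that might otherwise yield several valid cubes.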

Hope this helps!

codedValues_key_present.txt

@lkugler
Author

lkugler commented Aug 20, 2018

Great explanation, thanks!

@trexfeathers
Contributor

This looks like it was solved several years ago! We're gonna close it, feel free to re-open if you still need help.

@greenlaw

greenlaw commented Feb 7, 2023

For anyone else who comes across this - We have also encountered this with many messages, and according to people who are more knowledgeable about GRIB2 than I am, it is completely valid for codedValues to be missing in certain cases. In those cases, the array can be computed by inspecting the contents of Section 5 (Data Representation Section).

To quote the spec:

(4) The original data value Y (in the units of code table 4.2) can be recovered with the formula:

Y * 10**D = R + (X1 + X2) * 2**E

For simple packing and all spectral data
E = Binary scale factor,
D = Decimal scale factor
R = Reference value of the whole field,
X1 = 0,
X2 = Scaled (encoded) value.

For complex grid point packing schemes, E, D, and R are as above, but

X1 = Reference value (scaled integer) of the group the data value belongs to,
X2 = Scaled (encoded) value with the group reference value (X1) removed.

More information can be found here: https://apps.ecmwf.int/codes/grib/format/grib2/regulations/
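As a worked illustration of the quoted formula, here is a minimal sketch in plain Python. The function names are hypothetical. The constant-field helper rests on an assumption that the quote does not state: when a message carries no encoded values, X1 and X2 are treated as 0 for every grid point, so each point takes the value R / 10**D.

```python
def unpack_values(encoded, reference, binary_scale, decimal_scale,
                  group_reference=0):
    """Apply Y * 10**D = R + (X1 + X2) * 2**E to each encoded value X2."""
    scale = 2.0 ** binary_scale       # 2**E
    divisor = 10.0 ** decimal_scale   # 10**D
    return [
        (reference + (group_reference + x2) * scale) / divisor
        for x2 in encoded
    ]


def constant_field(n_points, reference, decimal_scale):
    """Field for a message without codedValues (assumed X1 = X2 = 0)."""
    return unpack_values([0] * n_points, reference, 0, decimal_scale)


# Simple packing (X1 = 0) with R = 10.0, E = -1, D = 0.
print(unpack_values([0, 1, 2], 10.0, -1, 0))  # [10.0, 10.5, 11.0]
print(constant_field(3, 273.15, 0))           # [273.15, 273.15, 273.15]
```

For complex packing, `group_reference` stands in for X1 of the group a value belongs to. A real decoder would read R, E and D from Section 5 rather than take them as arguments.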

@pp-mo
Member

pp-mo commented Nov 6, 2023

This problem has recently been re-raised.
Whether or not this is "correct" encoding, it clearly is out there.

I'm doubting whether this was actually "fixed" as stated above -- perhaps we still need to add robustness for this case ??

@greenlaw

greenlaw commented Nov 6, 2023

@pp-mo Just FYI, I posted our own monkeypatch workaround in the issue linked above: #355 (comment)

If it would be helpful, I can submit this in a PR. We have some other internal patches that would also be good candidates for PRs, but haven't gotten around to packaging them up just yet.

@larsbarring

@greenlaw I just saw your comment (above), and thanks for the monkeypatch in #355, it worked out nicely. As we are exploring Iris more and more [also] as a tool for reading GRIB files, we would be interested in the other patches you have. At least from my perspective, a PR would be useful.

@pp-mo
Member

pp-mo commented Nov 21, 2023

This problem has recently been re-raised ... perhaps we still need to add robustness for this case ??

Should be fixed in latest v0.19 release, so I hope we can close this issue now.
@lkugler @larsbarring can you confirm this ?

@larsbarring

Regarding the latest 0.19 release, please note this comment and this response. Unfortunately, I do not have a suitable test file at hand.

Regarding @greenlaw's kind and constructive offer to share their improvements, I think that would be very helpful because GRIB is a complicated format (as we have seen in this issue) and all insights are useful in helping others avoid problems.

Having said that, I am fine with closing this issue.

@lkugler
Author

lkugler commented Nov 22, 2023

I'm sorry, I can't test the patch. It's been a long time, and I don't work with GRIB anymore.

@bjlittle
Member

Given the recent iris-grib 0.19.0 release, we consider this issue addressed.

Please reopen if you want to discuss further and propose additional changes.
