Validate error reading tables > 2GiB #189
Comments
@sslavney apologies here. Validate has a known issue reading files larger than 2 GB. We will work to update the software to better handle this error and hopefully come up with a solution. |
Same issue for us using an older version (1.16.0-20190718-e5b39a1) and we confirm this is an issue with very large files. We are now integrating release 1.20.0. Do you want us to check this issue against this other version?
2020-02-05 04:05:19 INFO > Executing job with groupId = Packaging and jobId = ProductPackagerJob_1577bbfd-68f0-4d6f-b419-c69de6fa854c
2020-02-05 04:05:19 INFO Package temporary folder created: /tmp/em16_packager_12473380658639804638/em16psa-pds4-pi-01-em16_tgo_acs-20200205T030519516
2020-02-05 04:05:32 INFO Using validation style 'PDS4 Label' for location file:/tmp/em16_packager_12473380658639804638/em16psa-pds4-pi-01-em16_tgo_acs-20200205T030519516/em16_tgo_acs/Commissioning_and_Verification/acs_raw_sc_tir_20180319T115959-20180319T155944-1425-1.xml
2020-02-05 04:05:32 INFO Starting validation task for location 'file:/tmp/em16_packager_12473380658639804638/em16psa-pds4-pi-01-em16_tgo_acs-20200205T030519516/em16_tgo_acs/Commissioning_and_Verification/acs_raw_sc_tir_20180319T115959-20180319T155944-1425-1.xml'
2020-02-05 04:07:31 INFO Validation complete for location 'file:/tmp/em16_packager_12473380658639804638/em16psa-pds4-pi-01-em16_tgo_acs-20200205T030519516/em16_tgo_acs/Commissioning_and_Verification/acs_raw_sc_tir_20180319T115959-20180319T155944-1425-1.xml'
2020-02-05 04:07:31 INFO
PDS Validation Tool 1.16.0-20190718-e5b39a1 Report:
Confirmed also as an issue for PDS Validation Tool 1.20.0. Cheers |
@sslavney @josinde as a note, this directly relates to NASA-PDS/transform#2 which also uses the underlying PDS4-JParser library. We will add this to our release plan for next build as this may require some significant overhaul of the underlying library |
Is there a "known" limit to the size of files that Validate can handle, i.e. is it exactly 2 GB? It would help to know so that we can programmatically skip such files if need be (or disable content validation). Also, does it affect all data types (e.g. Table_Character and Table_Binary)? |
@mcayanan do you remember the details of this 2 GB cap? Is it exactly 2 GB, or only approximately? |
@jordanpadams Unfortunately no. I would recommend printing a stack trace in the code to see where the error is coming from. That might jog my brain cells. :) |
Looks like the |
@jordanpadams Ya, that looks to be the issue. Specifically, it's trying to allocate a total size of 2,687,074,568 bytes, which is greater than the max int value (2,147,483,647 for a Java int). I forget exactly how large arrays (greater than 2 GB) are currently being handled, but I would imagine the tool should be updated similarly on the large table end. |
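To make the arithmetic in the comment above concrete, here is a minimal sketch in Python (not the Java code used by Validate or PDS4-JParser). It checks a requested allocation against the Java `int` maximum and shows how the same byte range could be covered by smaller buffered reads. The names `fits_in_java_array` and `chunk_ranges`, and the 64 MiB chunk size, are hypothetical choices for illustration only.

```python
JAVA_INT_MAX = 2**31 - 1  # 2,147,483,647: the largest value a Java int can hold


def fits_in_java_array(n_bytes: int) -> bool:
    """Return True if n_bytes could be used as the length of one Java array."""
    return 0 <= n_bytes <= JAVA_INT_MAX


def chunk_ranges(total_bytes: int, chunk_bytes: int = 64 * 1024 * 1024):
    """Yield (offset, length) pairs that cover total_bytes in bounded chunks.

    The chunk size is an illustrative assumption, not a value from Validate.
    """
    if chunk_bytes <= 0:
        raise ValueError("chunk_bytes must be positive")
    offset = 0
    while offset < total_bytes:
        length = min(chunk_bytes, total_bytes - offset)
        yield offset, length
        offset += length


requested = 2_687_074_568  # allocation size quoted in the comment above
print(fits_in_java_array(requested))  # False
print(all(fits_in_java_array(n) for _, n in chunk_ranges(requested)))  # True
```

Each chunk stays far below the `int` limit, which is the general idea behind buffering a large table instead of reading it in one allocation.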
Thanks @mcayanan . I knew I saw us buffering somewhere else in the code so this just needs to be updated to do the same. Thanks for the tip! |
Duplicate of NASA-PDS/pds4-jparser#21 |
@hhlee445 see comments above: #189 (comment) and #189 (comment) |
Hi @jordanpadams would it be possible to clarify this? We need to work around by programmatically skipping validation for products over this threshold, and it would be good to confirm the exact value! |
@msbentley unfortunately this is a tough thing to test for exactness. @hhlee445 is in the process of implementing a fix as we speak, so if you can wait another week or 2, we may be able to use that version of the software. |
OK, thanks @jordanpadams and @hhlee445 - I can wait 👍 |
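For anyone who needs the interim workaround discussed above, a minimal Python sketch of splitting products by data file size might look like the following. The exact threshold was not confirmed in this thread, so `ASSUMED_LIMIT_BYTES` (the Java `int` maximum) is an assumption, and `partition_by_size` is a hypothetical helper, not part of Validate.

```python
import os

# Assumption: treat anything above the Java int maximum as "too large".
# The exact limit was not confirmed in this thread.
ASSUMED_LIMIT_BYTES = 2**31 - 1


def partition_by_size(paths, limit_bytes=ASSUMED_LIMIT_BYTES):
    """Split data file paths into (full_validation, skip_content) lists."""
    full_validation, skip_content = [], []
    for path in paths:
        if os.path.getsize(path) > limit_bytes:
            skip_content.append(path)  # too large: validate the label only
        else:
            full_validation.append(path)
    return full_validation, skip_content
```

Products in the second list could then be run without content validation, which, per the original report, produces no errors for the affected product.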
issue #189: Fix to validate error reading tables larger than 2GB
@msbentley @sslavney this should now be fixed. Feel free to try out the latest snapshot version of Validate here: https://github.com/NASA-PDS/validate/releases/tag/1.25.0-SNAPSHOT Thanks to @hhlee445 for the excellent work here! Note: this update did not make it into Build 11.0 I&T, and since it is a pretty significant change to how we read in data files, I would prefer it be rigorously tested prior to an official release in the Spring. |
Describe the bug
Validate 1.18.2 gives the error message "ERROR [error.table.bad_file_read] table 4: Error occurred while trying to read table: null" when running content validation on a large data file (2.5 GB) containing multiple binary tables.
To Reproduce
Steps to reproduce the behavior:
Download the data file and its label from
https://pds-geosciences.wustl.edu/messenger/mess-h-rss_mla-5-sdp-v1/messrs_1001/data/shbdr/jgmess_160av01_shb.dat
and
https://pds-geosciences.wustl.edu/messenger/mess-h-rss_mla-5-sdp-v1/messrs_1001/data/shbdr/jgmess_160av01_shb.xml.
Run Validate version 1.18.2 with this command:
validate jgmess_160av01_shb.xml -R pds4.label -v2 -r validate_output.txt
This is what appears on the screen:
"Feb 20, 2020 10:54:06 AM com.sun.xml.bind.v2.runtime.reflect.opt.AccessorInjector
INFO: The optimized code generation is disabled"
This is the error message in the output file:
"ERROR [error.table.bad_file_read] table 4: Error occurred while trying
to read table: null"
The complete output file is attached.
validate182_shbdr_error.txt
Expected behavior
I expected the contents of the file to be valid because they appear to be correct in the PDS4 Viewer, and because they appear to be correct in NASAView when read via a PDS3 label. (This is a migrated MESSENGER product that has both a PDS3 and a PDS4 label.)
Version of Software Used
Version 1.18.2
Desktop (please complete the following information):
Additional context
When run without content validation, Validate reports no errors for this product.
Related to NASA-PDS/transform#2