Validate error reading tables > 2GiB #189

Closed
sslavney opened this issue Feb 20, 2020 · 17 comments · Fixed by #268
Labels
bug Something isn't working


@sslavney

sslavney commented Feb 20, 2020

Describe the bug
Validate 1.18.2 gives the error message "ERROR [error.table.bad_file_read] table 4: Error occurred while trying to read table: null" when running content validation on a large data file (2.5 GB) containing multiple binary tables.

To Reproduce
Steps to reproduce the behavior:

  1. Download the data file and its label from
    https://pds-geosciences.wustl.edu/messenger/mess-h-rss_mla-5-sdp-v1/messrs_1001/data/shbdr/jgmess_160av01_shb.dat
    and
    https://pds-geosciences.wustl.edu/messenger/mess-h-rss_mla-5-sdp-v1/messrs_1001/data/shbdr/jgmess_160av01_shb.xml.

  2. Run Validate version 1.18.2 with this command:
    validate jgmess_160av01_shb.xml -R pds4.label -v2 -r validate_output.txt

  3. This is what appears on the screen:
    "Feb 20, 2020 10:54:06 AM com.sun.xml.bind.v2.runtime.reflect.opt.AccessorInjector
    INFO: The optimized code generation is disabled"

  4. This is the error message in the output file:
    "ERROR [error.table.bad_file_read] table 4: Error occurred while trying
    to read table: null"
    The complete output file is attached.
    validate182_shbdr_error.txt

Expected behavior
I expected the contents of file to be valid because they appear to be correct in the PDS4 Viewer, and because they appear to be correct in NASAView when read via a PDS3 label. (This is a migrated MESSENGER product that has both a PDS3 and a PDS4 label.)

Version of Software Used
Version 1.18.2

Desktop (please complete the following information):

  • OS: Windows Server 2012 R2

Additional context
When run without content validation, Validate reports no errors for this product.

Related to NASA-PDS/transform#2

@sslavney sslavney added bug Something isn't working triage-needed labels Feb 20, 2020
@jordanpadams
Member

@sslavney apologies here. Validate has a known issue reading files larger than 2GB. We will work to update the software to handle this error better and, hopefully, come up with a solution.

@jordanpadams jordanpadams added high and removed high labels Feb 20, 2020
@josinde

josinde commented Feb 25, 2020

We see the same issue with an older version (1.16.0-20190718-e5b39a1), and we can confirm that it occurs with very large files. We are now integrating release 1.20.0. Do you want us to check the issue against that version?

2020-02-05 04:05:19 INFO > Executing job with groupId = Packaging and jobId = ProductPackagerJob_1577bbfd-68f0-4d6f-b419-c69de6fa854c
2020-02-05 04:05:19 INFO Package temporary folder created: /tmp/em16_packager_12473380658639804638/em16psa-pds4-pi-01-em16_tgo_acs-20200205T030519516
2020-02-05 04:05:32 INFO Using validation style 'PDS4 Label' for location file:/tmp/em16_packager_12473380658639804638/em16psa-pds4-pi-01-em16_tgo_acs-20200205T030519516/em16_tgo_acs/Commissioning_and_Verification/acs_raw_sc_tir_20180319T115959-20180319T155944-1425-1.xml
2020-02-05 04:05:32 INFO Starting validation task for location 'file:/tmp/em16_packager_12473380658639804638/em16psa-pds4-pi-01-em16_tgo_acs-20200205T030519516/em16_tgo_acs/Commissioning_and_Verification/acs_raw_sc_tir_20180319T115959-20180319T155944-1425-1.xml'
2020-02-05 04:07:31 INFO Validation complete for location 'file:/tmp/em16_packager_12473380658639804638/em16psa-pds4-pi-01-em16_tgo_acs-20200205T030519516/em16_tgo_acs/Commissioning_and_Verification/acs_raw_sc_tir_20180319T115959-20180319T155944-1425-1.xml'
2020-02-05 04:07:31 INFO PDS Validation Tool 1.16.0-20190718-e5b39a1 Report:

FAIL: file:/tmp/em16_packager_12473380658639804638/em16psa-pds4-pi-01-em16_tgo_acs-20200205T030519516/em16_tgo_acs/Commissioning_and_Verification/acs_raw_sc_tir_20180319T115959-20180319T155944-1425-1.xml

ERROR: error.table.bad_file_read - Error occurred while trying to read table: capacity < 0: (-1395781509 < 0).
file:/tmp/em16_packager_12473380658639804638/em16psa-pds4-pi-01-em16_tgo_acs-20200205T030519516/em16_tgo_acs/Commissioning_and_Verification/acs_raw_sc_tir_20180319T115959-20180319T155944-1425-1-BBBB.tab (line = 0, column = 0)
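A negative capacity in this message is what you would expect if a byte count larger than Integer.MAX_VALUE were narrowed from long to int. The sketch below assumes the reader computes the buffer size as a long and then casts it to int before calling ByteBuffer.allocate. We have not confirmed this against the PDS4-JParser source. The 2,899,185,787-byte size is back-computed from the logged value (-1395781509 + 2^32). It was not measured from the ACS file.

```java
import java.nio.ByteBuffer;

public class CapacityOverflowDemo {
    // Unsafe narrowing: the high 32 bits of the long are silently discarded.
    static int narrow(long size) {
        return (int) size;
    }

    public static void main(String[] args) {
        // Hypothetical size, back-computed from the logged capacity.
        long fileSize = 2_899_185_787L;
        int capacity = narrow(fileSize);
        System.out.println(capacity); // prints -1395781509

        try {
            ByteBuffer.allocate(capacity); // rejects any negative capacity
        } catch (IllegalArgumentException e) {
            // Java 8 throws with a null message (hence "table 4: ... null" above);
            // later JDKs report "capacity < 0: (... < 0)" as in this log.
            System.out.println("IllegalArgumentException: " + e.getMessage());
        }
    }
}
```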

@josinde

josinde commented Feb 26, 2020

Also confirmed as an issue in PDS Validation Tool 1.20.0. Cheers

@jordanpadams jordanpadams changed the title Validate error reading large binary table Validate error reading tables > 2GiB Feb 27, 2020
@jordanpadams
Member

@sslavney @josinde as a note, this directly relates to NASA-PDS/transform#2, which also uses the underlying PDS4-JParser library. We will add this to our release plan for the next build, as it may require a significant overhaul of the underlying library.

@msbentley

msbentley commented Feb 27, 2020

Is there a "known" limit to the size of files that validate can handle, i.e. is it exactly 2GB? We would like to know so that we can programmatically skip such files if need be (or disable content validation).

Also, does it affect all data types (e.g. Table_Character as well as Table_Binary)?

@jordanpadams
Member

jordanpadams commented Feb 27, 2020

@mcayanan do you remember the details for this 2GB cap? or is it more ~2GB?

@mcayanan
Contributor

@jordanpadams Unfortunately no. I would recommend printing a stack trace in the code at

https://github.com/NASA-PDS-Incubator/validate/blob/5f3b28c76a8f87787d6a502e94223294d83a6802/src/main/java/gov/nasa/pds/tools/validate/rule/pds4/TableDataContentValidationRule.java#L204

to see where the error is coming from. That might jog my brain cells. :)

@jordanpadams
Member

@mcayanan

$ /Users/jpadams/Documents/proj/pds/pdsen/workspace/validate/validate-1.21.0-SNAPSHOT/bin/validate -t jgmess_160av01_shb.xml
...................................................java.lang.IllegalArgumentException
	at java.nio.ByteBuffer.allocate(ByteBuffer.java:334)
	at gov.nasa.pds.objectAccess.ByteWiseFileAccessor.<init>(ByteWiseFileAccessor.java:126)
	at gov.nasa.pds.objectAccess.TableReader.<init>(TableReader.java:126)
	at gov.nasa.pds.tools.validate.content.table.RawTableReader.<init>(RawTableReader.java:61)
	at gov.nasa.pds.tools.validate.rule.pds4.TableDataContentValidationRule.validateTableDataContents(TableDataContentValidationRule.java:189)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at gov.nasa.pds.tools.validate.rule.AbstractValidationRule.execute(AbstractValidationRule.java:63)
	at org.apache.commons.chain.impl.ChainBase.execute(ChainBase.java:191)
	at gov.nasa.pds.tools.validate.task.ValidationTask.execute(ValidationTask.java:134)
	at gov.nasa.pds.tools.validate.task.BlockingTaskManager.submit(BlockingTaskManager.java:27)
	at gov.nasa.pds.tools.label.LocationValidator.validate(LocationValidator.java:163)
	at gov.nasa.pds.validate.ValidateLauncher.doValidation(ValidateLauncher.java:1226)
	at gov.nasa.pds.validate.ValidateLauncher.processMain(ValidateLauncher.java:1423)
	at gov.nasa.pds.validate.ValidateLauncher.main(ValidateLauncher.java:1466)
PDS Validate Tool Report

@jordanpadams
Member

Looks like ByteBuffer.allocate tries to allocate a buffer the size of the entire file, when the file probably needs to be read in chunks.
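A chunked read keeps the buffer at a fixed size and tracks the file position as a long, so no size is ever narrowed to int. Below is a minimal sketch using java.nio.channels.FileChannel. The class name, the 8 MiB chunk size and the byte-counting stand-in for record parsing are illustrative assumptions. They are not necessarily the approach taken by the eventual fix.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class ChunkedFileReader {
    // Hypothetical chunk size; anything well below Integer.MAX_VALUE works.
    static final int CHUNK_SIZE = 8 * 1024 * 1024;

    // Reads the file in fixed-size chunks and returns the total bytes read.
    // Position and total are longs, so files over 2 GiB are handled.
    static long readInChunks(Path path) throws IOException {
        ByteBuffer buffer = ByteBuffer.allocate(CHUNK_SIZE);
        long total = 0;
        try (FileChannel channel = FileChannel.open(path, StandardOpenOption.READ)) {
            long position = 0;
            while (true) {
                buffer.clear();
                int n = channel.read(buffer, position); // -1 once position >= file size
                if (n < 0) {
                    break;
                }
                buffer.flip();
                // Stand-in for parsing table records out of this chunk.
                total += buffer.remaining();
                position += n;
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        Path path = Paths.get(args[0]);
        System.out.println(readInChunks(path) + " bytes read");
    }
}
```

A real table reader would also have to handle records that straddle a chunk boundary. One way is to size each chunk as a whole number of records.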

@mcayanan
Contributor

mcayanan commented Mar 6, 2020

@jordanpadams Ya, that looks to be the issue. Specifically, it's trying to allocate a total of 2,687,074,568 bytes, which is greater than the maximum int value (Integer.MAX_VALUE, 2,147,483,647). I forget exactly how large arrays (greater than 2GB) are currently handled, but I would imagine the large-table code should be updated in the same way.
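The arithmetic is easy to check. Integer.MAX_VALUE is 2^31 - 1 = 2,147,483,647, so a 2,687,074,568-byte request exceeds it by 539,590,921 bytes and cannot be expressed as an int capacity. The sketch below shows a defensive conversion using Math.toIntExact (standard since Java 8). It fails with a clear message instead of wrapping to a negative number. The helper name toCapacity is hypothetical.

```java
public class CapacityCheck {
    // Hypothetical helper: converts a long byte count to an int buffer capacity,
    // failing with a descriptive error instead of silently wrapping.
    static int toCapacity(long size) {
        try {
            return Math.toIntExact(size);
        } catch (ArithmeticException e) {
            throw new IllegalStateException(
                "Table of " + size + " bytes exceeds the int buffer limit of "
                + Integer.MAX_VALUE + " bytes; read it in chunks instead", e);
        }
    }

    public static void main(String[] args) {
        long requested = 2_687_074_568L;                   // allocation size reported above
        System.out.println(requested - Integer.MAX_VALUE); // prints 539590921
        System.out.println((int) requested);               // prints -1607892728 (silent wrap)
        try {
            toCapacity(requested);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```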

@jordanpadams
Member

Thanks @mcayanan. I knew I had seen us buffering reads somewhere else in the code, so this just needs to be updated to do the same. Thanks for the tip!

@jordanpadams
Member

Duplicate of NASA-PDS/pds4-jparser#21

@jordanpadams jordanpadams added the duplicate This issue or pull request already exists label May 22, 2020
@jordanpadams jordanpadams added medium and removed duplicate This issue or pull request already exists high labels Jul 30, 2020
@jordanpadams jordanpadams reopened this Jul 30, 2020
@jordanpadams jordanpadams added this to the PDS.26 (ends 2020-09-23) milestone Sep 8, 2020
@jordanpadams
Member

@hhlee445 see comments above: #189 (comment) and #189 (comment)

@msbentley

@mcayanan do you remember the details for this 2GB cap? or is it more ~2GB?

Hi @jordanpadams, would it be possible to clarify this? We need to work around the problem by programmatically skipping validation for products over this threshold, so it would be good to confirm the exact value!
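One way to implement that workaround is to check each data file's size before deciding whether to request content validation. This sketch assumes the threshold is Integer.MAX_VALUE bytes, the int capacity limit discussed above. The exact failure point has not been confirmed, and JVM array limits or heap size may make it somewhat lower. File size is a conservative proxy, because no table in a file can be larger than the file itself. The class and method names are hypothetical, and the code does not invoke Validate.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class LargeFileFilter {
    // Assumed threshold: the largest capacity an int-sized ByteBuffer can request.
    static final long ASSUMED_LIMIT = Integer.MAX_VALUE;

    // True if the file is small enough that content validation is expected to work.
    static boolean safeForContentValidation(Path dataFile) throws IOException {
        return Files.size(dataFile) <= ASSUMED_LIMIT;
    }

    public static void main(String[] args) throws IOException {
        for (String arg : args) {
            Path p = Paths.get(arg);
            System.out.println(p + ": "
                + (safeForContentValidation(p) ? "validate content" : "skip content validation"));
        }
    }
}
```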

@jordanpadams
Member

@msbentley unfortunately this is a tough thing to test for exactness. @hhlee445 is in the process of implementing a fix as we speak, so if you can wait another week or 2, we may be able to use that version of the software.

@msbentley

OK, thanks @jordanpadams and @hhlee445 - I can wait 👍

jordanpadams added a commit that referenced this issue Nov 18, 2020
issue #189: Fix to validate error reading tables larger than 2GB
@jordanpadams
Member

jordanpadams commented Nov 18, 2020

@msbentley @sslavney this should now be fixed. Feel free to try out the latest snapshot version of validate here:

https://github.com/NASA-PDS/validate/releases/tag/1.25.0-SNAPSHOT

Thanks to @hhlee445 for the excellent work here!

Note: this update did not make it into Build 11.0 I&T, and since it is a pretty significant change to how we read in data files, I would prefer it be rigorously tested prior to an official release in the Spring.
