Fix NPE when iterating over an input split in CompositeRecordReader.java #436

Open
wants to merge 1 commit into
base: master


@venugit venugit commented Apr 12, 2015

When iterating over input splits via DeprecatedInputFormatWrapper, the wrapper always calls mifcReader.setKeyValue(key, value) before nextKeyValue is invoked, and that call can reach setKeyValue in CompositeRecordReader.java. setKeyValue requires currentRecordReader to be non-null; however, currentRecordReader is set to null on line 113 at the end of every input split, so the next call to setKeyValue after the end of an input split throws an NPE.

This patch addresses the situation by having setKeyValue null-check currentRecordReader and, when it is null, invoke nextKeyValue to see whether there are any more elements to be found.
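To make the failure mode concrete, here is a minimal sketch of the pattern described above. It is a simplified model, not the elephant-bird code: `DemoSplitReader`, `UnguardedCompositeReader`, and `NpeDemo` are hypothetical names, keys are omitted, and a `StringBuilder` stands in for the reusable value object. The only behavior taken from the description is that the delegate is set to null at the end of a split and that setKeyValue dereferences it unconditionally.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical per-split reader; a stand-in, not an elephant-bird class.
class DemoSplitReader {
    private final Iterator<String> records;
    private StringBuilder value;

    DemoSplitReader(List<String> records) {
        this.records = records.iterator();
    }

    void setKeyValue(StringBuilder value) {
        this.value = value;
    }

    // Copies the next record into the caller-supplied value object.
    boolean nextKeyValue() {
        if (!records.hasNext()) {
            return false;
        }
        value.setLength(0);
        value.append(records.next());
        return true;
    }
}

// Simplified model of the unpatched behavior: the delegate is dropped when
// its split is exhausted, and setKeyValue dereferences it without a check.
class UnguardedCompositeReader {
    private final Iterator<DemoSplitReader> splits;
    private DemoSplitReader currentRecordReader;
    private StringBuilder value;

    UnguardedCompositeReader(List<DemoSplitReader> splits) {
        this.splits = splits.iterator();
        this.currentRecordReader = this.splits.hasNext() ? this.splits.next() : null;
    }

    void setKeyValue(StringBuilder value) {
        this.value = value;
        currentRecordReader.setKeyValue(value); // throws once the delegate is null
    }

    boolean nextKeyValue() {
        while (true) {
            if (currentRecordReader == null) {
                if (!splits.hasNext()) {
                    return false;
                }
                currentRecordReader = splits.next();
                currentRecordReader.setKeyValue(value);
            }
            if (currentRecordReader.nextKeyValue()) {
                return true;
            }
            currentRecordReader = null; // end of this split
        }
    }
}

class NpeDemo {
    // Returns true if setKeyValue throws an NPE after the split has ended.
    static boolean failsAfterEndOfSplit() {
        UnguardedCompositeReader reader = new UnguardedCompositeReader(
                Arrays.asList(new DemoSplitReader(Arrays.asList("a"))));
        StringBuilder value = new StringBuilder();
        reader.setKeyValue(value);
        while (reader.nextKeyValue()) {
            reader.setKeyValue(value); // the caller sets key/value before every read
        }
        try {
            reader.setKeyValue(value); // delegate was nulled at end of split
            return false;
        } catch (NullPointerException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("NPE after end of split: " + failsAfterEndOfSplit());
    }
}
```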

    this.value = value;
    if (currentRecordReader == null) {
      try {
        if (!nextKeyValue()) {
Contributor

Can we safely advance here? The interface suggests that setKeyValue will always be called before nextKeyValue, so calling nextKeyValue here could cause us to skip a record?

Author

You are right, the interface makes it possible to skip over a record. Re-reading the code, it might work to have setKeyValue simply set this.key and this.value when currentRecordReader is null; the next call to nextKeyValue will then re-initialize currentRecordReader and invoke currentRecordReader.setKeyValue on the "cached" key/value.
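A rough sketch of that idea, under stated assumptions: `CachedSplitReader`, `CachingCompositeReader`, and `CachingDemo` are hypothetical names rather than elephant-bird classes, keys are left out, and a `StringBuilder` stands in for the reusable value object. setKeyValue only records the value when no delegate is open, and nextKeyValue forwards the cached value to each delegate it opens, so no record is consumed inside setKeyValue.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical per-split reader; a stand-in, not an elephant-bird class.
class CachedSplitReader {
    private final Iterator<String> records;
    private StringBuilder value;

    CachedSplitReader(List<String> records) {
        this.records = records.iterator();
    }

    void setKeyValue(StringBuilder value) {
        this.value = value;
    }

    boolean nextKeyValue() {
        if (!records.hasNext()) {
            return false;
        }
        value.setLength(0);
        value.append(records.next());
        return true;
    }
}

// Composite reader that caches the value when no delegate is open.
class CachingCompositeReader {
    private final Iterator<CachedSplitReader> splits;
    private CachedSplitReader currentRecordReader; // opened lazily
    private StringBuilder value;

    CachingCompositeReader(List<CachedSplitReader> splits) {
        this.splits = splits.iterator();
    }

    void setKeyValue(StringBuilder value) {
        this.value = value; // always remember the latest value object
        if (currentRecordReader != null) {
            currentRecordReader.setKeyValue(value);
        }
        // No read happens here, so no record can be skipped.
    }

    boolean nextKeyValue() {
        while (true) {
            if (currentRecordReader == null) {
                if (!splits.hasNext()) {
                    return false;
                }
                currentRecordReader = splits.next();
                currentRecordReader.setKeyValue(value); // hand over the cached value
            }
            if (currentRecordReader.nextKeyValue()) {
                return true;
            }
            currentRecordReader = null; // end of this split
        }
    }
}

class CachingDemo {
    // Reads two splits, calling setKeyValue before every nextKeyValue.
    static List<String> readTwoSplits() {
        CachingCompositeReader reader = new CachingCompositeReader(Arrays.asList(
                new CachedSplitReader(Arrays.asList("a", "b")),
                new CachedSplitReader(Arrays.asList("c"))));
        List<String> seen = new ArrayList<>();
        StringBuilder value = new StringBuilder();
        while (true) {
            reader.setKeyValue(value);
            if (!reader.nextKeyValue()) {
                break;
            }
            seen.add(value.toString());
        }
        reader.setKeyValue(value); // safe even though no delegate is open
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(readTwoSplits());
    }
}
```

This only covers a single composite reader instance; it does not carry the cached key/value from one instance to another.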

Thoughts?

Contributor

I think from my reading that should work, this stuff is unfortunately hard to parse :/

Contributor

ianoc commented Apr 20, 2015

Ping on this, got time to update as we discussed?

Author

venugit commented Apr 21, 2015

Hi, sorry for not getting back to you earlier. I tried out the update, and it fixes the issue through one set of splits. However, there needs to be a good way to persist the last-set key/value pair between instances of CompositeRecordReader, which I have not revisited.

Here is the stack trace seen when the solution I outlined is tried:

Caused by: java.io.IOException: The RecordReader returned a key and value that do not match the key and value sent to it. This means the RecordReader did not properly implement com.twitter.elephantbird.mapred.input.MapredInputFormatCompatible. Current reader class : class com.twitter.elephantbird.mapreduce.input.combine.CompositeRecordReader
    at com.twitter.elephantbird.mapred.input.DeprecatedInputFormatWrapper$RecordReaderWrapper.next(DeprecatedInputFormatWrapper.java:338)
    at cascading.tap.hadoop.util.MeasuredRecordReader.next(MeasuredRecordReader.java:61)
    at cascading.scheme.hadoop.SequenceFile.source(SequenceFile.java:93)

I'm going to spend some time this week on this.
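The exception message suggests the wrapper checks that the reader hands back the very same key and value objects it was given. The sketch below is an assumption about the shape of such a check, not the actual DeprecatedInputFormatWrapper code; `ReusingReader`, `IdentityCheckingWrapper`, `HonoringReader`, `StaleReader`, and `MismatchDemo` are hypothetical names. `StaleReader` illustrates one way a mismatch could arise: a reader that keeps returning objects from an earlier setKeyValue call instead of the most recent ones.

```java
import java.io.IOException;

// Hypothetical reader contract: the caller supplies the key/value objects to fill.
interface ReusingReader {
    void setKeyValue(Object key, Object value);
    boolean nextKeyValue();
    Object getCurrentKey();
    Object getCurrentValue();
}

// Assumed shape of the check: reference equality on both key and value.
class IdentityCheckingWrapper {
    private final ReusingReader reader;

    IdentityCheckingWrapper(ReusingReader reader) {
        this.reader = reader;
    }

    boolean next(Object key, Object value) throws IOException {
        reader.setKeyValue(key, value);
        if (!reader.nextKeyValue()) {
            return false;
        }
        if (reader.getCurrentKey() != key || reader.getCurrentValue() != value) {
            throw new IOException(
                    "reader returned a key/value other than the ones sent to it");
        }
        return true;
    }
}

// Always uses the most recently supplied objects.
class HonoringReader implements ReusingReader {
    private Object key;
    private Object value;

    public void setKeyValue(Object key, Object value) {
        this.key = key;
        this.value = value;
    }

    public boolean nextKeyValue() { return true; }
    public Object getCurrentKey() { return key; }
    public Object getCurrentValue() { return value; }
}

// Keeps the first pair it was given and ignores later ones.
class StaleReader implements ReusingReader {
    private Object key;
    private Object value;

    public void setKeyValue(Object key, Object value) {
        if (this.key == null) {
            this.key = key;
            this.value = value;
        }
    }

    public boolean nextKeyValue() { return true; }
    public Object getCurrentKey() { return key; }
    public Object getCurrentValue() { return value; }
}

class MismatchDemo {
    // Calls next twice with fresh objects; true if the wrapper rejects the reader.
    static boolean rejects(ReusingReader reader) {
        IdentityCheckingWrapper wrapper = new IdentityCheckingWrapper(reader);
        try {
            wrapper.next(new Object(), new Object());
            wrapper.next(new Object(), new Object());
            return false;
        } catch (IOException mismatch) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("honoring reader rejected: " + rejects(new HonoringReader()));
        System.out.println("stale reader rejected: " + rejects(new StaleReader()));
    }
}
```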


CLAassistant commented Jul 18, 2019

Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.


Venugopal Gummuluru seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you have already a GitHub account, please add the email address used for this commit to your account.
