WstxEOFException: Unexpected EOF in prolog #34

Open
DavidLuptak opened this issue May 9, 2018 · 0 comments

Hello, I am getting a strange exception when trying to index some documents. The exception is:

May 09, 2018 8:42:41 PM cz.muni.fi.mir.mathmlcanonicalization.modules.ElementMinimizer execute
SEVERE: error while parsing the input file. 
com.ctc.wstx.exc.WstxEOFException: Unexpected EOF in prolog
 at [row,col {unknown-source}]: [1,0]
	at com.ctc.wstx.sr.StreamScanner.throwUnexpectedEOF(StreamScanner.java:686)
	at com.ctc.wstx.sr.BasicStreamReader.handleEOF(BasicStreamReader.java:2134)
	at com.ctc.wstx.sr.BasicStreamReader.nextFromProlog(BasicStreamReader.java:2040)
	at com.ctc.wstx.sr.BasicStreamReader.next(BasicStreamReader.java:1069)
	at cz.muni.fi.mir.mathmlcanonicalization.modules.ElementMinimizer.minimizeElements(ElementMinimizer.java:134)
	at cz.muni.fi.mir.mathmlcanonicalization.modules.ElementMinimizer.execute(ElementMinimizer.java:84)
	at cz.muni.fi.mir.mathmlcanonicalization.MathMLCanonicalizer.executeStreamModules(MathMLCanonicalizer.java:375)
	at cz.muni.fi.mir.mathmlcanonicalization.MathMLCanonicalizer.canonicalize(MathMLCanonicalizer.java:326)
	at cz.muni.fi.mias.math.MathTokenizer.parseMathML(MathTokenizer.java:304)
	at cz.muni.fi.mias.math.MathTokenizer.processFormulae(MathTokenizer.java:280)
	at cz.muni.fi.mias.math.MathTokenizer.reset(MathTokenizer.java:246)
	at org.apache.lucene.index.DefaultIndexingChain$PerField.invert(DefaultIndexingChain.java:613)
	at org.apache.lucene.index.DefaultIndexingChain.processField(DefaultIndexingChain.java:359)
	at org.apache.lucene.index.DefaultIndexingChain.processDocument(DefaultIndexingChain.java:318)
	at org.apache.lucene.index.DocumentsWriterPerThread.updateDocument(DocumentsWriterPerThread.java:241)
	at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:465)
	at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1526)
	at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1500)
	at cz.muni.fi.mias.indexing.Indexing.indexDocsThreaded(Indexing.java:145)
	at cz.muni.fi.mias.indexing.Indexing.indexFiles(Indexing.java:89)
	at cz.muni.fi.mias.MIaS.main(MIaS.java:39)
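The location `[1,0]` suggests the parser reached end-of-input before reading a single character. That points to an empty string reaching `ElementMinimizer`, not to malformed markup. The minimal sketch below uses the standard StAX API (`javax.xml.stream`) and shows that zero-length input fails in the prolog. With the JDK's built-in parser the message wording differs. Woodstox, when it is the StAX implementation on the classpath, reports `Unexpected EOF in prolog`. The `isEmptyInput` guard is a hypothetical helper for illustration only. It is not part of MathMLCan or MIaS.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class EmptyInputDemo {
    // Returns true if the whole input can be streamed through a StAX reader.
    static boolean parses(String xml) {
        XMLInputFactory factory = XMLInputFactory.newFactory();
        try {
            XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
            try {
                while (reader.hasNext()) {
                    // Empty input fails here (or already at creation) while still in the prolog.
                    reader.next();
                }
            } finally {
                reader.close();
            }
            return true;
        } catch (XMLStreamException e) {
            System.out.println("parse failed: " + e.getMessage());
            return false;
        }
    }

    // Hypothetical guard a caller could apply before handing a formula to the canonicalizer.
    static boolean isEmptyInput(String xml) {
        return xml == null || xml.trim().isEmpty();
    }

    public static void main(String[] args) {
        System.out.println(parses(""));          // false: EOF before any root element
        System.out.println(parses("<math/>"));   // true
        System.out.println(isEmptyInput("  "));  // true
    }
}
```

A guard like `isEmptyInput` would only hide the symptom. It is still worth finding out why `MathTokenizer.parseMathML` passes an empty string in the first place.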

I run the following command:

java -jar MIaS-1.6.6-4.10.4-SNAPSHOT.jar -conf ~/sandbox/mias/mias.properties -overwrite ~/sandbox/mias/data/samples/sample-mathml.xhtml ~/sandbox/mias/data/samples

My setup is:

MathMLCan: develop branch
MIaS: master branch
MIaSMath: master branch

$ java -version
java version "1.8.0_171"
Java(TM) SE Runtime Environment (build 1.8.0_171-b11)
Java HotSpot(TM) 64-Bit Server VM (build 25.171-b11, mixed mode)

The mias.properties configuration is:

INDEXDIR=~/sandbox/mias/indexes/index-0
UPDATE=false
THREADS=16
MAXRESULTS=10000
DOCLIMIT=-1
FORMULA_DOCUMENTS=true
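The configuration can also be checked on its own. `java.util.Properties` stores values verbatim. Neither `java.io.File` nor `java.nio.file.Paths` expands a leading `~`. As a result, `INDEXDIR=~/sandbox/...` names a directory literally called `~` under the working directory, unless MIaS expands it itself, which this issue does not say. The sketch below loads the properties shown above and prints both interpretations. `MiasConfigCheck` and `expandHome` are hypothetical names, not MIaS code.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Properties;

class MiasConfigCheck {
    // The mias.properties content from this issue, inlined so the example is self-contained.
    static final String CONF = String.join("\n",
            "INDEXDIR=~/sandbox/mias/indexes/index-0",
            "UPDATE=false",
            "THREADS=16",
            "MAXRESULTS=10000",
            "DOCLIMIT=-1",
            "FORMULA_DOCUMENTS=true");

    static Properties load(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    // Hypothetical helper: Java never expands "~", so do it explicitly.
    static Path expandHome(String raw) {
        if (raw.startsWith("~/")) {
            return Paths.get(System.getProperty("user.home"), raw.substring(2));
        }
        return Paths.get(raw);
    }

    public static void main(String[] args) {
        Properties props = load(CONF);
        String indexDir = props.getProperty("INDEXDIR");
        System.out.println("raw INDEXDIR:  " + indexDir);
        System.out.println("taken as-is:   " + Paths.get(indexDir).toAbsolutePath());
        System.out.println("with ~ expand: " + expandHome(indexDir));
        System.out.println("THREADS:       " + Integer.parseInt(props.getProperty("THREADS")));
    }
}
```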

The sample-mathml.xhtml file is as simple as this:

<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <meta http-equiv="Content-Type" content="application/xhtml+xml; charset=UTF-8" />
  </head>
  <body>
    <math>
      <mfrac linethickness="1">
        <!-- numerator -->
        <mrow>
          <mi> x </mi>
          <mo> + </mo>
          <mi mathcolor="red"> y </mi>
          <mo> + </mo>
          <mi> z </mi>
        </mrow>
        <!-- denominator -->
        <mrow>
          <mi> x </mi>
          <mphantom>
            <mo> + </mo>
            <mi> y </mi>
          </mphantom>
          <mo> + </mo>
          <mi> z </mi>
        </mrow>
      </mfrac>
      <mfenced open=":" close="?">
      </mfenced>
    </math>
  </body>
</html>
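The sample appears to be well-formed XML, so the parser error is probably not caused by the file's syntax. One detail is worth checking. The `<math>` element declares no `xmlns` of its own, so under XML namespace rules it inherits the XHTML default namespace. It is therefore not in the MathML namespace `http://www.w3.org/1998/Math/MathML`. This issue does not say whether MIaS locates formulae by namespace. The minimal sketch below runs the JDK's DOM parser on an abridged copy of the file and checks both points. The class and method names are illustrative only.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

class SampleFileCheck {
    static final String XHTML_NS = "http://www.w3.org/1999/xhtml";
    static final String MATHML_NS = "http://www.w3.org/1998/Math/MathML";

    // Abridged copy of sample-mathml.xhtml: same namespaces and nesting, formula body shortened.
    static final String SAMPLE =
            "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
            + "<html xmlns=\"http://www.w3.org/1999/xhtml\"><head/><body>"
            + "<math><mfrac linethickness=\"1\"><mrow><mi> x </mi></mrow><mrow><mi> z </mi></mrow></mfrac>"
            + "<mfenced open=\":\" close=\"?\"></mfenced></math>"
            + "</body></html>";

    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // required for namespace-qualified lookups
            return factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalStateException("sample is not well-formed: " + e.getMessage(), e);
        }
    }

    // Number of <math> elements in the given namespace.
    static int countMath(Document doc, String namespace) {
        return doc.getElementsByTagNameNS(namespace, "math").getLength();
    }

    public static void main(String[] args) {
        Document doc = parse(SAMPLE);
        System.out.println("math in XHTML namespace:  " + countMath(doc, XHTML_NS));
        System.out.println("math in MathML namespace: " + countMath(doc, MATHML_NS));
    }
}
```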

Can anyone help, or does anyone have an idea what the problem might be?
