Home
MathML Canonicalizer is a tool that canonicalizes MathML expressions. The original version can be found here.
The goal of this project is to create a Java application that canonicalizes mathematical expressions written in MathML (Mathematical Markup Language).
The output is the canonical form of the given MathML document. The canonical form makes it easy to decide whether two differently written MathML formulae represent the same expression, and it can also be used by MathML search and comparison engines.
The canonicalizer's functionality is divided into modules. The MathMLCanonicalizer class can be initialized from an XML configuration, built manually by adding initialized modules, or used with the default settings stored in property files. It then takes an input stream containing a MathML document and produces a canonicalized output stream. The Settings class provides static helper methods and loads global settings. MathMLCanonicalizerCommandLineTool is the runnable class that connects the canonicalizer to the command-line interface.
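The modular design described above can be sketched roughly as follows. This is a self-contained illustration that uses only the JDK; it is not MathMLCan's actual API. The names ToyCanonicalizer, Module, and SingleChildMrowUnwrapper are hypothetical and were invented for this example. The sketch assumes a canonicalizer that holds a list of modules, parses the input stream into a DOM tree, lets each module rewrite the tree in order, and serializes the result to the output stream.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Illustrative only: none of these names are MathMLCan's real classes.
public class ToyCanonicalizer {

    /** Hypothetical module contract: a module rewrites the DOM tree in place. */
    public interface Module {
        void execute(Document doc);
    }

    /** Hypothetical module: replaces an mrow whose only child is an element by that child. */
    public static class SingleChildMrowUnwrapper implements Module {
        @Override
        public void execute(Document doc) {
            // getElementsByTagName returns a live list, so snapshot it before editing the tree.
            NodeList live = doc.getElementsByTagName("mrow");
            List<Node> mrows = new ArrayList<>();
            for (int i = 0; i < live.getLength(); i++) {
                mrows.add(live.item(i));
            }
            for (Node mrow : mrows) {
                Node only = mrow.getFirstChild();
                if (mrow.getChildNodes().getLength() == 1
                        && only.getNodeType() == Node.ELEMENT_NODE) {
                    mrow.getParentNode().replaceChild(only, mrow);
                }
            }
        }
    }

    private final List<Module> modules = new ArrayList<>();

    /** Registers a module; modules run in the order in which they were added. */
    public ToyCanonicalizer addModule(Module module) {
        modules.add(module);
        return this;
    }

    /** Parses the input stream, applies every module, and writes the result. */
    public void canonicalize(InputStream in, OutputStream out) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(in);
        for (Module module : modules) {
            module.execute(doc);
        }
        Transformer serializer = TransformerFactory.newInstance().newTransformer();
        serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        serializer.transform(new DOMSource(doc), new StreamResult(out));
    }

    /** Convenience wrapper for in-memory strings. */
    public String canonicalize(String mathml) throws Exception {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        canonicalize(new ByteArrayInputStream(mathml.getBytes(StandardCharsets.UTF_8)), out);
        return out.toString(StandardCharsets.UTF_8.name());
    }

    public static void main(String[] args) throws Exception {
        ToyCanonicalizer canonicalizer =
                new ToyCanonicalizer().addModule(new SingleChildMrowUnwrapper());
        String a = canonicalizer.canonicalize("<math><mrow><mi>x</mi></mrow></math>");
        String b = canonicalizer.canonicalize("<math><mi>x</mi></math>");
        System.out.println(a);
        System.out.println("Same canonical form: " + a.equals(b));
    }
}
```

With this toy module, the two differently written inputs serialize to the same string, which is the kind of comparison a canonical form is meant to enable. The real modules (for example MrowNormalizer or OperatorNormalizer) implement considerably more involved rules.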
Usage:
java -jar mathml-canonicalizer.jar [ -c </path/to/config.xml> ] [ -w ] [ -d ] </path/to/input>...
java -jar mathml-canonicalizer.jar -p | --print-default-config-file
java -jar mathml-canonicalizer.jar -h | --help
NB: </path/to/input> is /path/to/file.xhtml or /path/to/directory
Options:
-c,--config-file <arg>            Load configuration file.
-d,--inject-xhtml-mathml-svg-dtd  Enforce injection of XHTML 1.1 plus MathML 2.0 plus SVG 1.1 DTD reference into input documents.
-h,--help                         Print help (this screen).
-p,--print-default-config-file    Print default configuration that will be used if no config file is supplied.
-w,--overwrite-inputs             Overwrite input files by produced canonical outputs.
On Windows, the file encoding defaults to a system-language-specific single-byte encoding. To ensure that the JVM uses UTF-8, start it with the command-line argument -Dfile.encoding=UTF-8:
java -Dfile.encoding=UTF-8 -jar mathml-canonicalizer.jar
However, be aware that the default Windows command-line shell has significant problems with Unicode in its default configuration. Try the Lucida Console font together with an appropriate code page setting via the chcp 65001 command.
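To check which encoding the JVM actually picked up, a small diagnostic like the one below can be run with and without the -Dfile.encoding=UTF-8 argument. This is an illustrative sketch, not part of MathMLCan; the class name EncodingCheck is made up for this example. It relies only on the standard Charset.defaultCharset() call.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Illustrative helper, not part of MathMLCan.
public class EncodingCheck {

    /** Returns true if the JVM's default charset is UTF-8. */
    static boolean defaultIsUtf8() {
        return StandardCharsets.UTF_8.equals(Charset.defaultCharset());
    }

    public static void main(String[] args) {
        System.out.println("Default charset: " + Charset.defaultCharset().name());
        if (!defaultIsUtf8()) {
            System.out.println("Consider starting the JVM with -Dfile.encoding=UTF-8");
        }
    }
}
```

For example, compile it with javac EncodingCheck.java and compare the output of java EncodingCheck with that of java -Dfile.encoding=UTF-8 EncodingCheck.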
- Michal Růžička
  - project leading
  - MIR tools integration
- David Formánek
  - architecture
  - class for module testing and some tests
  - MfencedReplacer tests and implementation
- Jakub Adler
  - MrowNormalizer tests and implementation
- Jaroslav Dufek
  - OperatorNormalizer tests and implementation
  - ScriptNormalizer tests and implementation
- Robert Šiška
  - XML properties loading
  - CLI and GUI
  - ElementMinimizer improvements
MathMLCan's code is licensed under the terms of the Apache License, Version 2.0.