Michal Růžička edited this page Dec 14, 2015 · 23 revisions

MathMLCan

MathML Canonicalizer is a tool that canonicalizes MathML expressions. The original version can be found here.

Introduction

The goal of this project is to create a Java application that canonicalizes mathematical expressions written in MathML (Mathematical Markup Language).

The output should be the canonical form of the given MathML document. This canonical form makes it easy to decide whether two differently written MathML formulae represent the same expression, and it can also be used by MathML search and comparison engines.

Architecture

The functionality of the canonicalizer is divided into modules. The MathMLCanonicalizer class can be initialized in three ways: from an XML configuration, manually by adding initialized modules, or with the default settings stored in property files. It then takes an input stream containing a MathML document and produces a canonicalized output stream. The Settings class provides static helper methods and loads global settings. MathMLCanonicalization is the runnable class that connects the canonicalizer to the command-line and graphical interfaces.
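The module-based design can be illustrated with a minimal, self-contained sketch. This is not the project's actual API: the names PipelineSketch, CanonicalizationModule, Canonicalizer, SingleChildMrowUnwrapper and canonicalizeString are hypothetical and chosen only for illustration, and the single example module is a simplified stand-in rather than one of the real modules. The sketch assumes that each module rewrites a DOM tree in place and that the canonicalizer reads from an input stream and writes to an output stream.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical sketch of a module pipeline; not the MathMLCan API.
class PipelineSketch {

    /** Hypothetical module contract: a module rewrites the DOM in place. */
    interface CanonicalizationModule {
        void execute(Document doc);
    }

    /** Illustrative module: replaces an <mrow> that wraps a single element by that element. */
    static class SingleChildMrowUnwrapper implements CanonicalizationModule {
        @Override
        public void execute(Document doc) {
            // getElementsByTagName returns a live list, so copy it before mutating the tree.
            NodeList live = doc.getElementsByTagName("mrow");
            List<Node> mrows = new ArrayList<>();
            for (int i = 0; i < live.getLength(); i++) {
                mrows.add(live.item(i));
            }
            for (Node mrow : mrows) {
                Node only = mrow.getFirstChild();
                if (mrow.getChildNodes().getLength() == 1
                        && only.getNodeType() == Node.ELEMENT_NODE) {
                    mrow.getParentNode().replaceChild(only, mrow);
                }
            }
        }
    }

    /** Hypothetical canonicalizer: runs the added modules in order, stream in, stream out. */
    static class Canonicalizer {
        private final List<CanonicalizationModule> modules = new ArrayList<>();

        void addModule(CanonicalizationModule module) {
            modules.add(module);
        }

        void canonicalize(InputStream in, OutputStream out) throws Exception {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(in);
            for (CanonicalizationModule module : modules) {
                module.execute(doc);
            }
            Transformer serializer = TransformerFactory.newInstance().newTransformer();
            serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            serializer.transform(new DOMSource(doc), new StreamResult(out));
        }
    }

    /** Convenience wrapper for string input and output, used by the demo below. */
    static String canonicalizeString(Canonicalizer c, String mathml) throws Exception {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        c.canonicalize(new ByteArrayInputStream(mathml.getBytes(StandardCharsets.UTF_8)), out);
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws Exception {
        Canonicalizer canonicalizer = new Canonicalizer();
        canonicalizer.addModule(new SingleChildMrowUnwrapper());
        System.out.println(canonicalizeString(canonicalizer,
                "<math><mrow><mi>x</mi></mrow></math>"));
    }
}
```

Keeping each transformation behind one small interface means modules can be added, removed or reordered without touching the parsing and serialization code, which is presumably what makes both the manual and the configuration-driven initialization described above practical.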

Invocation

Usage:
	java -jar mathml-canonicalizer.jar [ -c /path/to/config.xml ] [ -w ] [ -d ] { /path/to/input.xhtml | /path/to/directory [ | ... ] }
	java -jar mathml-canonicalizer.jar -p
	java -jar mathml-canonicalizer.jar -h
Options:
        -c,--config-file <arg>                  load configuration file
        -d,--inject-xhtml-mathml-svg-dtd        enforce injection of XHTML 1.1
                                                plus MathML 2.0 plus SVG 1.1 DTD
                                                reference into input documents
        -h,--help                               print help
        -p,--print-default-config-file          print default configuration that
                                                will be used if no config file
                                                is supplied
        -w,--overwrite-inputs                   overwrite input files by
                                                canonical outputs

When no input files are given, the GUI will be started.

File encoding on Windows

On Windows, the file encoding defaults to a system-language-specific single-byte encoding. To ensure the JVM uses UTF-8, start it with the command-line argument -Dfile.encoding=UTF-8:

java -Dfile.encoding=UTF-8 -jar mathml-canonicalizer.jar

However, be aware that the default Windows command-line shell has significant problems with Unicode in its default configuration. Try the Lucida Console font together with an appropriate code page setting via the chcp 65001 command.
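To check which encoding a given JVM actually picked up, a small standalone program can print the default charset. The sketch below is independent of the canonicalizer; the class name EncodingCheck and the method roundTrip are hypothetical. It also shows the alternative of naming UTF-8 explicitly when reading and writing files, so that the result does not depend on the platform default.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of MathMLCan.
class EncodingCheck {

    /** Writes text to a temporary file as UTF-8 and reads it back as UTF-8. */
    static String roundTrip(String text) throws IOException {
        Path tmp = Files.createTempFile("mathml-encoding", ".xml");
        try {
            Files.write(tmp, text.getBytes(StandardCharsets.UTF_8));
            return new String(Files.readAllBytes(tmp), StandardCharsets.UTF_8);
        } finally {
            Files.deleteIfExists(tmp);
        }
    }

    public static void main(String[] args) throws IOException {
        // Reflects -Dfile.encoding if it was passed on the command line.
        System.out.println("Default charset: " + Charset.defaultCharset());

        // \u03B1 is the Greek letter alpha; the escape keeps this source file pure ASCII.
        String sample = "<mi>\u03B1</mi>";
        System.out.println("Round trip intact: " + sample.equals(roundTrip(sample)));
    }
}
```

Running it once with and once without -Dfile.encoding=UTF-8 shows whether the flag changes the reported default on a particular system.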

Contributors

  • Michal Růžička
  • David Formánek
    • architecture
    • class for module testing and some tests
    • MfencedReplacer tests and implementation
  • Jakub Adler
    • MrowNormalizer tests and implementation
  • Jaroslav Dufek
    • OperatorNormalizer tests and implementation
    • ScriptNormalizer tests and implementation
  • Robert Šiška
    • XML properties loading
    • CLI and GUI
    • ElementMinimizer improvements

Licence

MathMLCan's code is licensed under the terms of the Apache License, Version 2.0.
