Throw error/exception for non-implemented latex command? #32

Open
datamove opened this issue Mar 18, 2020 · 2 comments

@datamove

Dear Philippe,
Thanks for your work!
I wonder whether latex_to_text should throw an exception when it encounters a LaTeX command that it cannot convert. For example, the following LaTeX code is a matrix:
\left (\begin{array}{llll}25 & 31 & 17 & 43\\75 & 94 & 53 & 132\\75 & 94 & 54 & 134\\25 & 32 & 20 & 48\end{array}\right )
and all I get is
< a r r a y >
but I'd like to catch a conversion error in order to deal with it.

Is there a way to do that?
Thanks in advance!

@phfaist
Owner

phfaist commented Mar 18, 2020

The original philosophy behind latex_to_text has been to produce text that is as useful as possible for the given LaTeX input. That is why latex_to_text does not fail for unknown macros, and why it provides placeholder text for some common constructs like graphics and arrays. The behavior is customizable, though. Try:

from pylatexenc import latexwalker, latex2text

def raise_l2t_unknown_latex(n):
    if n.isNodeType(latexwalker.LatexMacroNode):
        raise ValueError("Unknown macro: '\\{}'".format(n.macroname))
    elif n.isNodeType(latexwalker.LatexEnvironmentNode):
        raise ValueError("Unknown environment: '\\begin{{{}}}'".format(n.environmentname))
    raise ValueError("Unknown latex construct: '{}'".format(n.latex_verbatim()))

l2t_db = latex2text.get_default_latex_context_db()
l2t_db.add_context_category(
    'my-error-category',
    prepend=True,
    macros=[
        latex2text.MacroTextSpec('includegraphics', simplify_repl=raise_l2t_unknown_latex)
    ],
    environments=[
        latex2text.EnvironmentTextSpec('array', simplify_repl=raise_l2t_unknown_latex),
        latex2text.EnvironmentTextSpec('pmatrix', simplify_repl=raise_l2t_unknown_latex),
        latex2text.EnvironmentTextSpec('bmatrix', simplify_repl=raise_l2t_unknown_latex),
        latex2text.EnvironmentTextSpec('smallmatrix', simplify_repl=raise_l2t_unknown_latex),
    ]
)
l2t_db.set_unknown_macro_spec(
    latex2text.MacroTextSpec('', simplify_repl=raise_l2t_unknown_latex)
)
l2t_db.set_unknown_environment_spec(
    latex2text.EnvironmentTextSpec('', simplify_repl=raise_l2t_unknown_latex)
)


result = latex2text.LatexNodes2Text(latex_context=l2t_db).latex_to_text(r'''
\textbf{Hello} world.
\begin{equation}
  A = X + \begin{array}{cc}1 & 2\\3 & 4\end{array}
\end{equation}
And here is an undefined macro: \undefinedMacro.
''')

print(result)

Here, the macros and environments that would normally produce a <p l a c e h o l d e r> ("array", "includegraphics", etc.) are given replacement functions that simply raise an error, and unknown macros and environments raise an error as well.

Thanks for the feedback & hope this helps.

@phfaist
Owner

phfaist commented Nov 4, 2021

In the meantime, the code in my earlier comment can be simplified by filtering the 'latex-placeholders' category out of the default context db:

from pylatexenc import latexwalker, latex2text

def raise_l2t_unknown_latex(n):
    if n.isNodeType(latexwalker.LatexMacroNode):
        raise ValueError("Unknown macro: '\\{}'".format(n.macroname))
    elif n.isNodeType(latexwalker.LatexEnvironmentNode):
        raise ValueError("Unknown environment: '\\begin{{{}}}'".format(n.environmentname))
    raise ValueError("Unknown latex construct: '{}'".format(n.latex_verbatim()))

l2t_db = latex2text.get_default_latex_context_db().filter_context(
    exclude_categories=['latex-placeholders',],
)
l2t_db.set_unknown_macro_spec(
    latex2text.MacroTextSpec('', simplify_repl=raise_l2t_unknown_latex)
)
l2t_db.set_unknown_environment_spec(
    latex2text.EnvironmentTextSpec('', simplify_repl=raise_l2t_unknown_latex)
)


result = latex2text.LatexNodes2Text(latex_context=l2t_db).latex_to_text(r'''
\textbf{Hello} world.

\begin{equation}
  A = X + \begin{array}{cc}1 & 2\\3 & 4\end{array}
\end{equation}

% raises an error:
And here is an undefined macro: \undefinedMacro.

% this would also raise an error:
\includegraphics{fig-test-graphics}
''')

print(result)
# The latex_to_text() call above raises an error for \undefinedMacro, so this
# print() is never reached. If you remove \undefinedMacro, it raises an error
# for \includegraphics, which is only defined in the 'latex-placeholders'
# category in the default context db and which we've filtered out of our context db.
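
To come back to the original question about catching the conversion error: a minimal sketch of how the call could be wrapped, assuming the converter raises ValueError as raise_l2t_unknown_latex does above. The names try_convert and fake_to_text are hypothetical and not part of pylatexenc; fake_to_text is only a stand-in so that the sketch runs without the library installed.

```python
from typing import Callable, Optional, Tuple


def try_convert(
    to_text: Callable[[str], str], latex: str
) -> Tuple[Optional[str], Optional[str]]:
    """Return (text, None) on success, or (None, error message) on failure.

    Only ValueError is caught, because that is what the replacement
    function in the snippets above raises (an assumption of this sketch).
    """
    try:
        return to_text(latex), None
    except ValueError as exc:
        return None, str(exc)


def fake_to_text(latex: str) -> str:
    """Hypothetical stand-in for a converter that rejects unknown macros."""
    if "\\undefinedMacro" in latex:
        raise ValueError("Unknown macro: '\\undefinedMacro'")
    return latex


ok = try_convert(fake_to_text, "Hello world.")
bad = try_convert(fake_to_text, r"An undefined macro: \undefinedMacro.")
print(ok)
print(bad)
```

With the real library, the bound method latex2text.LatexNodes2Text(latex_context=l2t_db).latex_to_text from the snippets above would be passed as to_text in place of fake_to_text. Returning the message instead of re-raising lets a batch job record which inputs failed and carry on with the rest.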
