Add ability to read math in PDF documents (#17276)
Fixes #9288

Summary of the issue:
PDF 2.0 added Associated Files (AF). It also describes a way for Formula tags to use AFs that contain MathML. The LaTeX Project (the group that maintains LaTeX) has released an update to LaTeX that uses this technique. Hence, there will soon be a large body of PDF documents generated from LaTeX (pdflatex and lualatex) that contain MathML.

Together with Foxit, and with an informal agreement from someone at Adobe, we agreed on a method to expose the MathML in an AF without changing the PDF accessibility interface: the Formula tag gets role=Math (on Windows, ROLE_SYSTEM_EQUATION) and the contents of the tag are the MathML.

Note: this does not change the legality of the previous method of fully tagging the PDF math with child elements pointing to subexpressions in the PDF. However, that method has proved difficult for PDF generators to implement. The AF method seems to be much simpler and hence will be used.

The latest release of Foxit supports AFs with MathML. So far, Adobe has not made a change, but with Foxit and NVDA supporting this, there will be more of an impetus to do so. According to the Foxit implementer, it only took 1-2 days to implement.

Description of user facing changes:
The math in PDF documents will be spoken and brailled just as it is for HTML documents. It will also be navigable. This should work with any of the MathML add-ons.

Description of development approach:
Support required only about 3 lines to be added to the AdobeAcrobat.py file. I changed a few more lines to add debug warnings when various COM interfaces are not found.

A commit in January 2024 wiped out the MathML support in PDF in favor of alt text. That change was in the .cpp file that is part of this PR. This PR mostly reverts that change. Alt text is still supported via the creation of a MathML `<mtext>` element. Potentially, this is a better solution because the alt text is sometimes LaTeX, and LaTeX contains lots of punctuation characters that NVDA does not speak by default. Pushing this to the math handler gives the handler the ability to override this behavior and speak all the characters. Currently MathCAT just passes the `mtext` content directly to NVDA, but I will look into making it smarter about that.

Because Adobe Reader currently does not handle AFs, the alt text will be read if a formula has both an AF and alt text.

Testing strategy:
Here are two PDF files for testing:
1. [Several inline and display equations](https://github.com/user-attachments/files/17334945/mathml-AF-ex2.pdf)
2. [Some equations with alt text](https://github.com/user-attachments/files/17334946/formula-alt-text.pdf)

Known issues with pull request:
None
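
As an illustration of the alt text fallback described in the development approach, here is a minimal Python sketch of wrapping alt text in a MathML `<mtext>` element. It is not the code from this PR (the real change lives in the .cpp file); the helper name `alt_text_to_mathml` and the choice to emit a namespaced `<math>` root are assumptions made for the example. The point it shows is that LaTeX-style alt text can contain `&`, `<`, and `>`, so it has to be XML-escaped before it is embedded.

```python
from xml.sax.saxutils import escape

# Standard MathML namespace URI.
MATHML_NS = "http://www.w3.org/1998/Math/MathML"


def alt_text_to_mathml(alt_text: str) -> str:
    """Wrap plain alt text in a minimal MathML document.

    Hypothetical helper for illustration only; it is not part of NVDA.
    """
    # escape() replaces &, < and > so that LaTeX-style alt text
    # cannot break the well-formedness of the generated XML.
    return f'<math xmlns="{MATHML_NS}"><mtext>{escape(alt_text)}</mtext></math>'


if __name__ == "__main__":
    # Alt text that happens to be LaTeX with XML-special characters.
    print(alt_text_to_mathml(r"\frac{a}{b} < 1 & x > 0"))
```

A math handler that receives this markup can then decide for itself how to speak the punctuation inside `<mtext>`, which is the behavior the description above hopes to enable.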