RTF to HTML converter in PHP
In a recent project, I desperately needed an RTF to HTML converter written in PHP. Googling around turned up some matches, but I could not get them to work properly. Also, one of them called passthru() to use an RTF2HTML executable, which is something I didn’t want. I was looking for an RTF to HTML converter written purely in PHP.
Since I couldn’t find anything ready-made, I sat down and coded one up myself. It’s short, and it works, implementing the subset of RTF tags that you’ll need in HTML and ignoring the rest. As it turns out, the RTF format isn’t that complicated when you really look at it, but it isn’t something you code a parser for in 15 minutes either.
Include the file rtf-html-php.php somewhere in your project. Then do this:
$reader = new RtfReader();
$rtf = file_get_contents("test.rtf"); // or use a string
$result = $reader->Parse($rtf);
The parser will return TRUE if the RTF was parsed successfully, or FALSE if the RTF was malformed.
If you’d like to see what the parser read (for debug purposes), then call this (but only if the RTF was successfully parsed):
$reader->root->dump();
To convert the parser’s parse tree to HTML, call this (but only if the RTF was successfully parsed):
$formatter = new RtfHtml();
echo $formatter->Format($reader->root);
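The steps above can be combined so that the parse tree is only touched after a successful parse. The following is a minimal sketch, not part of the library: the helper name `rtfToHtmlOrNull` is hypothetical, and it assumes rtf-html-php.php has already been included so that `RtfReader` and `RtfHtml` are available.

```php
<?php
// Assumption: rtf-html-php.php has already been included, so the
// RtfReader and RtfHtml classes described above are defined.

/**
 * Hypothetical helper (not part of the library): returns the HTML for
 * an RTF string, or null if the parser reports malformed input.
 */
function rtfToHtmlOrNull(string $rtf): ?string
{
    $reader = new RtfReader();

    // Parse() returns TRUE on success and FALSE on malformed RTF.
    if (!$reader->Parse($rtf)) {
        return null;
    }

    // Only use $reader->root after a successful parse.
    $formatter = new RtfHtml();
    return $formatter->Format($reader->root);
}
```

Calling `rtfToHtmlOrNull($rtf)` then yields either the HTML string or `null`, which the caller can turn into an error message of its choice.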
For enhanced compatibility, the default character encoding of the converted RTF Unicode characters is set to HTML-ENTITIES. To change the default encoding, you can initialize the RtfHtml object with any encoding listed by mb_list_encodings(), for example UTF-8:
$formatter = new RtfHtml('UTF-8');
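If the encoding name comes from configuration or user input, it may be worth checking it against mb_list_encodings() before handing it to RtfHtml. This is only a sketch: `isListedEncoding` is a hypothetical helper, it compares names case-insensitively, and it does not recognise aliases (such as "utf8") that mbstring might otherwise accept.

```php
<?php
/**
 * Hypothetical helper: true if $name appears in mb_list_encodings(),
 * compared case-insensitively. Encoding aliases are not checked.
 */
function isListedEncoding(string $name): bool
{
    foreach (mb_list_encodings() as $encoding) {
        if (strcasecmp($encoding, $name) === 0) {
            return true;
        }
    }
    return false;
}

$encoding = 'UTF-8';
if (!isListedEncoding($encoding)) {
    throw new InvalidArgumentException("Unsupported encoding: $encoding");
}
// $encoding can now be passed on, e.g. new RtfHtml($encoding).
```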
To install through Composer, run:
composer require henck/rtf-to-html
- Please note that rtf-html-php requires your PHP installation to support the mb_convert_encoding function. Therefore you must have the php-mbstring module installed. For fresh PHP installations, it will usually be there.
- Adds support for Font table extraction.
- Adds support for Pictures.
- Adds support for additional control symbols.
- Updates the way the parser parses Unicode and its replacement character(s).
- Updated HTML formatter: it now reads the proper encoding from the RTF document and/or from the current font.
- Updated Unicode conversion method: it now takes into account the encoding of the RTF document.
- Unicode characters are now fully supported.
- Font color & background are now supported
- Better HTML tag handling
- Better display for text with altered font-size
- The RTF parser would either issue warnings or go into an infinite loop when parsing a malformed RTF. Instead, it now returns TRUE when parsing was successful, and FALSE if it was not.
- The RTF to HTML converter can now be installed through Composer (thanks to felixkiss).
- A bug causing control words to be misparsed occasionally is now fixed.
- Fixed bug: underlining would start but never end. Now it does.
- Feature request: images are now filtered out of the output.
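The mbstring requirement noted earlier can be checked at startup, before any RTF is parsed. The sketch below is an illustration only: both function names are hypothetical, and the check relies solely on PHP's built-in `function_exists()`.

```php
<?php
/** Hypothetical helper: reports whether mb_convert_encoding() is callable. */
function hasMbConvertEncoding(): bool
{
    return function_exists('mb_convert_encoding');
}

/** Hypothetical helper: throws if the mbstring functions are missing. */
function requireMbstring(): void
{
    if (!hasMbConvertEncoding()) {
        throw new RuntimeException(
            'mb_convert_encoding() is unavailable; install the php-mbstring module.'
        );
    }
}
```

Calling `requireMbstring()` once during bootstrap makes a missing module fail early with a clear message instead of partway through a conversion.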