
Fix garbled text when extracting from PDF documents in C# and VB.NET

This sample shows how to extract text from PDF documents when the regular methods produce garbled or unexpected text.

Some searchable PDF documents look just fine, but it’s not possible to copy or extract text from them properly, even with Adobe tools.

This happens when the document does not contain mappings of glyphs to Unicode characters, or contains incorrect mappings.

For such cases, there is the PdfTextExtractionOptions.UnmappedCharacterHandler property. This sample shows how to perform OCR for unmapped characters and then replace them with correct Unicode values.
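
To illustrate the idea only, here is a minimal, library-independent sketch of decoding glyph codes with a fallback for unmapped glyphs. It does not use the Docotic.Pdf API: GlyphDecodingSketch, Decode, toUnicode, unmappedHandler, and the ocrResults dictionary are all hypothetical names, and the dictionary lookup merely stands in for a real OCR step. It also covers only missing mappings, not incorrect ones.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical sketch, not a Docotic.Pdf type.
public static class GlyphDecodingSketch
{
    // Converts glyph codes to text. Glyphs missing from the
    // glyph-to-Unicode table are passed to the fallback handler.
    public static string Decode(
        IEnumerable<int> glyphCodes,
        IReadOnlyDictionary<int, string> toUnicode,
        Func<int, string> unmappedHandler)
    {
        var text = new StringBuilder();
        foreach (int code in glyphCodes)
        {
            if (toUnicode.TryGetValue(code, out var mapped))
                text.Append(mapped);
            else
                text.Append(unmappedHandler(code));
        }
        return text.ToString();
    }

    public static void Main()
    {
        // The font maps glyphs 1 and 2 but has no entry for glyph 3.
        var toUnicode = new Dictionary<int, string> { [1] = "H", [2] = "i" };
        int[] glyphs = { 1, 2, 3 };

        // Without recovery, the unmapped glyph becomes U+FFFD.
        string raw = Decode(glyphs, toUnicode, _ => "\uFFFD");

        // Stand-in for OCR: a lookup of "recognized" glyphs (assumption).
        var ocrResults = new Dictionary<int, string> { [3] = "!" };
        string recovered = Decode(
            glyphs,
            toUnicode,
            code => ocrResults.TryGetValue(code, out var s) ? s : "\uFFFD");

        Console.WriteLine(raw.Length);  // 3
        Console.WriteLine(recovered);   // Hi!
    }
}
```

In the actual sample, the handler would render the glyph and run OCR on it instead of consulting a prepared dictionary.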