Incorrect parsing of REVERSE SOLIDUS in literal string #154

Open
Greybird opened this issue Aug 20, 2024 · 0 comments
Labels
bug Something isn't working


Reporting an Issue Here

While parsing some files, I noticed that some document information (Info) entries show incorrect values.
For example, in this file, the Producer entry:

  • is displayed by Acrobat as C48x Series (PDF - 300X300 dpi)
  • is parsed by PDFsharp as C48x Series (DF - 300X300 dpi) (the P is missing)

Expected Behavior

When parsing a literal string, if a REVERSE SOLIDUS (\) is immediately followed by a character that is not listed in Table 3 of section 7.3.4.2 of ISO/DIS 32000-2, the REVERSE SOLIDUS should be ignored, but the following character should be kept.
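To make the expected behavior concrete, here is a minimal C# sketch of literal-string escape decoding following 7.3.4.2. It is not PDFsharp's Lexer code: the class LiteralStringSketch and its Decode method are hypothetical names, and the sketch assumes the whole literal (including its outer parentheses) is available as a string in which each char holds one byte value. The relevant part is the default branch, where an escape not listed in Table 3 drops only the backslash.

```csharp
using System;
using System.Text;

// Hypothetical sketch, NOT PDFsharp's Lexer: decodes a complete PDF literal
// string such as "(abc)" following ISO 32000-2, 7.3.4.2. Assumption: each char
// holds one byte value (0-255); mapping bytes to text is out of scope here.
public static class LiteralStringSketch
{
    public static string Decode(string literal)
    {
        if (literal.Length == 0 || literal[0] != '(')
            throw new FormatException("A literal string must start with '('.");

        var sb = new StringBuilder();
        int depth = 0; // nesting level of unescaped, balanced parentheses
        int pos = 1;   // skip the opening delimiter
        while (pos < literal.Length)
        {
            char c = literal[pos++];
            if (c == '(') { depth++; sb.Append(c); continue; }
            if (c == ')')
            {
                if (depth == 0) return sb.ToString(); // closing delimiter
                depth--; sb.Append(c); continue;
            }
            if (c == '\r')
            {
                // A bare EOL (CR, LF or CR LF) is read as a single LF.
                if (pos < literal.Length && literal[pos] == '\n') pos++;
                sb.Append('\n');
                continue;
            }
            if (c != '\\') { sb.Append(c); continue; }

            if (pos >= literal.Length) break; // backslash at end of input
            char next = literal[pos++];
            switch (next)
            {
                // Escape sequences listed in Table 3.
                case 'n': sb.Append('\n'); break;
                case 'r': sb.Append('\r'); break;
                case 't': sb.Append('\t'); break;
                case 'b': sb.Append('\b'); break;
                case 'f': sb.Append('\f'); break;
                case '(': case ')': case '\\': sb.Append(next); break;
                case '\r': // backslash + EOL is a line continuation: emits nothing
                    if (pos < literal.Length && literal[pos] == '\n') pos++;
                    break;
                case '\n':
                    break;
                default:
                    if (next >= '0' && next <= '7')
                    {
                        // \ddd: one to three octal digits; high-order overflow is ignored.
                        int value = next - '0';
                        for (int i = 0; i < 2 && pos < literal.Length
                                 && literal[pos] >= '0' && literal[pos] <= '7'; i++)
                            value = value * 8 + (literal[pos++] - '0');
                        sb.Append((char)(value & 0xFF));
                    }
                    else
                    {
                        // Not in Table 3: ignore the backslash but KEEP the character.
                        sb.Append(next);
                    }
                    break;
            }
        }
        throw new FormatException("Unterminated literal string.");
    }
}
```

For a hypothetical raw value such as (C48x Series \(\PDF - 300X300 dpi\)), this sketch returns C48x Series (PDF - 300X300 dpi), the value Acrobat shows; also dropping the P after the backslash would give the value PDFsharp currently reports. The raw bytes actually stored in the file are not shown in this issue.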

Actual Behavior

When parsing a literal string, if a REVERSE SOLIDUS (\) is immediately followed by a character that is not listed in Table 3 of section 7.3.4.2 of ISO/DIS 32000-2, the REVERSE SOLIDUS is ignored, and so is the following character.

Steps to Reproduce the Behavior

using FluentAssertions;
using PdfSharp.Pdf.IO;
using Xunit;

[Fact]
public void ReverseSolidus_with_invalid_following_character_should_be_ignored()
{
    using var doc = PdfReader.Open(@"Cover-letter-4098208.pdf");
    var producer = doc.Info.Producer;
    producer.Should().Be("C48x Series (PDF - 300X300 dpi)");
}

The test fails with:

Expected producer to be "C48x Series (PDF - 300X300 dpi)" with a length of 31, but "C48x Series (DF - 300X300 dpi)" has a length of 30, differs near "DF " (index 13).

The issue is most likely related to an open question about how to interpret the specification, as explained in this comment in Lexer.cs.

ThomasHoevel added the bug (Something isn't working) label on Aug 20, 2024