Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for PDF/A-1b #1029

Open
wants to merge 9 commits into
base: master
Choose a base branch
from

Conversation

Backbone81
Copy link

@Backbone81 Backbone81 commented Jun 7, 2017

This pull request adds support for PDF/A-1b compliant documents. The need for such a feature was already stated a few years ago (Google Groups).

This pull request needs prawnpdf/pdf-core#34

Those features were missing for PDF/A-1b and were added:

  • Trailer ID (each document now has an ID based on a body hash)
  • XMP metadata which is synchronized with the document information dictionary (only added to documents which are marked with the PDF/A-1b option)
  • OutputIntent with an ICC profile (only added to documents which are marked with the PDF/A-1b option)

I used veraPDF to validate for PDF/A-1b conformance.

@Backbone81
Copy link
Author

@pointlessone can you have a look at my pull request? Any thoughts about what to change to get this feature integrated?

.gitignore Outdated
@@ -10,3 +10,4 @@ drop_to_console.rb
/bin
.DS_Store
*.pdf
/.byebug_history
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should go into your global .gitignore. This is not a Prawn dep.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@@ -66,7 +66,8 @@ class Document
:page_size, :page_layout, :margin, :left_margin,
:right_margin, :top_margin, :bottom_margin, :skip_page_creation,
:compress, :background, :info,
:text_formatter, :print_scaling
:text_formatter, :print_scaling,
:trailer, :enable_pdfa_1b
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would rather make the ID deterministic so that we didn't have to make trailer accessible here.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

require 'open3'

module Prawn
module VeraPdf
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is only used in specs, so it should live in specs.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't need to comment on each my comment as long as you push granular commits. GitHub hides comments on the code that has been changed and lets reviewing only new changes. It also sends emails about new commits (not about force pushes, unfortunately, so please let me know about those if you want someone to look at those).

Just a hint to save a few minutes for you.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am more used to GitLab which lets you 'resolve' a discussion manually. I use my 'dones' here primarily for keeping track of what I still need to do. For example your comment about making sure that the CI runs the veraPDF specs is now hidden as 'outdated' because I moved the file elsewhere. As long as you are not annoyed with getting emails for every 'done' I would continue with this practice. Or is there some GitHub trick for such things? 😁

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or is there some GitHub trick for such things?

Not that I know of. Would love to have a manual option like you described.

As long as you are not annoyed with getting emails for every 'done' I would continue with this practice.

Not at all. It absolutely makes sense since I left quite a few comments and it's hard to keep track otherwise.


include Prawn::VeraPdf

if vera_pdf_available?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's nice to let developers know whats's wrong. But please make sure CI has all tools installed to actually run the specs.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

pdf = described_class.new
pdf.text 'James'
output = StringIO.new(pdf.render)
hash = PDF::Reader::ObjectHash.new(output)

streams = hash.values.select { |obj| obj.is_a?(PDF::Reader::Stream) }

expect(streams.size).to eq(1)
expect(streams.size).to eq(2)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This shouldn't change.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@@ -530,7 +536,7 @@ def self.format(string)

streams = hash.values.select { |obj| obj.is_a?(PDF::Reader::Stream) }

expect(streams.size).to eq(1)
expect(streams.size).to eq(2)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This shouldn't change either.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

# We need to overwrite the trailer ID, otherwise each render
# pass will generate a new random ID and the documents would
# not match.
trailer_id = PDF::Core::ByteString.new(SecureRandom.random_bytes(16))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please make sure this doesn't happen without any effort on the users part.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@@ -95,7 +95,7 @@
next unless obj =~ %r{/Type /Page$}
# The page object must contain the annotation reference
# to render a clickable link
expect(obj).to match(%r{^/Annots \[\d \d .\]$})
expect(obj).to match(%r{^/Annots \[\d+ \d .\]$})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why this change is needed?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the additional object for the XMP metadata stream the object number for the annotation object switched from single digit to double digit (from 9 to 10). This regex only tested for single digit object number. If we make the XMP metadata stream optional, this change can be reverted.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reverted.

prawn.gemspec Outdated
@@ -46,6 +46,7 @@ Gem::Specification.new do |spec|
spec.add_development_dependency('pdf-reader', ['~> 1.4', '>= 1.4.1'])
spec.add_development_dependency('rubocop', '~> 0.47.1')
spec.add_development_dependency('rubocop-rspec', '~> 1.10')
spec.add_development_dependency('nokogiri', '~> 1.7')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please no binary dependencies.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will look into replacing the veraPDF report parsing with REXML.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@@ -384,7 +384,7 @@ def render(*a, &b)
# pdf.render_file "foo.pdf"
#
def render_file(filename)
File.open(filename, 'wb') { |f| render(f) }
File.open(filename, 'rb+') { |f| render(f) }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why this change?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A file opened with 'wb' does not allow reading. My seek-solution reads the rendered body to avoid a second render pass. As you already pointed pointed out the flaw with seeking, I will revert this change.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reverted.

@@ -1,4 +1,4 @@
require 'nokogiri'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have spec/extensions dir for things like this. It probably should be renamed to helpers but that's how it is. For historical reasons.

Please put this file in that dir.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. It is better located in extensions. No need to require the veraPDF helpers any more. 👍

@Backbone81
Copy link
Author

@pointlessone I think I have addressed your comments so far.

How do you want to deal with rubocop messages? My own projects usually have a policy of only merging when all rubocop messages have been cleared. But I see that rubocop is also complaining about files I have not touched with my own pull request.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

2 participants