Add support for PDF/A-1b #1029

Backbone81 · 2017-06-07T08:16:47Z

This pull request adds support for PDF/A-1b compliant documents. The need for such a feature was already stated a few years ago (Google Groups).

This pull request needs prawnpdf/pdf-core#34

These features were missing for PDF/A-1b compliance and have been added:

Trailer ID (each document now has an ID based on a body hash)
XMP metadata which is synchronized with the document information dictionary (only added to documents which are marked with the PDF/A-1b option)
OutputIntent with an ICC profile (only added to documents which are marked with the PDF/A-1b option)
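To illustrate the first item, here is a minimal sketch of an ID derived from a hash of the document body. The helper names `trailer_id_for` and `trailer_id_entry` are hypothetical, and MD5 is an assumption chosen because it yields 16 bytes; the description above only says the ID is "based on a body hash". This is not the code from this pull request or from pdf-core.

```ruby
require 'digest'

# Hypothetical helper (not part of Prawn or pdf-core): derive a 16-byte
# identifier from the rendered body, so the same body always gives the same ID.
# MD5 is an assumption; the PR description does not name the hash function.
def trailer_id_for(body)
  Digest::MD5.digest(body) # 16 raw bytes
end

# Hypothetical helper: format the ID as a trailer entry with two hex strings.
def trailer_id_entry(body)
  hex = trailer_id_for(body).unpack1('H*').upcase
  "/ID [<#{hex}> <#{hex}>]"
end

body = "%PDF-1.4\n1 0 obj\n<< /Type /Catalog >>\nendobj\n"
puts trailer_id_entry(body)
```

Because the ID depends only on the body bytes, rendering the same document twice would give the same ID under this scheme.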

I used veraPDF to validate for PDF/A-1b conformance.

Backbone81 · 2017-07-12T08:20:56Z

@pointlessone can you have a look at my pull request? Any thoughts about what to change to get this feature integrated?

pointlessone · 2017-07-12T10:02:02Z

.gitignore

@@ -10,3 +10,4 @@ drop_to_console.rb
 /bin
 .DS_Store
 *.pdf
+/.byebug_history


This should go into your global .gitignore. This is not a Prawn dep.

pointlessone · 2017-07-12T10:06:44Z

lib/prawn/document.rb

@@ -66,7 +66,8 @@ class Document
      :page_size, :page_layout, :margin, :left_margin,
      :right_margin, :top_margin, :bottom_margin, :skip_page_creation,
      :compress, :background, :info,
-      :text_formatter, :print_scaling
+      :text_formatter, :print_scaling,
+      :trailer, :enable_pdfa_1b


I would rather make the ID deterministic so that we don't have to make the trailer accessible here.

pointlessone · 2017-07-12T10:06:54Z

lib/prawn/vera_pdf.rb

+require 'open3'
+
+module Prawn
+  module VeraPdf


This is only used in specs, so it should live in specs.

You don't need to reply to each of my comments as long as you push granular commits. GitHub hides comments on code that has been changed and lets you review only the new changes. It also sends emails about new commits (not about force pushes, unfortunately, so please let me know about those if you want someone to look at them).

Just a hint to save a few minutes for you.

I am more used to GitLab, which lets you 'resolve' a discussion manually. I use my 'dones' here primarily to keep track of what I still need to do. For example, your comment about making sure that the CI runs the veraPDF specs is now hidden as 'outdated' because I moved the file elsewhere. As long as you are not annoyed by getting an email for every 'done', I would continue with this practice. Or is there some GitHub trick for such things? 😁

Or is there some GitHub trick for such things?

Not that I know of. Would love to have a manual option like you described.

As long as you are not annoyed with getting emails for every 'done' I would continue with this practice.

Not at all. It absolutely makes sense since I left quite a few comments and it's hard to keep track otherwise.

pointlessone · 2017-07-12T10:08:58Z

spec/prawn/pdfa_1b_spec.rb

+
+include Prawn::VeraPdf
+
+if vera_pdf_available?


It's nice to let developers know what's wrong. But please make sure CI has all the tools installed to actually run the specs.
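For readers unfamiliar with this kind of guard, the following is a rough sketch of how a spec file might check for an external tool before running. `executable_on_path?` is a hypothetical stand-in, not the `vera_pdf_available?` helper from this pull request, and the executable name `verapdf` is an assumption.

```ruby
# Hypothetical stand-in for a helper like vera_pdf_available?: returns true
# when an executable file called `name` exists in one of the PATH directories.
def executable_on_path?(name, path = ENV.fetch('PATH', ''))
  path.split(File::PATH_SEPARATOR).any? do |dir|
    candidate = File.join(dir, name)
    File.file?(candidate) && File.executable?(candidate)
  end
end

# The executable name 'verapdf' is an assumption made for this sketch.
if executable_on_path?('verapdf')
  puts 'veraPDF found: conformance specs can run'
else
  warn 'veraPDF not found on PATH: conformance specs would be skipped'
end
```

A guard like this keeps local runs green when the tool is missing, which is exactly why CI needs the tool installed: otherwise the specs are skipped silently there too.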

pointlessone · 2017-07-12T10:10:00Z

spec/prawn/document_spec.rb

      pdf = described_class.new
      pdf.text 'James'
      output = StringIO.new(pdf.render)
      hash = PDF::Reader::ObjectHash.new(output)

      streams = hash.values.select { |obj| obj.is_a?(PDF::Reader::Stream) }

-      expect(streams.size).to eq(1)
+      expect(streams.size).to eq(2)


This shouldn't change.

pointlessone · 2017-07-12T10:10:16Z

spec/prawn/document_spec.rb

@@ -530,7 +536,7 @@ def self.format(string)

      streams = hash.values.select { |obj| obj.is_a?(PDF::Reader::Stream) }

-      expect(streams.size).to eq(1)
+      expect(streams.size).to eq(2)


This shouldn't change either.

pointlessone · 2017-07-12T10:11:42Z

spec/prawn/document_spec.rb

+      # We need to overwrite the trailer ID, otherwise each render
+      # pass will generate a new random ID and the documents would
+      # not match.
+      trailer_id = PDF::Core::ByteString.new(SecureRandom.random_bytes(16))


Please make sure this doesn't happen without any effort on the user's part.

pointlessone · 2017-07-12T10:13:37Z

spec/prawn/stamp_spec.rb

@@ -95,7 +95,7 @@
        next unless obj =~ %r{/Type /Page$}
        # The page object must contain the annotation reference
        # to render a clickable link
-        expect(obj).to match(%r{^/Annots \[\d \d .\]$})
+        expect(obj).to match(%r{^/Annots \[\d+ \d .\]$})


Why is this change needed?

With the additional object for the XMP metadata stream, the object number of the annotation object went from a single digit to two digits (from 9 to 10). The regex only matched a single-digit object number. If we make the XMP metadata stream optional, this change can be reverted.
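The effect can be reproduced in isolation. The two patterns below are taken from the diff above; the sample strings are made up for illustration and only mimic the shape of an `/Annots` entry.

```ruby
# Patterns copied from the diff: the old one allows exactly one digit for the
# object number, the new one allows one or more.
OLD_PATTERN = %r{^/Annots \[\d \d .\]$}
NEW_PATTERN = %r{^/Annots \[\d+ \d .\]$}

# Illustrative inputs: an annotation reference with object number 9 and 10.
single_digit = '/Annots [9 0 R]'
double_digit = '/Annots [10 0 R]'

puts "old pattern, object 9:  #{OLD_PATTERN.match?(single_digit)}"
puts "old pattern, object 10: #{OLD_PATTERN.match?(double_digit)}"
puts "new pattern, object 9:  #{NEW_PATTERN.match?(single_digit)}"
puts "new pattern, object 10: #{NEW_PATTERN.match?(double_digit)}"
```

Only the old pattern with object number 10 fails to match, which is the case the extra XMP metadata object triggered.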

pointlessone · 2017-07-12T10:15:03Z

prawn.gemspec

@@ -46,6 +46,7 @@ Gem::Specification.new do |spec|
  spec.add_development_dependency('pdf-reader', ['~> 1.4', '>= 1.4.1'])
  spec.add_development_dependency('rubocop', '~> 0.47.1')
  spec.add_development_dependency('rubocop-rspec', '~> 1.10')
+  spec.add_development_dependency('nokogiri', '~> 1.7')


Please no binary dependencies.

I will look into replacing the veraPDF report parsing with REXML.
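As a rough idea of what REXML-based parsing could look like, here is a sketch that reads a compliance flag from an XML report. The sample XML is a simplified, hypothetical report: the element name `validationReport` and the attribute `isCompliant` are assumptions about the veraPDF output, and `compliant?` is a made-up helper rather than the code from this pull request.

```ruby
require 'rexml/document'

# Hypothetical, heavily simplified report. A real veraPDF report contains more
# elements, and its exact layout may differ from what is assumed here.
SAMPLE_REPORT = <<~XML
  <report>
    <jobs>
      <job>
        <validationReport profileName="PDF/A-1B validation profile" isCompliant="false">
          <details passedRules="101" failedRules="1"/>
        </validationReport>
      </job>
    </jobs>
  </report>
XML

# Hypothetical helper: true only if a validationReport element is present and
# its isCompliant attribute is the string "true".
def compliant?(report_xml)
  doc = REXML::Document.new(report_xml)
  node = REXML::XPath.first(doc, '//validationReport')
  !node.nil? && node.attributes['isCompliant'] == 'true'
end

puts compliant?(SAMPLE_REPORT)
```

REXML is pure Ruby, so this avoids the compiled extension that nokogiri brings in.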

pointlessone · 2017-07-13T10:23:18Z

lib/prawn/document.rb

@@ -384,7 +384,7 @@ def render(*a, &b)
    #   pdf.render_file "foo.pdf"
    #
    def render_file(filename)
-      File.open(filename, 'wb') { |f| render(f) }
+      File.open(filename, 'rb+') { |f| render(f) }


Why this change?

A file opened with 'wb' does not allow reading. My seek-based solution reads the rendered body back to avoid a second render pass. As you already pointed out the flaw with seeking, I will revert this change.
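The difference between the two modes can be seen with a small experiment. `write_then_read` is a hypothetical helper written for this illustration; it only uses standard `File` behaviour and has nothing to do with Prawn's `render_file`.

```ruby
require 'tmpdir'

# Hypothetical helper: write data through a handle opened with `mode`, then
# try to read it back through the same handle. Returns the data that was read,
# or the class of the exception that was raised.
def write_then_read(path, mode, data)
  File.open(path, mode) do |f|
    f.write(data)
    f.rewind
    f.read
  end
rescue IOError, SystemCallError => e
  e.class
end

results = {}
Dir.mktmpdir do |dir|
  path = File.join(dir, 'example.pdf')

  # 'rb+' is read/write but does not create the file, so this fails first.
  results[:rb_plus_missing] = write_then_read(path, 'rb+', '%PDF-1.4')
  # 'wb' creates the file, but the handle is write-only.
  results[:wb] = write_then_read(path, 'wb', '%PDF-1.4')
  # Now that the file exists, 'rb+' can write and read back.
  results[:rb_plus_existing] = write_then_read(path, 'rb+', '%PDF-1.4')
end

results.each { |mode, outcome| puts "#{mode}: #{outcome.inspect}" }
```

Note that 'rb+' requires the file to exist already, so switching `render_file` to it would also change behaviour for new files.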

pointlessone · 2017-07-14T09:47:55Z

spec/prawn/vera_pdf.rb

@@ -1,4 +1,4 @@
-require 'nokogiri'


We have a spec/extensions dir for things like this. It probably should be renamed to helpers, but that's how it is, for historical reasons.

Please put this file in that dir.

Done. It is better located in extensions. No need to require the veraPDF helpers any more. 👍

Backbone81 · 2017-07-15T13:34:19Z

@pointlessone I think I have addressed your comments so far.

How do you want to deal with rubocop messages? My own projects usually have a policy of only merging when all rubocop messages have been cleared. But I see that rubocop is also complaining about files I have not touched with my own pull request.

Added support for PDF/A-1b

4ea556f

Backbone81 mentioned this pull request Jun 7, 2017

Add support for PDF/A-1b prawnpdf/pdf-core#34

Open

pointlessone requested changes Jul 12, 2017

Backbone81 added 2 commits July 12, 2017 14:38

Moved some ignores from local ignore file to global ignore file.

eb9b667

Made trailer ID deterministic

8296035

pointlessone reviewed Jul 13, 2017

Backbone81 added 2 commits July 13, 2017 15:18

Trailer ID generation does not use seek any more.

bd70bcd

Replaced nokogiri with REXML for parsing the veraPDF report.

d575d6e

pointlessone reviewed Jul 14, 2017

Backbone81 added 4 commits July 14, 2017 12:19

Moved veraPDF helpers into the extensions directory.

54cb87d

XMP metadata is only added to PDF/A-1b compliant documents

38ae30c

Updated Travis CI config to include veraPDF

4dc6db8

Fixed environemnt variables for Travis CI

d929572

mojavelinux mentioned this pull request May 28, 2018

Support for PDF/X-1a asciidoctor/asciidoctor-pdf#911

Closed

mojavelinux mentioned this pull request Mar 20, 2020

Support for PDF/X-1a and X3 compatible PDF documents asciidoctor/asciidoctor-pdf#125

Closed
