Add support for PDF/A-1b #1029

Open
wants to merge 9 commits into
base: master
1 change: 1 addition & 0 deletions .gitignore
@@ -10,3 +10,4 @@ drop_to_console.rb
/bin
.DS_Store
*.pdf
+ /.byebug_history
Member

This should go into your global .gitignore. This is not a Prawn dep.

Author

Done.

3 changes: 2 additions & 1 deletion lib/prawn/document.rb
@@ -66,7 +66,8 @@ class Document
:page_size, :page_layout, :margin, :left_margin,
:right_margin, :top_margin, :bottom_margin, :skip_page_creation,
:compress, :background, :info,
- :text_formatter, :print_scaling
+ :text_formatter, :print_scaling,
+ :trailer, :enable_pdfa_1b
Member

I would rather make the ID deterministic so that we don't have to make the trailer accessible here.

Author

Done.
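
One way to read the suggestion above is to derive the trailer ID from data the document already has, so that two renders of the same document produce the same ID. The sketch below is an assumption, not the change made in this PR: `deterministic_trailer_id` is a hypothetical helper, the seed string is made up, and MD5 is only one possible choice that happens to yield the 16 bytes used for the ID elsewhere in this diff.

```ruby
require 'digest'

# Hypothetical helper (not part of Prawn or pdf-core): maps a seed string
# to 16 raw bytes that could serve as both halves of the trailer ID.
def deterministic_trailer_id(seed)
  Digest::MD5.digest(seed)
end

# The seed below is invented for illustration only.
id_a = deterministic_trailer_id('title=Manual;pages=3')
id_b = deterministic_trailer_id('title=Manual;pages=3')

puts id_a == id_b  # the same seed always gives the same ID
puts id_a.bytesize # an MD5 digest is 16 bytes long
```

With an ID computed this way, callers would not need to pass a `trailer` option just to get reproducible output.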

].freeze

# Any module added to this array will be included into instances of
55 changes: 55 additions & 0 deletions lib/prawn/vera_pdf.rb
@@ -0,0 +1,55 @@
require 'nokogiri'
require 'open3'

module Prawn
  module VeraPdf
Member

This is only used in specs, so it should live in specs.

Author

Done.

Member

You don't need to reply to each of my comments as long as you push granular commits. GitHub hides comments on code that has since changed and lets you review only the new changes. It also sends emails about new commits (but not about force pushes, unfortunately, so please let me know about those if you want someone to look at them).

Just a hint to save a few minutes for you.

Author

I am more used to GitLab which lets you 'resolve' a discussion manually. I use my 'dones' here primarily for keeping track of what I still need to do. For example your comment about making sure that the CI runs the veraPDF specs is now hidden as 'outdated' because I moved the file elsewhere. As long as you are not annoyed with getting emails for every 'done' I would continue with this practice. Or is there some GitHub trick for such things? 😁

Member

> Or is there some GitHub trick for such things?

Not that I know of. Would love to have a manual option like you described.

> As long as you are not annoyed with getting emails for every 'done' I would continue with this practice.

Not at all. It absolutely makes sense since I left quite a few comments and it's hard to keep track otherwise.

    VERA_PDF_EXECUTABLE = 'verapdf'.freeze
    VERA_PDF_COMMAND = "#{VERA_PDF_EXECUTABLE} --flavour 1b --format xml".freeze

    # Returns the full path of cmd if it is found in PATH, nil otherwise.
    def which(cmd)
      exts = ENV['PATHEXT'] ? ENV['PATHEXT'].split(';') : ['']
      ENV['PATH'].split(File::PATH_SEPARATOR).each do |path|
        exts.each do |ext|
          exe = File.join(path, "#{cmd}#{ext}")
          return exe if File.executable?(exe) && !File.directory?(exe)
        end
      end
      nil
    end

    # Returns true if the veraPDF executable can be found in PATH.
    def vera_pdf_available?
      !which(VERA_PDF_EXECUTABLE).nil?
    end

    # Pipes the rendered PDF to veraPDF and evaluates its XML report.
    def valid_pdfa_1b?(pdf_data)
      stdout, stderr, status = Open3.capture3(VERA_PDF_COMMAND, stdin_data: pdf_data)
      raise "VeraPDF could not be run. #{stderr}" unless status.success?

      # Drop the first four lines of stdout before parsing the report.
      reported_as_compliant? stdout.lines[4..-1].join
    end

    # Prints every reported violation and returns the overall verdict.
    def reported_as_compliant?(xml_data)
      xml_doc = Nokogiri::XML xml_data
      raise 'The veraPDF xml report was not well formed.' unless xml_doc.errors.empty?

      xml_doc.remove_namespaces!
      validation_result = xml_doc.xpath('/processorResult/validationResult')
      assertions = validation_result.xpath('assertions/assertion')
      assertions.each do |assertion|
        message = assertion.at_xpath('message').content
        clause = assertion.at_xpath('ruleId').attribute('clause').content
        test = assertion.at_xpath('ruleId').attribute('testNumber').content
        context = assertion.at_xpath('location/context').content
        url = 'https://github.com/veraPDF/veraPDF-validation-profiles/wiki/PDFA-Part-1-rules'
        url_anchor = "rule-#{clause.delete('.')}-#{test}"
        puts
        puts 'PDF/A-1b VIOLATION'
        puts " Message: #{message}"
        puts " Context: #{context}"
        puts " Details: #{url}##{url_anchor}"
        puts
      end
      validation_result.attribute('isCompliant').content == 'true'
    end
  end
end
7 changes: 6 additions & 1 deletion manual/contents.rb
@@ -6,9 +6,14 @@ def prawn_manual_document
old_default_external_encoding = Encoding.default_external
Encoding.default_external = Encoding::UTF_8

+ # We need to use a fixed trailer ID, otherwise the test for
+ # unintended manual changes will always trigger because of
+ # a random trailer ID.
+ trailer_id = PDF::Core::ByteString.new('PrawnPrawnPrawnP')
Prawn::ManualBuilder::Example.new(
skip_page_creation: true,
- page_size: 'FOLIO'
+ page_size: 'FOLIO',
+ trailer: { ID: [trailer_id, trailer_id] }
) do
load_page '', 'cover'
load_page '', 'how_to_read_this_manual'
1 change: 1 addition & 0 deletions prawn.gemspec
@@ -46,6 +46,7 @@ Gem::Specification.new do |spec|
spec.add_development_dependency('pdf-reader', ['~> 1.4', '>= 1.4.1'])
spec.add_development_dependency('rubocop', '~> 0.47.1')
spec.add_development_dependency('rubocop-rspec', '~> 1.10')
+ spec.add_development_dependency('nokogiri', '~> 1.7')
Member

Please no binary dependencies.

Author

I will look into replacing the veraPDF report parsing with REXML.

Author

Done.
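
For reference, the compliance check could be done with REXML, a pure-Ruby XML parser that needs no compiled extension. This is a minimal sketch under assumptions, not the code that ended up in the PR: `compliant_report?` is a hypothetical name, the sample report is hand-written to mirror the `/processorResult/validationResult` path used in `lib/prawn/vera_pdf.rb`, and namespace handling is left out.

```ruby
require 'rexml/document'

# Hypothetical helper: true when the report's validationResult element
# has isCompliant="true". Assumes the report carries no XML namespaces.
def compliant_report?(xml_data)
  doc = REXML::Document.new(xml_data)
  result = REXML::XPath.first(doc, '/processorResult/validationResult')
  return false if result.nil?

  result.attributes['isCompliant'] == 'true'
end

# Hand-written sample, not real veraPDF output.
sample = <<~XML
  <processorResult>
    <validationResult isCompliant="true">
      <assertions/>
    </validationResult>
  </processorResult>
XML

puts compliant_report?(sample)
```

Malformed input makes `REXML::Document.new` raise `REXML::ParseException`, so a separate well-formedness check like the Nokogiri `errors.empty?` test would probably not be needed.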


spec.homepage = 'http://prawnpdf.org'
spec.description = <<END_DESC
2 changes: 1 addition & 1 deletion spec/manual_spec.rb
@@ -5,7 +5,7 @@
MANUAL_HASH =
case RUBY_ENGINE
when 'ruby'
- 'b55f154c9093c60f38051c75920c8157c775b946b0c77ffafc0a8a634ad5401e8ceafd0b96942839f82bacd726a690af3fdd1fd9e185616f67c6c0edfcfd0460'
+ 'c7202f015e36d02ac36dac38d88bb78a4dd439ec6d23268ebddaa15a8bcf7e790f203fd3e92d9c1b92c1a2806a03d7f5706c1550da29f281d25bb5540568445e'
when 'jruby'
'd2eb71ea3ddc35acb185de671a6fa48862ebad5727ce372e3a742f45d31447765c4004fbe5fbfdc1f5a32903ac87182c75e6abe021ab003c8af6e6cc33e0d01e'
end
16 changes: 11 additions & 5 deletions spec/prawn/document_spec.rb
@@ -454,7 +454,13 @@ def self.format(string)
end

it 'is idempotent' do
- pdf = described_class.new
+ # We need to overwrite the trailer ID, otherwise each render
+ # pass will generate a new random ID and the documents would
+ # not match.
+ trailer_id = PDF::Core::ByteString.new(SecureRandom.random_bytes(16))
Member

Please make sure this works without any effort on the user's part.

Author

Done.

+ pdf = described_class.new(trailer: {
+ ID: [trailer_id, trailer_id]
+ })

contents = pdf.render
contents2 = pdf.render
@@ -508,18 +514,18 @@ def self.format(string)
end

describe 'content stream characteristics' do
- it 'has 1 single content stream for a single page PDF' do
+ it 'has 2 content streams for a single page PDF' do
pdf = described_class.new
pdf.text 'James'
output = StringIO.new(pdf.render)
hash = PDF::Reader::ObjectHash.new(output)

streams = hash.values.select { |obj| obj.is_a?(PDF::Reader::Stream) }

- expect(streams.size).to eq(1)
+ expect(streams.size).to eq(2)
Member

This shouldn't change.

Author

Done.

end

- it 'has 1 single content stream for a single page PDF, even if go_to_page '\
+ it 'has 2 content streams for a single page PDF, even if go_to_page '\
'is used' do
pdf = described_class.new
pdf.text 'James'
@@ -530,7 +536,7 @@ def self.format(string)

streams = hash.values.select { |obj| obj.is_a?(PDF::Reader::Stream) }

- expect(streams.size).to eq(1)
+ expect(streams.size).to eq(2)
Member

This shouldn't change either.

Author

Done.

end
end

11 changes: 11 additions & 0 deletions spec/prawn/pdfa_1b_spec.rb
@@ -0,0 +1,11 @@
require 'spec_helper'
require 'prawn/vera_pdf'

include Prawn::VeraPdf

if vera_pdf_available?
Member

It's nice to let developers know what's wrong. But please make sure CI has all the tools installed to actually run the specs.

Author

Done.

  require_relative 'pdfa_1b_spec_impl'
else
  puts 'NOTICE: Specs for PDF/A-1b are not run, because veraPDF ' \
       'binary was not found in path.'
end
31 changes: 31 additions & 0 deletions spec/prawn/pdfa_1b_spec_impl.rb
@@ -0,0 +1,31 @@
require 'spec_helper'
require 'prawn/vera_pdf'

describe Prawn::Document do
  include Prawn::VeraPdf

  let(:pdf) { described_class.new(enable_pdfa_1b: true) }

  describe 'PDF/A 1b conformance' do
    it 'empty document' do
      expect(valid_pdfa_1b?(pdf.render)).to be true
    end

    it 'document with some text' do
      pdf.font_families.update(
        'DejaVuSans' => {
          normal: "#{Prawn::DATADIR}/fonts/DejaVuSans.ttf"
        }
      )
      pdf.font 'DejaVuSans' do
        pdf.text_box 'Some text', at: [100, 100]
      end
      expect(valid_pdfa_1b?(pdf.render)).to be true
    end

    it 'document with some image' do
      pdf.image "#{Prawn::DATADIR}/images/pigs.jpg"
      expect(valid_pdfa_1b?(pdf.render)).to be true
    end
  end
end
2 changes: 1 addition & 1 deletion spec/prawn/stamp_spec.rb
@@ -95,7 +95,7 @@
next unless obj =~ %r{/Type /Page$}
# The page object must contain the annotation reference
# to render a clickable link
- expect(obj).to match(%r{^/Annots \[\d \d .\]$})
+ expect(obj).to match(%r{^/Annots \[\d+ \d .\]$})
Member

Why is this change needed?

Author

With the additional object for the XMP metadata stream, the object number of the annotation object changed from a single digit to a double digit (from 9 to 10). The regex only matched a single-digit object number. If we make the XMP metadata stream optional, this change can be reverted.

Author

Reverted.
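
The effect described in this thread can be reproduced in isolation. The two patterns below are copied from the `spec/prawn/stamp_spec.rb` hunk; the sample strings are illustrative assumptions that only reuse the object numbers 9 and 10 mentioned above.

```ruby
# Patterns copied from the stamp_spec.rb hunk.
SINGLE_DIGIT = %r{^/Annots \[\d \d .\]$}
MULTI_DIGIT = %r{^/Annots \[\d+ \d .\]$}

# Illustrative lines from a page object; only the object numbers are
# taken from the discussion, the rest is assumed.
BEFORE_XMP = '/Annots [9 0 R]'.freeze
AFTER_XMP = '/Annots [10 0 R]'.freeze

puts SINGLE_DIGIT.match?(BEFORE_XMP) # true
puts SINGLE_DIGIT.match?(AFTER_XMP)  # false
puts MULTI_DIGIT.match?(AFTER_XMP)   # true
```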

end
end
