-
Notifications
You must be signed in to change notification settings - Fork 688
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for PDF/A-1b #1029
base: master
Are you sure you want to change the base?
Add support for PDF/A-1b #1029
Changes from 3 commits
4ea556f
eb9b667
8296035
bd70bcd
d575d6e
54cb87d
38ae30c
4dc6db8
d929572
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,55 @@ | ||
require 'nokogiri' | ||
require 'open3' | ||
|
||
module Prawn | ||
module VeraPdf | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is only used in specs, so it should live in There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Done. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. You don't need to comment on each my comment as long as you push granular commits. GitHub hides comments on the code that has been changed and lets reviewing only new changes. It also sends emails about new commits (not about force pushes, unfortunately, so please let me know about those if you want someone to look at those). Just a hint to save a few minutes for you. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I am more used to GitLab which lets you 'resolve' a discussion manually. I use my 'dones' here primarily for keeping track of what I still need to do. For example your comment about making sure that the CI runs the veraPDF specs is now hidden as 'outdated' because I moved the file elsewhere. As long as you are not annoyed with getting emails for every 'done' I would continue with this practice. Or is there some GitHub trick for such things? 😁 There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Not that I know of. Would love to have a manual option like you described.
Not at all. It absolutely makes sense since I left quite a few comments and it's hard to keep track otherwise. |
||
VERA_PDF_EXECUTABLE = 'verapdf'.freeze | ||
VERA_PDF_COMMAND = "#{VERA_PDF_EXECUTABLE} --flavour 1b --format xml".freeze | ||
|
||
def which(cmd) | ||
exts = ENV['PATHEXT'] ? ENV['PATHEXT'].split(';') : [''] | ||
ENV['PATH'].split(File::PATH_SEPARATOR).each do |path| | ||
exts.each do |ext| | ||
exe = File.join(path, "#{cmd}#{ext}") | ||
return exe if File.executable?(exe) && !File.directory?(exe) | ||
end | ||
end | ||
return nil | ||
end | ||
|
||
def vera_pdf_available? | ||
which VERA_PDF_EXECUTABLE | ||
end | ||
|
||
def valid_pdfa_1b?(pdf_data) | ||
stdout, stderr, status = Open3.capture3(VERA_PDF_COMMAND, stdin_data: pdf_data) | ||
raise Exception, "VeraPDF could not be run. #{stderr}" unless status.success? | ||
|
||
reported_as_compliant? stdout.lines[4..-1].join | ||
end | ||
|
||
def reported_as_compliant?(xml_data) | ||
xml_doc = Nokogiri::XML xml_data | ||
raise Exception, 'The veraPDF xml report was not well formed.' unless xml_doc.errors.empty? | ||
|
||
xml_doc.remove_namespaces! | ||
validation_result = xml_doc.xpath('/processorResult/validationResult') | ||
assertions = validation_result.xpath('assertions/assertion') | ||
assertions.each do |assertion| | ||
message = assertion.at_xpath('message').content | ||
clause = assertion.at_xpath('ruleId').attribute('clause').content | ||
test = assertion.at_xpath('ruleId').attribute('testNumber').content | ||
context = assertion.at_xpath('location/context').content | ||
url = 'https://github.com/veraPDF/veraPDF-validation-profiles/wiki/PDFA-Part-1-rules' | ||
url_anchor = "rule-#{clause.delete('.')}-#{test}" | ||
puts | ||
puts 'PDF/A-1b VIOLATION' | ||
puts " Message: #{message}" | ||
puts " Context: #{context}" | ||
puts " Details: #{url}##{url_anchor}" | ||
puts | ||
end | ||
validation_result.attribute('isCompliant').content == 'true' | ||
end | ||
end | ||
end |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -46,6 +46,7 @@ Gem::Specification.new do |spec| | |
spec.add_development_dependency('pdf-reader', ['~> 1.4', '>= 1.4.1']) | ||
spec.add_development_dependency('rubocop', '~> 0.47.1') | ||
spec.add_development_dependency('rubocop-rspec', '~> 1.10') | ||
spec.add_development_dependency('nokogiri', '~> 1.7') | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Please no binary dependencies. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I will look into replacing the veraPDF report parsing with REXML. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Done. |
||
|
||
spec.homepage = 'http://prawnpdf.org' | ||
spec.description = <<END_DESC | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -455,7 +455,6 @@ def self.format(string) | |
|
||
it 'is idempotent' do | ||
pdf = described_class.new | ||
|
||
contents = pdf.render | ||
contents2 = pdf.render | ||
expect(contents2).to eq(contents) | ||
|
@@ -508,18 +507,18 @@ def self.format(string) | |
end | ||
|
||
describe 'content stream characteristics' do | ||
it 'has 1 single content stream for a single page PDF' do | ||
it 'has 2 content streams for a single page PDF' do | ||
pdf = described_class.new | ||
pdf.text 'James' | ||
output = StringIO.new(pdf.render) | ||
hash = PDF::Reader::ObjectHash.new(output) | ||
|
||
streams = hash.values.select { |obj| obj.is_a?(PDF::Reader::Stream) } | ||
|
||
expect(streams.size).to eq(1) | ||
expect(streams.size).to eq(2) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This shouldn't change. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Done. |
||
end | ||
|
||
it 'has 1 single content stream for a single page PDF, even if go_to_page '\ | ||
it 'has 2 content streams for a single page PDF, even if go_to_page '\ | ||
'is used' do | ||
pdf = described_class.new | ||
pdf.text 'James' | ||
|
@@ -530,7 +529,7 @@ def self.format(string) | |
|
||
streams = hash.values.select { |obj| obj.is_a?(PDF::Reader::Stream) } | ||
|
||
expect(streams.size).to eq(1) | ||
expect(streams.size).to eq(2) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This shouldn't change either. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Done. |
||
end | ||
end | ||
|
||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,11 @@ | ||
require 'spec_helper' | ||
require 'prawn/vera_pdf' | ||
|
||
include Prawn::VeraPdf | ||
|
||
if vera_pdf_available? | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It's nice to let developers know whats's wrong. But please make sure CI has all tools installed to actually run the specs. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Done. |
||
require_relative 'pdfa_1b_spec_impl' | ||
else | ||
puts 'NOTICE: Specs for PDF/A-1b are not run, because veraPDF ' \ | ||
'binary was not found in path.' | ||
end |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,31 @@ | ||
require 'spec_helper' | ||
require 'prawn/vera_pdf' | ||
|
||
describe Prawn::Document do | ||
include Prawn::VeraPdf | ||
|
||
let(:pdf) { described_class.new(enable_pdfa_1b: true) } | ||
|
||
describe 'PDF/A 1b conformance' do | ||
it 'empty document' do | ||
expect(valid_pdfa_1b?(pdf.render)).to be true | ||
end | ||
|
||
it 'document with some text' do | ||
pdf.font_families.update( | ||
'DejaVuSans' => { | ||
normal: "#{Prawn::DATADIR}/fonts/DejaVuSans.ttf" | ||
} | ||
) | ||
pdf.font 'DejaVuSans' do | ||
pdf.text_box 'Some text', at: [100, 100] | ||
end | ||
expect(valid_pdfa_1b?(pdf.render)).to be true | ||
end | ||
|
||
it 'document with some image' do | ||
pdf.image "#{Prawn::DATADIR}/images/pigs.jpg" | ||
expect(valid_pdfa_1b?(pdf.render)).to be true | ||
end | ||
end | ||
end |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -95,7 +95,7 @@ | |
next unless obj =~ %r{/Type /Page$} | ||
# The page object must contain the annotation reference | ||
# to render a clickable link | ||
expect(obj).to match(%r{^/Annots \[\d \d .\]$}) | ||
expect(obj).to match(%r{^/Annots \[\d+ \d .\]$}) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Why this change is needed? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. With the additional object for the XMP metadata stream the object number for the annotation object switched from single digit to double digit (from 9 to 10). This regex only tested for single digit object number. If we make the XMP metadata stream optional, this change can be reverted. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Reverted. |
||
end | ||
end | ||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why this change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A file opened with 'wb' does not allow reading. My seek-solution reads the rendered body to avoid a second render pass. As you already pointed pointed out the flaw with seeking, I will revert this change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reverted.