-
Notifications
You must be signed in to change notification settings - Fork 38
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for PDF/A-1b #34
base: master
Are you sure you want to change the base?
Conversation
.gitignore
Outdated
@@ -1,2 +1,3 @@ | |||
Gemfile.lock | |||
coverage/* | |||
/.idea |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should go into your global .gitignore
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
lib/pdf/core/document_state.rb
Outdated
info: options[:info] | ||
} | ||
store_params[:print_scaling] = options[:print_scaling] if options[:print_scaling] | ||
store_params[:enable_pdfa_1b] = options[:enable_pdfa_1b] if options[:enable_pdfa_1b] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This feels a bit repetitive. Maybe we can use options.select
here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
lib/pdf/core/xmp_metadata.rb
Outdated
render_xmp if @xmp_creator_tool || @xmp_create_date || @xmp_modify_date | ||
render_pdf if @pdf_keywords || @pdf_producer | ||
render_dc if @dc_title || @dc_creator || @dc_description | ||
@xml_doc.root.to_xml |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I admire your perseverance but it feels like this whole class could be replaced by a string interpolation. It would be simpler, faster, and would spare us a dependency.
Speaking of which… Prawn is pure Ruby. All its deps are pure Ruby. It's one of our selling points. means that Nokogiri is not going to get added to Prawn deps. If you're convinced you must use it for this feature you should release it in its own gem instead. We'll be happy to to help you with integration.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't have to use nokogiri. It was my first shot at the problem because I use it elsewhere. I accept that no external dependencies should be added and will look into changing it to string interpolations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
lib/pdf/core/object_store.rb
Outdated
@@ -19,6 +19,18 @@ def initialize(opts = {}) | |||
|
|||
@info ||= ref(opts[:info] || {}).identifier | |||
@root ||= ref(Type: :Catalog).identifier | |||
|
|||
# PDF/A-1b requirement: XMP metadata | |||
@xmp_metadata ||= ref(Type: :Metadata, Subtype: :XML).identifier |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would like this to be optional. The documents we generate now should not get bigger for no reason with new version of Prawn. Let's include this only in documents that actually have some metadata.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. The XMP metadata is now only added to PDF/A-1b documents.
lib/pdf/core/sRGB2014.icc.license
Outdated
@@ -0,0 +1,10 @@ | |||
ICC sRGB v2 Profile |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The profile is not code. Please put it in the data
directory and please don't forget to add it to the packaged gem as well.
Also as far as I can tell copyright information and profile name is included in the profile itself. The license doesn't seem to require this file distribution along with the profile so maybe drop it. Include the links in commit message for history. That should be enough.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
lib/pdf/core/renderer.rb
Outdated
@@ -216,10 +217,12 @@ def render_xref(output) | |||
# Write out the PDF Trailer, as per spec 3.4.4 | |||
# | |||
def render_trailer(output) | |||
trailer_id = PDF::Core::ByteString.new(SecureRandom.random_bytes(16)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's make is deterministic. We're trying to reduce random changes to the generated documents.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With 'deterministic' are you thinking of hashing some document content? Because the PDF spec (10.3 File Identifiers) suggest to use the current time beside some other data. But that would not solve the problem with the idempotent test you mentioned somewhere else.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed, the time is mentioned. But it doesn't seem like a good indicator of content stability. After all, are two documents different if their content is identical? Also a bit earlier in that chapter one can find this:
The first byte string is a permanent identifier based on the contents of the file at the time it was originally created and does not change when the file is incrementally updated. The second byte string is a changing identifier based on the file’s contents at the time it was last updated. When a file is first written, both identifiers are set to the same value.
I would just hash the body. It would work great for our purpose. Also since Prawn doesn't really support incremental updates we can reuse the value for the second part of ID.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm looking at getting both PDF/A-1b support in, and updating doc encryption to modern levels. In the process, I found that there's a potential issue here that will likely force a design decision for Prawn.
Specifically, the first trailer[:ID] value is used in calculating the encryption key for the PDF document. This is missing from the current Prawn implementation (which always assumes an empty :iD value), but is clearly laid out in the spec. So this ID value needs to be present when the encryption dictionary is built (currently when encrypt_document
is called) and invariant thereafter.
My current favored approach is:
- Calling
encrypt_document
should initialize the trailer[:iD] value in the exact same way as the PDF/A-1b code does, with the caveat that they may not wind up with the same value if the state of the document is altered between the two calls. Also, need to confirm that the current method for PDF/A-1b excludes an encryption dictionary from the ID calculation. - The PDF/A-1b code only sets trailer[:ID] if it isn't already set. Otherwise it uses the already present value
I think this is simple, involves the minimum change for Prawn and PDF-Core, and is consistent with the spec intentions.
Other options might include moving all encryption_dictionary calculation to the render phase or requiring that encrypt_document
be the last call to a document before rendering. On the face of it both of those seem less desirable so I haven't yet thought through the implications.
@pointlessone Thoughts?
ICC profile was downloaded from http://www.color.org/srgbprofiles.xalter#v2 ICC profile license at http://www.color.org/profiles2.xalter#license
@pointlessone what's your current thinking on this PR? Your concerns appear to have been addressed and it (combined with the matching @Backbone81 Not sure if you're engaged on this anymore (it's been a while) but if you have any thoughts they'd be appreciated. |
@petergoldstein It has been quite some time. I did need that feature for a project I was working on five years ago and finished my project with a prawnpdf fork containing the current state from those two merge requests. I have since then moved on to other projects and do not have access to the ISO specifications for PDF and PDF/A-1b any more. I would be fine with somebody else taking my changes and integrating them into the current state of prawnpdf. I think it would be a valuable feature for applications in the financial and insurance industry which do have a regulatory need for standards like PDF/A-1b. |
@Backbone81 Let me see if I can rebase this and then we can get @pointlessone 's feedback. Thanks for the initial contribution. |
We have been using this patch for a while now, with a few minor tweaks: EDIT: We used veraPDF for PDF/A-1b validation. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will have to check if I have a copy of PDF/A standard and if these changes represent the standard sufficiently.
xmp_metadata.stream = Stream.new | ||
xmp_metadata.stream << xmp_metadata_content.render |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
xmp_metadata
is a reference. It's smart enough to create a stream when needed so this can be a bit simpler:
xmp_metadata.stream = Stream.new | |
xmp_metadata.stream << xmp_metadata_content.render | |
xmp_metadata << xmp_metadata_content.render |
icc_profile_name = 'sRGB2014.icc'.freeze | ||
|
||
icc_profile_stream = ref(N: 3) | ||
icc_profile_stream.stream = Stream.new |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not needed.
This pull request complements prawnpdf/prawn#1029