You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
From the flamegraph it seems most of time (31.2% using Search) is spent in Prawn::Images::PNG#split_image_data and specifically in StringIO#read and String#byteslice called from there. The main cost is probably scanning the substring to compute the correct coderange (7-bit or valid for BINARY strings).
This should improve when TruffleRuby has adopted TruffleString (WIP).
It also indicates that in this case it might be better to compute the coderange lazily, since it's one of the rare cases where the coderange doesn't seem needed (for color.write data.read(color_bytes)), cc @djoooooe.
From a wider perspective it's an unfortunate case of Ruby not having a real "byte array" structure, and String being a relatively poor fit for it.
The text was updated successfully, but these errors were encountered:
I have taken another look at this.
To repro, clone prawn, bundle and then bundle exec rspec spec/prawn/images/png_spec.rb.
To profile: TRUFFLERUBYOPT=--cpusampler=flamegraph bundle exec rspec spec/prawn/images/png_spec.rb
The whole test suite takes 80.61s locally which seems reasonable with truffleruby-dev.
That specific spec takes 37.44 seconds, so it's not solved yet.
Taking a look at the profile, byteslice is no longer part of it but now there is these TruffleRuby.synchronized calls which were added to make StringIO thread-safe and might be some overhead.
I also tried on TruffleRuby native 22.3 which doesn't have the TruffleRuby.synchronized but has TruffleString and that runs the spec in 33.62 seconds, so the synchronization doesn't seem the main overhead here.
CRuby 3.1 takes <1 second for that spec.
I think the perf problem is due to that code usingStringIO#putc and that in TruffleRuby (for master at least) just delegates to write which then goes into the pos < str.bytesize case and uses Primitive.string_splice and that creates two lazy Concat strings which is very expensive due to lots of allocations for single bytes like that.
I think what we want is to use setbyte or equivalent logic for such a case, so go to the MutableTruffleString and then can be efficient and avoid allocations.
There are also StringIO#write calls though in that method so we also need to handle that better, i.e., Primitive.string_splice needs to avoid doing lazy Concat strings when it gets a MutableTruffleString and writes a string of N bytes with byte_count_to_replace=N.
It seems best to fix Primitive.string_splice first as that should also cover putc without needing special logic in putc.
From prawnpdf/prawn#1246 (comment):
From the flamegraph it seems most of time (31.2% using
Search
) is spent inPrawn::Images::PNG#split_image_data
and specifically inStringIO#read
andString#byteslice
called from there. The main cost is probably scanning the substring to compute the correct coderange (7-bit or valid for BINARY strings).This should improve when TruffleRuby has adopted TruffleString (WIP).
It also indicates that in this case it might be better to compute the coderange lazily, since it's one of the rare cases where the coderange doesn't seem needed (for
color.write data.read(color_bytes)
), cc @djoooooe.From a wider perspective it's an unfortunate case of Ruby not having a real "byte array" structure, and String being a relatively poor fit for it.
The text was updated successfully, but these errors were encountered: