Array natives #1

fbogsany · 2014-10-08T19:51:09Z

This adds Array-specific implementations of some Enumerable methods. In some cases (see notes below), two methods are synonymous, but Array only overrode one of them. I focussed on Enumerable methods that we actually use in Shopify. There are some more esoteric methods like chunk, minmax, slice_before and slice_after that I ignored.

I've implemented this against a recent MRI trunk. It should be a trivial backport to https://github.com/Shopify/ruby.

r: @camilo @csfrancis @jasonhl
cc: @grollest @sirupsen

Without change:

                       user     system      total        real
all? {}            0.660000   0.000000   0.660000 (  0.663685)
all?               0.150000   0.000000   0.150000 (  0.150084)
include?           0.330000   0.010000   0.340000 (  0.340451)
member?            0.510000   0.000000   0.510000 (  0.508890)
one? {}            0.630000   0.000000   0.630000 (  0.632244)
one?               0.150000   0.010000   0.160000 (  0.156780)
none?              0.150000   0.000000   0.150000 (  0.156365)
none? {}           0.740000   0.000000   0.740000 (  0.744607)
detect             0.560000   0.000000   0.560000 (  0.563873)
find               0.550000   0.000000   0.550000 (  0.561899)
each_with_index    1.010000   0.000000   1.010000 (  1.009375)
each_with_object   0.590000   0.010000   0.600000 (  0.588555)
partition          0.920000   0.040000   0.960000 (  0.966469)
grep               1.950000   0.000000   1.950000 (  1.943385)
grep {}            1.890000   0.000000   1.890000 (  1.898237)
flat_map           1.470000   0.070000   1.540000 (  1.539217)
collect_concat     1.530000   0.040000   1.570000 (  1.571326)
to_a               0.000000   0.000000   0.000000 (  0.000008)
entries            0.300000   0.000000   0.300000 (  0.306828)
select             0.560000   0.040000   0.600000 (  0.611437)
find_all           0.730000   0.000000   0.730000 (  0.742103)

With change:

                       user     system      total        real  improvement
all? {}            0.430000   0.000000   0.430000 (  0.434470)      1.53
all?               0.010000   0.000000   0.010000 (  0.007103)     21.13
include?           0.330000   0.010000   0.340000 (  0.339223)      1.00
member?            0.340000   0.000000   0.340000 (  0.340663)      1.49
one? {}            0.420000   0.010000   0.430000 (  0.425346)      1.49
one?               0.010000   0.000000   0.010000 (  0.010269)     15.27
none?              0.010000   0.000000   0.010000 (  0.010471)     14.93
none? {}           0.470000   0.000000   0.470000 (  0.472409)      1.58
detect             0.400000   0.010000   0.410000 (  0.413766)      1.36
find               0.440000   0.000000   0.440000 (  0.445540)      1.26
each_with_index    0.800000   0.000000   0.800000 (  0.802467)      1.26
each_with_object   0.420000   0.010000   0.430000 (  0.426938)      1.38
partition          0.590000   0.040000   0.630000 (  0.634688)      1.52
grep               1.600000   0.010000   1.610000 (  1.612733)      1.21
grep {}            1.690000   0.010000   1.700000 (  1.701325)      1.12
flat_map           1.380000   0.080000   1.460000 (  1.461663)      1.05
collect_concat     1.560000   0.060000   1.620000 (  1.611663)      0.97
to_a               0.000000   0.000000   0.000000 (  0.000010)      0.80
entries            0.000000   0.000000   0.000000 (  0.000010)  30682.80
select             0.580000   0.000000   0.580000 (  0.574380)      1.06
find_all           0.560000   0.010000   0.570000 (  0.566607)      1.31
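
The improvement column appears to be the "real" time without the change divided by the "real" time with it. A minimal sketch that recomputes a few rows under that assumption (the constant and method names here are illustrative, not part of the PR):

```ruby
# "real" (wall-clock) seconds copied from the two tables above:
# [without change, with change].
REAL_SECONDS = {
  "all? {}" => [0.663685, 0.434470],
  "all?"    => [0.150084, 0.007103],
  "member?" => [0.508890, 0.340663],
  "entries" => [0.306828, 0.000010],
}

# Speed-up factor: how many times faster the patched build ran.
def improvement(before, after)
  (before / after).round(2)
end

REAL_SECONDS.each do |name, (before, after)|
  puts format("%-10s %10.2f", name, improvement(before, after))
end
```

A value below 1.00 (as for to_a and collect_concat) means the patched run was slower in that sample, which at these magnitudes is likely measurement noise.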

Notes:

include? and member? are synonyms, but member? was calling the Enumerable version of the method. The Array version is about 50% faster.
flat_map and collect_concat are synonyms. There doesn't seem to be much benefit to a custom Array implementation.
to_a and entries are synonyms, but entries was calling the Enumerable version of the method. The Array version is far faster: about 0.00001 s in the benchmark above, against 0.31 s before.
select and find_all are synonyms, but find_all was calling the Enumerable version of the method. The Array version is about 30% faster.
My implementation of inject/reduce is buggy, so I've disabled it for now. Given the complexity of the code, I'm not convinced there's a huge win here, other than removing a memo object allocation.
Some of these methods remove a memo object allocation. I'll add a few benchmarks with empty arrays to measure the benefit of removing those allocations.
The grep benchmark is a bit shady because the block is only called for matching elements, and the contents of the array are random.
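
One way to see which of each synonym pair resolves to an Array-specific implementation on a given build is to ask Ruby where the method is defined. This is a small sketch; `owner_of` and `SYNONYM_PAIRS` are hypothetical helper names, and the output depends on the Ruby you run it against:

```ruby
# Synonym pairs discussed in the notes above.
SYNONYM_PAIRS = [
  [:include?, :member?],
  [:to_a, :entries],
  [:select, :find_all],
  [:flat_map, :collect_concat],
]

# Returns the module or class that defines the method Array dispatches to.
def owner_of(name)
  Array.instance_method(name).owner
end

SYNONYM_PAIRS.each do |first, second|
  puts format("%-10s -> %-12s %-16s -> %s",
              first, owner_of(first), second, owner_of(second))
end
```

A method reported as owned by Enumerable goes through the generic each-based path; one owned by Array uses the Array implementation.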

Benchmarks:

require 'benchmark'

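# 9,999,999 random floats in [0, 1) plus a trailing 1.0. rand never returns
# 1.0, so include?/member?/detect/find have to scan the whole array.
array = (1..9999999).map { rand }
array << 1.0

# Boolean arrays for the block-less predicates, which test truthiness.
true_array = (1..10000000).map { true }
false_array = (1..10000000).map { false }
# All false except the last element, so one? has to scan to the end.
mostly_false_array = false_array.dup
mostly_false_array[mostly_false_array.length-1] = true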

Benchmark.bmbm do |x|
  x.report("all? {}") { array.dup.all? {|e| e >= 0.0 } }
  x.report("all?") { true_array.dup.all? }
  x.report("include?") { array.dup.include?(1.0) }
  x.report("member?")  { array.dup.member?(1.0) }
  x.report("one? {}") { array.dup.one? {|e| e == 1.0 } }
  x.report("one?") { mostly_false_array.dup.one? }
  x.report("none?") { false_array.dup.none? }
  x.report("none? {}") { array.dup.none? {|e| e < 0.0 } }
  x.report("detect") { array.dup.detect {|e| e == 1.0 } }
  x.report("find") { array.dup.find {|e| e == 1.0 } }
  x.report("each_with_index") { s = 0; array.dup.each_with_index {|e, i| s += e+i }}
  x.report("each_with_object") { array.dup.each_with_object(0) {|e, o| }}
  x.report("partition") { array.dup.partition {|e| e < 0.5 }}
  x.report("grep") { array.dup.grep 0.4...0.5 }
  x.report("grep {}") { s = 0; array.dup.grep(0.4...0.5) {|e| s += e } }
  x.report("flat_map") { array.dup.flat_map {|e| [e, -e] } }
  x.report("collect_concat") { array.dup.collect_concat {|e| [e, -e] } }
  x.report("to_a") { array.dup.to_a }
  x.report("entries") { array.dup.entries }
  x.report("select") { array.dup.select {|e| e < 0.5 } }
  x.report("find_all") { array.dup.find_all {|e| e < 0.5 } }
end
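
The caveat about the grep benchmark can be checked directly: the block runs once per matching element, so the work measured depends on how many random values happen to land in the range. A small sketch with a fixed input (the variable names are illustrative only):

```ruby
values = [0.1, 0.42, 0.45, 0.7, 0.49, 0.5]
range = 0.4...0.5 # exclusive upper bound, so 0.5 does not match

block_calls = 0
sum = 0.0
values.grep(range) { |e| block_calls += 1; sum += e }

# grep selects with ===, so counting with === gives the number of matches.
matching = values.count { |e| range === e }

puts "block calls: #{block_calls}, matching elements: #{matching}"
```

Seeding the generator (for example `rng = Random.new(42)` and `rng.rand` in place of `rand`) would be one way to make the match count repeatable between runs, though the PR's benchmark does not do this.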

sirupsen · 2014-10-08T20:18:56Z

Holy shit. This is really cool. Going to see if I can find time to look at this tonight.

csfrancis · 2014-10-10T00:55:35Z

array.c

+rb_ary_each_with_index(int argc, VALUE *argv, VALUE array)
+{
+    long i;
+    volatile VALUE ary = array;


Why does this need to be volatile?

Apparently because clang: be953b4

It's an odd thing to do, and I don't know if there's any science behind it or if it was just some ✨magic✨ at that point in time.

csfrancis · 2014-10-10T01:34:37Z

Only a couple of small comments, but overall this looks really great!

jasonhl · 2014-10-10T13:43:22Z

Would it make sense to include a benchmark/test using an array with mixed types?

[1, 'abc', {"key" => "value"}, :symbol]

I have no idea if that's going to matter -- just wonder if it'll root out some weird corner cases.

fbogsany · 2014-10-10T13:53:56Z

> Would it make sense to include a benchmark/test using an array with mixed types?

It won't matter. Most of the array methods don't care about the content of the array, except for some that check for truthiness. This PR passes the MRI test suite as well as trunk does - i.e. with the same random GC test failures.

That reminds me that I should run the included benchmarks, though, in addition to my micro-benchmarks.
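
For anyone who wants to check the mixed-type case anyway, a rough sketch is to compare what Array dispatches to against the generic Enumerable implementation bound to the same array. The `generic` helper and the `CASES` table are assumptions made for illustration, not part of this PR:

```ruby
mixed = [1, 'abc', { "key" => "value" }, :symbol, nil, false]

# Calls Enumerable's own implementation on ary, bypassing any Array override.
def generic(ary, name, *args, &blk)
  Enumerable.instance_method(name).bind(ary).call(*args, &blk)
end

# [method name, positional arguments, optional block]
CASES = [
  [:all?,      [],        nil],
  [:none?,     [],        nil],
  [:one?,      [],        nil],
  [:include?,  [:symbol], nil],
  [:member?,   [:symbol], nil],
  [:find,      [],        proc { |e| e.is_a?(String) }],
  [:select,    [],        proc { |e| e }],
  [:find_all,  [],        proc { |e| e }],
  [:partition, [],        proc { |e| e }],
  [:grep,      [Symbol],  nil],
  [:to_a,      [],        nil],
  [:entries,   [],        nil],
]

results = CASES.map do |name, args, blk|
  same = mixed.public_send(name, *args, &blk) == generic(mixed, name, *args, &blk)
  puts format("%-10s %s", name, same ? "ok" : "MISMATCH")
  [name, same]
end
```

The nil and false entries are there because the block-less predicates only look at truthiness, which is the one place element content matters.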

jasonhl · 2014-10-10T16:34:30Z

> That reminds me that I should run the included benchmarks, though, in addition to my micro-benchmarks.

Can you post the results when you do run those?

jasonhl · 2014-10-10T19:44:12Z

It's been so long since I had a look at any serious C that I don't think my review counts for much, but I didn't see anything objectionable.

Awesome work. Hope MRI takes it.

👍

fbogsany · 2014-10-23T21:25:45Z

array.c

+{
+    ID id;
+    VALUE op, init = Qnil;
+    volatile VALUE ary = array;


So this local copy of the receiver turned out to be important. Thanks to @grollest for suggesting this fix. There's still a bug lurking in here, but it shows up in the tests rather than some weird failure to build an extension:

#191 test_flow.rb:19:in `<top (required)>':
   ["a"].inject("ng"){|x,y| break :ok }
  #=> "ng" (expected "ok")
#193 test_flow.rb:37:in `<top (required)>':
   ["a"].inject("ng"){|x,y|; $a << 2
     break :ok; $a << 3
   }; $a << 4
   ; $a << 5
   ; rescue Exception; $a << 99; end; $a
  #=> "[1, 4, 5]" (expected "[1, 2, 4, 5]")
test_flow.rb FAIL 2/61

This is fixed now.
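
For reference, the behaviour those two tests expect can be reproduced in plain Ruby. This is a sketch rather than the actual bootstrap test: `trace` stands in for `$a`, and the leading `trace << 1` is inferred from the expected output, since the start of the second test is not shown above.

```ruby
# break inside the block makes inject itself return the break value.
result = ["a"].inject("ng") { |x, y| break :ok }

trace = []
begin
  trace << 1
  ["a"].inject("ng") do |x, y|
    trace << 2
    break :ok
    trace << 3 # never reached: break leaves the block and inject
  end
  trace << 4
  trace << 5
rescue Exception
  trace << 99 # would only appear if break were mishandled as an exception
end

puts result.inspect
puts trace.inspect
```

A native inject has to let break unwind through the C frame the same way the Enumerable version does, which is what the failing output above was getting wrong.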

fbogsany · 2014-10-23T23:52:33Z

I've addressed review comments & fixed the bugs in inject. I'll update with benchmark results. If anyone cares to cast 👀 on this, that'd be 🆒 .

camilo · 2014-10-23T23:54:48Z

@fbogsany iamma take a look but we should talk to @csfrancis so we can build it and try IRL

fbogsany · 2014-10-23T23:58:57Z

👍 on trying IRL.

Array natives

63f4b6a

csfrancis reviewed Oct 10, 2014

Volatile local copy of array

f2a2f9e

fbogsany reviewed Oct 23, 2014

fbogsany added 2 commits October 23, 2014 22:35

Fix off-by-one error

52af310

Address review comments

6579a7b
