
irmin-pack: replace inode's Map by a list of pairs #2042

Open · wants to merge 1 commit into main
Conversation

samoht (Member) commented Aug 9, 2022

New attempt at #1989.
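For context, here is a minimal sketch of the general technique the title describes (a map stored as a plain association list). The module name and operations below are illustrative only and are not taken from this PR's actual code:

```ocaml
(* Sketch only: a "map" kept as a list of (key, value) pairs.  For the
   small number of bindings held by a single inode, a linear scan over a
   short list can beat a balanced tree, since it avoids per-node overhead
   and rebalancing.  Keys are compared with structural equality, as in
   List.assoc. *)
module Assoc_map = struct
  type ('k, 'v) t = ('k * 'v) list

  let empty : ('k, 'v) t = []
  let find_opt k (t : ('k, 'v) t) = List.assoc_opt k t
  let mem k (t : ('k, 'v) t) = List.mem_assoc k t

  (* Replace an existing binding or add a new one; O(n), which is
     acceptable only when n is known to stay small. *)
  let add k v (t : ('k, 'v) t) = (k, v) :: List.remove_assoc k t
  let remove k (t : ('k, 'v) t) = List.remove_assoc k t
  let bindings (t : ('k, 'v) t) = t
end
```

The trade-off is that every operation is O(n), so the representation only pays off when the number of bindings is bounded and small.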

samoht added the type/performance (Related to performance) and no-changelog-needed (No changelog is needed here) labels on Aug 9, 2022
codecov-commenter commented

Codecov Report

Merging #2042 (ad273c5) into main (b5d5e3f) will decrease coverage by 0.01%.
The diff coverage is 63.82%.

@@            Coverage Diff             @@
##             main    #2042      +/-   ##
==========================================
- Coverage   64.48%   64.47%   -0.02%     
==========================================
  Files         130      130              
  Lines       15604    15650      +46     
==========================================
+ Hits        10063    10091      +28     
- Misses       5541     5559      +18     
Impacted Files Coverage Δ
src/irmin-pack/inode.ml 78.48% <0.00%> (-0.11%) ⬇️
src/irmin/import.ml 69.89% <65.21%> (-4.58%) ⬇️
src/irmin-fs/unix/irmin_fs_unix.ml 67.09% <0.00%> (-0.65%) ⬇️


zshipko (Contributor) commented Aug 9, 2022

This seems to improve performance pretty significantly compared to main:

 |                          |   main    |   this PR
 | --                       | --        | --
 | -- main metrics --       |           | 
 | CPU time elapsed         |     0m47s |     0m39s  83%
 | Wall time elapsed        |     0m47s |     0m39s  83%
 | TZ-transactions per sec  |   253.723 |   304.650 120%
 | TZ-operations per sec    |  2632.493 |  3160.876 120%
 | Context.set per sec      | 19018.674 | 22836.024 120%
 | tail latency (1)         |   0.954 s |   0.723 s  76%
 |                          |           | 
 | -- resource usage --     |           | 
 | disk IO (total)          |           | 
 |   IOPS (op/sec)          |    43_802 |    52_314 119%
 |   throughput (bytes/sec) |   4.954 M |   5.937 M 120%
 |   total (bytes)          |   0.233 G |   0.233 G 100%
 | disk IO (read)           |           | 
 |   IOPS (op/sec)          |    42_364 |    50_587 119%
 |   throughput (bytes/sec) |   2.097 M |   2.505 M 119%
 |   total (bytes)          |   0.099 G |   0.098 G  99%
 | disk IO (write)          |           | 
 |   IOPS (op/sec)          |     1_438 |     1_727 120%
 |   throughput (bytes/sec) |   2.857 M |   3.432 M 120%
 |   total (bytes)          |   0.135 G |   0.135 G 100%
 |                          |           | 
 | max memory usage         |   0.561 G |   0.561 G 100%
 | mean CPU usage           |      100% |      100% 

Full comparison: stepmap.log

metanivek (Member) commented
Nice! That looks promising! I'm running a lib_context benchmark for this PR to see how it performs in that context.

metanivek (Member) left a review comment
Left a small comment, but LGTM. Let's see how a bigger benchmark performs, but the preliminary results are definitely promising.

@@ -146,3 +146,106 @@ let shuffle state arr =
   let len = Array.length arr in
   aux (len - 1);
   ()
+
+module Small_map = struct
Might be worth adding a small comment to document the intended usage of this module (eg only intended for "small" key sets and give a sense of what "small" is).
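As one possible shape for that note, a doc comment along the following lines could work; the wording, the "few dozen" threshold, and the signature are placeholders rather than the PR's actual interface:

```ocaml
(** Maps backed by a plain list of pairs.

    Only intended for small key sets (e.g. the bindings of a single
    inode, a few dozen entries at most): every operation is O(n), so
    prefer [Stdlib.Map] whenever the number of bindings can grow large. *)
module type SMALL_MAP = sig
  type ('k, 'v) t

  val empty : ('k, 'v) t
  val add : 'k -> 'v -> ('k, 'v) t -> ('k, 'v) t
  val find_opt : 'k -> ('k, 'v) t -> 'v option
  val bindings : ('k, 'v) t -> ('k * 'v) list
end
```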

metanivek (Member) commented
The results of running the lib_context benchmark are less conclusive; in particular, I'm looking at the increased tail latency. I'll run another one to see if it is a consistent result.

 |                          |   base    |    stepmap
 | --                       | --        | --
 | -- main metrics --       |           |
 | CPU time elapsed         |   192m21s |   189m07s  98%
 | Wall time elapsed        |   231m55s |   228m33s  99%
 | TZ-transactions per sec  |  1370.961 |  1394.399 102%
 | TZ-operations per sec    |  2241.601 |  2279.923 102%
 | Context.add per sec      | 18272.123 | 18584.501 102%
 | tail latency (1)         |   0.640 s |   0.946 s 148%
 |                          |           |
 | -- resource usage --     |           |
 | disk IO (total)          |           |
 |   IOPS (op/sec)          |   246_585 |   250_802 102%
 |   throughput (bytes/sec) |  18.820 M |  19.142 M 102%
 |   total (bytes)          | 217.206 G | 217.207 G 100%
 | disk IO (read)           |           |
 |   IOPS (op/sec)          |   246_510 |   250_726 102%
 |   throughput (bytes/sec) |  13.870 M |  14.107 M 102%
 |   total (bytes)          | 160.071 G | 160.073 G 100%
 | disk IO (write)          |           |
 |   IOPS (op/sec)          |        76 |        77 102%
 |   throughput (bytes/sec) |   4.950 M |   5.035 M 102%
 |   total (bytes)          |  57.134 G |  57.135 G 100%
 |                          |           |
 | max memory usage (bytes) |   0.577 G |   0.565 G  98%
 | mean CPU usage           |       83% |       83%

samoht (Member, Author) commented Aug 11, 2022

The lib_context code might need an update similar to the one I've done for the schema changes (to stop sorting the list of children with a weird and useless ordering).

metanivek (Member) commented
A second run (b) brought this branch more in line with main for tail latency (which is good to see!). I can also run another one with an updated schema to see what effect that might have.

 |                          |   base    |       a        |       b
 | --                       | --        | --             | --
 | -- main metrics --       |           |                |
 | CPU time elapsed         |   192m46s |   189m07s  98% |   190m26s  99%
 | Wall time elapsed        |   232m39s |   228m33s  98% |   230m19s  99%
 | TZ-transactions per sec  |  1368.073 |  1394.399 102% |  1384.737 101%
 | TZ-operations per sec    |  2236.879 |  2279.923 102% |  2264.125 101%
 | Context.add per sec      | 18233.635 | 18584.501 102% | 18455.726 101%
 | tail latency (1)         |   0.695 s |   0.946 s 136% |   0.673 s  97%
 |                          |           |                |
 | -- resource usage --     |           |                |
 | disk IO (total)          |           |                |
 |   IOPS (op/sec)          |   246_068 |   250_802 102% |   249_068 101%
 |   throughput (bytes/sec) |  18.781 M |  19.142 M 102% |  19.010 M 101%
 |   total (bytes)          | 217.208 G | 217.207 G 100% | 217.210 G 100%
 | disk IO (read)           |           |                |
 |   IOPS (op/sec)          |   245_993 |   250_726 102% |   248_992 101%
 |   throughput (bytes/sec) |  13.840 M |  14.107 M 102% |  14.009 M 101%
 |   total (bytes)          | 160.073 G | 160.073 G 100% | 160.075 G 100%
 | disk IO (write)          |           |                |
 |   IOPS (op/sec)          |        75 |        77 102% |        76 101%
 |   throughput (bytes/sec) |   4.940 M |   5.035 M 102% |   5.000 M 101%
 |   total (bytes)          |  57.135 G |  57.135 G 100% |  57.135 G 100%
 |                          |           |                |
 | max memory usage (bytes) |   0.576 G |   0.565 G  98% |   0.584 G 101%
 | mean CPU usage           |       83% |       83%      |       83%

metanivek (Member) commented Aug 16, 2022

Here is one run of Tezos patched with #2014 and this PR. Higher tail latency again, but the results mostly seem inconclusive to me.
Full Summary

 |                          | main_220810 |    stepmap     | stepmap_tezos_order
 | --                       | --          | --             | --
 | -- main metrics --       |             |                |
 | CPU time elapsed         |     192m46s |   190m26s  99% |        191m09s  99%
 | Wall time elapsed        |     232m39s |   230m19s  99% |        231m01s  99%
 | TZ-transactions per sec  |    1368.073 |  1384.737 101% |       1379.620 101%
 | TZ-operations per sec    |    2236.879 |  2264.125 101% |       2255.758 101%
 | Context.add per sec      |   18233.635 | 18455.726 101% |      18387.522 101%
 | tail latency (1)         |     0.695 s |   0.673 s  97% |        0.853 s 123%
 |                          |             |                |
 | -- resource usage --     |             |                |
 | disk IO (total)          |             |                |
 |   IOPS (op/sec)          |     246_068 |   249_068 101% |        248_146 101%
 |   throughput (bytes/sec) |    18.781 M |  19.010 M 101% |       18.939 M 101%
 |   total (bytes)          |   217.208 G | 217.210 G 100% |      217.208 G 100%
 | disk IO (read)           |             |                |
 |   IOPS (op/sec)          |     245_993 |   248_992 101% |        248_070 101%
 |   throughput (bytes/sec) |    13.840 M |  14.009 M 101% |       13.957 M 101%
 |   total (bytes)          |   160.073 G | 160.075 G 100% |      160.074 G 100%
 | disk IO (write)          |             |                |
 |   IOPS (op/sec)          |          75 |        76 101% |             76 101%
 |   throughput (bytes/sec) |     4.940 M |   5.000 M 101% |        4.982 M 101%
 |   total (bytes)          |    57.135 G |  57.135 G 100% |       57.134 G 100%
 |                          |             |                |
 | max memory usage (bytes) |     0.576 G |   0.584 G 101% |        0.561 G  97%
 | mean CPU usage           |         83% |       83%      |            83%

Ngoguey42 (Contributor) commented
I'm surprised that this PR doesn't have a bigger effect on allocated words:

 | metric                                     | run                 |      total       | avg per block 
 | gc.minor_words allocated                   | main_220810         |  2356.498 G      |  16.252 M     
 |                                            | stepmap             |  2334.767 G  99% |  16.102 M  99%
 |                                            | stepmap_tezos_order |  2328.504 G  99% |  16.059 M  99%
 | gc.major_words allocated                   | main_220810         |   205.634 G      |   1.418 M     
 |                                            | stepmap             |   201.136 G  98% |   1.387 M  98%
 |                                            | stepmap_tezos_order |   201.135 G  98% |   1.387 M  98%

Maybe Craig's Small_list on https://gitlab.com/tezos/tezos/-/merge_requests/3526 could help, but I'm not even sure.
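One possible back-of-envelope explanation, assuming the old representation is a Stdlib.Map and the new one a list of (key, value) tuples: on a 64-bit runtime a Map binding is one five-field node (6 words including its header), while a list binding is a cons cell plus a pair (3 + 3 words), so the heap cost per binding is roughly unchanged; the switch mainly saves traversal and rebalancing work rather than allocated words. A quick way to check that intuition on toy data (hypothetical snippet, not part of the PR):

```ocaml
(* Compare the reachable heap words of a Stdlib.Map and of an association
   list holding the same 32 bindings.  Both counts include the shared
   string payloads, but the structural overhead should come out at
   roughly six words per binding in each case, which would be consistent
   with the ~1-2% change in gc.*_words reported above. *)
module M = Map.Make (Int)

let bindings = List.init 32 (fun i -> (i, string_of_int i))
let as_map = List.fold_left (fun m (k, v) -> M.add k v m) M.empty bindings

let () =
  Printf.printf "map:  %d words\nlist: %d words\n"
    (Obj.reachable_words (Obj.repr as_map))
    (Obj.reachable_words (Obj.repr bindings))
```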

Labels: no-changelog-needed (No changelog is needed here) · type/performance (Related to performance)

5 participants