experiment: only sample compact words and use better 64-bit hash in zhenya test #108

Open
wants to merge 19 commits into
base: main


@crusso crusso commented Feb 16, 2024

No description provided.

github-actions bot commented Feb 16, 2024

Note
Diffing the performance result against the published result from main branch.
Unchanged benchmarks are omitted.

Map

| | binary_size | generate 1m | max mem | batch_get 50 | batch_put 50 | batch_remove 50 | upgrade |
|--:|--:|--:|--:|--:|--:|--:|--:|
| hashmap | 190_578 ($\textcolor{red}{0.51\%}$) | 8_677_435_631 ($\textcolor{red}{3.66\%}$) | 59_550_664 ($\textcolor{green}{-3.93\%}$) | 334_166 ($\textcolor{green}{-3.12\%}$) | 372_166 ($\textcolor{green}{-99.99\%}$) | 361_464 ($\textcolor{green}{-2.63\%}$) | 11_094_880_659 ($\textcolor{red}{0.62\%}$) |
| triemap | 196_095 ($\textcolor{red}{0.30\%}$) | 13_905_101_144 ($\textcolor{red}{0.36\%}$) | 71_717_452 ($\textcolor{green}{-3.37\%}$) | 249_369 ($\textcolor{green}{-2.05\%}$) | 642_587 ($\textcolor{green}{-2.85\%}$) | 645_747 ($\textcolor{green}{-0.78\%}$) | 15_695_504_052 ($\textcolor{green}{-0.77\%}$) |
| rbtree | 186_336 ($\textcolor{green}{-0.12\%}$) | 7_037_815_635 ($\textcolor{green}{-1.26\%}$) | 51_814_928 ($\textcolor{green}{-10.66\%}$) | 118_641 ($\textcolor{red}{3.80\%}$) | 318_954 ($\textcolor{red}{0.19\%}$) | 332_694 ($\textcolor{red}{1.35\%}$) | 6_872_277_147 ($\textcolor{green}{-4.14\%}$) |
| splay | 191_049 ($\textcolor{red}{0.28\%}$) | 13_185_262_200 ($\textcolor{green}{-0.47\%}$) | 47_829_136 ($\textcolor{green}{-11.42\%}$) | 621_734 ($\textcolor{green}{-1.10\%}$) | 677_921 ($\textcolor{red}{2.47\%}$) | 937_345 ($\textcolor{red}{1.67\%}$) | 4_333_406_943 ($\textcolor{green}{-5.13\%}$) |
| btree | 230_967 ($\textcolor{red}{0.48\%}$) | 10_843_807_996 ($\textcolor{red}{5.63\%}$) | 25_021_016 ($\textcolor{green}{-19.56\%}$) | 379_413 ($\textcolor{red}{7.29\%}$) | 492_873 ($\textcolor{red}{2.23\%}$) | 572_201 ($\textcolor{red}{7.17\%}$) | 2_856_105_607 ($\textcolor{green}{-8.87\%}$) |
| zhenya_hashmap | 188_742 ($\textcolor{green}{-0.29\%}$) | 1_710_509_220 ($\textcolor{green}{-33.46\%}$) | 16_777_504 ($\textcolor{green}{-26.33\%}$) | 50_761 ($\textcolor{green}{-15.67\%}$) | 73_525 ($\textcolor{red}{4.83\%}$) | 112_957 ($\textcolor{red}{37.00\%}$) | 5_610_596_903 ($\textcolor{red}{69.73\%}$) |
| btreemap_rs | 537_393 | 1_793_333_047 | 27_590_656 | 75_328 | 125_166 | 86_260 | 2_937_041_107 |
| imrc_hashmap_rs | 542_882 | 2_584_501_850 | 244_973_568 | 37_762 | 178_926 | 115_385 | 5_796_587_958 |
| hashmap_rs | 529_458 | 439_248_112 | 73_138_176 | 21_501 | 26_711 | 25_024 | 1_298_646_667 |

Priority queue

| | binary_size | heapify 1m | max mem | pop_min 50 | put 50 | pop_min 50 | upgrade |
|--:|--:|--:|--:|--:|--:|--:|--:|
| heap | 167_286 ($\textcolor{green}{-0.13\%}$) | 5_702_353_152 ($\textcolor{red}{0.08\%}$) | 24_000_360 ($\textcolor{green}{-19.99\%}$) | 635_926 ($\textcolor{red}{2.35\%}$) | 234_265 ($\textcolor{red}{2.44\%}$) | 606_194 ($\textcolor{red}{2.36\%}$) | 3_193_825_390 ($\textcolor{green}{-3.50\%}$) |
| heap_rs | 525_853 | 139_669_830 | 18_284_544 | 57_419 | 23_051 | 57_545 | 510_960_192 |

Growable array

| | binary_size | generate 5k | max mem | batch_get 500 | batch_put 500 | batch_remove 500 | upgrade |
|--:|--:|--:|--:|--:|--:|--:|--:|
| buffer | 174_288 ($\textcolor{red}{0.22\%}$) | 2_799_267 ($\textcolor{red}{8.88\%}$) | 65_644 | 100_512 ($\textcolor{red}{5.26\%}$) | 866_454 ($\textcolor{red}{8.25\%}$) | 187_512 ($\textcolor{red}{9.98\%}$) | 3_141_338 ($\textcolor{red}{2.61\%}$) |
| vector | 172_193 ($\textcolor{green}{-0.03\%}$) | 2_048_943 ($\textcolor{red}{6.66\%}$) | 24_580 | 133_138 ($\textcolor{red}{5.57\%}$) | 195_575 ($\textcolor{red}{6.65\%}$) | 184_331 ($\textcolor{red}{4.74\%}$) | 4_685_196 ($\textcolor{green}{-0.22\%}$) |
| vec_rs | 520_881 | 289_040 | 1_376_256 | 17_251 | 30_571 | 23_331 | 3_161_017 |

Stable structures

Note
Same as main branch, skipping.

Statistics

  • binary_size: 0.14% [-0.04%, 0.31%]
  • max_mem: -13.61% [-19.97%, -7.25%]
  • cycles: 0.62% [-4.50%, 5.74%]

SHA-2

| | binary_size | SHA-256 | SHA-512 | account_id | neuron_id |
|--:|--:|--:|--:|--:|--:|
| Motoko | 195_513 ($\textcolor{green}{-0.50\%}$) | 281_134_819 ($\textcolor{red}{2.97\%}$) | 261_270_435 ($\textcolor{red}{0.55\%}$) | 35_174 ($\textcolor{red}{2.33\%}$) | 25_432 ($\textcolor{red}{2.15\%}$) |
| Rust | 537_397 | 82_787_911 | 56_792_991 | 47_914 | 50_388 |

Certified map

| | binary_size | generate 10k | max mem | inc | witness | upgrade |
|--:|--:|--:|--:|--:|--:|--:|
| Motoko | 245_076 ($\textcolor{green}{-0.04\%}$) | 4_883_900_392 ($\textcolor{red}{2.53\%}$) | 3_430_044 | 579_659 ($\textcolor{red}{2.57\%}$) | 432_230 ($\textcolor{red}{7.50\%}$) | 274_734_742 ($\textcolor{red}{0.17\%}$) |
| Rust | 565_792 | 6_409_147_805 | 2_228_224 | 1_019_959 | 303_897 | 6_019_483_730 |

Statistics

  • binary_size: -0.27% [-1.71%, 1.16%]
  • max_mem: no change
  • cycles: 2.60% [1.11%, 4.08%]

Basic DAO

| | binary_size | init | transfer_token | submit_proposal | vote_proposal | upgrade |
|--:|--:|--:|--:|--:|--:|--:|
| Motoko | 274_439 ($\textcolor{green}{-0.37\%}$) | 510_973 ($\textcolor{red}{0.06\%}$) | 22_488 ($\textcolor{red}{1.02\%}$) | 18_681 ($\textcolor{red}{0.37\%}$) | 19_761 ($\textcolor{red}{0.64\%}$) | 157_965 ($\textcolor{red}{0.23\%}$) |
| Rust | 849_921 | 599_916 ($\textcolor{green}{-0.01\%}$) | 99_156 | 123_702 | 136_655 | 1_799_828 ($\textcolor{green}{-0.00\%}$) |

DIP721 NFT

| | binary_size | init | mint_token | transfer_token | upgrade |
|--:|--:|--:|--:|--:|--:|
| Motoko | 220_659 ($\textcolor{green}{-0.88\%}$) | 481_160 ($\textcolor{red}{0.00\%}$) | 30_218 ($\textcolor{red}{1.37\%}$) | 8_776 | 89_843 ($\textcolor{red}{0.43\%}$) |
| Rust | 869_104 | 236_542 | 368_044 | 91_941 | 1_999_207 ($\textcolor{green}{-0.00\%}$) |

Statistics

  • binary_size: -0.62% [-2.22%, 0.97%]
  • max_mem: no change
  • cycles: 0.37% [0.12%, 0.63%]

Heartbeat

| | binary_size | heartbeat |
|--:|--:|--:|
| Motoko | 137_356 ($\textcolor{green}{-0.39\%}$) | 19_515 ($\textcolor{red}{0.02\%}$) |
| Rust | 23_637 | 480 ($\textcolor{green}{-56.83\%}$) |

Timer

| | binary_size | setTimer | cancelTimer |
|--:|--:|--:|--:|
| Motoko | 145_843 ($\textcolor{green}{-0.31\%}$) | 53_003 ($\textcolor{red}{2.61\%}$) | 4_628 ($\textcolor{red}{0.39\%}$) |
| Rust | 487_585 | 68_173 | 11_184 |

Statistics

  • binary_size: -0.31%
  • max_mem: no change
  • cycles: 1.50% [-5.51%, 8.51%]

Garbage Collection

| | generate 700k | max mem | batch_get 50 | batch_put 50 | batch_remove 50 |
|--:|--:|--:|--:|--:|--:|
| default | 1_146_951_684 ($\textcolor{green}{-2.09\%}$) | 50_288_680 ($\textcolor{green}{-3.27\%}$) | 119 | 119 | 119 |
| copying | 1_146_951_566 ($\textcolor{green}{-2.09\%}$) | 50_288_680 ($\textcolor{green}{-3.27\%}$) | 1_146_669_222 ($\textcolor{green}{-2.09\%}$) | 1_146_757_219 ($\textcolor{green}{-2.09\%}$) | 1_146_671_127 ($\textcolor{green}{-2.09\%}$) |
| compacting | 1_643_059_256 ($\textcolor{green}{-1.74\%}$) | 50_288_680 ($\textcolor{green}{-3.27\%}$) | 1_268_924_251 ($\textcolor{green}{-1.64\%}$) | 1_509_402_797 ($\textcolor{green}{-1.57\%}$) | 1_537_068_318 ($\textcolor{green}{-1.75\%}$) |
| generational | 2_460_323_534 ($\textcolor{green}{-2.72\%}$) | 50_297_144 ($\textcolor{green}{-3.27\%}$) | 952_674_141 ($\textcolor{green}{-4.69\%}$) | 1_247_801 ($\textcolor{red}{1.22\%}$) | 1_112_579 ($\textcolor{red}{0.79\%}$) |
| incremental | 29_503_170 | 979_645_844 ($\textcolor{green}{-0.63\%}$) | 478_912_995 ($\textcolor{red}{0.20\%}$) | 494_653_376 ($\textcolor{red}{0.21\%}$) | 1_149_877_816 ($\textcolor{red}{5.27\%}$) |

Actor class

| | binary size | put new bucket | put existing bucket | get |
|--:|--:|--:|--:|--:|
| Map | 299_783 ($\textcolor{green}{-0.13\%}$) | 815_147 ($\textcolor{green}{-0.11\%}$) | 16_125 ($\textcolor{red}{0.16\%}$) | 16_670 ($\textcolor{red}{0.16\%}$) |

Statistics

  • binary_size: no change
  • max_mem: -2.75% [-3.87%, -1.62%]
  • cycles: -0.84% [-1.62%, -0.05%]

Publisher & Subscriber

| | pub_binary_size | sub_binary_size | subscribe_caller | subscribe_callee | publish_caller | publish_callee |
|--:|--:|--:|--:|--:|--:|--:|
| Motoko | 161_484 ($\textcolor{green}{-0.28\%}$) | 145_919 ($\textcolor{green}{-0.21\%}$) | 28_597 ($\textcolor{red}{0.01\%}$) | 11_963 | 22_870 ($\textcolor{red}{0.03\%}$) | 6_454 ($\textcolor{red}{0.37\%}$) |
| Rust | 519_866 | 570_028 | 68_903 | 42_634 | 92_131 | 51_818 |

Statistics

  • binary_size: -0.24% [-0.46%, -0.03%]
  • max_mem: no change
  • cycles: 0.14% [-0.21%, 0.48%]

Overall Statistics

  • binary_size: -0.09% [-0.25%, 0.08%]
  • max_mem: -9.08% [-13.50%, -4.66%]
  • cycles: 0.44% [-2.12%, 3.00%]

github-actions bot commented Feb 16, 2024

Note
The flamegraph link only works after you merge.
Unchanged benchmarks are omitted.

Collection libraries

Measure different collection libraries written in both Motoko and Rust.
Libraries whose names end in _rs are written in Rust; the rest are written in Motoko.
The _stable and _stable_rs suffixes indicate that the library writes its state directly to stable memory, using Region in Motoko and ic-stable-structures in Rust.

We use the same random number generator with a fixed seed, so that all collections contain
the same elements and receive exactly the same queries. Below we explain what each column in the table measures:

  • generate 1m. Insert 1m Nat64 integers into the collection. For Motoko collections, this usually triggers the GC; the remaining columns are unlikely to trigger GC.
  • max mem. For Motoko, it reports rts_max_heap_size after the generate call; for Rust, it reports the number of Wasm memory pages * 64 KiB.
  • batch_get 50. Find 50 elements in the collection.
  • batch_put 50. Insert 50 elements into the collection.
  • batch_remove 50. Remove 50 elements from the collection.
  • upgrade. Upgrade the canister with the same Wasm module. For non-stable benchmarks, the map state is persisted by serializing it into stable memory and deserializing it after the upgrade. For stable benchmarks, the upgrade takes almost no cycles, as the state is already in stable memory.
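
As a minimal sketch of the fixed-seed idea mentioned above, the following Rust snippet derives a workload from a small xorshift64 generator. The choice of generator, the `XorShift64` type, and the `workload` helper are assumptions made for illustration; the benchmark's actual generator is not shown here.

```rust
/// Minimal xorshift64 generator (illustrative only; not the benchmark's real RNG).
pub struct XorShift64 {
    state: u64,
}

impl XorShift64 {
    /// The seed must be non-zero, otherwise the state would stay at zero forever.
    pub fn new(seed: u64) -> Self {
        assert!(seed != 0, "xorshift64 requires a non-zero seed");
        Self { state: seed }
    }

    /// Advance the state and return the next pseudo-random value.
    pub fn next_u64(&mut self) -> u64 {
        let mut x = self.state;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.state = x;
        x
    }
}

/// Produce `n` keys from a fixed seed. The same seed always yields the same keys,
/// so every collection under test can be fed an identical workload.
pub fn workload(seed: u64, n: usize) -> Vec<u64> {
    let mut rng = XorShift64::new(seed);
    (0..n).map(|_| rng.next_u64()).collect()
}
```

Calling `workload(42, 50)` once per collection would give each of them the same 50 keys, which is the property the comparison relies on.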

💎 Takeaways

  • The platform charges only for instruction count, so data structures that exploit caching and locality gain no cost advantage.
  • There is a limit on the maximum cycles per round, so asymptotic behavior matters less than performance up to a fixed N. In extreme cases, you may see an $O(10000 n\log n)$ algorithm hitting the limit, while an $O(n^2)$ algorithm runs just fine.
  • Amortized algorithms/GC may need to be more eager to avoid hitting the cycle limit on a particular round.
  • Rust costs more cycles to process complicated Candid data, but it is more efficient in performing core computations.
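
To make the point about a fixed per-round limit concrete, here is a rough cost model in Rust. The budget value and both cost functions are hypothetical and chosen only for illustration; they are neither the platform's real limit nor measured costs.

```rust
/// Hypothetical per-round instruction budget (an assumption, not the real limit).
pub const BUDGET: f64 = 1.0e9;

/// Cost model with a large constant factor: 10000 * n * log2(n).
pub fn cost_nlogn(n: u64) -> f64 {
    let n = n as f64;
    10_000.0 * n * n.log2()
}

/// Cost model with a unit constant factor: n^2.
pub fn cost_quadratic(n: u64) -> f64 {
    let n = n as f64;
    n * n
}

/// Whether a computation with the given cost fits into one round.
pub fn fits(cost: f64) -> bool {
    cost <= BUDGET
}
```

Under these assumptions, at n = 10_000 the quadratic model fits into the budget while the n log n model with the large constant does not. The n log n model only becomes the cheaper one at around n = 174_000, where both models already exceed the budget.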

Note

  • The Candid interface of the benchmark is minimal, so the serialization cost is negligible in this measurement.
  • Due to the instrumentation overhead and cycle limit, we cannot profile computations with very large collections.
  • The upgrade column uses Candid for serializing stable data. In Rust, you may get better cycle cost by using a different serialization format. Another slowdown in Rust is that ic-stable-structures tends to be slower than the region memory in Motoko.
  • Different libraries persist data during upgrades in different ways. There are three main categories:
    • Use a stable variable directly in Motoko: zhenya_hashmap, btree, vector
    • Expose and serialize external state (share/unshare in Motoko, candid::Encode in Rust): rbtree, heap, btreemap_rs, hashmap_rs, heap_rs, vec_rs
    • Use pre/post-upgrade hooks to convert data into an array: hashmap, splay, triemap, buffer, imrc_hashmap_rs
  • The stable benchmarks are much more expensive than their non-stable counterparts, because the stable memory API is much more expensive. The benefit is a fast upgrade, although the upgrade still needs to parse the metadata when initializing the upgraded Wasm module.
  • hashmap uses an amortized data structure. When the initial capacity is reached, it has to copy the whole array, so the cost of batch_put 50 is much higher than for other data structures.
  • btree comes from mops.one/stableheapbtreemap.
  • zhenya_hashmap comes from mops.one/map.
  • vector comes from mops.one/vector. Compared with buffer, put has better worst-case time and space complexity ($O(\sqrt{n})$ vs $O(n)$); get has a slightly larger constant overhead.
  • hashmap_rs uses the fxhash crate, whose map is std::collections::HashMap with a deterministic hasher. This ensures reproducible results.
  • imrc_hashmap_rs uses the im-rc crate, which provides an immutable hashmap in Rust.
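
The reproducibility point about hashmap_rs can be illustrated without external crates. The sketch below does not use fxhash; it swaps in the standard library's `DefaultHasher` built through `BuildHasherDefault`, which starts from fixed keys instead of the per-process random keys of `RandomState`. The `DeterministicMap`, `hash_key`, and `build_map` names are hypothetical.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{BuildHasher, BuildHasherDefault, Hash, Hasher};

/// A HashMap whose hasher is constructed from fixed default keys
/// rather than randomly seeded per process.
pub type DeterministicMap<K, V> = HashMap<K, V, BuildHasherDefault<DefaultHasher>>;

/// Hash a key with a freshly built deterministic hasher.
pub fn hash_key<K: Hash>(key: &K) -> u64 {
    let build = BuildHasherDefault::<DefaultHasher>::default();
    let mut hasher = build.build_hasher();
    key.hash(&mut hasher);
    hasher.finish()
}

/// Build a map from the given keys, storing each key as its own value.
pub fn build_map(keys: &[u64]) -> DeterministicMap<u64, u64> {
    let mut map = DeterministicMap::default();
    for &k in keys {
        map.insert(k, k);
    }
    map
}
```

The standard library does not promise that `DefaultHasher` yields the same values across Rust releases, so this sketch only gives reproducibility for a fixed toolchain.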

Map

| | binary_size | generate 1m | max mem | batch_get 50 | batch_put 50 | batch_remove 50 | upgrade |
|--:|--:|--:|--:|--:|--:|--:|--:|
| hashmap | 190_578 | 8_677_435_631 | 59_550_664 | 334_166 | 372_166 | 361_464 | 11_094_880_659 |
| triemap | 196_095 | 13_905_101_144 | 71_717_452 | 249_369 | 642_587 | 645_747 | 15_695_504_052 |
| rbtree | 186_336 | 7_037_815_635 | 51_814_928 | 118_641 | 318_954 | 332_694 | 6_872_277_147 |
| splay | 191_049 | 13_185_262_200 | 47_829_136 | 621_734 | 677_921 | 937_345 | 4_333_406_943 |
| btree | 230_967 | 10_843_807_996 | 25_021_016 | 379_413 | 492_873 | 572_201 | 2_856_105_607 |
| zhenya_hashmap | 188_742 | 1_710_509_220 | 16_777_504 | 50_761 | 73_525 | 112_957 | 5_610_596_903 |
| btreemap_rs | 537_393 | 1_793_333_047 | 27_590_656 | 75_328 | 125_166 | 86_260 | 2_937_041_107 |
| imrc_hashmap_rs | 542_882 | 2_584_501_850 | 244_973_568 | 37_762 | 178_926 | 115_385 | 5_796_587_958 |
| hashmap_rs | 529_458 | 439_248_112 | 73_138_176 | 21_501 | 26_711 | 25_024 | 1_298_646_667 |

Priority queue

| | binary_size | heapify 1m | max mem | pop_min 50 | put 50 | pop_min 50 | upgrade |
|--:|--:|--:|--:|--:|--:|--:|--:|
| heap | 167_286 | 5_702_353_152 | 24_000_360 | 635_926 | 234_265 | 606_194 | 3_193_825_390 |
| heap_rs | 525_853 | 139_669_830 | 18_284_544 | 57_419 | 23_051 | 57_545 | 510_960_192 |

Growable array

| | binary_size | generate 5k | max mem | batch_get 500 | batch_put 500 | batch_remove 500 | upgrade |
|--:|--:|--:|--:|--:|--:|--:|--:|
| buffer | 174_288 | 2_799_267 | 65_644 | 100_512 | 866_454 | 187_512 | 3_141_338 |
| vector | 172_193 | 2_048_943 | 24_580 | 133_138 | 195_575 | 184_331 | 4_685_196 |
| vec_rs | 520_881 | 289_040 | 1_376_256 | 17_251 | 30_571 | 23_331 | 3_161_017 |

Stable structures

| | binary_size | generate 50k | max mem | batch_get 50 | batch_put 50 | batch_remove 50 | upgrade |
|--:|--:|--:|--:|--:|--:|--:|--:|
| btreemap_rs | 537_393 | 76_200_333 | 2_555_904 | 64_886 | 97_044 | 85_272 | 126_265_270 |
| btreemap_stable_rs | 543_609 | 4_561_985_735 | 2_031_616 | 2_707_064 | 5_026_642 | 8_594_683 | 729_311 |
| heap_rs | 525_853 | 7_051_730 | 2_293_760 | 49_928 | 23_299 | 49_894 | 26_768_703 |
| heap_stable_rs | 506_559 | 271_553_517 | 458_752 | 2_294_851 | 238_596 | 2_277_771 | 729_317 |
| vec_rs | 520_881 | 3_079_382 | 2_293_760 | 17_251 | 18_421 | 17_719 | 24_671_551 |
| vec_stable_rs | 503_829 | 63_394_912 | 458_752 | 62_491 | 79_685 | 81_633 | 729_320 |

Environment

  • dfx 0.16.1
  • Motoko compiler 0.10.4 (source 3cdgp4f5-bswfknv3-l2mlljs3-4nag69rn)
  • rustc 1.75.0 (82e1608df 2023-12-21)
  • ic-repl 0.6.2
  • ic-wasm 0.7.0

Cryptographic libraries

Measure different cryptographic libraries written in both Motoko and Rust.

  • SHA-2 benchmarks
    • SHA-256/SHA-512. Compute the hash of a 1M Wasm binary.
    • account_id. Compute the ledger account id from principal, based on SHA-224.
    • neuron_id. Compute the NNS neuron id from principal, based on SHA-256.
  • Certified map. A Merkle tree for storing key-value pairs and generating witnesses according to the IC Interface Specification.
    • generate 10k. Insert 10k 7-character words as both key and value into the certified map.
    • max mem. For Motoko, it reports rts_max_heap_size after the generate call; for Rust, it reports the number of Wasm memory pages * 64 KiB.
    • inc. Increment a counter and insert the counter value into the map.
    • witness. Generate the root hash and a witness for the counter.
    • upgrade. Upgrade the canister with the same Wasm. In Motoko, we use a stable variable. In Rust, we convert the tree to a vector before serialization.
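
For the Rust max mem figures, the page arithmetic can be sketched as follows, assuming the standard WebAssembly page size of 64 KiB. The helper names are hypothetical, and how the benchmark obtains the page count is not shown here.

```rust
/// Size of one WebAssembly linear-memory page in bytes (64 KiB).
pub const WASM_PAGE_SIZE: u64 = 64 * 1024;

/// Convert a number of Wasm memory pages into bytes.
pub fn pages_to_bytes(pages: u64) -> u64 {
    pages * WASM_PAGE_SIZE
}

/// Number of pages needed to hold `bytes`, rounding up to a whole page.
pub fn bytes_to_pages(bytes: u64) -> u64 {
    (bytes + WASM_PAGE_SIZE - 1) / WASM_PAGE_SIZE
}
```

For example, the Rust certified map value of 2_228_224 bytes in the table below corresponds to 34 pages.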

SHA-2

| | binary_size | SHA-256 | SHA-512 | account_id | neuron_id |
|--:|--:|--:|--:|--:|--:|
| Motoko | 195_513 | 281_134_819 | 261_270_435 | 35_174 | 25_432 |
| Rust | 537_397 | 82_787_911 | 56_792_991 | 47_914 | 50_388 |

Certified map

| | binary_size | generate 10k | max mem | inc | witness | upgrade |
|--:|--:|--:|--:|--:|--:|--:|
| Motoko | 245_076 | 4_883_900_392 | 3_430_044 | 579_659 | 432_230 | 274_734_742 |
| Rust | 565_792 | 6_409_147_805 | 2_228_224 | 1_019_959 | 303_897 | 6_019_483_730 |

Sample Dapps

Measure the performance of some typical dapps:

  • Basic DAO, with heartbeat disabled to make profiling easier. We have a separate benchmark to measure heartbeat performance.
  • DIP721 NFT

Note

  • The cost difference is mainly due to the Candid serialization cost.
  • Motoko statically compiles/specializes the serialization code for each method, whereas in Rust, we use serde to dynamically deserialize data based on data on the wire.
  • We could improve the performance on the Rust side by using parser combinators. But it is a challenge to maintain the ergonomics provided by serde.
  • For real-world applications, we tend to send small data for each endpoint, which makes the Candid overhead in Rust tolerable.

Basic DAO

| | binary_size | init | transfer_token | submit_proposal | vote_proposal | upgrade |
|--:|--:|--:|--:|--:|--:|--:|
| Motoko | 274_439 | 510_973 | 22_488 | 18_681 | 19_761 | 157_965 |
| Rust | 849_921 | 599_916 | 99_156 | 123_702 | 136_655 | 1_799_828 |

DIP721 NFT

| | binary_size | init | mint_token | transfer_token | upgrade |
|--:|--:|--:|--:|--:|--:|
| Motoko | 220_659 | 481_160 | 30_218 | 8_776 | 89_843 |
| Rust | 869_104 | 236_542 | 368_044 | 91_941 | 1_999_207 |

Heartbeat / Timer

Measure the cost of an empty heartbeat and an empty timer job.

  • setTimer measures both the setTimer(0) method and the execution of an empty job.
  • It is not easy to reliably capture both of these events in one flamegraph, as implementation details of the replica can affect how we measure them. Typically, a correct flamegraph contains both the setTimer and canister_global_timer functions. If either is missing, we may need to adjust the script.

Heartbeat

| | binary_size | heartbeat |
|--:|--:|--:|
| Motoko | 137_356 | 19_515 |
| Rust | 23_637 | 480 |

Timer

| | binary_size | setTimer | cancelTimer |
|--:|--:|--:|--:|
| Motoko | 145_843 | 53_003 | 4_628 |
| Rust | 487_585 | 68_173 | 11_184 |

Motoko Specific Benchmarks

Measure various features only available in Motoko.

  • Garbage Collection. Measure Motoko garbage collection cost using the Triemap benchmark. The max mem column reports rts_max_heap_size after the generate call. The cycle costs reported here cover garbage collection only. Some flamegraphs are truncated due to the 2M log size limit. The dfx/ic-wasm optimizer is disabled for the garbage collection test cases because it changes function names, which makes profiling trickier.

    • default. Compile with the default GC option. With the current GC scheduler, generate will trigger the copying GC. The rest of the methods will not trigger GC.
    • copying. Compile with --force-gc --copying-gc.
    • compacting. Compile with --force-gc --compacting-gc.
    • generational. Compile with --force-gc --generational-gc.
    • incremental. Compile with --force-gc --incremental-gc.
  • Actor class. Measure the cost of spawning an actor class, using the Actor classes example.

Garbage Collection

| | generate 700k | max mem | batch_get 50 | batch_put 50 | batch_remove 50 |
|--:|--:|--:|--:|--:|--:|
| default | 1_146_951_684 | 50_288_680 | 119 | 119 | 119 |
| copying | 1_146_951_566 | 50_288_680 | 1_146_669_222 | 1_146_757_219 | 1_146_671_127 |
| compacting | 1_643_059_256 | 50_288_680 | 1_268_924_251 | 1_509_402_797 | 1_537_068_318 |
| generational | 2_460_323_534 | 50_297_144 | 952_674_141 | 1_247_801 | 1_112_579 |
| incremental | 29_503_170 | 979_645_844 | 478_912_995 | 494_653_376 | 1_149_877_816 |

Actor class

| | binary size | put new bucket | put existing bucket | get |
|--:|--:|--:|--:|--:|
| Map | 299_783 | 815_147 | 16_125 | 16_670 |

Publisher & Subscriber

Measure the cost of inter-canister calls from the Publisher & Subscriber example.

| | pub_binary_size | sub_binary_size | subscribe_caller | subscribe_callee | publish_caller | publish_callee |
|--:|--:|--:|--:|--:|--:|--:|
| Motoko | 161_484 | 145_919 | 28_597 | 11_963 | 22_870 | 6_454 |
| Rust | 519_866 | 570_028 | 68_903 | 42_634 | 92_131 | 51_818 |

