You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Note
The flamegraph link only works after you merge.
Unchanged benchmarks are omitted.
Collection libraries
Measure different collection libraries written in both Motoko and Rust.
The library names with _rs suffix are written in Rust; the rest are written in Motoko.
The _stable and _stable_rs suffix represents that the library directly writes the state to stable memory using Region in Motoko and ic-stable-stuctures in Rust.
We use the same random number generator with fixed seed to ensure that all collections contain
the same elements, and the queries are exactly the same. Below we explain the measurements of each column in the table:
generate 1m. Insert 1m Nat64 integers into the collection. For Motoko collections, it usually triggers the GC; the rest of the column are not likely to trigger GC.
max mem. For Motoko, it reports rts_max_heap_size after generate call; For Rust, it reports the Wasm's memory page * 32Kb.
batch_get 50. Find 50 elements from the collection.
batch_put 50. Insert 50 elements to the collection.
batch_remove 50. Remove 50 elements from the collection.
upgrade. Upgrade the canister with the same Wasm module. For non-stable benchmarks, the map state is persisted by serializing and deserializing states into stable memory. For stable benchmarks, the upgrade takes no cycles, as the state is already in the stable memory.
💎 Takeaways
The platform only charges for instruction count. Data structures which make use of caching and locality have no impact on the cost.
We have a limit on the maximal cycles per round. This means asymptotic behavior doesn't matter much. We care more about the performance up to a fixed N. In the extreme cases, you may see an $O(10000 n\log n)$ algorithm hitting the limit, while an $O(n^2)$ algorithm runs just fine.
Amortized algorithms/GC may need to be more eager to avoid hitting the cycle limit on a particular round.
Rust costs more cycles to process complicated Candid data, but it is more efficient in performing core computations.
Note
The Candid interface of the benchmark is minimal, therefore the serialization cost is negligible in this measurement.
Due to the instrumentation overhead and cycle limit, we cannot profile computations with very large collections.
The upgrade column uses Candid for serializing stable data. In Rust, you may get better cycle cost by using a different serialization format. Another slowdown in Rust is that ic-stable-structures tends to be slower than the region memory in Motoko.
Different library has different ways for persisting data during upgrades, there are mainly three categories:
Use stable variable directly in Motoko: zhenya_hashmap, btree, vector
Expose and serialize external state (share/unshare in Motoko, candid::Encode in Rust): rbtree, heap, btreemap_rs, hashmap_rs, heap_rs, vector_rs
Use pre/post-upgrade hooks to convert data into an array: hashmap, splay, triemap, buffer, imrc_hashmap_rs
The stable benchmarks are much more expensive than their non-stable counterpart, because the stable memory API is much more expensive. The benefit is that they get fast upgrade. The upgrade still needs to parse the metadata when initializing the upgraded Wasm module.
hashmap uses amortized data structure. When the initial capacity is reached, it has to copy the whole array, thus the cost of batch_put 50 is much higher than other data structures.
vector comes from mops.one/vector. Compare with buffer, put has better worst case time and space complexity ($O(\sqrt{n})$ vs $O(n)$); get has a slightly larger constant overhead.
hashmap_rs uses the fxhash crate, which is the same as std::collections::HashMap, but with a deterministic hasher. This ensures reproducible result.
imrc_hashmap_rs uses the im-rc crate, which is the immutable version hashmap in Rust.
The cost difference is mainly due to the Candid serialization cost.
Motoko statically compiles/specializes the serialization code for each method, whereas in Rust, we use serde to dynamically deserialize data based on data on the wire.
We could improve the performance on the Rust side by using parser combinators. But it is a challenge to maintain the ergonomics provided by serde.
For real-world applications, we tend to send small data for each endpoint, which makes the Candid overhead in Rust tolerable.
Measure the cost of empty heartbeat and timer job.
setTimer measures both the setTimer(0) method and the execution of empty job.
It is not easy to reliably capture the above events in one flamegraph, as the implementation detail
of the replica can affect how we measure this. Typically, a correct flamegraph contains both setTimer and canister_global_timer function. If it's not there, we may need to adjust the script.
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Test perf for dfinity/candid#542