Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[move][stdlib] Add efficient BigOrderedMap implementation #14753

Open
wants to merge 16 commits into
base: igor/ordered_map
Choose a base branch
from

Conversation

igor-aptos
Copy link
Contributor

Description

Having efficient and concurrent large-sized "Map" and "OrderedMap" implementations is useful across variety of needs.

Current SmartTable implementation has various limitations, from it being sequential, to it not having smart ways to deal with collisions. It cannot be improved - as the structs are unmodifiable, and so should be deprecated fully.

In this PR:

  • provide efficient "OrderedMap" implementation. BTreeMap is chosen as it is best suited for the onchain storage layout - where majority of cost comes from loading and writing to storage items, and there is no partial read/write of them. It also rebalances in a way that reduces amount writes needed
  • writing to keys that are not close enough, to a BTreeMap with multiple layers, is generally parallel. More elements in the BTreeMap, the more operations will be parallel.
  • is an enum, so can be evolved in the future as needed
  • has an option to pre-pay and pre-allocate storage slots, and to reuse them, achieving predictable gas charges.
    • defined potentially generally useful SlotsStorage, to manage indexed storage slots for datastructures that need it. Easier to use and more flexible/configurable than directly using Table. (which is also an enum, and so can be evolved / new strategies can be added)

How Has This Been Tested?

provided extensive unit tests. will probably consolidate and remove some before committing (to not make CI slower/more expensive), but for development those were useful.

Key Areas to Review

  • Original implementation is @grao1991 's (and that is the first commit in the stack), on top of it:
  • inline values in the leaf nodes. make max degree of inner nodes separately configurable from max degree of leaf nodes.
  • only modify necessary nodes, to reduce costs and achieve parallelism.
  • made keys generic. removed drop+copy requirement on the values
  • fixed indexing of nodes, and extracted (potentially generally useful SlotsStorage), allowing preallocating/reusing storage slots.

Type of Change

  • New feature

Which Components or Systems Does This Change Impact?

  • Aptos Framework

Checklist

  • I have read and followed the CONTRIBUTING doc
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I identified and added all stakeholders and component owners affected by this change as reviewers
  • I tested both happy and unhappy path of the functionality
  • I have made corresponding changes to the documentation

Copy link

trunk-io bot commented Sep 25, 2024

⏱️ 1h 25m total CI duration on this PR
Slowest 15 Jobs Cumulative Duration Recent Runs
rust-move-unit-coverage 14m 🟩
rust-move-unit-coverage 12m 🟩
rust-move-unit-coverage 10m 🟩
rust-move-unit-coverage 10m 🟩
rust-cargo-deny 7m 🟩🟩🟩🟩
general-lints 7m 🟩🟩🟩🟩
check-dynamic-deps 5m 🟩🟩🟩🟩
rust-move-tests 4m 🟥
rust-move-tests 4m 🟥
rust-move-tests 4m 🟥
rust-move-tests 4m 🟥
semgrep/ci 2m 🟩🟩🟩🟩
file_change_determinator 45s 🟩🟩🟩🟩
file_change_determinator 40s 🟩🟩🟩🟩
permission-check 11s 🟩🟩🟩🟩

settingsfeedbackdocs ⋅ learn more about trunk.io

Copy link

codecov bot commented Sep 25, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 57.4%. Comparing base (76e0b76) to head (e5ee36d).

Additional details and impacted files
@@                Coverage Diff                @@
##           igor/ordered_map   #14753   +/-   ##
=================================================
  Coverage              57.4%    57.4%           
=================================================
  Files                   859      859           
  Lines                211663   211663           
=================================================
  Hits                 121527   121527           
  Misses                90136    90136           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@msmouse
Copy link
Contributor

msmouse commented Sep 26, 2024

The world with Enums is beautiful 🚀

Copy link
Contributor

@lightmark lightmark left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

haven't fully review the tree code with some comments first.

/// Destroys the tree if it's empty, otherwise aborts.
public fun destroy_empty<K: store, V: store>(self: BTreeMap<K, V>) {
let BTreeMap::V1 { nodes, root_index, min_leaf_index: _, max_leaf_index: _, inner_max_degree: _, leaf_max_degree: _ } = self;
aptos_std::debug::print(&nodes);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit

enum SlotsStorage<T: store> has store {
Simple {
slots: Table<u64, Link<T>>,
new_slot_index: u64,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we need the non-reuse version?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if one app does all inserts and removals, they might want to get the refunds back, instead of worrying about amortization

/// Returns a newly allocated vector containing the elements in the range [at, len).
/// After the call, the original vector will be left containing the elements [0, at)
/// with its previous capacity unchanged.
public fun split_off<Element>(self: &mut vector<Element>, at: u64): vector<Element> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This deserves a native functions, so does the next one.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

implemented in the stack

@@ -87,7 +87,7 @@ module aptos_std::table {
drop_unchecked_box<K, V, Box<V>>(self)
}

public(friend) fun destroy<K: copy + drop, V>(self: Table<K, V>) {
public fun destroy_empty<K: copy + drop, V>(self: Table<K, V>) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIRC, this function does NOT check the Table is empty...
Could we confirm with @wrwg ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there's no way to check if a table is empty or not, that's why we didn't expose destroy_empty as a public function

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

/// An iterator to iterate all keys in the BTreeMap.
enum Iterator<K> has copy, drop {
End,
Some {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

name: None and Next?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's important it is End.

}

// Returns true iff the iterator is an end iterator.
public fun is_end_iter<K: store, V: store>(_tree: &BTreeMap<K, V>, iter: &Iterator<K>): bool {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this feels more like inline func than actual func?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cannot be done publicly, as enum variants are not public. also some implementation might actually need to do something here.

@@ -87,7 +87,7 @@ module aptos_std::table {
drop_unchecked_box<K, V, Box<V>>(self)
}

public(friend) fun destroy<K: copy + drop, V>(self: Table<K, V>) {
public fun destroy_empty<K: copy + drop, V>(self: Table<K, V>) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there's no way to check if a table is empty or not, that's why we didn't expose destroy_empty as a public function

Inner {
// The max key of its child, or the key of the current node if it is a leaf node.
max_key: K,
// The node index of it's child, or NULL_INDEX if the current node is a leaf node.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This enum is somehow confusing to me. Could you draw a ascii diagram to give an example?

NULL_INDEX if the current node is a leaf node

If this is a leaf node, it should be Leaf, right?

public fun new_with_config<K: store, V: store>(inner_max_degree: u16, leaf_max_degree: u16, reuse_slots: bool, num_to_preallocate: u64): BTreeMap<K, V> {
assert!(inner_max_degree == 0 || inner_max_degree >= DEFAULT_INNER_MIN_DEGREE, E_INVALID_PARAMETER);
assert!(leaf_max_degree == 0 || leaf_max_degree >= DEFAULT_LEAF_MIN_DEGREE, E_INVALID_PARAMETER);
let nodes = if (reuse_slots) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

any reason we don't want to reuse slots?

Comment on lines 562 to 567
let new_node_children = children.split_off(target_size - 1);
children.insert(l, child);
new_node_children
} else {
children.insert(l, child);
children.split_off(target_size)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we need two cases? Just

children.insert(l, child);
children.split_off(target_size)

is good. unless it's for perf.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it was for perf. since it is only on rebalancing, cleaned it up.

let BTreeMap::V1 { nodes, root_index, min_leaf_index: _, max_leaf_index: _, inner_max_degree: _, leaf_max_degree: _ } = self;
aptos_std::debug::print(&nodes);
nodes.remove(root_index).destroy_empty_node();
nodes.destroy_empty();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As we discussed, how to gc?

self.nodes.borrow_mut(*prev).next = left_node_index;
};

if (!*is_leaf) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

comments... update the non-leaf children parents pointer to the left node, which was the original node with new index.

let left_node_slot = self.nodes.create_transient_slot();
let left_node_index = left_node_slot.get_index();
right_node.next = *next;
*next = node_index;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please comment
// node_index == right_node_index;

};

// # of children in the current node exceeds the threshold, need to split into two nodes.
let (right_node_slot, node) = self.nodes.transiently_remove(node_index);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not reuse this node as the left node?

// The brother node has enough elements, borrow an element from the brother node.
brother_size = brother_size - 1;
if (brother_index == next) {
let borrowed_element = brother_children.remove(0);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we need a native remove too.

};
let borrowed_max_key = borrowed_element.max_key;
children.push_back(borrowed_element);
current_size = current_size + 1;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unnecessary? just make current_size - 1 at the following line?

@igor-aptos igor-aptos changed the base branch from igor/vector_utilities to main October 3, 2024 20:20
@igor-aptos igor-aptos changed the title [move][stdlib] Add efficient BTreeMap implementation [move][stdlib] Add efficient BigOrderedMap implementation Oct 4, 2024
@igor-aptos igor-aptos changed the base branch from main to igor/ordered_map October 4, 2024 20:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants