AVL Tree

The AVL tree is a self-balancing binary tree data structure. In an AVL tree, the left and right subtrees have a height difference of at most one. The height of a tree is the height of its tallest subtree, plus one. To maintain balance, each insert is followed by a rebalancing, involving a left/right single rotation or a double rotation at the point of insertion.

Time complexity for sequential operations

Insertion: amortized O(log n), worst case O(log n).
Search: amortized O(log n), worst case O(log n).

Sequential operations

Provided are the sequential operations to create a new tree, insert elements, and search for elements.

Batch parallel implementation

Batch parallel insertions

With a priori known insertion operations, we are able to parallelise them to achieve better performance. This hinges on two new functions:

join lt n rt: this function joins a left AVL tree lt, a node n with some key k, and a right AVL tree rt. The end result should be a balanced AVL tree. This function assumes all keys in lt are strictly less than k, and all keys in rt are strictly greater than k. Under these constraints, join has O(log n) complexity (where n is the combined number of nodes of both trees).
split t k: this function splits an AVL tree t at key k into 2 AVL trees, one with all keys less than k and the other with all keys greater than k. If there is a node present with that same key k, that node is returned detached from either tree. The complexity of split is also O(log n).

The proofs for the complexity of these functions are detailed in this paper: https://arxiv.org/abs/1602.02120. Unit tests for these two functions are also provided and can be run with dune runtest.

By being able to efficiently split and join AVL trees, we can effectively divide the tree into non-overlapping parts over which different batches of insertions can be run in parallel. We propose a few approaches to doing this:

Split the batch of operations in sub-batches containing roughly the same number of insert operations and split the tree as many time as needed. Run each batch on its own tree in parallel.
Split the batch of operations at each node with binary search using the current node key as pivot.
Split the batch of operations at each node with linear search using the current node key as pivot.
Split the batch of operations at each node with a combination of binary and linear search using the current node key as pivot.

Note that the above assume a sorted array of insert operations. The par_insert function sorts the batch on insertions it gets before doing anything else.

Batch parallel searches

Since searches do not modify the data structures, we have more methods at our disposal to parallelise a batch of search queries:

Parallelise every query, starting each one at the root node.
Segment query batch into multiple equal smaller batches when above a threshold, starting each one at the root node.
Split query batch at each node with binary search using the current node key as pivot.
Split query batch at each node with linear search using the current node key as pivot.
Split query batch at each node with a combination of binary and linear search using the current node key as pivot.

Provide feedback

Saved searches