-
Notifications
You must be signed in to change notification settings - Fork 1
AVL Tree
The AVL tree is a self-balancing binary tree data structure. In an AVL tree, the left and right subtrees have a height difference of at most one. The height of a tree is the height of its tallest subtree, plus one. To maintain balance, each insert is followed by a rebalancing, involving a left/right single rotation or a double rotation at the point of insertion.
- Insertion: amortized
O(log n)
, worst caseO(log n)
. - Search: amortized
O(log n)
, worst caseO(log n)
.
Provided are the sequential operations to create a new tree, insert elements, and search for elements.
With a priori known insertion operations, we are able to parallelise them to achieve better performance. This hinges on two new functions:
-
join lt n rt
: this function joins a left AVL treelt
, a noden
with some keyk
, and a right AVL treert
. The end result should be a balanced AVL tree. This function assumes all keys inlt
are strictly less thank
, and all keys inrt
are strictly greater thank
. Under these constraints,join
hasO(log n)
complexity (where n is the combined number of nodes of both trees). -
split t k
: this function splits an AVL treet
at keyk
into 2 AVL trees, one with all keys less thank
and the other with all keys greater thank
. If there is a node present with that same keyk
, that node is returned detached from either tree. The complexity ofsplit
is alsoO(log n)
.
The proofs for the complexity of these functions are detailed in this paper: https://arxiv.org/abs/1602.02120. Unit tests for these two functions are also provided and can be run with dune runtest
.
By being able to efficiently split and join AVL trees, we can effectively divide the tree into non-overlapping parts over which different batches of insertions can be run in parallel. We propose a few approaches to doing this:
- Split the batch of operations in sub-batches containing roughly the same number of insert operations and split the tree as many time as needed. Run each batch on its own tree in parallel (0).
- Split the batch of operations at each node with binary search using the current node key as pivot (1).
- Split the batch of operations at each node with linear search using the current node key as pivot (2).
- Split the batch of operations at each node with a combination of binary and linear search using the current node key as pivot (3).
Note that the above assume a sorted array of insert operations. The par_insert
function sorts the batch on insertions it gets before doing anything else.
Since searches do not modify the data structures, we have more methods at our disposal to parallelise a batch of search queries:
- Parallelise every query, starting each one at the root node (0).
- Segment query batch into multiple equal smaller batches when above a threshold, starting each one at the root node (1).
- Split query batch at each node with binary search using the current node key as pivot (2).
- Split query batch at each node with linear search using the current node key as pivot (3).
- Split query batch at each node with a combination of binary and linear search using the current node key as pivot (4).