75 lines
5.6 KiB
TeX
75 lines
5.6 KiB
TeX
\chapter{Self-Balancing Search Trees}\label{chap:sb-bst}
|
|
|
|
This chapter will briefly discuss the properties and fundamental ideas behind the most used self-balancing search trees in standard libraries to give an idea about current options and how WAVL fits among them.
|
|
|
|
\section{Red-black trees}
|
|
|
|
As mentioned previously, red-black trees are among the most popular implementations in standard libraries. As always, we have a binary search tree, and each node is given \textit{red} or \textit{black} colour. A red-black tree is kept balanced by enforcing the following set of rules~\cite{rbtree}:
|
|
|
|
\begin{enumerate}
|
|
\item External nodes are black; internal nodes may be red or black.
|
|
\item For each internal node, all paths from it to external nodes contain the same number of black nodes.
|
|
\item No path from an internal node to an external node contains two red nodes in a row.
|
|
\item External nodes do not hold any data.
|
|
\item Root has black colour.~\footnote{This rule is optional, since it increases the count of black nodes from root to each of the external nodes. However it may be beneficial during insertion, e.g. 2 insertions into empty red-black tree.}
|
|
\end{enumerate}
|
|
|
|
Given this knowledge, we can safely deduce the following relation between the height of the red-black tree and nodes stored in it~\cite{cormen2009introduction}:
|
|
\[
|
|
\log_2{(n + 1)} \leq h \leq 2 \cdot \log_2{(n + 2)} - 2
|
|
\]\label{rb-height}
|
|
|
|
Lower bound is given by a perfect binary tree and the upper bound is given by the minimal red-black tree.
|
|
|
|
There are also other variants of the red-black tree that are considered to be simpler for implementation, e.g. left-leaning red-black tree, as described by \textit{Sedgewick}~\cite{llrb}.
|
|
|
|
Red-black trees are used to implement sets in C++~\footnote{\url{https://github.com/llvm/llvm-project/blob/main/libcxx/include/__tree} as of commit \texttt{990ea39}}, Java~\footnote{\url{https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/util/TreeMap.java} as of commit \texttt{fb469fb}} and C\#~\footnote{\url{https://github.com/dotnet/runtime/blob/main/src/libraries/System.Collections/src/System/Collections/Generic/SortedSet.cs} as of commit \texttt{215b39a}}.
|
|
|
|
\section{AVL tree}
|
|
|
|
AVL tree is considered to be the eldest self-balancing binary search tree. For clarity, we define the following function:
|
|
|
|
\[
|
|
BalanceFactor(n) := height(right(n)) - height(left(n))
|
|
\]
|
|
|
|
Then we have an AVL tree, if for every node $n$ in the tree the following holds:
|
|
|
|
\[
|
|
BalanceFactor(n) \in \{ -1, 0, 1 \}
|
|
\]
|
|
|
|
In other words, the heights of left and right subtrees of each node differ at most in 1.~\cite{avl}
|
|
|
|
Similarly, we will deduce the height of the AVL tree from original paper, by \textit{Adelson-Velsky and Landis}~\cite{avl}, we get:
|
|
|
|
\[
|
|
\left( \log_2{(n + 1)} \leq \right) h < \log_{\varphi}{(n + 1)} < \frac{3}{2} \cdot \log_2{(n + 1)}
|
|
\]\label{avl-height}
|
|
|
|
If we compare the upper bounds for the height of the red-black trees and AVL trees, we can see that AVL rules are more strict than red-black rules, but at the cost of rebalancing. However, in both cases the rebalancing still takes $\log_2{n}$.
|
|
|
|
Regarding the implementation of AVL trees, we can see them implemented in the standard library of Agda~\footnote{\url{https://github.com/agda/agda-stdlib/blob/master/src/Data/Tree/AVL.agda} as of commit \texttt{d0841d1}}, OCaml~\footnote{\url{https://github.com/ocaml/ocaml/blob/trunk/stdlib/set.ml} as of commit \texttt{f52fdc2}} or Coq~\footnote{\url{https://github.com/coq/coq/blob/master/theories/MSets/MSetAVL.v} as of commit \texttt{c1ddf13}; they also admit to follow the OCaml implementation}.
|
|
|
|
\section {B-tree}
|
|
|
|
B-tree is a self-balancing tree as described by \textit{Bayer and McCreight}~\cite{btree}. We will list the rules given by \textit{Cormen et al.}~\cite{cormen2009introduction}:
|
|
\begin{enumerate}
|
|
\item Every node has attributes holding current number of keys it stores $n$, $n$ keys stored in non-decreasing order and optionally boolean value determining whether node is leaf or not.
|
|
\item Each internal node holds $n + 1$ pointers to children. Leaf nodes have no children.
|
|
\item Keys in internal nodes separate the ranges of keys that can be stored in the subtrees. If $k_i$ is any key stored in the subtree with root $x.c_i$, then
|
|
\[ k_1 \leq x.key_1 \leq k_2 \leq x.key_2 \leq \dots \leq x.key_n \leq k_{n + 1} \]
|
|
\item All leaves are in the same depth.
|
|
\item Each B-tree has a fixed integer $t \geq 2$ called \textit{minimum degree}, which gives bounds to number of keys that can be stored in a node. If $k$ is number of keys, then
|
|
\[ t - 1 \leq k \leq 2t - 1 \]
|
|
Minimal number of keys stored in a node does not apply to the root node.
|
|
\end{enumerate}
|
|
|
|
We have chosen the rules from \textit{Introduction to Algorithms}~\cite{cormen2009introduction}, because the terminology is more familiar and they are more compact than the rules given in the original paper by \textit{Bayer and McCreight}~\cite{btree}, where they introduce B-trees for organization of files in filesystems as the title suggests.
|
|
|
|
Based on the original paper~\cite{btree} and \textit{Introduction to Algorithms}~\cite{cormen2009introduction}, we have deduced following height boundaries:
|
|
\[ 1 + \log_{t}{ \frac{(t - 1)(n - 1)}{2} + 1} \leq h \leq \log_t{\frac{n + 1}{2}} \]
|
|
|
|
%% TODO: ↑ Check this thing once again ↑
|
|
|
|
Even though original paper has presented B-tree to be used in filesystems or databases, there are also cases when B-tree is used to represent sets, for example in Rust~\footnote{\url{https://github.com/rust-lang/rust/blob/master/library/alloc/src/collections/btree/map.rs} as of commit \texttt{9805437}; $t = 6$}.
|