This repository has been archived on 2022-05-18. You can view files and clone it, but cannot push or open issues or pull requests.
thesis/self_balancing_search_trees.tex
Matej Focko 4b271e8a83
fix: cite sources
Signed-off-by: Matej Focko <mfocko@redhat.com>
2022-05-18 22:59:26 +02:00

72 lines
4.8 KiB
TeX

\chapter{Self-Balancing Search Trees}\label{chap:sb-bst}
This chapter will briefly discuss the properties and fundamental ideas behind the most used self-balancing search trees in standard libraries to give an idea about current options and how WAVL fits among them.
\section{Red-black trees}
Red-black trees are among the most popular implementations in standard libraries. We have a binary search tree, and each node is given \textit{red} or \textit{black} colour. A red-black tree is kept balanced by enforcing the following set of rules~\cite{rbtree}:
\begin{enumerate}
\item External nodes are black; internal nodes may be red or black.
\item For each internal node, all paths from it to external nodes contain the same number of black nodes.
\item No path from an internal node to an external node contains two red nodes in a row.
\item External nodes do not hold any data.
\item Root has black colour.~\footnote{This rule is optional since it increases the count of black nodes from the root to each external node. However, it may be beneficial during insertion, e.g. two insertions into an empty red-black tree.}
\end{enumerate}
Given this knowledge, we can safely deduce the following relation between the height of the red-black tree and nodes stored in it~\cite{cormen2009introduction}:
\begin{equation}
\log_2{(n + 1)} \leq h \leq 2 \cdot \log_2{(n + 2)} - 2
\end{equation}~\label{rb-height}
The lower bound is given by a perfect binary tree, and the upper bound is given by the minimal red-black tree.
Other variants of the red-black tree are considered to be simpler for implementation, e.g. left-leaning red-black tree, as described by \textit{Sedgewick}~\cite{llrb}.
Red-black trees are used to implement sets in C++~\cite{llvm}, Java~\cite{jdk} and C\#~\cite{csharp}.
\section{AVL tree}
AVL tree is considered to be the eldest self-balancing binary search tree. For clarity, we define the following function:
\[
BalanceFactor(n) := height(right(n)) - height(left(n))
\]
Then we have an AVL tree, if for every node $n$ in the tree the following holds:
\[
BalanceFactor(n) \in \{ -1, 0, 1 \}
\]
In other words, the heights of left and right subtrees of each node differ at most in 1.~\cite{avl}
Similarly, we will deduce the height of the AVL tree from the original paper by \textit{Adelson-Velsky and Landis}~\cite{avl} and also from \textit{The Art of Computer Programming} by \textit{Knuth}~\cite{knuth1998art}:
\begin{equation}
\left( \log_2{(n + 1)} \leq \right) h < \log_{\varphi}{(n + 1)} < \frac{3}{2} \cdot \log_2{(n + 1)}
\end{equation}~\label{avl-height}
If we compare the upper bounds for the height of the red-black trees and AVL trees, we can see that AVL rules are more strict than red-black rules, but at the cost of rebalancing. However, in both cases, the rebalancing still takes $\log_2{n}$.
Regarding the implementation of AVL trees, we can see them implemented in the standard library of Agda~\cite{agda}, OCaml~\cite{ocaml} or Coq~\footnote{They also admit to follow the OCaml implementation.}~\cite{coq}.
\section {B-tree}
B-tree is a self-balancing tree as described by \textit{Bayer and McCreight}~\cite{btree}. We will list the rules given by \textit{Cormen et al.}~\cite{cormen2009introduction}:
\begin{enumerate}
\item Every node has attributes holding current number of keys it stores $n$, $n$ keys stored in non-decreasing order and optionally boolean value determining whether node is leaf or not.
\item Each internal node holds $n + 1$ pointers to children. Leaf nodes have no children.
\item Keys in internal nodes separate the ranges of keys that can be stored in the subtrees. If $k_i$ is any key stored in the subtree with root $x.c_i$, then
\[ k_1 \leq x.key_1 \leq k_2 \leq x.key_2 \leq \dots \leq x.key_n \leq k_{n + 1} \]
\item All leaves are in the same depth.
\item Each B-tree has a fixed integer $t \geq 2$ called \textit{minimum degree}, which gives bounds to number of keys that can be stored in a node. If $k$ is number of keys, then
\[ t - 1 \leq k \leq 2t - 1 \]
Minimal number of keys stored in a node does not apply to the root node.
\end{enumerate}
We have chosen the rules from \textit{Introduction to Algorithms}~\cite{cormen2009introduction} because the terminology is more familiar and compact than the rules given in the original paper by \textit{Bayer and McCreight}~\cite{btree}, where they introduce B-trees for the organization of files in filesystems, as the title suggests.
Based on the original paper~\cite{btree} and \textit{Introduction to Algorithms}~\cite{cormen2009introduction}, we have deduced the following height boundaries:
\[ h \leq \log_t{\frac{n + 1}{2}} \]
Even though the original paper has presented B-tree to be used in filesystems or databases, there are also cases when B-tree is used to represent sets, for example, in Rust~\footnote{$t = 6$}~\cite{rust}.