This repository has been archived on 2022-05-18. You can view files and clone it, but cannot push or open issues or pull requests.
thesis/rank_balanced_trees.tex
Matej Focko ecccb2b10e
fix: typos
Signed-off-by: Matej Focko <mfocko@redhat.com>
2022-05-18 23:34:27 +02:00

250 lines
No EOL
19 KiB
TeX

\chapter{Rank-Balanced Trees}\label{chap:rank-balanced-trees}
In comparison to nodes in binary search trees, nodes in rank-balanced trees contain one more piece of information, and that is \textit{rank}. Each type of tree that can be implemented using this representation, e.g. red-black, 2-3-4, AVL or WAVL, has a specific set of rules that ensure the resulting tree is balanced.
\section{Terminology related to rank-balanced trees}
In the text and pseudocode we adopt these functions or properties~\cite{wavl}:
\begin{itemize}
\item function $r(x)$ or property $x.rank$ that returns rank of a node; in case of $r(x)$ there is a special case: $r(nil) = -1$
\item function $parent(x)$ or property $x.parent$ returns parent of a node; analogically for the left and right children of a node
\item \textit{rank-difference} of \textit{x} is defined as $r(parent(x)) - r(x)$
\item $x$ is an \textit{i-child} if its rank-difference is $i$
\item $x$ is an $(i,~j)$-node if its left and right children have $i$ and $j$ rank-differences respectively; ordering of the children does not matter
\end{itemize}
\section{Rules for other trees}
As we have mentioned at the beginning of \hyperref[chap:rank-balanced-trees]{this chapter}, it is possible to implement different kinds of self-balancing binary search trees via different rules for ranks.
\subsection{AVL tree}\label{chap:avl-rule}
\textbf{AVL Rule}: Every node is (1,~1) or (1,~2).~\cite{wavl}
In the case of the AVL tree, rank represents height. Here we can notice an ingenious way of using the \textit{(i,~j)-node} definition. If we go back to the definition and want to be explicit about the nodes that are allowed with the \textit{AVL Rule}, then we get (1,~1), (1,~2) \textbf{or} (2,~1)-nodes. However, it is possible to find implementations of the AVL tree that allow leaning \textbf{to only one side} as opposed to the original requirements given by \textit{Adelson-Velsky and Landis}~\cite{avl}. Forbidding interchangeability of (i,~j) with (j,~i)-nodes would still yield AVL trees that lean to one side.
The meaning of the \textit{AVL Rule} is quite simple since rank represents the height in that case. We can draw analogies using the notation used for the AVL trees, where we mark nodes with a trit (or a sign) or use a balance factor. We have two cases to discuss:
\begin{itemize}
\item \textbf{(1,~1)-node} represents a tree where both of its subtrees have the same height. In this case, we are talking about the nodes with balance factor $0$ (respectively being signed with a $0$).
\item \textbf{(1,~2)-node} represents a tree where one of its subtrees has a bigger height. In this case, we are talking about the nodes with balance factor $-1$ or $1$ (respectively being signed with a $-$ or a $+$).
\end{itemize}
Example of the AVL tree that uses ranks instead of signs or balance factors can be seen in \autoref{fig:ranked:avl}.
\begin{figure}
\centering
\begin{tikzpicture}[>=latex',line join=bevel,scale=0.75,]
%%
\node (Node{value=3+ rank=3}) at (140.6bp,279.0bp) [draw,ellipse] {3, 3};
\node (Node{value=1+ rank=1}) at (103.6bp,192.0bp) [draw,ellipse] {1, 1};
\node (Node{value=7+ rank=2}) at (178.6bp,192.0bp) [draw,ellipse] {7, 2};
\node (Node{value=0+ rank=0}) at (28.597bp,105.0bp) [draw,ellipse] {0, 0};
\node (Node{value=2+ rank=0}) at (103.6bp,105.0bp) [draw,ellipse] {2, 0};
\node (Node{value=5+ rank=1}) at (178.6bp,105.0bp) [draw,ellipse] {5, 1};
\node (Node{value=8+ rank=1}) at (253.6bp,105.0bp) [draw,ellipse] {8, 1};
\node (Node{value=4+ rank=0}) at (103.6bp,18.0bp) [draw,ellipse] {4, 0};
\node (Node{value=6+ rank=0}) at (178.6bp,18.0bp) [draw,ellipse] {6, 0};
\node (Node{value=9+ rank=0}) at (253.6bp,18.0bp) [draw,ellipse] {9, 0};
\draw [->] (Node{value=3+ rank=3}) ..controls (128.03bp,249.14bp) and (120.85bp,232.64bp) .. (Node{value=1+ rank=1});
\definecolor{strokecol}{rgb}{0.0,0.0,0.0};
\pgfsetstrokecolor{strokecol}
\draw (128.6bp,235.5bp) node {2};
\draw [->] (Node{value=3+ rank=3}) ..controls (153.5bp,249.14bp) and (160.88bp,232.64bp) .. (Node{value=7+ rank=2});
\draw (166.6bp,235.5bp) node {1};
\draw [->] (Node{value=1+ rank=1}) ..controls (78.671bp,162.75bp) and (61.893bp,143.74bp) .. (Node{value=0+ rank=0});
\draw (75.597bp,148.5bp) node {1};
\draw [->] (Node{value=1+ rank=1}) ..controls (103.6bp,162.16bp) and (103.6bp,146.55bp) .. (Node{value=2+ rank=0});
\draw (108.6bp,148.5bp) node {1};
\draw [->] (Node{value=7+ rank=2}) ..controls (178.6bp,162.16bp) and (178.6bp,146.55bp) .. (Node{value=5+ rank=1});
\draw (183.6bp,148.5bp) node {1};
\draw [->] (Node{value=7+ rank=2}) ..controls (203.52bp,162.75bp) and (220.3bp,143.74bp) .. (Node{value=8+ rank=1});
\draw (224.6bp,148.5bp) node {1};
\draw [->] (Node{value=5+ rank=1}) ..controls (153.67bp,75.75bp) and (136.89bp,56.735bp) .. (Node{value=4+ rank=0});
\draw (149.6bp,61.5bp) node {1};
\draw [->] (Node{value=5+ rank=1}) ..controls (178.6bp,75.163bp) and (178.6bp,59.548bp) .. (Node{value=6+ rank=0});
\draw (183.6bp,61.5bp) node {1};
\draw [->] (Node{value=8+ rank=1}) ..controls (253.6bp,75.163bp) and (253.6bp,59.548bp) .. (Node{value=9+ rank=0});
\draw (258.6bp,61.5bp) node {1};
%
\end{tikzpicture}
\caption{Example of the AVL tree using ranks.}
\label{fig:ranked:avl}
\end{figure}
\subsection{Red-black tree}\label{chap:rb-rule}
\textbf{Red-Black Rule}: All rank differences are 0 or 1, and no parent of a 0-child is a 0-child.~\cite{wavl}
In the case of red-black trees, rank represents a number of black nodes on a path from the node to a leaf (excluding the node itself). Based on that, we can discuss the \textit{Red-Black Rule} in detail:
\begin{itemize}
\item \textit{All rank differences are 0 or 1} inductively enforces a monotonically non-decreasing (increases at most by 1) count of black nodes from the leaves. In detail:
\begin{itemize}
\item In case the \textbf{current node is black}, the rank difference must be 1, since we have one more black node on the path from the parent to the leaves than from the current node.
\item In case the \textbf{current node is red}, the rank difference must be 0, since from the parent, the count of black nodes on the path to leaves has not changed.
\item And finally, all other differences are invalid since adding a node to the beginning of a path to the leaf, we can either add red node (0-child) or black node (1-child), i.e. there is one more black node on the path or not, which implies the change can be only 0 or 1.
\end{itemize}
\item \textit{No parent of a 0-child is a 0-child} ensures that there are no two consecutive red nodes, since the 0-child node is equivalent to the red node.
\end{itemize}
An example of the red-black tree that uses ranks instead of colours can be seen in \autoref{fig:ranked:rbt}. Red nodes are also coloured for convenience.
Majority of the red-black tree implementations colour nodes of the tree, following that notation and \textbf{precise} definition of the red-black tree it is pretty common to ask the following questions:
\begin{enumerate}
\item \textit{Do we count the node itself if it is black?} \\
If we do not count nodes themselves, we decrease the count of black nodes on every path to the external nodes by $1$.
\item \textit{Do we count the external nodes (leaves that do not hold any value)?} \\
If we do not count the external nodes themselves, we decrease the count of black nodes on every path to the external nodes by $1$.
\end{enumerate}
Overall they do not matter, as long as they are used consistently, since they affect the counts globally.
However, it is also possible to colour edges instead of the nodes as is presented in \textit{Průvodce labyrintem algoritmů} by \textit{Mareš and Valla}.~\cite{labyrint} In this representation colour of the edge represents the colour of the child node. This representation is much more „natural“ for the representation using rank as it can be seen in \autoref{fig:ranked:rbt}, where edges connecting nodes with rank-difference $1$ represent \textit{black edges} and edges connecting nodes with rank-difference $0$ represent \textit{red edges}. It is also apparent that using this representation root of the tree does not hold any colour anymore.
\begin{figure}
\centering
\begin{tikzpicture}[>=latex',line join=bevel,scale=0.75,]
%%
\node (Node{value=3+ rank=2}) at (140.6bp,366.0bp) [draw,ellipse] {3, 2};
%%
\node (Node{value=3+ rank=2}) at (140.6bp,366.0bp) [draw,ellipse] {3, 2};
\node (Node{value=1+ rank=1}) at (103.6bp,279.0bp) [draw,ellipse] {1, 1};
\node (Node{value=5+ rank=1}) at (178.6bp,279.0bp) [draw,ellipse] {5, 1};
\node (Node{value=0+ rank=0}) at (28.597bp,192.0bp) [draw,ellipse] {0, 0};
\node (Node{value=2+ rank=0}) at (103.6bp,192.0bp) [draw,ellipse] {2, 0};
\node (Node{value=4+ rank=0}) at (178.6bp,192.0bp) [draw,ellipse] {4, 0};
\node (Node{value=7+ rank=1}) at (253.6bp,192.0bp) [draw=red,ellipse] {7, 1};
\node (Node{value=6+ rank=0}) at (226.6bp,105.0bp) [draw,ellipse] {6, 0};
\node (Node{value=8+ rank=0}) at (301.6bp,105.0bp) [draw,ellipse] {8, 0};
\node (Node{value=9+ rank=0}) at (301.6bp,18.0bp) [draw=red,ellipse] {9, 0};
\draw [->] (Node{value=3+ rank=2}) ..controls (128.03bp,336.14bp) and (120.85bp,319.64bp) .. (Node{value=1+ rank=1});
\definecolor{strokecol}{rgb}{0.0,0.0,0.0};
\pgfsetstrokecolor{strokecol}
\draw (129.6bp,322.5bp) node {1};
\draw [->] (Node{value=3+ rank=2}) ..controls (153.5bp,336.14bp) and (160.88bp,319.64bp) .. (Node{value=5+ rank=1});
\draw (167.6bp,322.5bp) node {1};
\draw [->] (Node{value=1+ rank=1}) ..controls (78.671bp,249.75bp) and (61.893bp,230.74bp) .. (Node{value=0+ rank=0});
\draw (75.597bp,235.5bp) node {1};
\draw [->] (Node{value=1+ rank=1}) ..controls (103.6bp,249.16bp) and (103.6bp,233.55bp) .. (Node{value=2+ rank=0});
\draw (108.6bp,235.5bp) node {1};
\draw [->] (Node{value=7+ rank=1}) ..controls (244.49bp,162.33bp) and (239.36bp,146.17bp) .. (Node{value=6+ rank=0});
\draw (246.6bp,148.5bp) node {1};
\draw [->] (Node{value=7+ rank=1}) ..controls (269.82bp,162.26bp) and (279.51bp,145.12bp) .. (Node{value=8+ rank=0});
\draw (284.6bp,148.5bp) node {1};
\draw [->] (Node{value=5+ rank=1}) ..controls (178.6bp,249.16bp) and (178.6bp,233.55bp) .. (Node{value=4+ rank=0});
\draw (183.6bp,235.5bp) node {1};
\draw [red,->] (Node{value=5+ rank=1}) ..controls (203.52bp,249.75bp) and (220.3bp,230.74bp) .. (Node{value=7+ rank=1});
\draw (225.6bp,235.5bp) node {0};
\draw [red,->] (Node{value=8+ rank=0}) ..controls (301.6bp,75.163bp) and (301.6bp,59.548bp) .. (Node{value=9+ rank=0});
\draw (306.6bp,61.5bp) node {0};
%
\end{tikzpicture}
\caption{Example of the red-black tree using ranks.}
\label{fig:ranked:rbt}
\end{figure}
\section{Implementation of other balanced trees using rank}
To show that using rank is mostly an implementation detail, we will describe an implementation of the AVL tree using rank.
Implementation of the insertion is trivial, since it is described by \textit{Haeupler et al.}~\cite{wavl} and is used in the WAVL tree. All we need to implement is the deletion from the AVL tree. We will start with a short description of the deletion rebalance as given by \textit{Mareš and Valla} in \textit{Průvodce labyrintem algoritmů}~\cite{labyrint}.
When propagating the error, we can encounter 3 cases (we explain them for propagating deletion from the left subtree, propagation from right is mirrored, and role of trits $+$ and $-$ swaps)~\cite{labyrint}:
\begin{enumerate}
\item \textit{Node was marked with $-$.} In this case, heights of left and right subtrees are equal now, and node is marked with $0$, but propagation must be continued since the height of the whole subtree has changed.\label{avl:rules:delete:1}
\item \textit{Node was marked with $0$.} In this case, the node is marked with $+$, and the height of the subtree has not changed; therefore, we can stop the propagation.\label{avl:rules:delete:2}
\item \textit{Node was marked with $+$.} In this case, the node would acquire a balance factor of $+2$, which is not allowed. In this situation, we decide based on the mark of the node from which we are propagating the deletion in the following way (let $x$, the current node marked with $+$ and $y$, be the right child of $x$):\label{avl:rules:delete:3}
\begin{enumerate}
\item If $y$ is marked with $+$, we rotate by $x$ to the left. After that, both $x$ and $y$ can be marked with $0$. Height from the point of the parent has changed, so we continue the propagation.\label{avl:rules:delete:3a}
\item If $y$ is marked with $0$, we rotate by $x$ to the left. After the rotation, $x$ can be marked with $+$ and $y$ with $-$. The height of the subtree has not changed, and propagation can be stopped.\label{avl:rules:delete:3b}
\item $y$ is marked with $-$. Let $z$ be the left son of $y$. We double rotate by $z$ to the right and then by $x$ to the left. After the double-rotation, $x$ can be marked by either $0$ or $-$, $y$ by $0$ or $+$ and $z$ gets $0$. The height of the subtree has changed; therefore, we must propagate further.\label{avl:rules:delete:3c}
\end{enumerate}
\end{enumerate}\label{avl:rules:delete}
Knowing rules, we have implemented the deletion rebalance by implementing the following functions:
\begin{enumerate}
\item \avlDeleteRebalance{} that handles updating the current node and its parent and iteratively calls subroutine handling previously described as \textit{one step of a rebalancing}.
\item \avlDeleteFixNode{} that handles one adjustment of rebalancing as described above.
\item \avlDeleteRotate{} that handles rotation and updating of ranks, if necessary.
\end{enumerate}
\begin{algorithm}
\Proc{$\texttt{deleteRebalance}(T, y, parent)$}{
\If{$y = nil \land parent = nil$}{
\Return;
}
\BlankLine
\If{$y = nil$}{
$(y, parent) \gets (parent, parent.parent)$\;
}
\BlankLine
\While{$y \neq nil \land \avlDeleteFixNode(T, y, parent)$}{
$y \gets parent$\;
\eIf{$parent \neq nil$}{
$parent \gets parent.parent$\;
}{
$parent \gets nil$\;
}
}
}
\caption{\texttt{deleteRebalance} algorithm for the AVL tree.}\label{algorithm:avl:deleteRebalance}
\end{algorithm}
\texttt{deleteRebalance}, as can be seen in \autoref{algorithm:avl:deleteRebalance}, is relatively straightforward. In the beginning, we early return in case there is nothing to be rebalanced, which happens when deleting the last node from the tree. Then we handle a case where we are given only parent by correctly setting $y$ and $parent$. Following up on that, as long as we have a node to be checked, we call \autoref{algorithm:avl:deleteFixNode} to fix the balancing of the current node. The algorithm for fixing a node returns $true$ or $false$ depending on the need to propagate the height change further, which is utilized in the \texttt{while}-loop condition.
\begin{algorithm}
\Proc{$\texttt{deleteFixNode}(T, x, parent)$}{
\uIf(\tcp*[h]{Handles \hyperref[avl:rules:delete:1]{rule 1}}){balance-factor of $x$ is $0$}{
update rank of $x$\;
\Return{$true$}\;
}
\ElseIf(\tcp*[h]{Handles \hyperref[avl:rules:delete:2]{rule 2}}){balance-factor of $x$ is $-1$ or $1$}{
\Return{$false$}\;
}
\BlankLine
$(l, r) \gets (x.left, x.right)$\;
$(rotateL, rotateR) \gets (\texttt{rotateLeft}, \texttt{rotateRight})$\;
\BlankLine
\tcp{Handles \hyperref[avl:rules:delete:3]{rule 3}}
\eIf{balance-factor of $x$ is $2$}{
\Return{$\avlDeleteRotate(T, x, r, 1, rotateL, rotateR)$}\;
}{
\Return{$\avlDeleteRotate(T, x, l, -1, rotateR, rotateL)$}\;
}
}
\caption{\texttt{deleteFixNode} algorithm for the AVL tree.}\label{algorithm:avl:deleteFixNode}
\end{algorithm}
\texttt{deleteFixNode} implements the algorithm described in \hyperref[avl:rules:delete]{the list} with all possible cases above. We start by checking the balance factor of the given node. In case there is no need to rotate, the rank gets updated if necessary, and then we return the information on whether there is a need to propagate further or not. If the node has acquired a balance factor of $2$, we call \autoref{algorithm:avl:deleteRotate} to fix the balancing locally.
There are two operations that are not described using helper functions, and they are done in the following way:
\begin{itemize}
\item Balance-factor of a node $x$ is calculated as \[ rank(x.right) - rank(x.left) \]
\item Updating rank of a node $x$ is done by setting node's rank to \[ 1 + \max \{ rank(x.left), rank(x.right) \} \]
\end{itemize}
\begin{algorithm}
\Proc{$\texttt{deleteRotate}(T, x, y, leaning, rotateL, rotateR)$}{
$f \gets $ balance-factor of $y$\;
\BlankLine
\uIf(\tcp*[h]{Handles rules \hyperref[avl:rules:delete:3a]{3a} \& \hyperref[avl:rules:delete:3b]{3b}}){$f = 0 \lor f = leaning$}{
$rotateL(T, x)$\;
}
\Else(\tcp*[h]{Handles \hyperref[avl:rules:delete:3c]{rule 3c}}){
$rotateR(T, y)$\;
$rotateL(T, x)$\;
}
\BlankLine
update ranks of $x$, $y$ and new root of the subtree\;
\BlankLine
\Return{$f \neq 0$}\;
}
\caption{\texttt{deleteRotate} algorithm for the AVL tree.}\label{algorithm:avl:deleteRotate}
\end{algorithm}
\newpage
\texttt{deleteRotate} is handling only fixes where the rotations are required. Both \autoref{algorithm:avl:deleteFixNode} and \autoref{algorithm:avl:deleteRotate} include comments to highlight which rules are handled. This function is also done generically regardless of the subtree from which the height change is being propagated. This is done by passing in functions used for rotations (since it is mirrored) and passing in the balance factor required for just one rotation.
There is a crucial difference between \autoref{algorithm:avl:deleteFixNode} and \autoref{algorithm:avl:deleteRotate} compared to the AVL tree implementations without ranks. During propagation of the height change, the balance factors of the closest nodes are already adjusted. That is caused by calculating the balance factor based on the subtrees, not the node's rank. This fact needs to be reflected in the implementation accordingly. It changes the meaning of the rules described above since they are written for the implementations that directly store the trit in the nodes and update it manually during rebalancing.
It is rare for standard algorithms for the AVL tree to work with a balance factor. The reasoning is straightforward. The balance factor is calculated using the heights of the subtrees, which is not a cheap operation without any additional information. However, when rank represents the height of the subtree, we get that information just at the cost of keeping it correct. Every time we access the balance factor in our delete algorithm, it is done in $\mathcal{O}(1)$ as opposed to the need to calculate the heights.