This repository has been archived on 2022-05-18. You can view files and clone it, but cannot push or open issues or pull requests.
thesis/rank_balanced_trees.tex

252 lines
18 KiB
TeX
Raw Normal View History

\chapter{Rank-Balanced Trees}\label{chap:rank-balanced-trees}
In comparison to nodes in binary search trees, nodes in rank-balanced trees contain one more piece of information, and that is \textit{rank}. Each type of tree that can be implemented using this representation, e.g. red-black, 2-3-4, AVL or WAVL, has a specific set of rules that ensure the resulting tree is balanced.
\section{Terminology related to rank-balanced trees}
In the text and pseudocode we adopt these functions or properties~\cite{wavl}:
\begin{itemize}
\item function $r(x)$ or property $x.rank$ that returns rank of a node; in case of $r(x)$ there is a special case: $r(nil) = -1$
\item function $parent(x)$ or property $x.parent$ returns parent of a node; analogically for the left and right children of a node
\item \textit{rank-difference} of \textit{x} is defined as $r(parent(x)) - r(x)$
\item $x$ is an \textit{i-child} if its rank-difference is $i$
\item $x$ is an $(i, j)$-node if its left and right children have $i$ and $j$ rank-differences respectively; ordering of the children does not matter
\end{itemize}
\section{Rules for other trees}
As we have mentioned at the beginning of \hyperref[chap:rank-balanced-trees]{this chapter}, it is possible to implement different kinds of self-balancing binary search trees via different rules for ranks.
\subsection{AVL tree}\label{chap:avl-rule}
\textbf{AVL Rule}: Every node is (1, 1) or (1, 2).~\cite{wavl}
In case of the AVL tree, rank represents height. Here we can notice a very smart way of using the \textit{(i, j)-node} definition. If we go back to the definition and want to be explicit about the nodes that are allowed with the \textit{AVL Rule}, then we get (1, 1), (1, 2) \textbf{or} (2, 1) nodes. However it is possible to find implementations of the AVL tree that allow leaning \textbf{to only one side} as opposed to the original requirements given by \textit{Adelson-Velsky and Landis}~\cite{avl}. Forbidding interchangeability of (i, j) with (j, i)-nodes would still yield AVL trees that lean to one side.
Meaning of the \textit{AVL Rule} is quite simple, since rank represents height in that case. We can draw analogies using the notation used for the AVL trees, where we mark nodes with a trit (or a sign) or use a balance-factor. We have two cases to discuss:
\begin{itemize}
\item \textbf{(1, 1) node} represents a tree where both of its subtrees have the same height. In this case we are talking about the nodes with balance-factor $0$ (respectively being signed with a $0$).
\item \textbf{(1, 2) node} represents a tree where one of its subtrees has a bigger height. In this case we are talking about the nodes with balance-factor $-1$ or $1$ (respectively being signed with a $-$ or a $+$).
\end{itemize}
Example of the AVL tree that uses ranks instead of signs or balance-factors can be seen in \autoref{fig:ranked:avl}.
\begin{figure}
\centering
\begin{tikzpicture}[>=latex',line join=bevel,scale=0.75,]
%%
\node (Node{value=3+ rank=3}) at (140.6bp,279.0bp) [draw,ellipse] {3, 3};
\node (Node{value=1+ rank=1}) at (103.6bp,192.0bp) [draw,ellipse] {1, 1};
\node (Node{value=7+ rank=2}) at (178.6bp,192.0bp) [draw,ellipse] {7, 2};
\node (Node{value=0+ rank=0}) at (28.597bp,105.0bp) [draw,ellipse] {0, 0};
\node (Node{value=2+ rank=0}) at (103.6bp,105.0bp) [draw,ellipse] {2, 0};
\node (Node{value=5+ rank=1}) at (178.6bp,105.0bp) [draw,ellipse] {5, 1};
\node (Node{value=8+ rank=1}) at (253.6bp,105.0bp) [draw,ellipse] {8, 1};
\node (Node{value=4+ rank=0}) at (103.6bp,18.0bp) [draw,ellipse] {4, 0};
\node (Node{value=6+ rank=0}) at (178.6bp,18.0bp) [draw,ellipse] {6, 0};
\node (Node{value=9+ rank=0}) at (253.6bp,18.0bp) [draw,ellipse] {9, 0};
\draw [->] (Node{value=3+ rank=3}) ..controls (128.03bp,249.14bp) and (120.85bp,232.64bp) .. (Node{value=1+ rank=1});
\definecolor{strokecol}{rgb}{0.0,0.0,0.0};
\pgfsetstrokecolor{strokecol}
\draw (128.6bp,235.5bp) node {2};
\draw [->] (Node{value=3+ rank=3}) ..controls (153.5bp,249.14bp) and (160.88bp,232.64bp) .. (Node{value=7+ rank=2});
\draw (166.6bp,235.5bp) node {1};
\draw [->] (Node{value=1+ rank=1}) ..controls (78.671bp,162.75bp) and (61.893bp,143.74bp) .. (Node{value=0+ rank=0});
\draw (75.597bp,148.5bp) node {1};
\draw [->] (Node{value=1+ rank=1}) ..controls (103.6bp,162.16bp) and (103.6bp,146.55bp) .. (Node{value=2+ rank=0});
\draw (108.6bp,148.5bp) node {1};
\draw [->] (Node{value=7+ rank=2}) ..controls (178.6bp,162.16bp) and (178.6bp,146.55bp) .. (Node{value=5+ rank=1});
\draw (183.6bp,148.5bp) node {1};
\draw [->] (Node{value=7+ rank=2}) ..controls (203.52bp,162.75bp) and (220.3bp,143.74bp) .. (Node{value=8+ rank=1});
\draw (224.6bp,148.5bp) node {1};
\draw [->] (Node{value=5+ rank=1}) ..controls (153.67bp,75.75bp) and (136.89bp,56.735bp) .. (Node{value=4+ rank=0});
\draw (149.6bp,61.5bp) node {1};
\draw [->] (Node{value=5+ rank=1}) ..controls (178.6bp,75.163bp) and (178.6bp,59.548bp) .. (Node{value=6+ rank=0});
\draw (183.6bp,61.5bp) node {1};
\draw [->] (Node{value=8+ rank=1}) ..controls (253.6bp,75.163bp) and (253.6bp,59.548bp) .. (Node{value=9+ rank=0});
\draw (258.6bp,61.5bp) node {1};
%
\end{tikzpicture}
\caption{Example of the AVL tree using ranks.}
\label{fig:ranked:avl}
\end{figure}
\subsection{Red-black tree}\label{chap:rb-rule}
\textbf{Red-Black Rule}: All rank differences are 0 or 1, and no parent of a 0-child is a 0-child.~\cite{wavl}
In case of red-black trees, rank represents number of black nodes on a path from the node to a leaf (excluding the node itself). Based on that we can discuss the \textit{Red-Black Rule} in detail:
\begin{enumerate}
\item \textit{All rank differences are 0 or 1} inductively enforces monotonically non-decreasing (at most by 1) count of black nodes from the leaves. In detail:
\begin{enumerate}
\item In case the \textbf{current node is black}, the rank difference must be 1, since we have one more black node on the path from the parent to the leaves than from the current node.
\item In case the \textbf{current node is red}, the rank difference must be 0, since from the parent the count of black nodes on the path to leaves has not changed.
\item And finally all other differences are invalid, since by adding a node to the beginning of a path to the leaf we can either add red node (0-child) or black node (1-child), i.e. there is one more black node on the path or not which implies the change can be only 0 or 1.
\end{enumerate}
\item \textit{No parent of a 0-child is a 0-child} ensures that there are no two consecutive red nodes, since 0-child node is equivalent to the red node.
\end{enumerate}
Example of the red-black tree that uses ranks instead of colors can be seen in \autoref{fig:ranked:rbt}, red nodes are also colored for the convenience.
Majority of the red-black tree implementations color nodes of the tree, following that notation and \textbf{precise} definition of the red-black tree it is quite common to ask the following questions:
\begin{enumerate}
\item \textit{Do we count the node itself if it is black?} \\
If we do not count nodes themselves, we decrease the count of black nodes on every path to the external nodes by $1$.
\item \textit{Do we count the external nodes (leaves that do not hold any value)?} \\
If we do not count external nodes themselves, we decrease the count of black nodes on every path to the external nodes by $1$.
\end{enumerate}
Overall they do not really matter, as long as they are used consistently, since they affect the counts globally.
However it is also possible to color edges instead of the nodes as is presented in \textit{Průvodce labyrintem algoritmů} by \textit{Mareš and Valla}.~\cite{labyrint} In this representation color of the edge represents color of the child node. This representation is much more „natural“ for the representation using rank as it can be seen in \autoref{fig:ranked:rbt}, where edges connecting nodes with rank-difference $1$ represent \textit{black edges} and edges connecting nodes with rank-difference $0$ represent \textit{red edges}. It is also apparent that using this representation root of the tree does not hold any color anymore.
\begin{figure}
\centering
\begin{tikzpicture}[>=latex',line join=bevel,scale=0.75,]
%%
\node (Node{value=3+ rank=2}) at (140.6bp,366.0bp) [draw,ellipse] {3, 2};
%%
\node (Node{value=3+ rank=2}) at (140.6bp,366.0bp) [draw,ellipse] {3, 2};
\node (Node{value=1+ rank=1}) at (103.6bp,279.0bp) [draw,ellipse] {1, 1};
\node (Node{value=5+ rank=1}) at (178.6bp,279.0bp) [draw,ellipse] {5, 1};
\node (Node{value=0+ rank=0}) at (28.597bp,192.0bp) [draw,ellipse] {0, 0};
\node (Node{value=2+ rank=0}) at (103.6bp,192.0bp) [draw,ellipse] {2, 0};
\node (Node{value=4+ rank=0}) at (178.6bp,192.0bp) [draw,ellipse] {4, 0};
\node (Node{value=7+ rank=1}) at (253.6bp,192.0bp) [draw=red,ellipse] {7, 1};
\node (Node{value=6+ rank=0}) at (226.6bp,105.0bp) [draw,ellipse] {6, 0};
\node (Node{value=8+ rank=0}) at (301.6bp,105.0bp) [draw,ellipse] {8, 0};
\node (Node{value=9+ rank=0}) at (301.6bp,18.0bp) [draw=red,ellipse] {9, 0};
\draw [->] (Node{value=3+ rank=2}) ..controls (128.03bp,336.14bp) and (120.85bp,319.64bp) .. (Node{value=1+ rank=1});
\definecolor{strokecol}{rgb}{0.0,0.0,0.0};
\pgfsetstrokecolor{strokecol}
\draw (129.6bp,322.5bp) node {1};
\draw [->] (Node{value=3+ rank=2}) ..controls (153.5bp,336.14bp) and (160.88bp,319.64bp) .. (Node{value=5+ rank=1});
\draw (167.6bp,322.5bp) node {1};
\draw [->] (Node{value=1+ rank=1}) ..controls (78.671bp,249.75bp) and (61.893bp,230.74bp) .. (Node{value=0+ rank=0});
\draw (75.597bp,235.5bp) node {1};
\draw [->] (Node{value=1+ rank=1}) ..controls (103.6bp,249.16bp) and (103.6bp,233.55bp) .. (Node{value=2+ rank=0});
\draw (108.6bp,235.5bp) node {1};
\draw [->] (Node{value=7+ rank=1}) ..controls (244.49bp,162.33bp) and (239.36bp,146.17bp) .. (Node{value=6+ rank=0});
\draw (246.6bp,148.5bp) node {1};
\draw [->] (Node{value=7+ rank=1}) ..controls (269.82bp,162.26bp) and (279.51bp,145.12bp) .. (Node{value=8+ rank=0});
\draw (284.6bp,148.5bp) node {1};
\draw [->] (Node{value=5+ rank=1}) ..controls (178.6bp,249.16bp) and (178.6bp,233.55bp) .. (Node{value=4+ rank=0});
\draw (183.6bp,235.5bp) node {1};
\draw [red,->] (Node{value=5+ rank=1}) ..controls (203.52bp,249.75bp) and (220.3bp,230.74bp) .. (Node{value=7+ rank=1});
\draw (225.6bp,235.5bp) node {0};
\draw [red,->] (Node{value=8+ rank=0}) ..controls (301.6bp,75.163bp) and (301.6bp,59.548bp) .. (Node{value=9+ rank=0});
\draw (306.6bp,61.5bp) node {0};
%
\end{tikzpicture}
\caption{Example of the red-black tree using ranks.}
\label{fig:ranked:rbt}
\end{figure}
\section{Implementation of other balanced trees using rank}
To show that using rank is mostly an implementation detail, we will describe an implementation of the AVL tree using rank.
Implementation of the insertion is trivial, since it is described by \textit{Haeupler et al.}~\cite{wavl} and is used in the WAVL tree. All we need to implement is the deletion from the AVL tree. We will start with a short description of the deletion rebalance as given by \textit{Mareš and Valla} in \textit{Průvodce labyrintem algoritmů}.
When propagating the error, we can encounter 3 cases (we explain them with respect to propagating deletion from the left subtree, propagation from right is mirrored and role of trits $+$ and $-$ swaps)~\cite{labyrint}:
\begin{enumerate}
\item \textit{Node was marked with $-$.} In this case, heights of left and right subtrees are equal now and node is marked with $0$, but propagation must be continued, since the height of the whole subtree has changed.\label{avl:rules:delete:1}
\item \textit{Node was marked with $0$.} In this case, node is marked with $+$ and the height of the subtree has not changed, therefore we can stop the propagation.\label{avl:rules:delete:2}
\item \textit{Node was marked with $+$.} In this case, node would acquire balance-factor of $+2$, which is not allowed. In this situation we decide based on the mark of the node from which we are propagating the deletion in the following way (let $x$ the current node marked with $+$ and $y$ be the right child of $x$):\label{avl:rules:delete:3}
\begin{enumerate}
\item If $y$ is marked with $+$, then we rotate by $x$ to the left. After that both $x$ and $y$ can be marked with $0$. Height from the point of the parent has changed, so we continue the propagation.\label{avl:rules:delete:3a}
\item If $y$ is marked with $0$, then we rotate by $x$ to the left. After the rotation, $x$ can be marked with $+$ and $y$ with $-$. Height of the subtree has not changed, so propagation can be stopped.\label{avl:rules:delete:3b}
\item $y$ is marked with $-$. Let $z$ be the left son of $y$. We double rotate: first by $z$ to the right and then by $x$ to the left. After the double-rotation $x$ can be marked by either $0$ or $-$, $y$ by $0$ or $+$ and $z$ gets $0$. Height of the subtree has changed, therefore we must propagate further.\label{avl:rules:delete:3c}
\end{enumerate}
\end{enumerate}\label{avl:rules:delete}
Having knowledge about rules we have implemented the deletion rebalance by implementing the following functions:
\begin{enumerate}
\item \avlDeleteRebalance{} that handles updating the current node and its parent and iteratively calls subroutine handling previously described \textit{one step of a rebalancing}.
\item \avlDeleteFixNode{} that handles one adjustment of rebalancing as described above.
\item \avlDeleteRotate{} that handles rotation and updating of ranks, if necessary.
\end{enumerate}
\begin{algorithm}
\Proc{$\texttt{deleteRebalance}(T, y, parent)$}{
\If{$y = nil \land parent = nil$}{
\Return;
}
\BlankLine
\If{$y = nil$}{
$(y, parent) \gets (parent, parent.parent)$\;
}
\BlankLine
\While{$y \neq nil \land \avlDeleteFixNode(T, y, parent)$}{
$y \gets parent$\;
\eIf{$parent \neq nil$}{
$parent \gets parent.parent$\;
}{
$parent \gets nil$\;
}
}
}
\caption{\texttt{deleteRebalance} algorithm for the AVL tree}\label{algorithm:avl:deleteRebalance}
\end{algorithm}
\texttt{deleteRebalance}, as can be seen in \autoref{algorithm:avl:deleteRebalance}, is quite straightforward. At the beginning we early return in case there is nothing to be rebalanced, which happens when deleting the last node from the tree. Then we handle a case where we are given only parent by correctly setting $y$ and $parent$. Following up on that, as long as we have a node to be checked, we call \autoref{algorithm:avl:deleteFixNode} to fix balancing of the current node. Algorithm for fixing node returns $true$ or $false$ depending on the need to propagate the height change further, which is utilized in the condition of the \texttt{while} loop.
\begin{algorithm}
\Proc{$\texttt{deleteFixNode}(T, x, parent)$}{
\uIf(\tcp*[h]{Handles \hyperref[avl:rules:delete:1]{rule 1}}){balance-factor of $x$ is $0$}{
update rank of $x$\;
\Return{$true$}\;
}
\ElseIf(\tcp*[h]{Handles \hyperref[avl:rules:delete:2]{rule 2}}){balance-factor of $x$ is $-1$ or $1$}{
\Return{$false$}\;
}
\BlankLine
$(l, r) \gets (x.left, x.right)$\;
$(rotateL, rotateR) \gets (\texttt{rotateLeft}, \texttt{rotateRight})$\;
\BlankLine
\tcp{Handles \hyperref[avl:rules:delete:3]{rule 3}}
\eIf{balance-factor of $x$ is $2$}{
\Return{$\avlDeleteRotate(T, x, r, 1, rotateL, rotateR)$}\;
}{
\Return{$\avlDeleteRotate(T, x, l, -1, rotateR, rotateL)$}\;
}
}
\caption{\texttt{deleteFixNode} algorithm for the AVL tree}\label{algorithm:avl:deleteFixNode}
\end{algorithm}
\texttt{deleteFixNode} implements the algorithm as described in \hyperref[avl:rules:delete]{the list} with all possible cases above. We start by checking the balance-factor of the given node, in case there is no need to rotate, the rank gets updated if necessary and then we return the information whether there is a need to propagate further or not. In case the node has acquired balance-factor of $2$ we call \autoref{algorithm:avl:deleteRotate} to fix the balancing locally.
There are two operations that are not described using helper functions and they are done in a following way:
\begin{itemize}
\item Balance-factor of a node $x$ is calculated as \[ rank(x.right) - rank(x.left) \]
\item Updating rank of a node $x$ is done by setting node's rank to \[ 1 + \max \{ rank(x.left), rank(x.right) \} \]
\end{itemize}
\begin{algorithm}
\Proc{$\texttt{deleteRotate}(T, x, y, leaning, rotateL, rotateR)$}{
$f \gets $ balance-factor of $y$\;
\BlankLine
\uIf(\tcp*[h]{Handles rules \hyperref[avl:rules:delete:3a]{3a} \& \hyperref[avl:rules:delete:3b]{3b}}){$f = 0 \lor f = leaning$}{
$rotateL(T, x)$\;
}
\Else(\tcp*[h]{Handles \hyperref[avl:rules:delete:3c]{rule 3c}}){
$rotateR(T, y)$\;
$rotateL(T, x)$\;
}
\BlankLine
update ranks of $x$, $y$ and new root of the subtree\;
\BlankLine
\Return{$f \neq 0$}\;
}
\caption{\texttt{deleteRotate} algorithm for the AVL tree}\label{algorithm:avl:deleteRotate}
\end{algorithm}
\newpage
\texttt{deleteRotate} is handling only fixes where the rotations are required. Both \autoref{algorithm:avl:deleteFixNode} and \autoref{algorithm:avl:deleteRotate} include comments to highlight which rules are handled. This function is also done generically regardless of the subtree from which the height change is being propagated. This is done by passing in functions used for rotations (since it is mirrored) and also by passing in the balance-factor required for just one rotation.
In both \autoref{algorithm:avl:deleteFixNode} and \autoref{algorithm:avl:deleteRotate} there is a key difference compared to the AVL tree implementations without ranks. Comparing the \hyperref[avl:rules:delete]{rules for deletion} with algorithms for rank-balanced implementation, it is apparent that during propagation of height change, the balance-factors of immediate nodes are already adjusted, since the information comes from either of its subtrees and it is calculated using ranks of its children that are already adjusted. This fact needs to be reflected in the implementation accordingly, since it shifts the meaning of rules as they are described above and written for the implementations that store the trit in the nodes directly, which is updated manually during rebalancing.