This repository has been archived on 2022-05-18. You can view files and clone it, but cannot push or open issues or pull requests.
thesis/wavl_trees.tex
Matej Focko ecccb2b10e
fix: typos
Signed-off-by: Matej Focko <mfocko@redhat.com>
2022-05-18 23:34:27 +02:00

358 lines
19 KiB
TeX

\chapter{Weak AVL Trees}
\section{Rank rule}
Based on the rank rules for implementing red-black tree (as described in \ref{chap:rb-rule}) and AVL tree (as described in \ref{chap:avl-rule}), \textit{Haeupler et al.}~\cite{wavl} present a new rank rule:
\textbf{Weak AVL Rule}: All rank differences are 1 or 2, and every leaf has rank 0.~\cite{wavl}
Comparing the \textit{Weak AVL Rule} to the \textit{AVL Rule}, we can come to these conclusions:
\begin{itemize}
\item \textit{Every leaf has rank 0} holds with the AVL Rule since every node is (1,~1) or (1,~2), and the rank of a node represents the height of its tree. The rank of \textit{nil} is defined as $-1$, and the height of the tree rooted at the leaf is $0$; therefore, leaves are (1,~1)-nodes.
\item \textit{All rank differences are 1 or 2} does not hold in one specific case, and that is (2,~2)-node, which is allowed in the WAVL tree, but not in the AVL tree. This difference will be explained more thoroughly later on.
\end{itemize}
\section{Height boundaries}
We have described in \autoref{chap:sb-bst} common self-balanced binary search trees to be able to draw analogies and explain the differences between them. Given the boundaries of height for the red-black and AVL trees, we can safely assume that the AVL is more strict regarding the self-balancing than the red-black tree. Let us show how does WAVL fit among them. \textit{Haeupler et al.}~\cite{wavl} present the following bounds~\cite{wavl}:
\begin{equation}
h \leq k \leq 2h \text{ and } k \leq 2 \log_2{n}
\end{equation}
In those equations, we can see $h$ and $n$ in the same context as we used to lay boundaries for the AVL and red-black trees, but we can also see a new variable, $k$, which represents the rank of the tree.
One of the core differences between AVL and WAVL lies in the rebalancing after deletion. Insertion into the WAVL tree is realized in the same way as it would in the AVL tree, and the benefit of the (2,~2)-node is used during deletion rebalancing.
From the previous 2 statements, we can come to 2 conclusions, and those are:
\begin{itemize}
\item If we commit only insertions to the WAVL tree, it will always yield a valid AVL tree. In that case, the height boundaries are the same as of the AVL tree (described in \autoref{avl-height}).
\item If we also commit deletions, we can assume the worst-case scenario where \[ h \leq 2 \log_2{n} \]
This scenario is close to the upper bound of the height for the red-black trees (described in \autoref{rb-height}).
\end{itemize}
From the two conclusions, we can safely deduce that the WAVL tree is in the worst-case scenario as efficient as the red-black tree and the best-case scenario as efficient as the AVL tree.
\section{Insertion into the weak AVL tree}
Inserting values into the WAVL tree is equivalent to inserting values into a regular binary search tree, followed by rebalancing that ensures rank rules hold. This part can be seen in \autoref{algorithm:wavl:insert}. We can also see there are two early returns. One of them happens during insertion into the empty tree and the other during insertion of a duplicate key, which we do not allow.
\begin{algorithm}
\Proc{$\texttt{insert}(T, key)$}{
$insertedNode \gets Node(key)$\;
\If{$T.root = nil$}{
$T.root \gets insertedNode$\;
\Return\;
}
\BlankLine
$parent \gets \findParentNode(key, T.root)$\;
\If{$parent = nil$}{
\Return\;
}
$insertedNode.parent \gets parent$\;
\BlankLine
\eIf{$key < parent.key$}{
$parent.left \gets insertedNode$\;
}{
$parent.right \gets insertedNode$\;
}
\BlankLine
$\wavlInsertRebalance(T, insertedNode)$\;
}
\caption{Insert operation on binary search tree.}\label{algorithm:wavl:insert}
\end{algorithm}
In \autoref{algorithm:wavl:insert} we have also utilized a helper function to find the parent of the newly inserted node and prevent the insertion of duplicate keys within the tree. The pseudocode of that function can be seen in \autoref{algorithm:findParentNode}.
\begin{algorithm}
\Fn{$\texttt{findParentNode}(key, node)$}{
$childNode \gets node$\;
\BlankLine
\While{$childNode \neq nil$}{
$node \gets childNode$\;
\uIf{$key < node.key$}{
$childNode \gets node.left$\;
}
\ElseIf{$node.key < key$}{
$childNode \gets node.right$\;
}
\Else{
\Return{nil}\;
}
}
\BlankLine
\Return{node}\;
}
\caption{Helper function that returns parent for newly inserted node.}\label{algorithm:findParentNode}
\end{algorithm}
Rebalancing after insertion in the WAVL tree is equivalent to rebalancing after insertion in the AVL tree. We will start with a short description of the rebalancing within AVL to lay a foundation for analogies and differences compared to the implementation using ranks.
When propagating the error, we can encounter 3 cases (we explain them with respect to propagating insertion from the left subtree, propagation from right is mirrored and role of trits $+$ and $-$ swaps)~\cite{labyrint}:
\begin{enumerate}
\item \textit{Node was marked with $+$.} In this case, the heights of left and right subtrees are equal now, and node is marked with $0$, and propagation can be stopped.\label{avl:rules:insert:1}
\item \textit{Node was marked with $0$.} In this case, the node is marked with $-$, but the tree's height rooted at the node has changed, so we need to propagate the changes further.\label{avl:rules:insert:2}
\item \textit{Node was marked with $-$.} In this case, the node would acquire balance factor of $-2$, which is not allowed. In this situation, we decide based on the mark of the node from which we are propagating the insertion in the following way (let $x$ be the node from which the information is being propagated and $z$ the current node marked with $-$):\label{avl:rules:insert:3}
\begin{enumerate}
\item $x$ is marked with $-$, then we rotate by $z$ to the right. After that, both $z$ and $x$ can be marked with $0$. The height from the parent's point has not changed; therefore, we can stop the propagation.\label{avl:rules:insert:3a}
\item $x$ is marked with $+$, then we double rotate by $x$ to the left and by $z$ to the right. Here we need to recalculate the balance factors for $z$ and $x$, where $z$ gets $-$ or $0$ and $x$ gets $0$ or $+$. The node that was a right child to the $x$ before the double-rotation is now marked with $0$ and propagation can be stopped.\label{avl:rules:insert:3b}
\item $x$ is marked with $0$. This case is trivial since it cannot happen because we never propagate the height change from a node that acquired sign $0$.
\end{enumerate}
\end{enumerate}
In the following explanation, we have to consider that valid nodes in the AVL tree implemented via ranks are (1,~1) and (1,~2). By evaluating rank-differences of the parent, they are already affected by the rebalancing done from the inserted leaf.
Rebalancing the tree is equivalent to the rebalancing of the AVL tree and is executed in the following way that can be seen in \autoref{algorithm:wavl:insertRebalance}.
\begin{algorithm}
\Proc{$\texttt{insertRebalance}(T, node)$}{
\tcp{Handles \hyperref[avl:rules:insert:2]{rule 2}}
\While{$node.parent \neq nil \land node.parent \text{ is (0,1)-node}$}{
$\texttt{promote}(node.parent)$\;
$node \gets node.parent$\;
}
\BlankLine
\tcp{Handles \hyperref[avl:rules:insert:1]{rule 1}}
\lIf{$node.parent = nil \lor node\, is\, not\, \text{0-child}$}{\Return}
\BlankLine
\tcp{Handles \hyperref[avl:rules:insert:3]{rule 3}}
\eIf{$node = node.parent.left$}{
$\wavlFixZeroChild(T, node, node.right, \texttt{rotateLeft}, \texttt{rotateRight})$\;
}{
$\wavlFixZeroChild(T, node, node.left, \texttt{rotateRight}, \texttt{rotateLeft})$\;
}
\BlankLine
}
\caption{Algorithm containing bottom-up rebalancing after insertion.}\label{algorithm:wavl:insertRebalance}
\end{algorithm}
As a first step, which can be seen in \autoref{algorithm:wavl:insertRebalance}, we iteratively check the rank-differences of a parent of the current node. As long as it is a (0,~1)-node, we promote it and propagate the error further. There is an interesting observation to be made about the way \textit{how the parent can fulfil such requirements}. Moreover, the answer is simple, since we are adding a leaf or are already propagating the change to the root, we have lowered the rank-difference of the parent; therefore, it must have been (1,~1)-node. From the algorithm used for usual implementations of AVL trees, this step refers to \hyperref[avl:rules:insert:2]{\textit{rule 2}}. After the promotion, the parent becomes (1,~2)-node. That means it gets sign a $-$ (or $+$ respectively when propagating from the right subtree), which conforms to the standard algorithm.
After this, we might end up in two situations, and those are:
\begin{enumerate}
\item The current node is not a 0-child, which means that after propagation and promotions we have gotten to a parent node that is (1,~2), which refers to \hyperref[avl:rules:insert:1]{\textit{rule 1}}.
\item The current node is a 0-child, which means that after propagation and promotions, we have a node with a parent that is (0,~2)-node. This case conforms to \hyperref[avl:rules:insert:3]{\textit{rule 3}} and must be handled further to fix the broken rank rule.
\end{enumerate}
\hyperref[avl:rules:insert:3]{\textit{Rule 3}} is then handled by a helper function that can be seen in \autoref{algorithm:wavl:fix0Child}.
\begin{algorithm}
\Proc{$\texttt{fix0Child}(T, x, y, rotateToLeft, rotateToRight)$}{
$z \gets x.parent$\;
\BlankLine
\uIf(\tcp*[h]{Handles \hyperref[avl:rules:insert:3a]{rule 3a}}){$y = nil \lor y\, is\, \text{2-child}$}{
$rotateToRight(T, z)$\;
\BlankLine
$\texttt{demote}(z)$\;
}
\ElseIf(\tcp*[h]{Handles \hyperref[avl:rules:insert:3b]{rule 3b}}){$y\, is\, \text{1-child}$}{
$rotateToLeft(T, x)$\;
$rotateToRight(T, z)$\;
\BlankLine
$\texttt{promote}(y)$\;
$\texttt{demote}(x)$\;
$\texttt{demote}(z)$\;
}
}
\caption{Generic algorithm for fixing 0-child after insertion.}\label{algorithm:wavl:fix0Child}
\end{algorithm}
Here we can see, once again, an interesting pattern. When comparing to the algorithm described above, using the rank representation, we do not need to worry about changing the signs and updating the heights, since by rotating combined with demotion and promotion of the ranks, we are effectively updating the height (represented via rank) of the affected nodes. This observation could be used in \autoref{algorithm:avl:deleteFixNode} and \autoref{algorithm:avl:deleteRotate}, where we turned to manual updating of ranks to show the difference.
\section{Deletion from the weak AVL tree}
\begin{algorithm}
\Proc{$\texttt{deleteRebalance}(T, y, parent)$}{
\uIf{$y \text{ is (2, 2)-node}$}{
$\texttt{demote}(y)$\;
$parent \gets y.parent$\;
}
\ElseIf{$parent \text{ is (2, 2)-node}$}{
$\texttt{demote}(parent)$\;
$parent \gets parent.parent$\;
}
\BlankLine
\If{$parent = nil$}{
\Return\;
}
$z \gets \text{3-child of } parent$\;
\If{$z \neq nil$}{
$\wavlBottomUpDelete(T, z, parent)$\;
}
}
\caption{Initial phase of algorithm for the rebalance after deletion from the WAVL tree.}\label{algorithm:wavl:deleteRebalance}
\end{algorithm}
As described by \textit{Haeupler et al.}~\cite{wavl}, we start the deletion rebalancing by checking for (2,~2)-node. If that is the case, we demote it and continue with the deletion rebalancing via \autoref{algorithm:wavl:bottomUpDelete} if we have created a 3-child by the demotion. Demoting the (2,~2)-node is imperative since it enforces part of the \textit{Weak AVL Rule}, requiring that leaves have a rank equal to zero.
\begin{figure}
\centering
\begin{tikzpicture}[>=latex',line join=bevel,scale=0.75,]
%%
\node (Node{value=1+ rank=1}) at (28.597bp,105.0bp) [draw,ellipse] {1, 1};
\node (Node{value=2+ rank=0}) at (28.597bp,18.0bp) [draw,ellipse] {2, 0};
\draw [->] (Node{value=1+ rank=1}) ..controls (28.597bp,75.163bp) and (28.597bp,59.548bp) .. (Node{value=2+ rank=0});
\definecolor{strokecol}{rgb}{0.0,0.0,0.0};
\pgfsetstrokecolor{strokecol}
\draw (33.597bp,61.5bp) node {1};
%
\end{tikzpicture}
\caption{WAVL tree containing two elements.}
\label{fig:wavl:twoElements}
\end{figure}
\begin{figure}
\centering
\begin{tikzpicture}[>=latex',line join=bevel,scale=0.75,]
%%
\node (Node{value=1+ rank=1}) at (28.597bp,105.0bp) [draw,ellipse] {1, 0};
\definecolor{strokecol}{rgb}{0.0,0.0,0.0};
\pgfsetstrokecolor{strokecol}
%
\end{tikzpicture}
\caption{\autoref{fig:wavl:twoElements} after deletion of 2.}
\label{fig:wavl:twoElementsAfterDelete}
\end{figure}
For example, consider the following tree in \autoref{fig:wavl:twoElements}. Deletion of key 2 from that tree would result in having only key 1 in the tree with a rank equal to 1, which would be (2,~2)-node and leaf at the same time. After the demotion of the remaining key, we acquire the tree as shown in \autoref{fig:wavl:twoElementsAfterDelete}
In contrast to the \textit{AVL Rule}, the WAVL tree allows us to have (2,~2) nodes present. Therefore we can encounter two key differences during deletion rebalancing:
\begin{enumerate}
\item If anywhere during the deletion rebalancing, \textbf{but not} at the start, we encounter (2,~2)-node, we can safely stop the rebalancing process since the rest of the tree must be correct, and we have fixed errors on the way to the current node from the leaf.
\item Compared to the AVL tree, during deletion rebalancing, we need to fix \textbf{3-child} nodes.
\end{enumerate}
\begin{algorithm}
\Proc{$\texttt{bottomUpDelete}(T, x, p)$}{
\If{$x \text{ is not 3-child} \lor p = nil$}{
\Return\;
}
\BlankLine
$y \gets sibling(x)$\;
\BlankLine
\While{$p \neq nil \land x \text{ is 3-child} \land (y \text{ is 2-child or (2,~2)-node})$}{
\If{$y \text{ is not 2-child}$}{
$\texttt{demote}(y)$\;
}
$\texttt{demote}(p)$\;
\BlankLine
$x \gets parent$\;
$p \gets x.parent$\;
\If{$p = nil$}{
\Return;
}
\BlankLine
$y \gets sibling(x)$\;
}
\BlankLine
\If{$p \text{ is not (1,~3)-node}$}{
\Return\;
}
\eIf{$p.left = x$}{
$\wavlFixDelete(T, x, p.right, p, false, \texttt{rotateLeft}, \texttt{rotateRight})$\;
}{
$\wavlFixDelete(T, x, p.left, p, true, \texttt{rotateRight}, \texttt{rotateLeft})$\;
}
}
\caption{Propagation of the broken rank rule after deletion from the WAVL tree.}\label{algorithm:wavl:bottomUpDelete}
\end{algorithm}
In all cases of deletion propagation in the WAVL tree, we consider the following setup of nodes that can be seen in \autoref{fig:wavl:deletionPropagation}, where $x$ denotes the node we are propagating the deletion from and $y$ is its sibling.
\begin{figure}
\centering
\begin{tikzpicture}
\node [treenode] {$p$}
child{node[treenode] {$x$}}
child{node[treenode] {$y$}};
\end{tikzpicture}
\caption{Nodes assigned to their respective names during the propagation.}
\label{fig:wavl:deletionPropagation}
\end{figure}
The propagation happens by demoting the parent and possibly sibling of $x$. In the condition of the $while$-loop, we demand that $y$ is either a 2-child or a (2,~2)-node. If the condition of the $while$-loop is fulfilled, then we can encounter the following situations:
\begin{enumerate}
\item \textit{$x$ is a 3-child and $y$ is a 2-child}: in this case, their parent is a (2,~3)-node, and by demoting the parent, we obtain a (1,~2)-node and also propagate the error further to the root. This situation can be seen in \autoref{fig:wavl:deletion:propagationA}.
\item \textit{$x$ is a 3-child and $y$ is a (2,~2)-node}: in this case, their parent is a (1,~3)-node, and by demoting both the parent and $y$, we obtain (1,~2)-node, $y$ becomes (1,~1)-node, and also we propagate the error further to the root. This situation can be seen in \autoref{fig:wavl:deletion:propagationB}.
\item \textit{$x$ is a 3-child, and $y$ is \textbf{both} a 2-child and a (2,~2)-node}: in this case, their parent is a (2,~3)-node, and we demote just the parent, that way, we obtain a (1,~2)-node and also propagate the error further to the root. In this case, the $y$ node stays a (2,~2)-node.
\end{enumerate}
\begin{figure}
\centering
\begin{tabular}{ ccc }
\begin{tikzpicture}
\node [treenode] {$p$}
child{ node[treenode] {$x$} edge from parent node[left side node] {$3$} }
child{ node[treenode] {$y$} edge from parent node[right side node] {$2$} };
\end{tikzpicture}
&
becomes
&
\begin{tikzpicture}
\node [treenode] {$p$}
child{ node[treenode] {$x$} edge from parent node[left side node] {$2$} }
child{ node[treenode] {$y$} edge from parent node[right side node] {$1$} };
\end{tikzpicture}
\end{tabular}
\caption{Deletion propagation in the WAVL tree where $x$ is a 3-child and $y$ is a 2-child.}
\label{fig:wavl:deletion:propagationA}
\end{figure}
\begin{figure}
\centering
\begin{tabular}{ ccc }
\begin{tikzpicture}
\node [treenode] {$p$}
child{ node[treenode] {$x$} edge from parent node[left side node] {$3$} }
child{ node[treenode] {$y$} edge from parent node[right side node] {$1$} };
\end{tikzpicture}
&
becomes
&
\begin{tikzpicture}
\node [treenode] {$p$}
child{ node[treenode] {$x$} edge from parent node[left side node] {$2$} }
child{ node[treenode] {$y$} edge from parent node[right side node] {$1$} };
\end{tikzpicture}
\end{tabular}
\caption{Deletion propagation in the WAVL tree where $x$ is a 3-child and $y$ is a (2,~2)-node.}
\label{fig:wavl:deletion:propagationB}
\end{figure}
\begin{algorithm}
\Proc{$\texttt{fixDelete}(T, x, y, z, reversed, rotateL, rotateR)$}{
$v \gets y.left$\;
$w \gets y.right$\;
\If{$reversed$}{
$(v, w) \gets (w, v)$\;
}
\BlankLine
\uIf{$w \text{ is 1-child} \land y.parent \neq nil$}{
$rotateL(T, y.parent)$\;
\BlankLine
$\texttt{promote}(y)$\;
$\texttt{demote}(z)$\;
\BlankLine
\If{$z$ is a leaf}{
$\texttt{demote}(z)$\;
}
}
\ElseIf{$w \text{ is 2-child} \land v.parent \neq nil$}{
$rotateR(T, v.parent)$\;
$rotateL(T, v.parent)$\;
\BlankLine
$\texttt{promote}(v)$\;
$\texttt{promote}(v)$\;
$\texttt{demote}(y)$\;
$\texttt{demote}(z)$\;
$\texttt{demote}(z)$\;
}
}
\caption{Final phase of the deletion rebalance after deletion from the WAVL tree.}\label{algorithm:wavl:fixDelete}
\end{algorithm}
The final part happens when it is not possible to propagate the error further. In that case, we perform either single or double rotation.
\section{Heuristics}
\textit{Haeupler et al.}~\cite{wavl} also present different heuristics for the rebalancing of the WAVL tree. In this thesis, we present algorithms for bottom-up rebalancing. Top-down rebalancing can be compared to the preemptive algorithms on the B-tree that we mentioned in \autoref{chap:sb-bst}. It can be described in the following way:
When inserting the key to the WAVL tree, we promote every (1,~1)-node that we encounter on the way down to the parent node.
When deleting the key from the WAVL tree, we analogically demote every (2,~2)-node that we encounter on the way down to the parent node.