% File: thesis/wavl_trees.tex
% Last change: commit 9a32d10615 by Matej Focko <mfocko@redhat.com>, 2022-05-17 13:37:59 +02:00, "chore: split to multiple files"


\chapter{Weak AVL Trees}
\section{Rank rule}
Based on the rank rules for implementing the red-black tree (as described in \ref{chap:rb-rule}) and the AVL tree (as described in \ref{chap:avl-rule}), \textit{Haeupler et al.} present a new rank rule:
\textbf{Weak AVL Rule}: All rank differences are 1 or 2, and every leaf has rank 0.~\cite{wavl}
Comparing the \textit{Weak AVL Rule} to the \textit{AVL Rule}, we can come to these conclusions:
\begin{itemize}
\item \textit{Every leaf has rank 0} holds under the AVL Rule as well, since every node is (1, 1) or (1, 2) and the rank of a node represents the height of its tree. The rank of \textit{nil} is defined as $-1$ and the height of a tree rooted at a leaf is $0$, therefore leaves are (1, 1)-nodes of rank $0$.
\item \textit{All rank differences are 1 or 2} is where the two rules differ. Every node permitted by the AVL Rule satisfies this condition, but the condition also permits one additional case, the (2, 2)-node, which is allowed in the WAVL tree, but not in the AVL tree. This difference will be explained more thoroughly later on.
\end{itemize}
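To make the comparison concrete, both rules can be restated through rank differences. The notation $r(x)$ for the rank of a node $x$ and $p(x)$ for its parent is assumed here only for illustration and may differ from the notation used elsewhere in the text:
\[ \Delta(x) = r(p(x)) - r(x), \qquad r(\textit{nil}) = -1 \]
A node is an $(i, j)$-node if its children have rank differences $i$ and $j$. Treating $(1, 2)$ and $(2, 1)$ as the same type, the two rules read:
\[
\begin{array}{ll}
\text{AVL Rule:} & \text{every node is } (1, 1) \text{ or } (1, 2) \\
\text{Weak AVL Rule:} & \text{every node is } (1, 1),\ (1, 2) \text{ or } (2, 2), \text{ and every leaf is } (1, 1)
\end{array}
\]
The leaf condition is the rank-0 requirement in disguise: a leaf of rank $0$ has two \textit{nil} children of rank $-1$ and is therefore a $(1, 1)$-node, whereas a leaf of rank $1$ would be a $(2, 2)$-node.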
\section{Height boundaries}
In \autoref{chap:sb-bst} we have described other common self-balancing binary search trees in order to draw analogies and explain the differences between them. Given the height bounds of the red-black and AVL trees, we can conclude that the AVL tree is stricter with regard to balancing than the red-black tree. Let us show how the WAVL tree fits among them. \textit{Haeupler et al.} present the following bounds~\cite{wavl}:
\[ h \leq k \leq 2h \text{ and } k \leq 2 \log_2{n} \]
In these inequalities, $h$ and $n$ have the same meaning as in the height bounds for the AVL and red-black trees. The new variable $k$ represents the rank of the tree.
One of the core differences between AVL and WAVL lies in the rebalancing after deletion. Insertion into the WAVL tree is performed in the same way as in the AVL tree, and the benefit of the (2, 2)-node is used only during deletion rebalancing.
From the previous two statements we can draw two conclusions:
\begin{itemize}
\item If we perform only insertions on the WAVL tree, it will always be a valid AVL tree. In that case the height bounds are the same as those of the AVL tree (described in \autoref{avl-height}).
\item If we perform deletions too, we can assume the worst-case scenario where \[ h \leq 2 \log_2{n} \] which also holds for the red-black trees.
\end{itemize}
From these two conclusions we can deduce that the WAVL tree is in the worst-case scenario as efficient as the red-black tree and, as long as only insertions are performed, as efficient as the AVL tree.
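The logarithmic bound on the height follows from the cited inequalities by transitivity. As a short worked step, with a numeric instance for $n = 1000$ that is chosen purely for illustration:
\[ h \leq k \quad \text{and} \quad k \leq 2 \log_2{n} \quad \Longrightarrow \quad h \leq 2 \log_2{n} \]
\[ n = 1000: \quad h \leq 2 \log_2{1000} \approx 19.93, \quad \text{so} \quad h \leq 19 \]
The last step uses only the fact that the height is an integer.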
\newpage
\section{Insertion into the weak AVL tree}
Inserting a value into the WAVL tree is equivalent to inserting it into a regular binary search tree, followed by rebalancing that ensures the rank rule holds. This can be clearly seen in \autoref{algorithm:wavl:insert}. We can also see two early returns there: one happens during insertion into an empty tree and the other during insertion of a duplicate key, which we do not allow.
\begin{algorithm}
\Proc{$\texttt{insert}(T, key)$}{
$insertedNode \gets Node(key)$\;
\If{$T.root = nil$}{
$T.root \gets insertedNode$\;
\Return\;
}
\BlankLine
$parent \gets \findParentNode(key, T.root)$\;
\If{$parent = nil$}{
\Return\;
}
$insertedNode.parent \gets parent$\;
\BlankLine
\eIf{$key < parent.key$}{
$parent.left \gets insertedNode$\;
}{
$parent.right \gets insertedNode$\;
}
\BlankLine
$\wavlInsertRebalance(T, insertedNode)$\;
}
\caption{Insert operation on binary search tree}\label{algorithm:wavl:insert}
\end{algorithm}
In \autoref{algorithm:wavl:insert} we have also utilized a helper function that finds the parent of the newly inserted node and also prevents insertion of duplicate keys into the tree. The pseudocode of that function can be seen in \autoref{algorithm:findParentNode}.
\begin{algorithm}
\Fn{$\texttt{findParentNode}(key, node)$}{
$childNode \gets node$\;
\BlankLine
\While{$childNode \neq nil$}{
$node \gets childNode$\;
\uIf{$key < node.key$}{
$childNode \gets node.left$\;
}
\ElseIf{$node.key < key$}{
$childNode \gets node.right$\;
}
\Else{
\Return{nil}\;
}
}
\BlankLine
\Return{node}\;
}
\caption{Helper function that returns parent for newly inserted node}\label{algorithm:findParentNode}
\end{algorithm}
Rebalancing after insertion in the WAVL tree is equivalent to rebalancing after insertion in the AVL tree. We will start with a short description of the rebalancing within AVL to lay a foundation for analogies and differences compared to the implementation using ranks.
When propagating the error, we can encounter 3 cases (we explain them with respect to propagating insertion from the left subtree, propagation from right is mirrored and role of trits $+$ and $-$ swaps)~\cite{labyrint}:
\begin{enumerate}
\item \textit{Node was marked with $+$.} In this case, heights of left and right subtrees are equal now and node is marked with $0$ and propagation can be stopped.\label{avl:rules:insert:1}
\item \textit{Node was marked with $0$.} In this case, node is marked with $-$, but the height of the tree rooted at the node has changed, which means that we need to propagate the changes further.\label{avl:rules:insert:2}
\item \textit{Node was marked with $-$.} In this case, node would acquire balance-factor of $-2$, which is not allowed. In this situation we decide based on the mark of the node from which we are propagating the insertion in the following way (let $x$ be the node from which the information is being propagated and $z$ the current node marked with $-$):\label{avl:rules:insert:3}
\begin{enumerate}
\item $x$ is marked with $-$, then we rotate by $z$ to the right. After that both $z$ and $x$ can be marked with $0$. Height from the point of the parent has not changed, so we can stop the propagation.\label{avl:rules:insert:3a}
\item $x$ is marked with $+$, then we double rotate: first by $x$ to the left and then by $z$ to the right. Here we need to recalculate the balance-factors for $z$ and $x$, where $z$ gets $-$ or $0$ and $x$ gets $0$ or $+$. Node that was a right child to the $x$ before the double-rotation is now marked with $0$ and propagation can be stopped.\label{avl:rules:insert:3b}
\item $x$ is marked with $0$. This case is trivial, since it cannot happen, because we never propagate the height change from a node that acquired sign $0$.
\end{enumerate}
\end{enumerate}
In the following explanation we have to consider that the valid nodes in an AVL tree implemented via ranks are (1, 1) and (1, 2), and that by the time we evaluate the rank-differences of a parent, they are already affected by the rebalancing done from the inserted leaf.
The rebalancing itself is executed in the following way:
\begin{algorithm}
\Proc{$\texttt{insertRebalance}(T, node)$}{
\tcp{Handles \hyperref[avl:rules:insert:2]{rule 2}}
\While{$node.parent \neq nil \land (node.parent\, is\, (0, 1)\, or\, (1, 0))$}{
$\texttt{promote}(node.parent)$\;
$node \gets node.parent$\;
}
\BlankLine
\tcp{Handles \hyperref[avl:rules:insert:1]{rule 1}}
\lIf{$node.parent = nil \lor node\, is\, not\, \text{0-child}$}{\Return}
\BlankLine
\tcp{Handles \hyperref[avl:rules:insert:3]{rule 3}}
\eIf{$node = node.parent.left$}{
$\wavlFixZeroChild(T, node, node.right, \texttt{rotateLeft}, \texttt{rotateRight})$\;
}{
$\wavlFixZeroChild(T, node, node.left, \texttt{rotateRight}, \texttt{rotateLeft})$\;
}
\BlankLine
}
\caption{Algorithm containing bottom-up rebalancing after insertion}\label{algorithm:wavl:insertRebalance}
\end{algorithm}
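The helpers \texttt{promote} and \texttt{demote} used in \autoref{algorithm:wavl:insertRebalance} are not spelled out in this chapter. A minimal sketch, assuming that each node stores its rank in a field called $rank$ (the field name and the helper \texttt{rankDifference} are assumptions made for illustration), could look as follows:
\begin{algorithm}
\Proc{$\texttt{promote}(node)$}{
\tcp{Assumed field: rank of the node}
$node.rank \gets node.rank + 1$\;
}
\BlankLine
\Proc{$\texttt{demote}(node)$}{
$node.rank \gets node.rank - 1$\;
}
\BlankLine
\Fn{$\texttt{rankDifference}(parent, child)$}{
\tcp{Rank of nil is defined as $-1$}
\eIf{$child = nil$}{
\Return{$parent.rank + 1$}\;
}{
\Return{$parent.rank - child.rank$}\;
}
}
\caption{Sketch of the rank helpers under the stated assumptions}\label{algorithm:wavl:rankHelpers}
\end{algorithm}
With these helpers, a node is an $i$-child exactly when \texttt{rankDifference} of its parent and the node itself equals $i$.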
As a first step, which can be seen in \autoref{algorithm:wavl:insertRebalance}, we iteratively check the rank-differences of the parent of the current node. As long as it is a (0, 1) or (1, 0) node, we promote it and propagate further. It is worth asking \textit{how the parent can end up in such a state}. The answer is simple: since we are adding a leaf or are already propagating the change towards the root, we have lowered one rank-difference of the parent, therefore it must have been a (1, 1) node. With respect to the usual implementations of AVL trees, this step corresponds to \hyperref[avl:rules:insert:2]{\textit{rule 2}}. After the promotion the parent becomes a (1, 2) or (2, 1) node, which means that it gets sign $-$ (or $+$ respectively when propagating from the right subtree), which conforms to the usual algorithm.
After this, we might end up in two situations and those are:
\begin{enumerate}
\item Current node is not a 0-child, which means that after propagation and promotions we have reached a parent node that is (1, 2) or (2, 1), which corresponds to \hyperref[avl:rules:insert:1]{\textit{rule 1}}.
\item Current node is a 0-child, which means that after propagation and promotions we have a node with a parent that is either a (0, 2) or a (2, 0) node. This case corresponds to \hyperref[avl:rules:insert:3]{\textit{rule 3}} and must be handled further to fix the broken rank rule.
\end{enumerate}
\hyperref[avl:rules:insert:3]{\textit{Rule 3}} is then handled by a helper function that can be seen in \autoref{algorithm:wavl:fix0Child}.
\begin{algorithm}
\Proc{$\texttt{fix0Child}(T, x, y, rotateToLeft, rotateToRight)$}{
$z \gets x.parent$\;
\BlankLine
\uIf(\tcp*[h]{Handles \hyperref[avl:rules:insert:3a]{rule 3a}}){$y = nil \lor y\, is\, \text{2-child}$}{
$rotateToRight(T, z)$\;
\BlankLine
$\texttt{demote}(z)$\;
}
\ElseIf(\tcp*[h]{Handles \hyperref[avl:rules:insert:3b]{rule 3b}}){$y\, is\, \text{1-child}$}{
$rotateToLeft(T, x)$\;
$rotateToRight(T, z)$\;
\BlankLine
$\texttt{promote}(y)$\;
$\texttt{demote}(x)$\;
$\texttt{demote}(z)$\;
}
}
\caption{Generic algorithm for fixing 0-child after insertion}\label{algorithm:wavl:fix0Child}
\end{algorithm}
Here we can see, once again, an interesting pattern. Compared to the algorithm described above, the rank representation frees us from changing the signs and updating the heights explicitly: rotations combined with demotions and promotions effectively update the height (represented via rank) of the affected nodes. This observation could be used in \autoref{algorithm:avl:deleteFixNode} and \autoref{algorithm:avl:deleteRotate}, where we resorted to manual updating of ranks to show the difference.
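The bookkeeping for \hyperref[avl:rules:insert:3a]{\textit{rule 3a}} in \autoref{algorithm:wavl:fix0Child} can be traced on ranks. The following sketch assumes that $x$ is the left child of $z$ and writes $r$ for the rank of $x$ at the moment it becomes a 0-child. The names $a$ (the left child of $x$) and $s$ (the sibling of $x$) are introduced only for this illustration.
\begin{center}
\begin{tabular}{l c c c c c}
 & $z$ & $x$ & $a$ & $y$ & $s$ \\
\hline
before the rotation & $r$ & $r$ & $r - 1$ & $r - 2$ & $r - 2$ \\
after $\texttt{demote}(z)$ & $r - 1$ & $r$ & $r - 1$ & $r - 2$ & $r - 2$ \\
\end{tabular}
\end{center}
After the rotation, $x$ is the root of the subtree with children $a$ and $z$, and $z$ has children $y$ and $s$ (for $y = nil$ this means $r = 1$). The demotion turns both $x$ and $z$ into (1, 1) nodes, and the root of the subtree keeps the rank $r$ that $z$ had before the insertion, which is why the propagation can stop.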
\section{Deletion from the weak AVL tree}
\begin{algorithm}
\Proc{$\texttt{deleteRebalance}(T, y, parent)$}{
\uIf{$y \text{ is (2, 2)}$}{
$\texttt{demote}(y)$\;
$parent \gets y.parent$\;
}
\ElseIf{$parent \text{ is (2, 2)}$}{
$\texttt{demote}(parent)$\;
$parent \gets parent.parent$\;
}
\BlankLine
\If{$parent = nil$}{
\Return\;
}
$z \gets \text{3-child of } parent$\;
\If{$z \neq nil$}{
$\wavlBottomUpDelete(T, z, parent)$\;
}
}
\caption{Initial phase of algorithm for the rebalance after deletion from the WAVL tree}\label{algorithm:wavl:deleteRebalance}
\end{algorithm}
As described by \textit{Haeupler et al.}, we start the deletion rebalancing by checking for a (2, 2) node. If we find one, we demote it and, if the demotion has created a 3-child, continue with the deletion rebalancing via \autoref{algorithm:wavl:bottomUpDelete}. Demoting the (2, 2) node is imperative, since it enforces the part of the \textit{Weak AVL Rule} requiring that leaves have rank equal to zero.
For example, consider the tree in \autoref{fig:wavl:twoElements}. Deletion of key 2 from that tree would leave only key 1 in the tree with rank equal to 1, which would be a (2, 2) node and a leaf at the same time. After the demotion of the remaining node, we obtain the tree shown in \autoref{fig:wavl:twoElementsAfterDelete}.
In contrast to the \textit{AVL Rule}, WAVL tree allows us to have (2, 2) nodes present. Therefore we can encounter two key differences during deletion rebalancing:
\begin{enumerate}
\item If anywhere during the deletion rebalancing, \textbf{but not} at the start, we encounter (2, 2) node, we can safely stop the rebalancing process, since rest of the tree must be correct and we have fixed errors on the way to the current node from the leaf.
\item Compared to the AVL tree, during deletion rebalancing we need to fix \textbf{3-child} nodes.
\end{enumerate}
\begin{figure}
\centering
\begin{tikzpicture}[>=latex',line join=bevel,scale=0.75,]
%%
\node (Node{value=1+ rank=1}) at (28.597bp,105.0bp) [draw,ellipse] {1, 1};
\node (Node{value=2+ rank=0}) at (28.597bp,18.0bp) [draw,ellipse] {2, 0};
\draw [->] (Node{value=1+ rank=1}) ..controls (28.597bp,75.163bp) and (28.597bp,59.548bp) .. (Node{value=2+ rank=0});
\definecolor{strokecol}{rgb}{0.0,0.0,0.0};
\pgfsetstrokecolor{strokecol}
\draw (33.597bp,61.5bp) node {1};
%
\end{tikzpicture}
\caption{WAVL tree containing two elements}
\label{fig:wavl:twoElements}
\end{figure}
\begin{figure}
\centering
\begin{tikzpicture}[>=latex',line join=bevel,scale=0.75,]
%%
\node (Node{value=1+ rank=1}) at (28.597bp,105.0bp) [draw,ellipse] {1, 0};
\definecolor{strokecol}{rgb}{0.0,0.0,0.0};
\pgfsetstrokecolor{strokecol}
%
\end{tikzpicture}
\caption{\autoref{fig:wavl:twoElements} after deletion of 2}
\label{fig:wavl:twoElementsAfterDelete}
\end{figure}
\begin{algorithm}
\Proc{$\texttt{bottomUpDelete}(T, x, parent)$}{
\If{$x \text{ is not 3-child} \lor parent = nil$}{
\Return\;
}
\BlankLine
$y \gets nil$\;
\eIf{$parent.left = x$}{
$y \gets parent.right$\;
}{
$y \gets parent.left$\;
}
\BlankLine
\While{$parent \neq nil \land x \text{ is 3-child} \land (y \text{ is 2-child or (2, 2)})$}{
\If{$y \text{ is not 2-child}$}{
$\texttt{demote}(y)$\;
}
$\texttt{demote}(parent)$\;
\BlankLine
$x \gets parent$\;
$parent \gets x.parent$\;
\If{$parent = nil$}{
\Return\;
}
\BlankLine
\eIf{$parent.left = x$}{
$y \gets parent.right$\;
}{
$y \gets parent.left$\;
}
}
\BlankLine
\If{$parent \text{ is not (1, 3)}$}{
\Return\;
}
$p \gets parent$\;
\eIf{$parent.left = x$}{
$\wavlFixDelete(T, x, p.right, p, false, \texttt{rotateLeft}, \texttt{rotateRight})$\;
}{
$\wavlFixDelete(T, x, p.left, p, true, \texttt{rotateRight}, \texttt{rotateLeft})$\;
}
}
\caption{Propagation of the broken rank rule after deletion from the WAVL tree}\label{algorithm:wavl:bottomUpDelete}
\end{algorithm}
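One iteration of the loop in \autoref{algorithm:wavl:bottomUpDelete} can be traced on ranks. The sketch below writes $k$ for the rank of $parent$ at the start of the iteration; the symbol is introduced only for this illustration.
\begin{center}
\begin{tabular}{l c c c}
 & $parent$ & $x$ & $y$ \\
\hline
$y$ is a 2-child, before & $k$ & $k - 3$ & $k - 2$ \\
$y$ is a 2-child, after & $k - 1$ & $k - 3$ & $k - 2$ \\
\hline
$y$ is a 1-child and (2, 2), before & $k$ & $k - 3$ & $k - 1$ \\
$y$ is a 1-child and (2, 2), after & $k - 1$ & $k - 3$ & $k - 2$ \\
\end{tabular}
\end{center}
In both cases $x$ ends up as a 2-child and $y$ as a 1-child of $parent$. In the second case the children of $y$, which had rank $k - 3$, become 1-children after the demotion of $y$. Since the rank of $parent$ drops by one, $parent$ itself may become a 3-child, which is why the loop continues one level higher.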
\begin{algorithm}
\Proc{$\texttt{fixDelete}(T, x, y, z, reversed, rotateL, rotateR)$}{
$v \gets y.left$\;
$w \gets y.right$\;
\If{$reversed$}{
$(v, w) \gets (w, v)$\;
}
\BlankLine
\uIf{$w \text{ is 1-child} \land y.parent \neq nil$}{
$rotateL(T, y.parent)$\;
\BlankLine
$\texttt{promote}(y)$\;
$\texttt{demote}(z)$\;
\BlankLine
\If{$z$ is a leaf}{
$\texttt{demote}(z)$\;
}
}
\ElseIf{$w \text{ is 2-child} \land v.parent \neq nil$}{
$rotateR(T, v.parent)$\;
$rotateL(T, v.parent)$\;
\BlankLine
$\texttt{promote}(v)$\;
$\texttt{promote}(v)$\;
$\texttt{demote}(y)$\;
$\texttt{demote}(z)$\;
$\texttt{demote}(z)$\;
}
}
\caption{Final phase of the deletion rebalance after deletion from the WAVL tree}\label{algorithm:wavl:fixDelete}
\end{algorithm}
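The rank changes in both branches of \autoref{algorithm:wavl:fixDelete} can be sketched as follows, writing $k$ for the rank of $z$ on entry (a symbol introduced only for this illustration). On entry $x$ is a 3-child of $z$, and $y$ is a 1-child of $z$ that is not a (2, 2) node, so at least one of $v$ and $w$ is a 1-child.
\begin{center}
\begin{tabular}{l c c c c c}
 & $z$ & $y$ & $v$ & $w$ & $x$ \\
\hline
single rotation, before & $k$ & $k - 1$ & $k - 2$ or $k - 3$ & $k - 2$ & $k - 3$ \\
single rotation, after & $k - 1$ & $k$ & $k - 2$ or $k - 3$ & $k - 2$ & $k - 3$ \\
\hline
double rotation, before & $k$ & $k - 1$ & $k - 2$ & $k - 3$ & $k - 3$ \\
double rotation, after & $k - 2$ & $k - 2$ & $k$ & $k - 3$ & $k - 3$ \\
\end{tabular}
\end{center}
After the single rotation, $y$ has the children $z$ (1-child) and $w$ (2-child), and $z$ has the children $x$ (2-child) and $v$ (1-child or 2-child). If $z$ is a leaf, it is a (2, 2) node of rank $1$, and the additional demotion restores rank $0$. After the double rotation, $v$ has the 2-children $y$ and $z$, while $w$ is a 1-child of $y$ and $x$ is a 1-child of $z$. In both branches the new root of the subtree has rank $k$, the rank $z$ had on entry. For the tree in \autoref{fig:wavl:deletionA:replacing} we have $k = 2$ and $x = nil$, so the single rotation yields rank $2$ for the node with key 0 and, after the second demotion, rank $0$ for the node with key 1.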
\begin{figure}
\centering
\begin{tikzpicture}[>=latex',line join=bevel,scale=0.75,]
%%
\node (Node{1}) at (87.197bp,192.0bp) [draw,ellipse] {1, 2};
\node (Node{0}) at (31.197bp,105.0bp) [draw,ellipse] {0, 1};
\node (Node{2}) at (106.2bp,105.0bp) [draw,ellipse] {2, 0};
\node (Node{-1}) at (31.197bp,18.0bp) [draw,ellipse] {-1, 0};
\draw [->] (Node{1}) ..controls (68.373bp,162.43bp) and (56.68bp,144.68bp) .. (Node{0});
\definecolor{strokecol}{rgb}{0.0,0.0,0.0};
\pgfsetstrokecolor{strokecol}
\draw (68.197bp,148.5bp) node {1};
\draw [->] (Node{1}) ..controls (93.66bp,162.09bp) and (97.18bp,146.34bp) .. (Node{2});
\draw (102.2bp,148.5bp) node {2};
\draw [->] (Node{0}) ..controls (31.197bp,75.163bp) and (31.197bp,59.548bp) .. (Node{-1});
\draw (36.197bp,61.5bp) node {1};
%
\end{tikzpicture}
\caption{WAVL tree with elements inserted in order $(0, 1, 2, -1)$}\label{fig:wavl:deletionA:before}
\end{figure}
\begin{figure}
\centering
\begin{tikzpicture}[>=latex',line join=bevel,scale=0.75,]
%%
\node (Node{1}) at (31.197bp,192.0bp) [draw,ellipse] {1, 2};
\node (Node{0}) at (31.197bp,105.0bp) [draw,ellipse] {0, 1};
\node (Node{-1}) at (31.197bp,18.0bp) [draw,ellipse] {-1, 0};
\draw [->] (Node{1}) ..controls (31.197bp,162.16bp) and (31.197bp,146.55bp) .. (Node{0});
\definecolor{strokecol}{rgb}{0.0,0.0,0.0};
\pgfsetstrokecolor{strokecol}
\draw (36.197bp,148.5bp) node {1};
\draw [->] (Node{0}) ..controls (31.197bp,75.163bp) and (31.197bp,59.548bp) .. (Node{-1});
\draw (36.197bp,61.5bp) node {1};
%
\end{tikzpicture}
\caption{WAVL tree from \autoref{fig:wavl:deletionA:before} after deletion of 2, value is replaced by one of its children}\label{fig:wavl:deletionA:replacing}
\end{figure}
\begin{figure}
\centering
\begin{tikzpicture}[>=latex',line join=bevel,scale=0.75,]
%%
\node (Node{0}) at (70.197bp,105.0bp) [draw,ellipse] {0, 1};
\node (Node{-1}) at (31.197bp,18.0bp) [draw,ellipse] {-1, 0};
\node (Node{1}) at (109.2bp,18.0bp) [draw,ellipse] {1, 2};
\draw [->] (Node{0}) ..controls (57.102bp,75.46bp) and (49.394bp,58.66bp) .. (Node{-1});
\definecolor{strokecol}{rgb}{0.0,0.0,0.0};
\pgfsetstrokecolor{strokecol}
\draw (58.197bp,61.5bp) node {1};
\draw [->] (Node{0}) ..controls (83.292bp,75.46bp) and (91.0bp,58.66bp) .. (Node{1});
\draw (98.197bp,61.5bp) node {-1};
%
\end{tikzpicture}
\caption{Rotation by parent}\label{fig:wavl:deletionA:rotation}
\end{figure}
\begin{figure}
\centering
\begin{tikzpicture}[>=latex',line join=bevel,scale=0.75,]
%%
\node (Node{0}) at (70.197bp,105.0bp) [draw=blue,ellipse] {0, 2};
\node (Node{-1}) at (31.197bp,18.0bp) [draw,ellipse] {-1, 0};
\node (Node{1}) at (109.2bp,18.0bp) [draw,ellipse] {1, 2};
\draw [->] (Node{0}) ..controls (57.102bp,75.46bp) and (49.394bp,58.66bp) .. (Node{-1});
\definecolor{strokecol}{rgb}{0.0,0.0,0.0};
\pgfsetstrokecolor{strokecol}
\draw (58.197bp,61.5bp) node {2};
\draw [->] (Node{0}) ..controls (83.292bp,75.46bp) and (91.0bp,58.66bp) .. (Node{1});
\draw (96.197bp,61.5bp) node {0};
%
\end{tikzpicture}
\caption{Promotion of $y$}\label{fig:wavl:deletionA:promotion}
\end{figure}
\begin{figure}
\centering
\begin{tikzpicture}[>=latex',line join=bevel,scale=0.75,]
%%
\node (Node{0}) at (70.197bp,105.0bp) [draw,ellipse] {0, 2};
\node (Node{-1}) at (31.197bp,18.0bp) [draw,ellipse] {-1, 0};
\node (Node{1}) at (109.2bp,18.0bp) [draw=blue,ellipse] {1, 1};
\draw [->] (Node{0}) ..controls (57.102bp,75.46bp) and (49.394bp,58.66bp) .. (Node{-1});
\definecolor{strokecol}{rgb}{0.0,0.0,0.0};
\pgfsetstrokecolor{strokecol}
\draw (58.197bp,61.5bp) node {2};
\draw [->] (Node{0}) ..controls (83.292bp,75.46bp) and (91.0bp,58.66bp) .. (Node{1});
\draw (96.197bp,61.5bp) node {1};
%
\end{tikzpicture}
\caption{Demotion of $z$}\label{fig:wavl:deletionA:demotion}
\end{figure}
\begin{figure}
\centering
\begin{tikzpicture}[>=latex',line join=bevel,scale=0.75,]
%%
\node (Node{0}) at (70.197bp,105.0bp) [draw,ellipse] {0, 2};
\node (Node{-1}) at (31.197bp,18.0bp) [draw,ellipse] {-1, 0};
\node (Node{1}) at (109.2bp,18.0bp) [draw=blue,ellipse] {1, 0};
\draw [->] (Node{0}) ..controls (57.102bp,75.46bp) and (49.394bp,58.66bp) .. (Node{-1});
\definecolor{strokecol}{rgb}{0.0,0.0,0.0};
\pgfsetstrokecolor{strokecol}
\draw (58.197bp,61.5bp) node {2};
\draw [->] (Node{0}) ..controls (83.292bp,75.46bp) and (91.0bp,58.66bp) .. (Node{1});
\draw (96.197bp,61.5bp) node {2};
%
\end{tikzpicture}
\caption{Second demotion of $z$}\label{fig:wavl:deletionA:secondDemotion}
\end{figure}