7ae769f950ab41947cf13f9ebae994c723253199 — Martin Hafskjold Thoresen 5 years ago
Write the first part of Integer strucutres. Fusion trees are missing

1 files changed, 124 insertions(+), 1 deletions(-)

M book.tex

M book.tex => book.tex +124 -1
@@ 1269,11 1269,134 @@ which was the claim we needed to prove.

\chapter{Integer Structures}
+\topics{van Emde Boas, x-fast trees, y-fast trees, fusion trees.}

-van Emde Boas, x-fast trees, y-fast trees, fusion trees.
+In this chapter we look at data structures operating on integers in the word machine.
+By limiting ourselves to only act on integers we can achieve time bounds as a function of the word size.
+
+The general problem we want to solve is \algo{Predecessor}: we maintain a structure containing $n$ words,
+and would like to support insertion, deletion, and predecessor and/or successor.
+The universe $U$ we get the integers from is asusmed  to be a power of two $2^w$; we call $w$ the word size.
+We also assume that $n \leq \log w$, that is, that the number of elements in the structure is representable in a word.
+
+In a comparison based model, the precesessor problem has a lower bound of $\Theta(\log n)$.
+Table~\ref{tab:integer-structures} summarizes the structures we will look at, with time and space complexities.
+Note that vEB with hashing or y-fast trees are faster than fusion trees if $w$ is small.
+
+\begin{table}[ht]
+    \centering
+    \begin{tabular}{c c c}
+        Name & Operation & Space\\\midrule
+        van Emde Boas Trees & $O(\log w)$ & $\Theta(U)$\\
+        vEB + Hashing & $O(\log w)$ whp. & $\Theta(n)$\\
+        y-fast trees & $O(\log w)$ whp. & $\Theta(n)$\\
+        Fusion trees & $O(\log_w n)$ whp. & $\Theta(n)$\\
+    \end{tabular}
+    \caption{%
+\label{tab:integer-structures}
+    The integer structures we look at in this chapter.
+    }
+\end{table}

\section{van Emde Boas Tree}%
\label{sec:vEB}
+The general idea of the van Emde Boas Tree is to try to obtain the time recurrence $T(n) = T(\sqrt(n)) + O(1)$.
+We can do this by splitting the universe into $\sqrt{u}$ clusters, each of size $\sqrt{u}$, and recurse on the cluster.
+For universes that are a power of 2, this means splitting the bits of the number if half.
+Consider a word $a = \big<c, i\big>$;
+we call the most significant bits $c$ and the least significant bits $i$.
+This is simply to do in the word machine: we simply mask out the lower bits to get $i$, and we shift down and mask to get $c$.
+
+The van Emde Boas tree is a recursive structure.
+At each level we have four things:
+$\sqrt{u}$ \emph{Clusters}, a vEB trees of size $\sqrt{u}$ which signals which of the clusters are empty;
+one \emph{Summary}, which is also a vEB tree of size $\sqrt{u}$;
+the minimum element in the tree; and the maximum element in the tree.
+The minimum element is not stored elsewhere in the tree, but the maximum element is.
+
+\subsection{Operations}
+With this layout, we can see how we will do the recursion:
+we index the cluster using $c$, the most significant bits, and then continue by using $i$ in the next query.
+We will call the vEB tree $v$, such that $v.cluster$ is the first cluster in the current tree.
+We assume no key duplication.
+
+\subsubsection{Successor}
+If the queried element is smaller than $v.cluster[c].max$, we find the successor for $i$ in $v.cluster[c]$.
+When returning we also have to rebuild'' our number, by appending $c$ in front.
+If the element is larger than $v.max$, we have to find the next non-empty cluster, and select the minimum element in it, since
+this will be our successor. Note that we know one such element will exist, since $x < v.max$.
+To find this we do a sucessor query of $c$ in $v.summary$.
+When rebuilding we now have to use $c'$, the successor of $c$ in the summary, instead of $c$.
+
+In total, we get one recursive call in either case.
+
+\subsubsection{Insert}
+At first this might seem simple. We find the correct cluster, and recurse into it. However, we also need to update the summary, if
+the cluter was empty.
+This makes for \emph{two} recursive calls, which we cannot afford, since $T(u) = 2T(\sqrt{u}) + O(1)$ solves to $O(\log u)$, rather than $O(\log \log u)$ which is what we want ($w = \log u$).
+However, we observe that if the cluster is empty, the insert call to it will be trivial since we only set $v.min$ which is not stored elsewhere in the tree.
+
+\subsubsection{Delete}
+Deleting an element is slightly more complex.
+First we check if $x = v.min$.
+If the tree only contains one element we remove $min$ and $max$, and return;
+else we take the min from the first cluster, which we find by taking the min in the summary,
+and set \emph{our} min to be this min.
+Since min keys are only stored in one place, we continue with $x = v.min$;
+that is, by overwriting $v.min$ we deleted the original queried key, but
+we have to clean up by deleing the key we just copied.
+
+Now we are either still looking for the original $x$, or we have changed $x$.
+In any case, we recurse on cluster $c$, and delete $i$.
+After deleting, we have to check if the cluster we recursed on got its $min$ removed,
+because we then have to mark it in the summary.
+Like with insert, we risk having two recursive calls here, but again one of these calls will be trivial;
+this time it is the first call that is trivial, since if we removed the last element in the tree we just
+set $v.max = v.min = None$, and did no further recursing.
+
+
+\section{Saving Space}
+The strucutre presented in Section~\ref{sec:vEB} takes $O(U)$ space, which is a lot.
+We look at ways to reduce this.
+
+The obvious way to shave off some space is to not store the clusters in an array, but in a hash table.
+This way we only pay for the tables we actually use.
+Somehow, this gives us a space bound of $O(n \log w)$, which can be improved to $O(n)$ using indirection.
+
+\subsection{Tree View}
+We can look at the van Emde Boas tree in a different view.
+Consider a balanced binary tree of height $w$, where the leaf nodes are element markers (that is, the path from the root to a leaf is the number,
+and the leaf is either 0 or 1, if the element is not or is in the tree).
+The internal nodes are the \texttt{OR} of their children.
+The upper half (in terms of height) of the tree can be considered at the Summary, and each subtree as one cluster.
+
+In this structure, updates are $O(w)$. Queries can be done faster by noticing that a path from a leaf to the root is monotone:
+it is a string of 0s followed by 1s (unless the tree is empty),
+so we can binary search it to find the transition point.
+Then, we can look at the other child of the node that has the first $1$ in the path,
+and get the min or max, depending on if it is to the left or right.
+If all subtrees store this, this is constant.
+What if we are looking for successor, but get predecessor from this method?
+If all 1 leaves store pointers to the next and previous 1 leaves, this can also be done in constant time.
+
+\subsection{X-fast Trees}
+We take inspiration from the Tree view in the previous section to build a X-fast tree,
+where we store all of the 1s in the tree, as binary strings, in a hash table.
+This lets us perform the binary search.
+Updates are still $O(\log w)$, since we have to search through the path, which is $w$ long,
+queries can use the same binary search trick, to get a time of $\Theta(\log w)$, and
+space is $O(nw)$.
+
+\subsection{Y-fast Trees}
+We use indirection on the X-fast tree to get faster updates and smaller space.
+Split the tree into two structures, an upper and lower structure.
+The upper structure consists of $O(n/w)$ of the tree nodes, and the bottom ones are BBSTs of size $O(\log w)$.
+Queries are $O(\log w)$ in both structures, but we make updates $O(\log w)$ amortized, since
+for each $w$ update in the bottom stucture we only need one update in the top structure.
+Now the space is only $O(n/w w + n) = O(n)$. Linear space!
+
+\section{Fusion Trees}
+\todo{write this :)}

\chapter{Succinct Structures}
Rank, Select