931388cf620d1ab347a3d7ffdd1063b35f8e6dc0 — Martin Hafskjold Thoresen 5 years ago 7ae769f
Finish Fusion trees
1 files changed, 153 insertions(+), 1 deletions(-)

M book.tex
M book.tex => book.tex +153 -1
@@ 1396,7 1396,159 @@ for each $w$ update in the bottom stucture we only need one update in the top st
Now the space is only $O(n/w w + n) = O(n)$. Linear space!

\section{Fusion Trees}
Fusion trees are more of a hypothetical structure: they are fast when the word size is very large.
We want a structure with $O(\log_w n)$ operations.
Thus, using fusion trees or vEB trees we can get a time bound of $\min\{{\log_w n, \log w}\} \leq \sqrt{\log n}$.
We look at the static version of fusion trees; a dynamic version is possible with $O(\log_w n + \log\log n)$ time operations.
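
The bound in terms of $n$ alone follows from a short case analysis. This is only a sketch, with all logarithms taken base 2 and the $O(\log w)$ bound for vEB trees taken as given:
\[\log w \leq \sqrt{\log n} \ \Rightarrow\ \min\{\log_w n, \log w\} \leq \log w \leq \sqrt{\log n},\]
\[\log w > \sqrt{\log n} \ \Rightarrow\ \min\{\log_w n, \log w\} \leq \log_w n = \frac{\log n}{\log w} < \frac{\log n}{\sqrt{\log n}} = \sqrt{\log n}.\]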

The general idea of fusion trees is to have a B-tree with a branching factor of $w^{1/5}$.
This will make the height of the tree $h = \Theta(\log_w n)$.
However, simply searching through the keys will be too slow: we need $O(1)$ time in each node
(binary search in each node would give $\Theta(\log w \log_w n) = \Theta(\log n)$ in total).
Let $k = w^{1/5}$ be the number of keys stored in each node, and let $x_0 < x_1 < \cdots < x_{k-1}$ be the keys in the node.
We would like to pack the keys into $O(w)$ bits, and have $O(1)$ predecessor/successor queries on the $x_i$.
Note that the queried key $q$ does not have to be one of the stored keys $x_i$.
We will allow polynomial preprocessing.

We look at how to approach this problem.
The general idea follows three steps: key storage, node search, and key retrieval.
We find a way to store the keys in the given space, then a way to find the predecessor of the queried key, and then
a way to get back the actual key stored in the node. Since we cannot simply store all keys, we have stored a different representation.

\subsubsection{Distinguishing the Keys}
We make a mental trie of the bits of all keys in the node, and note that certain levels are more ``interesting'',
namely the ones with branching nodes, of which there are $k-1$.
Let $b_i$ be the position of important bit $i$, and let $sketch(x)$ be the interesting bits of $x$, in order.
We claim that $sketch(x_i) < sketch(x_{i + 1})$.
To see why, we consider the first bit where the two numbers differ. Since $x_i < x_{i + 1}$ this bit has to be
0 for $x_i$ and 1 for $x_{i+1}$. Since this is a branch, this bit is a part of the sketch, and all more significant sketch bits agree, so $sketch(x_i)$ is the smaller.
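
As a minimal sketch of the claim, the following Python snippet computes the branching levels and the perfect sketches naively (bit by bit, not in constant time). The function names \texttt{important\_bits} and \texttt{sketch} and the four sample 4-bit keys are assumptions made for illustration only. It relies on the fact that every branching node of the trie is the \LCA{} of two adjacent keys in sorted order.

```python
def important_bits(keys):
    """Positions (0 = least significant) at which the trie of the sorted keys branches."""
    # Adjacent keys first differ at the most significant set bit of their XOR.
    return sorted({(a ^ b).bit_length() - 1 for a, b in zip(keys, keys[1:])})


def sketch(x, bits):
    """Perfect sketch: the bits of x at the given ascending positions, packed together."""
    return sum(((x >> b) & 1) << pos for pos, b in enumerate(bits))


keys = [0b0000, 0b0010, 0b1100, 0b1111]      # sorted, distinct sample keys
bits = important_bits(keys)                  # bit positions 1 and 3
sketches = [sketch(x, bits) for x in keys]   # strictly increasing, like the keys
```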

Since each sketch is at most $k - 1$ bits long, and there are $k$ of them, we can fit all sketches into a single word ($k^2 = {(w^{1/5})}^2 = w^{2/5} \leq w$).
This packing is computable in $O(1)$ time.
We would like to somehow search for our queried element in this packing.

\subsubsection{Node Search}
For a query $q$ we would like to compare it ``in parallel'' to all $x_i$s, in $O(1)$ time.
While this might seem obviously impossible, we remind ourselves that the size of all $x_i$s in our representation
is $O(w)$, so all standard operations are done in constant time for the entire set.

So we find out that $sketch(x_i) \leq sketch(q) < sketch(x_{i + 1})$; now what?
Simply retrieving $x_i$ somehow might not be the correct answer:
if $q$ branches on some level that is not considered important by $x_i$, we have effectively ignored a bit of $q$ in its $sketch$.
Consider $\mathbf{x} = [\texttt{\underline{0}0\underline{0}0}, \texttt{\underline{0}0\underline{1}0}, \texttt{\underline{1}1\underline{0}0}, \texttt{\underline{1}1\underline{1}1}]$,
where the important bits are underlined. If we query for $q=\texttt{0101}$, we get $sketch(q) = \texttt{00}$, so $sketch(x_0) = sketch(q)$ even though $x_1 < q$!
Even worse, this is not an off-by-one error; they can be very far from each other.

However, all hope is not lost. We can look at the \LCA{} of $x_i$ and $q$ and the \LCA{} of $x_{i+1}$ and $q$ (choose the deeper one, that is, the longer common prefix):
this is the node where we first fell off the tree.
In our example above, we went right when both $x_0$ and $x_1$ went left.
Then we know that the subtree we are in after the \LCA{} node contains no keys.
If there had been a key there, we would have gotten this key as either $x_i$ or $x_{i+1}$,
depending on the other important bits of the key.
So now, if we are looking for the predecessor, we can simply find the max key in the left subtree of the \LCA{} node.
We can do this by searching for $e=y0111\dots1$ in the left subtree. In contrast to the $q$ case, this will work, since
the sketch of every key in that subtree is at most the sketch of $e$: below $y0$, $e$ contains only 1s.
Therefore, we will get the max sketch in the subtree, which belongs to the max key\footnote{Note that the reason the sketch search failed for $q$ was that
it branched on some bit that was not in the sketch. With $e$, we only care about getting the maximum/minimum sketch.}.
If we are looking for the successor and went left when the $x_i$s went right, we use $e=y100\dots0$ and find the min key in the right subtree instead.

So to sum up the big picture:
we start out by computing $sketch(q)$. Then we find $i$ such that $sketch(x_i) \leq sketch(q) < sketch(x_{i + 1})$,
and set $y=\mathcal{LCA}(q, x_i)$ or $y=\mathcal{LCA}(q, x_{i+1})$, depending on which is the longer.
We then compute $e$, find $sketch(e)$, and at last find $j$ such that $sketch(x_j) \leq sketch(e) < sketch(x_{j + 1})$.
We claim that this will give us the right answer, for both successor and predecessor queries.
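
The steps above can be sketched in Python as follows. This is an illustration of the logic only, not of the constant-time word tricks: sketches are built bit by bit, and the searches use the standard \texttt{bisect} module. The names \texttt{important\_bits}, \texttt{sketch} and \texttt{predecessor\_index} are assumptions, and the key list is assumed to be sorted, distinct and non-empty.

```python
from bisect import bisect_left, bisect_right


def important_bits(keys):
    """Positions (0 = least significant) at which the trie of the sorted keys branches."""
    return sorted({(a ^ b).bit_length() - 1 for a, b in zip(keys, keys[1:])})


def sketch(x, bits):
    """Perfect sketch: the bits of x at the given ascending positions, packed together."""
    return sum(((x >> b) & 1) << pos for pos, b in enumerate(bits))


def predecessor_index(keys, q):
    """Index of the largest key <= q in the sorted list keys, or -1 if there is none."""
    bits = important_bits(keys)
    sk = [sketch(x, bits) for x in keys]          # increasing, like the keys
    i = bisect_right(sk, sketch(q, bits)) - 1     # sketch(x_i) <= sketch(q) < sketch(x_{i+1})
    neighbours = [keys[c] for c in (i, i + 1) if 0 <= c < len(keys)]
    best = min(neighbours, key=lambda x: x ^ q)   # smallest XOR = longest common prefix with q
    if best == q:
        return keys.index(q)
    d = (q ^ best).bit_length() - 1               # bit where q falls off the trie
    y = (q >> (d + 1)) << (d + 1)                 # common prefix y, lower bits cleared
    if (q >> d) & 1:
        e = y | ((1 << d) - 1)                    # y 0 1 1 ... 1: max of the left subtree
        return bisect_right(sk, sketch(e, bits)) - 1
    e = y | (1 << d)                              # y 1 0 0 ... 0: min of the right subtree
    return bisect_left(sk, sketch(e, bits)) - 1   # the predecessor sits just before the successor


keys = [0b0000, 0b0010, 0b1100, 0b1111]
print(keys[predecessor_index(keys, 0b0101)])      # the query from the example above: prints 2 (0010)
```

For the query $q=\texttt{0101}$ the first sketch search lands on $x_0$, but the second search with $e=\texttt{0011}$ returns $x_1=\texttt{0010}$, the true predecessor.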

However, there are a lot of details that need explanation.
How do we make the sketches? How do we search in the $x_i$s?
What about computing the \LCA{}?
We look at these questions, in order.

\subsection{Approximate Sketch}
We remember that the perfect sketch takes the exact bits we care about, and packs them one right after another,
so that all sketches in a node use $O(w^{2/5})$ bits.
How can we do this?
We could start off by masking out the important bits; this is easy.
However, packing them together to be consecutive is harder.
Maybe we do not need to perfectly pack them together? We can allow some zeroes in between the important bits,
if they are in a predictable pattern, since this does not change the ordering of the sketches.
We will make an \emph{approximate sketch}, of size $O(w^{4/5})$ bits, making the total bits in a node $O(w)$, which is our limit.

Let $r \leq k - 1$ be the number of important bits, and let $x_{b_i}$ denote bit $b_i$ of $x$.
We start out by masking out the important bits:
\[x' = x \texttt{ AND } \sum\limits^{r-1}_{i = 0} 2^{b_i} = \sum\limits^{r-1}_{i = 0} x_{b_i} 2^{b_i}\]
In order to pack the bits somewhat together, we will use multiplication.
%Recall that binary multiplication is basically shifting and adding for each high bit in one of the operands:
\[x' \cdot m = \Big(\sum\limits^{r-1}_{i = 0} x_{b_i} 2^{b_i}\Big) \Big(\sum\limits^{r-1}_{j = 0} 2^{m_j}\Big) =
\sum\limits^{r-1}_{i = 0} \sum\limits^{r-1}_{j = 0} x_{b_i} 2^{b_i + m_j}\]
That is, the product contains a copy of important bit $b_i$ at position $b_i + m_j$, for every pair $i, j$.
We claim that there exists an $m$ such that
\begin{enumerate}
    \item\label{en:as-distinct} $b_i + m_j$ are all distinct, so we avoid collisions,
    \item\label{en:as-order} $b_0 + m_0 < \dots < b_{r-1} + m_{r-1}$, so we preserve ordering, and
    \item\label{en:as-small} $(b_{r-1} + m_{r-1}) - (b_0 + m_0) = O(r^4) = O(w^{4/5})$, so the number is small.
\end{enumerate}
Given such an $m$, the approximate sketch is
\[approx\_sketch(x) = \big[(x' \cdot m) \texttt{ AND } \sum\limits^{r-1}_{i=0}2^{b_i + m_i}\big] \gg (b_0 + m_0)\]
Note that the mask discards all terms where $i \neq j$.

We first consider~\ref{en:as-distinct}.
Choose $m'_0, \dots, m'_{r-1} < r^3$ such that all $b_i + m'_j$ are distinct mod $r^3$.
Do the choosing by induction. When we want to pick $m'_t$,
we must avoid all numbers congruent to $m'_i + b_j - b_k$ for any $i, j, k$ (a collision $m'_t + b_k \equiv m'_i + b_j$, with $b_k$ moved to the right-hand side).
Here $i$ ranges over the $t$ values already chosen, and $j$ and $k$ range over the $r$ important bits, so there are at most $tr^2$ numbers to avoid.
But $t < r$, so $tr^2 < r^3$, and since we are working mod $r^3$ there is always a number available.

Next, we let $m_i = m'_i + (w - b_i + ir^3) \text{ rounded down to a multiple of }r^3$.
We use the $ir^3$ part to spread the bits out, since each $m'_i < r^3$; this achieves~\ref{en:as-order}.
In order not to mess up the collision freeness we have achieved, we need the term we add to be
a multiple of $r^3$, so that $m_i \equiv m'_i \pmod{r^3}$.
The $-b_i$ term cancels the original position of the bit, so that $b_i + m_i = w + ir^3 + O(r^3)$;
without it the sketch bits could remain spread over up to $w$ positions.
Summing up, $(b_{r-1} + m_{r-1}) - (b_0 + m_0) = O(r \cdot r^3) = O(r^4)$, which is~\ref{en:as-small}.
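
The construction can be tried out with a small Python sketch. The function names \texttt{choose\_shifts} and \texttt{approx\_sketch}, the three bit positions and the 32-bit word size are all assumptions chosen for illustration. Python integers are unbounded, so the product may exceed $w$ bits here, and the greedy choice is done by plain search. The code does not prove the three properties; they can only be checked for a given input.

```python
def choose_shifts(bits, w):
    """Shifts m_0..m_{r-1} for ascending important bit positions bits (all < w)."""
    r = len(bits)
    mod = r ** 3
    m_prime = []
    for _ in range(r):
        # Residues for which some b_k + m'_t would collide with an earlier b_j + m'_i.
        banned = {(mp + bj - bk) % mod for mp in m_prime for bj in bits for bk in bits}
        m_prime.append(next(c for c in range(mod) if c not in banned))
    # Add (w - b_i + i*r^3) rounded down to a multiple of r^3.
    return [mp + (w - b + i * mod) // mod * mod
            for i, (mp, b) in enumerate(zip(m_prime, bits))]


def approx_sketch(x, bits, shifts):
    """Mask, multiply, mask again and shift down, as in the formula above."""
    keep = sum(1 << b for b in bits)                        # mask of the important bits
    m = sum(1 << s for s in shifts)                         # the multiplier
    out = sum(1 << (b + s) for b, s in zip(bits, shifts))   # positions b_i + m_i
    return (((x & keep) * m) & out) >> (bits[0] + shifts[0])


bits = [3, 10, 17]                  # hypothetical important bits in a 32-bit word
shifts = choose_shifts(bits, 32)
```

With these inputs the important bits end up at offsets 0, 8 and 43 of the approximate sketch, in their original order.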

\subsection{Parallel Comparison}
Now all sketches for all keys in a node take $O(w)$ space. We want to search in all of them, in constant time.
We use subtraction to compare the keys.
Let $sketch(node) = 1\ sketch(x_0)\ 1\dots1\ sketch(x_{k-1})$, and let $sketch{(q)}^k = 0\ sketch(q)\ 0\dots0\ sketch(q)$;
this can be computed by multiplying the sketch of $q$ with $0^n10^n1\dots0^n1$, where $n$ is the length of a sketch, so that each block $0^n1$ is as wide as a sketch together with its leading bit.
Now we can look at $sketch(node) - sketch{(q)}^k$:
the 1 bit in front of $sketch(x_i)$ will become zero if and only if $sketch(q) > sketch(x_i)$. We can mask out everything except these bits.
Note that since the $x_i$s are in order, this sequence is monotone, and of the form $0^a1^b$.

Now we are interested in knowing where the transition is.
The way we will do this is to multiply the masked number with $0^n1\dots0^n1$:
this multiplication will make all of the 1s in the number to hit the same bit in the product,
which will be summed.
That is, instead of finding the position of the transition, we simply count the number of high bits.
This is the number of $x_i$s whose sketch is at least as large as $sketch(q)$.
Note that we also have space in the sketch room for the entire sum, since the length of a sketch is at most $k - 1$,
and we only have at most $k$ 1s to sum.
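
A minimal Python sketch of the subtraction and counting steps follows. The names \texttt{pack\_node} and \texttt{count\_at\_least} are assumptions, $L$ stands for the sketch length, and the number of sketches $k$ is assumed to be smaller than $2^{L+1}$ so that the sum fits in one field. The constant \texttt{ones} is rebuilt with a loop on every call here; in a real node it would be precomputed.

```python
def pack_node(sketches, L):
    """Pack 1 sketch(x_0) 1 sketch(x_1) ... with x_0 in the most significant field."""
    f = L + 1                                     # field width: one flag bit plus the sketch
    node = 0
    for s in sketches:
        node = (node << f) | (1 << L) | s
    return node


def count_at_least(node, sq, k, L):
    """Number of packed sketches that are >= sq, using a fixed number of word operations."""
    f = L + 1
    ones = sum(1 << (f * i) for i in range(k))    # the pattern 0^L 1 repeated k times
    diff = node - sq * ones                       # a field keeps its leading 1 iff its sketch >= sq
    flags = (diff & (ones << L)) >> L             # one flag per field, moved to the field's low bit
    total = (flags * ones) >> (f * (k - 1))       # the multiplication adds all flags up in field k-1
    return total & ((1 << f) - 1)


node = pack_node([0b00, 0b01, 0b10, 0b11], 2)
print(count_at_least(node, 0b01, 4, 2))           # prints 3: the sketches 01, 10 and 11
```

The index $i$ with $sketch(x_i) \leq sketch(q) < sketch(x_{i+1})$ can then be recovered from this count and $k$.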

Lastly, we need to compute the LCA for our mental tree.
This is equivalent to finding the first set bit in the \texttt{XOR} of the two numbers, also called the most significant set bit (\texttt{msb}).
We split the word into $\sqrt{w}$ clusters of $\sqrt{w}$ bits each.
Then we find the non-empty clusters: get the \texttt{msb}s of the clusters, and clear the \texttt{msb}s from $x$.
Subtract the cleared $x$ from the \texttt{msb} mask, and mask out all bits except the \texttt{msb}.
\texttt{XOR} with the mask again, to flip the \texttt{msb}s.
Now, each block is either $\texttt{00\dots00}$ or $\texttt{10\dots00}$, depending on whether the bits except the \texttt{msb} were 0 or not.
The special case we still need to handle is when a cluster of $x$ is exactly $\texttt{10\dots00}$. \texttt{OR} in the \texttt{msb}s, to fix this.
Now all non-empty clusters are $\texttt{10\dots00}$, and empty clusters are $\texttt{00\dots00}$.
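
The cluster test can be written out directly with word operations. The following Python sketch assumes a hypothetical function name \texttt{nonempty\_clusters}, a cluster width $c$ that divides $w$ (for example $w = 16$, $c = 4$), and a mask $F$ that would be a precomputed constant in practice.

```python
def nonempty_clusters(x, w, c):
    """Word with the top bit of each c-bit cluster set iff that cluster of x is non-zero."""
    F = sum(1 << (i * c + c - 1) for i in range(w // c))   # top bit of every cluster
    tops = x & F          # the cluster msbs that are set in x
    rest = x & ~F         # x with the cluster msbs cleared
    t = (F - rest) & F    # a top bit survives iff the rest of its cluster was zero
    t ^= F                # flip: top bit set iff the rest of the cluster was non-zero
    return t | tops       # also catch clusters where only the msb was set


print(bin(nonempty_clusters(0b0000_1000_0000_0101, 16, 4)))   # prints 0b100000001000
```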

Next we need to find the first non-empty cluster, and then find the first set bit in that cluster.
To do this we would first like to compress the bits into a string of just the bits we care about, of length $\sqrt{w}$.
Now we use \emph{perfect} sketch (details omitted).
Then, to find the most significant set bit, we use parallel comparison, by comparing the sketch repeated to one-hot strings of length $\sqrt{w}$.
By finding the largest power of two that the string is at least as large as, we find the \texttt{msb}.
Note that since the size of the sketch is $\sqrt{w}$, and there are $\sqrt{w}$ one-hot strings of length $\sqrt{w}$,
the total space for all the one-hot strings is $w$, so it fits.

Now we have found the first cluster that has a set bit. Next, we go to that cluster, and use the exact same method again to find the
\texttt{msb} there. This works since the sketch and the cluster are the exact same size.
Finally, we combine the two indices (cluster index and internal index to the cluster) to get the global index.

\chapter{Succinct Structures}
Rank, Select