~mht/ads-book

d5748a460e4b9280d495c1ced002870572fe360f — Martin Hafskjold Thoresen 5 years ago 328350f
Fix label spacing and replace $$ with \[ \]
1 file changed, 19 insertions(+), 21 deletions(-)

M book.tex
M book.tex => book.tex +19 -21
@@ -74,7 +74,7 @@ However, encoding a totally random function takes $O(u \log m)$ space.
This is a problem since $n = U$, which is large.
Therefore we settle on a family $\mathcal{H}$ of hash functions of small size which is \emph{Universal}.
Universality means that the probability of two non-equal keys having the same hash value is $O(1/m)$:
-$$\forall x \neq y\ P_{h \in \mathcal{H}}[h(x) = h(y)] = O(1/m)$$
+\[\forall x \neq y\ P_{h \in \mathcal{H}}[h(x) = h(y)] = O(1/m)\]

\begin{example}
    $h(x) = ((a x) \mod p) \mod m$, where $0<a<p$, $p$ is prime, and $p > m$, is a universal hash function.

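The example family above can be checked empirically. A minimal sketch (the concrete $p$, $m$, and test keys are illustrative choices, not from the book), drawing a fresh $h$ per trial and estimating the collision probability of one fixed pair:

```python
import random

# Illustrative parameters (not from the book): a prime p > m.
p = 10_007   # prime
m = 100      # table size

def make_hash():
    """Draw h(x) = ((a*x) mod p) mod m with a uniform in 0 < a < p."""
    a = random.randrange(1, p)
    return lambda x: ((a * x) % p) % m

# Estimate Pr_h[h(x) = h(y)] for one fixed pair x != y.
x, y = 42, 1337
trials = 20_000
collisions = 0
for _ in range(trials):
    h = make_hash()
    if h(x) == h(y):
        collisions += 1
print(collisions / trials)  # on the order of 1/m
```

Note that universality only bounds the collision probability of a pair by $O(1/m)$; it promises nothing about higher-order independence.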

@@ -82,7 +82,7 @@ $$\forall x \neq y\ P_{h \in \mathcal{H}}[h(x) = h(y)] = O(1/m)$$

\begin{definition}{k-independent}
    A family of hash functions $\mathcal{H}$ is k-independent if
-    $$\forall x_1, \dots x_k\ \Pr[\bigwedge\limits_{i} h(x_i) = t_i] = O(1/m^k)$$
+    \[\forall x_1, \dots x_k\ \Pr[\bigwedge\limits_{i} h(x_i) = t_i] = O(1/m^k)\]
\end{definition}

\begin{example}


@@ -100,7 +100,7 @@ Tabulation hashing is a hashing scheme.
We view $x$ as a vector of bit blocks $x_1, \dots, x_c$,
and use a totally random hash table on \emph{each} block.
Then we \texttt{xor} the blocks together to get the final value:
-$$h(x) = T_1(x_1) \oplus T_2(x_2) \oplus \dots \oplus T_c(x_c)$$
+\[h(x) = T_1(x_1) \oplus T_2(x_2) \oplus \dots \oplus T_c(x_c)\]
Since a block is $(\log u)/c$ bits, the universe for one block is of size $u^{1/c}$.
One table is of size $u^{1/c}$ (we assume the hash values are machine words, as in~\cite{Patrascu:2011:PST:1993636.1993638}),
so the total space used is $O(cu^{1/c})$.

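A simple tabulation hash along these lines can be sketched as follows (the choice of 32-bit keys with $c = 4$ blocks of 8 bits is an assumption for illustration; each table then has $2^8 = u^{1/c}$ entries for $u = 2^{32}$):

```python
import random

# Illustrative sketch (parameters assumed, not from the book): simple
# tabulation for 32-bit keys viewed as c = 4 blocks of 8 bits. Block i
# has its own fully random table TABLES[i] of 2^8 machine words, and
# h(x) is the xor of the c table lookups.
random.seed(1)
C, BITS = 4, 8
TABLES = [[random.getrandbits(32) for _ in range(1 << BITS)] for _ in range(C)]

def tab_hash(x: int) -> int:
    h = 0
    for i in range(C):
        block = (x >> (i * BITS)) & ((1 << BITS) - 1)  # the i-th 8-bit block
        h ^= TABLES[i][block]                          # xor in its lookup
    return h

print(hex(tab_hash(0xDEADBEEF)))
```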

@@ -121,7 +121,7 @@ so $\E[C_t] = O(1)$, for $m = \Omega(n)$

Since we need to traverse the list for all operations, the cost is quadratic in $C_t$,
so we are interested in $\E[C_t^2]$:
-$$ \E[C_t^2] = 1/m \sum_{s \in [m]} E[C_s^2] = 1/m \sum_{i \neq j} \Pr[h(x_i) = h(x_j)] $$
+\[\E[C_t^2] = 1/m \sum_{s \in [m]} E[C_s^2] = 1/m \sum_{i \neq j} \Pr[h(x_i) = h(x_j)]\]
If we have universal hashing this is $\frac{1}{m} n^2 O(\frac{1}{m}) = O(n^2/m^2) = O(1)$ for $m = \Omega(n)$.
With a totally random hash function $C_t = O(\frac{\log n}{\log\log n})$ with probability $1 - 1/n^c$ for any $c$.
This also holds for simple tabulation hashing.


@@ -171,7 +171,7 @@ If the hashing functions are fully random or $\log n$-wise independent we get $O



-\chapter{Static Tree Queries}
+\chapter{Static Tree Queries}%
\label{ch:statictrees}

\topics{Range Minimum Query (\code{RMQ}{i, j}), Lowest Common Ancestor (\code{LCA}{x, y}), Level Ancestor (\code{LA}{x, n}), Range Queries, Range Trees.}


@@ -216,7 +216,7 @@ in the array. However, since we are not interested in the actual minimum, but on
the two arrays act exactly the same.
A consequence of this is that we get \emph{Universe reduction}, where we have mapped the universe $\mathcal{U}$ to $[n-1]$.

-\section{Constant time \LCA{} (\RMQ{})}
+\section{Constant time \LCA{} (\RMQ{})}%
\label{sec:constant-lca}

\subsubsection{Step 1: Reduction}


@@ -312,7 +312,7 @@ Since all nodes have constant access to a leaf node (by its ladder, of which the
only leaves need to store the jump pointers.
In other words, we make all queries start at leaves.

-\subsubsection{Step 6: Leaf Trimming}
+\subsubsection{Step 6: Leaf Trimming}%
\label{sec:first-leaf-trim}
\begin{definition}{Maximally Deep Node}
    A node with $\geq \frac{1}{4} \log n$ descendants.


@@ -435,7 +435,7 @@ Table~\ref{tab:trie-node-repr} sums up all approaches, with query time and space
        4 & Weight-balanced BST & $O(P + \log k)$ & $O(T)$\\
        5 & Leaf Trimming + (1) + (4) & $O(P + \log\Sigma)$ & $O(T)$
    \end{tabular}
-    \caption{Table over node representation, query time, and space requirement.}
+    \caption{Table over node representation, query time, and space requirement.}%
\label{tab:trie-node-repr}
\end{table}



@@ -512,7 +512,7 @@ This allows for faster searching, as we know how much of the adjacent suffix mat
                5 & \texttt{a\$}\\
                6 & \texttt{\$}
            \end{tabular}
-            \caption{The suffixes of $T$
+            \caption{The suffixes of $T$%
\label{tab:suffixes}}
        \end{subfigure}
        \begin{subfigure}{0.45\textwidth}


@@ -527,7 +527,7 @@ This allows for faster searching, as we know how much of the adjacent suffix mat
                4 & \texttt{na\$}       & 2 \\
                2 & \texttt{nana\$}     & $-$
            \end{tabular}
-            \caption{The Suffix Array of $T$.
+            \caption{The Suffix Array of $T$.%
\label{tab:suffix-array}
            }
        \end{subfigure}


@@ -586,7 +586,7 @@ Note that the number of suffixes when recursing is $O(2/3)$ of what we started w

\subsubsection{Step 5}
Now we want to use the suffix array just obtained to radix sort the suffixes in $T_2$.
-Note that $$T_2[i..] = T[3i+2..] = \big<T[3i+2],\ T[3i+3..]\big> = \big<T[3i+2],\ T_0[i+1..]\big>.$$
+Note that \[T_2[i..] = T[3i+2..] = \big<T[3i+2],\ T[3i+3..]\big> = \big<T[3i+2],\ T_0[i+1..]\big>.\]
We can take off the first letter of the suffix and get a suffix whose sorted order we know, since it is in $T_0$, which we sorted in Step 4.
This is like Radix sort, but with only two values, both of which can be compared in $O(1)$.
This allows us to sort $T_2$.
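As a toy illustration of this step (variable names are ours; $T = \texttt{banana\$}$ as in the suffix table above), we sort the $T_2$ suffixes by the pair of their first character and the already-known rank of the following $T_0$ suffix:

```python
# Each T_2 suffix is <T[3i+2], T_0[i+1..]>: one character plus a T_0
# suffix whose sorted rank we already have, so sorting the pairs sorts T_2.
T = "banana$"

# Suffixes starting at positions ≡ 0 (mod 3), with their sorted ranks
# (standing in for the result of Step 4).
starts0 = [i for i in range(len(T)) if i % 3 == 0]
rank0 = {i: r for r, (_, i) in enumerate(sorted((T[i:], i) for i in starts0))}

# Sort suffixes starting at positions ≡ 2 (mod 3) by the pair
# (first character, rank of the following T_0 suffix).
starts2 = [i for i in range(len(T)) if i % 3 == 2]
sorted2 = sorted(starts2, key=lambda i: (T[i], rank0.get(i + 1, -1)))
print([T[i:] for i in sorted2])  # ['a$', 'nana$']
```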


@@ -654,7 +654,7 @@ We can use the \emph{Potential Method} for amortized analysis.
Remember that \emph{Amortized cost} is actual cost + change in potential.
Let $\Phi = c \cdot \Sigma$, where $\Sigma$ is the number of modifications in all nodes' latest versions.
Observe that when we make a change, the amortized cost is
-$$\leq c + c\ [- 2cp + p \cdot \text{recursions}]\text{?}$$
+\[\leq c + c\ [- 2cp + p \cdot \text{recursions}]\text{?}\]
The first $c$ is for computation, the second $c$ is the added potential, since we
just added one modification to a node,
and $[-2cp + p \cdot \text{recursions}]\text{?}$ accounts for the recursions, if any occur.


@@ -733,7 +733,7 @@ If we want to delete the $+3$, we can add a $-3$ at the end.
\subsection{Full Retroactivity}
What do we need for full retroactivity?

\begin{definition}{Decomposable Search Problem}
-\begin{definition}{Decomposable Search Problem}
+\begin{definition}{Decomposable Search Problem}%
    $\text{query}(x, A\cup B) = f(\text{query}(x, A), \text{query}(x, B))$
\end{definition}
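As a concrete instance (chosen here for illustration), membership is a decomposable search problem with $f = \text{or}$:

```python
# Membership decomposes: query(x, A ∪ B) = query(x, A) or query(x, B).
def query(x, s):
    return x in s

A, B = {1, 2, 3}, {4, 5}
for x in (4, 9):
    assert query(x, A | B) == (query(x, A) or query(x, B))  # f = or
print("membership decomposes under union")
```

Minimum queries decompose the same way with $f = \min$, and successor queries with "the closer of the two answers".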


@@ -768,7 +768,7 @@ To see why this is the case, consider a very simple computer with two registers 1
$X=x$, $Y\mathrel{+}=\Delta$, and $Y = XY$.
On query, the machine returns $Y$, and all operations are $O(1)$.
The operation sequence
-$$\big<Y \mathrel{+}= a_n,\ Y=XY,\ Y\mathrel{+}= a_{n-1},\ \dots,\ Y\mathrel{+}= a_0\big>$$
+\[\big<Y \mathrel{+}= a_n,\ Y=XY,\ Y\mathrel{+}= a_{n-1},\ \dots,\ Y\mathrel{+}= a_0\big>\]
computes the polynomial $\sum_{i=0}^{n} a_{i}x^{i}$.
We can use this to compute the polynomial for any $X$ by retroactively inserting $X=x$ at $t=0$.
However, it is not possible to re-evaluate such a polynomial using field operations any faster than evaluating it from scratch.
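Replaying that operation sequence is exactly Horner's rule. A minimal sketch (the function name is ours):

```python
def run_machine(coeffs, x):
    """Replay <Y += a_n, Y = XY, Y += a_{n-1}, ..., Y += a_0> on the
    two-register machine; coeffs = [a_0, a_1, ..., a_n]."""
    X, Y = x, 0
    for a in reversed(coeffs[1:]):
        Y += a          # Y += a_i
        Y = X * Y       # Y = XY
    Y += coeffs[0]      # final Y += a_0, with no trailing Y = XY
    return Y

# 1 + 2x + 3x^2 at x = 5: 1 + 10 + 75 = 86
print(run_machine([1, 2, 3], 5))  # 86
```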


@@ -800,20 +800,18 @@ In order to make things easier we define a \emph{Bridge}.
\end{definition}

Now if $t'$ is the bridge preceding $t$ we see that
-$$\max\{k'\mid k' \text{\ deleted at time}\geq t\} = \max\{k'\notin Q_{\text{now}} \mid k' \text{\ inserted at time}\geq t'\}$$
+\[\max\{k'\mid k' \text{\ deleted at time}\geq t\} = \max\{k'\notin Q_{\text{now}} \mid k' \text{\ inserted at time}\geq t'\}\]
In other terms, the largest key deleted after a time $t$ is the largest key inserted after the previous bridge that is not in the final set.

Now we can store $Q_{\text{now}}$ as a BBST on values, and all insertions in a BBST on time.
The latter tree's nodes are also augmented with the value of the largest insertion in their subtree that is not in $Q_{\text{now}}$.
At last, we store a third BBST with \emph{all} the updates, ordered on time, and also augmented as follows:
-$$
-\text{Augmentation} =
+\[\text{Augmentation} =
\begin{cases}
    0 \text{\ if \code{insert}{k}, and\ }k \in Q_\text{now}\\
    +1 \text{\ if \code{insert}{k}, and\ }k \notin Q_\text{now}\\
    -1 \text{\ if \algo{Delete-Min}}\\
-\end{cases}
-$$
+\end{cases}\]
In addition, the internal nodes store subtree sums and min/max prefix sums.
We do this in order to detect bridges, since a bridge is a prefix summing to 0.
When we have to find out which element to insert into $Q_{\text{now}}$ we can walk up from the node in the insertion BST,

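With that augmentation, bridge detection reduces to prefix sums. A sketch (the linear scan below stands in for the BBST's logarithmic prefix-sum walk; the $0/{+1}/{-1}$ encoding follows the cases above):

```python
# A time t is a bridge iff the prefix sum of augmentation values up to t
# is 0: every insertion that left Q_now has been matched by a Delete-Min.
def bridges(augmentations):
    total, out = 0, []
    for t, a in enumerate(augmentations):
        total += a
        if total == 0:
            out.append(t)
    return out

# +1: insert(k), k not in Q_now; -1: Delete-Min; 0: insert(k), k in Q_now
print(bridges([+1, -1, 0, +1, +1, -1, -1]))  # [1, 2, 6]
```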

@@ -836,7 +834,7 @@ with $O(\log^2 m)$ overhead. However, it is also possible to get full retroactiv
Before tackling the real problem, we look at an easier problem.

\subsection{List Labeling}
-We want to store integer labels in a list, such that the list, such that insert/delete here queries are constant,
+We want to store integer labels in a list, such that insert/delete here queries are constant,
and that the list is in a strictly monotone ordering.
\emph{Label Space} is the size of the labels as a function of the number of elements in the list we want to store.
Table~\ref{tab:list-labeling} shows the best known updates for different sizes of the label space.
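As a toy illustration of the problem (not one of the table's algorithms; the constants are arbitrary): with a polynomial label space we can insert by averaging neighbouring labels, and naively respace everything when a gap runs out:

```python
# Keep a gap between neighbouring labels; insert by averaging; when two
# labels become adjacent, relabel (here: naively respace all labels).
GAP = 1 << 16

def insert_after(labels, i):
    """Insert a new element after position i; return its label."""
    lo = labels[i]
    hi = labels[i + 1] if i + 1 < len(labels) else lo + 2 * GAP
    if hi - lo < 2:                       # no room left: respace everything
        labels[:] = [(j + 1) * GAP for j in range(len(labels))]
        return insert_after(labels, i)
    labels.insert(i + 1, (lo + hi) // 2)
    return labels[i + 1]

labels = [GAP]
insert_after(labels, 0)
insert_after(labels, 0)
print(labels)   # three strictly increasing labels
```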


@@ -874,7 +872,7 @@ Dynamic partial sums, dynamic connectivity.

van Emde Boas, x-fast trees, y-fast trees, fusion trees.

-\section{van Emde Boas Tree}
+\section{van Emde Boas Tree}%
\label{sec:vEB}

\chapter{Succinct Structures}