~mht/ads-book

324fd29e7f5144d6692d00e56aad9ef8c13ad3db — Martin Hafskjold Thoresen 8 months ago 3f6946d
Add back in lstlisting and update introduction
1 files changed, 40 insertions(+), 29 deletions(-)

M book.tex => book.tex +40 -29
@@ 21,13 21,22 @@
\captionsetup[subfigure]{subrefformat=simple,labelformat=simple}

\usepackage{listings}
\input{listingoptions.tex}
\lstdefinestyle{Algo}{%
  basicstyle=\normalsize\mdseries\itshape,
  columns=flexible,
  xleftmargin=15pt,
  numbers=left,
  keywordstyle={\bf},
  mathescape,
  commentstyle={\color{gray}},
  keywords={for,if,else,return,in,to,while,continue,break,output,not}%
}

\usepackage{lipsum}

\title{Advanced Data Structures}
\author{Martin Hafskjold Thoresen}
\date{\today}
\date{Updated: July 23, 2017}

\newcommand{\topics}[1]{Topics: \textit{#1}}
\newcommand{\code}[2]{\textsc{#1}$(#2)$}


@@ 51,8 60,9 @@
\maketitle

\chapter*{Introduction}
This book is a collection of notes the course \textit{Advanced Data Structures} at ETH Z\"urich, in the spring of 2017.
This book is a collection of course notes for the course \textit{Advanced Data Structures} at ETH Z\"urich, in the spring of 2017.
The chapters are arranged in the same way as the lectures, and some chapters cover material from two lectures.
Most of the lectures were either loosely or firmly based on Erik Demaine's MIT course 6.851 of the same title\footnote{\url{http://courses.csail.mit.edu/6.851/spring21/}}.

\tableofcontents



@@ 191,7 201,7 @@ It turns out that \LCA{} and \RMQ{} are equivalent.

\subsection{Reduction from \RMQ{} to \LCA{}}

\todo{Add tree from lecture notes}
%todo{Add tree from lecture notes}
Build a \emph{Cartesian Tree\label{sec:cartesian-tree}}:
Walk through the array, while keeping track of the right spine of the tree.
When inserting a new element, if the element is the largest element in the spine, insert it at the end.


@@ 225,7 235,7 @@ A consequence of this is that we get \emph{Universe reduction}, where we have ma
\label{sec:first-euler-tour}
We start by reducing the problem to $\pm 1 \textsc{RMQ}$, in which adjacent elements of the array differ by exactly 1.
Walk an \emph{Euler Tour} of the tree:
\todo{Add figure}
%todo{Add figure}
visit every edge twice, and for each edge write down the node we left.
Each node stores a pointer to its first visit in the array, and each array element stores a pointer to its node in the tree.
\RMQ{} and \LCA{} are still equivalent.
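As a small illustration (a hypothetical tree, not the one from the lecture), let the root $a$ have the children $b$ and $c$, and let $b$ have the single child $d$.
Writing down the node we leave on each of the six edge traversals, together with its depth, gives
\[
\begin{array}{c|cccccc}
\text{position} & 1 & 2 & 3 & 4 & 5 & 6 \\
\text{node}     & a & b & d & b & a & c \\
\text{depth}    & 0 & 1 & 2 & 1 & 0 & 1
\end{array}
\]
Adjacent depths differ by exactly $1$.
The first visits of $d$ and $c$ are at positions 3 and 6, and the minimum depth in that range is $0$ at position 5, which is the node $a$, the lowest common ancestor of $d$ and $c$.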


@@ 323,7 333,7 @@ In other words, we make all queries start at leaves.

Split the tree in two layers by the maximally deep nodes.
The number of leaves in the top part is now $O(n/\log n)$, since
\todo{eh?}
%todo{eh?}
for every $1/4 \log n$ nodes in the original tree we have ``replaced'' them with a subtree (the bottom structure).
If we now use Step 5 on the top, we get $O(n)$ space.



@@ 359,10 369,10 @@ Given a set of strings $T_1, \dots, T_k$, we query with a pattern $P$ and want t
    A Trie is a rooted tree with child branches labeled with letters in $\Sigma$.
Let $T$ be the number of nodes in the trie. This is bounded by $\sum^k_{i=1} T_i$ (equality if no pair of strings share a prefix).
\end{definition}
\todo{Add figure}
%todo{Add figure}

A trie can encode multiple strings, by having the edges in a path from the root to a leaf spell out the string.
However, we need a terminal symbol $\$$ to denote the end of a string, so we can have prefixes of a string in the same trie\todo{Bad expl?}.
However, we need a terminal symbol $\$$ to denote the end of a string, so we can have prefixes of a string in the same trie.%todo{Bad expl?}.
If each node traverses its edges in sorted order, an in-order traversal of the trie yields the strings of the trie in sorted order.
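As a small illustration (hypothetical strings, and assuming that $\$$ sorts before every letter in $\Sigma$), take the strings $a$, $ab$, and $b$.
With the terminal appended, the root-to-leaf paths of the trie are
\[
a\,\$, \qquad a\,b\,\$, \qquad b\,\$,
\]
and visiting the edges of each node in sorted order reaches the leaves in exactly this order, which is the sorted order of the strings.
Without $\$$, the string $a$ would end at an internal node on the path of $ab$, so we could not tell whether $a$ itself is stored.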

\subsection{Trie Representation}


@@ 400,7 410,7 @@ This makes the query time $O(P + \log\log\Sigma)$, since we only use the tree on
Instead of having a balanced BST over each node's children, we can weight each child with the number of leaves in its subtree.
This ensures that every second jump in the BST either reduces the number of candidate strings \emph{in the trie} to $2/3$ of its size,
or it finds a new trie node in the WBBST (hence we advance $P$ one letter).
An intuition for this claim is this\todo{Add array figure}:
An intuition for this claim is this:%todo{Add array figure}:
we might be so lucky as to cut out $1/2$ of the leaves when leaving a node, unless there is some really heavy child in the middle
(remember we have to retain ordering).
But then in the next step this large child will surely be either to the far left or to the far right, which means we either follow it


@@ 453,7 463,7 @@ The edge labels are typically stored as indices in the string, instead of the st
Instead of appending $\$$ to each of the suffixes, which are the strings we are inserting into the tree,
we can simply append $\$$ to the string $T$, since this will make it the last character in all of the suffixes.
The structure takes $O(T)$ space.
\todo{add figure}
%todo{add figure}

\subsection{Applications of Suffix Trees}
Suffix trees are surprisingly useful, and with some of the results from Chapter~\ref{ch:statictrees}


@@ 474,11 484,11 @@ This can be done in $O(T)$ time using suffix trees, since it is the branching no
How long is the longest common prefix of $T[i..]$ and $T[j..]$?
Find the \LCA{} of the two corresponding nodes in $O(1)$ time to get the common prefix.

\subsubsection{Something more}
\todo{here}
% \subsubsection{Something more}
% \todo{here}

\subsubsection{TODO this}
\todo{here}
% \subsubsection{TODO this}
% \todo{here}

\section{Suffix Arrays}
While suffix trees can be constructed in $O(T)$ time, the construction is difficult.


@@ 545,12 555,12 @@ We make a Cartesian Tree (see Section~\ref{sec:cartesian-tree}) of the \algo{LCP
This time we put \emph{all} minimum values at the root\footnote{note that the number of 0s in the array is equal to the number of different characters in $T$}.
The suffixes of $T$ are the leaves of the tree.
Note that the \algo{LCP} value of an internal node is the letter depth of that node,
\todo{Add figure}
%todo{Add figure}
so the edge length between two internal nodes is the difference in \algo{LCP}.
We know from Section~\ref{sec:cartesian-tree} that this is doable in linear time.

\subsection{Construction}
If we have the suffix array it is possible to construct the \algo{LCP} array in linear time\todo{ref}.
If we have the suffix array it is possible to construct the \algo{LCP} array in linear time.%todo{ref}
We look at a method of constructing the suffix array from scratch in $O(T + \text{sort}(\Sigma))$ time.

\begin{definition}{$\big<a, b\big>$}


@@ 675,7 685,7 @@ We take a similar approach to full persistency as we did with partial persistenc
The first difference is that we need \emph{all} pointers to be bi-directional, and not only the field pointers, as previously.
The second and most important difference is that the versions of the structure no longer form a line, but a tree.
In order to get around this problem we linearize the version tree:
traverse the tree, and write out first and last visit of each node\todo{add figure}.
traverse the tree, and write out first and last visit of each node.%\todo{add figure}
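As a small illustration (a hypothetical version tree; the symbols $b_i$ and $e_i$ for the first and last visit of version $v_i$ are our own notation), let $v_0$ be the root with the children $v_1$ and $v_2$, and let $v_1$ have the child $v_3$.
The linearization is
\[
b_0 \; b_1 \; b_3 \; e_3 \; e_1 \; b_2 \; e_2 \; e_0 .
\]
A version $v_i$ is an ancestor of $v_j$ exactly when $b_i$ comes before $b_j$ and $e_j$ comes before $e_i$.
Here $b_1$, $b_3$, $e_3$, $e_1$ appear in this order, so $v_1$ is an ancestor of $v_3$, while $v_2$ is not, since $b_2$ comes after $b_3$.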

However, we need this structure to be dynamic,
so we use an \emph{Order Maintenance data structure},


@@ 784,7 794,7 @@ We now look at an example of a retroactive priority queue that supports
We assume keys are only inserted once.

We can plot the lifetime of the queue in 2D, where the x dimension is time and the y dimension is key value.
\todo{add plot}
%todo{add plot}
Keys in the queue are plotted as points when they are inserted, and are extended as horizontal rays.
On \algo{Delete-Min}, we shoot a ray from the x-axis at the time of the delete upward until it hits a horizontal ray.
This makes $\rceil$ patterns.


@@ 843,7 853,7 @@ and that the list is in a strictly monotone ordering.
Let \emph{Label Space} be the size of the labels as a function of the number of elements in the list we want to store.
Table~\ref{tab:list-labeling} shows the best known updates for different sizes of the label space.

\todo{Come back to this}
%todo{Come back to this}

\begin{table}[b]
    \centering


@@ 1234,7 1244,7 @@ We claim that $\sqrt{n}$ updates and $\sqrt{n}$ verify sums require $\Omega(\sqr
which implies a lower bound of $\Omega(\log n)$ per operation, since we do $\sqrt{n}$ block operations, each of which corresponds to $\sqrt{n}$ graph operations.

\subsection{The Proof}
\todo{wtf is this}
%todo{wtf is this}
Similar to in Section~\ref{sec:partial-sum} we will consider the interleaving access pattern.
We will look at how much information has to be carried over from the left to the right subtree for a given node.
We claim that every node in a right subtree has to do $\Omega(l\sqrt{n})$ expected cell probes reading cells that were written


@@ 1259,7 1269,7 @@ In fact, these two things convey the same information, as both is reconstrucable
However, this time we know that the queries, which ask if a composition prefix of the permutations is the same as the given permutation,
always answer ``Yes''.

\todo{how does this work??}
%todo{how does this work??}

Short version:
We end up also encoding a separator of the sets $R\setminus W$ and $W\setminus R$, where $R$ and $W$ are the cells read and written to in the right and left subtree respectively.


@@ 1507,7 1517,7 @@ Next, we let $m_i = m'_i + (w - b_i + ir^3) \text{ rounded down to a multiple of
We use the $ir^3$ part to spread the bits out, since each $m_i < r^3$; this achieves~\ref{en:as-order}.
In order not to mess up the collision freeness we have achieved, we need the term we add to be
a multiple of $r^3$, so that $m_i \equiv m'_i \pmod{r^3}$.
\todo{Not sure why we need $-b_i$, since $+ir^3$ guarantees ordering?}.
%todo{Not sure why we need $-b_i$, since $+ir^3$ guarantees ordering?}.
Since $m_{r-1} = O(r^4)$ and $m_0 \approx m'_0 < r^3$ and $m'_0 \geq 0$, we get $m_{r-1} - m_0 = O(r^4)$.

\subsection{Parallel Comparison}


@@ 1628,7 1638,8 @@ As with Rank, it is possible to get Select in $O(n/\log^k n)$ bits for any $k=O(
\section{Navigating Binary Trees}
We look at a succinct representation of a binary tree.
Consider the tree encoding obtained by depth first search where each node outputs one bit for each child
that is a 1 if the child is present and 0 if it is not\todo{figure}. This representation clearly uses $2n$ bits.
that is a 1 if the child is present and 0 if it is not.%\todo{figure}
This representation clearly uses $2n$ bits.
Now if we insert a $(1)$ in front of the encoding, we can navigate the tree using Rank and Select from the previous section,
using the encoding as a bit string.



@@ 1684,7 1695,7 @@ In addition, we have a C-style \texttt{enum}, $turn$, which is either $his$ or $
\begin{figure}[ht]
    \centering
    \begin{subfigure}{0.45\textwidth}
        \begin{lstlisting}[title={Alice's code}]
        \begin{lstlisting}[style=Algo,title={Alice's code}]
she_wants = T
turn = his
while he_wants and turn=his


@@ 1696,7 1707,7 @@ she_wants = F
        \end{lstlisting}
    \end{subfigure}
    \begin{subfigure}{0.45\textwidth}
        \begin{lstlisting}[title={Bob's code}]
        \begin{lstlisting}[style=Algo,title={Bob's code}]
he_wants = T
turn = hers
while she_wants and turn=hers


@@ 1735,7 1746,7 @@ problem from Section~\ref{sec:mutex} where one thread may overwrite anothers dat
    It is possible to make a lock using the $CAS$ instruction:
    Note that we do not need to $CAS$ the write at line 7, since the thread from the critical section
    is the only thread that will change the value of $lock$ when it is set to $F$.
    \begin{lstlisting}[title={CAS-Lock}]
    \begin{lstlisting}[style=Algo,title={CAS-Lock}]
lock = F
while !CAS(&lock, F, T)
    spin


@@ 1750,7 1761,7 @@ lock = F
    We make a map-fold style reducer using the $CAS$ instruction.
    $result$ is the shared variable for the accumulated result.
    We imagine that the threads all get a different slice of the array, which they loop over.
    \begin{lstlisting}[title={CAS-Lock}]
    \begin{lstlisting}[style=Algo,title={CAS-Lock}]
for f in array
    tmp = compute(f)
    do {


@@ 1787,7 1798,7 @@ The code for $\textsc{Push}$ and $\textsc{Pop}$ is shown below.
\begin{figure}[ht]
    \centering
    \begin{subfigure}{0.45\textwidth}
        \begin{lstlisting}[title={Push(v)}]
        \begin{lstlisting}[style=Algo,title={Push(v)}]
n $= \textsc{Make-Node}(v)$
do {
    n.next = Read(head)


@@ 1796,7 1807,7 @@ do {
        \end{lstlisting}
    \end{subfigure}
    \begin{subfigure}{0.45\textwidth}
        \begin{lstlisting}[title={Pop()}]
        \begin{lstlisting}[style=Algo,title={Pop()}]
curr = Read(head)
while curr {
    if CAS(head, curr, curr.next)