61e0d7239b2a23297557c903f81066d9124258fb — Martin Hafskjold Thoresen 5 years ago 44fcac2
Begin on Static Tree Queries
1 files changed, 116 insertions(+), 3 deletions(-)

M book.tex
M book.tex => book.tex +116 -3
@@ 1,9 1,12 @@
\usepackage[a4paper, top=3cm,bottom=3cm, left=3cm, right=3cm, paperheight=23cm]{geometry}
\usepackage[a4paper, top=3cm,bottom=3cm, left=3cm, right=3cm%
%, paperheight=23cm

@@ 16,9 19,17 @@

\newcommand{\topics}[1]{Topics: \textit{#1}}


\newenvironment{example}[0]{{\parindent=0em \textbf{Example:}}}{\vspace{1em}}
\newenvironment{definition}[1]{{\parindent=0em \textbf{Definition} --- \emph{#1}:\\}}{\vspace{1em}}

\DeclareMathOperator*{\argmin}{arg\,min} % thin space, limits underneath in displays

% Expectation symbol

@@ 150,8 161,110 @@ If the hashing functions are fully random or $\log n$-wise independent we get $O

\chapter{Static Tree Queries}

Lowest Common Ancestor (\texttt{LCA}), Level Ancestor (\texttt{LA}),
Range Queries, Range Trees.
\topics{Range Minimum Query (\code{RMQ}{i, j}), Lowest Common Ancestor (\code{LCA}{x, y}), Level Ancestor (\code{LA}{x, n}), Range Queries, Range Trees.}

We would like $O(1)$ query time on selected operations, and still only $O(n)$ space;
if we allow $O(n^2)$ space this is trivial, as we can simply precompute all queries and store them.

\section{Range Minimum Query}
We would like to retrieve the index of the minimum element in an array, between index $i$ and $j$.
Our goal is to get $O(n)$ time and space preprocessing, and constant time query.
Note that this is easy to do in $O(\log n)$ time using Range Trees ---
store minimum in internal nodes, and traverse from $i$ to $j$ in $O(\log n)$ time.

It turns out that \LCA{} and \RMQ{} are equivalent.

\subsection{Reduction from \RMQ{} to \LCA{}}

\todo{Add tree from lecture notes}
Build a \emph{Cartesian Tree}:
Walk through the array, while keeping track of the right spine of the tree.
When inserting a new element, if the element is the largest element in the spine, insert it at the end.
Else, we have an edge from $v$ to $w$, where the new element $a$ should be in between.
Make $a$ the right child of $v$, and make $w$ the \emph{left} child of $a$.
This step looks like it will be linear, but
the more we have to search through the right spine, the more swaps we do,
and the smaller the spine gets, so it amortizes to constant time,
making the algorithm linear.

Note that simpler divide and conquer algorithm runs in $O(n \log n)$.

We see that \code{LCA}{i, j} in the cartesian tree is the same as \code{RMQ}{i, j} in $A$.

\subsection{Reduction from \LCA{} to \RMQ{}}

Traverse the tree in-order, and write out the \emph{depth} of each node to $A$.
Now \code{RMQ}{i, j} = \code{LCA}{i, j}.
Naturally, since this scheme should work for any tree, we cannot write out the node values, since
the tree may not be a cartesian tree.

Note that if we go from \RMQ{} to \LCA{} and back again, we end up with different numbers
in the array. However, since we are not interested in the actual minimum, but only the index of the minimum,
the two arrays act exactly the same.
A consequence of this is that we get \emph{Universe reduction}, where we have mapped the universe $\mathcal{U}$ to $[n-1]$.

\section{Constant time \LCA{} (\RMQ{})}

\subsubsection{Step 1: Reduction}
We start by reducting the problem to $\pm 1 \textsc{RMQ}$, in which adjacent elements of the array differs by at most 1.
Walk an \emph{Euler Tour} of the tree:
\todo{Add figure}
visit every edge twice, and for each edge write down the node we left.
Each node store a pointer to the first visit of that node in the array, and the array elements store a pointer to its element in the tree.
\RMQ{} and \LCA{} are still equivalent.

We need this later on (step 4).

\subsubsection{Step 2: Precomputation}
Get $O(1)$ time and $O(n \log n)$ space \RMQ{}.
Precompute and store all queries from any starting point where the interval length is a power of 2.
Since there are $n$ starting points, and $\log n$ powers of 2 to choose from, there are $n \log n$ such queries.
The key observation is that any arbitrary interval is the union of two such intervals.
For instance $A[4..11] = A[4..8] \cup A[7..11]$.
The double counting does not matter, since $\min$ is an idempotent operation.
The intervals are trivially computed.

\subsubsection{Step 3: Indirection}
Make a two layer structure: divide the numbers in $A$ into groups of size $1/2 \log n$, which makes the bottom layer.
The top layer consists of the minimum element in each block. Since there are $n/(2 \log n) = 2n/\log n$ such blocks,
there are equally many items in the top layer.

A query in this structure consists of (potentially) three parts:
A query in the bottom block in which $i$ is,
a query in the top block for all blocks which are completely covered by the interval $[i, j]$,
and a query in the bottom block in which $j$ is.
We need all three queries to be $O(1)$.

The gain from this is that the top layer only stores $O(n/ \log n)$, so we can afford Step 2, since the $\log$ factors cancel.
We get $O(1)$ query and $O(n)$ space for the top structure.

\subsubsection{Step 4: Lookup Tables}
We use lookup tables for the bottom groups.
The groups are of size $n' = 1/2 \log n$.
\RMQ{} queries in these groups are invariant over value ``shifts'' in the gruop:
if we add $a$ to all elements in the group, the queries are still the same.
Shift all groups by its first element, such that all groups start with 0.
Now every group is completely defined by the difference of adjacent elements,
so the blocks can be encoded as a bitstring of the same length as a block, where 0 is decreasing and 1 is increasing:
$[3,4,5,4] \rightarrow [0,1,2,1] \rightarrow [-, 1, 1, 0]$.
There are $2^{n'} = \sqrt{n}$ possible such blocks,
${(1/2 \log n)}^2$ possible queries, and each answer requires $\log \log n$ bits,
so storing a lookup tables for all possible blocks, over all possible queries with all possible
answers take $\sqrt{n} {(1/2 \log n)}^2 \log \log n = o(n)$ bits.
Now each bottom block can simply store a pointer into the table, and we get $O(1)$ query for the bottom groups.

\section{Constant Time \LA{}}
Level Anscestor queries take a node $x$ and a level $n$, and the goal is to find the $n$th parent of $x$ in the tree.
The simplest way to do this is for each node to store its parent, making the query $O(n)$.
We want $O(1)$.

\subsubsection{Step 1: Jump Pointers}
Instead of having each node storing only its parent, each node can store its $2^k$th parent.
Each node has $O(\log n)$ such parents, making the space requirement $O(n \log n)$.