b066471bd3add214218193370fa16a00436adfdd — Martin Hafskjold Thoresen 5 years ago d5748a4
Write geometry and connectivity
1 files changed, 308 insertions(+), 5 deletions(-)

M book.tex
M book.tex => book.tex +308 -5
@@ 219,7 219,8 @@ A consequence of this is that we get \emph{Universe reduction}, where we have ma
\section{Constant time \LCA{} (\RMQ{})}%

\subsubsection{Step 1: Reduction}%
We start by reducing the problem to $\pm 1 \textsc{RMQ}$, in which adjacent elements of the array differ by exactly one.
Walk an \emph{Euler Tour} of the tree:
\todo{Add figure}
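Below is a minimal Python sketch of this reduction. It assumes the tree is given as a dict of child lists, and the names \texttt{euler\_tour} and \texttt{lca} are hypothetical. The sketch records the tour, the depth of every visit, and the first visit of each node. Consecutive depths differ by exactly one, and the LCA of $u$ and $v$ is the shallowest node visited between their first visits. The linear scan in \texttt{lca} only stands in for the constant-time $\pm 1$ \RMQ{} structure.

```python
def euler_tour(children, root):
    """Return (tour, depths, first) for a rooted tree given as a dict of child lists."""
    tour, depths, first = [], [], {}

    def visit(v, d):
        first.setdefault(v, len(tour))      # index of the first visit of v
        tour.append(v)
        depths.append(d)
        for c in children.get(v, []):
            visit(c, d + 1)
            tour.append(v)                  # back at v after each child
            depths.append(d)

    visit(root, 0)
    return tour, depths, first


def lca(tour, depths, first, u, v):
    """Shallowest visit between the first visits of u and v (naive stand-in for RMQ)."""
    i, j = sorted((first[u], first[v]))
    k = min(range(i, j + 1), key=depths.__getitem__)
    return tour[k]
```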

@@ 601,7 602,8 @@ Since all operations are linear, with the exception of the recursive call, we ge
$T(n) = T(\frac{2}{3} n) + O(n) = O(n)$.
Linear time!

\chapter{Temporal Structures}%

\topics{Partial persistency, full persistency, functional persistency, partial retroactivity, full retroactivity.}

@@ 834,11 836,13 @@ with $O(\log^2 m)$ overhead. However, it is also possible to get full retroactiv
Before tackling the real problem, we look at an easier problem.

\subsection{List Labeling}
We want to store integer labels in a list, such that insertions and deletions around a node in the list take constant time,
and such that the labels are strictly monotone along the list.
Let the \emph{label space} be the size of the labels as a function of the number of elements in the list we want to store.
Table~\ref{tab:list-labeling} shows the best known update times for different sizes of the label space.

\todo{Come back to this}

    \begin{tabular}{c l}

@@ 857,10 861,309 @@ Table~\ref{tab:list-labeling} shows the best known updates for different sizes o

\topics{Orthogonal Range Search, Range Trees, Layered Range Trees, Fractional Cascading}

We look at problems involving geometry, for instance queries in 2D space:
given a set of points, which points lie in an axis-aligned rectangle?

In general, geometric data structures are all about data in higher dimensions.
We differentiate between static structures and dynamic structures.

\section{Line Sweep}
We look at the problem of maintaining line segments in 2D\@; we would like to store the vertical order of the segments and their intersections.
In Chapter~\ref{ch:time} we looked at time traveling data structures.
We can use these to shave off a single dimension on geometry problems by pretending that one axis is time.

By sweeping a vertical line along the $x$-axis we can maintain a BBST of the segments crossing the sweep line, ordered by $y$.
At each time $t$ where something happens, that is, a segment starts, a segment ends, or two segments cross, we
translate the event into an operation on the BBST\@.
For the static version we need a persistent BBST\@.
This allows queries at a specific time, which is our $x$ coordinate.
Now if we would like to know which segment is directly above a point $(x, y)$, we translate the $x$ coordinate to a time $t$,
and query for the successor of $y$ in the BBST at time $t$.
We get $O(\log n)$ query after $O(n \log n)$ preprocessing (building the persistent BBST).
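Below is a Python sketch of the static query. It does not use a persistent BBST. Instead it stores a full sorted copy of the active segments for each slab between consecutive event $x$-coordinates, which costs $O(n^2)$ space rather than $O(n \log n)$. It also assumes the segments are non-vertical and pairwise non-crossing, so only endpoints are events. All names are hypothetical. A query translates $x$ into a version (the ``time'') and then does a successor search on $y$ within that version.

```python
from bisect import bisect_right


def y_at(seg, x):
    """y coordinate of the (non-vertical) segment seg at abscissa x."""
    (x1, y1), (x2, y2) = seg
    return y1 + (y2 - y1) * (x - x1) / (x2 - x1)


def build_versions(segments):
    """One version of the ordered set per slab [xs[t], xs[t+1]).

    Assumes x1 < x2 for every segment ((x1, y1), (x2, y2)) and no crossings.
    """
    xs = sorted({p[0] for seg in segments for p in seg})   # event times
    versions = []
    for left, right in zip(xs, xs[1:]):
        mid = (left + right) / 2
        active = [i for i, seg in enumerate(segments)
                  if seg[0][0] <= left and right <= seg[1][0]]
        active.sort(key=lambda i: y_at(segments[i], mid))  # bottom-to-top order
        versions.append(active)
    return xs, versions


def segment_above(segments, xs, versions, qx, qy):
    """Index of the segment directly above (qx, qy), or None."""
    t = bisect_right(xs, qx) - 1                # translate x into a version ("time")
    if t < 0 or t >= len(versions):
        return None
    active = versions[t]
    lo, hi = 0, len(active)                     # successor of qy in that version
    while lo < hi:
        m = (lo + hi) // 2
        if y_at(segments[active[m]], qx) > qy:
            hi = m
        else:
            lo = m + 1
    return active[lo] if lo < len(active) else None
```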

The dynamic version is similar, but we need a retroactive BBST, since inserting a segment means inserting its begin and end events in the past.

\section{Orthogonal Range Search}%
In this problem we want to maintain $n$ points in $d$-dimensional space,
and answer queries asking for the points in an axis-aligned box $[a_1, b_1] \times [a_2, b_2] \times \cdots \times [a_d, b_d]$.
A query may ask for existence, for a count, or for the actual points.
The time bound we are aiming for initially is $O(\log^d n + k)$, where $k$ is the number of points we return (reporting $k$ points trivially takes $\Omega(k)$ time).
Again we differentiate between the dynamic and the static version.

For $d=1$, we store the points in a BBST where all the points are leaves (this only doubles the space).
For a range search $[a, b]$ we find
the nodes $a' = pred(a)$, $b' = succ(b)$, and \code{LCA}{a', b'}, and report the points in the subtrees hanging between the search paths to $a'$ and $b'$ below \code{LCA}{a', b'}.
Finding $pred$ and $succ$ takes $O(\log n)$ time, there are $O(\log n)$ such subtrees, and together they contain exactly the $k$ reported leaves, so we get queries in $O(\log n + k)$ time.
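Below is a minimal Python sketch of the $d=1$ case. It assumes a static set of keys (no updates) and uses hypothetical names. Every node covers a contiguous block of the sorted keys, and the query collects the maximal subtrees lying entirely inside $[a, b]$. It finds the subtrees between the two search paths recursively rather than through explicit $pred$, $succ$ and \code{LCA}{a', b'} computations. Beyond the reported subtrees it visits only $O(\log n)$ nodes.

```python
class Node:
    """Node of a static, leaf-oriented balanced BST over the sorted keys[lo:hi]."""

    def __init__(self, keys, lo, hi):
        self.lo, self.hi = lo, hi
        self.min, self.max = keys[lo], keys[hi - 1]
        self.left = self.right = None
        if hi - lo > 1:                          # internal node: split the leaves in half
            mid = (lo + hi) // 2
            self.left, self.right = Node(keys, lo, mid), Node(keys, mid, hi)


def canonical_subtrees(node, a, b, out):
    """Collect, left to right, the maximal subtrees whose keys all lie in [a, b]."""
    if node.max < a or node.min > b:             # disjoint from the query range
        return
    if a <= node.min and node.max <= b:          # fully covered: report the whole subtree
        out.append(node)
        return
    canonical_subtrees(node.left, a, b, out)     # partial overlap: recurse
    canonical_subtrees(node.right, a, b, out)


def range_search(keys, root, a, b):
    out = []
    canonical_subtrees(root, a, b, out)
    return [k for node in out for k in keys[node.lo:node.hi]]
```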

For $d=2$, we store a one-dimensional tree on the $x$ coordinates of all points, as in the $d=1$ case.
This time, however, we augment the nodes of the tree with a new tree, containing the same points, but sorted on $y$.
That is, any node is itself the root of a tree sorted on $x$ which contains the points $P$ as leaves.
The node also points to a new tree which stores $P$ on $y$.
The y-trees are independent, and have no cross links.

We note that each point is in $O(\log n)$ y-trees: one for each ancestor of its leaf in the x-tree, of which there are $O(\log n)$.
On a query, we first find the $O(\log n)$ subtrees of the x-tree that together contain exactly the points with $x$ coordinates in the range.
Then we do a one-dimensional range search on $y$ in the y-tree of each of these subtrees.
This gives us $O(\log^2 n + k)$ query time.

The space requirement is only $O(n \log n)$, and construction time is also $O(n \log n)$.
Observe that the space recurrence is $S(n) = 2S(n/2) + O(n)$, the same as the time recurrence for \algo{Merge-Sort}.
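The $d=2$ structure can be sketched in Python as follows, assuming a static point set and using hypothetical names. Each x-tree node keeps its own points sorted by $y$ in a plain list, which stands in for the y-tree. The query finds the canonical x-subtrees as in the $d=1$ case and binary searches each of their y-lists, for $O(\log^2 n + k)$. For brevity every node sorts its points from scratch, so construction takes $O(n \log^2 n)$. Merging the children's lists instead would give $O(n \log n)$.

```python
from bisect import bisect_left, bisect_right


class Node:
    """x-tree node; ys/yvals (sorted on y) stand in for the node's y-tree."""

    def __init__(self, pts):                     # pts sorted by x
        self.xmin, self.xmax = pts[0][0], pts[-1][0]
        self.ys = sorted(pts, key=lambda p: p[1])
        self.yvals = [p[1] for p in self.ys]
        self.left = self.right = None
        if len(pts) > 1:
            mid = len(pts) // 2
            self.left, self.right = Node(pts[:mid]), Node(pts[mid:])


def build(points):
    return Node(sorted(points))


def query(node, x1, x2, y1, y2, out):
    """Append to out the points in [x1, x2] x [y1, y2]."""
    if node.xmax < x1 or node.xmin > x2:         # disjoint on x
        return
    if x1 <= node.xmin and node.xmax <= x2:      # canonical x-subtree: 1D search on y
        out.extend(node.ys[bisect_left(node.yvals, y1):bisect_right(node.yvals, y2)])
        return
    query(node.left, x1, x2, y1, y2, out)
    query(node.right, x1, x2, y1, y2, out)
```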

The approach taken in the $d=2$ case generalizes to any dimension.
We end up with $O(\log^d n + k)$ query, $\Theta(n \log^{d-1} n)$ space and construction,
and $\Theta(\log^d n)$ update (in the dynamic setting).

\section{Layered Range Tree}%
We observe that the approach taken in the previous section is wasteful: when $d=2$ we search for the same y-intervals in $O(\log n)$ trees.
We want to take advantage of this by reusing the searches.

Instead of having the nodes in the x-tree store another tree, this time they only point to an array of their points sorted on $y$.
The idea is that we only want to do a single search on $y$, which will be in the root node (the array containing all points).
Every entry of a node's array also stores pointers to the corresponding positions in the arrays of its two children.
Now, when we walk down from the root towards $a'$ and $b'$ (the pred.\ and succ.) through \code{LCA}{a', b'}, we can follow these pointers into the child arrays,
so that we know at once where the $y$-interval is.
When we get to the subtrees we want to output, we already have the $y$-interval of points that we are interested in,
and since the subtree is completely covered on $x$, these are exactly the points we are looking for,
so we get this search ``for free''.

Note that we depend on the fact that a child node has a subset of the points that the parent has.

We start out by binary searching for the $y$-interval in the root array, which takes $O(\log n)$ time, and then we walk down towards the leaves,
which is a walk of length $O(\log n)$.
Along the way we output points, of which there are $k$ in total.
Hence we end up with $O(\log n + \log n + k) = O(\log n + k)$ queries ($O(\log^{d-1} n + k)$ in general).
The space is the same as previously: $O(n \log n)$ ($O(n \log^{d-1} n)$ in general).
The construction time is also the same, since the pointers of a node can be set up in time linear in its size while merging its children's arrays, so we get the same recurrence as \algo{Merge-Sort}, yet again.
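Below is a static Python sketch of the layered version, again with hypothetical names. Each x-tree node stores its points in a $y$-sorted array built by merging its children's arrays. For every position $k$ in that array it records how many of the first $k$ entries come from the left child and how many from the right child. These counts are exactly the pointers into the child arrays. Only the root array is binary searched. Below the root, the $y$-interval $[lo, hi)$ is carried down by following the pointers.

```python
from bisect import bisect_left, bisect_right


class Node:
    """x-tree node: points in a y-sorted array plus pointers into the children's arrays."""

    def __init__(self, pts):                     # pts sorted by x
        self.xmin, self.xmax = pts[0][0], pts[-1][0]
        self.left = self.right = None
        if len(pts) == 1:
            self.ys = list(pts)
            return
        mid = len(pts) // 2
        self.left, self.right = Node(pts[:mid]), Node(pts[mid:])
        a, b = self.left.ys, self.right.ys
        # Merge the children's arrays as in merge sort. lptr[k] (rptr[k]) counts the
        # left (right) points among self.ys[:k], i.e. where position k lands below.
        self.ys, self.lptr, self.rptr = [], [0], [0]
        i = j = 0
        while i < len(a) or j < len(b):
            if j == len(b) or (i < len(a) and a[i][1] <= b[j][1]):
                self.ys.append(a[i])
                i += 1
            else:
                self.ys.append(b[j])
                j += 1
            self.lptr.append(i)
            self.rptr.append(j)


def build(points):
    root = Node(sorted(points))
    return root, [p[1] for p in root.ys]         # root y-keys for the single binary search


def report(node, x1, x2, lo, hi, out):
    """node.ys[lo:hi] are exactly this node's points with y in the query range."""
    if lo >= hi or node.xmax < x1 or node.xmin > x2:
        return
    if x1 <= node.xmin and node.xmax <= x2:      # fully covered on x: output, no search
        out.extend(node.ys[lo:hi])
        return
    report(node.left, x1, x2, node.lptr[lo], node.lptr[hi], out)
    report(node.right, x1, x2, node.rptr[lo], node.rptr[hi], out)


def range_query(root, root_y, x1, x2, y1, y2):
    lo, hi = bisect_left(root_y, y1), bisect_right(root_y, y2)   # the only binary search
    out = []
    report(root, x1, x2, lo, hi, out)
    return out
```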

Unfortunately, this trick does not compound in higher dimensions: it shaves off only a single $\log$ factor, not one per dimension.

\section{Weight-Balanced Trees}
We would like to use range trees in a dynamic setting.
The tree we look at is the \bba{} tree.
A weight-balanced tree is similar to a \emph{height}-balanced tree; AVL trees and Red-Black trees are examples of height-balanced trees that we already know.
With weight-balanced trees we instead balance the weight, that is, the number of nodes, of the subtrees rather than their height.

More formally, we have the following invariant:
\begin{align*}
    \forall x\ &size(left(x)) \geq \alpha\ size(x)\\
               &size(right(x)) \geq \alpha\ size(x)\quad\text{where }\alpha \in (0, 1/2)
\end{align*}

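A curious property of this invariant is that it implies height balance: every child has at most a $(1 - \alpha)$ fraction of its parent's size, so after $h$ steps down from the root the size is at most $(1 - \alpha)^h n$, which gives $h \leq \log_{\frac{1}{1 - \alpha}} n$.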

On update we simply insert at the leaves and update the weights upward in the tree,
assuming all internal nodes store the weights of their children explicitly.
When a node becomes unbalanced, we simply rebuild its entire subtree from scratch as a perfectly balanced tree.
While this might seem slow, we can use an amortization scheme: after a rebuild, a subtree of size $k$ needs $\Omega(k)$ updates before its root becomes unbalanced again,
so we charge the $O(k)$ rebuild time to those updates.
The details are a little messy, but the bottom line is that each update is charged $O(1)$ by each of its $O(\log n)$ ancestors, so we get $O(\log n)$ amortized update.
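Below is a minimal Python sketch of insertion with rebuilding. It makes several assumptions that are not in the text. The weight of a subtree is its number of nodes plus one, a common convention that lets leaves satisfy the invariant. The value $\alpha = 1/4$ is an arbitrary choice. Deletions are omitted. The rebuilt node is the highest unbalanced node on the insertion path. A rebuild flattens the subtree in key order and rebuilds it perfectly balanced.

```python
ALPHA = 0.25                                   # hypothetical choice of alpha


class Node:
    def __init__(self, key):
        self.key = key
        self.left = self.right = None
        self.size = 1                          # number of nodes in this subtree


def weight(node):
    """Weight = size + 1 (assumption), so an empty child has weight 1."""
    return (node.size if node else 0) + 1


def balanced(node):
    w = weight(node)
    return weight(node.left) >= ALPHA * w and weight(node.right) >= ALPHA * w


def flatten(node, out):
    """Append the nodes of the subtree to out in key order."""
    if node:
        flatten(node.left, out)
        out.append(node)
        flatten(node.right, out)


def build(nodes, lo, hi):
    """Rebuild nodes[lo:hi] (in key order) into a perfectly balanced subtree."""
    if lo >= hi:
        return None
    mid = (lo + hi) // 2
    root = nodes[mid]
    root.left, root.right = build(nodes, lo, mid), build(nodes, mid + 1, hi)
    root.size = hi - lo
    return root


def insert(root, key):
    """Insert key and return the (possibly new) root."""
    if root is None:
        return Node(key)
    path, node = [], root
    while node:                                # plain BST descent
        path.append(node)
        node = node.left if key < node.key else node.right
    leaf = Node(key)
    if key < path[-1].key:
        path[-1].left = leaf
    else:
        path[-1].right = leaf
    for n in path:                             # update weights on the path
        n.size += 1
    for depth, n in enumerate(path):           # highest unbalanced node, if any
        if not balanced(n):
            nodes = []
            flatten(n, nodes)
            sub = build(nodes, 0, len(nodes))
            if depth == 0:
                return sub
            parent = path[depth - 1]
            if parent.left is n:
                parent.left = sub
            else:
                parent.right = sub
            break
    return root
```

Inserting keys in sorted order, which would degenerate a plain BST into a path, keeps every node balanced here.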

We can apply this to the range tree from Section~\ref{sec:range-tree} to get $O(\log^d n)$ amortized updates.

We would also like to use the layered approach from Section~\ref{sec:layered-range-tree} to shave off a $\log$ factor,
but it turns out that rebuilding the arrays is problematic.
However, we only need something array-like in the root node of the tree, since we
only binary search there, and we never use random access for the arrays in the internal nodes as we only follow pointers.
We can replace the root array with a BBST, and the internal arrays with linked lists.
We end up with the same query time $O(\log^{d-1} n + k)$, since the procedure is exactly the same, while also getting $O(\log^d n)$ amortized updates.

\chapter{Connectivity in Dynamic Graphs}
\topics{Dynamic connectivity on trees, Euler tour trees.}

Before starting, we point out that this chapter contains fewer proofs and more stated results.

We would like to solve the problem of connectivity queries.
We maintain a graph which is subject to updates (edge insertions and deletions), and we answer queries of the form ``are $u$ and $v$ connected?''.
As in previous sections, we split the problem into two variants: fully dynamic and partially dynamic.

\begin{definition}{Fully Dynamic}
    Connectivity queries in which the graph is subject to both edge insertions and edge deletions.
\end{definition}

\begin{definition}{Partially Dynamic}
    Connectivity queries in which the graph updates are either only edge insertions \emph{or} only edge deletions, but not both.
    The insertion-only variant is called \emph{incremental}, and the deletion-only variant is called \emph{decremental}.
\end{definition}

Unless otherwise specified, we consider fully dynamic connectivity.

\section{General Results}

We can handle connectivity queries for trees in $O(\log n)$ time, by using Link-Cut trees or Euler-Tour Trees (Section~\ref{sec:euler-tour}).
If we limit ourselves to decremental connectivity, constant time is possible.

\subsubsection{Plane Graphs}
A plane graph is a planar graph with a fixed embedding; that is, edges know which faces they divide, and updates specify the face in which an inserted edge is placed.
As with trees, $O(\log n)$ per operation is also possible.

\subsubsection{General Graphs}
Is $O(\log n)$ per operation possible?
This is an open problem, but we know how to get $O(\log^2 n)$ amortized update and $O(\frac{\log n}{\log\log n})$ query.
If we are willing to accept slower updates for faster queries, $O(\sqrt{n})$ update and $O(1)$ query is possible.

For the incremental case, we can get $\Theta(\alpha(n))$ amortized time per operation, where $\alpha$ is the inverse Ackermann function, by using \algo{Union-Find}.
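For reference, below is a standard \algo{Union-Find} with union by rank and path compression, sketched in Python with hypothetical method names. Inserting an edge $(u, v)$ is \texttt{union(u, v)}, and a connectivity query compares the roots returned by \texttt{find}.

```python
class UnionFind:
    """Incremental connectivity on vertices 0..n-1."""

    def __init__(self, n):
        self.parent = list(range(n))
        self.rank = [0] * n

    def find(self, x):
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        while self.parent[x] != root:            # path compression
            self.parent[x], x = root, self.parent[x]
        return root

    def union(self, u, v):
        """Edge insertion (u, v)."""
        ru, rv = self.find(u), self.find(v)
        if ru == rv:
            return
        if self.rank[ru] < self.rank[rv]:
            ru, rv = rv, ru
        self.parent[rv] = ru                     # lower-rank root goes below
        if self.rank[ru] == self.rank[rv]:
            self.rank[ru] += 1

    def connected(self, u, v):
        return self.find(u) == self.find(v)
```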

Decremental connectivity is possible in $O(m \log n + n \text{ polylog } n + \text{\# queries})$ total time, where $m$ is the number of edges and $n$ is the number of vertices.

There is also a known fundamental lower bound: either updates or queries have to take $\Omega(\log n)$ time.

\section{Euler-Tour Trees}%
We now look at the specifics regarding the result on tree connectivity, namely that $O(\log n)$ per operation is possible.
We have already seen Euler tours in Section~\ref{sec:first-euler-tour}.
The general idea is to traverse the tree and write down a node every time we visit it, including every time we return to it from a child.
Then we build a BBST of the written-down nodes, ordered by their position in the tour.
Each node in the tree stores pointers to its first and last visit in the BBST\@.
An Euler-Tour tree supports the following operations:

\begin{itemize}
    \item \algo{Make-Tree}: make a new isolated tree
    \item \code{Find-Root}{v}: find the root of $v$'s tree
    \item \code{Link}{u, v}: attach $u$ as a child of $v$
    \item \code{Cut}{v}: remove $v$ from its parent
\end{itemize}

We look at how each operation is implemented.
Before we proceed, we remind ourselves of some of the operations that a BBST supports in $O(\log n)$ time:
\code{Split}{x}: turn the tree into two trees, one containing the keys $< x$ and the other containing the keys $> x$;
\code{Concat}{x, y}: given two trees $x$ and $y$ such that every key in $x$ is smaller than every key in $y$, turn them into one tree with the keys $x \cup y$.

\algo{Make-Tree} is trivial: the tree for a singleton is the singleton itself.

For \code{Find-Root}{v}, note that the root of the tree is not the root of the BBST\@.
We start at the first tour visit of $v$, walk up to the root of the BBST, and then down to the leftmost node of the BBST\@.
The leftmost node is the first visited node in the tour, which is the root of the \emph{actual} tree we are interested in.
This takes $O(\log n)$ time.

For \code{Link}{u, v}, we find the last occurrence of $v$ in the BBST, split right after it, and concatenate the tour of $u$'s tree in between.
We also need to add a new occurrence of $v$ after $u$'s tour, since the traversal returns to $v$ after visiting $u$'s subtree.
This is a single split and a constant number of concats.

Note that $u$ has to be the root of its tree. What do we do if it is not?
We can \emph{reroot} the tree: pick up the node we want to be the new root, such that the rest of the tree ``falls down''.
This is a cyclic shift of the Euler tour, and can be done with one split and one concat,
by splitting at the first occurrence of $u$ in the tour and concatenating the first part to the end (dropping the old root's duplicate closing visit and closing the tour with a visit to $u$).

For \code{Cut}{v}, we find the first and last occurrence of $v$ in the tour and split at those two places, since $v$'s subtree
is a contiguous interval of the Euler tour.

Then we concatenate the first and last parts, and remove one of the two now adjacent occurrences of $parent(v)$, so that it does not appear twice in a row.
Two splits and one concat.

Since all operations consist of walking up or down, splitting, or concatenating, each of which takes $O(\log n)$ time, we get $O(\log n)$ for all operations.
Connectivity queries can be done by comparing the roots of the nodes we are querying.
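The tour manipulations above can be sketched in Python as follows. To keep the sketch short, each tour is a plain list, and list slicing plays the role of \algo{Split} and \algo{Concat}. Every operation here therefore costs $O(n)$ rather than $O(\log n)$. The class and method names are hypothetical. The tour of a tree starts and ends at its root, which is why rerooting drops the old closing visit and appends one for the new root.

```python
class EulerTourForest:
    """Euler-Tour trees with each tour kept in a plain list (stand-in for a BBST)."""

    def __init__(self):
        self.tours = []                          # one tour (list of vertices) per tree

    def _tour(self, v):
        for t in self.tours:
            if v in t:
                return t
        raise KeyError(v)

    def make_tree(self, v):
        self.tours.append([v])

    def find_root(self, v):
        return self._tour(v)[0]                  # first entry of the tour

    def connected(self, u, v):
        return self.find_root(u) == self.find_root(v)

    def reroot(self, u):
        t = self._tour(u)
        if t[0] == u:
            return
        body = t[:-1]                            # drop the closing visit of the old root
        i = body.index(u)                        # split at the first visit of u,
        t[:] = body[i:] + body[:i] + [u]         # rotate, and close the tour at u

    def link(self, u, v):
        """Make u's tree a child subtree of v (u and v in different trees)."""
        self.reroot(u)
        tu, tv = self._tour(u), self._tour(v)
        assert tu is not tv, "u and v are already connected"
        j = len(tv) - 1 - tv[::-1].index(v)      # last visit of v
        tv[j + 1:j + 1] = tu + [v]               # splice in u's tour, then return to v
        self.tours = [t for t in self.tours if t is not tu]

    def cut(self, v):
        """Detach v and its subtree from v's parent."""
        t = self._tour(v)
        i = t.index(v)
        j = len(t) - 1 - t[::-1].index(v)
        assert i > 0, "v is the root of its tree"
        sub = t[i:j + 1]                         # v's subtree is a contiguous interval
        t[:] = t[:i] + t[j + 2:]                 # t[j+1] repeats t[i-1] == parent(v)
        self.tours.append(sub)
```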

\section{Fully Dynamic Graphs}
We look at how to obtain $O(\log^2 n)$ amortized updates for fully dynamic graphs.
We maintain a spanning forest of the graph, using Euler-Tour trees.
Now inserting an edge between two different trees corresponds to \algo{Link}.
Edge deletion has two cases: if the deleted edge is not in the spanning forest we maintain, nothing changes;
if it \emph{is}, we run into trouble, since deleting the edge does not necessarily disconnect the graph:
there might be another edge that we did not use in the spanning tree, because the two components were already connected by
the edge that we are now deleting.
If we know that no such edge exists, we can simply \algo{Cut} out the tree edge, and we are done.

The way we do this is to assign \emph{levels} to edges and store $O(\log n)$ spanning forests, one per level, where the forest at a lower level
contains only the edges of low enough level.
All edges start at level $\log n$; an edge's level only ever decreases, and never goes below 1.

Let $G_i$ be the subgraph of edges with level $\leq i$.
Note that $G_{\log n} = G$.

Let $F_i$ be the spanning forest of $G_i$, stored using Euler-Tour trees.
Note that $F_{\log n}$ answers the connectivity queries in $G$, since the forest spans the entire graph,
and supports connectivity queries in $O(\log n)$ time.

We maintain the following invariants:

\subsubsection{Invariant 1}
Every connected component of $G_i$ has $\leq 2^i$ vertices.

\subsubsection{Invariant 2}
The forests nest: $F_0 \subseteq F_1 \subseteq \cdots \subseteq F_{\log n}$,
and are given by $F_i = F_{\log n} \cap G_i$.
There is only one forest, and $F_i$ is just the part of it made up of edges of level at most $i$.
This also means that $F_{\log n}$ is a minimum spanning forest of $G$ with edge levels as weights.

On insertion of $e=(u, v)$ we set $e.level = \log n$, and add $e$ to $u$'s and $v$'s incidence lists.
If $u$ and $v$ are not connected, we add $e$ to $F_{\log n}$.
This makes for $O(\log n)$ insertion.

Deleting an edge $e = (u, v)$ is the hard part.
We start by removing $e$ from the incidence lists of its endpoints.
This can be done in constant time if the edge itself stores a pointer to its position in those lists.
If $e \notin F_{\log n}$ we are done (if $e$ is in any forest it is in $F_{\log n}$, since they nest).
Otherwise, we have to delete $e$ from $F_{e.level}, \dots, F_{\log n}$, which are exactly the forests that $e$ lives in.
All of these are Euler-Tour trees, and there are $O(\log n)$ of them, each cut costing $O(\log n)$, which makes a total cost of $O(\log^2 n)$.

Now we have to look for a replacement edge. We know by invariant 2 that there is no replacement edge with a lower level,
since then that edge would be in the forest instead of $e$.
So if there is a replacement edge, it has level $\geq e.level$.
We search upwards from $e.level$ to $\log n$.
For each level $i$ we let $T_u$ and $T_v$ be the trees of $F_i$ containing $u$ and $v$.
Without loss of generality, let $|T_u| \leq |T_v|$.
By invariant 1, we know that the sizes of these trees are limited: $|T_u| + |T_v| \leq 2^i$, since they were connected in $G_i$ before deleting $e$.
This means that $|T_u| \leq 2^{i-1}$, so we \emph{can} push down all edges in $T_u$ to level $i-1$ without violating invariant 1.
We will use this as the charging scheme to get the amortized running time we want.

We look at all level-$i$ edges $e'=(x,y)$ with $x \in T_u$.
Such an edge is either internal to $T_u$, or it goes to $T_v$, like $e$ does.
Why can it not go to another tree $T_w$?
Assume there is an edge $f=(x, w),\ w \in T_w$ of level $i$.
Since $f.level = i$ we know that $f \in G_i$, and since $F_{\log n}$ is a minimum spanning forest with respect to levels (invariant 2),
$x$ and $w$ were in the same tree of $F_i$ before $e$ was deleted.
Deleting $e$ only splits the tree containing $e$ into $T_u$ and $T_v$, so $w$ lies in $T_u$ or in $T_v$.
This contradicts $w \in T_w$, so $f$ cannot exist.

If $e'$ is internal to $T_u$ it does not help us, so we set $e'.level = i - 1$, which we can afford.
If $e'$ goes to $T_v$ we are done, since it is a replacement edge; insert it into $F_i, \dots, F_{\log n}$.

Overall we pay $O(\log^2 n + \log n \cdot \text{\# level decreases})$,
but the number of level decreases is bounded by the number of inserts times $\log n$,
since edge levels only decrease and stay between $\log n$ and 1.
We charge each insert $O(\log^2 n)$ for the at most $\log n$ level decreases of its edge, each costing $O(\log n)$, making the amortized cost of delete $O(\log^2 n)$, which is what we wanted.

The last complication is that we need to augment the Euler-Tour trees with subtree sizes at every node, in order to make the comparison $|T_u| \leq |T_v|$ in constant time,
and that we must be able to find the edges of a given level incident to $T_u$.
To handle this we store an array in every internal node of the Euler-Tour trees, signaling for each level $i$ whether any node in its subtree
has an incident level-$i$ edge.
In addition, each node's adjacency list is split into one list per level, instead of having one list for all edges.
This makes the search for the next level-$i$ edge take $O(\log n)$ time.
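The level scheme can be sketched in Python under strong simplifying assumptions. The graph is simple. Each tree of $F_i$ is recovered by a graph search over forest edges of level at most $i$ rather than stored in Euler-Tour trees, so operations cost $O(n)$ or more instead of $O(\log^2 n)$ amortized. The class name and the choice of $\lceil \log_2 n \rceil$ as the top level are assumptions. The deletion follows the procedure above. It scans levels upward from $e.level$ and picks the smaller tree $T_u$. It pushes $T_u$'s level-$i$ tree edges down first, which keeps invariant 2 intact for the non-tree edges pushed afterwards. It then inspects $T_u$'s level-$i$ non-tree edges, each of which either reaches $T_v$ or is pushed down.

```python
import math
from collections import defaultdict


class DynamicConnectivity:
    """Level scheme for fully dynamic connectivity; graph search stands in for ETTs."""

    def __init__(self, n):
        self.top = max(1, math.ceil(math.log2(n)))   # the level "log n"
        self.level = {}                  # frozenset({u, v}) -> level of that edge
        self.tree = set()                # edges of the spanning forest F_top
        self.adj = defaultdict(set)      # incidence lists (all levels together)

    def _tree_of(self, s, i):
        """Vertices of the tree of F_i (forest edges of level <= i) containing s."""
        seen, stack = {s}, [s]
        while stack:
            x = stack.pop()
            for y in self.adj[x]:
                e = frozenset((x, y))
                if y not in seen and e in self.tree and self.level[e] <= i:
                    seen.add(y)
                    stack.append(y)
        return seen

    def connected(self, u, v):
        return v in self._tree_of(u, self.top)

    def insert(self, u, v):
        e = frozenset((u, v))
        self.level[e] = self.top
        if not self.connected(u, v):
            self.tree.add(e)             # corresponds to Link
        self.adj[u].add(v)
        self.adj[v].add(u)

    def delete(self, u, v):
        e = frozenset((u, v))
        lvl = self.level.pop(e)
        self.adj[u].discard(v)
        self.adj[v].discard(u)
        if e not in self.tree:           # non-tree edge: nothing else changes
            return
        self.tree.remove(e)
        for i in range(lvl, self.top + 1):       # replacements have level >= e.level
            tu, tv = self._tree_of(u, i), self._tree_of(v, i)
            if len(tu) > len(tv):
                tu, tv = tv, tu                  # tu is the smaller tree
            for x in tu:                         # push tu's level-i tree edges down
                for y in self.adj[x]:
                    f = frozenset((x, y))
                    if f in self.tree and self.level[f] == i:
                        self.level[f] = i - 1
            for x in tu:                         # scan tu's level-i non-tree edges
                for y in self.adj[x]:
                    f = frozenset((x, y))
                    if f in self.tree or self.level[f] != i:
                        continue
                    if y in tv:
                        self.tree.add(f)         # replacement edge found at level i
                        return
                    self.level[f] = i - 1        # internal to tu: push it down
```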

\section{Other Results}
We list some other related results in the field of connectivity.

2-edge connectivity is maintainable in $O(\log^4 n)$ time, and 2-vertex connectivity in $O(\log^5 n)$ time.

\subsection{Minimum Spanning Forest}
The general problem of maintaining a minimum spanning forest can be solved dynamically in $O(\log^4 n)$ amortized time per update.


\chapter{Lower Bounds}