328350f00edcc5df4715ca9dc1a8e643aef33b90 — Martin Hafskjold Thoresen 5 years ago 223f923
Begin on Temporal structures
1 files changed, 257 insertions(+), 2 deletions(-)

M book.tex
M book.tex => book.tex +257 -2
@@ 7,6 7,7 @@

@@ 602,8 603,262 @@ Linear time!

\chapter{Temporal Structures}

Partial persistency, full persistency, funcitonal persistency,
pareial retroactivity, full retroactivity.
\topics{Partial persistency, full persistency, funcitonal persistency, pareial retroactivity, full retroactivity.}

When working with temporal data structures, our machine model is the \emph{Pointer Machine}

\begin{definition}{Pointer Machine}
    A computational model in which we have \emph{Nodes} with $O(1)$ fields.
    The fields can be values or \emph{Pointers}, which points to other nodes.
    Field access, node creation, data operations are all $O(1)$ time.
    The pointer machine does not have arrays.

A persistent data structure is a data strucure that ``never changes''.
On a mutable operation to the structure, a new structure is made.
This means that all operations done to the data strucure is relative to a specific version of the structure.

There are four levels of persistence:
    \item Partial persistence: Only the latest version is editable. The versions are linearly ordered through time.
    \item Full persistence: Any version can be updated, to produce a new version. The versions makes a tree.
    \item Confluent persistence: Versions can be merged to produce a new version. The graph is now a DAG\@.
    \item Functional: Nodes in the data structur are never modified, only new ones are created. The version of the structure is maintained by a pointer.

\subsection{Partial Persistence}
Any data structure using the pointer machine with $\leq p = O(1)$ pointers to any node in any version can be made partially persistent,
with $O(1)$ amortized multiplicative overhead and $O(1)$ space per change.
We show how this can be done.

Each node store \emph{Back Pointers}, which points to all nodes that have a pointer to this node.
Since there are at most $p$ such pointers, this is $O(1)$ extra space.
In addition, the nodes store a history of modifications $(version, field, value)$ tuples.
The history is of size $\leq 2p$ (using the fact that $p=O(1)$).
On field reads we read the value explicitly stored in the node, and then apply all the modifications that were done up to the verison
we are reading from.
When the history is full, we create a new node with the entire modification history applied to its fields, and an empty history.
Now we need to update the back pointers of the nodes we point to, since these nodes also store back pointers.
This is easy, since the back pointers are not persistent.
But we also need to update the pointers (not back pointers) of other nodes (which is the nodes we have in our back pointer list),
and these pointers are persistent, so we must recurse.

It is not trivial to see that this recursion terminates.
What if the nodes we update get a full history from updating their pointer to us,
which makes us update our pointers making our history full again?

We can use the \emph{Potential Method} for amortized analysis.
Remember that \emph{Amortized cost} is actual cost + change in potential.
Let $\Phi = c \cdot \Sigma$, where $\Sigma$ is the number of modifications in all nodes latest versions.
Observe that when we make a change, the amortized cost is
$$\leq c + c\ [- 2cp + p \cdot \text{recursions}]\text{?}$$
The first $c$ is for computation, the second $c$ is the added potential, since we
just added one modification to a node,
and $[-2cp + p \cdot \text{recursions}]\text{?}$ are for the recursions, if there is one.
If there is no recursion, we have constant time.
If there is a recursion, we replace it by the same expression, which causes the $-2cp$ to cancel.
We do end up with another recursion, but if it does happen, we still get rid of the $-2cp$ term.

In conclusion, we use $O(3p) = O(1)$ extra space for each node, and allow $O(1)$ extra time on updates, to
get partial persistence for any data structure.

\subsection{Full Persistence}
We take a similar approach to full persistency as we did with partial persistency.
The first difference is that we need \emph{all} pointers to be bi-directional, and not only the field pointers, as previously.
The second and most important difference is that now versions of the structure is no longer a line, but a tree.
In order to go around this problem we linearize the version tree:
traverse the tree, and write out first and last visit of each node\todo{add figure}.

However, we need this structure to be dynamic,
so we use an \emph{Order Maintenance data structure},
which supports insertion before or after a given item in $O(1)$ (like a linked list),
and supports relative order query of two items in $O(1)$ time.

Let $p$ still be the in-degree of nodes.
Let $d$ be the maximum number of fields in a node.
We now allow $2(d + p + 1)$ modifications in a nodes history.

A single nodes history now consists of the same triples, but the $version$ can no longer be compared by $\leq$,
since the versions are in a tree. This is what we need the $O(1)$ relative ordering for.

On field read we can simply go through all modifications in the node (since there is a constant of them),
and use relative ordering to find the latest update of the field we are reading, from an ancestor of the current version.

Updates when the history is full is different from the approach taken in partial persistence.
We would like to split the history tree into two parts of the same size,
and apply all of the modifications from the root to the newly cut out subtree to the new node.
This was the original node loses the modifications in the new subtree, and the new node loses
the modifications that are left in the original node.
Now we need to update pointers.
We have $d$ fields, $p$ back pointers, and $d + p + 1$ items in the history, all of which could be pointers,
which makes $\leq 2d + 2p + 1$ pointers.
Amortization scheme still works out (although the potential function is slightly different), and we get $O(1)$ updates.

Confluent persistency and functional were not covered in class.

\section{Temporal Data Structures}

Temporal data structures allow all operations to specify a time in which the operation is to be performed.
The key difference between temporal and persistent structures is that in a temporal structure previous alterations to the structure
should ``propagate'' through time, and update the current structure.
A persistent data structure would not update the current strucure, but simply make a new data structure for each of the paths taken through time.

We need three operations:
    \item \code{Insert}{t, $\ op(\dots)$}: retroactively perform the specified operation at time $t$.
    \item \code{Delete}{t}: retroactively undo the operation at time $t$.
    \item \code{Query}{t, $\ op(\dots)$}: execute the query at time $t$.

We differentiate between \emph{Partial retroactivity}, in which we are only allowed to \algo{Query} in the present,
and \emph{Full retroactivity}, in which we may query whenever we want.

\subsubsection{A Simple Case}
If the updates we perform are commutative ($x, y = y, x$) and invertible ($x, x^{-1} = \varnothing$)
then partial retroactivity is easy:
Insert operations can be done in the present, since the ordering does not matter, and
delete operations can be done as inverse insert operations.

An example is if we were to sum a sequence of numbers.
If we sum $1 + 3$ and would like to insert a $+2$ in between, we can put it at the end.
If we want to delete the $+3$, we can add a $-3$ at the end.

\subsection{Full Retroactivity}
What do we need for full retroactivity?

\begin{definition}{Decomposable Search Problem}
    $\text{query}(x, A\cup B) = f(\text{query}(x, A), \text{query}(x, B))$

    \algo{Nearest-Neighbour}, \algo{Successor}, and \algo{Point-Location} are all decomposable search problems.

If what we query is a DSP, we can achieve full retroactivity in $O(\log m)$ factor overhead,
where $m$ is the number of retroactive operations.
We can do this by using a \emph{Segment Tree} over the operations.
We think of the operations as begin in a time interval, from they are made until they are deleted.
Then we can store the operations in the nodes of the maximal subtrees that covers the interval.
Each node in the segment tree has its own copy of the underlying datastructure which we are making retroactive.
Since there are $O(\log m)$ nodes an element can be in, and the individual results can be joined \emph{somehow} (via $f$, since the problem is a DSP),
we get $O(\log m)$  multiplicative overhead.

\subsection{General Transformations}
What can we do if we do in general?
The most obvious method is the \emph{Rollback Method}:
when we want to do a retroactive change, we roll back the data structure to the given time,
make the change, and replay all changes afterwards.
This makes for a $O(r)$ multiplicative overhead, where $r$ is the number of time units in the past.
Unfortunately, depending on the exact model used, this is the best we can do in general.

There is a lower bound: $\Omega(r)$ overhead can be necessary for retroactivity.
This means that the most efficient way to make retroactive changes is to go back, make the change,
and redo whatever comes after --- the rollback method!

To see why this is the case, consider a very simply computer with two registers $X$ and $Y$, and with the following operations:
$X=x$, $Y\mathrel{+}=\Delta$, and $Y = XY$.
On query, the machine returns $Y$, and all operations are $O(1)$.
The operation sequence
$$\big<Y \mathrel{+}= a_n,\ Y=XY,\ Y\mathrel{+}= a_{n-1},\ \dots,\ Y\mathrel{+}= a_0\big>$$
computes the polynomial $\sum^{n}_{i} a_{i}x^{i}$.
We can use this to compute the polynomial for any $X$ by retroactively inserting $X=x$ at $t=0$.
However, it is not possible to reeveluate such a polynomial using field operations any faster than to just evaluate it again.

\subsection{Priority Queue}
We now look at an example of a retroactive priority queue that supports
\algo{Insert}, \algo{Delete-Min}, and is partially retroactive.
We assume keys are only inserted once.

We can plot the lifetime of the queue in 2D, where the x dimension is time and the y dimensions in key value.
\todo{add plot}
Keys in the queue are plotted as points when they are inserted, and are extended as horizonal rays.
On \algo{Delete-Min}, we shoot a ray from the x-axis at the time of the delete upward untill it hits a horizontal ray.
This makes $\rceil$ patterns.

Let $Q_{t}$ be the set of keys in the queue at time $t$.
What happens if we \code{Insert}{t, \text{insert}$(k)$}?
Its ray will extend to the right and hit a deletion ray, which will delete it.
This will make the element that ray previously deleted to not be deleted after all,
so its ray will extend further to the right and hit another deletion ray.
In effect, the element that ends ut not getting deleted is
$\max\{k, k' \mid k' \text{\ deleted at time} \geq t\}$, but this relation is hard to maintain.

In order to make things easier we define a \emph{Brige}.

    A Bridge is a time $t$ such that $Q_t \in Q_{\text{now}}$

Now if $t'$ is the bridge preceding $t$ we see that
$$\max\{k'\mid k' \text{\ deleted at time}\geq t\} = \max\{k'\notin Q_{\text{now}} \mid k' \text{\ inserted at time}\geq t'\}$$
In other terms, the largest key deleted after a time $t$ is the largest key inserted after the previous bridge that is not in the final set.

Now we can store $Q_{\text{now}}$ as a BBST on values, and all insertions in a BBST on time.
The latter trees nodes is also augmented with the value of the largest insert in its subtree that is not in $Q_{\text{now}}$.
At last, we store a third BBST with \emph{all} the updates, ordered on time, and also augmented as follows:
\text{Augmentation} =
    0 \text{\ if \code{insert}{k}, and\ }k \in Q_\text{now}\\
    +1 \text{\ if \code{insert}{k}, and\ }k \notin Q_\text{now}\\
    -1 \text{\ if \algo{Delete-Min}}\\
In addition the internal nodes store subtree sums and min/max prefix sums.
We do this in order to detect brdiges, since a bridge is a prefix summing to 0.
When we have to find out which element to insert into $Q_{\text{now}}$ we can walk up from the node in the insertion BST,
and find the max to the right, since this BST stores this for all subtrees.
$O(\log n)$ time.

\subsection{Other Structures}
We list other strucutes that can be made retroactive.
A Queue can be made partial retroactive with $O(1)$ overhead and full retroactive with $O(\log m)$ overhead.
A Deque and \algo{Union-Find} can also be make fully retroactive with $O(\log m)$ overhead.
The priority queue, which we just made partially retroactive with $O(\log m)$ overhead,
can be made fully retroactive with $O(\sqrt{m} \log m)$ overhead.
Successor queries can be done in $O(\log m)$ partial retroactive,
and since it is a decomposable search problem (see~\ref{sec:dsp}) we can pay a $\log$ factor to make it fully retroactive,
with $O(\log^2 m)$ overhead. However, it is also possible to get full retroactivity with only $O(\log m)$ overhead.

\section{List Order Maintenance}
Before tackeling the real problem, we look at an easier problem.

\subsection{List Labeling}
We want to store integer labels in a list, such that the list, such that insert/delete here queries are constant,
and that the list is in a strictly monotone ordering.
\emph{Label Space} is the size of the labels as a function of the number of elements in the list we want to store.
Table~\ref{tab:list-labeling} shows the best known updates for different sizes of the label space.

    \begin{tabular}{c l}
        Label Space & Best known update\\\midrule
        $(1+\epsilon)n\dots n \log n$ & $O(\log^2 n)$ \\
        $n^{1 + \epsilon} \dots n^{O(1)}$ & $O(\log n)$ \\
        $2^n$ & $O(1)$
        Table showing label space vs best known update for \algo{List-Labeling}}


\chapter{Connectivity in Dynamic Graphs}