\ifx\chapter\undefined
\input adsmac.tex
\singlechapter{51}
\fi

\chapter[dynamic]{Dynamization}

Depending on which operations it supports, a data structure can be:

\tightlist{o}
\: {\I static} if no operation performed after building the structure alters the
data \foot{Regarding this terminology, let us remark on the distinction
between an update of the proper data stored inside a data structure and an update of some
auxiliary data. For example, a splay tree can change shape even though only queries and no
updates happen.},
\: {\I semi-dynamic} if data insertion is possible as an operation,
\: {\I (fully) dynamic} if deletion of data is allowed on top of insertion.
\endlist

Static data structures suffice for many applications where updates are simply not required.
A sorted array is a typical example of a static data structure storing an
ordered set of $n$ elements. Its supported operations are $\alg{Index}(i)$,
which returns the $i$-th smallest element in constant time, and
$\alg{Find}(x)$, which finds $x$ and its index $i$ in the array using binary
search in time $\O(\log n)$.
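As a quick illustration, the two operations can be sketched in Python (a minimal sketch; the function names are ours and the standard \|bisect| module supplies the binary search):

```python
import bisect

def index(arr, i):
    """Index(i): return the i-th smallest element of a sorted array in O(1)."""
    return arr[i]

def find(arr, x):
    """Find(x): binary search for x, returning its index or None, in O(log n)."""
    i = bisect.bisect_left(arr, x)
    return i if i < len(arr) and arr[i] == x else None
```

For instance, on the array $[2, 3, 5, 7, 11]$, \|index(arr, 2)| returns 5 and \|find(arr, 7)| returns 3 (indices counted from zero).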

However, if we wish to insert a new element into an already existing sorted array,
this operation takes $\Omega(n)$ time -- we must shift elements to keep
the sorted order. To obtain fast insertion, we may decide to use a
different, dynamic data structure, for instance a binary search tree (BST). In that
case, the operation \alg{Index} slows down to logarithmic time.

What happened to \alg{Index} is a frequent inconvenience arising when we modify data structures to support updates.
Oftentimes, making one operation run faster is only possible by making another operation
run slower. One must therefore strike a careful balance between the complexities of
individual operations based on how often they are needed. It is the subject
of this chapter to show some efficient techniques of {\I dynamization} --
transformation of a static data structure into a (semi)dynamic data structure.

\section{Global rebuilding}

Consider a data structure with $n$ elements whose modification may cause
severe problems that are too hard to fix easily. In such a case, we give up on
fixing it and rebuild it completely from scratch.

If initializing the structure with $n$ elements takes $f(n)$ time steps and 
we perform the rebuild after $\Theta(n)$ modifying operations, we can amortize 
the cost of rebuild into the operations. This adds an amortized factor 
$\O(f(n)/n)$ to their time complexity, given that $n$ does not change 
asymptotically between the rebuilds.

\examples

\list{o}
\:
An array is a structure with limited capacity $c$. While it is dynamic (we can
insert or remove elements at the end), we cannot insert new elements
indefinitely. Once we run out of space, we build a new structure with capacity
$2c$ and copy to it the elements from the old structure.
Since $\Theta(n)$ elements must have been inserted to reach the limit after a fresh
rebuild, and an array can be rebuilt in $\O(n)$ time, this amortizes to
$\O(1)$ time per insertion.

\:
Another example of such a structure is a $y$-fast trie. It is parametrized by
a block size that is required to be $\Theta(\log n)$ for good time complexity. If we let
$n$ change so much that $\log n$ changes asymptotically, the proven time
complexity no longer holds.
We can fix this by rebuilding the trie once $n$
changes by a constant factor (then $\log n$ changes only by an additive constant).
This happens no sooner than after $\Theta(n)$ insertions or deletions.

\:
Consider a data structure where, instead of properly removing elements on deletion,
we just replace them with ``tombstones''. When we run a query later, we simply ignore
the tombstones.
After enough deletions, most of the structure becomes filled with tombstones, leaving
too little space for proper elements and slowing down the queries.
Once again, the idea is simple -- once at least $n/2$ of the elements are tombstones,
we rebuild the structure. To reach $n/2$ tombstones, we need to delete $\Theta(n)$
elements.
\endlist
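The first example can be sketched in Python as follows (a minimal illustration; the class and field names are ours):

```python
class DynamicArray:
    """Array of bounded capacity that is rebuilt with twice the capacity
    whenever it fills up, giving O(1) amortized insertion."""

    def __init__(self):
        self.capacity = 1
        self.size = 0
        self.items = [None] * self.capacity

    def insert(self, x):
        if self.size == self.capacity:
            # Global rebuild: allocate a new array of capacity 2c and
            # copy the existing elements over in Theta(n) time.
            self.capacity *= 2
            new_items = [None] * self.capacity
            new_items[:self.size] = self.items
            self.items = new_items
        self.items[self.size] = x
        self.size += 1
```

After 100 insertions, the capacity has doubled seven times (from 1 to 128), and the total copying work over all rebuilds is $1 + 2 + 4 + \cdots + 64 < 2n$.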

\subsection{Local rebuilding}

In many cases, it is enough to rebuild just a part of the structure to fix
local problems. If a segment of the structure has size $k$, we want to space
out its reconstructions at least $\Theta(k)$ operations apart, allowing their
cost to amortize into the other operations.

One such structure is a BST. Imagine starting with a perfectly
balanced tree; as we insert and remove nodes, the tree structure degrades over
time. With a particular choice of operations, we can force the tree to
degenerate into a long vine with linear depth.

In the very first chapter of these lecture notes, we introduced weight-balanced
trees, which maintain balance using an algorithm based on this idea of partial
reconstruction.

\section{General semi-dynamization}

Let us have a static data structure $S$. We do not need to know how the data
structure is implemented internally. We would like to use $S$ as a ``black
box'' to build a (semi-)dynamic data structure $D$ which supports queries of $S$
but also allows element insertion.

This is not always possible; the data structure needs to support a specific
type of queries, namely those answering {\I decomposable search problems}.

\defn{
A {\I search problem} is a mapping $f: U_Q \times 2^{U_X} \to U_R$ where $U_Q$
is a universe of queries, $U_X$ is a universe of elements and $U_R$ is a set of
possible answers.
}

\defn{
A search problem is {\I decomposable} if there exists an operator $\sqcup: U_R
\times U_R \to U_R$ computable in time $\O(1)$\foot{
Most practical decomposable problems do meet this condition.
If it is not met, the construction will still work correctly, 
but the time complexity may increase.}
such that for all disjoint $A, B \subseteq U_X$ and all $q
\in U_Q$: $$ f(q, A \cup B) = f(q, A) \sqcup f(q, B).$$
}

\examples

\list{o}
\: Let $X \subseteq {\cal U}$. Is $q \in X$? This is a classic search problem
where the universes $U_Q, U_X$ are both the set ${\cal U}$ and the possible replies are
$U_R = \{\hbox{true}, \hbox{false}\}$. This search problem is decomposable: the
operator $\sqcup$ is a simple binary \alg{or}.

\: Let $X$ be a set of points in the plane. For a point $q$, what is the distance
between $q$ and the point $x \in X$ closest to $q$? This is a search problem where
$U_Q = U_X = \R^2$ and $U_R$ is the set of non-negative reals. It is also decomposable --
$\sqcup$ returns the minimum.

\: Let $X$ be a set of points in the plane. Is $q$ in the convex hull of $X$? This
search problem is not decomposable -- it is enough to choose $X = \{a, b\}$ and
$q \notin X$ lying on the segment between $a$ and $b$. If $A = \{a\}$ and $B = \{b\}$,
both subqueries answer negatively, yet $q$ lies in the convex hull of $X$, since it
is a convex combination of $a$ and $b$.
\endlist
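The decomposability of the second example can be checked directly in Python (a small sanity check on made-up points; \|math.dist| is the standard Euclidean distance):

```python
import math

def nearest_dist(q, points):
    """f(q, X): distance from q to the closest point of X (infinity if X is empty)."""
    return min((math.dist(q, p) for p in points), default=math.inf)

# The operator "sqcup" for this problem is simply min:
A = [(0, 0), (3, 4)]
B = [(1, 1), (5, 5)]
q = (2, 2)
assert nearest_dist(q, A + B) == min(nearest_dist(q, A), nearest_dist(q, B))
```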

For a decomposable search problem $f$, we can thus split (decompose) any query
into two queries on disjoint subsets of the elements, compute the results on them
separately and then combine them in constant time into the final result. We can
further chain the decomposition on each subset, allowing us to decompose the query
into an arbitrary number of subsets.

We can therefore use multiple data structures $S$ as blocks; to answer a
query, we simply query all the blocks and then combine their answers using
$\sqcup$. We now show this construction in detail.

\subsection{Construction}

First, let us denote a few parameters for the static and dynamic data
structure.

\nota{For a data structure $S$ containing $n$ elements and answering a
decomposable search problem $f$ and the resulting dynamic data structure $D$:}

\tightlist{o}
\: $B_S(n)$ is the time complexity of building $S$,
\: $Q_S(n)$ is the time complexity of a query on $S$,
\: $S_S(n)$ is the space complexity of $S$,
\medskip
\: $Q_D(n)$ is the time complexity of a query on $D$,
\: $S_D(n)$ is the space complexity of $D$,
\: $\bar I_D(n)$ is the {\I amortized} time complexity of insertion into $D$.
\endlist

We assume that $Q_S(n)$, $B_S(n)/n$, $S_S(n)/n$ are all non-decreasing functions.

We cover the set $X$ by pairwise disjoint blocks $B_i$ such that $|B_i| \in \{0, 2^i\}$.
Let $|X| = n$. Since $n = \sum_i n_i 2^i$ for $n_i \in \{0, 1\}$, its
binary representation uniquely determines the block structure. Thus, the total
number of blocks is at most $\log n$.

For each nonempty block $B_i$ we build a static structure $S$ of size $2^i$.
Since $f$ is decomposable, a query on the whole structure runs a query on each
nonempty block and then combines the answers using $\sqcup$:
$$ f(q, X) = f(q, B_0) \sqcup f(q, B_1) \sqcup \dots \sqcup f(q, B_k),$$
where $B_k$ is the highest nonempty block and empty blocks are skipped.

\lemma{$Q_D(n) \in \O(Q_S(n) \cdot \log n)$.}

\proof
Let $|X| = n$. Then the block structure is determined, and since $\sqcup$ takes
constant time, $Q_D(n) = \sum_{i: B_i \neq \emptyset} \left(Q_S(2^i) + \O(1)\right)$. Since $Q_S(x)
\leq Q_S(n)$ for all $x \leq n$ and there are at most $\log n$ nonempty blocks,
the claimed bound follows.
\qed

\lemma{$S_D(n) \in \O(S_S(n))$.}

\proof
For $|X| = n$ let $I = \{i \mid B_i \neq \emptyset\}$. For each $i \in I$
we store a static data structure $S$ with the $2^i$ elements contained in this
block. Therefore, $S_D(n) = \sum_{i \in I} S_S(2^i)$. Since $S_S(n)/n$ is
assumed to be non-decreasing,
$$
	\sum_{i \in I} S_S(2^i)
	= \sum_{i \in I} {S_S(2^i) \over 2^i} \cdot 2^i
	\leq {S_S(n) \over n} \cdot \sum_{i=0}^{\log n} 2^i
	\leq {S_S(n) \over n} \cdot 2n.
$$
\qedmath

It might be advantageous to also store the elements of each block separately, so
that during rebuilds we do not have to inspect the static structure and extract
the elements from it, which could take additional time.

An insertion of $x$ acts like adding 1 to a binary number. Let $i$
be the smallest index such that $B_i = \emptyset$. We create a new block $B_i$
with elements $B_0 \cup B_1 \cup \dots \cup B_{i-1} \cup \{x\}$. This new block
has $1 + \sum_{j=0}^{i-1} 2^j = 2^i$ elements, which is exactly the required size for
$B_i$. Finally, we remove the old blocks $B_0, \dots, B_{i-1}$ and add $B_i$.

\figure{semidynamic-insert.pdf}{}{Insertion of $x$ in the structure for $n =
23$, blocks $\{x\}$, $B_0$ to $B_2$ merge to a new block $B_3$, block $B_4$ is
unchanged.}
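The whole construction can be sketched in Python, using a sorted array with binary search as the black-box static structure (a minimal illustration; all names are ours, and \|blocks[i]| being \|None| plays the role of $B_i = \emptyset$):

```python
import bisect

class SemiDynamic:
    """blocks[i] is either None (B_i empty) or a sorted list of exactly
    2**i elements, playing the role of the static structure S."""

    def __init__(self):
        self.blocks = []

    def insert(self, x):
        # Like adding 1 to a binary number: carry the new element together
        # with the blocks B_0, ..., B_{i-1} into the first empty slot B_i.
        carry = [x]
        i = 0
        while i < len(self.blocks) and self.blocks[i] is not None:
            carry += self.blocks[i]
            self.blocks[i] = None
            i += 1
        if i == len(self.blocks):
            self.blocks.append(None)
        carry.sort()                 # "build" the static structure, B_S(2^i)
        self.blocks[i] = carry

    def contains(self, q):
        # Decomposable query: query every block, combine answers with "or".
        for block in self.blocks:
            if block is not None:
                j = bisect.bisect_left(block, q)
                if j < len(block) and block[j] == q:
                    return True
        return False
```

After 23 insertions, the nonempty blocks have sizes 1, 2, 4 and 16, matching the binary representation of 23 as in the figure above.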

\lemma{$\bar I_D(n) \in \O(B_S(n)/n \cdot \log n)$.}

\proof{
	Since the last creation of $B_i$, there must have been at least $2^i$
	insertions. The rebuild costs $B_S(2^i)$, which is $B_S(2^i) / 2^i$
	amortized per element. As $B_S(n)/n$ is non-decreasing, we can bound this
	from above by $B_S(n) / n$. However, one element can participate in up to
	$\log n$ rebuilds during the structure's lifetime. Therefore, each element
	needs to store up a cost of $\log n \cdot B_S(n) / n$ to pay for all the
	rebuilds. \qed
}

\theorem{
Let $S$ be a static data structure answering a decomposable search problem $f$.
Then there exists a semi-dynamic data structure $D$ answering $f$ with parameters

\tightlist{o}
\: $Q_D(n) \in \O(Q_S(n) \cdot \log n)$,
\: $S_D(n) \in \O(S_S(n))$,
\: $\bar I_D(n) \in \O(B_S(n)/n \cdot \log n)$ amortized.
\endlist
}

In general, the bound for insertion is not tight. If $B_S(n) = \Theta(n^\varepsilon)$
for some $\varepsilon > 1$, the costs of the rebuilds a single element takes part in
form a geometric sequence dominated by the last term, so the logarithmic factor
disappears and $\bar I_D(n) \in \O(B_S(n)/n) = \O(n^{\varepsilon - 1})$.

\example

If we use a sorted array with binary search to find elements in a static
set, we can use this technique to create a dynamic data structure for general
sets. It requires $\Theta(n)$ space and a query takes $\Theta(\log^2
n)$ time, as we need to binary search in each block. Since building a block
requires sorting, it takes $\Theta(n \log n)$ time, and insertion thus
costs $\Theta(\log^2 n)$ amortized time.

We can speed up insertion. Instead of building the new block from scratch, we can
merge the already sorted blocks in $\Theta(n)$ time, speeding up insertion to
$\O(\log n)$ amortized.
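The sped-up insertion can be sketched as a variant of the previous idea, where the new block is produced by merging already sorted blocks instead of re-sorting (\|heapq.merge| performs the linear-time merge; names are ours):

```python
from heapq import merge

def insert_merging(blocks, x):
    """Insert x into a block list of sorted lists, building the new block
    by linear-time merges of sorted blocks instead of a full sort."""
    carry = [x]
    i = 0
    while i < len(blocks) and blocks[i] is not None:
        carry = list(merge(carry, blocks[i]))  # merge two sorted lists
        blocks[i] = None
        i += 1
    if i == len(blocks):
        blocks.append(None)
    blocks[i] = carry
```

Merging level by level costs $2^1 + 2^2 + \cdots + 2^i = \O(2^i)$, i.e.\ linear in the size of the new block.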

\subsection{Worst-case semi-dynamization}

So far we have created a data structure that performs well in the long run, but a
single insertion can take a long time. This may be unsuitable for applications
where we require low latency. In such cases, we would like every insertion to be
fast even in the worst case.

Our construction can be deamortized at the price of making the resulting
semi-dynamic data structure more complicated. We do this by not
constructing each block at once, but spreading its construction over several
operations, doing a small amount of work during each, until eventually the whole
block is constructed.

However, insertion is not the only operation; queries can arrive even
during the construction process. Thus we must keep the old structures until the
construction finishes. As a consequence, more than one block of each size may
exist at the same time.

For each rank $i$, let $B_i^0, B_i^1, B_i^2$ be the complete blocks participating in
queries. The complete blocks are pairwise disjoint and their union contains the
whole set $X$.

Next, let $B_i^*$ be a block in construction. Whenever two blocks $B_i^a, B_i^b$
of the same rank $i$ meet, we immediately start building $B_{i+1}^*$ from the
elements of $B_i^a \cup B_i^b$.

The construction is spread over the next $2^{i+1}$ insertions, each performing
one step of work, so that $B_{i+1}^*$ is finished after $2^{i+1}$ steps. Once
we finish $B_{i+1}^*$, we add it to the structure as one of the complete
blocks of rank $i+1$ and finally remove $B_i^a$ and $B_i^b$.

We will show that, under this scheme, this number of blocks per rank is enough to
maintain the structure.

\lemma{
At any point of the structure's life, for each rank $i$, there are at most
three finished blocks and at most one block in construction.
}

\proof
For an empty structure, this certainly holds.

Consider the situation when two blocks $B_i^0$ and $B_i^1$ meet because $B_i^1$ has
just been finalized. We then start constructing $B_{i+1}^*$; $2^{i+1}$ steps
later, $B_{i+1}$ is added and the blocks $B_i^0$, $B_i^1$ are removed.

A new block $B_i^2$ may appear before that. However, this can happen no sooner
than $2^i$ steps later, since building a block of rank $i$ takes $2^i$ steps. For
a fourth block $B_i^3$ to appear, another $2^i$ steps are required. The earliest
it can appear is thus $2 \cdot 2^i = 2^{i+1}$ steps later, by which time
$B_{i+1}^*$ has already been finalized, leaving at most two complete blocks of
rank $i$ and no block of rank $i+1$ in construction.
\qed

An insertion is now done by simply creating a new block of rank 0. Additionally,
we run one step of construction for each block $B_j^*$ currently being built.
There may be up to $\log n$ blocks in construction.

\theorem{
Let $S$ be a static data structure answering a decomposable problem $f$. Then
there exists semi-dynamic structure with parameters

\tightlist{o}
\: $Q_D(n) \in \O(Q_S(n) \cdot \log n)$,
\: $S_D(n) \in \O(S_S(n))$,
\: $I_D(n) \in \O(B_S(n)/n \cdot \log n)$ worst-case.
\endlist
}

\proof
Since there is now only a constant number of blocks of each rank, the query time
and space complexities have increased only by a constant factor compared to the
previous technique.

Each insertion builds a block of size 1 and then runs up to $\log n$
construction steps, where a step on a block of rank $i$ takes $B_S(2^i)/2^i \leq
B_S(n)/n$ time. Summing this together, we get the required upper bound.
\qed

\subsection{Full dynamization}

With our definition of search problems, it is not easy to delete elements: any
time we wished to delete an element, we would need to take a structure apart and
split it into a few smaller ones. This would never amortize to a
decent deletion time.

Instead, we require the underlying static structure to have the
ability to cross out elements. Crossed-out elements no longer participate in
queries, but they still count towards the structure's size and complexity.

Once we have the ability to cross out elements, we can upgrade the semi-dynamic data
structure to support deletion. We add a binary search tree or another set
structure which maps each element to the block it lives in. For each element, we
keep a pointer to its instance in the BST. When we build a new block, we can
update the block pointers of all its elements in the tree in constant time per
element (and insert the newly added element in logarithmic time).

Insertion will then always take at least logarithmic time, and the space
requirements increase by the size of the BST.

A deletion then finds the element in the BST, locates it in the corresponding
block and crosses it out. We also keep a count of the crossed-out elements. If
this count reaches a certain fraction of all elements, we rebuild the structure
completely.

Before we have to rebuild the whole structure, we cross out $\Theta(n)$
elements, so the rebuild can be amortized over the deletions, resulting in the
same amortized time complexity for deletion as for insertion.
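The cross-out mechanism on a single block can be sketched like this (the auxiliary BST mapping elements to blocks is omitted; a sorted array again stands in for the static structure and all names are ours):

```python
import bisect

class CrossOutArray:
    """Sorted array supporting cross-outs: crossed-out elements are ignored
    by queries but still occupy space until a global rebuild."""

    def __init__(self, items):
        self.items = sorted(items)
        self.alive = [True] * len(self.items)
        self.dead = 0

    def cross_out(self, x):
        j = bisect.bisect_left(self.items, x)
        if j < len(self.items) and self.items[j] == x and self.alive[j]:
            self.alive[j] = False
            self.dead += 1

    def contains(self, q):
        j = bisect.bisect_left(self.items, q)
        return j < len(self.items) and self.items[j] == q and self.alive[j]

    def needs_rebuild(self):
        # Rebuild once at least half of the stored elements are crossed out.
        return 2 * self.dead >= len(self.items) > 0

    def rebuild(self):
        # Global rebuild: keep only the live elements, Theta(n) time.
        self.items = [v for v, a in zip(self.items, self.alive) if a]
        self.alive = [True] * len(self.items)
        self.dead = 0
```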

There also exists a worst-case version of full dynamization, which carefully
manipulates the blocks, splitting and merging them as required. The details
are somewhat complicated, so we will not pursue them further.

\endchapter