Commit e62e5242 authored by Václav Končický

Dynamization: Addition of a picture and rewrite

parent 3a9cc9de
TOP=..
PICS=semidynamic-insert
include ../Makerules
......@@ -7,46 +7,78 @@
A data structure can be, depending on what operations are supported:
\list{o}
\tightlist{o}
\: {\I static} if all operations after building the structure do not alter the
data,
\: {\I semidynamic} if data insertion is possible as an operation,
\: {\I fully dynamic} if deletion of inserted data is allowed along with insertion.
\endlist
Static data structures are useful if we know all the data beforehand. In many
cases, static data structures are simpler and faster than their dynamic
alternatives.
A sorted array is a typical example of a static data structure to store an
ordered set of $n$ elements. Its supported operations are $\alg{Index}(i)$
which simply returns the $i$-th smallest element in constant time, and
$\alg{Find}(x)$ which finds $x$ and its index $i$ in the array using binary
search in time $\O(\log n)$.
However, if we wish to insert a new element into an already existing sorted array,
this operation will take $\Omega(n)$ time -- we must shift the elements to keep
the sorted order. In order to have a fast insertion, we may decide to use a
different dynamic data structure, such as a binary search tree. But then the
operation \alg{Index} slows down to logarithmic time.
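As an illustration of the trade-off described above, here is a minimal Python sketch of a sorted array. The class name `SortedArray` and its method names are hypothetical and chosen only for this example, and indices are 0-based, unlike the $i$-th smallest element in the text.

```python
import bisect


class SortedArray:
    """Sorted array storing an ordered set (hypothetical helper)."""

    def __init__(self, items):
        self.a = sorted(set(items))

    def index(self, i):
        # Index(i): the i-th smallest element (0-based), constant time.
        return self.a[i]

    def find(self, x):
        # Find(x): binary search in O(log n); returns the index of x or None.
        i = bisect.bisect_left(self.a, x)
        if i < len(self.a) and self.a[i] == x:
            return i
        return None

    def insert(self, x):
        # Finding the position is fast, but shifting the tail of the
        # array takes linear time in the worst case.
        if self.find(x) is None:
            bisect.insort(self.a, x)
```

The `insert` method is included only to show where the linear cost comes from: the binary search is cheap, the shifting is not.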
In this chapter we will look at techniques of {\I dynamization} --
transformation of a static data structure into a (semi)dynamic data structure.
As we have seen with a sorted array, the simple and straightforward attempts
often lead to slow operations. Therefore, we want to dynamize data structures
in such a way that the operations stay reasonably fast.
\section{Structure rebuilding}
Consider a data structure with $n$ elements such that modifying it may cause
severe problems that are too hard to fix easily. Therefore, we give up on
fixing it and rebuild it completely anew. If we do this after $\Theta(n)$
operations, we can amortize the cost of rebuild into those operations. Let us
look at such cases.
severe problems that are too hard to fix easily. In such a case, we give up on
fixing it and rebuild it completely anew.
If building such a structure takes time $\O(f(n))$ and we perform the rebuild
after $\Theta(n)$ modifying operations, we can amortize the cost of the rebuild
into the operations. This adds an amortized term $\O(f(n)/n)$ to
their time complexity, given that $n$ does not change asymptotically between
the rebuilds.
\examples
\list{o}
\:
An array is a structure with limited capacity $c$. While it is dynamic (we can
insert or remove elements from the end), we cannot insert new elements
insert or remove elements at the end), we cannot insert new elements
indefinitely. Once we run out of space, we build a new structure with capacity
$2c$ and elements from the old structure.
Since we must insert $\Theta(n)$ elements to reach the limit from a freshly
rebuilt structure, the rebuild amortizes to $\O(1)$ time per insertion,
as we can rebuild an array in time $\O(n)$.
Another example of such structure is an $y$-fast tree. It is parametrized by
\:
Another example of such a structure is a $y$-fast trie. It is parametrized by
a block size required to be $\Theta(\log n)$ for good time complexity. If we let
$n$ change enough such that $\log n$ changes asymptotically, everything breaks.
We can save this by rebuilding the tree before this happens $n$ changes enough,
which happens after $\Omega(n)$ operations.
$n$ change enough such that $\log n$ changes asymptotically, the proven time
complexity no longer holds.
We can fix this by rebuilding the trie once $n$
changes by a constant factor (then $\log n$ changes only by an additive constant).
This happens no sooner than after $\Theta(n)$ insertions or deletions.
\:
Consider a data structure where instead of proper deletion of elements we just
replace them with ``tombstones''. When we run a query, we ignore them. After
enough deletions, most of the structure becomes filled with tombstones, leaving
too little space for proper elements and slowing down the queries. Once again,
too little space for proper elements and slowing down the queries.
Once again,
the fix is simple -- once at least $n/2$ of the elements are tombstones, we rebuild
the structure. To reach $n/2$ tombstones we need to delete $\Theta(n)$
elements. If a rebuild takes $\Theta(n)$ time, this again amortizes.
elements.
\endlist
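The tombstone example above can be sketched in Python as follows. This is only an illustration under stated assumptions: the underlying structure is taken to be a sorted array, and the class `TombstoneSet` with its `rebuilds` counter is hypothetical, added to make the rebuilds observable.

```python
import bisect


class TombstoneSet:
    """Sorted array with lazy deletion (hypothetical helper).

    Deleted elements are only marked as tombstones. Once at least
    half of the stored elements are tombstones, the whole structure
    is rebuilt from the surviving elements.
    """

    def __init__(self, items):
        self.keys = sorted(set(items))
        self.dead = [False] * len(self.keys)  # tombstone flags
        self.tombstones = 0
        self.rebuilds = 0  # kept for illustration only

    def _position(self, x):
        i = bisect.bisect_left(self.keys, x)
        if i < len(self.keys) and self.keys[i] == x:
            return i
        return -1

    def contains(self, x):
        # Queries ignore tombstones.
        i = self._position(x)
        return i >= 0 and not self.dead[i]

    def delete(self, x):
        i = self._position(x)
        if i < 0 or self.dead[i]:
            return False
        self.dead[i] = True
        self.tombstones += 1
        if 2 * self.tombstones >= len(self.keys):
            self._rebuild()
        return True

    def _rebuild(self):
        # Linear-time rebuild, paid for by the preceding deletions.
        self.keys = [k for k, d in zip(self.keys, self.dead) if not d]
        self.dead = [False] * len(self.keys)
        self.tombstones = 0
        self.rebuilds += 1
```

Each rebuild of a structure with $n$ stored elements is preceded by at least $n/2$ deletions since the previous rebuild, so its linear cost spreads to a constant per deletion.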
\subsection{Local rebuilding}
......@@ -201,7 +233,7 @@ decomposable search problem $f$ and the resulting dynamic data structure $D$:}
\tightlist{o}
\: $B_S(n)$ is the time complexity of building $S$,
\: $Q_S(n)$ is the time complexity of a query on $S$,
\: $S_S(n)$ is the space complexity of $S$.
\: $S_S(n)$ is the space complexity of $S$,
\medskip
\: $Q_D(n)$ is the time complexity of a query on $D$,
\: $S_D(n)$ is the space complexity of $D$,
......@@ -210,19 +242,16 @@ decomposable search problem $f$ and the resulting dynamic data structure $D$:}
We assume that $Q_S(n)$, $B_S(n)/n$, $S_S(n)/n$ are all non-decreasing functions.
We decompose the set $X$ into blocks $B_i$ such that $|B_i| \in \{0, 2^i\}$
such that $\bigcup_i B_i = X$ and $B_i \cap B_j = \emptyset$ for all $i \neq
j$. Let $|X| = n$. Since $n = \sum_i n_i 2^i$, its binary representation
uniquely determines the block structure. Thus, the total number of blocks is at
most $\log n$.
We decompose the set $X$ into blocks $B_i$ such that $|B_i| \in \{0, 2^i\}$,
$\bigcup_i B_i = X$ and $B_i \cap B_j = \emptyset$ for all $i \neq
j$. Let $|X| = n$. Since $n = \sum_i n_i 2^i$ for $n_i \in \{0, 1\}$, the
binary representation of $n$ uniquely determines the block structure. Thus, the total
number of blocks is at most $\lfloor \log n \rfloor + 1$.
For each nonempty block $B_i$ we build a static structure $S$ of size $2^i$.
Since $f$ is decomposable, a query on the structure will run queries on each
block, and then combine them using $\sqcup$:
$$ f(q, X) = f(q, B_0) \sqcup f(q, B_1) \sqcup \dots \sqcup f(q, B_i).$$
TODO image
\lemma{$Q_D(n) \in \O(Q_S(n) \cdot \log n)$.}
\proof
......@@ -231,8 +260,6 @@ constant time, $Q_D(n) = \sum_{i: B_i \neq \emptyset} Q_S(2^i) + \O(1)$. Since $
\leq Q_S(n)$ for all $x \leq n$, the inequality holds.
\qed
Now let us calculate the space complexity of $D$.
\lemma{$S_D(n) \in \O(S_S(n))$.}
\proof
......@@ -246,7 +273,7 @@ $$
\leq {S_S(n) \over n} \cdot \sum_{i=0}^{\log n} 2^i
\leq {S_S(n) \over n} \cdot n.
$$
\qed
\qedmath
It might be advantageous to store the elements in each block separately so that
we do not have to inspect the static structure and extract the elements from
......@@ -258,7 +285,9 @@ with elements $B_0 \cup B_1 \cup \dots \cup B_{i-1} \cup \{x\}$. This new block
has $1 + \sum_{j=0}^{i-1} 2^j = 2^i$ elements, which is the required size for
$B_i$. Finally, we remove all blocks $B_0, \dots, B_{i-1}$ and add $B_i$.
TODO image
\figure{semidynamic-insert.pdf}{}{Insertion of $x$ into the structure for $n =
23$: blocks $\{x\}$, $B_0$ to $B_2$ merge into a new block $B_3$; block $B_4$ is
unchanged.}
\lemma{$\bar I_D(n) \in \O(B_S(n)/n \cdot \log n)$.}
......@@ -268,7 +297,7 @@ TODO image
As this function is non-decreasing, we can upper-bound it by $B_S(n) /
n$. However, one element can participate in up to $\log n$ rebuilds during
the lifetime of the structure. Therefore, each element needs to save up cost $\log n
\cdot B_S(n) / n$ to pay off all rebuilds.
\cdot B_S(n) / n$ to pay off all rebuilds. \qed
}
\theorem{
......@@ -278,10 +307,14 @@ Then there exists a semidynamic data structure $D$ answering $f$ with parameters
\tightlist{o}
\: $Q_D(n) \in \O(Q_S(n) \cdot \log n)$,
\: $S_D(n) \in \O(S_S(n))$,
\: $\bar I_D(n) \in \O(B_S(n)/n \cdot \log n)$.
\: $\bar I_D(n) \in \O(B_S(n)/n \cdot \log n)$ amortized.
\endlist
}
In general, the bound for insertion is not tight. If $B_S(n) =
\O(n^\varepsilon)$ for $\varepsilon > 1$, the logarithmic factor is dominated
and $\bar I_D(n) \in \O(n^{\varepsilon-1})$.
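A sketch of the calculation behind this remark, assuming for illustration that
$B_S(m) \leq c \cdot m^\varepsilon$ for some constant $c$ (the constant $c$ is
introduced only for this sketch): during $n$ insertions, the block $B_i$ is
built at most $n / 2^i$ times, so the total time spent on building blocks is at
most
$$
\sum_{i=0}^{\lfloor \log n \rfloor} {n \over 2^i} \cdot c \cdot 2^{i\varepsilon}
= c n \cdot \sum_{i=0}^{\lfloor \log n \rfloor} 2^{i(\varepsilon-1)}
\leq c n \cdot {2^{\varepsilon-1} \over 2^{\varepsilon-1} - 1} \cdot n^{\varepsilon-1}
\in \O(n^\varepsilon).
$$
The sum is geometric with quotient $2^{\varepsilon-1} > 1$, so it is dominated
by its last term. Dividing by $n$ gives $\O(n^{\varepsilon-1})$ amortized time
per insertion, with no logarithmic factor.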
\example
If we use a sorted array with binary search to look up elements in a static
......@@ -295,10 +328,6 @@ We can speed up insertion time. Instead of building the list anew, we can merge
the lists in $\Theta(n)$ time, therefore speeding up insertion to $\O(\log n)$
amortized.
In general, the bound for insertion is not tight. If $B_S(n) =
\O(n^\varepsilon)$ for $\varepsilon > 1$, the logarithmic factor is dominated
and $\bar I_D(n) \in \O(n^\varepsilon)$.
\subsection{Worst-case semidynamization}
So far we have created a data structure that acts well in the long run, but one
......
import ads;
int[] block_indices = {0,0,1,2,4};
real[] block_offs;
real[] block_widths;
real w = 0.4;
real h = 0.4;
real s = 0.1;
// Draw a block of 2^index cells at (offset, ypos) and return its width.
real draw_block(real offset, int ypos, int index) {
    real width = 2^index * w;
    draw(box((offset, ypos), (offset+width, ypos+h)), thin);
    return width;
}

// Label for the block B_i.
string b_i(int i) {
    return "\eightrm $B_" + string(i) + "$";
}

// Top row: the new element x followed by the blocks B_0, B_1, B_2 and B_4.
int prev_i = 0;
real offset = -s;
for (int i : block_indices) {
    offset += s;
    if (i == 4) {
        // Extra gap before the block that stays unchanged.
        offset += 3*s;
    }
    real width = draw_block(offset, 0, i);
    block_offs.push(offset);
    block_widths.push(width);
    offset += width;
    prev_i = i;
}

// Labels and arrows from the top row to the bottom row.
for (int i = 0; i < 5; ++i) {
    real x = block_offs[i] + block_widths[i]/2;
    string name;
    if (i > 0)
        name = b_i(block_indices[i]);
    else
        name = "$x$";
    label(name, (x, h/2));
    draw((x, -0.1) -- (x, -1+h+0.1), thin, e_arrow);
}
real width = draw_block(0, -1, 3);
label(b_i(3), (width/2, h/2 - 1));
real width2 = draw_block(block_offs[4], -1, 4);
label(b_i(4), (block_offs[4] + block_widths[4]/2, h/2 - 1));