Dynamization: Addition of a picture and rewrite

Commit e62e5242 authored 3 years ago by Václav Končický
--- a/vk-dynamic/Makefile
+++ b/vk-dynamic/Makefile
 TOP=..
+PICS=semidynamic-insert

 include ../Makerules
--- a/vk-dynamic/dynamic.tex
+++ b/vk-dynamic/dynamic.tex
@@ -7,46 +7,78 @@

 A data structure can be, depending on what operations are supported:

-\list{o}
+\tightlist{o}
 \: {\I static} if all operations after building the structure do not alter the
 data,
 \: {\I semidynamic} if data insertion is possible as an operation,
 \: {\I fully dynamic} if deletion of inserted data is allowed along with insertion.
 \endlist

+Static data structures are useful if we know the structure beforehand. In many
+cases, static data structures are simpler and faster than their dynamic
+alternatives.
+
+A sorted array is a typical example of a static data structure to store an
+ordered set of $n$ elements. Its supported operations are $\alg{Index}(i)$
+which simply returns the $i$-th smallest element in constant time, and
+$\alg{Find}(x)$ which finds $x$ and its index $i$ in the array using binary
+search in time $\O(\log n)$.
+
+However, if we wish to insert a new element into an existing sorted array,
+this operation will take $\Omega(n)$ time -- we must shift the elements to keep
+the sorted order. In order to have a fast insertion, we may decide to use a
+different dynamic data structure, such as a binary search tree. But then the
+operation \alg{Index} slows down to logarithmic time.
+
 In this chapter we will look at techniques of {\I dynamization} --
 transformation of a static data structure into a (semi)dynamic data structure.
+As we have seen with a sorted array, the simple and straightforward attempts
+often lead to slow operations. Therefore, we want to dynamize data structures
+in such a way that the operations stay reasonably fast.

 \section{Structure rebuilding}

 Consider a data structure with $n$ elements such that modifying it may cause
-severe problems that are too hard to fix easily. Therefore, we give up on
-fixing it and rebuild it completely anew. If we do this after $\Theta(n)$
-operations, we can amortize the cost of rebuild into those operations.  Let us
-look at such cases.
+severe problems that are too hard to fix easily. In such a case, we give up on
+fixing it and rebuild it completely anew.
+
+If building such a structure takes time $\O(f(n))$ and we perform the rebuild
+after $\Theta(n)$ modifying operations, we can amortize the cost of the rebuild
+into the operations. This adds an amortized term $\O(f(n)/n)$ to
+their time complexity, given that $n$ does not change asymptotically between
+the rebuilds.

+\examples
+
+\list{o}
+\:
 An array is a structure with limited capacity $c$. While it is dynamic (we can
-insert or remove elements from the end), we cannot insert new elements
+insert or remove elements at the end), we cannot insert new elements
 indefinitely. Once we run out of space, we build a new structure with capacity
 $2c$ and elements from the old structure.
-
 Since we insert at least $\Theta(n)$ elements to reach the limit from a freshly
 rebuilt structure, this amortizes to $\O(1)$ amortized time per an insertion,
 as we can rebuild an array in time $\O(n)$.

-Another example of such structure is an $y$-fast tree. It is parametrized by
+\:
+Another example of such a structure is a $y$-fast trie. It is parametrized by
 block size required to be $\Theta(\log n)$ for good time complexity. If we let
-$n$ change enough such that $\log n$ changes asymptotically, everything breaks.
-We can save this by rebuilding the tree before this happens $n$ changes enough,
-which happens after $\Omega(n)$ operations.
+$n$ change enough such that $\log n$ changes asymptotically, the proven time
+complexity no longer holds.
+We can avoid this by rebuilding the trie once $n$
+changes by a constant factor (then $\log n$ changes only by an additive constant).
+This happens no sooner than after $\Theta(n)$ insertions or deletions.

+\:
 Consider a data structure where instead of proper deletion of elements we just
 replace them with ``tombstones''. When we run a query, we ignore them. After
 enough deletions, most of the structure becomes filled with tombstones, leaving
-too little space for proper elements and slowing down the queries. Once again,
+too little space for proper elements and slowing down the queries.
+Once again,
 the fix is simple -- once at least $n/2$ of elements are tombstones, we rebuild
 the structure. To reach $n/2$ tombstones we need to delete $\Theta(n)$
-elements. If a rebuild takes $\Theta(n)$ time, this again amortizes.
+elements.
+\endlist

 \subsection{Local rebuilding}

@@ -201,7 +233,7 @@ decomposable search problem $f$ and the resulting dynamic data structure $D$:}
 \tightlist{o}
 \: $B_S(n)$ is time complexity of building $S$,
 \: $Q_S(n)$ is time complexity of query on $S$,
-\: $S_S(n)$ is the space complexity of $S$.
+\: $S_S(n)$ is the space complexity of $S$,
 \medskip
 \: $Q_D(n)$ is time complexity of query on $D$,
 \: $S_D(n)$ is the space complexity of $D$,
@@ -210,19 +242,16 @@ decomposable search problem $f$ and the resulting dynamic data structure $D$:}

 We assume that $Q_S(n)$, $B_S(n)/n$, $S_S(n)/n$ are all non-decreasing functions.

-We decompose the set $X$ into blocks $B_i$ such that $|B_i| \in \{0, 2^i\}$
-such that $\bigcup_i B_i = X$ and $B_i \cap B_j = \emptyset$ for all $i \neq
-j$. Let $|X| = n$. Since $n = \sum_i n_i 2^i$, its binary representation
-uniquely determines the block structure. Thus, the total number of blocks is at
-most $\log n$.
+We decompose the set $X$ into blocks $B_i$ such that $|B_i| \in \{0, 2^i\}$, $\bigcup_i B_i = X$ and $B_i \cap B_j = \emptyset$ for all $i \neq
+j$. Let $|X| = n$. Since $n = \sum_i n_i 2^i$ for $n_i \in \{0, 1\}$, its
+binary representation uniquely determines the block structure. Thus, the total
+number of blocks is at most $\log n$.

 For each nonempty block $B_i$ we build a static structure $S$ of size $2^i$.
 Since $f$ is decomposable, a query on the structure will run queries on each
 block, and then combine them using $\sqcup$:
 $$ f(q, x) = f(q, B_0) \sqcup f(q, B_1) \sqcup \dots \sqcup f(q, B_i).$$

-TODO image
-
 \lemma{$Q_D(n) \in \O(Q_s(n) \cdot \log n)$.}

 \proof
@@ -231,8 +260,6 @@ constant time, $Q_D(n) = \sum_{i: B_i \neq \emptyset} Q_S(2^i) + \O(1)$. Since $
 \leq Q_S(n)$ for all $x \leq n$, the inequality holds.
 \qed

-Now let us calculate the space complexity of $D$.
-
 \lemma{$S_D(n) \in \O(S_S(n))$.}

 \proof
@@ -246,7 +273,7 @@ $$
 	\leq {S_S(n) \over n} \cdot \sum_{i=0}^{\log n} 2^i
 	\leq {S_S(n) \over n} \cdot n.
 $$
-\qed
+\qedmath

 It might be advantageous to store the elements in each block separately so that
 we do not have to inspect the static structure and extract the elements from
@@ -258,7 +285,9 @@ with elements $B_0 \cup B_1 \cup \dots \cup B_{i-1} \cup \{x\}$. This new block
 has $1 + \sum_{j=0}^{i-1} 2^j = 2^i$ elements, which is the required size for
 $B_i$. At last, we remove all blocks $B_0, \dots, B_{i-1}$ and add $B_i$.

-TODO image
+\figure{semidynamic-insert.pdf}{}{Insertion of $x$ into the structure for $n =
+23$: blocks $\{x\}$ and $B_0$ to $B_2$ merge into a new block $B_3$, while block
+$B_4$ is unchanged.}

 \lemma{$\bar I_D(n) \in \O(B_S(n)/n \cdot \log n)$.}

@@ -268,7 +297,7 @@ TODO image
 	As this function is non-decreasing, we can lower bound it by $B_S(n) /
 	n$. However, one element can participate in $\log n$ rebuilds during
 	the structure life. Therefore, each element needs to store up cost $\log n
-	\cdot B_S(n) / n$ to pay off all rebuilds.
+	\cdot B_S(n) / n$ to pay off all rebuilds. \qed
 }

 \theorem{
@@ -278,10 +307,14 @@ Then there exists a semidynamic data structure $D$ answering $f$ with parameters
 \tightlist{o}
 \: $Q_D(n) \in \O(Q_S(n) \cdot \log_n)$,
 \: $S_D(n) \in \O(S_S(n))$,
-\: $\bar I_D(n) \in \O(B_S(n)/n \cdot \log n)$.
+\: $\bar I_D(n) \in \O(B_S(n)/n \cdot \log n)$ amortized.
 \endlist
 }

+In general, the bound for insertion is not tight. If $B_S(n) =
+\O(n^\varepsilon)$ for $\varepsilon > 1$, the logarithmic factor is dominated
+and $\bar I_D(n) \in \O(n^{\varepsilon-1})$.
+
 \example

 If we use a sorted array using binary search to search elements in a static
@@ -295,10 +328,6 @@ We can speed up insertion time. Instead of building the list anew, we can merge
 the lists in $\Theta(n)$ time, therefore speeding up insertion to $\O(\log n)$
 amortized.

-In general, the bound for insertion is not tight. If $B_S(n) =
-\O(n^\varepsilon)$ for $\varepsilon > 1$, the logarithmic factor is dominated
-and $\bar I_D(n) \in \O(n^\varepsilon)$.
-
 \subsection{Worst-case semidynamization}

 So far we have created a data structure that acts well in the long run, but one

--- a/vk-dynamic/semidynamic-insert.asy
+++ b/vk-dynamic/semidynamic-insert.asy
+import ads;
+
+int[] block_indices = {0,0,1,2,4};
+real[] block_offs;
+real[] block_widths;
+
+real w = 0.4;
+real h = 0.4;
+real s = 0.1;
+
+real draw_block(real offset, int ypos, int index) {
+	real width = 2^index * w;
+	draw(box((offset, ypos), (offset+width, ypos+h)), thin);
+	return width;
+}
+
+string b_i(int i) {
+	return "\eightrm $B_" + string(i) + "$";
+}
+
+int prev_i = 0;
+real offset = -s;
+for (int i : block_indices) {
+	offset += s;
+	if (i == 4) {
+		offset += 3*s;
+	}
+	real width = draw_block(offset, 0, i);
+	block_offs.push(offset);
+	block_widths.push(width);
+	offset += width;
+	prev_i = i;
+}
+
+for (int i = 0; i < 5; ++i) {
+	real x = block_offs[i] + block_widths[i]/2;
+	string name;
+	if (i > 0)
+		name = b_i(block_indices[i]);
+	else
+		name = "$x$";
+	label(name, (x, h/2));
+	draw((x, -0.1) -- (x, -1+h+0.1), thin, e_arrow);
+}
+real width = draw_block(0, -1, 3);
+label(b_i(3), (width/2, h/2 - 1));
+real width2 = draw_block(block_offs[4], -1, 4);
+label(b_i(4), (block_offs[4] + block_widths[4]/2, h/2 - 1));
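
The insertion that the new figure depicts can also be sketched in code. The following Python sketch is not part of the commit; it assumes a sorted list as the static structure $S$ and set membership as the decomposable problem, with logical "or" playing the role of $\sqcup$. The names `SemidynamicSet`, `insert`, `contains` and `block_sizes` are hypothetical and chosen only for illustration.

```python
from bisect import bisect_left


class SemidynamicSet:
    """Blocks of sizes 2^i, merged like a binary counter on insert.

    Illustrative sketch: a sorted list stands in for the static structure S.
    """

    def __init__(self):
        # blocks[i] is None (empty) or a sorted list of exactly 2**i elements
        self.blocks = []

    def insert(self, x):
        carry = [x]
        i = 0
        # Collect B_0, ..., B_{i-1} up to the first empty slot i.
        while i < len(self.blocks) and self.blocks[i] is not None:
            carry.extend(self.blocks[i])
            self.blocks[i] = None
            i += 1
        if i == len(self.blocks):
            self.blocks.append(None)
        # Build the static structure for B_i from scratch (here: sort).
        self.blocks[i] = sorted(carry)

    def contains(self, x):
        # Decomposable query: ask each non-empty block, combine with "or".
        for block in self.blocks:
            if block is not None:
                j = bisect_left(block, x)
                if j < len(block) and block[j] == x:
                    return True
        return False

    def block_sizes(self):
        # Size of every block B_0, B_1, ...; 0 marks an empty block.
        return [0 if b is None else len(b) for b in self.blocks]
```

With 23 elements stored, the non-empty blocks are $B_0$, $B_1$, $B_2$ and $B_4$; one more insertion empties the first three and fills $B_3$, matching the case drawn in `semidynamic-insert.asy`. Replacing `sorted(carry)` with a merge of the already sorted blocks would correspond to the faster insertion mentioned in the sorted-array example.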