From e62e52426f150f622adc86fc44d96f5911bb2c9a Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?V=C3=A1clav=20Kon=C4=8Dick=C3=BD?= <koncicky@kam.mff.cuni.cz> Date: Sat, 28 Aug 2021 18:52:35 +0200 Subject: [PATCH] Dynamization: Addition of a picture and rewrite --- vk-dynamic/Makefile | 1 + vk-dynamic/dynamic.tex | 91 ++++++++++++++++++++----------- vk-dynamic/semidynamic-insert.asy | 48 ++++++++++++++++ 3 files changed, 109 insertions(+), 31 deletions(-) create mode 100644 vk-dynamic/semidynamic-insert.asy diff --git a/vk-dynamic/Makefile b/vk-dynamic/Makefile index ba6c63e..3ef9919 100644 --- a/vk-dynamic/Makefile +++ b/vk-dynamic/Makefile @@ -1,3 +1,4 @@ TOP=.. +PICS=semidynamic-insert include ../Makerules diff --git a/vk-dynamic/dynamic.tex b/vk-dynamic/dynamic.tex index a34287e..2fed7da 100644 --- a/vk-dynamic/dynamic.tex +++ b/vk-dynamic/dynamic.tex @@ -7,46 +7,78 @@ A data structure can be, depending on what operations are supported: -\list{o} +\tightlist{o} \: {\I static} if all operations after building the structure do not alter the data, \: {\I semidynamic} if data insertion is possible as an operation, \: {\I fully dynamic} if deletion of inserted data is allowed along with insertion. \endlist +Static data structures are useful if we know all the data beforehand. In many +cases, static data structures are simpler and faster than their dynamic +alternatives. + +A sorted array is a typical example of a static data structure to store an +ordered set of $n$ elements. Its supported operations are $\alg{Index}(i)$, +which simply returns the $i$-th smallest element in constant time, and +$\alg{Find}(x)$, which finds $x$ and its index $i$ in the array using binary +search in time $\O(\log n)$. + +However, if we wish to insert a new element into an already existing sorted array, +the operation takes $\Omega(n)$ time -- we must shift the elements to keep +the sorted order.
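To make the trade-off concrete, the static sorted array just described might be sketched as follows in Python (a minimal sketch; the class and method names are ours, not from the notes):

```python
import bisect

class SortedArray:
    """Static sorted array: constant-time Index, logarithmic Find, linear insert."""

    def __init__(self, elements):
        self.a = sorted(elements)          # build the static structure once

    def index(self, i):
        return self.a[i]                   # Index(i): i-th smallest, O(1)

    def find(self, x):
        i = bisect.bisect_left(self.a, x)  # Find(x): binary search, O(log n)
        return i if i < len(self.a) and self.a[i] == x else None

    def insert(self, x):
        bisect.insort(self.a, x)           # shifts elements: Omega(n) time
```

The slow part is exactly the shifting inside `insert`; both query operations keep their advertised costs.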
In order to have a fast insertion, we may decide to use a +different dynamic data structure, such as a binary search tree. But then the +operation \alg{Index} slows down to logarithmic time. + In this chapter we will look at techniques of {\I dynamization} -- transformation of a static data structure into a (semi)dynamic data structure. +As we have seen with the sorted array, simple and straightforward attempts +often lead to slow operations. Therefore, we want to dynamize data structures +in such a way that the operations stay reasonably fast. \section{Structure rebuilding} Consider a data structure with $n$ elements such that modifying it may cause -severe problems that are too hard to fix easily. Therefore, we give up on -fixing it and rebuild it completely anew. If we do this after $\Theta(n)$ -operations, we can amortize the cost of rebuild into those operations. Let us -look at such cases. +severe problems that are too hard to fix easily. In such a case, we give up on +fixing it and rebuild it completely anew. + +If building such a structure takes time $\O(f(n))$ and we perform the rebuild +after $\Theta(n)$ modifying operations, we can amortize the cost of the rebuild +into the operations. This adds an amortized factor $\O(f(n)/n)$ to +their time complexity, given that $n$ does not change asymptotically between +the rebuilds. +\examples + +\list{o} +\: An array is a structure with limited capacity $c$. While it is dynamic (we can -insert or remove elements from the end), we cannot insert new elements +insert or remove elements at the end), we cannot insert new elements indefinitely. Once we run out of space, we build a new structure with capacity $2c$ and elements from the old structure. - Since we insert $\Theta(n)$ elements to reach the limit from a freshly rebuilt structure, and we can rebuild an array in time $\O(n)$, this amortizes to $\O(1)$ time per insertion. -Another example of such structure is an $y$-fast tree. It is parametrized by +\: +Another example of such a structure is a $y$-fast trie. It is parametrized by +block size required to be $\Theta(\log n)$ for good time complexity. If we let -$n$ change enough such that $\log n$ changes asymptotically, everything breaks. -We can save this by rebuilding the tree before this happens $n$ changes enough, -which happens after $\Omega(n)$ operations. +$n$ change so much that $\log n$ changes asymptotically, the proven time +complexity no longer holds. +We can save this by rebuilding the trie once $n$ +changes by a constant factor (then $\log n$ changes at most by an additive +constant). +This happens no sooner than after $\Theta(n)$ insertions or deletions. +\: Consider a data structure where instead of properly deleting elements we just replace them with ``tombstones''. When we run a query, we ignore them. After enough deletions, most of the structure becomes filled with tombstones, leaving -too little space for proper elements and slowing down the queries. Once again, +too little space for proper elements and slowing down the queries. +Once again, the fix is simple -- once at least $n/2$ of the elements are tombstones, we rebuild the structure. To reach $n/2$ tombstones we need to delete $\Theta(n)$ -elements. If a rebuild takes $\Theta(n)$ time, this again amortizes. +elements. +\endlist \subsection{Local rebuilding} @@ -201,7 +233,7 @@ decomposable search problem $f$ and the resulting dynamic data structure $D$:} \tightlist{o} \: $B_S(n)$ is the time complexity of building $S$, \: $Q_S(n)$ is the time complexity of a query on $S$, -\: $S_S(n)$ is the space complexity of $S$. +\: $S_S(n)$ is the space complexity of $S$, \medskip \: $Q_D(n)$ is the time complexity of a query on $D$, \: $S_D(n)$ is the space complexity of $D$, @@ -210,19 +242,16 @@ decomposable search problem $f$ and the resulting dynamic data structure $D$:} We assume that $Q_S(n)$, $B_S(n)/n$, $S_S(n)/n$ are all non-decreasing functions.
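It may help to see decomposability concretely before the construction. Membership is a decomposable search problem: for any disjoint split of $X$ into blocks, $f(q, A \cup B) = f(q, A) \lor f(q, B)$, so logical or plays the role of the constant-time combining operator (a toy Python sketch; the function names are ours):

```python
def member(q, block):
    # f(q, B): is the query element q contained in block B?
    return q in block

def member_decomposed(q, blocks):
    # f(q, X) computed over a disjoint decomposition of X; "or" (via any())
    # is the constant-time operator combining the per-block answers.
    return any(member(q, b) for b in blocks)

# Any disjoint decomposition gives the same answer as querying X directly.
X = [4, 8, 15, 16, 23, 42]
blocks = [[4, 8], [15, 16, 23], [42]]
assert member_decomposed(16, blocks) == member(16, X) == True
assert member_decomposed(7, blocks) == member(7, X) == False
```

Other classic examples behave the same way, e.g. nearest neighbour, where the combining operator takes the minimum of the per-block distances.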
-We decompose the set $X$ into blocks $B_i$ such that $|B_i| \in \{0, 2^i\}$ -such that $\bigcup_i B_i = X$ and $B_i \cap B_j = \emptyset$ for all $i \neq -j$. Let $|X| = n$. Since $n = \sum_i n_i 2^i$, its binary representation -uniquely determines the block structure. Thus, the total number of blocks is at -most $\log n$. +We decompose the set $X$ into blocks $B_i$ such that $|B_i| \in \{0, 2^i\}$, $\bigcup_i B_i = X$ and $B_i \cap B_j = \emptyset$ for all $i \neq +j$. Let $|X| = n$. Since $n = \sum_i n_i 2^i$ for $n_i \in \{0, 1\}$, its +binary representation uniquely determines the block structure. Thus, the total +number of blocks is at most $\log n$. For each nonempty block $B_i$ we build a static structure $S$ of size $2^i$. Since $f$ is decomposable, a query on the structure will run queries on each block, and then combine them using $\sqcup$: $$ f(q, X) = f(q, B_0) \sqcup f(q, B_1) \sqcup \dots \sqcup f(q, B_i).$$ -TODO image - \lemma{$Q_D(n) \in \O(Q_S(n) \cdot \log n)$.} \proof @@ -231,8 +260,6 @@ constant time, $Q_D(n) = \sum_{i: B_i \neq \emptyset} Q_S(2^i) + \O(1)$. Since $Q_S(x) \leq Q_S(n)$ for all $x \leq n$, the inequality holds. \qed -Now let us calculate the space complexity of $D$. - \lemma{$S_D(n) \in \O(S_S(n))$.} \proof @@ -246,7 +273,7 @@ $$ \leq {S_S(n) \over n} \cdot \sum_{i=0}^{\log n} 2^i \leq {S_S(n) \over n} \cdot 2n. $$ -\qed +\qedmath It might be advantageous to store the elements in each block separately so that we do not have to inspect the static structure and extract the elements from @@ -258,7 +285,9 @@ with elements $B_0 \cup B_1 \cup \dots \cup B_{i-1} \cup \{x\}$. This new block has $1 + \sum_{j=0}^{i-1} 2^j = 2^i$ elements, which is the required size for $B_i$. Finally, we remove all blocks $B_0, \dots, B_{i-1}$ and add $B_i$.
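The decomposition and the insertion procedure just described can be sketched in a few lines of Python (a hedged sketch: sorted tuples stand in for the static structure $S$, membership for the decomposable problem $f$, and all names are ours):

```python
import bisect

class Semidynamic:
    """Binary decomposition: block i is either empty or a static
    structure holding exactly 2^i elements."""

    def __init__(self):
        self.blocks = []   # blocks[i] is None or a sorted tuple of 2^i items

    @staticmethod
    def _find(block, q):
        # Q_S: binary search inside one static block.
        i = bisect.bisect_left(block, q)
        return i < len(block) and block[i] == q

    def query(self, q):
        # f is decomposable: combine the per-block answers with "or".
        return any(self._find(b, q) for b in self.blocks if b is not None)

    def insert(self, x):
        # Collect x and the full blocks B_0, ..., B_{i-1} until a free
        # slot B_i is found; the carry then holds exactly 2^i elements.
        carry = [x]
        i = 0
        while i < len(self.blocks) and self.blocks[i] is not None:
            carry.extend(self.blocks[i])
            self.blocks[i] = None
            i += 1
        if i == len(self.blocks):
            self.blocks.append(None)
        self.blocks[i] = tuple(sorted(carry))   # B_S: rebuild one block
```

After five insertions the occupied block sizes are $1$ and $4$, matching the binary representation $101_2$ of $n = 5$.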
-TODO image +\figure{semidynamic-insert.pdf}{}{Insertion of $x$ into the structure for $n = +23$. The blocks $\{x\}$ and $B_0$ to $B_2$ merge into a new block $B_3$, while +block $B_4$ stays unchanged.} \lemma{$\bar I_D(n) \in \O(B_S(n)/n \cdot \log n)$.} @@ -268,7 +297,7 @@ TODO image As this function is non-decreasing, we can upper bound it by $B_S(n) / n$. However, one element can participate in $\log n$ rebuilds during the structure's life. Therefore, each element needs to store up cost $\log n - \cdot B_S(n) / n$ to pay off all rebuilds. + \cdot B_S(n) / n$ to pay off all rebuilds. \qed } \theorem{ @@ -278,10 +307,14 @@ Then there exists a semidynamic data structure $D$ answering $f$ with parameters \tightlist{o} \: $Q_D(n) \in \O(Q_S(n) \cdot \log n)$, \: $S_D(n) \in \O(S_S(n))$, -\: $\bar I_D(n) \in \O(B_S(n)/n \cdot \log n)$. +\: $\bar I_D(n) \in \O(B_S(n)/n \cdot \log n)$ amortized. \endlist } +In general, the bound for insertion is not tight. If $B_S(n) = +\O(n^\varepsilon)$ for $\varepsilon > 1$, the rebuild costs form a geometric +series, the logarithmic factor is dominated, and $\bar I_D(n) \in +\O(n^{\varepsilon - 1})$. + \example If we use a sorted array with binary search to look up elements in a static @@ -295,10 +328,6 @@ We can speed up insertion time. Instead of building the list anew, we can merge the lists in $\Theta(n)$ time, therefore speeding up insertion to $\O(\log n)$ amortized. -In general, the bound for insertion is not tight. If $B_S(n) = -\O(n^\varepsilon)$ for $\varepsilon > 1$, the logarithmic factor is dominated -and $\bar I_D(n) \in \O(n^\varepsilon)$.
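The merging step from the sorted-array example might be sketched like this (our helper name; `heapq.merge` performs a $k$-way merge of already sorted runs in $\O(n \log k)$ time, while merging the runs pairwise from the smallest upwards gives the $\Theta(n)$ bound from the text -- for a sketch the difference does not matter):

```python
import heapq

def rebuild_block(x, smaller_blocks):
    """Form the new block B_i from x and the sorted blocks B_0, ..., B_{i-1}.

    Consuming the already sorted runs avoids re-sorting the union from
    scratch, which is what drops insertion to O(log n) amortized.
    """
    return list(heapq.merge([x], *smaller_blocks))

# x = 4 joins blocks of sizes 1, 2 and 4 into one sorted block of size 8.
assert rebuild_block(4, [[7], [1, 9], [2, 3, 5, 8]]) == [1, 2, 3, 4, 5, 7, 8, 9]
```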
- \subsection{Worst-case semidynamization} So far we have created a data structure that acts well in the long run, but one diff --git a/vk-dynamic/semidynamic-insert.asy b/vk-dynamic/semidynamic-insert.asy new file mode 100644 index 0000000..cd91cb1 --- /dev/null +++ b/vk-dynamic/semidynamic-insert.asy @@ -0,0 +1,52 @@ +import ads; + +int[] block_indices = {0,0,1,2,4}; +real[] block_offs; +real[] block_widths; + +real w = 0.4; +real h = 0.4; +real s = 0.1; + +// Draw one block of $2^{index}$ elements and return its width. +real draw_block(real offset, int ypos, int index) { + real width = 2^index * w; + draw(box((offset, ypos), (offset+width, ypos+h)), thin); + return width; +} + +string b_i(int i) { + return "\eightrm $B_" + string(i) + "$"; +} + +// Lay out the top row: the new element $x$ followed by the current blocks. +real offset = -s; +for (int i : block_indices) { + offset += s; + if (i == 4) { + // extra gap before $B_4$, which does not take part in the merge + offset += 3*s; + } + real width = draw_block(offset, 0, i); + block_offs.push(offset); + block_widths.push(width); + offset += width; +} + +// Label each top block and draw an arrow down to the result row. +for (int i = 0; i < 5; ++i) { + real x = block_offs[i] + block_widths[i]/2; + string name; + if (i > 0) + name = b_i(block_indices[i]); + else + name = "$x$"; + label(name, (x, h/2)); + draw((x, -0.1) -- (x, -1+h+0.1), thin, e_arrow); +} + +// Bottom row: the merged block $B_3$ and the unchanged block $B_4$. +real width = draw_block(0, -1, 3); +label(b_i(3), (width/2, h/2 - 1)); +draw_block(block_offs[4], -1, 4); +label(b_i(4), (block_offs[4] + block_widths[4]/2, h/2 - 1)); -- GitLab