Commit e62e5242 by Václav Končický

### Dynamization: Addition of a picture and rewrite

parent 3a9cc9de
TOP=..
PICS=semidynamic-insert

include ../Makerules
... ...
@@ -7,46 +7,78 @@
A data structure can be, depending on what operations are supported:
\tightlist{o}
\: {\I static} if all operations after building the structure do not alter the
data,
\: {\I semidynamic} if data insertion is possible as an operation,
\: {\I fully dynamic} if deletion of inserted data is allowed along with
insertion.
\endlist

Static data structures are useful if we know the structure beforehand. In many
cases, static data structures are simpler and faster than their dynamic
alternatives.

A sorted array is a typical example of a static data structure storing an
ordered set of $n$ elements. Its supported operations are $\alg{Index}(i)$,
which simply returns the $i$-th smallest element in constant time, and
$\alg{Find}(x)$, which finds $x$ and its index $i$ in the array using binary
search in time $\O(\log n)$.

However, if we wish to insert a new element into an already existing sorted
array, the operation takes $\Omega(n)$ time -- we must shift the elements to
keep the sorted order. To obtain fast insertion, we may decide to use a
different, dynamic data structure, such as a binary search tree. But then the
operation \alg{Index} slows down to logarithmic time.

In this chapter we will look at techniques of {\I dynamization} --
transformation of a static data structure into a (semi)dynamic data structure.
As we have seen with the sorted array, simple and straightforward attempts
often lead to slow operations. Therefore, we want to dynamize data structures
in such a way that the operations stay reasonably fast.

\section{Structure rebuilding}

Consider a data structure with $n$ elements such that modifying it may cause
severe problems that are too hard to fix easily.
In such a case, we give up on fixing it and rebuild it completely anew. If
building the structure takes time $\O(f(n))$ and we perform the rebuild after
$\Theta(n)$ modifying operations, we can amortize the cost of the rebuild into
those operations. This adds an amortized factor of $\O(f(n)/n)$ to their time
complexity, provided that $n$ does not change asymptotically between the
rebuilds.

\examples
\list{o}
\: An array is a structure with limited capacity $c$. While it is dynamic (we
can insert or remove elements at the end), we cannot insert new elements
indefinitely. Once we run out of space, we build a new structure with capacity
$2c$ containing the elements of the old structure. Since we insert at least
$\Theta(n)$ elements to reach the limit from a freshly rebuilt structure, and
we can rebuild an array in time $\O(n)$, this amortizes to $\O(1)$ time per
insertion.

\: Another example of such a structure is a $y$-fast trie. It is parametrized
by a block size which is required to be $\Theta(\log n)$ for good time
complexity. If we let $n$ change so much that $\log n$ changes asymptotically,
the proven time complexity no longer holds. We can save this by rebuilding the
trie once $n$ changes by a constant factor (then $\log n$ changes by an
additive constant). This happens no sooner than after $\Theta(n)$ insertions
or deletions.

\: Consider a data structure where instead of properly deleting elements we
just replace them with ``tombstones''. When we run a query, we ignore them.
After enough deletions, most of the structure becomes filled with tombstones,
leaving too little space for proper elements and slowing down the queries.
Once again, the fix is simple -- once at least $n/2$ of the elements are
tombstones, we rebuild the structure. To accumulate $n/2$ tombstones we need
to delete $\Theta(n)$ elements, so if a rebuild takes $\Theta(n)$ time, the
cost again amortizes.
\endlist

\subsection{Local rebuilding}

... ...
@@ -201,7 +233,7 @@
decomposable search problem $f$ and the resulting dynamic data structure $D$:}
\tightlist{o}
\: $B_S(n)$ is the time complexity of building $S$,
\: $Q_S(n)$ is the time complexity of a query on $S$,
\: $S_S(n)$ is the space complexity of $S$,
\medskip
\: $Q_D(n)$ is the time complexity of a query on $D$,
\: $S_D(n)$ is the space complexity of $D$,

... ...
@@ -210,19 +242,16 @@
We assume that $Q_S(n)$, $B_S(n)/n$, $S_S(n)/n$ are all non-decreasing
functions.

We decompose the set $X$ into blocks $B_i$ such that $|B_i| \in \{0, 2^i\}$,
$\bigcup_i B_i = X$ and $B_i \cap B_j = \emptyset$ for all $i \neq j$. Let
$|X| = n$. Since $n = \sum_i n_i 2^i$ for $n_i \in \{0, 1\}$, the binary
representation of $n$ uniquely determines the block structure. Thus, the total
number of blocks is at most $\log n$.

For each nonempty block $B_i$ we build a static structure $S$ of size $2^i$.
Since $f$ is decomposable, a query on the whole structure runs queries on the
individual blocks and then combines them using $\sqcup$:
$$f(q, X) = f(q, B_0) \sqcup f(q, B_1) \sqcup \dots \sqcup f(q, B_i).$$

\lemma{$Q_D(n) \in \O(Q_S(n) \cdot \log n)$.}

\proof
... ...
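As an aside (not part of the commit), the query combination above can be sketched in Python; the names `query`, `combine`, and the concrete block contents are illustrative assumptions:

```python
# Illustrative sketch: answering a decomposable search problem f on the
# block decomposition.  Each nonempty block B_i is queried separately and
# the partial answers are combined with `combine`, playing the role of the
# operator ⊔ from the text.
from functools import reduce

def query(blocks, q, f, combine, identity):
    # blocks[i] is None (an empty block) or a static structure of 2**i elements.
    answers = (f(q, b) for b in blocks if b is not None)
    return reduce(combine, answers, identity)

# Example: membership is decomposable, with f(q, B) = (q in B) and ⊔ = OR.
blocks = [{9}, None, {1, 4, 6, 23}]   # n = 5 = 101 in binary
print(query(blocks, 4, lambda q, b: q in b, lambda a, b: a or b, False))  # True
print(query(blocks, 7, lambda q, b: q in b, lambda a, b: a or b, False))  # False
```

Note that the combination visits at most $\log n$ blocks, which is where the $\log n$ factor in the query lemma comes from.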
@@ -231,8 +260,6 @@
constant time, $Q_D(n) = \sum_{i: B_i \neq \emptyset} Q_S(2^i) + \O(1)$. Since
$Q_S(x) \leq Q_S(n)$ for all $x \leq n$, the inequality holds. \qed

\lemma{$S_D(n) \in \O(S_S(n))$.}

\proof
... ...
@@ -246,7 +273,7 @@
$$\leq {S_S(n) \over n} \cdot \sum_{i=0}^{\log n} 2^i
  \leq {S_S(n) \over n} \cdot 2n.$$
\qedmath

It might be advantageous to store the elements in each block separately so
that we do not have to inspect the static structure and extract the elements
from
... ...
@@ -258,7 +285,9 @@
with elements $B_0 \cup B_1 \cup \dots \cup B_{i-1} \cup \{x\}$. This new
block has $1 + \sum_{j=0}^{i-1} 2^j = 2^i$ elements, which is exactly the
required size for $B_i$. At last, we remove all blocks $B_0, \dots, B_{i-1}$
and add $B_i$.

\figure{semidynamic-insert.pdf}{}{Insertion of $x$ into the structure for
$n = 23$: the blocks $\{x\}$ and $B_0$ to $B_2$ merge into a new block $B_3$,
while block $B_4$ is unchanged.}

\lemma{$\bar I_D(n) \in \O(B_S(n)/n \cdot \log n)$.}

... ...
@@ -268,7 +297,7 @@
As this function is non-decreasing, we can lower bound it by $B_S(n) / n$.
However, one element can participate in $\log n$ rebuilds during the
structure's life. Therefore, each element needs to store up cost
$\log n \cdot B_S(n) / n$ to pay off all the rebuilds.
\qed
}

\theorem{
... ...
@@ -278,10 +307,14 @@
Then there exists a semidynamic data structure $D$ answering $f$ with
parameters
\tightlist{o}
\: $Q_D(n) \in \O(Q_S(n) \cdot \log n)$,
\: $S_D(n) \in \O(S_S(n))$,
\: $\bar I_D(n) \in \O(B_S(n)/n \cdot \log n)$ amortized.
\endlist
}

In general, the bound for insertion is not tight. If
$B_S(n) = \O(n^\varepsilon)$ for $\varepsilon > 1$, the logarithmic factor is
dominated and $\bar I_D(n) \in \O(n^\varepsilon)$.

\example

If we use a sorted array using binary search to search elements in a static
... ...
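The insertion procedure described in these hunks behaves like adding 1 to a binary counter; a Python illustration (not part of the commit, with `build` standing in for the static construction of cost $B_S$):

```python
# Illustrative sketch of the insertion: find the first empty slot i, build one
# static structure from x together with the blocks B_0, ..., B_{i-1}
# (1 + 2^0 + ... + 2^(i-1) = 2^i elements in total), store it as B_i and clear
# the smaller blocks -- exactly like the carries when adding 1 in binary.

def insert(blocks, x, build=sorted):
    elems = [x]
    i = 0
    while i < len(blocks) and blocks[i] is not None:
        elems.extend(blocks[i])   # collect B_0, ..., B_{i-1}
        blocks[i] = None
        i += 1
    if i == len(blocks):
        blocks.append(None)
    blocks[i] = build(elems)      # |elems| = 2^i, cost B_S(2^i)
    return blocks

blocks = []
for x in [5, 2, 8, 1]:
    insert(blocks, x)
print(blocks)                     # [None, None, [1, 2, 5, 8]]
```

Here `build=sorted` models a sorted-array block; any static structure with a build procedure would do.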
@@ -295,10 +328,6 @@
We can speed up insertion time. Instead of building the list anew, we can
merge the lists in $\Theta(n)$ time, therefore speeding up insertion to
$\O(\log n)$ amortized.

\subsection{Worst-case semidynamization}

So far we have created a data structure that acts well in the long run, but one
... ...

semidynamic-insert.asy (new file):

```asy
import ads;

int[] block_indices = {0,0,1,2,4};
real[] block_offs;
real[] block_widths;
real w = 0.4;
real h = 0.4;
real s = 0.1;

real draw_block(real offset, int ypos, int index)
{
	real width = 2^index * w;
	draw(box((offset, ypos), (offset+width, ypos+h)), thin);
	return width;
}

string b_i(int i)
{
	return "\eightrm $B_" + string(i) + "$";
}

int prev_i = 0;
real offset = -s;
for (int i : block_indices) {
	offset += s;
	if (i == 4) {
		offset += 3*s;
	}
	real width = draw_block(offset, 0, i);
	block_offs.push(offset);
	block_widths.push(width);
	offset += width;
	prev_i = i;
}

for (int i = 0; i < 5; ++i) {
	real x = block_offs[i] + block_widths[i]/2;
	string name;
	if (i > 0)
		name = b_i(block_indices[i]);
	else
		name = "$x$";
	label(name, (x, h/2));
	draw((x, -0.1) -- (x, -1+h+0.1), thin, e_arrow);
}

real width = draw_block(0, -1, 3);
label(b_i(3), (width/2, h/2 - 1));
real width2 = draw_block(block_offs[4], -1, 4);
label(b_i(4), (block_offs[4] + block_widths[4]/2, h/2 - 1));
```
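The merging speed-up from the last hunk can be sketched as follows (a Python illustration, not part of the commit, assuming each block is a sorted list): rather than re-sorting the collected elements, merge the already sorted blocks pairwise from the smallest up, at total cost $2 + 4 + \dots + 2^i = \Theta(2^i)$, i.e. $\Theta(n)$ per rebuilt block.

```python
def merge(a, b):
    # Standard two-way merge of sorted lists in time O(|a| + |b|).
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] <= b[j]:
            out.append(a[i]); i += 1
        else:
            out.append(b[j]); j += 1
    out.extend(a[i:])
    out.extend(b[j:])
    return out

def insert_merging(blocks, x):
    run = [x]
    i = 0
    while i < len(blocks) and blocks[i] is not None:
        run = merge(run, blocks[i])   # both sides hold 2^i elements
        blocks[i] = None
        i += 1
    if i == len(blocks):
        blocks.append(None)
    blocks[i] = run
    return blocks

blocks = []
for x in [3, 1, 4, 1, 5]:
    insert_merging(blocks, x)
print(blocks)                         # [[5], None, [1, 1, 3, 4]]
```

Since each rebuild is now linear in the new block's size, the amortized insertion cost drops to $\O(\log n)$, as the text claims.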