datovky / ds2-notes · Commit 04bd1ec6
authored 3 years ago by Jirka Skrobanek

Proof-reading of dynamize chapter

parent c0f51118 · 1 merge request: !2 Proof-reading of dynamize chapter

Showing 1 changed file: 51-dynamize/dynamize.tex (+59 additions, −130 deletions)
@@ -9,44 +9,45 @@ A data structure can be, depending on what operations are supported:
\tightlist{o}
\: {\I static} if all operations after building the structure do not alter the
data\foot{As a side note regarding this terminology, let us remark on the
distinction between an update of the proper data stored inside a data structure
and an update of some auxiliary data. For example, a splay tree can change
shape even though only queries and no updates happen.},
\: {\I semi-dynamic} if data insertion is possible as an operation,
\: {\I (fully) dynamic} if deletion of data is allowed on top of insertion.
\endlist
Static data structures are often sufficient for many applications where
updates are simply not required.
A sorted array is a typical example of a static data structure to store an
ordered set of $n$ elements. Its supported operations are $\alg{Index}(i)$,
which simply returns the $i$-th smallest element in constant time, and
$\alg{Find}(x)$, which finds $x$ and its index $i$ in the array using binary
search in time $\O(\log n)$.
However, if we wish to insert a new element into an already existing sorted
array, this operation will take $\Omega(n)$ -- we must shift the elements to
keep the sorted order. In order to have fast insertion, we may decide to use a
different dynamic data structure, a binary search tree (BST) for instance. In
that case, the operation \alg{Index} slows down to logarithmic time.

What happened to \alg{Index} is a frequent inconvenience when we modify data
structures to support updates. Oftentimes, making one operation run faster is
only possible by making another operation run slower. One must therefore
strike a careful balance among the complexities of individual operations,
based on how often they are needed. It is the subject of this chapter to show
some efficient techniques of {\I dynamization} -- transformation of a static
data structure into a (semi-)dynamic data structure.

As we have seen with the sorted array, simple and straightforward attempts
often lead to slow operations. Therefore, we want to dynamize data structures
in such a way that the operations stay reasonably fast.
\section{Global rebuilding}
Consider a data structure with $n$ elements such that modifying it may cause
severe problems that are too hard to fix easily. In that case, we give up on
fixing it and rebuild it completely anew.

If initializing the structure with $n$ elements takes $f(n)$ time steps and
we perform the rebuild after $\Theta(n)$ modifying operations, we can amortize
the cost of the rebuild into the operations. This adds an amortized factor
$\O(f(n)/n)$ to their time complexity, given that $n$ does not change
asymptotically between the rebuilds.
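For a concrete feel of this factor, consider a small computation (ours, not
part of the original notes): if the structure is built by sorting, so that
$f(n) = \Theta(n \log n)$, then a rebuild triggered after $\Theta(n)$
modifying operations charges each of them
$$\O(f(n)/n) = \O\bigl((n \log n)/n\bigr) = \O(\log n)$$
extra amortized time.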
\examples
@@ -55,13 +56,13 @@ the rebuilds.
\: An array is a structure with limited capacity $c$. While it is dynamic (we
can insert or remove elements at the end), we cannot insert new elements
indefinitely. Once we run out of space, we build a new structure with capacity
$2c$ and copy to it the elements from the old structure. Since we inserted at
least $\Theta(n)$ elements to reach the limit from a freshly rebuilt
structure, this amortizes to $\O(1)$ time per insertion, as we can rebuild an
array in time $\O(n)$. (This example, together with the tombstone deletions
below, is sketched in code after this list.)
\: Another example of such a structure is a $y$-fast trie. It is parametrized
by the block size, which is required to be $\Theta(\log n)$ for good time
complexity. If we let $n$ change enough such that $\log n$ changes
asymptotically, the proven time complexity no longer holds.
@@ -70,12 +71,11 @@ changes by a constant factor (then $\log n$ changes by a constant additively).
This happens no sooner than after $\Theta(n)$ insertions or deletions.
\: Consider a data structure where instead of proper removal of elements on
deletion, we just replace them with ``tombstones''. When we run a query later,
we just ignore the tombstones. After enough deletions, most of the structure
becomes filled with tombstones, leaving too little space for proper elements
and slowing down the queries. Once again, the idea is simple -- once at least
$n/2$ of the elements are tombstones, we rebuild the structure. To reach $n/2$
tombstones we need to delete $\Theta(n)$ elements.
\endlist
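To make both rebuilding rules above concrete, here is a minimal Python sketch
(ours -- the class and all names are illustrative, not from the notes):
insertion doubles the capacity once it is exhausted, and deletion only plants
tombstones, triggering a full rebuild once they make up half of the structure.

class RebuildingArray:
    def __init__(self):
        self.slots = []        # stored elements; None marks a tombstone
        self.capacity = 1      # capacity of the current "static" structure
        self.tombstones = 0

    def insert(self, x):
        if len(self.slots) == self.capacity:
            # out of space: rebuild into a structure of capacity 2c; the O(n)
            # copy amortizes over the Theta(n) insertions since the last rebuild
            self.capacity *= 2
        self.slots.append(x)

    def delete(self, x):
        self.slots[self.slots.index(x)] = None   # plant a tombstone (assumes x is present)
        self.tombstones += 1
        if 2 * self.tombstones >= len(self.slots):
            # at least n/2 tombstones: rebuild without them in O(n) time,
            # paid for by the Theta(n) deletions that created them
            self.slots = [y for y in self.slots if y is not None]
            self.tombstones = 0

    def contains(self, x):
        return any(y == x for y in self.slots if y is not None)  # skip tombstones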
@@ -83,94 +83,23 @@ elements.
\subsection{Local rebuilding}
In many cases, it is enough to rebuild just a part of the structure to fix
local problems. If a segment of the structure has size $k$, we want to space
out reconstructions at least $\Theta(k)$ operations apart, allowing their cost
to amortize into the other operations.
One such structure is a BST. Imagine starting with a perfectly balanced tree
and then inserting and removing nodes: the tree structure degrades over time.
With a particular choice of operations, we can force the tree to degenerate
into a long vine of linear depth.
To fix this problem, we define a parameter $1/2 < \alpha < 1$ as a
{\I balance limit}. We use it to determine if a tree is balanced enough.
\defn{A node $v$ is balanced, if for each of its children $c$ we have
$s(c) \leq \alpha s(v)$. A tree $T$ is balanced, if all its nodes are
balanced.}
\lemma{If a tree with $n$ nodes is balanced, then its height is
$\O(\log_{1/\alpha} n)$.}
\proof
Choose an arbitrary path from the root to a leaf and track the node sizes. The
root has size $n$, and each subsequent node has size at most $\alpha$ times
the size of its parent, so the $k$-th node of the path has size at most
$\alpha^k n$. Once we reach a leaf, its size is 1, and $\alpha^k n \geq 1$
forces $k \leq \log_{1/\alpha} n$. Thus the path can contain at most
$\log_{1/\alpha} n$ edges.
\qed
Therefore, we want to keep the nodes balanced between any operations. If any
node becomes unbalanced, we take the highest such node $v$ and rebuild its
subtree $T(v)$ into a perfectly balanced tree.
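As an illustration, a minimal Python sketch of this rule (ours; the concrete
value of $\alpha$ and all names are our choices, not the notes'): we insert as
into a plain BST, update subtree sizes along the search path, and rebuild the
subtree of the highest node that became unbalanced.

ALPHA = 0.7          # balance limit, 1/2 < ALPHA < 1

class Node:
    def __init__(self, key):
        self.key, self.left, self.right, self.size = key, None, None, 1

def size(v):
    return v.size if v is not None else 0

def rebuild(v):
    """Rebuild the subtree rooted at v into a perfectly balanced tree."""
    keys = []
    def collect(u):                      # in-order walk yields sorted keys
        if u: collect(u.left); keys.append(u.key); collect(u.right)
    collect(v)
    def build(lo, hi):                   # balanced tree from keys[lo:hi]
        if lo >= hi: return None
        mid = (lo + hi) // 2
        u = Node(keys[mid])
        u.left, u.right = build(lo, mid), build(mid + 1, hi)
        u.size = hi - lo
        return u
    return build(0, len(keys))

def insert(root, key):
    """Insert key and return the new root."""
    path, v = [], root
    while v is not None:                 # ordinary BST descent
        path.append(v)
        v = v.left if key < v.key else v.right
    leaf = Node(key)
    if not path:
        return leaf
    if key < path[-1].key: path[-1].left = leaf
    else:                  path[-1].right = leaf
    for u in path:                       # sizes on the path grow by one
        u.size += 1
    for i, u in enumerate(path):         # highest unbalanced node, if any
        if max(size(u.left), size(u.right)) > ALPHA * u.size:
            sub = rebuild(u)             # O(s(u)) local rebuild
            if i == 0:
                return sub
            parent = path[i - 1]
            if parent.left is u: parent.left = sub
            else:                parent.right = sub
            break
    return root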
For $\alpha$ close to $1/2$, any balanced tree closely resembles a perfectly
balanced tree, while with $\alpha$ close to 1 the tree can degenerate much
more. This parameter therefore controls how often we cause local rebuilds and
the tree height. The trees defined by this parameter are called
$BB[\alpha]$ trees.
Rebuilding a subtree $T(v)$ takes $\O(s(v))$ time, but we can show that this
happens infrequently enough. Both insertion and deletion change the number of
nodes by one. To unbalance the root of a perfectly balanced tree, and thus
cause a rebuild, we need to add or remove at least $\Theta(n)$ vertices. We
will show this in more detail for insertion.
\theorem{The amortized time complexity of the \alg{Insert} operation is
$\O(\log n)$, with the constant factor dependent on $\alpha$.}
\proof
We define the potential as the sum of the ``badness'' of all tree nodes. Each
node contributes the difference of the sizes of its left and right children.
To make sure that perfectly balanced subtrees do not contribute, we clamp a
difference of 1 to 0.
$$\eqalign{
\Phi &:= \sum_v \varphi(v), \quad\hbox{where} \cr
\varphi(v) &:= \cases{
	\left\vert s(\ell(v)) - s(r(v)) \right\vert & if at least~2, \cr
	0 & otherwise. \cr
}\cr
}$$
When we add a new leaf, the size of all nodes on the path to the root
increases by 1, so the contribution of each such node to the potential
increases by at most 2. We spend $\O(\log n)$ time on the operation. If all
nodes stay balanced and thus no rebuild takes place, the potential increases
by $\O(\log n)$, resulting in amortized time $\O(\log n)$.
Otherwise, consider the highest unbalanced node $v$. Without loss of
generality, the invariant was broken for its left child $\ell(v)$, thus
$s(\ell(v)) > \alpha \cdot s(v)$. Therefore, the size of the other child is
small: $s(r(v)) < (1 - \alpha) \cdot s(v)$. The contribution of $v$ is
therefore $\varphi(v) > (2\alpha - 1) \cdot s(v)$.
After rebuilding $T(v)$, the subtree becomes perfectly balanced. Therefore,
for all nodes $u \in T(v)$ the contribution $\varphi(u)$ becomes zero. All
other contributions stay the same. Thus, the potential decreases by at least
$(2\alpha - 1) \cdot s(v) \in \Theta(s(v))$. By multiplying the potential by a
suitable constant, the real cost $\Theta(s(v))$ of the rebuild will be fully
compensated by the potential decrease, yielding zero amortized cost.
\qed
Weight-balanced trees, which maintain balance using an algorithm based on this
idea of partial reconstruction, were introduced in the very first chapter of
these lecture notes.
\section{General semi-dynamization}
Let us have a static data structure $S$. We do not need to know how the data
structure is implemented internally. We would like to use $S$ as a ``black
box'' to build a (semi-)dynamic data structure $D$ which supports the queries
of $S$, but also allows element insertion.

This is not always possible; the data structure needs to support a specific
@@ -178,16 +107,16 @@ type of queries answering {\I decomposable search problems}.
\defn{A {\I search problem} is a mapping $f: U_Q \times 2^{U_X} \to U_R$ where
$U_Q$ is a universe of queries, $U_X$ is a universe of elements and $U_R$ is a
set of possible answers.}
\defn{A search problem is {\I decomposable}, if there exists an operator
$\sqcup: U_R \times U_R \to U_R$ computable in time $\O(1)$\foot{The constant
time constraint is only needed for a good time complexity of $D$. Most
practical decomposable problems do meet this condition. If it is not met, the
construction will still work correctly, but the time complexity may increase.}
such that $\forall A, B \subseteq U_X$, $A \cap B = \emptyset$ and
$\forall q \in U_Q$:
$$f(q, A \cup B) = f(q, A) \sqcup f(q, B).$$
}
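As a quick sanity check (a toy Python illustration of ours), membership is
decomposable, with logical {\I or} playing the role of $\sqcup$:

# f(q, A) asks: "is the query element q present in the set A?"
def f(q, A):
    return q in A

A, B = {1, 2, 3}, {7, 9}      # disjoint subsets of the universe of elements
for q in (2, 7, 5):
    # decomposability: the answer on the union combines the partial answers
    assert f(q, A | B) == (f(q, A) or f(q, B))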
@@ -202,7 +131,7 @@ operator $\sqcup$ is a simple binary \alg{or}.
\: Let $X$ be a set of points in the plane. For a point $q$, what is the
distance between $q$ and the point $x \in X$ closest to $q$? This is a search
problem where $U_Q = U_X = \R^2$ and $U_R$ is the set of non-negative reals.
It is also decomposable -- $\sqcup$ returns the minimum.
\: Let $X$ be a set of points in the plane. Is $q$ in the convex hull of $X$?
This
@@ -242,8 +171,8 @@ decomposable search problem $f$ and the resulting dynamic data structure $D$:}
We assume that $Q_S(n)$, $B_S(n)/n$, $S_S(n)/n$ are all non-decreasing
functions.
We cover the set $X$ by pairwise disjoint blocks $B_i$ such that
$|B_i| \in \{0, 2^i\}$.
Let $|X| = n$. Since $n = \sum_i n_i 2^i$ for $n_i \in \{0, 1\}$, its
binary representation uniquely determines the block structure. Thus, the total
number of blocks is at most $\log n$.
@@ -252,7 +181,7 @@ Since $f$ is decomposable, a query on the structure will run queries on each
block, and then combine them using $\sqcup$:
$$f(q, X) = f(q, B_0) \sqcup f(q, B_1) \sqcup \dots \sqcup f(q, B_i).$$
\lemma{$Q_D(n) \in \O(Q_S(n) \cdot \log n)$.}
\proof
Let $|X| = n$. Then the block structure is determined and $\sqcup$ takes
@@ -302,7 +231,7 @@ unchanged.}
\theorem{Let $S$ be a static data structure answering a decomposable search
problem $f$. Then there exists a semi-dynamic data structure $D$ answering $f$
with parameters
\tightlist{o}
\: $Q_D(n) \in \O(Q_S(n) \cdot \log n)$,
@@ -328,17 +257,17 @@ We can speed up insertion time. Instead of building the list anew, we can merge
the lists in $\Theta(n)$ time, therefore speeding up insertion to
$\O(\log n)$ amortized.
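The whole construction fits into a short Python sketch (ours; it instantiates
the black box $S$ as a sorted list answering membership queries, so the
merge-based insertion just described applies; all names are our own):

import bisect, heapq

class SemiDynamic:
    def __init__(self):
        self.blocks = []      # blocks[i]: None, or a sorted list of 2^i elements

    def insert(self, x):
        # like incrementing a binary counter: the carry merges blocks upwards
        carry = [x]
        i = 0
        while i < len(self.blocks) and self.blocks[i] is not None:
            carry = list(heapq.merge(carry, self.blocks[i]))  # linear-time merge
            self.blocks[i] = None
            i += 1
        if i == len(self.blocks):
            self.blocks.append(None)
        self.blocks[i] = carry

    def query(self, q):
        # run the query on every block and combine the answers with "or"
        answer = False
        for b in self.blocks:
            if b is not None:
                j = bisect.bisect_left(b, q)
                answer = answer or (j < len(b) and b[j] == q)
        return answer

Every element takes part in at most $\log n$ merges before it settles, which
is exactly where the $\O(\log n)$ amortized insertion bound comes from.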
\subsection{Worst-case semi-dynamization}
So far we have created a data structure that acts well in the long run, but
one insertion can take a long time. This may be unsuitable for applications
where we require low latency. In such cases, we would like each insertion to
be fast even in the worst case.

Our construction can be deamortized at the price that the resulting
semi-dynamic data structure will be more complicated. We do this by not
constructing a block at once, but decomposing the construction such that on
each operation we do a small amount of work on it until eventually the whole
block is constructed.
However, insertion is not the only operation; we can also ask queries even
@@ -387,7 +316,7 @@ $\log n$ blocks in construction.
\theorem{Let $S$ be a static data structure answering a decomposable problem
$f$. Then there exists a semi-dynamic structure with parameters
\tightlist{o}
\: $Q_D(n) \in \O(Q_S(n) \cdot \log n)$,
@@ -409,15 +338,15 @@ together, we get the required upper bound.
\subsection{Full dynamization}
For our definition of search problems, it is not easy to delete elements, as
any time we wished to delete an element, we would need to take apart and split
a structure into a few smaller ones. This would never amortize to a decent
deletion time.
Instead of that, we will want the underlying static structure to have the
ability to cross out elements. These elements will no longer participate in
queries, but they will count towards the structure size and complexity.

Once we have the ability to cross out elements, we can upgrade the
semi-dynamic data structure to support deletion. We add a binary search tree
or another set structure which maps each element to the block it lives in.
For each element we keep a pointer to its instance in the BST. When we build a
new block, we can