\chapter[dynamic]{Dynamization}
Depending on which operations are supported, a data structure can be:
\list{o}
\: {\I static} if no operation after building the structure alters the data,
\: {\I semidynamic} if data insertion is possible as an operation,
\: {\I fully dynamic} if deletion of inserted data is allowed along with insertion.
\endlist
In this chapter we will look at techniques of {\I dynamization} --
transformation of a static data structure into a (semi)dynamic data structure.
\section{Structure rebuilding}
Consider a data structure with $n$ elements in which a modification can cause
problems that are too expensive to fix directly. Instead of fixing them, we
give up and rebuild the structure completely anew. If a rebuild takes
$\Theta(n)$ time and we perform it only once per $\Theta(n)$ operations, its
cost amortizes over those operations. Let us look at a few such cases.
An array is a structure with a limited capacity $c$. While it is dynamic (we
can insert or remove elements at the end), we cannot insert new elements
indefinitely. Once we run out of space, we build a new array with capacity $2c$
and copy the elements of the old one into it.
Since we must insert at least $\Theta(n)$ elements to reach the limit again
after a rebuild, and a rebuild takes $\O(n)$ time, this amortizes to $\O(1)$
time per insertion.
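To make the doubling rule concrete, the following Python sketch shows such an
array; the class name, the method names and the initial capacity are
illustrative, not part of the description above.

class DoublingArray:
    """Array with capacity doubling: amortized O(1) appends."""

    def __init__(self):
        self.capacity = 1                  # illustrative initial capacity c
        self.size = 0
        self.items = [None] * self.capacity

    def append(self, x):
        if self.size == self.capacity:
            # Out of space: rebuild with capacity 2c by copying all elements.
            # This costs Theta(n), but happens only after Theta(n) appends.
            self.capacity *= 2
            new_items = [None] * self.capacity
            new_items[:self.size] = self.items
            self.items = new_items
        self.items[self.size] = x
        self.size += 1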
Another example of such a structure is a $y$-fast tree. It is parametrized by
a block size which is required to be $\Theta(\log n)$ for good time complexity.
If we let $n$ change so much that $\log n$ changes asymptotically, everything
breaks. We can save this by rebuilding the tree before that happens; for
$\log n$ to change asymptotically, $n$ must change by a constant factor, which
takes $\Omega(n)$ operations.
Consider a data structure where instead of proper deletion of elements we just
replace them with ``tombstones''. When we run a query, we ignore them. After
enough deletions, most of the structure becomes filled with tombstones, leaving
too little space for proper elements and slowing down the queries. Once again,
the fix is simple -- once at least $n/2$ of the elements are tombstones, we
rebuild the structure without them. To accumulate $n/2$ tombstones we need to
delete $\Theta(n)$ elements, so if a rebuild takes $\Theta(n)$ time, its cost
again amortizes.
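The following Python sketch illustrates lazy deletion with tombstones on a
sorted-array set; the class name, the membership query, and the exact rebuild
threshold are illustrative.

import bisect

class TombstoneSet:
    """Sorted-array set with lazy deletion via tombstones."""

    def __init__(self, items=()):
        self.keys = sorted(items)      # stored keys, live and tombstoned
        self.dead = set()              # keys marked as deleted (tombstones)

    def contains(self, x):
        # Queries simply ignore tombstoned keys.
        i = bisect.bisect_left(self.keys, x)
        return i < len(self.keys) and self.keys[i] == x and x not in self.dead

    def delete(self, x):
        if not self.contains(x):
            return
        self.dead.add(x)
        # Rebuild once at least half of the stored keys are tombstones;
        # this costs Theta(n) but requires Theta(n) deletions to trigger.
        if 2 * len(self.dead) >= len(self.keys):
            self.keys = [k for k in self.keys if k not in self.dead]
            self.dead.clear()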
\subsection{Local rebuilding}
In many cases it is enough to rebuild just a part of the structure to fix
local problems. Once again, if the rebuilt part has size $k$, we want at least
$\Theta(k)$ operations to have taken place since its last rebuild. The cost of
the rebuild can then be amortized over those operations.
One such structure is a binary search tree. We start with a perfectly
balanced tree. As we insert or remove nodes, the tree structure degrades over
time. With a particular choice of operations, we can force the tree to
degenerate into a long vine, having linear depth.
To fix this problem, we define a parameter $1/2 < \alpha < 1$ as a {\I balance
limit}. We use it to determine if a tree is balanced enough.
\defn{
A node $v$ is balanced if for each of its children $c$ we have $s(c) \leq
\alpha s(v)$, where $s(v)$ denotes the number of nodes in the subtree
rooted at $v$. A tree $T$ is balanced if all its nodes are balanced.
}
\lemma{
If a tree with $n$ nodes is balanced, then its height is
$\O(\log_{1/\alpha} n)$.
}
\proof{
Choose an arbitrary path from the root to a leaf and track the node
sizes. The root has size $n$, and each subsequent node on the path has
size at most $\alpha$ times the size of its parent. Once we reach a
leaf, its size is 1. Thus the path can contain at most
$\log_{1/\alpha} n$ edges.
}
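For example, for $\alpha = 2/3$ the height is at most $\log_{3/2} n \approx
1.71 \cdot \log_2 n$, that is, less than twice the height of a perfectly
balanced tree.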
Therefore, we want all nodes to stay balanced as operations are performed. If
any node becomes unbalanced, we take the highest such node $v$ and rebuild its
subtree $T(v)$ into a perfectly balanced tree.
For $\alpha$ close to $1/2$, any balanced tree closely resembles a perfectly
balanced tree, while for $\alpha$ close to 1 the tree can degenerate much
more. The parameter therefore controls the trade-off between the height of the
tree and the frequency of local rebuilds. The trees maintained this way are
called $BB[\alpha]$ trees.
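The following Python sketch shows one possible implementation of insertion
with local rebuilding; the node layout, the helper names and the particular
value of $\alpha$ are illustrative. In this sketch, unbalanced nodes on the
insertion path are rebuilt on the way back up to the root, so the highest
unbalanced node is the one rebuilt last.

ALPHA = 0.7                        # illustrative balance limit, 1/2 < alpha < 1

class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right
        self.size = 1 + size(left) + size(right)

def size(v):
    return v.size if v is not None else 0

def balanced(v):
    # A node is balanced if each child's subtree has size at most ALPHA * s(v).
    return max(size(v.left), size(v.right)) <= ALPHA * v.size

def rebuild(v):
    # Rebuild the subtree rooted at v into a perfectly balanced tree
    # in Theta(s(v)) time.
    keys = []
    def collect(u):                            # in-order traversal
        if u is not None:
            collect(u.left); keys.append(u.key); collect(u.right)
    collect(v)
    def build(lo, hi):                         # keys[lo:hi] -> balanced subtree
        if lo >= hi:
            return None
        mid = (lo + hi) // 2
        return Node(keys[mid], build(lo, mid), build(mid + 1, hi))
    return build(0, len(keys))

def insert(v, key):
    # Standard BST insertion; every unbalanced node met on the way back to
    # the root is rebuilt, the highest unbalanced one last.
    if v is None:
        return Node(key)
    if key < v.key:
        v.left = insert(v.left, key)
    else:
        v.right = insert(v.right, key)
    v.size += 1
    return v if balanced(v) else rebuild(v)

Inserting keys in increasing order into such a tree, for example, keeps its
height logarithmic, whereas a plain binary search tree would degenerate into a
vine.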
Rebuilding a subtree $T(v)$ takes $\O(s(v))$ time, but we can show that this
happens infrequently enough. Both insertion and deletion change the number of
nodes by one. To unbalance the root of a perfectly balanced tree, and thus
cause a rebuild, we need to add or remove at least $\Theta(n)$ vertices. We
will show this in more detail for insertion.
\theorem{
Amortized time complexity of the \alg{Insert} operation is $\O(\log
n)$, with the constant factor depending on $\alpha$.
}
\proof{
We define the potential as a sum of ``badness'' over all tree nodes. Each node
contributes the difference of the sizes of its left and right child. To make
sure that perfectly balanced subtrees do not contribute, we clamp a difference
of 1 to 0.
$$\eqalign{
\Phi &:= \sum_v \varphi(v), \quad\hbox{where} \cr
\varphi(v) &:= \cases{
\left\vert s(\ell(v)) - s(r(v)) \right\vert & if this difference is at least~2, \cr
0 & otherwise. \cr
} \cr
}$$
When we add a new leaf, the size of each node on the path from the root to the
new leaf increases by 1, so the difference of its children's sizes changes by
at most 1. Due to the clamping, each such node's contribution to the potential
increases by at most 2.
We spend $\O(\log n)$ time on the operation itself. If all nodes stay balanced
and thus no rebuild takes place, the potential increases by $\O(\log n)$,
resulting in amortized time $\O(\log n)$.
Otherwise, consider the highest unbalanced node $v$. Without loss of
generality, the invariant was broken for its left child $\ell(v)$, thus
$s(\ell(v)) > \alpha \cdot s(v)$. Therefore, the size of the other child is
small: $s(r(v)) < (1 - \alpha) \cdot s(v)$. The contribution of $v$ is
therefore $\varphi(v) > (2\alpha - 1) \cdot s(v)$.
After rebuilding $T(v)$, the subtree becomes perfectly balanced. Therefore for
all nodes $u \in T(v)$ the contribution $\varphi(u)$ becomes zero. All other
contributions stay the same. Thus, the potential decreases by at least
$(2\alpha - 1) \cdot s(v) \in \Theta(s(v))$. By multiplying the potential by a
suitable constant, the real cost $\Theta(s(v))$ of rebuild will be fully
compensated by the potential decrease, yielding zero amortized cost.
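For instance, for $\alpha = 2/3$ the potential drops by at least $s(v)/3$, so
if the rebuild costs at most $C \cdot s(v)$ for some constant $C$, multiplying
the potential by $3C$ is enough for the decrease to pay for the whole rebuild.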
}
\endchapter