diff --git a/50-graphs/graphs.tex b/50-graphs/graphs.tex
index 355a07b2d5c8f365d56063ba8d4f13af67d0d84e..183e1133bd35844cf44a3af096f7c7778da6da92 100644
--- a/50-graphs/graphs.tex
+++ b/50-graphs/graphs.tex
@@ -13,10 +13,10 @@ \chapter[graphs]{Representation of graphs}
 In this chapter we will peek into the area of data structures for representation of
-graphs. Our ultimate goal is to design a data structure that represents a forest with
-weighted vertices and allows efficient path queries (e.g. what is the cheapest vertex on the
-path between $u$ and $v$) along with cost and structural updates.
-
+graphs. Our ultimate goal is to design a data structure that efficiently represents a forest.
+Not just any forest, however: we want a cost function on the set of vertices,
+and our structure should support efficient path queries (e.g., which vertex on the path
+between $u$ and $v$ is the cheapest) as well as updates of the costs and the graph structure.
+
 Let us define the problem more formally. We wish to represent a forest $F = (V, E)$, where
-each vertex~$v$ has cost~$c(v)\in\R$.\foot{We could also had weighted edges instead.} We
-would like to support following operations:
+each vertex~$v$ has cost~$c(v)\in\R$.\foot{We could also have weighted edges instead.} We
+would like to support the following operations:
@@ -30,21 +30,21 @@ v$;\foot{Generally, we can use any associative operation instead of minimum.}
 
 \section[path]{Static path}
 
-As a warm-up we build a data structure for $F$ being a static path, without structural
-updates. This will also be an important building block for the more general case.
+As a warm-up we build a data structure for $F$ equal to a path $v_1, \dots, v_n$,
+without structural updates.
+This will also be an important building block for the more general case.
 
-Let us denote the vertices $v_1, \dots, v_n$ according to the position on the path and let
-us denote $c_i = c(v_i)$. We build an range
+Let us denote $c_i = c(v_i)$. We build a range
 tree~$T$ over the costs $c_1, \dots, c_n$. That is, $T$ is a complete binary tree with
-$c_1,\dots c_n$ in its leaves (in this order) and inner nodes contain the minimum of their
-children. Note that each node represents a subpath of~$F$ with leaves being the single
-vertices.
+$c_1, \dots, c_n$ as labels in its leaves (in this order) while the label of each inner node
+is the minimum of its children's labels. Note that each node represents a subpath of~$F$ --
+the root represents the whole path and the leaves represent single vertices.
 
-\figure{range-tree.pdf}{}{An example of a range tree for path on eight vertices.
-Marked subtrees cover the subpath~$2\to 6$.}
+\figure{range-tree.pdf}{}{An example of a range tree for a path on eight vertices.
+Marked subtrees cover the subpath~$2 \to 6$.}
 
 \theorem{Static path representation via range tree can perform \em{path query},
-\em{point update}
+\em{point update},
 and \em{path update} in $\O(\log n)$ time.
 }
 
@@ -59,33 +59,33 @@ from root of~$T$ to the leaf~$c_i$, so it takes $\O(\log n)$ time.
 The path updates are more tricky. As in the path query, we can decompose the update to
 $\O(\log n)$ subtrees. But we cannot afford to recalculate the values in these subtrees
-directly as they can contain $\Theta(n)$ nodes. But we can be lazy and let others do the
-work for us.
-
-Instead of recalculating the whole subtree, we put a \em{mark} into the root along with
-the value of~$\delta$. The mark indicates ``everything below should be increased by
-$\delta$''. Whenever an operation touches node during a top-down traversal, it
-checks for the mark. If the node is marked, we update value in the node according to the
-mark and move the mark down to both children. If the children are already marked, we
-simply add the new mark to the existing one.
-
-This way, other operations can work as if there were no marks and path updates can be
-performed in~$\O(\log n)$ time. Note that this lazy approach requires other operations to
+directly as they can contain $\Theta(n)$ nodes. However, we can employ lazy propagation of
+updates from the top of the tree down to the leaves.
+
+In this setup, every node carries an additional \em{offset} field. The offset indicates
+that all labels in the corresponding subtree should be changed by this value, but that the
+update has not been written into them yet.
+Instead of recalculating the whole subtree, we increase the offset at its root by the value
+of~$\delta$. Whenever an operation touches a node during a top-down traversal, it
+checks the offset. If the offset is non-zero, we reset it to zero, adjust the label and
+increase the offset in both children by the same value.
+
+This way, when the traversal reaches a leaf, all updates are reflected in its label and
+there is no change in the complexity of the operations.
+Note that this lazy approach requires other operations to
 always traverse the tree top-down in order to see correct values in the nodes.
 
-\figure{lazy-update.pdf}{}{Example of range tree traversal with marks. We wish to travel
-from $x$ to $z$. The node~$x$ is marked, with $\delta = +4$, so we need to increase value
-stored in~$x$ by~4 and transfer mark to both children of~$x$. Then we can visit~$x$ and
-move along to~$y$. Node~$y$ is also marked now, so we update~$y$ and transfer mark to both
-children. Left child of~$y$ was already marked by $+3$, so we have change the mark to
+\figure{lazy-update.pdf}{}{Example of a range tree traversal with offsets. We wish to travel
+from $x$ to $z$. The node~$x$ has a non-zero offset $+4$, so we need to increase the value
+stored in~$x$ by~4 and transfer this offset to both children of~$x$. Then we can visit~$x$
+and move along to~$y$. Node~$y$ now also has a non-zero offset, so we update~$y$ and
+transfer the offset to both children. The left child of~$y$ had already been offset by
+$+3$, so we have to change the offset to
 $+7$.}
 \qed
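+
+Before moving on, here is a compact sketch of this structure in Python. The class and
+method names (\texttt{RangeTree}, \texttt{path\_min}, \texttt{path\_update}) are ours, not
+fixed by the text, and we use the common variant in which the label of a node is always up
+to date while its offset is still pending for both children; a point update is then just a
+path update of a single position:
+
+\begin{verbatim}
+class RangeTree:
+    def __init__(self, costs):              # builds T over c_1..c_n
+        self.n = len(costs)
+        self.label = [0] * (4 * self.n)     # subtree minima
+        self.offset = [0] * (4 * self.n)    # pending lazy offsets
+        self._build(1, 0, self.n - 1, costs)
+
+    def _build(self, x, lo, hi, costs):
+        if lo == hi:
+            self.label[x] = costs[lo]
+            return
+        mid = (lo + hi) // 2
+        self._build(2 * x, lo, mid, costs)
+        self._build(2 * x + 1, mid + 1, hi, costs)
+        self.label[x] = min(self.label[2 * x], self.label[2 * x + 1])
+
+    def _push(self, x):                     # move offset of x to its sons
+        for c in (2 * x, 2 * x + 1):
+            self.label[c] += self.offset[x]
+            self.offset[c] += self.offset[x]
+        self.offset[x] = 0
+
+    def path_min(self, a, b, x=1, lo=0, hi=None):
+        """Minimum of c_a..c_b (0-indexed, inclusive)."""
+        if hi is None:
+            hi = self.n - 1
+        if b < lo or hi < a:                # disjoint subtree
+            return float('inf')
+        if a <= lo and hi <= b:             # fully covered subtree
+            return self.label[x]
+        self._push(x)
+        mid = (lo + hi) // 2
+        return min(self.path_min(a, b, 2 * x, lo, mid),
+                   self.path_min(a, b, 2 * x + 1, mid + 1, hi))
+
+    def path_update(self, a, b, delta, x=1, lo=0, hi=None):
+        """Add delta to every cost in c_a..c_b."""
+        if hi is None:
+            hi = self.n - 1
+        if b < lo or hi < a:
+            return
+        if a <= lo and hi <= b:             # record the offset, stop here
+            self.label[x] += delta
+            self.offset[x] += delta
+            return
+        self._push(x)
+        mid = (lo + hi) // 2
+        self.path_update(a, b, delta, 2 * x, lo, mid)
+        self.path_update(a, b, delta, 2 * x + 1, mid + 1, hi)
+        self.label[x] = min(self.label[2 * x], self.label[2 * x + 1])
+\end{verbatim}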
 
 \section[hld]{Heavy-light decomposition}
 
-Now we are ready build data structure for static trees using \em{heavy-light
-decomposition}. We assume our tree $F$ is rooted and we orient all edges
-up, towards the root.
+Now we are ready to build a data structure for static trees using \em{heavy-light
+decomposition}. We assume our tree $F$ is rooted and all edges are directed towards the
+root.
 
 \defn{
 Let~$F$ be a rooted tree. For any vertex~$v$ we define $s(v)$ to be the size of subtree
@@ -107,8 +107,8 @@ This gives us the decomposition of the tree into heavy paths that are connected
 edges. The decomposition can be easily found using depth-first search in linear time.
 
-\figure{heavy-light.pdf}{}{Example of heavy-light decomposition. Top part shows a tree
-with heavy paths marked by thick lines. Numbers in parenthesis show the value of $s(v)$
+\figure{heavy-light.pdf}{}{Example of a heavy-light decomposition. Top part shows a tree
+with heavy paths marked by thick lines. Numbers in parentheses show the value of $s(v)$
 (ones are omitted). Bottom part shows the tree after compression of non-trivial heavy
 paths.}
 
@@ -119,15 +119,17 @@ path queries and updates.
 For each vertex~$v$ we store an identifier of the heavy path it lies on and we also store
 the position of~$v$ on that path. For each heavy path~$H$ we store the light edge that
-leads from the top of the path and connects~$H$ to the rest of the tree. These information
+leads from the top of the path and connects~$H$ to the rest of the tree. All this information
 can be precalculated in~$\O(n)$ time.
 
 To answer $\LCA(x,y)$ we start at both~$x$ and~$y$ and we jump along heavy paths up,
 towards the root. Once we discover lowest common heavy path, we compare position of
 ``entry-points'' to decide which one of them is LCA. We have to traverse $\O(\log n)$
-light edges and we can jump over a heavy path in constant time, thus we spend $\O(\log n)$
+light edges and we can jump over a heavy path in constant time, thus spending $\O(\log n)$
 time in total.
 
+We should remark that this algorithm for computing the LCA is not optimal in general.
+
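+In code, the jumping can look as follows (a sketch; the array names \texttt{head},
+\texttt{parent} and \texttt{depth} are ours: the top vertex of the heavy path of~$v$, the
+parent of~$v$, and the depth of~$v$ in the tree, respectively):
+
+\begin{verbatim}
+def lca(x, y, head, parent, depth):
+    # jump along heavy paths until x and y share one
+    while head[x] != head[y]:
+        # move up from the vertex whose heavy path starts deeper
+        if depth[head[x]] >= depth[head[y]]:
+            x = parent[head[x]]   # traverse the light edge above the top
+        else:
+            y = parent[head[y]]
+    # on a common heavy path the higher entry point is the LCA
+    return x if depth[x] <= depth[y] else y
+\end{verbatim}
+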
 \subsection{Path queries and updates}
 
 Let us return to the original problem of path queries and updates. The idea is
@@ -146,7 +148,7 @@ be divided into two top-down paths at the lowest common ancestor of $x$ and $y$.
 \qed
 
 We represent each heavy path using the range tree structure for static path from the
-previous chapter. The root of each range tree will also store the light edge that leads up
+previous section. The root of each range tree will also store the light edge that leads up
 from the top of the path and connects it to the rest of the tree. We also need to store
 the extra information used in the LCA algorithm.
 
@@ -168,7 +170,7 @@ Let us analyze the partitioning of a path in a bit more detail:
 \obs{
 When we partition a path into $\O(\log n)$ heavy subpaths, all of the subpaths, with one
-exception, are a prefix or a suffix of heavy path.
+exception, are a prefix or a suffix of a full heavy path.
 }
 
 We can use this observation to make path queries faster but at the cost of keeping the
@@ -183,15 +185,15 @@
 time.
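+
+For illustration, a path-minimum query decomposes into heavy subpaths as follows -- a
+sketch reusing the hypothetical arrays from the LCA sketch above, together with
+\texttt{pos[v]} (the position of~$v$ on its heavy path, counted from the top) and
+\texttt{rt[h]} (the range tree of the heavy path with top~$h$):
+
+\begin{verbatim}
+def path_min(x, y, head, parent, depth, pos, rt):
+    best = float('inf')
+    while head[x] != head[y]:
+        if depth[head[x]] >= depth[head[y]]:
+            # a prefix of x's heavy path: from its top down to x
+            best = min(best, rt[head[x]].path_min(0, pos[x]))
+            x = parent[head[x]]
+        else:
+            best = min(best, rt[head[y]].path_min(0, pos[y]))
+            y = parent[head[y]]
+    # the final segment lies inside a single heavy path
+    a, b = sorted((pos[x], pos[y]))
+    return min(best, rt[head[x]].path_min(a, b))
+\end{verbatim}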
 
 \section[linkcut]{Link-cut trees}
 
-Link-cut trees are dynamic version of the heavy-light decomposition. They allow us to
+Link-cut trees can be seen as a dynamic version of the heavy-light decomposition. They allow us to
 change structure of the represented forest by either linking two trees or by cutting an
 edge inside a tree. Link-cut trees were introduced in a paper by Sleator and Tarjan in
-1982. However, we will show later version from 1985, also by Sleator and Tarjan, that uses splay
-trees instead of original biased binary trees. Although it achieves the time complexity
-only in amortized case, it is significantly easier to analyze.
+1982. We will show the structure in the variant the authors published in 1985, which uses
+splay trees instead of the original biased binary trees.
+Although it achieves the time complexity only in the amortized case, it is significantly
+easier to analyze.
 
-Link-cut tree represents a forest $F$ of \em{rooted} trees; each edge is oriented towards the
-respective root. It supports following operations:
+A link-cut tree represents a forest $F$ of \em{rooted} trees; each edge is directed towards
+the respective root. It supports the following operations:
 
 \list{o}
 \: Structural queries:
 \tightlist{-}
@@ -225,7 +227,7 @@ tree is decomposed into a system of paths. Instead of heavy and light edges we h
 structure of the tree but by the history of the data structure. The only requirement is
 that every vertex has at most one incoming fat edge. This requirement assures that the
 tree can be decomposed into a system of fat paths interconnected by thin edges. Unlike
-heavy-light decomposition, we don't have bound on the number of thin edges on the path $v
+heavy-light decomposition, we don't have any bound on the number of thin edges on the path $v
 \to \Root(v)$. In fact, it is possible that there are only thin edges in the tree!
 Nevertheless, we will show that everything works out in the amortized case.
 
@@ -244,23 +246,18 @@ a constant number of operations on a fat path:
 
 Conceptually, $\Expose(v)$ is straightforward. At the beginning,
-we turn a fat edge below~$v$ into thin edge (if there was one) and make~$v$ the endpoint
-of the fat path.
-Now, assume~$v$ lies on a fat path~$A$. We start
+we turn the fat edge below~$v$ (if such an edge exists) into a thin edge, making~$v$ the
+endpoint of its fat path.
+Now $v$ lies on a fat path~$A$ (possibly consisting of~$v$ alone). We start
 at~$v$ and jump along~$A$ to its top~$t$. Unless $t$ is the root of the tree (which means
-we are done), $t$ is connected to a fat path~$B$ via thin edge $(t, p)$, see
+we are done), $t$ is connected to a fat path~$B$ via a thin edge $(t, p)$, see
 Figure~\figref{expose-idea}. We cut~$B$ by turning fat edge below~$p$ into a thin edge.
-Then we join top half of~$B$ with~$A$ by making edge~$(t, p)$ fat. This is one step of the
-$\Expose$. Now
-we jump to the top of the newly created fat path and repeat the whole process.
+Then we join the top segment of~$B$ with~$A$ by making the edge~$(t, p)$ fat. This is one
+step of the $\Expose$ operation; we repeat it until $v$ is connected to the root via a fat
+path.
 
 \figure[expose-idea]{expose-idea.pdf}{}{One step of $\Expose$ in the thin-fat decomposition.}
 
-\theoremn{Sleator, Tarjan'82}{
-$\Expose$ operation performs $\O(\log n)$ steps amortized.
-}
-
 By using a balanced binary tree to represent fat paths, we obtain $\O(\log^2 n)$
 amortized time complexity for $\Expose$ and all other operations.
 But we can do better! In the original paper, Sleator and Tarjan use biased binary trees to
@@ -271,9 +268,13 @@ quite technical and complicated. Instead, we use splay trees to represent fat pa
 yields $\O(\log n)$ amortized complexity of $\Expose$, but with significantly easier
 analysis.
 
+\theoremn{Sleator, Tarjan'82}{
+The $\Expose$ operation performs $\O(\log n)$ steps amortized.
+}
+
 \subsection{Intermezzo: Splay trees}
 
-Let us start by a brief recapitulation of splay trees. For more thorough description and
-analysis, we refer the reader to the Chapter~\chapref{splay}.
+Let us start with a brief recapitulation of splay trees. The reader unfamiliar with them is
+encouraged to go through Chapter~\chapref{splay} first.
 
 Splay tree is a self-adjusting binary search tree that uses $\Splay$ operation to
 rebalance itself. $\Splay(v)$ brings node~$v$ to the root of the tree using double
@@ -284,7 +285,7 @@ tree, we need to splay the deepest node we touch.
 
 To analyze amortized complexity of a splay tree we use a potential method. Each node~$v$
 is assigned an arbitrary \em{weight}~$w(v) > 0$ and we define a \em{size} of~$v$ to be
-$s(v)\sum_{u\in T_v} w(u)$, where $T_v$ are the nodes in the subtree rooted at~$v$
+$s(v) = \sum_{u\in T_v} w(u)$, where $T_v$ are the nodes in the subtree rooted at~$v$
 (including~$v$). Note that the weights are just for the analysis and the data structure
-is not aware of their existence in any way. Base on the size we define the \em{rank} of
+is not aware of their existence in any way. Based on the size we define the \em{rank} of
 the node as $r(v) = \log s(v)$. Finally, the potential~$\Phi$ of the splay
@@ -304,7 +305,7 @@ of them is the application in link-cut trees.
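+
+For later reference, let us also state the concrete bound that we will use below: by the
+Access lemma (Chapter~\chapref{splay}), splaying a node~$v$ in a splay tree with root~$t$
+has amortized cost at most
+$$3\,(r(t) - r(v)) + 1,$$
+where the ranks are taken before the splay.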
 
 \subsection{Representation of fat paths via splay trees}
 
 We describe a fat path by a splay tree whose nodes have one-to-one correspondence with the
 vertices of the fat path. Nodes of the tree have no keys, the ordering is given by the
-order of respective vertices on the fat path. That is, left to right inorder traversal of the tree
+order of respective vertices on the fat path. That is, a left-to-right in-order traversal of the tree
 returns the vertices exactly in the order in which they lie on the fat path.
 
 We deal with the costs in the same way as in our data structure for a static path. Each
@@ -323,12 +324,12 @@ be the first vertex on the path after performing $\Expose(v)$. Thus, we simply s
 to the root and return the precalculated path minimum in the root's left son.
 
 Path update can be performed in a similar way. Like in the case of a static path, we
-evaluate updates lazily and we propagate and clean marks during rotations.
+evaluate updates lazily and we propagate offsets during rotations.
 
 Finally, to implement $\op{Evert}$ we need to reverse the path. We can implement reverse
 lazily using the same trick as in the path update. Each node of the tree contains a single
 bit to indicate that the whole subtree below has switched orientation. As with the path
-updates, we propagate and clean the bits during rotations. Note that this
+updates, we propagate and reset the bits during rotations. Note that this
 switch is relative to the orientation of the parent node. Thus, in order to know the
 absolute orientation, we need to always travel in the direction from root to the leaves.
 
@@ -359,21 +360,21 @@ paths and a possible corresponding virtual tree.}
 
 \subsection{Implementation of \Expose}
 
 At the start of $\Expose(v)$ we need to turn a fat edge below vertex~$v$ into a thin edge
-in order to make~$v$ an endpoint of fat path. Let~$v$ lies on a fat path~$A'$ which is
+in order to make~$v$ an endpoint of its fat path. Let~$v$ lie on a fat path~$A'$
 represented by a splay tree~$T_{A'}$. We splay node~$v$ to the root of~$T_{A'}$. Since~$v$
-is the root now, its left subtree contains exactly the vertices that are bellow~$v$ on the
+is the root now, its left subtree contains exactly the vertices that are below~$v$ on the
 path~$A'$. Thus, we just turn left son of~$v$ into a middle son.
 
 Now we show a single step of $\Expose(v)$. Vertex~$v$ is the bottom node
-of a fat path~$A$ which is connected via a thin edge to the vertex~$p$ on a fat path~$B$.
-Both~$A$ and~$B$ are represented by a splay trees $T_A$ and $T_B$ respectively. We assume
+of a fat path~$A$ connected via a thin edge to the vertex~$p$ on a fat path~$B$.
+Both~$A$ and~$B$ are represented by splay trees, $T_A$ and $T_B$ respectively. We assume
 $v$ is the root of~$T_A$, since we splayed it during the initial phase. Similarly to the
 initial phase of $\Expose(v)$, we need to cut the path~$B$ below the vertex~$p$. So we
 splay~$p$ and once $p$ is the root of~$T_B$, we turn its left son into a middle son. Now
 we can join~$A$ and the remnant of~$B$ by making node~$v$ the left son of~$p$. Then we
 move to the next step, where vertex~$p$ takes the role of vertex~$v$.
 
-In the end, when~$v$ and~$\Root(v)$, are within the same splay tree, we splay~$v$.
+In the end, when~$v$ and~$\Root(v)$ are within the same splay tree, we splay~$v$.
 
 \figure[expose-real]{expose-real.pdf}{}{One step of $\Expose(v)$ in a virtual tree.}
 
@@ -391,19 +392,20 @@ that tree -- this is the third phase.
 
 \figure[expose-phases]{expose-phases.pdf}{}{Three phases of $\Expose(v)$.}
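+
+The whole operation can be condensed into a short routine. The following is only a sketch:
+the node fields and the helper \texttt{splay} (which splays a node to the root of its own
+splay tree, never crossing thin edges) are hypothetical names, and the maintenance of
+minima, offsets and reversal bits during rotations is omitted:
+
+\begin{verbatim}
+def expose(v):
+    splay(v)                  # v becomes the root of its splay tree
+    v.left = None             # fat edge below v turns thin: the old left
+                              # son keeps its parent pointer and becomes
+                              # a middle son
+    x = v
+    while x.parent is not None:   # a root's parent pointer = a thin edge
+        p = x.parent
+        splay(p)              # p becomes the root of its splay tree
+        p.left = x            # old left son of p becomes a middle son,
+                              # the thin edge into p becomes fat
+        x = p
+    splay(v)                  # the third phase
+    return v
+\end{verbatim}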
 
-The key phase we need to deal with in our analysis the first phase. Third phase is a
-simple splay in a splay tree, which should have have $\O(\log n)$ amortized complexity,
+The key phase we need to deal with in our analysis is the first phase. The third phase is a
+simple splay in a splay tree, which should have $\O(\log n)$ amortized complexity,
 unless we break potential in previous phases, of course. Real cost of the second phase can be bounded by
-the real cost of the third phase, so the third phase can also pay for the second phase.
+the real cost of the third phase, so the cost of the third phase can be multiplied
+by a constant to also account for the second phase.
 However, we somehow have to make sure that swapping thin edges and splay edges does not
 break the potential stored in the respective trees.
 
-The first phase, on the other hand, performs a series of splay operation and each of them
-possible costs up to $\Omega(\log n)$ amortized, if we use the basic bound on the
+The first phase, on the other hand, performs a series of splay operations and each of them
+possibly costs up to $\Omega(\log n)$ amortized, if we use the basic bound on the
 complexity of splay. So, how are we going to achieve the complexity we promised?
 
-The trick is to set weights of nodes in splay trees in a way that the whole virtual tree
+The solution is to set the weights of nodes in splay trees in a way that the whole virtual tree
 basically acts as one large splay tree.
 Recall that in the analysis of the splay tree, we assign an arbitrary non-negative weight
 to each node of the tree. In our analysis, we set the weight of a node to be
@@ -415,26 +417,28 @@ We define a potential~$\Phi$ of the virtual tree~$T$ to be the sum of the potent
 splay tree. This also means that $\Phi = \sum_{v\in T} r(v)$, where $r(v) = \log(s(v))$
 is the rank of~$v$.
 
-Let us star with the third phase. We have only a single splay in a splay tree, so
-according to Access lemma we get amortized complexity $\O(r(p_k) - r(v) + 1)$, where the ranks
-are just before the third phase. Since size of any node is bound by the total number of
+Let us start with the third phase. We only have a single splay operation in a splay tree, so
+according to the Access lemma we get amortized complexity $\O(r(p_k) - r(v) + 1)$, where the ranks
+are taken just before the third phase. Since the size of any node is bounded by the total number of
 vertices, we get amortized complexity of third phase to be $\O(\log n)$.
 
 \obs{Turning a proper son into a middle son and vice versa does not change the potential.}
 
-This observation ensures that second phase does not change the potential. Thus, third
-phase can completely pay for the second phase.
+This observation ensures that the second phase does not change the potential. Thus the cost
+of the second phase can be covered by part of the cost of the third.
 
 Finally, the dreaded first phase. We perform $k$ splays in $k$ different splay trees.
 According to the Access lemma, the total amortized cost is at most
 $$\sum_{i=1}^{k+1} 3(r(r_i) - r(p_{i-1})) + 1,$$
 where $r_{k+1}$ is the root of the virtual tree before the first phase. Observe that this
-sum telescopes! Since $r_i$ is a middle son of $p_i$, we have $s(r_i) < s(p_i)$ and by
-monotonicity the same holds for the ranks. Using this we get that the amortized complexity
+sum telescopes (i.e., part of each summand cancels with the next one).
+Since $r_i$ is a middle son of $p_i$, we have $s(r_i) < s(p_i)$ and by
+monotonicity the same holds for the ranks.
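+Written out (with $p_0 = v$ and $r(r_i) \le r(p_i)$ for $i \le k$), the sum is at most
+$$\sum_{i=1}^{k+1} \bigl(3(r(r_i) - r(p_{i-1})) + 1\bigr)
+\le 3(r(r_{k+1}) - r(p_k)) + 3\sum_{i=1}^{k} \bigl(r(p_i) - r(p_{i-1})\bigr) + (k+1)
+= 3(r(r_{k+1}) - r(v)) + k + 1.$$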
+Using this, we get that the amortized complexity
 is $\O(r(r_{k+1}) - r(v) + k) = \O(\log n + k)$. This is still not the desired logarithmic
 complexity as $k$ can be possibly quite large, perhaps $\Theta(n)$. But do not despair, the
 third phase can save us. Notice that $\O(k)$ can be upper-bounded by the real cost of
-the third phase. Thus, third phase will also pay for the $+k$ part of the first phase.
+the third phase. Therefore the real cost of the third phase can once again be multiplied
+by a constant to also cover the $+k$ part of the first phase.
 
 There is one last subtle problem we have to deal with. The problem is that structural
 updates can change the potential of the virtual tree outside of the $\Expose$.
@@ -446,13 +450,13 @@ $\op{Link}(u,v)$ is slightly more complicated, since adding a subtree means incr
 size of some nodes. However, notice that after $\Expose(u)$ the node~$u$ has no right son
 as it is the last vertex on the fat path and also the root of the virtual tree. Thus, by
 linking $u$ and $v$ we only increase the size of $u$ and we increase it by at most $n$,
-so only need to pay $\O(\log n)$ into the potential.
+so we only need to pay $\O(\log n)$ into the potential.
 
 \section{Application: Faster Dinic's algorithm}
 
 To show a non-trivial application of link-cut trees we describe how they can be used to
-make faster Dinic's algorithm. Recall that Dinic's algorithm is an algorithm to find a
-maximum flow from source vertex~$s$ to a targer vertex~$t$ in network~$G$. We won't
+make Dinic's algorithm faster. Recall that Dinic's algorithm is an algorithm to find a
+maximum flow from source vertex~$s$ to a target vertex~$t$ in network~$G$. We won't
 describe the algorithm in full detail here and we focus only on the parts important for
 our application of link-cut trees.
 
@@ -464,7 +468,7 @@ from~$s$ to~$t$ in the residual network\foot{Residual network is a network conta
 the edges with non-zero residual capacity, that is, difference between capacity and a
 flow. Capacity of each edge in residual network is exactly the residual capacity.}. The
 important property of level graph is that it is acyclic and it can be decomposed into
-levels such that there are no edges between vertices in each level, see
+levels such that there are no edges between vertices on the same level, see
 Figure~\figref{level-network} level graph.
 
@@ -472,11 +476,11 @@
 Dinic's algorithm starts with a zero flow and in each iteration it finds a blocking flow
 in the level graph and augments the flow with the blocking flow. It can be shown that we
-need~$n$ iterations to find the maximum flow in~$G$.
+need up to~$n$ iterations to find the maximum flow in~$G$ in the general case.
 
 Traditionally, the search for the blocking flow is done greedily in $\O(nm)$ time, which
 results in $\O(n^2m)$ complexity of the whole algorithm (construction of the leveled graph
-etc can be easily done in $\O(m)$ time per iteration). With link-cut tree, however, we can
+etc. can be easily done in $\O(m)$ time per iteration). With link-cut trees, however, we can
 achieve $\O(m\log n)$ time per iteration.
 
 We use link-cut trees with weighted edges -- the cost of an edge is its residual capacity.
@@ -491,7 +495,7 @@ tree in~$F$ while $s$ is a root if and only if there are no outgoing edges from~
 $F$. Second, each vertex has at most one outgoing edge in~$F$. Finally, $F$ contains only
 marked edges and if an edge is removed from~$F$, it is never added back to~$F$ again.
 
-The search for blocking flow consist of two steps -- \em{augment step} and \em{expand step}.
-If $\Root(s) = t$ there is a path path from $s$ to $t$ in $F$ and we can perform augment
+The search for a blocking flow consists of two steps -- the \em{augment step} and the \em{expand step}.
+If $\Root(s) = t$ there is a path from $s$ to $t$ in $F$ and we can perform the augment
 step. We find the minimal edge~$e$ on the path $s \to t$ and perform a path update that
 decreases costs on $s\to t$ by $\Cost(e)$.
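+
+In code, the augment step might look as follows -- a sketch only, with a hypothetical
+link-cut tree interface (\texttt{root}, \texttt{min\_edge}, \texttt{cost},
+\texttt{path\_update}, \texttt{cut}) that merely mirrors the operations introduced earlier
+in this chapter:
+
+\begin{verbatim}
+def augment(F, s, t):
+    # precondition: F.root(s) == t, i.e. F contains a path s -> t
+    e = F.min_edge(s)            # cheapest edge on the path s -> t
+    delta = F.cost(e)            # residual capacity of the bottleneck
+    F.path_update(s, -delta)     # push delta units of flow along s -> t
+    # edges whose residual capacity dropped to zero are saturated;
+    # remove them -- they never return to F in this iteration
+    e = F.min_edge(s)
+    while e is not None and F.cost(e) == 0:
+        F.cut(e)
+        e = F.min_edge(s)
+\end{verbatim}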