Geometry: Range trees (partial)

93cdcbe8 · Martin Mareš · db3eeb02 · 93cdcbe8
Commit 93cdcbe8 authored 6 years ago by Martin Mareš
--- a/07-geom/geom.tex
+++ b/07-geom/geom.tex
@@ -29,13 +29,13 @@ Queries are asked about all objects which lie within a~given \em{region}
 We might want to \em{enumerate} all objects within the region,
 or just \em{count} them without enumerating all of them.
-enumerating all of them. If there is a~value associated with each
+If there is a~value associated with each
 object, we can ask for a~sum or a~maximum of values of all objects
 within the range --- this is generally called an \em{aggregation}
 query.
 In this chapter, we will consider the simple case of range queries
-on points in~$\R^d$
+on points in~$\R^d$.
 \section[range1d]{Range queries in one dimension}
@@ -75,20 +75,20 @@ We can see that the sets have the following properties:
 This suggests a~straightforward recursive algorithm for answering range queries.
-\algo{IntQuery}$(v,Q)$
+\algo{RangeQuery}$(v,Q)$
 \algin a~root of a~subtree~$v$, query range~$Q$
 \:If $v$~is external, return.
 \:If $\intr(v) \subseteq Q$, report the whole subtree rooted at~$v$ and return.
 \:If $\key(v) \in Q$, report the item at~$v$.
 \:$Q_\ell \= Q \cap \intr(\ell(v))$, $Q_r \= Q \cap \intr(r(v))$
-\:If $Q_\ell \ne \emptyset$: call $\alg{IntQuery}(\ell(v), Q_\ell)$
+\:If $Q_\ell \ne \emptyset$: call $\alg{RangeQuery}(\ell(v), Q_\ell)$
-\:If $Q_r \ne \emptyset$: call $\alg{IntQuery}(r(v), Q_r)$
+\:If $Q_r \ne \emptyset$: call $\alg{RangeQuery}(r(v), Q_r)$
 \endalgo
 Let us now analyze the time complexity of this algorithm.
 \lemma{
-If the tree is balanced, \alg{IntQuery} called on its root visits $\O(\log
+If the tree is balanced, \alg{RangeQuery} called on its root visits $\O(\log
 n)$ nodes and subtrees.
 }
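As an illustration, the recursion of \alg{RangeQuery} can be sketched in Python. This is only a sketch under stated assumptions: the query range is a closed interval, all keys are distinct, external nodes are represented by `None`, and $\intr(v)$ is passed down as an open interval instead of being stored in the node. The names `Node`, `build`, `report_subtree` and `range_query` are hypothetical and do not come from the text.

```python
import math

class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def build(keys):
    """Balanced BST from a sorted list of distinct keys (assumed input)."""
    if not keys:
        return None
    mid = len(keys) // 2
    return Node(keys[mid], build(keys[:mid]), build(keys[mid + 1:]))

def report_subtree(v, out):
    """Append all keys of the subtree rooted at v to out."""
    if v is not None:
        report_subtree(v.left, out)
        out.append(v.key)
        report_subtree(v.right, out)

def range_query(v, lo, hi, a=-math.inf, b=math.inf, out=None):
    """Report keys in the closed range [lo, hi].

    (a, b) plays the role of intr(v): the open interval of keys
    that can occur in the subtree rooted at v.
    """
    if out is None:
        out = []
    if v is None:                      # external node
        return out
    if lo <= a and b <= hi:            # intr(v) is a subset of Q
        report_subtree(v, out)
        return out
    if lo <= v.key <= hi:              # key(v) lies in Q
        out.append(v.key)
    if lo < v.key and a < hi:          # Q meets intr of the left child
        range_query(v.left, lo, hi, a, v.key, out)
    if v.key < hi and lo < b:          # Q meets intr of the right child
        range_query(v.right, lo, hi, v.key, b, out)
    return out
```

The keys are reported in the order in which the pseudocode visits them, so the output is not sorted in general.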
@@ -163,7 +163,7 @@ space.
 We can answer 2-d range queries similarly to the 1-d case. To each node~$v$ of the
 tree, we can assign a~2-d interval $\intr(v)$ recursively. This again generates a~hierarchy
-of nested intervals, so the \alg{IntQuery} algorithm works there, too.
+of nested intervals, so the \alg{RangeQuery} algorithm works there, too.
 However, 2-d range queries can be very slow in the worst case:
 \lemma{Worst-case time complexity of range queries in a~2-d tree is $\Omega(\sqrt n)$.}
@@ -199,6 +199,81 @@ Dynamization is non-trivial and we will not show it.
 \section{Multi-dimensional range trees}
+The $k$-dimensional search trees were simple, but slow in the worst case.
+There is a~more efficient data structure: the \em{multi-dimensional range tree,}
+which has poly-logarithmic query complexity, if we are willing to use
+super-linear space.
+\subsection{2-dimensional range trees}
+For simplicity, we start with a~static 2-dimensional version
+and we will assume that no two points have the same $x$~coordinate.
+The 2-d range tree will consist of multiple instances of a~1-d range tree,
+which we built in section \secref{range1d} --- it can be a~binary search tree
+with range queries, but in the static case even a~sorted array suffices.
+First we create an~$x$-tree, which is a~1-d range tree over the $x$~coordinates
+of all points stored in the structure. Each node contains a~single point.
+Its subtree corresponds to an~open interval of $x$~coordinates, that is
+a~\em{band} in~$\R^2$ (an~open rectangle which is vertically unbounded).
+For every band, we construct a~$y$-tree containing all points in the band
+ordered by the $y$~coordinate.
+If the $x$-tree is balanced, every node lies in $\O(\log n)$ subtrees.
+So every point lies in $\O(\log n)$ bands and the whole structure takes
+$\O(n\log n)$ space: $\O(n)$ for the $x$-tree, $\O(n\log n)$ for all
+$y$-trees.
+We can build the 2-d tree recursively. First we create two lists of points:
+one sorted by the $x$~coordinate, one by~$y$. Then we construct the $x$-tree.
+We find the point with the median $x$~coordinate in constant time. This point
+becomes the root of the $x$-tree. We recursively construct the left subtree
+from the points whose $x$~coordinate is less than the median; we can construct the
+corresponding sub-lists of both the $x$ and~$y$ list in $\O(n)$ time. We
+construct the right subtree similarly. Finally, we build the $y$-tree
+for the root: it contains all the points and we can build it from the $y$-sorted
+array in $\O(n)$ time.
+The whole building algorithm requires linear time per sub-problem, which
+sums to $\O(n)$ over one level of the $x$-tree. Since the $x$-tree is
+logarithmically high, it makes $\O(n\log n)$ for the whole construction.
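The same bound can be read off a recurrence. As a sketch, let $T(n)$ denote the time needed to build the structure for $n$~points once both sorted lists are available, and let $c$ be a constant bounding the linear work per sub-problem (both symbols are introduced here only for illustration):
$$T(n) \le 2\,T(n/2) + cn, \qquad T(1) = \O(1).$$
Unrolling the recurrence gives $cn$ work on each of the $\O(\log n)$ levels, so $T(n) = \O(n\log n)$. Sorting the two initial lists costs another $\O(n\log n)$, which does not change the total.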
+Now we describe how to answer a~range query for $[x_1,x_2] \times [y_1,y_2]$.
+First we let the $x$-tree answer a~range query for $[x_1,x_2]$. This gives us
+a~union of $\O(\log n)$ points and bands which disjointly cover $[x_1,x_2]$.
+For each point, we test if its $y$~coordinate lies in $[y_1,y_2]$. For each
+band, we ask the corresponding $y$-tree for points in the range $[y_1,y_2]$.
+We spend $\O(\log n)$ time in the $x$-tree, $\O(\log n)$ time to process the
+individual points, $\O(\log n)$ in each of the $\O(\log n)$ $y$-trees, and $\O(1)$ per point
+reported. Put together, this is $\O(\log^2 n + p)$ if $p$~points are
+reported ($p=0$ for a~counting query).
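The construction and the counting query can be sketched together in Python. This is a minimal sketch, not a definitive implementation: it assumes distinct $x$~coordinates, points given as `(x, y)` tuples, and each $y$-tree stored as a sorted array of $y$~coordinates searched with `bisect`. The names `XNode`, `build`, `make_tree`, `count_band` and `count` are hypothetical.

```python
from bisect import bisect_left, bisect_right

class XNode:
    def __init__(self, point, left, right, ys):
        self.point = point            # the single point stored in this node
        self.left, self.right = left, right
        self.ys = ys                  # sorted y-coordinates of the whole subtree

def build(by_x, by_y):
    """by_x, by_y: the same points sorted by x and by y (x distinct)."""
    if not by_x:
        return None
    mid = len(by_x) // 2
    mx = by_x[mid][0]                 # median x-coordinate
    # Split the y-sorted list in linear time, preserving its order.
    left = build(by_x[:mid], [p for p in by_y if p[0] < mx])
    right = build(by_x[mid + 1:], [p for p in by_y if p[0] > mx])
    return XNode(by_x[mid], left, right, [p[1] for p in by_y])

def make_tree(points):
    return build(sorted(points), sorted(points, key=lambda p: p[1]))

def count_band(v, y1, y2):
    """Number of points of v's band with y in [y1, y2], by binary search."""
    return bisect_right(v.ys, y2) - bisect_left(v.ys, y1)

def count(v, x1, x2, y1, y2, a=float("-inf"), b=float("inf")):
    """Count points in [x1, x2] x [y1, y2].

    (a, b) is the open interval of x-coordinates of v's subtree.
    """
    if v is None:
        return 0
    if x1 <= a and b <= x2:           # whole band lies inside [x1, x2]
        return count_band(v, y1, y2)
    x, y = v.point
    total = 1 if (x1 <= x <= x2 and y1 <= y <= y2) else 0
    if x1 < x and a < x2:             # query meets the left band
        total += count(v.left, x1, x2, y1, y2, a, x)
    if x < x2 and x1 < b:             # query meets the right band
        total += count(v.right, x1, x2, y1, y2, x, b)
    return total
```

For example, `count(make_tree(points), 2, 5, 1, 4)` counts the points with $2 \le x \le 5$ and $1 \le y \le 4$. An enumerating variant would store the points themselves in each sorted array and return the slice between the two binary-search positions.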
+\subsection{Handling repeated coordinates}
+We left aside the case of multiple points with the same $x$~coordinate.
+This can be handled by attaching another $y$-tree to each $x$-tree node,
+which contains the points sharing the same $x$~coordinate. That is,
+$x$-tree nodes correspond to distinct $x$~coordinates and each
+has two $y$-trees: one for its own $x$~coordinate, one for the open
+interval of $x$~coordinates covering its subtree. This way, we can
+perform range queries for both open and closed ranges.
+The time complexity of \alg{Build} stays asymptotically the same: the maximum
+number of $y$-trees containing a~given point at most doubles, so it is
+still $\O(\log n)$. Similarly for range queries: we query at most twice
+as many $y$-trees.
+\subsection{Multi-dimensional generalization}
+TODO
+\subsection{Dynamization}
+TODO
+\subsection{Fractional cascading}
 TODO
 \endchapter