Geometry: 1-D range trees

Commit f3ff9446 authored 6 years ago by Martin Mareš
--- a/07-geom/geom.tex
+++ b/07-geom/geom.tex
@@ -39,7 +39,102 @@ on points in~$\R^d$

 \section{Range queries in one dimension}

-TODO
+We show the basic techniques on one-dimensional range queries. The simplest
+solution is to use a~sorted array. Whenever we are given a~query interval $[a,b]$,
+we can locate its endpoints in the array by binary search. Then we either enumerate
+the items between the endpoints or we count them by subtracting indices of the
+endpoints. We only have to check carefully whether each endpoint lies at an
+item or in a~gap between items. Overall, we can answer any query in $\O(\log n + p)$,
+where $n$~is the number of items and $p$~the number of points enumerated. The structure
+can be built in time $\O(n\log n)$ by sorting the items and it is stored in $\O(n)$ space.
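The sorted-array approach can be sketched in a few lines of Python. This is an illustrative sketch, not code from the notes: the function names `range_count` and `range_report` are hypothetical, and it assumes a closed query interval $[a,b]$. Using `bisect_left` for $a$ and `bisect_right` for $b$ takes care of the "endpoint at an item or in a gap" distinction automatically.

```python
from bisect import bisect_left, bisect_right

def range_count(arr, a, b):
    """Number of items x in the sorted list arr with a <= x <= b."""
    lo = bisect_left(arr, a)    # first index with arr[i] >= a
    hi = bisect_right(arr, b)   # first index with arr[i] > b
    return max(0, hi - lo)      # subtract the endpoint indices

def range_report(arr, a, b):
    """All items x in the sorted list arr with a <= x <= b."""
    return arr[bisect_left(arr, a):bisect_right(arr, b)]

items = sorted([7, 3, 11, 5, 2, 13])   # O(n log n) preprocessing
print(range_count(items, 4, 11))       # 3
print(range_report(items, 4, 11))      # [5, 7, 11]
```

Both binary searches take $\O(\log n)$; reporting adds $\O(p)$ for the slice.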
+
+Another solution uses binary search trees. It is more complicated, but more
+flexible. It can be made dynamic and it can also answer aggregation queries.
+
+\defn{Let $T$~be a~binary search tree with real-valued keys. For each node~$v$,
+we define the set $\intr(v)$, called the \em{interval of~$v$.} It contains
+all real numbers~$x$ such that the search for~$x$ visits~$v$.}
+
+\obs{
+We can see that the sets have the following properties:
+
+\tightlist{o}
+\:$\intr(\<root>)$ is the whole set~$\R$.
+\:If $v$~is a~node with key~$\key(v)$, left child $\ell(v)$ and right child $r(v)$, then:
+	\tightlist{o}
+	\:$\intr(\ell(v)) = \intr(v) \cap (-\infty, \key(v))$
+	\:$\intr(r(v)) = \intr(v) \cap (\key(v), +\infty)$
+	\endlist
+\:By induction, $\intr(v)$ is always an~open interval.
+\:All keys in the subtree of~$v$ lie in $\intr(v)$.
+\:The definition of $\intr(v)$ applies to external nodes, too. The intervals
+  obtained by cutting the real line into parts at the keys of internal nodes
+  are exactly the intervals assigned to external nodes.
+\endlist
+}
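The observation can be checked on a small example. The sketch below is an illustration under assumptions of our own: a hypothetical `Node` class in which `None` stands for an external node, and open intervals represented as pairs `(lo, hi)` with `math.inf` for unbounded ends. It passes $\intr(v)$ down from the parent exactly as in the two rules above and lists the intervals of the external nodes from left to right.

```python
import math

class Node:
    def __init__(self, key, left=None, right=None):
        # None children stand for external nodes
        self.key, self.left, self.right = key, left, right

def external_intervals(v, lo=-math.inf, hi=math.inf):
    """Yield int(u) = (lo, hi) for every external node u below v, left to right.

    The left child gets int(v) ∩ (-inf, key(v)), the right child
    int(v) ∩ (key(v), +inf).
    """
    if v is None:
        yield (lo, hi)
        return
    assert lo < v.key < hi   # every key in the subtree lies in int(v)
    yield from external_intervals(v.left, lo, v.key)
    yield from external_intervals(v.right, v.key, hi)

root = Node(5, Node(2, None, Node(3)), Node(8))
print(list(external_intervals(root)))
# [(-inf, 2), (2, 3), (3, 5), (5, 8), (8, inf)]
```

The output consists precisely of the gaps between consecutive keys, as the last item of the observation states.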
+
+This suggests a~straightforward recursive algorithm for answering range queries.
+
+\algo{IntQuery}$(v,Q)$
+\algin a~root of a~subtree~$v$, query range~$Q$
+\:If $v$~is external, return.
+\:If $\intr(v) \subseteq Q$, report the whole subtree rooted at~$v$ and return.
+\:If $\key(v) \in Q$, report the item at~$v$.
+\:$Q_\ell \= Q \cap \intr(\ell(v))$, $Q_r \= Q \cap \intr(r(v))$
+\:If $Q_\ell \ne \emptyset$: call $\alg{IntQuery}(\ell(v), Q_\ell)$
+\:If $Q_r \ne \emptyset$: call $\alg{IntQuery}(r(v), Q_r)$
+\endalgo
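A possible Python rendering of \alg{IntQuery} follows. It is a sketch, not the notes' reference code. Assumptions: the hypothetical `Node` class with `None` as external nodes; a closed query $Q=[\alpha,\beta]$; and $\intr(v)$ carried as an open pair `(lo, hi)` computed on the fly. Instead of shrinking $Q$ to $Q_\ell$ and $Q_r$, it keeps $Q$ fixed. This is equivalent because $\intr(\ell(v)) \subseteq Q_\ell$ holds iff $\intr(\ell(v)) \subseteq Q$. It also reports $\key(v)$ between the two recursive calls rather than before them, which changes only the order of the output (it comes out sorted).

```python
import math

class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right  # None = external node

def report_subtree(v, out):
    """Report every key in the subtree of v (in-order, so sorted)."""
    if v is not None:
        report_subtree(v.left, out)
        out.append(v.key)
        report_subtree(v.right, out)

def int_query(v, alpha, beta, out, lo=-math.inf, hi=math.inf):
    """Report keys in the subtree of v lying in Q = [alpha, beta]; (lo, hi) = int(v)."""
    if v is None:                        # external node
        return
    if alpha <= lo and hi <= beta:       # int(v) ⊆ Q: report the whole subtree
        report_subtree(v, out)
        return
    if alpha < v.key and lo < beta:      # Q ∩ int(left child) is non-empty
        int_query(v.left, alpha, beta, out, lo, v.key)
    if alpha <= v.key <= beta:           # key(v) ∈ Q
        out.append(v.key)
    if v.key < beta and alpha < hi:      # Q ∩ int(right child) is non-empty
        int_query(v.right, alpha, beta, out, v.key, hi)

def range_query(root, alpha, beta):
    out = []
    int_query(root, alpha, beta, out)
    return out

root = Node(5, Node(2, Node(1), Node(3)), Node(8, Node(7), Node(9)))
print(range_query(root, 2, 7))   # [2, 3, 5, 7]
```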
+
+Let us now analyze the time complexity of this algorithm.
+
+\lemma{
+If the tree is balanced, \alg{IntQuery} called on its root visits $\O(\log
+n)$ nodes and subtrees.
+}
+
+\proof
+Let $Q=[\alpha,\beta]$ be the query interval. Let $a$ and~$b$ be the tree nodes
+(internal or external) where the searches for $\alpha$ and~$\beta$ end. We
+denote the lowest common ancestor of~$a$ and~$b$ by~$p$.
+
+Whenever we enter a~node~$v$ with some interval $\intr(v)$, the key $\key(v)$
+splits the interval into two parts, corresponding to $\intr(\ell(v))$ and
+$\intr(r(v))$.
+
+On the path from the root to~$p$ (excluding~$p$ itself), the searches for $\alpha$
+and~$\beta$ go in the same direction, so $Q$~lies entirely in one of the two parts.
+In particular, the current key does not lie in~$Q$, and we recurse on a~single child.
+
+When we enter the common ancestor~$p$, the range~$Q$ extends into both parts, so we
+report $\key(p)$ and recurse on both children.
+
+On the ``left path'' from~$p$ to~$a$, we encounter two situations. Either $Q$~lies
+solely in the right part, so we recurse on it. Or $Q$~crosses $\key(v)$, so we recurse
+on the left part and report the whole right subtree. Again, we report $\key(v)$
+if it lies in~$Q$.
+
+The ``right path'' from~$p$ to~$b$ behaves symmetrically: we recurse on the right part
+and possibly report the whole left subtree and/or the current key.
+
+Since the tree is balanced, each of these paths contains $\O(\log n)$ nodes, so we visit
+$\O(\log n)$ nodes in total and report $\O(\log n)$ individual keys and $\O(\log n)$ whole subtrees.
+\qed
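To see the lemma in action, the sketch below counts the calls made by the query procedure on a perfectly balanced tree and compares the worst case over a grid of queries with a bound taken from the proof. A call can recurse only at an internal node on the search path of $\alpha$ or~$\beta$, because any other visited node~$v$ satisfies $\intr(v) \subseteq Q$ and returns at once. Each such node makes at most two calls. So if external nodes have depth at most~$H$, there are at most $4H+1$ calls. The names `build` and `count_calls` are hypothetical, and the median-split builder is our assumed stand-in for "perfectly balanced".

```python
import math

class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right  # None = external node

def build(keys, lo=0, hi=None):
    """Perfectly balanced BST from a sorted list, using the median as the root."""
    if hi is None:
        hi = len(keys)
    if lo >= hi:
        return None
    mid = (lo + hi) // 2
    return Node(keys[mid], build(keys, lo, mid), build(keys, mid + 1, hi))

def count_calls(v, alpha, beta, lo=-math.inf, hi=math.inf):
    """Number of IntQuery calls for Q = [alpha, beta] (reporting itself omitted)."""
    if v is None or (alpha <= lo and hi <= beta):
        return 1                          # external node or whole subtree reported
    calls = 1
    if alpha < v.key and lo < beta:
        calls += count_calls(v.left, alpha, beta, lo, v.key)
    if v.key < beta and alpha < hi:
        calls += count_calls(v.right, alpha, beta, v.key, hi)
    return calls

n = 1000
root = build(list(range(0, 2 * n, 2)))    # keys 0, 2, ..., 1998
H = n.bit_length()                        # external nodes have depth <= H here
worst = max(count_calls(root, a, b)
            for a in range(-1, 2 * n + 1, 11)
            for b in range(a, 2 * n + 1, 17))
print(worst, "<=", 4 * H + 1)
```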
+
+\corr{
+An~enumeration query is answered in time $\O(\log n + p)$, where~$p$ is the number
+of items reported. If we precompute sizes of all subtrees, a~counting query takes
+$\O(\log n)$ time. Aggregate queries can be answered if we precompute aggregate
+answers for all subtrees and then combine the $\O(\log n)$ partial answers found by the query.
+The structure can be built in $\O(n\log n)$ time by sorting the items and then running
+the algorithm for constructing perfectly balanced trees. It takes $\O(n)$ memory.
+}
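The counting variant of the corollary might look as follows. This is a sketch under assumptions: a hypothetical `Node` class that stores its subtree size when created, and a median-split `build` that turns a sorted list into a perfectly balanced tree in $\O(n)$ time. Whenever $\intr(v) \subseteq Q$, the query adds the precomputed size instead of enumerating the subtree. The same pattern would serve other aggregates, such as a sum of keys, by storing that value instead of the size.

```python
import math

def size(v):
    return 0 if v is None else v.size

class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right  # None = external node
        self.size = 1 + size(left) + size(right)            # precomputed subtree size

def build(keys, lo=0, hi=None):
    """Perfectly balanced BST from a sorted list in O(n) time."""
    if hi is None:
        hi = len(keys)
    if lo >= hi:
        return None
    mid = (lo + hi) // 2
    return Node(keys[mid], build(keys, lo, mid), build(keys, mid + 1, hi))

def count(v, alpha, beta, lo=-math.inf, hi=math.inf):
    """Number of keys in the subtree of v lying in [alpha, beta]; (lo, hi) = int(v)."""
    if v is None:
        return 0
    if alpha <= lo and hi <= beta:       # int(v) ⊆ Q: use the stored size in O(1)
        return v.size
    c = 1 if alpha <= v.key <= beta else 0
    if alpha < v.key and lo < beta:
        c += count(v.left, alpha, beta, lo, v.key)
    if v.key < beta and alpha < hi:
        c += count(v.right, alpha, beta, v.key, hi)
    return c

keys = sorted([7, 3, 11, 5, 2, 13])     # O(n log n) sorting, then O(n) build
root = build(keys)
print(count(root, 4, 11))               # 3
```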
+
+This query algorithm is compatible with most methods for balancing binary search trees.
+The interval $\intr(v)$ need not be stored in the nodes --- it can be computed on the
+fly when traversing the tree from the root. The subtree sizes or aggregate answers can
+be easily updated when rotating an~edge: only the values in the endpoints of the edge
+are affected and they can be recomputed in constant time from values in their children.
+This way we can obtain a~dynamic range tree with \alg{Insert} and \alg{Delete}
+in $\O(\log n)$ time.
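The constant-time repair after a rotation can be illustrated with subtree sizes. The following is a minimal sketch that assumes a hypothetical `Node` class with an `update` method recomputing the augmented value from the children. `rotate_right` rotates the edge between a node and its left child. Only the two endpoints of the rotated edge change their subtrees, and the lower one must be updated first because the upper one reads its value.

```python
def size(v):
    return 0 if v is None else v.size

class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right
        self.update()

    def update(self):
        # recompute the augmented value from the children in O(1)
        self.size = 1 + size(self.left) + size(self.right)

def rotate_right(y):
    """Rotate the edge between y and its left child x; return x, the new subtree root."""
    x = y.left
    y.left = x.right
    x.right = y
    y.update()   # y is now the lower endpoint, so it goes first
    x.update()
    return x

y = Node(5, Node(3, Node(2), Node(4)), Node(8))
x = rotate_right(y)
print(x.key, x.size, x.right.size)   # 3 5 3
```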

 \section{Multi-dimensional search trees (k-d trees)}


--- a/tex/adsmac.tex
+++ b/tex/adsmac.tex
@@ -267,6 +267,10 @@
 % C++
 \def\Cpp{C{\tt ++}}

+% Various operators and functions
+\def\intr{{\rm int}}
+\def\key{{\rm key}}
+
 % Tabulka operací datové struktury
 \def\optable#1{$$
 \def\cr{\crcr\noalign{\smallskip}}