Commit f3ff9446 authored by Martin Mareš

Geometry: 1-D range trees

parent 32e95fcf
......@@ -39,7 +39,102 @@ on points in~$\R^d$
\section{Range queries in one dimension}
TODO
We show the basic techniques on one-dimensional range queries. The simplest
solution is to use a~sorted array. Whenever we are given a~query interval $[a,b]$,
we can locate its endpoints in the array by binary search. Then we either enumerate
the items between the endpoints or we count them by subtracting indices of the
endpoints. We only have to be careful and check whether each endpoint lies at an
item or in a~gap between items. Overall, we can answer any query in time $\O(\log n + p)$,
where $n$~is the number of items and $p$~is the number of items enumerated. The structure
can be built in time $\O(n\log n)$ by sorting the items and it is stored in $\O(n)$ space.
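
The sorted-array solution can be sketched in Python as follows. This is only
an illustration, assuming a~closed query interval $[a,b]$ and the standard
{\tt bisect} module; the function names {\tt range\_count} and
{\tt range\_enumerate} are hypothetical. Using {\tt bisect\_left} for~$a$ and
{\tt bisect\_right} for~$b$ handles both the case when an~endpoint hits an~item
and the case when it falls in a~gap.

```python
from bisect import bisect_left, bisect_right

def range_count(items, a, b):
    """Count items x with a <= x <= b. The list `items` must be sorted."""
    lo = bisect_left(items, a)    # first index with items[lo] >= a
    hi = bisect_right(items, b)   # first index with items[hi] > b
    return max(0, hi - lo)        # max() guards against an empty query (a > b)

def range_enumerate(items, a, b):
    """Return the items in [a, b] as a slice of the sorted list."""
    lo = bisect_left(items, a)
    hi = bisect_right(items, b)
    return items[lo:hi]
```

Counting costs two binary searches, enumeration additionally copies the
$p$~reported items.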
Another solution uses binary search trees. It is more complicated, but more
flexible. It can be made dynamic and it can also answer aggregation queries.
\defn{Let $T$~be a~binary search tree with real-valued keys. For each node~$v$,
we define the set $\intr(v)$ called the \em{interval of~$v$.} It contains
all real numbers whose search visits~$v$.}
\obs{
We can see that the sets have the following properties:
\tightlist{o}
\:$\intr(\<root>)$ is the whole set~$\R$.
\:If $v$~is a~node with key~$\key(v)$, left child $\ell(v)$ and right child $r(v)$, then:
\tightlist{o}
\:$\intr(\ell(v)) = \intr(v) \cap (-\infty, \key(v))$
\:$\intr(r(v)) = \intr(v) \cap (\key(v), +\infty)$
\endlist
\:By induction, $\intr(v)$ is always an~open interval.
\:All keys in the subtree of~$v$ lie in $\intr(v)$.
\:The definition of $\intr(v)$ applies to external nodes, too. The intervals
obtained by cutting the real line into parts at the keys of internal nodes
are exactly the intervals assigned to external nodes.
\endlist
}
This suggests a~straightforward recursive algorithm for answering range queries.
\algo{IntQuery}$(v,Q)$
\algin the root~$v$ of a~subtree, query range~$Q$
\:If $v$~is external, return.
\:If $\intr(v) \subseteq Q$, report the whole subtree rooted at~$v$ and return.
\:If $\key(v) \in Q$, report the item at~$v$.
\:$Q_\ell \= Q \cap \intr(\ell(v))$, $Q_r \= Q \cap \intr(r(v))$
\:If $Q_\ell \ne \emptyset$: call $\alg{IntQuery}(\ell(v), Q_\ell)$
\:If $Q_r \ne \emptyset$: call $\alg{IntQuery}(r(v), Q_r)$
\endalgo
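
A~possible Python rendering of \alg{IntQuery} is sketched below. It is not
meant as a~definitive implementation: it assumes a~closed query range $[a,b]$
and distinct keys, it represents external nodes by {\tt None}, and the names
{\tt Node}, {\tt build}, {\tt report\_subtree} and {\tt int\_query} are
hypothetical. The open interval $\intr(v)$ is passed down as the pair
{\tt (lo, hi)}. Instead of computing $Q_\ell$ and~$Q_r$ explicitly, the sketch
passes $Q$~unchanged, because the intersection with $\intr(v)$ is implied by
{\tt lo} and {\tt hi}.

```python
import math

class Node:
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left      # None plays the role of an external node
        self.right = right

def build(keys):
    """Build a perfectly balanced tree from a sorted list of distinct keys."""
    if not keys:
        return None
    mid = len(keys) // 2
    return Node(keys[mid], build(keys[:mid]), build(keys[mid + 1:]))

def report_subtree(v, out):
    """Append all keys in the subtree of v to out."""
    if v is not None:
        report_subtree(v.left, out)
        out.append(v.key)
        report_subtree(v.right, out)

def int_query(v, a, b, lo=-math.inf, hi=math.inf, out=None):
    """Collect all keys in [a, b]; (lo, hi) is the open interval int(v)."""
    if out is None:
        out = []
    if v is None:                       # step 1: external node
        return out
    if a <= lo and hi <= b:             # step 2: int(v) is a subset of Q
        report_subtree(v, out)
        return out
    if a <= v.key <= b:                 # step 3: the key itself lies in Q
        out.append(v.key)
    if a < v.key:                       # steps 4-5: Q meets int(left child)
        int_query(v.left, a, b, lo, v.key, out)
    if b > v.key:                       # steps 4, 6: Q meets int(right child)
        int_query(v.right, a, b, v.key, hi, out)
    return out
```

Since keys are reported before the recursive calls, the output is not
necessarily sorted.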
Let us now analyze the time complexity of this algorithm.
\lemma{
If the tree is balanced, \alg{IntQuery} called on its root visits $\O(\log
n)$ nodes and subtrees.
}
\proof
Let $Q=[\alpha,\beta]$ be the query interval. Let $a$ and~$b$ be the tree nodes
(internal or external) where the search for $\alpha$ and~$\beta$ ends. We
denote the lowest common ancestor of~$a$ and~$b$ by~$p$.
Whenever we enter a~node~$v$ with some interval $\intr(v)$, the key $\key(v)$
splits the interval into two parts, corresponding to $\intr(\ell(v))$ and
$\intr(r(v))$.
On the path from the root to~$p$, $Q$~always lies in one of these parts and we
recurse on one child. In some cases, the current key lies in~$Q$, so we report it.
When we enter the common ancestor~$p$, the range~$Q$ intersects both parts, so we
report $\key(p)$ and recurse on both parts.
On the ``left path'' from~$p$ to~$a$, we encounter two situations. Either $Q$~lies
solely in the right part, so we recurse on it. Or $Q$~crosses $\key(v)$, so we recurse
on the left part and report the whole right subtree. Again, we report $\key(v)$
if it lies in~$Q$.
The ``right path'' from~$p$ to~$b$ behaves symmetrically: we recurse on the right part and possibly report
the whole left subtree and/or the current key.
Since these paths contain $\O(\log n)$ nodes in total, we visit $\O(\log n)$ nodes
and report $\O(\log n)$ nodes and $\O(\log n)$ subtrees.
\qed
\corr{
An~enumeration query is answered in time $\O(\log n + p)$, where~$p$ is the number
of items reported. If we precompute sizes of all subtrees, a~counting query takes
$\O(\log n)$ time. Aggregate queries can be answered if we precompute aggregate
answers for all subtrees and combine them later.
The structure can be built in $\O(n\log n)$ time using the algorithm
for constructing perfectly balanced trees. It takes $\O(n)$ memory.
}
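
A~counting query with precomputed subtree sizes might look as follows. This is
a~sketch under the same assumptions as before (closed range $[a,b]$, distinct
keys, external nodes represented by {\tt None}); the field {\tt size} and the
names {\tt Node}, {\tt size}, {\tt build} and {\tt count\_query} are
hypothetical. Whenever $\intr(v) \subseteq Q$, the stored size is used instead
of traversing the subtree.

```python
import math

def size(v):
    """Number of keys in the subtree of v (0 for an external node)."""
    return v.size if v is not None else 0

class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right
        self.size = 1 + size(left) + size(right)   # precomputed subtree size

def build(keys):
    """Build a perfectly balanced tree from a sorted list of distinct keys."""
    if not keys:
        return None
    mid = len(keys) // 2
    return Node(keys[mid], build(keys[:mid]), build(keys[mid + 1:]))

def count_query(v, a, b, lo=-math.inf, hi=math.inf):
    """Count keys in [a, b]; (lo, hi) is the open interval int(v)."""
    if v is None:
        return 0
    if a <= lo and hi <= b:          # whole subtree lies in Q: use stored size
        return v.size
    total = 1 if a <= v.key <= b else 0
    if a < v.key:                    # Q meets the interval of the left child
        total += count_query(v.left, a, b, lo, v.key)
    if b > v.key:                    # Q meets the interval of the right child
        total += count_query(v.right, a, b, v.key, hi)
    return total
```

Other aggregates (sums, minima) can be handled the same way by replacing
{\tt size} with the corresponding precomputed value and combining function.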
This query algorithm is compatible with most methods for balancing binary search trees.
The interval $\intr(v)$ need not be stored in the nodes --- it can be computed on the
fly when traversing the tree from the root. The subtree sizes or aggregate answers can
be easily updated when rotating an~edge: only the values in the endpoints of the edge
are affected and they can be recomputed in constant time from values in their children.
This way we can obtain a~dynamic range tree with \alg{Insert} and \alg{Delete}
in $\O(\log n)$ time.
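
The constant-time update during a~rotation can be illustrated by the following
sketch. It assumes nodes carrying a~hypothetical {\tt size} field; the names
{\tt Node}, {\tt size}, {\tt update} and {\tt rotate\_right} are illustrative
only and do not refer to any particular balancing scheme. Only the two
endpoints of the rotated edge are recomputed, the lower one first.

```python
def size(v):
    """Number of keys in the subtree of v (0 for an external node)."""
    return v.size if v is not None else 0

def update(v):
    """Recompute the size of v from the sizes of its children."""
    v.size = 1 + size(v.left) + size(v.right)

class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right
        update(self)

def rotate_right(y):
    """Rotate the edge between y and its left child x; return the new root x."""
    x = y.left
    y.left = x.right
    x.right = y
    update(y)   # y is now the lower endpoint of the edge, so it goes first
    update(x)   # x depends on the fresh value stored in y
    return x
```

The same pattern works for any aggregate that can be computed from the values
in the children and the key of the node itself.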
\section{Multi-dimensional search trees (k-d trees)}
......
......@@ -267,6 +267,10 @@
% C++
\def\Cpp{C{\tt ++}}
% Various operators and functions
\def\intr{{\rm int}}
\def\key{{\rm key}}
% Tabulka operací datové struktury
\def\optable#1{$$
\def\cr{\crcr\noalign{\smallskip}}
......