Commit 93cdcbe8 authored by Martin Mareš

Geometry: Range trees (partial)

parent db3eeb02
@@ -29,13 +29,13 @@ Queries are asked about all objects which lie within a~given \em{region}
We might want to \em{enumerate} all objects within the region,
or just \em{count} them without enumerating all of them.
If there is a~value associated with each
object, we can ask for a~sum or a~maximum of values of all objects
within the range --- this is generally called an \em{aggregation}
query.
In this chapter, we will consider the simple case of range queries
on points in~$\R^d$.
\section[range1d]{Range queries in one dimension}
@@ -75,20 +75,20 @@ We can see that the sets have the following properties:
This suggests a~straightforward recursive algorithm for answering range queries.
\algo{RangeQuery}$(v,Q)$
\algin a~root of a~subtree~$v$, query range~$Q$
\:If $v$~is external, return.
\:If $\intr(v) \subseteq Q$, report the whole subtree rooted at~$v$ and return.
\:If $\key(v) \in Q$, report the item at~$v$.
\:$Q_\ell \= Q \cap \intr(\ell(v))$, $Q_r \= Q \cap \intr(r(v))$
\:If $Q_\ell \ne \emptyset$: call $\alg{RangeQuery}(\ell(v), Q_\ell)$
\:If $Q_r \ne \emptyset$: call $\alg{RangeQuery}(r(v), Q_r)$
\endalgo
Let us now analyze the time complexity of this algorithm.
\lemma{
If the tree is balanced, \alg{RangeQuery} called on its root visits $\O(\log
n)$ nodes and subtrees.
}
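The recursion above can be sketched in Python. This is only an illustration under stated assumptions: the names `Node`, `build`, `subtree_keys` and `range_query` are hypothetical, external nodes are represented by `None`, the query range is a closed interval `[lo, hi]`, and the open interval of a node is passed down as the pair `(a, b)` instead of being stored in the tree.

```python
import math

class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def build(keys):
    """Balanced BST from a sorted list of distinct keys (None = external node)."""
    if not keys:
        return None
    mid = len(keys) // 2
    return Node(keys[mid], build(keys[:mid]), build(keys[mid + 1:]))

def subtree_keys(v, out):
    """Report the whole subtree rooted at v."""
    if v is not None:
        subtree_keys(v.left, out)
        out.append(v.key)
        subtree_keys(v.right, out)

def range_query(v, lo, hi, out, a=-math.inf, b=math.inf):
    """Append to out all keys of v's subtree lying in [lo, hi].

    (a, b) is the open interval of keys that may occur in v's subtree.
    """
    if v is None:                      # external node
        return
    if lo <= a and b <= hi:            # int(v) is contained in Q
        subtree_keys(v, out)
        return
    if lo <= v.key <= hi:              # key(v) lies in Q
        out.append(v.key)
    if lo < v.key:                     # Q meets the interval of the left child
        range_query(v.left, lo, hi, out, a, v.key)
    if hi > v.key:                     # Q meets the interval of the right child
        range_query(v.right, lo, hi, out, v.key, b)

root = build(list(range(0, 20, 2)))    # keys 0, 2, ..., 18
found = []
range_query(root, 3, 11, found)
print(sorted(found))                   # [4, 6, 8, 10]
```

The keys are reported in the order in which the pseudocode visits them, so the example sorts them before printing.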
@@ -163,7 +163,7 @@ space.
We can answer 2-d range queries similarly to the 1-d case. To each node~$v$ of the
tree, we can assign a~2-d interval $\intr(v)$ recursively. This again generates a~hierarchy
of nested intervals, so the \alg{RangeQuery} algorithm works there, too.
However, 2-d range queries can be very slow in the worst case:
\lemma{Worst-case time complexity of range queries in a~2-d tree is $\Omega(\sqrt n)$.}
@@ -199,6 +199,81 @@ Dynamization is non-trivial and we will not show it.
\section{Multi-dimensional range trees}
The $k$-dimensional search trees were simple, but slow in the worst case.
There is a~more efficient data structure: the \em{multi-dimensional range tree,}
which has poly-logarithmic query complexity if we are willing to use
super-linear space.
\subsection{2-dimensional range trees}
For simplicity, we start with a~static 2-dimensional version
and we will assume that no two points have the same $x$~coordinate.
The 2-d range tree will consist of multiple instances of a~1-d range tree,
which we built in section \secref{range1d} --- it can be a~binary search tree
with range queries, but in the static case even a~sorted array suffices.
First we create an~$x$-tree, which is a~1-d range tree over the $x$~coordinates
of all points stored in the structure. Each node contains a~single point.
Its subtree corresponds to an~open interval of $x$~coordinates, that is
a~\em{band} in~$\R^2$ (an~open rectangle which is vertically unbounded).
For every band, we construct a~$y$-tree containing all points in the band
ordered by the $y$~coordinate.
If the $x$-tree is balanced, every node lies in $\O(\log n)$ subtrees.
So every point lies in $\O(\log n)$ bands and the whole structure takes
$\O(n\log n)$ space: $\O(n)$ for the $x$-tree, $\O(n\log n)$ for all
$y$-trees.
We can build the 2-d range tree recursively. First we create two lists of points:
one sorted by the $x$~coordinate, one by~$y$. Then we construct the $x$-tree.
Since the $x$-list is sorted, we find the point with the median $x$~coordinate
in constant time. This point becomes the root of the $x$-tree. We recursively
construct the left subtree from the points whose $x$~coordinate is less than
the median; the corresponding sub-lists of both the $x$ and~$y$ lists can be
extracted in $\O(n)$ time while preserving their order. We
construct the right subtree similarly. Finally, we build the $y$-tree
for the root: it contains all the points and we can build it from the $y$-sorted
array in $\O(n)$ time.
The whole building algorithm requires linear time per sub-problem, which
sums to $\O(n)$ over one level of the $x$-tree. Since the $x$-tree has
logarithmic height, this gives $\O(n\log n)$ for the whole construction,
including the initial sorting.
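The same bound can be read off a~recurrence. As a~sketch, write $T(n)$ for the
construction time on $n$~points once both sorted lists are available (the
name~$T$ and the constant~$c$ are introduced here only for illustration), and
assume for simplicity that $n$~is a~power of two and that each half contains at
most $n/2$ points:
$$ T(n) \le 2\,T(n/2) + cn \le 4\,T(n/4) + 2cn \le \cdots \le n\,T(1) + cn\log_2 n = \O(n\log n). $$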
Now we describe how to answer a~range query for $[x_1,x_2] \times [y_1,y_2]$.
First we let the $x$-tree answer a~range query for $[x_1,x_2]$. This gives us
a~union of $\O(\log n)$ points and bands which disjointly cover $[x_1,x_2]$.
For each point, we test if its $y$~coordinate lies in $[y_1,y_2]$. For each
band, we ask the corresponding $y$-tree for points in the range $[y_1,y_2]$.
We spend $\O(\log n)$ time in the $x$-tree, $\O(\log n)$ time to process the
individual points, $\O(\log n)$ in each of the $\O(\log n)$ $y$-trees, and
$\O(1)$ per point reported. Put together, this is $\O(\log^2 n + p)$ if
$p$~points are reported ($p=0$ for a~counting query).
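The construction and the counting query can be sketched in Python as follows. This is a~minimal illustration, not a~definitive implementation: the names `XNode`, `build` and `count` are hypothetical, each $y$-tree is represented by a~sorted list of $y$~coordinates (which suffices in the static case), all $x$~coordinates are assumed to be distinct, and the open $x$-interval of each band is passed down as `(a, b)` instead of being stored.

```python
from bisect import bisect_left, bisect_right

class XNode:
    def __init__(self, point, left, right, ys):
        self.point = point   # the single point stored in this x-tree node
        self.left = left
        self.right = right
        self.ys = ys         # y-tree of the band: sorted y coordinates of the subtree

def build(by_x, by_y):
    """by_x: points sorted by x, by_y: the same points sorted by y (distinct x)."""
    if not by_x:
        return None
    mid = len(by_x) // 2
    median = by_x[mid]
    # Split the y-sorted list in linear time, preserving its order.
    left_y = [p for p in by_y if p[0] < median[0]]
    right_y = [p for p in by_y if p[0] > median[0]]
    return XNode(median,
                 build(by_x[:mid], left_y),
                 build(by_x[mid + 1:], right_y),
                 [p[1] for p in by_y])

def count(v, x1, x2, y1, y2, a=float("-inf"), b=float("inf")):
    """Count points in [x1, x2] x [y1, y2] within the band (a, b) of node v."""
    if v is None:
        return 0
    if x1 <= a and b <= x2:
        # The whole band lies inside [x1, x2]: ask its y-tree.
        return bisect_right(v.ys, y2) - bisect_left(v.ys, y1)
    px, py = v.point
    total = 1 if (x1 <= px <= x2 and y1 <= py <= y2) else 0
    if x1 < px:
        total += count(v.left, x1, x2, y1, y2, a, px)
    if x2 > px:
        total += count(v.right, x1, x2, y1, y2, px, b)
    return total

points = [(1, 5), (2, 1), (3, 8), (4, 3), (5, 9), (6, 2), (7, 7)]
root = build(sorted(points), sorted(points, key=lambda p: p[1]))
print(count(root, 2, 6, 2, 8))   # 3: the points (3,8), (4,3) and (6,2)
```

Enumerating the points instead of counting them would only require storing whole points in the $y$-lists and returning the slice between the two binary-search positions.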
\subsection{Handling repeated coordinates}
We left aside the case of multiple points with the same $x$~coordinate.
This can be handled by attaching another $y$-tree to each $x$-tree node,
which contains the points sharing the same $x$~coordinate. That is,
$x$-tree nodes correspond to distinct $x$~coordinates and each
has two $y$-trees: one for its own $x$~coordinate, one for the open
interval of $x$~coordinates covering its subtree. This way, we can
perform range queries for both open and closed ranges.
The time complexity of \alg{Build} stays asymptotically the same: the maximum
number of $y$-trees containing a~given point at most doubles, so it is
still $\O(\log n)$. Similarly for range queries: we query at most twice
as many $y$-trees.
\subsection{Multi-dimensional generalization}
TODO
\subsection{Dynamization}
TODO
\subsection{Fractional cascading}
TODO
\endchapter