\subsection{Persistent Stack}
We may implement a stack as a forward-list. In that case every item contains a pointer to the next one and the structure is represented by a pointer to the head of the forward-list. This implementation is naturally persistent. We can work with earlier versions by pointing to previous heads of the forward-list. Pushing a new element means creating a new head pointing to the head of the version chosen to precede this newly created version. Popping means taking as the new head the element directly following the head of the chosen version.
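For concreteness, here is a minimal sketch of such a persistent stack in C++; the names \texttt{Node}, \texttt{Version}, \texttt{push} and \texttt{pop} are purely illustrative and the values are integers for simplicity.
\begin{verbatim}
// Every version of the stack is just a pointer to a head node.
// Nodes are never mutated, so all earlier versions stay valid.
#include <cassert>
#include <memory>

struct Node {
    int value;
    std::shared_ptr<const Node> next;   // tail, shared by many versions
};

using Version = std::shared_ptr<const Node>;  // null pointer = empty stack

// Push: allocate a new head pointing to the head of the chosen version.
Version push(const Version& v, int x) {
    return std::make_shared<const Node>(Node{x, v});
}

// Pop: the new version is the element directly following the old head.
Version pop(const Version& v) {
    assert(v != nullptr);
    return v->next;
}

int main() {
    Version v0;                // empty stack
    Version v1 = push(v0, 1);  // 1
    Version v2 = push(v1, 2);  // 2 1
    Version v3 = pop(v2);      // 1, shares its single node with v1
    assert(v1->value == 1 && v2->value == 2 && v3->value == 1);
}
\end{verbatim}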
\subsection{Persistence through Path-Copying}
Let us now turn to binary search trees. A binary search tree can be converted into a fully persistent one rather easily if space complexity is not a concern.
One straightforward approach to achieve this is called \emph{path-copying}. It is based on the observation that most of the tree does not change during an update.
When a new version of the tree is to be created by a delete or insert, new copies are allocated only for the vertices that are changed by the operation and for their ancestors.
This typically means that only the path from the inserted or deleted vertex to the root is newly allocated, plus a constant number of other vertices close to the path (due to rebalancing).
Pointers to children in the new vertices are set to the new versions of those children where such versions exist.
For children that are roots of subtrees not modified by the update in any way, and for which therefore no new version exists, pointers to the old vertices are used.
Here we tacitly assume that only pointers to children are stored. Updating the root in a tree with parent pointers would involve creating new instances of all nodes in the tree.
This method yields a functional data structure since the old version of the tree was technically not modified and can still be used.
With reasonable variants of binary search trees, the achieved time complexity in a tree with $n$ vertices is $\Theta(\log n)$ per operation and $\Theta(\log n)$ memory per insert/delete.
The downside of this method is the increased space complexity. There is no apparent way to avoid the increase in memory caused by copying the paths.
%TODO: Figure
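For illustration, the following is a minimal sketch of path-copying in C++ on an unbalanced binary search tree (a balanced variant would additionally copy the constant number of vertices touched by rebalancing); the names \texttt{TNode}, \texttt{Tree} and \texttt{insert} are purely illustrative.
\begin{verbatim}
// Only the vertices on the search path are copied; untouched subtrees are
// shared with the previous version, so the old root still describes the
// old tree and the structure is functional.
#include <memory>

struct TNode {
    int key;
    std::shared_ptr<const TNode> left, right;
};

using Tree = std::shared_ptr<const TNode>;  // one version = one root pointer

// Insert returns the root of a new version; the given version is untouched.
Tree insert(const Tree& t, int key) {
    if (!t)
        return std::make_shared<const TNode>(TNode{key, nullptr, nullptr});
    if (key < t->key)   // copy this vertex, descend into the left subtree
        return std::make_shared<const TNode>(
            TNode{t->key, insert(t->left, key), t->right});
    if (key > t->key)   // copy this vertex, descend into the right subtree
        return std::make_shared<const TNode>(
            TNode{t->key, t->left, insert(t->right, key)});
    return t;           // key already present: nothing changes, share the vertex
}
\end{verbatim}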
\subsection{Fat Nodes}
To limit memory consumption we may instead choose to let vertices carry their properties for all versions. In practice this means that we must store a collection of changes together with the versions in which those changes happened. This makes a vertex a kind of dictionary with versions as keys and descriptions of changes as values.
When semi-persistence is sufficient, upon arriving at a vertex and asking for its state in version $A$, we do the following: we go through the collection of changes and figure out what the state in version $A$ should be.
We start with the default values and apply all changes that happened earlier than $A$ in chronological order, overwriting earlier values if there are multiple changes to the same field or pointer.
This process yields the correct state of the vertex for version $A$.
There are several different options for the exact format of the changes.
In fact, it might be easier to simply copy all of the other fields the vertex possesses as well when one of them changes.
One change record then holds new values for all fields and pointers of the vertex.
By embedding a suitable data structure into a fat node, the change relevant for a given version may be found faster.
For full persistence we also need to resolve the issue of how to efficiently determine which changes should be applied.
It will be addressed later through the introduction of a total ordering of all versions.
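A minimal sketch of a fat node for the semi-persistent case, assuming the simple protocol above in which a change record carries new values for all fields and pointers of the vertex; the names \texttt{FatNode}, \texttt{Change} and \texttt{state\_at} are purely illustrative.
\begin{verbatim}
#include <vector>

struct FatNode;

// One change record: the version in which it was made, together with new
// values for every field and every pointer of the vertex.
struct Change {
    int version;
    int key;
    FatNode *left, *right;
};

struct FatNode {
    Change initial;               // default values: state when created
    std::vector<Change> changes;  // appended in chronological (version) order
};

// Reconstruct the state of the vertex as seen by version a: start from the
// default values and apply, in chronological order, every change that
// happened in a version earlier than a.
Change state_at(const FatNode& v, int a) {
    Change s = v.initial;
    for (const Change& c : v.changes) {
        if (c.version >= a) break;  // chronological order: we may stop here
        s = c;                      // later changes overwrite earlier ones
    }
    return s;
}
\end{verbatim}
Since the changes are kept sorted by version, a binary search over \texttt{changes} could replace the linear scan, which is the faster identification mentioned above.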
\section{Point Localization in a Plane}
Given a bounded connected subset of a plane partitioned into a finite set of faces, the goal is to respond to queries asking to which face a point $p$ belongs.
We limit ourselves to polygonal faces.
One special case of particular importance is finding the closest point of a given set, i.e. the case where the faces are the cells of a Voronoi diagram.
To build the data structure we will follow the general idea of line-sweeping.
We start by sorting the vertices of all faces by the first coordinate.
We then process these vertices one by one in order of increasing first coordinate.
During this processing we maintain a sorted list of edges (in a semi-persistent BST).
The list contains the edges that intersect a sweeping line parallel to the secondary axis, in the order of the intersections (sorted by the second coordinate).
Initially the list is empty and we set the sweeping line to intersect the first vertex.
When we start processing a new vertex we imagine moving the sweeping line along the primary axis until it intersects the new vertex.
We can easily observe that the order of edges cannot change during this virtual movement.
(None of the edges can intersect except in a vertex.)
It will happen, however, that edges must be either removed from the list or added to it: edges whose right endpoint is the processed vertex are removed, and edges whose left endpoint it is are inserted.
We store pointers to the versions created while processing each vertex in an array, sorted by the first coordinate of the corresponding vertex.
%TODO: Search
The number of vertices will be denoted by $n$; the number of edges is then bounded by $3n$.
This follows from the partitioning being a drawing of a planar graph.
The complexity is therefore $\O(n \log n)$ for pre-processing and $\O(\log n)$ for one query.
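To give a concrete picture of the two-level query, here is a sketch in C++. The array of versions corresponds to the sorted array of version pointers built during the sweep; the edge list of each version is represented here by a plain sorted \texttt{std::vector} standing in for one version of the semi-persistent BST, and the names \texttt{Edge}, \texttt{Version}, \texttt{face\_below} and \texttt{locate} are purely illustrative.
\begin{verbatim}
#include <algorithm>
#include <vector>

struct Edge {
    double x1, y1, x2, y2;   // endpoints, with x1 <= x2
    int face_below;          // identifier of the face directly below the edge
    // Second coordinate of the edge at a given first coordinate.
    double y_at(double x) const {
        if (x1 == x2) return y1;                      // degenerate vertical edge
        return y1 + (y2 - y1) * (x - x1) / (x2 - x1);
    }
};

// One version of the sweep status: the edges crossing the sweeping line,
// ordered by the second coordinate (bottom to top).
struct Version {
    double x;                 // first coordinate at which the version was made
    std::vector<Edge> edges;
};

// Query: binary search over the versions by the first coordinate, then
// search the chosen version for the first edge above the point; the face
// directly below that edge contains the query point.
int locate(const std::vector<Version>& versions, double px, double py) {
    auto v = std::upper_bound(versions.begin(), versions.end(), px,
        [](double x, const Version& ver) { return x < ver.x; });
    if (v == versions.begin()) return -1;   // left of the whole subdivision
    --v;                                    // last version with ver.x <= px
    auto e = std::upper_bound(v->edges.begin(), v->edges.end(), py,
        [px](double y, const Edge& ed) { return y < ed.y_at(px); });
    if (e == v->edges.end()) return -1;     // above all edges: outside
    return e->face_below;
}
\end{verbatim}
In the real structure the inner search is performed in the BST version valid for the given slab; semi-persistence is what lets us keep all these versions without copying the whole list of edges for every slab.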