Remove extra inverse pointers

Commit 476b1194 authored Jul 25, 2021 by Jiří Škrobánek
--- a/201-persist/persist.tex
+++ b/201-persist/persist.tex
@@ -97,95 +97,89 @@ With this approach, we can reach space complexity that is linear with the total
 \section{Pointer Data Structures and Semi-Persistence}

 Following up on the idea of fat nodes, let us limit their size to achieve identification of applicable version in constant time. 
-We will explain this technique for semi-persistence only first. 
-Full persistence is more complicated and requires a few extra tricks.
+We will explain this technique only for semi-persistence. 
+Full persistence requires a few extra tricks and an ordering on the versions.

-A fat node stores a dictionary of standard vertices indexed by versions. 
-We call values of this dictionary slots. 
-The maximum size of this dictionary is set to a constant which is to be determined later. 
-Temporarily, we will allow the capacity of fat vertex to be exceeded. 
+A \em{fat node} stores a collection of standard vertices indexed by versions. 
+We call values of this collection \em{slots}. 
+The maximum size of this collection is set to a constant which is to be determined later. 
+Temporarily, we will allow the capacity of a fat node to be exceeded. 
 This will have to be fixed however, before the ongoing operation finishes.
 By placing a restriction on the size we may circumvent the increased complexity of search within one vertex.
-Instead of copying the vertex, we simply add new slot into the dictionary. 
+Instead of copying the vertex, we simply add a new slot into the collection. 
 Provided the maximum has not been exceeded yet, this insertion of a slot stops the propagation of changes toward the root. 
-The Reader should recall that this was the major weakness of path-copying.
-Because of the limit on size of this dictionary, it may be implemented simply as a linked list.
+The reader should recall that this propagation was the major weakness of path-copying.
+Because of the limit on size of this collection, it may be implemented simply as a linked list.

-Contents of one slot are a version handle, all pointers a vertex would have, then inverse pointers to fat vertices that have slots pointing to this fat vertex for this version and some fields, notably key and value as a bare minimum.
+Contents of one slot are a version handle, all pointers a vertex would have, and some fields, notably key and value as a bare minimum.
 Not all fields need to be versioned. 
-For example balancing information may be stored only for the latest version, i.e. in red-black trees color is only used for balancing and is thus not needed to be persisted for old versions. 
+For example, balancing information may be stored only for the latest version: in a~red-black tree, color is only used for balancing and thus need not be persisted for old versions. 
 (Until full-persistence comes into play.)

 One vertex in the original binary search tree then corresponds to a doubly-linked list of fat nodes.
-When the vertex changes, new state of the vertex is written into a slot of the last fat node in the list. 
-As all slots become occupied in the last fat vertex and it needs to be modified, new fat node is allocated.
+When the vertex changes, a new state of the vertex is written into a slot of the last fat node in the list. 
+Once all slots in the last fat node are occupied and the vertex needs to be modified again, a new fat node is allocated.

 Modifications of a single vertex during one operation are all written to a single slot; there is no need to use more slots.

+We will need to be able to determine for any fat node $x$ which other fat nodes have their most recent slot pointing to $x$. 
+This problem is resolved by adding some number of \em{inverse pointers} (in contrast to proper pointers of the binary search tree) to each series of fat nodes of the same vertex. 
+We define an invariant of inverse pointers: If vertex $v$ points to vertex $u$ in the most recent version, then $u$ must contain an inverse pointer to $v$. This invariant is maintained by also changing the inverse pointers whenever some proper pointers change. 
+
 When a new fat node $x$ is allocated, one of its slots is immediately taken. 
-Pointers must be updated in other fat nodes that pointed to the fat node preceding $x$ in the list. 
-This is done either by inserting new slot into them (copying all values from the latest slot and replacing pointers to the predecessor by pointers to $x$). 
-Or by directly updating the pointers if the right version is already present. 
+Pointers must be updated in other fat nodes whose latest slot pointed to the fat node preceding $x$ in the list. 
+This is done by going through vertices pointed to by the inverse pointers and either creating slots (copying all values from the latest slot and replacing pointers to the predecessor by pointers to $x$), or directly updating the pointers if the slot for the right version is already present. 
+Inverse pointers must also be redirected to $x$ in the vertices that $x$ points to in the latest version.

-Recursive allocations may be triggered, which is not a problem if there is only a small amount of them. 
-This is ensured by setting the size of fat vertices suitably. 
+Recursive allocations may be triggered, which is not a problem if there are only a few of them. 
+This is ensured by setting the size of fat nodes suitably. 
 The order in which these allocations are executed can be arbitrary.
-Regardless of the order chosen, this process of allocations is finite. 
-We can place an upper bound on the number of newly allocated fat vertices -- total number of vertices in the tree (including deleted vertices). 
+We can place an upper bound on the number of newly allocated fat nodes -- total number of vertices in the tree (including deleted vertices). 
 At most one new slot is occupied for every vertex in the tree.

-To take advantage of fat vertices, we need the balancing algorithm to limit the number of vertices that change in one operation at least in the average case. 
-Several data structures have these properties, AVL trees for example.
-Furthermore, we need a limit on the number of pointers that can target one vertex at one time. Otherwise complexity would suffer.
+To take advantage of fat nodes, we need the balancing algorithm to limit the number of vertices that change in one operation. 
+It is sufficient that the changes can be amortized to a constant number per update.
+Furthermore, we need a limit on the number of pointers that can target one vertex at one time.

 \theorem{
-Suppose any binary search tree balancing algorithm satisfying the following properties:
+Consider any binary search tree balancing algorithm satisfying the following properties:
 \list{o}
 	\:There is a constant $d$ such that for any $n$ successive operations on an initially empty tree, the number of vertex changes made to the tree is at most $dn$. 
 	\:There is a constant bound on the number of pointers to any one vertex at any time.
 \endlist
-Then this algorithm with the addition of fat vertices for semi-persistence, consumes $\O(n)$ space for the entire history of $n$ operations starting from an empty tree.
+Then this algorithm, with the addition of fat nodes for semi-persistence, consumes $\O(n)$ space for the entire history of $n$ operations starting from an empty tree.
 }

 \proof
-We denote the number of pointer fields per vertex as $p$ and maximum number of vertices pointing to one vertex at a time as $k$. 
-We then define the number of slots in fat node as $s = p + k + 1$.
+The number of pointer fields per vertex is denoted by $p$ and the maximum number of vertices pointing to one vertex at a time by $k$. 
+We then define the number of slots in one fat node as $s = k + 1$ and add $k$ inverse pointers to vertices.

-We define the potential of the structure as the total number of occupied slots in all fat vertices that are the last in their doubly-linked list. 
+We define the potential of the structure as the total number of occupied slots in all fat nodes that are the last in their doubly-linked list. 
 (Thus initially zero.) 
-Allocation of a new fat vertex will cost one unit of energy. 
-This cost can be paid from the potential or the operation. 
-We will show that the operation needs to be charged only a constant amount of energy per one vertex modification by the original algorithm (to compensate increase in potential or pay for allocations), from which the proposition follows.
-
-For insert, a new fat node is created with one occupied slot (which increases potential by a constant). 
-This increase is paid for by the operation. During rebalancing of the tree, $r$ vertices are to be modified. 
-Let us consider this sequence of modifications one by one. 
-The Operation sends one floating unit of energy to each of the $r$ vertices.
-
-If a modification of $v$ is second or later modification of $v$ during this operation, changes are simply written to the slot for this version. 
-Otherwise, number of used slots is checked. 
-If there is one or more empty available, new slot is used. 
-New slot takes values of unchanged fields and pointers from the preceding slot. 
-This increases potential by one, which is covered by the floating unit of energy.
-
-If no slots are available, new fat vertex $v'$ is allocated and one of its slots is used. 
-This step triggers a decrease in the potential by $p+k$. 
-The floating unit of energy is used to pay for allocation of the new vertex. 
-Next, fat vertices to corresponding current version interval of vertices having pointers to $v$ need to have this reflected. 
-Additionally inverse pointers to $v'$ need to be set. 
-These are at most $p+k$ changes to other vertices that may use their new empty slots. 
-The decrease in potential is used to send one unit of energy to every such vertex that need an update. 
-Changes are executed recursively and will not require extra energy to be charged on this modification.
-
-Operation is charged only constant amount of work for every change it does.
-Space complexity consumed is bounded by the number changes done by update operations. Thus space complexity is $\O(n)$. 
+Allocation of a new fat node will cost one unit of energy. 
+This cost can be paid from the potential or by the operation. 
+We will show that the operation needs to be charged only a constant amount of energy per one vertex modification by the original algorithm (to compensate for the increase in potential or to pay for allocations), from which the proposition follows.
+
+Allocation of a new fat node associated with the insertion of a new vertex into the tree is paid for by the insert operation.
+
+All other modifications changing a field or a proper pointer must pay one unit of energy. 
+Modification of inverse pointers does not need any extra space.
+For modifications triggered directly by insert, delete, or the balancing algorithm, the operation covers this cost.
+If a modification constitutes an insertion of a slot into an existing fat node, the unit of energy is deposited into the potential.
+Otherwise the unit is used to pay for allocation of a new fat node. This allocation causes the potential to decrease by $k$ units. 
+These can be used to give one unit of energy to each of up to $k$ subsequent modifications triggered by checking inverse pointers.
+If a slot for the same version already exists, no additional space is needed and the unit of energy goes unused.
+
+All allocations of new fat nodes are thus paid for, while the operation is charged only a constant amount of energy for every change it makes.
+The space consumed is bounded by the number of changes done by update operations. Thus the space complexity is $\O(n)$. 
 \qed

-Regarding the time complexity, searching the correct slot in a fat vertex produces only constant overhead. 
-It is easy to realize that every operation done during invariant restoration can be charged on a memory allocation of a fat vertex. 
-Also there exists a positive constant $c$, depending only on the balancing algorithm, such that for every allocated fat vertex the number of operations charged on it is at most $c$.
+Regarding the time complexity, searching the correct slot in a fat node produces only constant overhead. 
+Every operation can be charged to a memory allocation of a fat node in such a way that, for some constant $c$ depending only on the balancing algorithm, at most $c$ operations are charged to any allocated fat node. 
+Thus, under the conditions of the previous theorem, the cost to write changes into fat nodes is amortized to $\O(1)$ per operation.
+
+The time complexity of operations on the tree depends on the original balancing algorithm used. With red-black trees, for example, we get $\O(\log n)$ per operation from the original algorithm in addition to the amortized $\O(1)$. Here $n$ is the size of the tree in the version that is queried.

-Assuming the conditions from the previous proposition, the cost to write changes into fat vertices is amortized to $\O(1)$ per operation.

 \section{Point Localization in a plane}

@@ -238,7 +232,7 @@ Some common types of binary search trees like red-black trees or weak-AVL trees
 Moving from semi-persistence to full persistence we encounter an obstacle -- versions no longer form an implicit linear order. 
 (By versions we mean states of the tree in~between updates. 
 We will also use some auxiliary versions not directly mappable to any such state.) 
-Nonetheless, to work with fat vertices, we need to be able to determine slots that carry values correct for current version. 
+Nonetheless, to work with fat nodes, we need to be able to determine the slots that carry the values correct for the current version. 
 To achieve this, we need to identify an interval of versions the current version would fall into. 
 For this purpose we will try to introduce an ordering to versions of the persistent data structure.

@@ -292,10 +286,10 @@ Rebuilding of some subtree will not change order of versions as all path encodin
 Integer arithmetic can be used to efficiently update encodings of paths to the root.

 Our use of list ordering will involve one more trick. 
-Suppose we have some versions $a$, $b$, $c$ in that order and a fat vertex with slots for $a$ and $c$. 
+Suppose we have some versions $a$, $b$, $c$ in that order and a fat node with slots for $a$ and $c$. 
 When we insert a successor $a'$ to version $a$ and a slot for that version, we inadvertently also modify the state of the vertex for version $b$. 
 Therefore, we also need to undo the changes by creating $a''$, a successor to $a'$, and inserting a slot for $a''$ directly after the slot for $a'$. 
-This would result in a fat vertex with slots for $a$, $a'$, $a''$, and $c$.
+This would result in a fat node with slots for $a$, $a'$, $a''$, and $c$.

 Before moving to the next section, we remark that Tsakalidis found a method to get $\O(1)$ amortized complexity for insert and delete with weight-balanced trees via indirection.

@@ -348,7 +342,14 @@ Similarly, if expansion of the array is needed, this can be amortized through pa

 \exercises

-\ex{Consider what properties of (semi-)persistent binary search trees change when capacity of fat vertices is increased above the value used in the proof.}
+\ex{Consider which properties of (semi-)persistent binary search trees change when the capacity of fat nodes is increased above the value used in the proof.}
+
+\ex{%
+What if we defined fat nodes for semi-persistence differently? 
+Suppose slots only contained values of changed fields and pointers. 
+Thus each fat node would carry a set of default values for every field and pointer; these values are used unless overridden by a slot. 
+How would the complexity change?
+}

 \endexercises
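
As an illustration of the semi-persistent scheme described in the first hunk, the following is a minimal Python sketch of a single vertex represented as a doubly-linked list of bounded fat nodes. It is not taken from the patched text: the names `FatNode`, `VersionedVertex`, `write`, and `read` are hypothetical, versions are assumed to be increasing integers, and inverse pointers (and hence the propagation of pointer updates to other vertices) are deliberately left out. In the full structure the right fat node is reached through a pointer stored in another vertex's slot; walking back along the list, as `read` does here, is a simplification.

```python
class FatNode:
    """One fat node: at most `capacity` slots, each a (version, fields) pair."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.slots = []   # kept in increasing version order
        self.prev = None  # neighbours in the doubly-linked list of this vertex
        self.next = None


class VersionedVertex:
    """List of fat nodes holding the whole history of one tree vertex."""

    def __init__(self, capacity, version, **fields):
        self.last = FatNode(capacity)
        self.last.slots.append((version, dict(fields)))
        self.allocations = 1  # number of fat nodes allocated so far

    def write(self, version, **changes):
        """Record changes made in `version` (semi-persistence: latest only)."""
        last_version, last_fields = self.last.slots[-1]
        assert version >= last_version, "only the newest version may change"
        if version == last_version:
            # Second or later change within one operation: reuse the slot.
            last_fields.update(changes)
            return
        # New slot copies unchanged fields from the preceding slot.
        fields = dict(last_fields)
        fields.update(changes)
        if len(self.last.slots) == self.last.capacity:
            # All slots occupied: allocate a new fat node at the list's end.
            node = FatNode(self.last.capacity)
            node.prev = self.last
            self.last.next = node
            self.last = node
            self.allocations += 1
        self.last.slots.append((version, fields))

    def read(self, version):
        """Return the fields valid in `version`."""
        node = self.last
        while node is not None and node.slots[0][0] > version:
            node = node.prev  # simplification, see the lead-in
        if node is None:
            raise KeyError(version)
        # At most `capacity` slots are scanned: constant work per fat node.
        for slot_version, fields in reversed(node.slots):
            if slot_version <= version:
                return fields


v = VersionedVertex(2, 1, key=5, color="red")
v.write(2, color="black")
v.write(2, key=6)   # same version, so no new slot is used
v.write(3, key=7)   # both slots are full, so a second fat node is allocated
```

With a capacity of two slots, the three versions above occupy two fat nodes, and each `read` scans at most two slots of the node it lands in.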