Commit 9aacbf01 authored by Martin Mareš

Intro: Flexible arrays

parent b64a5a87
@@ -171,4 +171,89 @@ which hold the program's input.
\section{Amortized analysis}
There is a~recurring pattern in the study of data structures: operations which take a~long
time in the worst case, but typically much less. In many cases, we can prove that the worst-case
time of a~sequence of $n$~such operations is much less than $n$~times the worst-case time
of a~single operation. This leads to the concept of \em{amortized complexity,} but before
we define it rigorously, let us see several examples of such phenomena.
\subsection{Flexible arrays}
It often happens that we want to store data in an~array (so that it can be accessed in arbitrary
order), but we cannot predict how much data will arrive. Most programming languages offer some kind
of flexible arrays (e.g., \|std::vector| in \Cpp) which can be resized at will. We will show how
to implement a~flexible array efficiently.
Suppose that we allocate an array of some \em{capacity}~$C$, which will contain some $n$~items.
The number of items will be called the \em{size} of the array. Initially, the capacity will be some
constant and the size will be zero. Items start arriving one by one; we append them to the array,
and the size gradually increases. Once we hit the capacity, we need to \em{reallocate} the array:
create a~new array of some higher capacity~$C'$, copy all items to it, and delete the old array.
An~ordinary append takes constant time, but a~reallocation requires $\Theta(C')$ time. However,
if we choose the new capacity wisely, reallocations will be infrequent. Let us assume that the
initial capacity is~1 and we always double capacity on reallocations. Hence the capacity after
$k$~reallocations will be exactly~$2^k$.
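
In code, the append operation could look as follows. This is only a~simplified \Cpp{} sketch with
a~fixed item type and no error handling; the name \|FlexArray| is ours, chosen for illustration.
\begtt
#include <algorithm>

// A minimal flexible array of integers: append only; the capacity
// doubles whenever the array becomes full.
struct FlexArray {
    int *items = new int[1];    // allocated storage
    int capacity = 1;           // C: how many items fit
    int size = 0;               // n: how many items are currently stored

    void append(int x) {
        if (size == capacity) {                         // full: reallocate
            int new_capacity = 2 * capacity;
            int *new_items = new int[new_capacity];
            std::copy(items, items + size, new_items);  // Theta(C') copying
            delete[] items;
            items = new_items;
            capacity = new_capacity;
        }
        items[size++] = x;                              // ordinary append: constant time
    }

    ~FlexArray() { delete[] items; }
};
\endtt
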
If we append $n$~items, all reallocations together take time
$\Theta(2^0 + 2^1 + \ldots + 2^k)$
for~$k$ such that $2^{k-1} < n \le 2^k$ (after the $k$-th reallocation the array was large enough,
but it wasn't before). This implies that $n \le 2^k < 2n$.
Hence $2^0 + \ldots + 2^k = 2^{k+1}-1 \in \Theta(n)$.
We can conclude that while a~single append can take $\Theta(n)$ time, all $n$~appends also take $\Theta(n)$
time, as if each append took constant time only. We will say that the amortized complexity of a~single append
is constant.
This type of analysis is sometimes called the \df{aggregation method} --- instead of considering
each operation separately, we aggregated them and found an upper bound on the total time.
\subsection{Shrinkable arrays}
What if we also want to remove elements from the flexible array? For example, we might want to
use it to implement a~stack. When an element is removed, we need not change the capacity, but that
could lead to wasting lots of memory. (Imagine a~case in which we have $n$~items which we move
between $n$~stacks. This could consume $\Theta(n^2)$ cells of memory.)
We modify the flexible array so that it will both \em{stretch} (to $C'=2C$) and \em{shrink}
(to $C'=\max(C/2,1)$) as necessary. If the initial capacity is~1, all capacities will again be
powers of two. We will try to maintain the invariant that $C\in\Theta(n)$, so at
most a~constant fraction of memory will be wasted.
An~obvious strategy would be to stretch the array if $n>C$ and shrink it when $n<C/2$. However,
that would have a~bad worst case: Suppose that we have $n=C=C_0$ for some even~$C_0$. When we append
an~item, we cause the array to stretch, getting $n=C_0+1$, $C=2C_0$. Now we remove two items, which
causes a~shrink, after which we have $n=C_0-1$, $C=C_0$. Appending one more item returns the structure
to the initial state. Therefore, we have a~sequence of 4~operations which makes the structure
stretch and shrink, spending time $\Theta(C_0)$ for an~arbitrarily high~$C_0$. All hopes for constant
amortized time per operation are therefore lost.
The problem is that stretching a~\uv{full} array leads to an~\uv{almost empty} array; similarly,
shrinking an~\uv{empty} array gives us an~\uv{almost full} array. We need to design better rules,
such that after a~stretch or a~shrink, the array will be far from being empty or full.
We will stretch when $n>C$ and shrink when $n<C/4$. Intuitively, this should work: both stretching
and shrinking lead to $n = C/2 \pm 1$, where $C$ is the new capacity. We are going to prove that this is a~good choice.
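
Before the proof, here is how the two rules might look in code. Again, this is only a~sketch:
\|reallocate| is an~illustrative helper, and removing from an~empty array is not handled.
\begtt
#include <algorithm>

// Shrinkable flexible array (sketch): stretch to 2C when n would exceed C,
// shrink to max(C/2, 1) when n drops below C/4.
struct FlexArray {
    int *items = new int[1];
    int capacity = 1;           // C
    int size = 0;               // n

    void reallocate(int new_capacity) {      // takes Theta(new capacity) time
        int *new_items = new int[new_capacity];
        std::copy(items, items + size, new_items);
        delete[] items;
        items = new_items;
        capacity = new_capacity;
    }

    void append(int x) {
        if (size + 1 > capacity)             // stretching rule: n > C
            reallocate(2 * capacity);
        items[size++] = x;
    }

    int remove() {                           // removes and returns the last item
        int x = items[--size];               // (assumes size > 0)
        if (size < capacity / 4)             // shrinking rule: n < C/4
            reallocate(std::max(capacity / 2, 1));
        return x;
    }

    ~FlexArray() { delete[] items; }
};
\endtt
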
Let us consider an~arbitrary sequence of $m$~operations, each being either an~append or a~removal.
We split the sequence into \em{blocks,} where a~block ends when the array is reallocated (or when
the whole sequence of operations ends). For each block, we analyze the cost of the reallocation at its end:
\list{o}
\:The first block starts with capacity~1, so its reallocation takes constant time.
\:The last block does not end with a~reallocation.
\:All other blocks start with a~reallocation, so at their beginning we have $n=C/2$. If the block ends with
a~stretch, $n$~must have increased to~$C$ during the block. If it ends with a~shrink, $n$~must have
dropped to~$C/4$. In both cases, the block must contain at least $C/4$ operations. Hence we can
redistribute the $\Theta(C)$ cost of the reallocation to $\Theta(C)$ operations, each getting $\Theta(1)$
time (see the calculation after the list).
\endlist
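
To make the bookkeeping explicit (this is just one way of writing it down; the constants $a$ and~$b$
merely name the constants hidden in the $\Theta$'s): let there be $t$~blocks, the $i$-th of them
containing $m_i$~operations, so that $m_1+\ldots+m_t=m$. Every operation costs at most a~constant~$a$
outside reallocations, and the reallocation ending a~block costs at most $b\cdot C\le 4b\cdot m_i$,
because the block contains at least $C/4$ operations (the first block's reallocation costs only
a~constant and the last block has none). Altogether, the total time is at most
$$
  \sum_{i=1}^{t} a\cdot m_i + \sum_{i=1}^{t} 4b\cdot m_i + {\rm const}
  = (a + 4b)\cdot m + {\rm const},
$$
which is linear in~$m$.
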
We have proved that the total time of all operations can be redistributed in such a~way that each operation
gets only $\Theta(1)$ units of time. Hence a~sequence of~$m$ operations takes $\Theta(m)$ time, assuming
that we started with an~empty array.
This is a~common technique, usually called the \em{accounting method.} It redistributes time
between operations so that the total time remains the same, but the worst-case time per operation decreases.
\endchapter
@@ -270,6 +270,9 @@
\let\plaintilde=~
\protected\def~{\plaintilde}
% C++
\def\Cpp{C{\tt ++}}
%%% Fonts %%%
\def\chapfont{\setfonts[LMRoman/10]\bf}