ds2-notes · Commit 9aacbf01
Authored 6 years ago by Martin Mareš
Parent: b64a5a87

Intro: Flexible arrays
Showing 2 changed files with 88 additions and 0 deletions:

01-intro/intro.tex (+85, −0)
tex/adsmac.tex (+3, −0)
01-intro/intro.tex
@@ -171,4 +171,89 @@ which hold the program's input.
\section{Amortized analysis}

There is a~recurring pattern in the study of data structures: operations which take a~long
time in the worst case, but typically much less. In many cases, we can prove that the worst-case
time of a~sequence of $n$~such operations is much less than $n$~times the worst-case time
of a~single operation. This leads to the concept of \em{amortized complexity,} but before
we define it rigorously, let us see several examples of such phenomena.

\subsection{Flexible arrays}

It often happens that we want to store data in an~array (so that it can be accessed in arbitrary
order), but we cannot predict how much data will arrive. Most programming languages offer some kind
of flexible arrays (e.g., \|std::vector| in \Cpp) which can be resized at will. We will show how
to implement a~flexible array efficiently.

Suppose that we allocate an array of some \em{capacity}~$C$, which will contain some $n$~items.
The number of items will be called the \em{size} of the array. Initially, the capacity will be some
constant and the size will be zero. Items start arriving one by one; we append them to the array
and the size gradually increases. Once we hit the capacity, we need to \em{reallocate} the array:
create a~new array of some higher capacity~$C'$, copy all items to it, and delete the old array.

An~ordinary append takes constant time, but a~reallocation requires $\Theta(C')$ time. However,
if we choose the new capacity wisely, reallocations will be infrequent. Let us assume that the
initial capacity is~1 and we always double the capacity on reallocations. Hence the capacity after
$k$~reallocations will be exactly~$2^k$.

If we appended $n$~items, all reallocations together take time $\Theta(2^0 + 2^1 + \ldots + 2^k)$
for~$k$ such that $2^{k-1} < n \le 2^k$ (after the $k$-th reallocation the array was large enough,
but it wasn't before). This implies that $n \le 2^k < 2n$.
Hence $2^0 + \ldots + 2^k = 2^{k+1} - 1 \in \Theta(n)$.

We can conclude that while a~single append can take $\Theta(n)$ time, all $n$~appends also take
$\Theta(n)$ time, as if each append took constant time only. We will say that the amortized
complexity of a~single append is constant.

This type of analysis is sometimes called the \df{aggregation method} --- instead of considering
each operation separately, we aggregated them and found an upper bound on the total time.
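To see the doubling rule in code, here is a~minimal \Cpp sketch of the append operation
(a~sketch only: the struct name, the fixed item type, and the lack of error handling are
illustrative choices, not part of the notes):

#include <cstddef>

// Minimal sketch of a flexible array with capacity doubling.
// Illustrative only: item type fixed to int, no copy control.
struct FlexArray {
    int *data;
    std::size_t size;      // n: number of stored items
    std::size_t capacity;  // C: allocated slots

    FlexArray() : data(new int[1]), size(0), capacity(1) {}
    ~FlexArray() { delete[] data; }

    void append(int x) {
        if (size == capacity) {               // array is full: reallocate
            std::size_t new_cap = 2 * capacity;
            int *new_data = new int[new_cap]; // Theta(C') work
            for (std::size_t i = 0; i < size; i++)
                new_data[i] = data[i];
            delete[] data;
            data = new_data;
            capacity = new_cap;
        }
        data[size++] = x;                     // ordinary append: O(1)
    }
};

Appending $n$~items to such an array triggers reallocations exactly at sizes $1, 2, 4, \ldots$,
which is the geometric sum analyzed above.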
\subsection{Shrinkable arrays}

What if we also want to remove elements from the flexible array? For example, we might want to
use it to implement a~stack. When an element is removed, we need not change the capacity, but that
could lead to wasting lots of memory. (Imagine a~case in which we have $n$~items which we move
between $n$~stacks. This could consume $\Theta(n^2)$ cells of memory.)

We modify the flexible array, so that it will both \em{stretch} (to $C' = 2C$) and \em{shrink}
(to $C' = \max(C/2, 1)$) as necessary. If the initial capacity is~1, all capacities will again be
powers of two. We will try to maintain the invariant that $C \in \Theta(n)$, so at most
a~constant fraction of memory will be wasted.

An~obvious strategy would be to stretch the array when $n > C$ and shrink it when $n < C/2$.
However, that would have a~bad worst case: Suppose that we have $n = C = C_0$ for some even~$C_0$.
When we append an~item, we cause the array to stretch, getting $n = C_0 + 1$, $C = 2C_0$. Now we
remove two items, which causes a~shrink, after which we have $n = C_0 - 1$, $C = C_0$. Appending
one more item returns the structure back to the initial state. Therefore, we have a~sequence of
4~operations which makes the structure stretch and shrink, spending time $\Theta(C_0)$ for
an~arbitrarily high~$C_0$. All hopes for constant amortized time per operation are therefore lost.

The problem is that stretching a~\uv{full} array leads to an~\em{almost empty} array; similarly,
shrinking an~\uv{empty} array gives us an~\uv{almost full} array. We need to design better rules
such that an array after a~stretch or shrink will be far from being empty or full.

We will stretch when $n > C$ and shrink when $n < C/4$. Intuitively, this should work: both
stretching and shrinking lead to $n = C/2 \pm 1$. We are going to prove that this is a~good choice.

Let us consider an~arbitrary sequence of $m$~operations, each being either an~append or
a~removal. We split the sequence into \em{blocks,} where a~block ends when the array is
reallocated (or when the whole sequence of operations ends). For each block, we analyze the cost
of the reallocation at its end:

\list{o}
\:The first block starts with capacity~1, so its reallocation takes constant time.
\:The last block does not end with a~reallocation.
\:All other blocks start with a~reallocation, so at their beginning we have $n = C/2$. If a~block
ends with a~stretch, $n$~must have increased to~$C$ during the block. If it ends with a~shrink,
$n$~must have dropped to~$C/4$. In both cases, the block must contain at least $C/4$ operations.
Hence we can redistribute the $\Theta(C)$ cost of the reallocation to $\Theta(C)$ operations,
each getting $\Theta(1)$ time.
\endlist

We have proved that the total time of all operations can be redistributed in a~way such that each
operation gets only $\Theta(1)$ units of time. Hence a~sequence of~$m$ operations takes
$\Theta(m)$ time, assuming that we started with an~empty array.

This is a~common technique, which is usually called the \em{accounting method.} It redistributes
time between operations so that the total time remains the same, but the worst-case time
decreases.
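In code, the policy from this subsection might look as follows (again only a~sketch with
illustrative names: we stretch to $2C$ when an append would exceed~$C$ and shrink to $C/2$ when
$n$ drops below $C/4$):

#include <cstddef>

// Sketch of a shrinkable array with the n > C stretch rule and the
// n < C/4 shrink rule. Illustrative only; assumes pop() is never
// called on an empty array.
struct ShrinkableArray {
    int *data;
    std::size_t size;      // n
    std::size_t capacity;  // C

    ShrinkableArray() : data(new int[1]), size(0), capacity(1) {}
    ~ShrinkableArray() { delete[] data; }

    void reallocate(std::size_t new_cap) {  // Theta(C') copy
        int *new_data = new int[new_cap];
        for (std::size_t i = 0; i < size; i++)
            new_data[i] = data[i];
        delete[] data;
        data = new_data;
        capacity = new_cap;
    }

    void push(int x) {
        if (size == capacity)          // appending would give n > C
            reallocate(2 * capacity);  // stretch: now n = C'/2 + 1
        data[size++] = x;
    }

    int pop() {
        int x = data[--size];
        if (capacity > 1 && 4 * size < capacity)  // n < C/4
            reallocate(capacity / 2);             // shrink: now n = C'/2 - 1
        return x;
    }
};

Notice that after either reallocation the array is roughly half full, which is exactly the
property the block analysis above relies on.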
\endchapter
tex/adsmac.tex
@@ -270,6 +270,9 @@
\let\plaintilde=~
\protected\def~{\plaintilde}

% C++
\def\Cpp{C{\tt ++}}

%%% Fonts %%%

\def\chapfont{\setfonts[LMRoman/10]\bf}
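The three added lines define the \Cpp macro used by the new section in intro.tex above: it
typesets the letter~C followed by \uv{++} in typewriter type, so \|std::vector| in \Cpp renders
as std::vector in C++.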