ds2-notes
Commit b64a5a87, authored 6 years ago by Martin Mareš

Intro: Examples, model of computation

parent f2808c15
Showing 3 changed files, with 174 additions and 687 deletions.
01-test/Makefile → 01-intro/Makefile (+0 −0): file moved.
01-intro/intro.tex (new file, mode 100644; +174 −0):
\ifx\chapter\undefined
\input adsmac.tex
\fi

\def\optable#1{$$\def\cr{\crcr\noalign{\smallskip}}
\vbox{\halign{\hbox to 9em{##\hfil}&
\vtop{\hsize=0.65\hsize \parindent=0pt \strut##\strut}\crcr
#1
\noalign{\vskip-\smallskipamount}}}$$}
\chapter[intro]{Introduction}

\section{Examples of data structures}
Generally, a~data structure is a~\uv{black box}, which contains some data and
allows \df{operations} on the data. Some operations are \em{queries} on the current
state of the data, some are \em{updates} which modify the data. The data are encapsulated
within the structure, so that they can be accessed only through the operations.

A~\em{static} data structure is built once and then answers an unlimited number of queries,
while the data stay constant. A~\em{dynamic} structure also allows updates.

We usually separate the \df{interface} of the structure (i.e., the set of operations
supported and their semantics) from its \df{implementation} (i.e., the layout of data
in memory and the procedures handling the operations).
\subsection{Queues and stacks}
A~\df{queue} is a~sequence of items, which supports the following operations:

\optable{
$\opdf{Enqueue}(x)$ & Append a~new item~$x$ at the head. \cr
$\opdf{Dequeue}$ & Remove the item at the tail and return it. \cr
$\opdf{IsEmpty}$ & Test if the queue is currently empty. \cr
}
Here the \em{head} is the last item of the sequence and the \em{tail} is the first one.
There are two obvious ways to implement a~queue:

\list{o}

\: A~\em{linked list} -- for each item, we create a~record in memory, which contains
the item's data and a~pointer to the next item. Additionally, we keep pointers to the
head and the tail. Obviously, all three operations run in constant time.

\: An~\em{array} -- if we know an upper limit on the number of items, we can store
them in a~cyclically indexed array and keep the indices of the head and the tail.
Again, the time complexity of all operations is constant.

\endlist
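For concreteness, the array variant can be sketched in~C as follows. The type and
function names, and the fixed capacity, are our own illustrative choices; the notes do
not prescribe an interface.

```c
#include <stdbool.h>

#define QUEUE_CAP 16            /* assumed upper limit on the number of items */

typedef struct {
    int items[QUEUE_CAP];
    int head;                   /* index where the next item will be appended */
    int tail;                   /* index of the first (oldest) item */
    int count;                  /* current number of items */
} Queue;

void queue_init(Queue *q) { q->head = q->tail = q->count = 0; }

bool queue_is_empty(const Queue *q) { return q->count == 0; }

/* Enqueue(x): append x at the head; runs in O(1). */
bool enqueue(Queue *q, int x) {
    if (q->count == QUEUE_CAP) return false;    /* queue is full */
    q->items[q->head] = x;
    q->head = (q->head + 1) % QUEUE_CAP;        /* cyclic indexing */
    q->count++;
    return true;
}

/* Dequeue: remove and return the item at the tail; runs in O(1). */
int dequeue(Queue *q) {
    int x = q->items[q->tail];
    q->tail = (q->tail + 1) % QUEUE_CAP;
    q->count--;
    return x;
}
```

The modulo wrap-around is what makes the array \uv{cyclically indexed}: indices run
around the buffer instead of past its end.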
A~similar data structure is the \df{stack} --- a~sequence where both addition and removal
of items happen at the same end.
\subsection{Sets and dictionaries}
Another typical interface of a~data structure is the \df{set.} It contains a~finite subset
of some \df{universe}~${\cal U}$ (e.g., the set of all integers). The typical operations
on a~set are:
\optable{
$\opdf{Insert}(x)$ & Add an element $x\in{\cal U}$ to the set. If it was already
present, nothing happens. \cr
$\opdf{Delete}(x)$ & Delete an element $x\in{\cal U}$ from the set. If it was not
present, nothing happens. \cr
$\opdf{Find}(x)$ & Check if $x\in{\cal U}$ is an~element of the set. Also called
\opdf{Member} or \opdf{Lookup}. \cr
$\opdf{Build}(x_1,\ldots,x_n)$ & Construct a~new set containing the given elements.
In some cases, this can be faster than inserting the elements one by one. \cr
}
Here are the typical implementations of sets together with the corresponding
complexities of operations ($n$~denotes the cardinality of the set):
$$\vbox{\halign{#\hfil\qquad &&$#$\hfil~~\cr
& \alg{Insert} & \alg{Delete} & \alg{Find} & \alg{Build} \cr
\noalign{\smallskip}
Linked list & \O(n) & \O(n) & \O(n) & \O(n) \cr
Array & \O(n) & \O(n) & \O(n) & \O(n) \cr
Sorted array & \O(n) & \O(n) & \O(\log n) & \O(n\log n) \cr
Binary search tree & \O(\log n) & \O(\log n) & \O(\log n) & \O(n\log n) \cr
Hash table & \O(1) & \O(1) & \O(1) & \O(n) \cr
}}$$
An~\alg{Insert} to a~linked list or to an~array can be performed in constant
time if we can avoid checking whether the element is already present in the set.

While sorted arrays are quite slow as dynamic structures, they are efficient
statically: once built in $\O(\log n)$ time per element, they answer queries
in $\O(\log n)$ time.

Hash tables achieve constant complexity of operations only on average.
This will be formulated precisely and proven in the chapter on hashing.
Also, while the other implementations can work with an arbitrary universe
as long as two elements can be compared in constant time, hash tables require
arithmetic operations on the elements.
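The $\O(\log n)$ \alg{Find} on a~sorted array is plain binary search. A~possible
sketch in~C (the function name is our own):

```c
#include <stdbool.h>

/* Find(x) on a sorted array a[0..n-1]: binary search, O(log n) comparisons.
 * Each step halves the range [lo, hi] that may still contain x. */
bool sorted_find(const int *a, int n, int x) {
    int lo = 0, hi = n - 1;
    while (lo <= hi) {
        int mid = lo + (hi - lo) / 2;   /* written to avoid overflow of lo + hi */
        if (a[mid] == x) return true;
        if (a[mid] < x)
            lo = mid + 1;               /* x can only be in the right half */
        else
            hi = mid - 1;               /* x can only be in the left half */
    }
    return false;
}
```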
A~useful extension of sets are \dfr{dictionaries}{dictionary}. They store a~set of
distinct \em{keys,} each associated with a~\em{value} (possibly coming from a~different
universe). That is, they behave as generalized arrays indexed by arbitrary keys.
Most implementations of sets can be extended to dictionaries by keeping the value at
the place where the key is stored.
Sometimes, we also consider \em{multisets,} in which an~element can be present multiple
times. A~multiset can be represented by a~dictionary where the value counts occurrences
of the key in the multiset.
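A~possible sketch of this representation in~C. For brevity, the dictionary itself is
an unsorted array of (key, count) records, so the operations take $\O(n)$ time; any set
implementation from the table above could be substituted. All names are illustrative.

```c
#define MS_CAP 64               /* assumed upper limit on distinct keys */

typedef struct { int key, count; } Entry;
typedef struct { Entry e[MS_CAP]; int n; } Multiset;

/* How many times is key present in the multiset? Absent keys count 0. */
int ms_count(const Multiset *m, int key) {
    for (int i = 0; i < m->n; i++)
        if (m->e[i].key == key) return m->e[i].count;
    return 0;
}

/* Insert one occurrence of key: bump its counter, or create a new record. */
void ms_insert(Multiset *m, int key) {
    for (int i = 0; i < m->n; i++)
        if (m->e[i].key == key) { m->e[i].count++; return; }
    m->e[m->n].key = key;       /* assumes m->n < MS_CAP */
    m->e[m->n].count = 1;
    m->n++;
}
```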
\subsection{Ordered sets}
Sets can be extended by order operations:

\optable{
\opdf{Min}, \opdf{Max} & Return the minimum or maximum element of the set. \cr
$\opdf{Succ}(x)$ & Return the successor of $x\in{\cal U}$ --- the smallest
element of the set which is greater than~$x$ (if it exists).
The~$x$ itself need not be present in the set. \cr
$\opdf{Pred}(x)$ & Similarly, the predecessor of~$x$ is the largest element
smaller than~$x$. \cr
}
Except for hash tables, all our implementations of sets can support order operations:

$$\vbox{\halign{#\hfil\qquad &&$#$\hfil~~\cr
& \alg{Min}/\alg{Max} & \alg{Pred}/\alg{Succ} \cr
\noalign{\smallskip}
Linked list & \O(n) & \O(n) \cr
Array & \O(n) & \O(n) \cr
Sorted array & \O(1) & \O(\log n) \cr
Binary search tree & \O(\log n) & \O(\log n) \cr
}}$$
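The $\O(\log n)$ \alg{Succ} on a~sorted array is again a~binary search, this time for
the leftmost element greater than~$x$; note that $x$~itself need not be present.
A~possible sketch in~C (names are ours):

```c
#include <stdbool.h>

/* Succ(x) on a sorted array a[0..n-1]: find the smallest element > x.
 * Invariant: every a[i] with i < lo is <= x, every a[i] with i >= hi is > x,
 * so the index of the successor (if any) always lies in [lo, hi].
 * Returns false if no element of the set is greater than x. */
bool sorted_succ(const int *a, int n, int x, int *succ) {
    int lo = 0, hi = n;
    while (lo < hi) {
        int mid = lo + (hi - lo) / 2;
        if (a[mid] > x)
            hi = mid;           /* a[mid] is a candidate successor */
        else
            lo = mid + 1;       /* a[mid] <= x: successor must lie to the right */
    }
    if (lo == n) return false;  /* x is >= the maximum of the set */
    *succ = a[lo];
    return true;
}
```

\alg{Pred} is symmetric: search for the rightmost element smaller than~$x$.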
A~sequence can be sorted by $n$~calls to \alg{Insert}, one \alg{Min}, and
$n$~calls to \alg{Succ}.
If the elements can only be compared, the standard lower bound for comparison-based sorting
implies that at least one of \alg{Insert} and \alg{Succ} takes
$\Omega(\log n)$ time.

Similarly, we can define ordered dictionaries and multisets.
\exercises

\ex{%
Consider enumeration of all keys in a~binary search tree using \alg{Min}
and $n$~calls to \alg{Succ}. Prove that although a~single \alg{Succ} requires
$\Theta(\log n)$ time in the worst case, the whole enumeration takes only
$\Theta(n)$ time.
}

\endexercises
\section{Model of computation}

As we need to discern minor differences in the time complexity of operations,
we have to state our model of computation carefully. All our algorithms will
run on the \df{Random Access Machine} (RAM).
The memory of the RAM consists of \em{memory cells.} Each memory cell contains a~single
integer and is identified by its \em{address,} which is again an~integer.
For example, we can store data in individual \em{variables} (memory cells with
a~fixed address), in \em{arrays} (sequences of identically formatted items stored
in contiguous memory cells), or in \em{records} (blocks of memory containing a~fixed
number of items, each with a~different, but fixed, layout; this works like a~{\tt struct}
in~C).
Arrays and records are referred to by their starting address; we call these addresses
\em{pointers,} but formally speaking, they are still integers.
We usually assume that memory for arrays and records is obtained dynamically using
a~\em{memory allocator,} which can be asked to reserve a~given number of memory cells
and to free them again when they are no longer needed. This means that the size of
an~array must be known when the array is created and cannot be changed afterwards.
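In~C terms, this might look as follows: a~record is a~{\tt struct}, a~pointer is its
starting address, and {\tt malloc}/{\tt free} play the role of the allocator, reserving
a~block whose size is fixed at creation time. The {\tt Node} record and the helper
function are illustrative only.

```c
#include <stdlib.h>

/* A record: a fixed number of items, each with a fixed layout. */
typedef struct node {
    int value;                  /* the item's data */
    struct node *next;          /* pointer to the next record, or NULL */
} Node;

/* Allocate an array of n records and link them into a list.
 * The array's size is fixed when it is created and cannot grow later. */
Node *make_nodes(int n) {
    Node *a = malloc(n * sizeof(Node));
    if (a == NULL) return NULL;             /* the allocator may refuse */
    for (int i = 0; i < n; i++) {
        a[i].value = i;
        a[i].next = (i + 1 < n) ? &a[i + 1] : NULL;
    }
    return a;                   /* a pointer: the block's starting address */
}
```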
The machine performs the usual arithmetic and logical operations (essentially those available
in the C~language) on integers in constant time. We can also compare integers and perform
both conditional and unconditional jumps. The constant time per operation is reasonable as
long as the integers are not too large --- for concreteness, let us assume that all values
computed by our algorithms are polynomial in the size of the input and in the maximum absolute
value appearing in the input.
Memory will be measured in the number of memory cells \em{used} by the algorithm. A~cell is
used if it lies between the smallest and the largest address accessed by the program. Please
keep in mind that when the program starts, all memory cells contain undefined values, except
for those which hold the program's input.
\section{Amortized analysis}

\endchapter
01-test/test.tex (deleted, mode 100644 → 0; +0 −687): diff collapsed, not shown.