\ifx\chapter\undefined
\input adsmac.tex
\fi
\def\optable#1{$$
\def\cr{\crcr\noalign{\smallskip}}
\vbox{\halign{
\hbox to 9em{##\hfil}&\vtop{\hsize=0.65\hsize\parindent=0pt\strut ##\strut}\crcr
#1
\noalign{\vskip-\smallskipamount}
}}$$}
\chapter[intro]{Introduction}
\section{Examples of data structures}
Generally, a~data structure is a~\uv{black box}, which contains some data and
allows \df{operations} on the data. Some operations are \em{queries} on the current
state of data, some are \em{updates} which modify the data. The data are encapsulated
within the structure, so that they can be accessed only through the operations.
A~\em{static} data structure is built once and then answers an unlimited number of queries,
while the data stay constant. A~\em{dynamic} structure allows updates.
We usually separate the \df{interface} of the structure (i.e., the set of operations
supported and their semantics) from its \df{implementation} (i.e., the layout of data
in memory and procedures handling the operations).
\subsection{Queues and stacks}
A~\df{queue} is a~sequence of items, which supports the following
operations:
\optable{
$\opdf{Enqueue}(x)$ & Append a~new item~$x$ at the head. \cr
$\opdf{Dequeue}$ & Remove the item at the tail and return it. \cr
$\opdf{IsEmpty}$ & Test if the queue is currently empty. \cr
}
Here the \em{head} is the last item of the sequence and the \em{tail} is the first one.
There are two obvious ways to implement a~queue:
\list{o}
\:A~\em{linked list} -- for each item, we create a~record in memory, which contains
the item's data and a~pointer to the next item. Additionally, we keep pointers to the
head and tail. Obviously, all three operations run in constant time.
\:An~\em{array} -- if we know an upper bound on the number of items, we can store
them in a~cyclically indexed array and keep the indices of the head and the tail
(a~sketch in~C follows after this list). Again, all operations take constant time.
\endlist
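
To make the cyclic-array variant concrete, here is a~minimal sketch in~C.
The capacity bound and all names are chosen just for this illustration,
and the sketch omits overflow checks:

\begtt
#include <stdbool.h>

#define CAPACITY 1024              /* assumed upper bound on the number of items */

typedef struct {
    int items[CAPACITY];           /* cyclically indexed storage */
    int head;                      /* index one past the last item (where we enqueue) */
    int tail;                      /* index of the first item (where we dequeue) */
    int count;                     /* current number of items */
} Queue;

void queue_init(Queue *q) { q->head = q->tail = q->count = 0; }

bool is_empty(const Queue *q) { return q->count == 0; }

void enqueue(Queue *q, int x)      /* append a new item x at the head */
{
    q->items[q->head] = x;
    q->head = (q->head + 1) % CAPACITY;
    q->count++;
}

int dequeue(Queue *q)              /* remove the item at the tail and return it */
{
    int x = q->items[q->tail];
    q->tail = (q->tail + 1) % CAPACITY;
    q->count--;
    return x;
}
\endtt
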
A~similar data structure is the \df{stack} --- a~sequence where both addition and removal
of items happen at the same end.
\subsection{Sets and dictionaries}
Another typical interface of a~data structure is the \df{set.} It contains a~finite subset
of some \df{universe}~${\cal U}$ (e.g., the set of all integers). The typical operations on a~set are:
\optable{
$\opdf{Insert}(x)$ & Add an element $x\in{\cal U}$ to the set. If it was already
present, nothing happens. \cr
$\opdf{Delete}(x)$ & Delete an element $x\in{\cal U}$ from the set. If it was not
present, nothing happens. \cr
$\opdf{Find}(x)$ & Check if $x\in{\cal U}$ is an~element of the set. Also called
\opdf{Member} or \opdf{Lookup}. \cr
$\opdf{Build}(x_1,\ldots,x_n)$ & Construct a~new set containing the elements given. In some cases,
this can be faster than inserting the elements one by one. \cr
}
Here are the typical implementations of sets together with the corresponding
complexities of operations ($n$~denotes the cardinality of the set):
$$\vbox{\halign{
#\hfil\qquad&&$#$\hfil~~\cr
& \alg{Insert} & \alg{Delete} & \alg{Find} & \alg{Build} \cr
\noalign{\smallskip}
Linked list & \O(n) & \O(n) & \O(n) & \O(n) \cr
Array & \O(n) & \O(n) & \O(n) & \O(n) \cr
Sorted array & \O(n) & \O(n) & \O(\log n) & \O(n\log n) \cr
Binary search tree & \O(\log n) & \O(\log n) & \O(\log n) & \O(n\log n) \cr
Hash table & \O(1) & \O(1) & \O(1) & \O(n) \cr
}}$$
An~\alg{Insert} to a~linked list or to an~array can be performed in constant
time if we can avoid checking whether the element is already present in the set.
While sorted arrays are quite slow as dynamic structures, they are efficient
statically: once built in $\O(\log n)$ time per element, they answer queries
in $\O(\log n)$.
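
For illustration, here is a~minimal sketch of \alg{Find} on a~sorted array in~C;
the element type and the function name are chosen only for this example:

\begtt
#include <stdbool.h>
#include <stddef.h>

/* Find(x) on a sorted array a[0..n-1]: O(log n) comparisons. */
bool sorted_find(const int *a, size_t n, int x)
{
    size_t lo = 0, hi = n;         /* search the half-open interval [lo, hi) */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (a[mid] == x)
            return true;
        else if (a[mid] < x)
            lo = mid + 1;          /* x can lie only to the right of mid */
        else
            hi = mid;              /* x can lie only to the left of mid */
    }
    return false;                  /* interval is empty, x is not present */
}
\endtt
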
Hash tables achieve constant complexity of operations only on average.
This will be formulated precisely and proven in the chapter on hashing.
Also, while the other implementations can work with an arbitrary universe
as long as two elements can be compared in constant time, hash tables require
arithmetic operations.
A~useful extension of sets are \dfr{dictionaries}{dictionary}. They store a~set of
distinct \em{keys,} each associated with a~\em{value} (possibly coming from a~different
universe). That is, they behave as generalized arrays indexed by arbitrary keys.
Most implementations of sets can be extended to dictionaries by keeping the value at
the place where the key is stored.
Sometimes, we also consider \em{multisets,} in which an~element can be present multiple
times. A~multiset can be represented by a~dictionary where the value counts occurrences
of the key in the set.
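
A~tiny sketch of this representation in~C follows. It uses the simplest possible
dictionary, an unsorted array of (key,~count) pairs of fixed capacity; the names and
the capacity are made up for the illustration:

\begtt
#include <stddef.h>

typedef struct {
    int key;
    int count;                     /* number of occurrences of key in the multiset */
} Entry;

typedef struct {
    Entry entries[256];            /* toy fixed-capacity dictionary */
    size_t size;
} Multiset;

/* Insert one occurrence of x: increment its counter, creating the entry if needed. */
void multiset_insert(Multiset *m, int x)
{
    for (size_t i = 0; i < m->size; i++)
        if (m->entries[i].key == x) { m->entries[i].count++; return; }
    m->entries[m->size].key = x;   /* no overflow check in this sketch */
    m->entries[m->size].count = 1;
    m->size++;
}

/* How many times is x present? Zero means not in the multiset. */
int multiset_count(const Multiset *m, int x)
{
    for (size_t i = 0; i < m->size; i++)
        if (m->entries[i].key == x)
            return m->entries[i].count;
    return 0;
}
\endtt
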
\subsection{Ordered sets}
Sets can be extended by order operations:
\optable{
\opdf{Min}, \opdf{Max} & Return the minimum or maximum element of the set. \cr
$\opdf{Succ}(x)$ & Return the successor of $x\in{\cal U}$ --- the smallest
element of the set which is greater than~$x$ (if it exists).
The element~$x$ itself need not be present in the set. \cr
$\opdf{Pred}(x)$ & Similarly, the predecessor of~$x$ is the largest element
smaller than~$x$. \cr
}
Except for hash tables, all our implementations of sets can support order operations:
$$\vbox{\halign{
#\hfil\qquad&&$#$\hfil~~\cr
& \alg{Min}/\alg{Max} & \alg{Pred}/\alg{Succ} \cr
\noalign{\smallskip}
Linked list & \O(n) & \O(n) \cr
Array & \O(n) & \O(n) \cr
Sorted array & \O(1) & \O(\log n) \cr
Binary search tree & \O(\log n) & \O(\log n) \cr
}}$$
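
For example, \alg{Succ} on a~sorted array is a~binary search for the first element
greater than~$x$. A~minimal sketch in~C (names chosen just for this illustration):

\begtt
#include <stdbool.h>
#include <stddef.h>

/* Succ(x) on a sorted array a[0..n-1]: the smallest element greater than x.
   Returns true and stores the successor in *out, or false if it does not exist. */
bool sorted_succ(const int *a, size_t n, int x, int *out)
{
    size_t lo = 0, hi = n;         /* candidate positions form the interval [lo, hi) */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (a[mid] > x)
            hi = mid;              /* a[mid] is a candidate, try to find a smaller one */
        else
            lo = mid + 1;          /* a[mid] <= x, the successor lies to the right */
    }
    if (lo == n)
        return false;              /* all elements are <= x */
    *out = a[lo];
    return true;
}
\endtt
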
A~sequence can be sorted by $n$~calls to \alg{Insert}, one \alg{Min}, and $n$~calls to \alg{Succ}.
If the elements can only be compared, the standard lower bound for comparison-based sorting
implies that at least one of \alg{Insert} and \alg{Succ} takes $\Omega(\log n)$ time.
Similarly, we can define ordered dictionaries and multisets.
\exercises
\ex{%
Consider enumerating all keys in a~binary search tree using \alg{Min} and
$n$~calls to \alg{Succ}. Prove that although a~single \alg{Succ} can take $\Theta(\log n)$
time in the worst case, the whole enumeration takes only $\Theta(n)$ time.
}
\endexercises
\section{Model of computation}
As we need to discern minor differences in time complexity of operations,
we have to state our model of computation carefully. All our algorithms will
run on the \df{Random Access Machine} (RAM).
The memory of the RAM consists of \em{memory cells.} Each memory cell contains a~single
integer. A~memory cell is identified by its \em{address,} which is again an~integer.
For example, we can store data in individual \em{variables} (memory cells with
a~fixed address), in \em{arrays} (sequences of identically formatted items stored
in contiguous memory cells), or in \em{records} (blocks of memory containing a~fixed
number of items, each possibly of a~different type but with a~fixed layout; this works like a~{\tt struct} in~C).
Arrays and records are referred to by their starting address; we call these addresses \em{pointers,}
but formally speaking, they are still integers.
We usually assume that memory for arrays and records is obtained dynamically using
a~\em{memory allocator,} which can be asked to reserve a~given number of memory cells
and to free them again when they are no longer needed. This means that the size of an array must
be known when the array is created and cannot be changed afterwards.
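
In terms of~C, the concepts above roughly correspond to the following sketch; the
particular record, the array size, and the names are chosen only as an example, and
error checking of the allocator is omitted:

\begtt
#include <stdlib.h>

/* A record: a block of memory with a fixed number of fields at fixed offsets. */
struct node {
    int key;
    struct node *next;             /* a pointer is just the address of another record */
};

int main(void)
{
    int n = 100;

    /* An array: n identically formatted items in contiguous memory cells.
       Its size is fixed when the memory is allocated. */
    int *a = malloc(n * sizeof(int));
    a[0] = 1;

    /* A record obtained from the allocator, referred to by its starting address. */
    struct node *p = malloc(sizeof(struct node));
    p->key = 42;
    p->next = NULL;

    free(p);                       /* return the memory once it is no longer needed */
    free(a);
    return 0;
}
\endtt
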
The machine performs the usual arithmetic and logical operations (essentially those available
in the C~language) on integers in constant time. We can also compare integers and perform
both conditional and unconditional jumps. The constant time per operation is reasonable as
long as the integers are not too large --- for concreteness, let us assume that all values
computed by our algorithms are polynomial in the size of the input and in the maximum absolute
value given in the input.
Memory will be measured in memory cells \em{used} by the algorithm. A~cell is used if it
lies between the smallest and largest address accessed by the program. Please keep in mind
that when the program starts, all memory cells contain undefined values except for those
which hold the program's input.
\section{Amortized analysis}
\endchapter