\ifx\chapter\undefined
\input adsmac.tex
\fi
\def\optable#1{$$
\def\cr{\crcr\noalign{\smallskip}}
\vbox{\halign{
\hbox to 9em{##\hfil}&\vtop{\hsize=0.65\hsize\parindent=0pt\strut ##\strut}\crcr
#1
\noalign{\vskip-\smallskipamount}
}}$$}
\chapter[intro]{Introduction}
\section{Examples of data structures}
Generally, a~data structure is a~\uv{black box}, which contains some data and
allows \df{operations} on the data. Some operations are \em{queries} on the current
state of data, some are \em{updates} which modify the data. The data are encapsulated
within the structure, so that they can be accessed only through the operations.
A~\em{static} data structure is built once and then answers an unlimited number of queries,
while the data stay constant. A~\em{dynamic} structure allows updates.
We usually separate the \df{interface} of the structure (i.e., the set of operations
supported and their semantics) from its \df{implementation} (i.e., the layout of data
in memory and procedures handling the operations).
\subsection{Queues and stacks}
A~\df{queue} is a~sequence of items, which supports the following
operations:
\optable{
$\opdf{Enqueue}(x)$ & Append a~new item~$x$ at the head. \cr
$\opdf{Dequeue}$ & Remove the item at the tail and return it. \cr
$\opdf{IsEmpty}$ & Test if the queue is currently empty. \cr
}
Here the \em{head} is the last item of the sequence and the \em{tail} is the first one.
There are two obvious ways to implement a~queue:
\list{o}
\:A~\em{linked list} -- for each item, we create a~record in memory, which contains
the item's data and a~pointer to the next item. Additionally, we keep pointers to the
head and tail. Obviously, all three operations run in constant time.
\:An~\em{array} -- if we know an upper bound on the number of items, we can store
them in a~cyclically indexed array and keep the indices of the head and the tail
(a~sketch in~C follows after this list). Again, all operations take constant time.
\endlist
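
To make the cyclic-array variant concrete, here is a~minimal sketch in~C.
The capacity bound and all names are chosen just for this illustration,
and the sketch omits overflow checks:

\begtt
#include <stdbool.h>

#define CAPACITY 1024              /* assumed upper bound on the number of items */

typedef struct {
    int items[CAPACITY];           /* cyclically indexed storage */
    int head;                      /* index one past the last item (where we enqueue) */
    int tail;                      /* index of the first item (where we dequeue) */
    int count;                     /* current number of items */
} Queue;

void queue_init(Queue *q) { q->head = q->tail = q->count = 0; }

bool is_empty(const Queue *q) { return q->count == 0; }

void enqueue(Queue *q, int x)      /* append a new item x at the head */
{
    q->items[q->head] = x;
    q->head = (q->head + 1) % CAPACITY;
    q->count++;
}

int dequeue(Queue *q)              /* remove the item at the tail and return it */
{
    int x = q->items[q->tail];
    q->tail = (q->tail + 1) % CAPACITY;
    q->count--;
    return x;
}
\endtt
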
A~similar data structure is the \df{stack} --- a~sequence where both addition and removal
of items happen at the same end.
\subsection{Sets and dictionaries}
Another typical interface of a~data structure is the \df{set.} It contains a~finite subset
of some \df{universe}~${\cal U}$ (e.g., the set of all integers). The typical operations on a~set are:
\optable{
$\opdf{Insert}(x)$ & Add an element $x\in{\cal U}$ to the set. If it was already
present, nothing happens. \cr
$\opdf{Delete}(x)$ & Delete an element $x\in{\cal U}$ from the set. If it was not
present, nothing happens. \cr
$\opdf{Find}(x)$ & Check if $x\in{\cal U}$ is an~element of the set. Also called
\opdf{Member} or \opdf{Lookup}. \cr
$\opdf{Build}(x_1,\ldots,x_n)$ & Construct a~new set containing the elements given. In some cases,
this can be faster than inserting the elements one by one. \cr
}
Here are the typical implementations of sets together with the corresponding
complexities of operations ($n$~denotes the cardinality of the set):
$$\vbox{\halign{
#\hfil\qquad&&$#$\hfil~~\cr
& \alg{Insert} & \alg{Delete} & \alg{Find} & \alg{Build} \cr
\noalign{\smallskip}
Linked list & \O(n) & \O(n) & \O(n) & \O(n) \cr
Array & \O(n) & \O(n) & \O(n) & \O(n) \cr
Sorted array & \O(n) & \O(n) & \O(\log n) & \O(n\log n) \cr
Binary search tree & \O(\log n) & \O(\log n) & \O(\log n) & \O(n\log n) \cr
Hash table & \O(1) & \O(1) & \O(1) & \O(n) \cr
}}$$
An~\alg{Insert} to a~linked list or to an~array can be performed in constant
time if we can avoid checking whether the element is already present in the set.
While sorted arrays are quite slow as dynamic structures, they are efficient
statically: once built in $\O(\log n)$ time per element, they answer queries
in $\O(\log n)$.
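
For illustration, here is a~minimal sketch of \alg{Find} on a~sorted array in~C;
the element type and the function name are chosen only for this example:

\begtt
#include <stdbool.h>
#include <stddef.h>

/* Find(x) on a sorted array a[0..n-1]: O(log n) comparisons. */
bool sorted_find(const int *a, size_t n, int x)
{
    size_t lo = 0, hi = n;         /* search the half-open interval [lo, hi) */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (a[mid] == x)
            return true;
        else if (a[mid] < x)
            lo = mid + 1;          /* x can lie only to the right of mid */
        else
            hi = mid;              /* x can lie only to the left of mid */
    }
    return false;                  /* interval is empty, x is not present */
}
\endtt
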
Hash tables achieve constant complexity of operations only on average.
This will be formulated precisely and proven in the chapter on hashing.
Also, while the other implementations can work with an arbitrary universe
as long as two elements can be compared in constant time, hash tables require
arithmetic operations.
A~useful extension of sets are \dfr{dictionaries}{dictionary}. They store a~set of
distinct \em{keys,} each associated with a~\em{value} (possibly coming from a~different
universe). That is, they behave as generalized arrays indexed by arbitrary keys.
Most implementations of sets can be extended to dictionaries by keeping the value at
the place where the key is stored.
Sometimes, we also consider \em{multisets,} in which an~element can be present multiple
times. A~multiset can be represented by a~dictionary where the value counts occurrences
of the key in the set.
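
A~tiny sketch of this representation in~C follows. It uses the simplest possible
dictionary, an unsorted array of (key,~count) pairs of fixed capacity; the names and
the capacity are made up for the illustration:

\begtt
#include <stddef.h>

typedef struct {
    int key;
    int count;                     /* number of occurrences of key in the multiset */
} Entry;

typedef struct {
    Entry entries[256];            /* toy fixed-capacity dictionary */
    size_t size;
} Multiset;

/* Insert one occurrence of x: increment its counter, creating the entry if needed. */
void multiset_insert(Multiset *m, int x)
{
    for (size_t i = 0; i < m->size; i++)
        if (m->entries[i].key == x) { m->entries[i].count++; return; }
    m->entries[m->size].key = x;   /* no overflow check in this sketch */
    m->entries[m->size].count = 1;
    m->size++;
}

/* How many times is x present? Zero means not in the multiset. */
int multiset_count(const Multiset *m, int x)
{
    for (size_t i = 0; i < m->size; i++)
        if (m->entries[i].key == x)
            return m->entries[i].count;
    return 0;
}
\endtt
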
\subsection{Ordered sets}
Sets can be extended by order operations:
\optable{
\opdf{Min}, \opdf{Max} & Return the minimum or maximum element of the set. \cr
$\opdf{Succ}(x)$ & Return the successor of $x\in{\cal U}$ --- the smallest
element of the set which is greater than~$x$ (if it exists).
The element~$x$ itself need not be present in the set. \cr
$\opdf{Pred}(x)$ & Similarly, the predecessor of~$x$ is the largest element
smaller than~$x$. \cr
}
Except for hash tables, all our implementations of sets can support order operations:
$$\vbox{\halign{
#\hfil\qquad&&$#$\hfil~~\cr
& \alg{Min}/\alg{Max} & \alg{Pred}/\alg{Succ} \cr
\noalign{\smallskip}
Linked list & \O(n) & \O(n) \cr
Array & \O(n) & \O(n) \cr
Sorted array & \O(1) & \O(\log n) \cr
Binary search tree & \O(\log n) & \O(\log n) \cr
}}$$
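
For example, \alg{Succ} on a~sorted array is a~binary search for the first element
greater than~$x$. A~minimal sketch in~C (names chosen just for this illustration):

\begtt
#include <stdbool.h>
#include <stddef.h>

/* Succ(x) on a sorted array a[0..n-1]: the smallest element greater than x.
   Returns true and stores the successor in *out, or false if it does not exist. */
bool sorted_succ(const int *a, size_t n, int x, int *out)
{
    size_t lo = 0, hi = n;         /* candidate positions form the interval [lo, hi) */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (a[mid] > x)
            hi = mid;              /* a[mid] is a candidate, try to find a smaller one */
        else
            lo = mid + 1;          /* a[mid] <= x, the successor lies to the right */
    }
    if (lo == n)
        return false;              /* all elements are <= x */
    *out = a[lo];
    return true;
}
\endtt
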
A~sequence can be sorted by $n$~calls to \alg{Insert}, one \alg{Min}, and $n$~calls to \alg{Succ}.
If the elements can only be compared, the standard lower bound for comparison-based sorting
implies that at least one of \alg{Insert} and \alg{Succ} takes $\Omega(\log n)$ time.
Similarly, we can define ordered dictionaries and multisets.
\exercises
\ex{%
Consider enumerating all keys in a~binary search tree using \alg{Min} and
$n$~calls to \alg{Succ}. Prove that although a~single \alg{Succ} can take $\Theta(\log n)$
time in the worst case, the whole enumeration takes only $\Theta(n)$ time.
}
\endexercises
\section{Model of computation}
As we need to discern minor differences in time complexity of operations,
we have to state our model of computation carefully. All our algorithms will
run on the \df{Random Access Machine} (RAM).
The memory of the RAM consists of \em{memory cells.} Each memory cell contains a~single
integer. A~memory cell is identified by its \em{address,} which is again an~integer.
For example, we can store data in individual \em{variables} (memory cells with
a~fixed address), in \em{arrays} (sequences of identically formatted items stored
in contiguous memory cells), or in \em{records} (blocks of memory containing a~fixed
number of items, each possibly of a~different type but with a~fixed layout; this works like a~{\tt struct} in~C).
Arrays and records are referred to by their starting address; we call these addresses \em{pointers,}
but formally speaking, they are still integers.
We usually assume that memory for arrays and records is obtained dynamically using
a~\em{memory allocator,} which can be asked to reserve a~given number of memory cells
and to free them again when they are no longer needed. This means that the size of an array must
be known when the array is created and cannot be changed afterwards.
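
In terms of~C, the concepts above roughly correspond to the following sketch; the
particular record, the array size, and the names are chosen only as an example, and
error checking of the allocator is omitted:

\begtt
#include <stdlib.h>

/* A record: a block of memory with a fixed number of fields at fixed offsets. */
struct node {
    int key;
    struct node *next;             /* a pointer is just the address of another record */
};

int main(void)
{
    int n = 100;

    /* An array: n identically formatted items in contiguous memory cells.
       Its size is fixed when the memory is allocated. */
    int *a = malloc(n * sizeof(int));
    a[0] = 1;

    /* A record obtained from the allocator, referred to by its starting address. */
    struct node *p = malloc(sizeof(struct node));
    p->key = 42;
    p->next = NULL;

    free(p);                       /* return the memory once it is no longer needed */
    free(a);
    return 0;
}
\endtt
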
The machine performs the usual arithmetic and logical operations (essentially those available
in the C~language) on integers in constant time. We can also compare integers and perform
both conditional and unconditional jumps. The constant time per operation is reasonable as
long as the integers are not too large --- for concreteness, let us assume that all values
computed by our algorithms are polynomial in the size of the input and in the maximum absolute
value given in the input.
Memory will be measured in memory cells \em{used} by the algorithm. A~cell is used if it
lies between the smallest and largest address accessed by the program. Please keep in mind
that when the program starts, all memory cells contain undefined values except for those
which hold the program's input.
\section{Amortized analysis}
\endchapter