From cdaacc99c1191a2046ff2bd07ae28ec389f68e16 Mon Sep 17 00:00:00 2001
From: Parth Mittal <parth15069@iiitd.ac.in>
Date: Sun, 18 Apr 2021 14:47:37 +0530
Subject: [PATCH] wrote intro to streaming / frequent elements

---
 streaming/Makefile      |  3 +++
 streaming/streaming.tex | 41 +++++++++++++++++++++++++++++++++++++++++
 2 files changed, 44 insertions(+)
 create mode 100644 streaming/Makefile
 create mode 100644 streaming/streaming.tex

diff --git a/streaming/Makefile b/streaming/Makefile
new file mode 100644
index 0000000..ba6c63e
--- /dev/null
+++ b/streaming/Makefile
@@ -0,0 +1,3 @@
+TOP=..
+
+include ../Makerules
diff --git a/streaming/streaming.tex b/streaming/streaming.tex
new file mode 100644
index 0000000..cf635cb
--- /dev/null
+++ b/streaming/streaming.tex
@@ -0,0 +1,41 @@
+\ifx\chapter\undefined
+\input adsmac.tex
+\singlechapter{20}
+\fi
+
+\chapter[streaming]{Streaming Algorithms}
+
+For this chapter, we will consider the streaming model. In this
+setting, the input is presented as a ``stream'' which we can read
+\em{in order}. In particular, at each step, we can do some processing,
+and then move forward one unit in the stream to read the next piece of data.
+We can choose to read the input again after completing a ``pass'' over it.
+
+There are two measures for the performance of algorithms in this setting.
+The first is the number of passes we make over the input, and the second is
+the amount of memory that we consume. Some interesting special cases are:
+\tightlist{o}
+\: 1 pass, and $O(1)$ memory: This is equivalent to computing with a DFA, and
+hence we can recognise only regular languages.
+\: 1 pass, and unbounded memory: We can store the entire stream, and hence this
+is just the traditional computing model.
+\endlist
+
+\section{Frequent Elements}
+
+For this problem, the input is a stream $\alpha[1 \ldots m]$ where each
+$\alpha[i] \in [n]$.
+We define for each $j \in [n]$ the \em{frequency} $f_j$ which counts
+the occurrences of $j$ in $\alpha[1 \ldots m]$. Then the majority problem
+is to find (if it exists) a $j$ such that $f_j > m / 2$.
+
+We consider the more general frequent elements problem, where we want to find
+$F_k = \{ j \mid f_j > m / k \}$. Suppose that we (magically) knew some small set
+$C$ which contains $F_k$. Then we can pass over the input once, keeping track of
+how many times we see each member of $C$, and then find $F_k$ easily.
+The challenge is to find a small $C$, which is precisely what the Misra/Gries
+Algorithm does.
+
+\subsection{Misra/Gries Algorithm}
+
+\endchapter
-- 
GitLab
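
The counting pass described in the Frequent Elements section (given a small candidate set $C$ that contains $F_k$, count each member of $C$ in one pass and keep those with $f_j > m / k$) could be sketched roughly as follows. This is only an illustration and not part of the patch: the function name `frequent_elements` and its parameters are hypothetical, the stream is assumed to be any Python iterable of integers, and the sketch does not show how $C$ is found (that is the job of the Misra/Gries Algorithm, which the patch leaves unwritten).

```python
from collections.abc import Iterable


def frequent_elements(stream: Iterable[int], candidates: set[int], k: int) -> set[int]:
    """Return F_k = { j : f_j > m / k }, assuming `candidates` contains F_k.

    Hypothetical helper: makes a single pass over `stream` and uses one
    counter per candidate, so memory is proportional to |C|.
    """
    counts = {j: 0 for j in candidates}  # f_j for each j in C
    m = 0  # stream length, learned while reading
    for item in stream:
        m += 1
        if item in counts:
            counts[item] += 1
    # Test f_j * k > m rather than f_j > m / k to stay in integer arithmetic.
    return {j for j, f in counts.items() if f * k > m}
```

With $k = 2$ this reduces to the majority problem: the result is either empty or a single element. If `candidates` misses a member of $F_k$, that element is silently absent from the result, which is why the quality of $C$ matters.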