datovky / ds2notes
Commit ad01c805, authored Apr 18, 2021 by Parth Mittal
wrote misra/gries algorithm and analysis
Parent: cdaacc99
Changes: 1 file, streaming/streaming.tex
@@ -38,4 +38,60 @@ Algorithm does.
\subsection{Misra/Gries Algorithm}
TODO: Typeset the algorithm better.
\proc{FrequencyEstimate}$(\alpha, k)$
\algin the data stream $\alpha$, the target for the estimator $k$
\:Init: $A \= \emptyset$
\:For $j$ a number from the stream:
\:If $j$ is a key in $A$, $A[j] \= A[j] + 1$.
\:Else If $\vert A \vert < k - 1$, add the key $j$ to $A$ and set $A[j] \= 1$.
\:Else For each key $\ell$ in $A$, reduce $A[\ell] \= A[\ell] - 1$.
Delete $\ell$ from $A$ if $A[\ell] = 0$.
\:After processing the entire stream, return $A$.
\endalgo
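
The procedure above can be sketched in Python as follows. This is only an illustrative sketch: the function name \texttt{frequency\_estimate} and the use of a dictionary for $A$ are assumptions made for the example, not part of the notes.

```python
def frequency_estimate(stream, k):
    """One pass over the stream, keeping at most k - 1 counters.

    `counters` plays the role of A; the name and the dict representation
    are assumptions made for this sketch.
    """
    counters = {}
    for j in stream:
        if j in counters:
            # j is already a key in A: increment its counter.
            counters[j] += 1
        elif len(counters) < k - 1:
            # There is a free slot: start tracking j.
            counters[j] = 1
        else:
            # No free slot: decrement every counter, drop those that reach 0.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters
```

For example, on the stream $1, 2, 1, 3, 1, 2, 1$ with $k = 3$ the sketch returns the counters $A[1] = 3$ and $A[2] = 1$, while the true frequencies are $f_1 = 4$, $f_2 = 2$ and $f_3 = 1$.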
Let us show that $A[j]$ is a good estimate for the frequency $f_j$.
\lemma{$f_j - m/k \leq A[j] \leq f_j$}
\proof
Suppose that $A$ maintains the value for each key $j \in [n]$ (instead of
just $k - 1$ of them). We can recast \alg{FrequencyEstimate} in this setting:
We always increment $A[j]$ on seeing $j$ in the stream, but if there are
$\geq k$ positive values $A[\ell]$ after this step, we decrease each of them
by 1.
In particular, this reduces the value of the most recently added key $A[j]$
back to $0$.

Now, we see immediately that $A[j] \leq f_j$, since it is only incremented when
we see $j$ in the stream. To see the other inequality, consider the potential
function $\Phi = \sum_{\ell} A[\ell]$. Note that $\Phi$ is increased by exactly
$m$ in total (since the stream contains $m$ elements), and is decreased by $k$
every time $A[j]$ decreases, because such a step decreases $k$ positive values
by 1 each. Since $\Phi = 0$ initially and $\Phi \geq 0$, we get
that $A[j]$ is decreased at most $m/k$ times, and hence $A[j] \geq f_j - m/k$.
\qed
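
As a sanity check of the lemma, the following Python sketch compares the estimates with the exact frequencies on a given stream. The names \texttt{misra\_gries} and \texttt{lemma\_holds} are hypothetical helpers introduced for this example; the inequality $f_j - m/k \leq A[j]$ is tested in the equivalent integer form $k (f_j - A[j]) \leq m$ to avoid floating-point comparisons.

```python
from collections import Counter


def misra_gries(stream, k):
    """Compact version of the one-pass summary with at most k - 1 counters."""
    counters = {}
    for j in stream:
        if j in counters:
            counters[j] += 1
        elif len(counters) < k - 1:
            counters[j] = 1
        else:
            # Decrement all counters and keep only those that stay positive.
            counters = {key: c - 1 for key, c in counters.items() if c > 1}
    return counters


def lemma_holds(stream, k):
    """Check f_j - m/k <= A[j] <= f_j for every j occurring in the stream."""
    stream = list(stream)
    m = len(stream)
    freq = Counter(stream)          # exact frequencies f_j
    est = misra_gries(stream, k)    # estimates A[j], missing keys count as 0
    return all(
        k * (freq[j] - est.get(j, 0)) <= m and est.get(j, 0) <= freq[j]
        for j in freq
    )
```

Calling \texttt{lemma\_holds([1, 2, 1, 3, 1, 2, 1], 3)} checks the bounds for $m = 7$ and $k = 3$; such a check only illustrates the lemma on one input and does not replace the proof.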
Now, for $j \in F_k$, we know that $f_j > m/k$, which by the lemma implies that
$A[j] > 0$.
Hence $F_k \subseteq C = \{ j \mid A[j] > 0 \}$, and we have a $C$ of size
at most $k - 1$ ready for the second pass over the input.
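
The two passes can be combined as in the Python sketch below. It assumes that the stream can be replayed (here it is simply copied into a list, which a real streaming algorithm would not do) and that $F_k$ is the set of elements with $f_j > m/k$, as used above; the name \texttt{frequent\_elements} is hypothetical.

```python
def frequent_elements(stream, k):
    """Two-pass sketch: return the set of j with f_j > m/k."""
    # Assumption for this sketch: the stream can be read twice.
    stream = list(stream)

    # First pass: Misra/Gries summary with at most k - 1 counters.
    counters = {}
    for j in stream:
        if j in counters:
            counters[j] += 1
        elif len(counters) < k - 1:
            counters[j] = 1
        else:
            counters = {key: c - 1 for key, c in counters.items() if c > 1}

    # Second pass: exact frequencies, but only for the candidates in C.
    exact = dict.fromkeys(counters, 0)
    for j in stream:
        if j in exact:
            exact[j] += 1

    m = len(stream)
    # Keep the candidates with f_j > m/k (integer form: k * f_j > m).
    return {j for j, f in exact.items() if k * f > m}
```

On the stream $1, 2, 1, 3, 1, 2, 1$ with $k = 3$, the candidates are $C = \{1, 2\}$, and only $1$ survives the second pass since $f_1 = 4 > 7/3$ while $f_2 = 2 < 7/3$.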
\theorem{There exists a deterministic 2-pass algorithm that finds $F_k$ in
$\O(k(\log n + \log m))$ space.}
\proof
The correctness of the algorithm follows from the discussion above; we show
the bound on the space used below.
In the first pass, we only need to store at most $k - 1$ key-value pairs for $A$
(for example, as an unordered list),
and the key and the value need $\lfloor \log_2 n \rfloor + 1$ and
$\lfloor \log_2 m \rfloor + 1$ bits respectively.
In the second pass, we have one key-value pair for each element of $C$, and
they take the same amount of space as above.
\qed
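
To get a feeling for the bound, the following Python sketch evaluates the number of bits used by the $k - 1$ key-value pairs. The function name \texttt{summary\_bits} and the example parameters are assumptions chosen for illustration; the sketch counts only the pairs themselves and ignores any overhead of the data structure.

```python
def summary_bits(n, m, k):
    """Bits for k - 1 key-value pairs, with keys in [n] and values up to m."""
    # For a positive integer x, x.bit_length() equals floor(log2 x) + 1.
    key_bits = n.bit_length()
    value_bits = m.bit_length()
    return (k - 1) * (key_bits + value_bits)
```

For instance, with $n = 2^{20}$, $m = 10^9$ and $k = 100$, a key needs 21 bits and a value 30 bits, so the summary takes $99 \cdot 51 = 5049$ bits.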
\endchapter