diff --git a/streaming/streaming.tex b/streaming/streaming.tex
index 0abcf72e2a582e17ee0eda6e1c1b488d41517056..9785c3c119697f3c2abbc75e1f823aea0bd1983d 100644
--- a/streaming/streaming.tex
+++ b/streaming/streaming.tex
@@ -106,7 +106,7 @@ can be easily combined.

 \:\em{Init}: $C[1\ldots t][1\ldots k] \= 0$, where $k \= \lceil 2 / \varepsilon \rceil$
 and $t \= \lceil \log(1 / \delta) \rceil$.
-\:: Choose $t$ independent hash functions $h_1, \ldots h_t : [n] \to [k]$, each
+\:: Choose $t$ independent hash functions $h_1, \ldots , h_t : [n] \to [k]$, each
 from a 2-independent family.
 \:\em{Process}($x$):
 \::For $i \in [t]$: $C[i][h_i(x)] \= C[i][h_i(x)] + 1$.
@@ -114,7 +114,7 @@ can be easily combined.
 \endalgo

 Note that the algorithm needs $\O(tk \log m)$ bits to store the table $C$, and
-$\O(t \log n)$ bits to store the hash functions $h_1, \ldots h_t$, and hence
+$\O(t \log n)$ bits to store the hash functions $h_1, \ldots , h_t$, and hence
 uses $\O(1/\varepsilon \cdot \log (1 / \delta) \cdot \log m + \log (1 / \delta)\cdot \log n)$ bits.
 It remains to show that it computes a good estimate.

@@ -298,7 +298,7 @@ Recall that $\E[Y_r] = d / 2^r$, so the terms in the first sum can be bounded
 using Chebyshev's inequality. The second sum is equal to the probability of the
 event $[t \geq s]$, that is, the event $Y_{s - 1} \geq c / \varepsilon^2$ (since
 $z$ is only increased when $B$ becomes larger than this threshold).
-We will simply use Markov's inequality to bound this event.
+We will use Markov's inequality to bound the probability of this event.

 Putting it all together, we have:
 $$\eqalign{
@@ -327,7 +327,12 @@ The counter $z$ requires only $\O(\log \log n)$ bits, and $B$ has
 $\O(1 / \varepsilon^2)$ entries, each of which needs $\O( \log n )$ bits.
 Finally, the hash function $h$ needs $\O(\log n)$ bits, so the total space used
 is dominated by $B$, and the algorithm uses $\O(\log n / \varepsilon^2)$
-space.
+space. As before, if we use the median trick, the space used increases to
+$\O(\log(1 / \delta) \cdot \log n / \varepsilon^2)$.
+
+(TODO: include the version of this algorithm where we save space by storing
+$(g(a), {\tt tz}(h(a)))$ instead of $(a, {\tt tz}(h(a)))$ in $B$ for some
+hash function $g$ as an exercise?)

 \endchapter
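
For readers checking the first hunk, the \em{Init} and \em{Process} steps it touches can be sketched in Python as follows. This is a minimal illustration and not part of the patch. The class name `CountMinSketch`, the `query` method, and the choice of hash family $h_i(x) = ((a_i x + b_i) \bmod p) \bmod k$ are assumptions made here; the source only asks for a 2-independent family. The sketch also assumes that $\log$ means base 2 and that items are non-negative integers below the prime $p$. The query rule (minimum over rows) is the usual one for this kind of table, but the hunk does not show it, so it is an assumption as well. Indices are 0-based, unlike the 1-based $C[1\ldots t][1\ldots k]$ in the text.

```python
import math
import random


class CountMinSketch:
    """Illustrative sketch of the Init/Process steps (hypothetical names)."""

    # Assumption: every item x satisfies 0 <= x < PRIME (a Mersenne prime).
    PRIME = (1 << 61) - 1

    def __init__(self, eps, delta, seed=None):
        rng = random.Random(seed)
        # Init: k = ceil(2 / eps) columns, t = ceil(log(1 / delta)) rows.
        self.k = math.ceil(2 / eps)
        self.t = max(1, math.ceil(math.log2(1 / delta)))
        self.C = [[0] * self.k for _ in range(self.t)]
        # One independent (a, b) pair per row defines the hash function h_i.
        self.coeffs = [
            (rng.randrange(1, self.PRIME), rng.randrange(self.PRIME))
            for _ in range(self.t)
        ]

    def _h(self, i, x):
        # Assumed hash family: ((a*x + b) mod p) mod k.
        a, b = self.coeffs[i]
        return ((a * x + b) % self.PRIME) % self.k

    def process(self, x):
        # Process(x): increment one counter in each of the t rows.
        for i in range(self.t):
            self.C[i][self._h(i, x)] += 1

    def query(self, x):
        # Assumed query rule: the smallest counter that x maps to.
        return min(self.C[i][self._h(i, x)] for i in range(self.t))
```

Because counters are only ever incremented, `query(x)` can never fall below the true frequency of `x`; collisions can only push the estimate up. The table holds $t \cdot k$ counters, which matches the $\O(tk \log m)$ term in the space bound of the second hunk.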