Commit c6d1db62 by Filip Stedronsky

### Succinct: prefix-free encoding intro

parent 6b3f2cd4
@@ -22,14 +22,18 @@ Let us denote $s(n)$ the number of bits needed to store a size-$n$ data structur
 The information-theoretical optimum is $OPT(n) := \lceil\log |X(n)|\rceil$ (which is essentially the entropy of a uniform distribution over $X(n)$).
-\defn{{\I Redundance} of a space-efficient data structure is $r(n) := s(n) - OPT(n)$.}
+Note: We will always ignore constant additive factors, so sometimes we will use the definition $OPT(n) := \log |X(n)|$ (without rounding, differs by at most one from the original definition) interchangeably.
+
+\defn{{\I Redundancy} of a space-efficient data structure is $r(n) := s(n) - OPT(n)$.}
 Now we can define three classes of data structures based on their fine-grained space efficiency:
 \defn{A data structure is
 \tightlist{o}
-\:{\I implicit} when $s(n) \le OPT(n) + \O(1)$,\tabto{7.6cm}i.e., $r(n) = O(1)$,
+\:{\I implicit} when $s(n) \le OPT(n) + \O(1)$,\tabto{7.6cm}i.e., $r(n) = \O(1)$,
 \:{\I succinct} when $s(n) \le OPT(n) + {\rm o}(OPT(n))$,\tabto{7.6cm}i.e., $r(n) = {\rm o}(OPT(n))$,
 \:{\I compact} when $s(n) \le \O(OPT(n))$.
 \endlist
@@ -48,7 +52,7 @@ data structure.
 And of course, as with any data structure, we want to be able to perform reasonably fast operations on these space-efficient data structures.
-\section{Succinct representation of strings}
+\section{Representation of strings over arbitrary alphabet}
 Let us consider the problem of representing a length-$n$ string over alphabet $[m]$, for example a string of base-10 digits. The following two naive approaches immediately
@@ -75,11 +79,43 @@ convert that number to binary). With groups of size $k$, we get
 $$s(n) = \lceil n/k \rceil \lceil k \log 10 \rceil \le (n/k + 1)(k \log 10 + 1) = \underbrace{n \log 10}_{OPT(n)} + n/k + \underbrace{k\log 10 + 1}_{\O(1)}.$$
 Thus we see that with increasing $k$,
-redundance goes down, approaching the optimum but never quite reaching it. For a
+redundancy goes down, approaching the optimum but never quite reaching it. For a
 fixed $k$ it is still linear and thus our scheme is not succinct. Also, with increasing $k$, local access time goes up. In practice, however, one could choose a good-compromise value for $k$ and happily use such a scheme.
+We will develop a succinct encoding scheme later in this chapter.
+
+\section{Intermezzo: Prefix-free encoding of bit strings}
+
+Let us forget about arbitrary alphabets for a moment and consider a different problem. We want to encode a binary string of arbitrary length in a way that allows the decoder to determine when the string ends (it can be followed by arbitrary other data). Furthermore, we want this to be a streaming encoding -- i.e., encode the string piece by piece while it is being read from the input. The length of the string is not known in advance -- it will only be determined when the input reaches its end.\foot{If the length were known in advance, we could simply store the length using any simple variable-size number encoding, followed by the string data itself. This would give us $\O(\log n)$ redundancy almost for free.}
+
+A trivial solution might be to split the string into $b$-bit blocks and encode each of them into a $(b+1)$-bit block with a simple padding scheme:
+\tightlist{o}
+\: For a complete block, output its $b$ data bits followed by a zero.
+\: For an incomplete final block, output its data bits, followed by a zero and then as many ones as needed to reach $b+1$ bits.
+\: If the final block is complete (input length is divisible by $b$), we must add an extra padding-only block (zero followed by $b$ ones) to signal the end of the string.
+\endlist
+
+The redundancy of such an encoding is at most $n/b + b + 1$ (one bit per block, $b+1$ for the extra padding block). For a fixed $b$, this is $\Theta(n)$, so the scheme is not succinct.
+
+\subsection{SOLE (Short-Odd-Long-Even) Encoding}
+
+\section{Succinct representation of arbitrary-alphabet strings}
 \endchapter