% datovky / ds2-notes, commit 6b3f2cd4 (parent dbb01577),
% authored Aug 28, 2021 by Filip Stedronsky:
% "Succint: strings intro, naive encoding, practical encoding by groups"
% File: fs-succinct/succinct.tex (at 6b3f2cd4)
% ...
% @@ -2,6 +2,7 @@
\input adsmac.tex
\singlechapter{50}
\fi
\input tabto.tex

\chapter[succinct]{Space-efficient data structures}
% ...
% @@ -21,13 +22,15 @@ Let us denote $s(n)$ the number of bits needed to store a size-$n$ data structur
The information-theoretical optimum is $OPT(n) := \lceil\log |X(n)|\rceil$
(which is essentially the entropy of a uniform distribution over $X(n)$).
\defn{The {\I redundance} of a space-efficient data structure is $r(n) := s(n) - OPT(n)$.}
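As a small sanity check of these definitions, the following Python sketch computes $OPT(n)$ exactly with integer arithmetic, assuming that $\log$ denotes the binary logarithm and that $X(n)$ is non-empty. It uses the fact that for $N \ge 1$, $\lceil\log N\rceil$ equals the bit length of $N-1$, which avoids floating-point rounding for large $N$. The function names {\tt opt\_bits} and {\tt redundance} are our own, chosen only for illustration.

```python
def opt_bits(num_objects: int) -> int:
    """OPT = ceil(log2 |X|): the information-theoretic optimum number of bits
    needed to give each of num_objects objects a distinct binary code."""
    if num_objects < 1:
        raise ValueError("X(n) must be non-empty")
    # For N >= 1, ceil(log2 N) equals the bit length of N - 1,
    # because N - 1 fits into b bits exactly when N <= 2**b.
    return (num_objects - 1).bit_length()


def redundance(space_bits: int, num_objects: int) -> int:
    """r(n) = s(n) - OPT(n) for a structure that uses space_bits bits."""
    return space_bits - opt_bits(num_objects)


if __name__ == "__main__":
    # Length-n decimal strings: |X(n)| = 10**n. Storing them digit by digit
    # uses 4 bits per digit, so the redundance of that encoding is 4n - OPT(n).
    for n in (1, 2, 3, 10, 100):
        print(n, opt_bits(10 ** n), redundance(4 * n, 10 ** n))
```

For $n = 1, 2, 3, 10, 100$ this gives $OPT(n) = 4, 7, 10, 34, 333$ and redundances $0, 1, 2, 6, 67$ for the 4-bits-per-digit encoding, i.e., the redundance grows linearly, at roughly $4 - \log 10 \approx 0.68$ bits per digit.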
Now we can define three classes of data structures based on their fine-grained space
efficiency:
\defn{A data structure is
\tightlist{o}
\:{\I implicit} when $s(n) \le OPT(n) + \O(1)$, \tabto{7.6cm} i.e., $r(n) = \O(1)$,
\:{\I succinct} when $s(n) \le OPT(n) + {\rm o}(OPT(n))$, \tabto{7.6cm} i.e., $r(n) = {\rm o}(OPT(n))$,
\:{\I compact} when $s(n) \le \O(OPT(n))$.
\endlist
}
% ...
% @@ -47,6 +50,36 @@ fast operations on these space-efficient data structures.
\section{Succinct representation of strings}

Let us consider the problem of representing a length-$n$ string over an alphabet $[m]$,
for example, a string of decimal digits ($m = 10$). The following two naive approaches
immediately come to mind:
\list{(a)}
\:Consider the whole string as one base-10 number and convert that number into binary.
This achieves the information-theoretically optimal size of
$OPT(n) = \lceil n\log 10\rceil \approx 3.32n = \Theta(n+1)$. However, this representation
does not support local decoding and modification: to read or change a single digit, you
must decode and re-encode the whole string.
\:Store the string digit by digit. This uses $n\lceil\log 10\rceil = 4n = OPT(n) + \Theta(n)$ bits.
For a fixed alphabet size, this is not succinct because the redundance $\Theta(n)$ grows
faster than any function in ${\rm o}(OPT(n)) = {\rm o}(n+1)$\foot{More formally, if we consider
$\Theta$ and ${\rm o}$ to be sets of functions, $\Theta(n) \cap {\rm o}(n+1) = \emptyset$.}.
However, we get constant-time local decoding and modification for free.
\endlist
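The two naive approaches can be contrasted in a minimal Python sketch for decimal strings ($m = 10$); all names here ({\tt encode\_whole}, {\tt digit\_of\_whole}, {\tt NibbleDigits}) are hypothetical. Approach (a) relies on Python's arbitrary-precision integers and assumes the length $n$ is stored separately, so that leading zeros survive; reading a single digit then requires dividing the whole $n$-digit number. Approach (b) packs two 4-bit digits into each byte, so {\tt get} and {\tt set} touch a single byte.

```python
def optimal_bits(n: int) -> int:
    """OPT(n) = ceil(n log2 10) for length-n decimal strings, computed exactly."""
    return (10 ** n - 1).bit_length()


# Approach (a): the whole string as one base-10 number (optimal size).
def encode_whole(digits: str) -> int:
    value = 0
    for ch in digits:
        # Horner's rule; leading zeros are recovered later from the stored length n.
        value = value * 10 + int(ch)
    return value


def digit_of_whole(value: int, n: int, i: int) -> int:
    """Read digit i (0 = leftmost). This needs arithmetic on the whole
    n-digit number, so it is not a constant-time local operation."""
    return (value // 10 ** (n - 1 - i)) % 10


# Approach (b): one digit per 4-bit nibble, two digits per byte.
class NibbleDigits:
    def __init__(self, digits: str):
        self.n = len(digits)
        self.data = bytearray((self.n + 1) // 2)
        for i, ch in enumerate(digits):
            self.set(i, int(ch))

    def _check(self, i: int) -> None:
        if not 0 <= i < self.n:
            raise IndexError(i)

    def get(self, i: int) -> int:
        self._check(i)
        byte = self.data[i // 2]
        # Even positions use the low nibble, odd positions the high nibble.
        return byte >> 4 if i % 2 else byte & 0x0F

    def set(self, i: int, d: int) -> None:
        self._check(i)
        if not 0 <= d <= 9:
            raise ValueError("not a decimal digit")
        byte = self.data[i // 2]
        if i % 2:
            self.data[i // 2] = (byte & 0x0F) | (d << 4)
        else:
            self.data[i // 2] = (byte & 0xF0) | d

    def size_bits(self) -> int:
        # n * ceil(log2 10); ignores the half byte of padding when n is odd.
        return 4 * self.n
```

For example, a string of $n = 5$ digits needs {\tt optimal\_bits(5)} $= 17$ bits in approach (a) and $20$ bits in approach (b).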
We would like to get the best of both worlds: achieve close-to-optimum space
requirements while also supporting constant-time local decoding and modification.
A simple solution that may work in practice is to encode the digits in groups
(e.g., encode each pair of consecutive digits as one number from the range $[100]$ and
convert that number to binary).
With groups of size $k$, we get
$$
s(n) = \lceil n/k\rceil \lceil k\log 10\rceil \le (n/k + 1)(k\log 10 + 1)
= \underbrace{n\log 10}_{OPT(n)} + n/k + \underbrace{k\log 10 + 1}_{\O(1)}.
$$
Thus we see that with increasing $k$, the redundance bound $n/k + \O(1)$ goes down and the
space approaches the optimum, but never quite reaches it. Yet for any fixed $k$, the
redundance is still linear in~$n$, and thus our scheme is not succinct. Also, the local
access time goes up with increasing $k$. In practice, however, one could choose a good
compromise value for $k$ and happily use such a scheme.
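The grouping scheme can be sketched in Python as follows, assuming decimal digits and a fixed group size $k \ge 1$; the class {\tt GroupedDigits} and its methods are hypothetical, and packing the fields least-significant-bit first into a byte array is our own choice, not something the text prescribes. Each group of $k$ digits is stored as one number from $[10^k]$ in a fixed-width field of $\lceil k\log 10\rceil$ bits, so digit $i$ lives in group $\lfloor i/k\rfloor$ and can be read or updated with $\O(k)$ arithmetic, i.e., in constant time for fixed $k$. The last group is padded with zeros.

```python
def _read_bits(buf: bytearray, pos: int, width: int) -> int:
    """Read the width-bit field starting at bit position pos (LSB-first)."""
    start, end = pos // 8, (pos + width + 7) // 8
    chunk = int.from_bytes(buf[start:end], "little")
    return (chunk >> (pos % 8)) & ((1 << width) - 1)


def _write_bits(buf: bytearray, pos: int, width: int, value: int) -> None:
    """Overwrite the width-bit field starting at bit position pos."""
    start, end = pos // 8, (pos + width + 7) // 8
    chunk = int.from_bytes(buf[start:end], "little")
    mask = ((1 << width) - 1) << (pos % 8)
    chunk = (chunk & ~mask) | (value << (pos % 8))
    buf[start:end] = chunk.to_bytes(end - start, "little")


class GroupedDigits:
    """Decimal string stored in groups of k digits; each group is a number
    in [10**k] kept in a fixed-width field of ceil(k log2 10) bits."""

    def __init__(self, digits: str, k: int):
        self.n, self.k = len(digits), k
        self.width = (10 ** k - 1).bit_length()      # ceil(k log2 10)
        self.groups = -(-self.n // k)                # ceil(n / k)
        self.buf = bytearray((self.groups * self.width + 7) // 8)
        for g in range(self.groups):
            block = digits[g * k:(g + 1) * k].ljust(k, "0")  # pad the last group
            _write_bits(self.buf, g * self.width, self.width, int(block))

    def _locate(self, i: int) -> tuple[int, int]:
        if not 0 <= i < self.n:
            raise IndexError(i)
        g, j = divmod(i, self.k)                   # group index, offset in group
        return g, 10 ** (self.k - 1 - j)           # place value of digit i

    def get(self, i: int) -> int:
        g, place = self._locate(i)
        group = _read_bits(self.buf, g * self.width, self.width)
        return (group // place) % 10

    def set(self, i: int, d: int) -> None:
        if not 0 <= d <= 9:
            raise ValueError("not a decimal digit")
        g, place = self._locate(i)
        pos = g * self.width
        group = _read_bits(self.buf, pos, self.width)
        group += (d - (group // place) % 10) * place  # replace one decimal digit
        _write_bits(self.buf, pos, self.width, group)

    def size_bits(self) -> int:
        return self.groups * self.width            # s(n) = ceil(n/k) ceil(k log2 10)
```

Dividing {\tt width} by $k$ gives the bits spent per digit: $4$, $3.5$, $10/3 \approx 3.33$, $3.5$, $3.4$ and $20/6 \approx 3.33$ for $k = 1, \dots, 6$, against $\log 10 \approx 3.32$. The actual per-digit cost is therefore not monotone in $k$, even though the bound $n/k$ on the redundance is; $k = 3$ is an attractive choice because $10^3 = 1000$ is just below $2^{10} = 1024$.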
\endchapter