datovky / ds2-notes
Commit b160f3eb, authored Aug 29, 2021 by Filip Stedronsky
Succinct: SOLE intro
parent c6d1db62
Changes: 1
fs-succinct/succinct.tex
@@ -101,11 +101,11 @@ string data itself. This would give us $\O(\log n)$ redundancy almost for free.}
 A trivial solution might be to split the string into $b$-bit blocks and encode
 each of them into a $(b+1)$-bit block with a simple padding scheme:
 \tightlist{o}
-\: For a complete block, output its $b$ data bits followed by a zero.
-\: For an incomplete final block, output its data bits, followed by a zero
-and then as many ones as needed to reach $b+1$ bits.
+\: For a complete block, output its $b$ data bits followed by a one.
+\: For an incomplete final block, output its data bits, followed by a one
+and then as many zeros as needed to reach $b+1$ bits.
 \: If the final block is complete (input length is divisible by $b$), we must
-add an extra padding-only block (zero followed by $b$ ones) to signal the
+add an extra padding-only block (one followed by $b$ zeros) to signal the
 end of the string.
 \endlist
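For illustration, the padding scheme in the new version of the list (one, then zeros) can be sketched in Python. This is a minimal sketch and not part of the notes: the function names `encode_blocks` and `decode_blocks` are hypothetical, bit strings are modelled as Python strings of `'0'`/`'1'`, and the decoder assumes a well-formed input.

```python
def encode_blocks(bits, b):
    """Encode a bit string into (b+1)-bit blocks (assumed helper, not from the notes)."""
    out = []
    for i in range(0, len(bits), b):
        chunk = bits[i:i + b]
        if len(chunk) == b:
            # Complete block: b data bits followed by a one.
            out.append(chunk + "1")
        else:
            # Incomplete final block: data bits, a one, then zeros up to b+1 bits.
            out.append(chunk + "1" + "0" * (b - len(chunk)))
    if len(bits) % b == 0:
        # Final block was complete (or input empty): add a padding-only block.
        out.append("1" + "0" * b)
    return "".join(out)


def decode_blocks(stream, b):
    """Return (data, number of bits consumed); stops at the first block ending in zero."""
    data = []
    pos = 0
    while True:
        block = stream[pos:pos + b + 1]
        pos += b + 1
        if block[-1] == "1":
            # Last bit one: a complete block, the first b bits are data.
            data.append(block[:b])
        else:
            # Last bit zero: final block, data is everything before the last one.
            data.append(block[:block.rindex("1")])
            return "".join(data), pos
```

Because the decoder knows where the encoding ends without looking past it, any bits that follow the encoded string in the stream are left untouched, which is the prefix-free property the scheme is meant to provide. Each $b$-bit block costs one extra bit.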
@@ -115,6 +115,48 @@ scheme is not succinct.
 \subsection{SOLE (Short-Odd-Long-Even) Encoding}
+In this section we will present a more advanced prefix-free string encoding
+that will be succinct.
+First, we split the input into $b$-bit blocks. We will add padding in the
+form of $10\cdots 0$ at the end of the last block to make it $b$ bits long.
+If the last block was complete, we must add an extra padding-only block to
+make the padding scheme reversible.
+Now we will consider each block as a single character from the alphabet $[B]$,
+where $B:=2^b$. Then we shall extend this alphabet by adding a special EOF
+character. We will add this character at the end of encoding. This gives us
+a new string from the alphabet $[B+1]$ that has length at most $n/b + 2$
+($+1$ for padding, $+1$ for added EOF character).
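The preprocessing described so far (padding with $10\cdots 0$, reading blocks as characters from $[B]$, appending EOF) can be sketched as follows. This is only an illustrative sketch: the names `to_symbols` and `from_symbols` are hypothetical, and representing EOF by the value $B$ itself is an assumption, since the text only says that one extra character is added to the alphabet.

```python
def to_symbols(bits, b):
    """Turn a bit string into characters from [B+1], B = 2**b (assumed helper)."""
    # Pad with a one followed by zeros up to a multiple of b. If the input
    # length is already divisible by b, this creates a padding-only block.
    padded = bits + "1"
    padded += "0" * (-len(padded) % b)
    # Read each b-bit block as a number in [B] = {0, ..., B-1}.
    symbols = [int(padded[i:i + b], 2) for i in range(0, len(padded), b)]
    # Assumption: EOF is represented by the value B, the one extra character.
    symbols.append(1 << b)
    return symbols


def from_symbols(symbols, b):
    """Invert to_symbols: drop EOF, then strip the 10...0 padding."""
    assert symbols[-1] == 1 << b, "expected EOF as the last character"
    padded = "".join(format(s, "0{}b".format(b)) for s in symbols[:-1])
    # The padding starts at the last one bit.
    return padded[:padded.rindex("1")]
```

For an $n$-bit input this produces $\lfloor n/b \rfloor + 2$ characters, which matches the bound of $n/b + 2$ stated above.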
+However, as $B+1$ is not a power of two, now we have a question of how to
+encode this string. Note that this is a special case of the problem stated
+above, i.e. encoding a string from an arbitrary alphabet. We will try to solve
+this special case as a warm-up and then move on to a fully general solution.
+First, we need to introduce a new concept: re-encoding character pairs into
+different alphabets. Let's assume, for example, that we have two characters from
+alphabets [11] and [8], respectively. We can turn them into one character from
+the alphabet [88] (by the simple transformation of $8x + y$). We can then split
+that character again into two in a different way, for example into two characters
+from alphabets [9] and [10]. This can be accomplished by simple division with
+remainder: if the original character is $z\in [88]$, we transform it into
+$\lfloor z / 10\rfloor$ and $(z \;{\rm mod}\; 10)$. For example, if we start
+with the characters 6 and 5, they first get combined to form $6\cdot 8 + 5 = 53$
+and then split into 5 and 3.
+We can think of these two steps as a single transformation that takes
+two characters from alphabets [11] and [8] and transforms them into
+two characters from alphabets [9] and [10]. More generally, we can
+always transform a pair of characters from alphabets $[A]$ and $[B]$
+into a pair from alphabets $[C]$ and $[D]$ as long as $C\cdot D \ge A\cdot B$
+(we need an output universe large enough to hold all
+possible input combinations).
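The pair re-encoding above can be sketched in a few lines of Python. This is a minimal sketch: the function names `reencode_pair` and `decode_pair` are hypothetical, while the parameters `A`, `B`, `C`, `D` follow the alphabet sizes used in the text, with $[A]$ read as $\{0, \ldots, A-1\}$.

```python
def reencode_pair(x, y, A, B, C, D):
    """Map (x, y) from [A] x [B] to a pair from [C] x [D]; requires C*D >= A*B."""
    assert 0 <= x < A and 0 <= y < B
    assert C * D >= A * B, "output universe too small"
    # Combine into one character from [A*B] ...
    z = x * B + y
    # ... and split it by division with remainder. Since z < A*B <= C*D,
    # the quotient is smaller than C and the remainder is smaller than D.
    return z // D, z % D


def decode_pair(u, v, B, D):
    """Invert reencode_pair: rebuild z, then split by the original base B."""
    z = u * D + v
    return z // B, z % B


# The example from the text: characters 6 and 5 from alphabets [11] and [8].
print(reencode_pair(6, 5, 11, 8, 9, 10))  # (5, 3), via z = 6*8 + 5 = 53
```

Decoding only needs the sizes `B` and `D`, so the transformation is reversible without any side information.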
+We will use this kind of alphabet re-encoding by pair heavily in the SOLE
+encoding. The best way to explain the exact scheme is with a diagram:
 \section{Succinct representation of arbitrary-alphabet strings}