Commit b160f3eb, authored 3 years ago by Filip Stedronsky
Succinct: SOLE intro
parent c6d1db62
fs-succinct/succinct.tex: 46 additions, 4 deletions
@@ -101,11 +101,11 @@ string data itself. This would give us $\O(\log n)$ redundancy almost for free.}
 A trivial solution might be to split the string into $b$-bit blocks and encode
 each of them into a $(b+1)$-bit block with a simple padding scheme:
 \tightlist{o}
-\: For a complete block, output its $b$ data bits followed by a zero.
-\: For an incomplete final block, output its data bits, followed by a zero
-and then as many ones as needed to reach $b+1$ bits.
+\: For a complete block, output its $b$ data bits followed by a one.
+\: For an incomplete final block, output its data bits, followed by a one
+and then as many zeros as needed to reach $b+1$ bits.
 \: If the final block is complete (input length is divisible by $b$), we must
-add an extra padding-only block (zero followed by $b$ ones) to signal the
+add an extra padding-only block (one followed by $b$ zeros) to signal the
 end of the string.
 \endlist
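The padding scheme as it stands after this hunk (a one marks a complete block, a one followed by zeros marks the final block) can be sketched in Python as follows. This is only an illustration and not part of the notes: the function names `pad_encode` and `pad_decode` are hypothetical, bit strings are represented as Python strings of `'0'` and `'1'`, and the decoder assumes well-formed input.

```python
def pad_encode(bits, b):
    """Encode a bit string into (b+1)-bit blocks with one-then-zeros padding."""
    out = []
    for i in range(0, len(bits), b):
        block = bits[i:i + b]
        if len(block) == b:
            # Complete block: b data bits followed by a one.
            out.append(block + "1")
        else:
            # Incomplete final block: data, a one, then zeros up to b+1 bits.
            out.append(block + "1" + "0" * (b - len(block)))
    if len(bits) % b == 0:
        # Last block was complete (or input empty): add a padding-only block.
        out.append("1" + "0" * b)
    return "".join(out)


def pad_decode(code, b):
    """Decode one string from the front of code; return (bits, unread rest)."""
    out = []
    pos = 0
    while True:
        block = code[pos:pos + b + 1]
        pos += b + 1
        if block[-1] == "1":
            # Trailing one: a complete block, more blocks follow.
            out.append(block[:-1])
        else:
            # Trailing zero: final block. Strip the zeros and the marker one.
            out.append(block.rstrip("0")[:-1])
            return "".join(out), code[pos:]
```

The decoder knows where the string ends by looking only at the last bit of each block, which is what makes the code prefix-free: anything after the final block is returned unread.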
@@ -115,6 +115,48 @@ scheme is not succinct.
 \subsection{SOLE (Short-Odd-Long-Even) Encoding}
+In this section we will present a more advanced prefix-free string encoding
+that will be succinct.
+
+First, we split the input into $b$-bit blocks. We will add a padding in the
+form of $10\cdots0$ at the end of the last block to make it $b$ bits long.
+If the last block was complete, we must add an extra padding-only block to
+make the padding scheme reversible.
+
+Now we will consider each block as a single character from the alphabet $[B]$,
+where $B:=2^b$. Then we shall extend this alphabet by adding a special EOF
+character. We will add this character at the end of encoding. This gives us
+a new string from the alphabet $[B+1]$ that has length at most $n/b+2$
+($+1$ for padding, $+1$ for added EOF character).
+
+However, as $B+1$ is not a power of two, now we have a question of how to
+encode this string. Note that this is a special case of the problem stated
+above, i.e. encoding a string from an arbitrary alphabet. We will try to solve
+this special case as a warm-up and then move on to a fully general solution.
+
+First, we need to introduce a new concept: re-encoding character pairs into
+different alphabets. Let's assume, for example, that we have two characters $x$
+and $y$ from alphabets [11] and [8], respectively. We can turn them into one
+character from the alphabet [88] (by the simple transformation of $8x+y$). We
+can then split that character again into two in a different way, for example
+into two characters from alphabets [9] and [10]. This can be accomplished by
+simple division with remainder: if the original character is $z\in[88]$, we
+transform it into $\lfloor z/10\rfloor$ and $(z\;{\rm mod}\;10)$. For example,
+if we start with the characters 6 and 5, they first get combined to form
+$6\cdot8+5=53$ and then split into 5 and 3.
+
+We can think of these two steps as a single transformation that takes
+two characters from alphabets [11] and [8] and transforms them into
+two characters from alphabets [9] and [10]. More generally, we can
+always transform a pair of characters from alphabets $[A]$ and $[B]$
+into a pair from alphabets $[C]$ and $[D]$ as long as
+$C\cdot D\ge A\cdot B$ (we need an output universe large enough to hold all
+possible input combinations).
+
+We will use this kind of alphabet re-encoding by pair heavily in the SOLE
+encoding. The best way to explain the exact scheme is with a diagram:
 \section{Succinct representation of arbitrary-alphabet strings}
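The pair re-encoding introduced in the hunk above (two characters from [11] and [8] turned into two characters from [9] and [10]) can be sketched in Python as follows. This is an illustrative sketch and not part of the notes: the function names `reencode_pair` and `decode_pair` are hypothetical, and it assumes $[N]$ denotes the integers $0,\ldots,N-1$.

```python
def reencode_pair(x, y, a, b, c, d):
    """Map (x, y) from [a] x [b] to a pair from [c] x [d].

    Assumes [n] means {0, ..., n-1}. Requires c*d >= a*b so that every
    input pair has a distinct output pair.
    """
    if c * d < a * b:
        raise ValueError("output universe too small: need c*d >= a*b")
    if not (0 <= x < a and 0 <= y < b):
        raise ValueError("input characters out of range")
    z = x * b + y        # combine into a single character from [a*b]
    return divmod(z, d)  # split into (z // d, z mod d)


def decode_pair(u, v, a, b, c, d):
    """Invert reencode_pair: recover (x, y) from (u, v)."""
    z = u * d + v        # recombine into the single character z
    return divmod(z, b)  # split back into (x, y)
```

With the numbers from the text, `reencode_pair(6, 5, 11, 8, 9, 10)` combines 6 and 5 into 53 and splits it into `(5, 3)`; `decode_pair(5, 3, 11, 8, 9, 10)` gives back `(6, 5)`. The first output always fits into $[C]$ because $z < A\cdot B \le C\cdot D$.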