Commit 974efeab, authored 5 years ago by Martin Mareš
Merge branch 'master' of gitlab.kam.mff.cuni.cz:mj/dsbook
Parents: b63952d4, e901e8a6
05-cache/cache.tex (7 additions, 5 deletions)
...
...
@@ -214,9 +214,7 @@ of the block size~$B$. If it is so, we can align the start of the matrix to the
 beginning of a~block, so the start of each row will be also aligned. If we set $d=B$,
 every tile will be also aligned and each row of the tile will be a~complete block.
 If we have enough cache, we can process a~tile in $\O(B)$ I/O operations. As we have
-$\O(N^2/B + 1)$ tiles, the total I/O complexity is $\O(N^2/B + B)$. As usually, this
-can be improved to $\O(N^2/B + 1)$ if we realize that the additional term is required only
-in cases where the whole matrix is smaller than a~single block.
+$N^2/B^2$ tiles, the total I/O complexity is $\O(N^2/B)$.

 For this algorithm to work, the cache must be able to hold two tiles at once. Since each tile
 contains $B^2$ items, this means $M \ge 2B^2$. An~inequality of this kind is usually
...
...
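Reviewer's note on the hunk above: the tiled algorithm it analyzes can be sketched roughly as follows. This is only a minimal illustration, assuming a row-major matrix stored in a flat Python list. The names `transpose_tiled` and `aligned_block_reads` are hypothetical and do not come from the notes, and the second function only evaluates the count from the corrected text ($N^2/B^2$ tiles, $B$ block reads each) under the assumption that $B$ divides $N$.

```python
def transpose_tiled(a, n, d):
    """Transpose an n x n matrix stored row-major in the flat list `a`, in place.

    The matrix is cut into d x d tiles and every tile is swapped with its
    mirror image, so only two tiles are in use at any moment.
    """
    for ti in range(0, n, d):          # first row of the current tile
        for tj in range(ti, n, d):     # first column; tj >= ti covers the upper triangle
            for i in range(ti, min(ti + d, n)):
                # Inside a diagonal tile, start right of the diagonal so that
                # no pair of items is swapped twice.
                j_start = i + 1 if ti == tj else tj
                for j in range(j_start, min(tj + d, n)):
                    a[i * n + j], a[j * n + i] = a[j * n + i], a[i * n + j]
    return a


def aligned_block_reads(n, b):
    """Block reads needed to load every tile once in the aligned case.

    Assumes b divides n: there are (n/b)^2 tiles and each tile row is
    exactly one block, which gives n^2/b reads in total.
    """
    assert n % b == 0, "this count assumes that b divides n"
    return (n // b) ** 2 * b
```

The swap loop itself does not depend on alignment; only the block count does, which is why the unaligned case is treated separately in the next hunk.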
@@ -229,10 +227,14 @@ in the cache and the I/O complexity of our algorithm will not change asymptotica
 Now, what if $N$ is not divisible by~$B$? We lose all alignment, but we will prove
 that the algorithm still works. Consider a~$B\times B$ tile. In the worst case, each row
-spans 2~blocks. So we need $2B$ I/O operations to read it to cache, which is still $\O(B)$.
+spans 2~blocks. So we need $2B$ I/O operations to read it into cache, which is still $\O(B)$.
 The cache must contain at least $4B^2$ items, but this is still within limits of our tall-cache
 assumption.

+To process all $\O(N^2/B^2 + 1)$ tiles, we need $\O(N^2/B + B)$ operations. As usually, this
+can be improved to $\O(N^2/B + 1)$ if we realize that the additional term is required only
+in cases where the whole matrix is smaller than a~single block.
+
 We can conclude that in the cache-aware model, we can transpose a~$N\times N$ matrix
 in time $\Theta(N^2)$ with $\O(N^2/B + 1)$ block transfers. This is obviously optimal.
...
...
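Reviewer's note on the hunk above: the claim that an unaligned $B\times B$ tile costs at most $2B$ block reads can be checked numerically. The sketch below assumes a row-major layout starting at address 0 and blocks of `b` consecutive items. The helper names `blocks_touched_by_tile` and `worst_tile` are hypothetical and used only for illustration.

```python
def blocks_touched_by_tile(n, b, r0, c0):
    """Count the distinct memory blocks that hold one tile.

    The tile has its top-left corner at (r0, c0) and is at most b x b;
    rows and columns past the matrix edge are clipped.
    """
    blocks = set()
    for i in range(r0, min(r0 + b, n)):
        first = i * n + c0                   # address of the first item of this tile row
        last = i * n + min(c0 + b, n) - 1    # address of the last item of this tile row
        blocks.update(range(first // b, last // b + 1))
    return len(blocks)


def worst_tile(n, b):
    """Largest number of blocks touched by any tile of the b x b tiling."""
    return max(
        blocks_touched_by_tile(n, b, r, c)
        for r in range(0, n, b)
        for c in range(0, n, b)
    )
```

With `n = 8, b = 4` every tile row is a complete block, so a tile touches exactly 4 blocks. With `n = 10, b = 4` alignment is lost and some tile rows straddle two blocks, but no tile exceeds `2 * b` blocks.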
@@ -274,7 +276,7 @@ whole algorithm finishes in $\O(N^2)$ steps.
 To analyze I/O complexity, we focus on the highest level, at which the sub-problems correspond
 to tiles from the previous algorithm. Specifically, we will find the smallest~$i$ such that
-the sub-problem size $d = N^2/i$ is at most~$B$. Unless the whole input is small and $i=0$,
+the sub-problem size $d = N/2^i$ is at most~$B$. Unless the whole input is small and $i=0$,
 this implies $2d = N/2^{i-1} > B$. Therefore $B/2 < d \le B$.

 To establish an upper bound on the optimal number of block transfers, we show a~concrete
...
...
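Reviewer's note on the last hunk: the corrected formula $d = N/2^i$ and the resulting bound $B/2 < d \le B$ can be illustrated with a few lines of Python. This is a sketch only; the function name `top_level` is hypothetical, and $d$ is treated as a real number so that the check also runs when $N$ is not a power of two.

```python
def top_level(n, b):
    """Return (i, d) for the smallest i >= 0 such that d = n / 2**i <= b."""
    i = 0
    d = n
    while d > b:
        i += 1
        d = n / 2 ** i   # sub-problem size after i halvings
    return i, d
```

For example, `top_level(64, 10)` stops at `i = 3` with `d = 8`, which lies in the interval `(5, 10]`. When the whole input already fits (`n <= b`), the loop never runs and `i = 0`, which is the exception mentioned in the text.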