More on sparse matrices
Sparse matrices (i.e. large matrices containing many zeros) are a very useful tool in applications, e.g. when solving PDEs. Special data structures are used to store such matrices. A simple strategy is to store every nonzero, a(j, k), together with its row and column indices, j and k. There are more compact data structures, however, and Matlab uses a compressed column format. Suppose we want to store the following matrix:
    A = 10   0   0  40   0   0  80
         0  20   0   0   0  70  90
         0  30   0  50  60   0   0
The elements can, of course, be any double precision numbers. In this example I have chosen the nonzeros 10 through 90 to simplify the following discussion. Matlab would store the matrix something like this (some details are not clear from the documentation).
The nonzero elements are stored, column by column, in an array, a say. An integer array, row, contains the row indices of the nonzeros. A third integer array, col, contains indices (pointers) into a and row, marking where each column starts. col contains one extra element, pointing to where the next element of a and row would have been (had there been a next element). I have added extra spaces below to show the breaks between columns.
    a   = 10   20 30   40 50   60   70   80 90
    row =  1    2  3    1  3    3    2    1  2
    col =  1  2  4  4  6  7  8  10
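For readers who want to experiment, SciPy's csc_matrix in Python uses the same compressed column layout, so the three arrays can be inspected directly. This illustration is mine, not part of the Matlab notes; note that SciPy uses 0-based indices and pointers, so its arrays equal row - 1 and col - 1.

```python
# Sketch: inspect the compressed column storage of the 3 x 7 example
# matrix using SciPy, which uses the same layout as Matlab but with
# 0-based indexing.
import numpy as np
from scipy.sparse import csc_matrix

A = np.array([[10,  0, 0, 40,  0,  0, 80],
              [ 0, 20, 0,  0,  0, 70, 90],
              [ 0, 30, 0, 50, 60,  0,  0]], dtype=float)
S = csc_matrix(A)

print(S.data)     # the nonzeros, column by column: corresponds to a
print(S.indices)  # 0-based row indices:            corresponds to row - 1
print(S.indptr)   # 0-based column pointers:        corresponds to col - 1
```

The empty third column shows up as a repeated pointer (two consecutive equal entries in the pointer array), exactly as in the col array above.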
Here is another example. We would like to store the matrix produced by the Matlab command A = 5 * eye(5) + diag(1:4, 1). The three arrays are:
    a   = 5   1 5   2 5   3 5   4 5
    row = 1   1 2   2 3   3 4   4 5
    col = 1  2  4  6  8  10
If A is an m x n matrix and the number of nonzeros is nnz, the memory requirement, in bytes, is:

    memory = 8 * nnz + 4 * (nnz + n + 1)
So, in the first example: 8 * 9 + 4 * (9 + 7 + 1) = 140 bytes. This can also be seen using the whos-command:
    >> S = sparse([10 0 0 40 0 0 80; 0 20 0 0 0 70 90; 0 30 0 50 60 0 0]);
    >> whos
      Name      Size         Bytes  Class

      S         3x7            140  double array (sparse)

    Grand total is 9 elements using 140 bytes
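The memory formula is easy to check programmatically. Here is a small sketch in Python (the helper name sparse_bytes is my own, not Matlab's):

```python
# Evaluate Matlab's memory formula for an m x n sparse matrix:
# 8 bytes per double nonzero, plus 4 bytes per integer in row and col
# (nnz row indices and n + 1 column pointers).
def sparse_bytes(nnz, n):
    return 8 * nnz + 4 * (nnz + n + 1)

print(sparse_bytes(9, 7))  # the 3 x 7 example: 140 bytes
```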
In some cases a and row may have more than nnz elements, since some algorithms (e.g. LU-factorization) may have allocated extra space for reasons of efficiency (nzmax(matrix) returns the number of allocated elements). In our examples nnz(S) equals nzmax(S).
Note that the expression for memory contains n, the number of columns. If m is much less than n, one can save memory by storing the transpose of the matrix instead (if that is an option). Consider:
    >> A = sprandn(2, 100000, 0.001);
    >> A_transp = A';
    >> nnz(A)
    ans = 200
    >> whos
      Name          Size            Bytes  Class

      A             2x100000       402404  double array (sparse)
      A_transp      100000x2         2412  double array (sparse)
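The memory formula accounts exactly for the numbers in the listing above; a small sketch (the helper name sparse_bytes is mine):

```python
# Why storing the transpose helps when m << n: the 4 * (n + 1) bytes of
# column pointers dominate when there are many columns but few nonzeros.
def sparse_bytes(nnz, n):
    return 8 * nnz + 4 * (nnz + n + 1)

print(sparse_bytes(200, 100000))  # A,  n = 100000 columns: 402404 bytes
print(sparse_bytes(200, 2))       # A', n = 2 columns:        2412 bytes
```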
This has some consequences when one is writing code.
Exercise: speed up the following computation.
Using sparse matrices decreases the memory requirements, but it is harder to write fast code, since we usually have to access elements using indirect addressing (pointers). Here is a silly example; look at the times:
    >> f = ones(10000, 1); % create a full vector of ones
    >> s = sparse(f);      % store it in sparse form as well
    >> r = []; tic; for k = 1:10000, r = f .* f; end; toc
    Elapsed time is 0.228196 seconds.
    >> r = []; tic; for k = 1:10000, r = s .* s; end; toc
    Elapsed time is 0.580645 seconds.
One would not store a full vector in sparse form, but the example does illustrate the overhead caused by the data structure.
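A similar experiment can be run in Python with SciPy (my illustration, not from the notes; absolute times are machine dependent, but the dense and sparse results agree while the sparse version pays for the indirect addressing):

```python
# Compare elementwise multiplication of a dense vector with the same
# vector stored in sparse (compressed column) form.
import time
import numpy as np
from scipy.sparse import csc_matrix

f = np.ones((10000, 1))  # a full vector of ones
s = csc_matrix(f)        # the same vector stored sparsely

t0 = time.perf_counter()
for _ in range(1000):
    r_full = f * f
t_full = time.perf_counter() - t0

t0 = time.perf_counter()
for _ in range(1000):
    r_sparse = s.multiply(s)
t_sparse = time.perf_counter() - t0

print(f"full:   {t_full:.3f} s")
print(f"sparse: {t_sparse:.3f} s")
```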
Standard algorithms for computing eigenvalues of large sparse matrices (or solving linear systems) are often based on matrix-vector products, y = A * x. As we will see later in the course, it takes quite some time to bring the elements of the matrix from the main memory to the floating point unit. So instead of computing A * x, where x is a column vector, one can compute A * X, where X contains several columns (making it possible to use the matrix elements several times). We talk about block-algorithms (e.g. block-Lanczos, or block-Arnoldi), and the number of columns in X is called the block size. As you will see in the following exercise, the time for multiplying with two columns, say, is less than doing two multiplications with one column each.
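The effect can be sketched in Python with SciPy (my illustration; note that scipy.sparse.spdiags stores the diagonals row-wise, transposed relative to Matlab's spdiags):

```python
# Multiply a banded sparse matrix by two single columns vs. by one
# two-column block. The block product reuses the matrix elements and
# is typically faster than two separate products.
import time
import numpy as np
import scipy.sparse as sp

n = 5000
# 101 diagonals at offsets -50..50, as in the exercise
A = sp.spdiags(np.random.randn(101, n), np.arange(-50, 51), n, n).tocsr()
x1 = np.random.randn(n, 1)
x2 = np.random.randn(n, 1)
X = np.hstack([x1, x2])  # block with block size 2

t0 = time.perf_counter()
for _ in range(20):
    y1 = A @ x1
    y2 = A @ x2
t_two = time.perf_counter() - t0

t0 = time.perf_counter()
for _ in range(20):
    Y = A @ X
t_block = time.perf_counter() - t0

print(f"two single columns: {t_two:.3f} s")
print(f"one 2-column block: {t_block:.3f} s")
```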
Exercise: Create a sparse matrix by n = 5000; A = spdiags(randn(n, 101), -50:50, n, n); and measure the time it takes to perform Y = A * X, where X = randn(n, bs), bs = 1, 2, ..., 20 (bs for block size). In order to get accurate times, you have to repeat the multiplication some 20 times for each bs. Let time(bs) be the time for one Y = A * X for block size bs. Make two plots (in separate figure-windows), one showing time(bs) / time(1) (i.e. the normalised time), the other showing (time(bs) / bs) / time(1) (i.e. the normalised time per column). What is the optimal block size (judging only by the time, not by the convergence properties of the (unknown) algorithm)? Hint: help tic.