More on sparse matrices
Sparse matrices (i.e. large matrices containing many zeros) are a very useful tool in applications, e.g. when solving PDEs. Special data structures are used to store such matrices. A simple strategy is to store every nonzero, a(j, k), together with its row and column indices, j and k. There are more compact data structures, however, and Matlab uses a compressed column format. Suppose we want to store the following matrix:
A =
  1 0 0 4 0 0 8
  0 2 0 0 0 7 9
  0 3 0 5 6 0 0
The elements can, of course, be any double precision numbers. In this
example I have chosen the nonzeros 1 through 9 to simplify the
following discussion. Matlab would store the matrix something like this (some details
are not clear from the documentation).
The nonzero elements are stored, column by column, in an array, a say. An integer array, row, contains the row indices
for the nonzeros. A third integer array, col, contains indices
(pointers) into a and row, where a column starts. col
contains an extra element, pointing to the next element of a and row (had there been a next
element). I have made extra spaces
to show the break between columns.
a   = 1   2 3   4 5   6   7   8 9
row = 1   2 3   1 3   3   2   1 2
col = 1 2 4 4 6 7 8 10
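To see the connection in practice, here is a small sketch (the variable names i, j and v are just for illustration): sparse(i, j, v, m, n) builds the matrix from (row, column, value) triplets, and find returns the nonzeros column by column, i.e. in the same order as a and row above.

i = [1 2 3 1 3 3 2 1 2];          % row indices of the nonzeros
j = [1 2 2 4 4 5 6 7 7];          % column indices of the nonzeros
v = 1:9;                          % the values
S = sparse(i, j, v, 3, 7);        % build the 3x7 example matrix

% find returns the nonzeros column by column, matching a and row above.
[r, c, a] = find(S);
[r'; a']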
Here is another example. We would like to store the matrix produced by the Matlab command A = 5 * eye(5) + diag(1:4, 1). The three arrays are:
a   = 5   1 5   2 5   3 5   4 5
row = 1   1 2   2 3   3 4   4 5
col = 1 2 4 6 8 10
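Note that 5 * eye(5) + diag(1:4, 1) produces an ordinary full matrix; to actually get the compressed column storage one would convert it with sparse. A small sketch:

A = 5 * eye(5) + diag(1:4, 1);   % full (dense) matrix
S = sparse(A);                   % convert to sparse storage
[r, c, v] = find(S);             % nonzeros, column by column
[v'; r']                         % compare with a and row above
whos A S                         % compare the memory for the two forms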
If A is an m x n
matrix and the number of nonzeros is nnz, the memory requirements, in
bytes, are:
memory = 8 * nnz + 4 * (nnz + n + 1)
So, in the example: 8 * 9 + 4 * (9 + 7 + 1) = 140 bytes. This can
also be seen using the whos-command:
>> S = sparse([1 0 0 4 0 0 8; 0 2 0 0 0 7 9; 0 3 0 5 6 0 0]);
>> whos
  Name      Size         Bytes  Class

  S         3x7            140  double array (sparse)

Grand total is 9 elements using 140 bytes
In some cases a and row
may have more than nnz elements since some algorithms (e.g.
LU-factorization) may have allocated extra space for reasons of
efficiency (nzmax(matrix)
returns the number of allocated elements). In our examples nnz(S) equals nzmax(S).
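The formula is easy to check from within Matlab itself, using nzmax since that is the number of allocated elements. A small sketch (it assumes the 8-byte reals and 4-byte integers described above; a Matlab version that uses 8-byte integers for the indices would need different constants):

S = sparse([1 0 0 4 0 0 8; 0 2 0 0 0 7 9; 0 3 0 5 6 0 0]);
n = size(S, 2);
est = 8 * nzmax(S) + 4 * (nzmax(S) + n + 1)   % estimated number of bytes

info = whos('S');
info.bytes                                    % bytes reported by whos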
Note that the expression for memory contains n, the number of columns. If m is much less than n, one will save memory by storing the transpose of the matrix instead (if that is an option). Consider:
>> A = sprandn(2, 100000, 0.001);
>> A_transp = A';
>> nnz(A)
ans = 200
>> whos
  Name          Size            Bytes  Class

  A             2x100000       402404  double array (sparse)
  A_transp      100000x2         2412  double array (sparse)
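The byte counts agree with the formula above: nnz is 200 in both cases, but n = 100000 for A while n = 2 for A_transp:

memory(A)        = 8 * 200 + 4 * (200 + 100000 + 1) = 402404 bytes
memory(A_transp) = 8 * 200 + 4 * (200 + 2 + 1)      = 2412 bytes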
This has some consequences when one is writing code. Here is a simple example, where we compute the sum of the elements in four different ways.
>> tic; for k = 1:1000, s = sum(sum(A, 1)); end; toc
Elapsed time is 4.638800 seconds.
>> tic; for k = 1:1000, s = sum(sum(A, 2)); end; toc
Elapsed time is 0.007592 seconds.
>> tic; for k = 1:1000, s = sum(sum(A_transp, 1)); end; toc
Elapsed time is 0.008057 seconds.
>> tic; for k = 1:1000, s = sum(sum(A_transp, 2)); end; toc
Elapsed time is 3.102995 seconds.
Exercise: Formulate some interesting conclusions from the above example and use them to speed up the following computation.
Using sparse matrices decreases the memory requirements, but it is
harder to write fast code since we usually have to access elements
using indirect addressing (pointers). Here is a silly example; look at the times:
>> f = ones(10000, 1);   % create a full vector of ones
>> s = sparse(f);        % store it in sparse form as well
>> r = []; tic; for k = 1:10000, r = f .* f; end; toc
Elapsed time is 0.515900 seconds.
>> r = []; tic; for k = 1:10000, r = s .* s; end; toc
Elapsed time is 16.059336 seconds.
One would not store a full vector in sparse form, but the example does illustrate the overhead caused by the data structure.
Standard algorithms for computing eigenvalues of large sparse
matrices (or solving linear systems) are often based on matrix-vector
products, y = A * x. As we will see later in the course it takes quite
some time to bring the elements of the matrix from the main memory to
the floating point unit. So instead of computing A * x where x is a
column vector one can compute A * X, where X contains several columns
(making it possible to use the matrix elements several times). We talk
about block-algorithms (e.g. block-Lanczos, or block-Arnoldi) and the
number of columns in X is called the block size. As you will see in the
following exercise, the time for multiplying with two columns, say, is
less than the time for doing two multiplications with one column each.
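As a small illustration (this is not the exercise below), a block multiplication gives exactly the same result as the corresponding matrix-vector products performed one column at a time; only the memory traffic per matrix element differs:

n = 1000;
A = sprandn(n, n, 0.01);          % a random sparse test matrix
x1 = randn(n, 1);
x2 = randn(n, 1);

X = [x1, x2];                     % block with two columns (block size 2)
Y = A * X;                        % one multiplication with the block

norm(Y - [A * x1, A * x2])        % zero (up to rounding): same result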
Exercise: Create a sparse matrix by n = 5000; A = spdiags(randn(n, 101), -50:50, n, n); and measure the time it takes to perform Y = A * X, where X = randn(n, bs), bs = 1, 2, ..., 20 (bs for block size). In order to get accurate times, you have to repeat the multiplication some 20 times for each bs. Let time(bs) be the time for one Y = A * X, for block size bs. Make two plots, one showing time(bs) / time(1) (i.e. the normalised time), the other showing (time(bs) / bs) / time(1) (i.e. the normalised time per column). What is the optimal block size (looking only at the time and not at the convergence properties of the (unknown) algorithm)? Hint: etime.
The matrix in the previous example is a so-called band matrix, which is a special type of sparse matrix. Here is a small example. The blue, red and green spheres mark elements that are stored (their values may of course differ). The zeros outside the band are not stored. The main diagonal consists of the red elements. The green elements to the immediate right of the main diagonal make up the first super-diagonal. The blue elements to the immediate left form the first sub-diagonal.
[Figure: a small band matrix; the coloured spheres mark the stored elements inside the band, and the zeros outside the band are not stored.]
The matrix may be symmetric, in which case it has the same number of sub- and super-diagonals. If a matrix has only one sub- and one super-diagonal, we have a tridiagonal matrix, etc. The number of super-diagonals is called the upper bandwidth (two in the example), and the number of sub-diagonals is the lower bandwidth (three in the example). The total number of diagonals, six in the example, is the bandwidth. We can store a band matrix in a more compact form than we can a general sparse matrix.
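In Matlab a band matrix created this way is still stored as a general sparse matrix, but spdiags can be used to move between the matrix and the compact "one diagonal per column" representation. A small sketch with a tridiagonal matrix:

n = 6;
T = diag(2*ones(n,1)) + diag(-ones(n-1,1), 1) + diag(-ones(n-1,1), -1);
T = sparse(T);                    % a tridiagonal (band) matrix

% Extract the compact form: each column of B is one diagonal and d contains
% the diagonal numbers (negative = sub, 0 = main, positive = super).
[B, d] = spdiags(T);

% Rebuild the matrix from the compact form; R equals T.
R = spdiags(B, d, n, n);
norm(full(R - T))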