More on sparse matrices
Sparse matrices (i.e. large matrices containing many zeros) are a very useful tool in applications, e.g. when solving PDEs. Special data structures are used to store such matrices. A simple strategy is to store every nonzero, a(j, k), together with its row and column indices, j and k. There are more compact data structures, however, and Matlab uses a compressed column format. Suppose we want to store the following matrix:
    A = 10   0   0  40   0   0  80
         0  20   0   0   0  70  90
         0  30   0  50  60   0   0
The elements can, of course, be any double precision numbers. In this example I have chosen the nonzeros 10 through 90 to simplify the following discussion. Matlab would store the matrix something like this (some details are not clear from the documentation).
The nonzero elements are stored, column by column, in an array, a say. An integer array, row, contains the row indices of the nonzeros. A third integer array, col, contains indices (pointers) into a and row, marking where each column starts. col contains one extra element, pointing to where the next element of a and row would have been (had there been a next element). I have added extra spaces below to show the breaks between columns.
    a   = 10   20 30   40 50   60   70   80 90
    row =  1    2  3    1  3    3    2    1  2
    col =  1  2  4  4  6  7  8  10
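For readers who want to experiment, SciPy's csc_matrix in Python uses the same compressed column layout, so the three arrays can be inspected directly. This illustration is mine, not part of the Matlab notes; note that SciPy uses 0-based indices and pointers, so its arrays equal row - 1 and col - 1.

```python
# Sketch: inspect the compressed column storage of the 3 x 7 example
# matrix using SciPy, which uses the same layout as Matlab but with
# 0-based indexing.
import numpy as np
from scipy.sparse import csc_matrix

A = np.array([[10,  0, 0, 40,  0,  0, 80],
              [ 0, 20, 0,  0,  0, 70, 90],
              [ 0, 30, 0, 50, 60,  0,  0]], dtype=float)
S = csc_matrix(A)

print(S.data)     # the nonzeros, column by column: corresponds to a
print(S.indices)  # 0-based row indices:            corresponds to row - 1
print(S.indptr)   # 0-based column pointers:        corresponds to col - 1
```

The empty third column shows up as a repeated pointer (two consecutive equal entries in the pointer array), exactly as in the col array above.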
Here is another example. We would like to store the matrix produced by the Matlab command A = 5 * eye(5) + diag(1:4, 1). The three arrays are:
    a   = 5   1 5   2 5   3 5   4 5
    row = 1   1 2   2 3   3 4   4 5
    col = 1  2  4  6  8  10
If A is an m x n matrix and the number of nonzeros is nnz, the memory requirement, in bytes, is:

    memory = 8 * nnz + 4 * (nnz + n + 1)
So, in the first example: 8 * 9 + 4 * (9 + 7 + 1) = 140 bytes. This can also be seen using the whos-command:
    >> S = sparse([10 0 0 40 0 0 80; 0 20 0 0 0 70 90; 0 30 0 50 60 0 0]);
    >> whos
      Name      Size         Bytes  Class

      S         3x7            140  double array (sparse)

    Grand total is 9 elements using 140 bytes
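The memory formula is easy to check programmatically. Here is a small sketch in Python (the helper name sparse_bytes is my own, not Matlab's):

```python
# Evaluate Matlab's memory formula for an m x n sparse matrix:
# 8 bytes per double nonzero, plus 4 bytes per integer in row and col
# (nnz row indices and n + 1 column pointers).
def sparse_bytes(nnz, n):
    return 8 * nnz + 4 * (nnz + n + 1)

print(sparse_bytes(9, 7))  # the 3 x 7 example: 140 bytes
```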
In some cases a and row may have more than nnz elements, since some algorithms (e.g. LU-factorization) may have allocated extra space for reasons of efficiency (nzmax(matrix) returns the number of allocated elements). In our examples nnz(S) equals nzmax(S).
Note that the expression for memory contains n, the number of columns. If m is much less than n, one can save memory by storing the transpose of the matrix instead (if that is an option). Consider:
    >> A = sprandn(2, 100000, 0.001);
    >> A_transp = A';
    >> nnz(A)
    ans = 200
    >> whos
      Name          Size            Bytes  Class

      A             2x100000       402404  double array (sparse)
      A_transp      100000x2         2412  double array (sparse)
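The memory formula accounts exactly for the numbers in the listing above; a small sketch (the helper name sparse_bytes is mine):

```python
# Why storing the transpose helps when m << n: the 4 * (n + 1) bytes of
# column pointers dominate when there are many columns but few nonzeros.
def sparse_bytes(nnz, n):
    return 8 * nnz + 4 * (nnz + n + 1)

print(sparse_bytes(200, 100000))  # A,  n = 100000 columns: 402404 bytes
print(sparse_bytes(200, 2))       # A', n = 2 columns:        2412 bytes
```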
This has some consequences when one is writing code.
Exercise: speed up the following computation.
Using sparse matrices decreases the memory requirements, but it is harder to write fast code, since we usually have to access elements using indirect addressing (pointers). Here is a silly example; look at the times:
    >> f = ones(10000, 1); % create a full vector of ones
    >> s = sparse(f);      % store it in sparse form as well
    >> r = []; tic; for k = 1:10000, r = f .* f; end; toc
    Elapsed time is 0.228196 seconds.
    >> r = []; tic; for k = 1:10000, r = s .* s; end; toc
    Elapsed time is 0.580645 seconds.
One would not store a full vector in sparse form, but the example does illustrate the overhead caused by the data structure.
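A similar experiment can be run in Python with SciPy (my illustration, not from the notes; absolute times are machine dependent, but the dense and sparse results agree while the sparse version pays for the indirect addressing):

```python
# Compare elementwise multiplication of a dense vector with the same
# vector stored in sparse (compressed column) form.
import time
import numpy as np
from scipy.sparse import csc_matrix

f = np.ones((10000, 1))  # a full vector of ones
s = csc_matrix(f)        # the same vector stored sparsely

t0 = time.perf_counter()
for _ in range(1000):
    r_full = f * f
t_full = time.perf_counter() - t0

t0 = time.perf_counter()
for _ in range(1000):
    r_sparse = s.multiply(s)
t_sparse = time.perf_counter() - t0

print(f"full:   {t_full:.3f} s")
print(f"sparse: {t_sparse:.3f} s")
```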
Standard algorithms for computing eigenvalues of large sparse matrices (or solving linear systems) are often based on matrix-vector products, y = A * x. As we will see later in the course, it takes quite some time to bring the elements of the matrix from the main memory to the floating point unit. So instead of computing A * x, where x is a column vector, one can compute A * X, where X contains several columns (making it possible to use the matrix elements several times). We talk about block-algorithms (e.g. block-Lanczos, or block-Arnoldi), and the number of columns in X is called the block size. As you will see in the following exercise, the time for multiplying with two columns, say, is less than doing two multiplications with one column each.
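The effect can be sketched in Python with SciPy (my illustration; note that scipy.sparse.spdiags stores the diagonals row-wise, transposed relative to Matlab's spdiags):

```python
# Multiply a banded sparse matrix by two single columns vs. by one
# two-column block. The block product reuses the matrix elements and
# is typically faster than two separate products.
import time
import numpy as np
import scipy.sparse as sp

n = 5000
# 101 diagonals at offsets -50..50, as in the exercise
A = sp.spdiags(np.random.randn(101, n), np.arange(-50, 51), n, n).tocsr()
x1 = np.random.randn(n, 1)
x2 = np.random.randn(n, 1)
X = np.hstack([x1, x2])  # block with block size 2

t0 = time.perf_counter()
for _ in range(20):
    y1 = A @ x1
    y2 = A @ x2
t_two = time.perf_counter() - t0

t0 = time.perf_counter()
for _ in range(20):
    Y = A @ X
t_block = time.perf_counter() - t0

print(f"two single columns: {t_two:.3f} s")
print(f"one 2-column block: {t_block:.3f} s")
```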
Exercise: Create a sparse matrix by n = 5000; A = spdiags(randn(n, 101), -50:50, n, n); and measure the time it takes to perform Y = A * X, where X = randn(n, bs), bs = 1, 2, ..., 20 (bs for block size). In order to get accurate times, you have to repeat the multiplication some 20 times for each bs. Let time(bs) be the time for one Y = A * X for block size bs. Make two plots (in separate figure-windows), one showing time(bs) / time(1) (i.e. the normalised time), the other showing (time(bs) / bs) / time(1) (i.e. the normalised time per column). What is the optimal block size (judging only by the time, not by the convergence properties of the (unknown) algorithm)? Hint: help tic.