Lapack
In this exercise you are going to compare the textbook Cholesky factorization routine with Lapack implementations of the same algorithm. The Cholesky factorization is the first step when solving Ax=b where A is a symmetric and positive definite matrix.
The LU-factorization of a square matrix A is the product A = LU, where L is lower triangular (zeros above the diagonal) and U is upper triangular. When A is symmetric and positive definite it is possible to write A = CC^T, where C is lower triangular and ^T denotes transpose.
One way to write the Cholesky factorization algorithm (using Matlab notation) is shown below. The lower triangle of the original matrix A is replaced by the Cholesky factor C (it is common to replace things like this in order to save space).
n = length(A);
for k = 1:n
  A(k, k) = sqrt(A(k, k) - sum(A(k, 1:k-1).^2));
  for i = k+1:n
    A(i, k) = (A(i, k) - sum(A(i, 1:k-1) .* A(k, 1:k-1))) / A(k, k);
  end
end
For more details, see (almost) any standard textbook in numerical
analysis.
The factorization is available in Matlab using the command chol
.
Implement the algorithm above in Fortran or C (you may replace the sums by loops) and run it on matrices of order 50, 100, 150, ..., 750, and 1200. How many floating point operations are used (just look at the algorithm above and count them, expressing the number in terms of the dimension n)? How many Mflop/s do you get (make a plot using Matlab)?
Do the same thing using the Lapack routine dpotrf, in the following four ways:
Fetch the necessary Lapack routines (Fortran versions) from Netlib (look under Browse). Use the (slow) unoptimized BLAS in this first test. Note that it is easy to fetch just the necessary routines (you do not have to fetch the whole Lapack library and all of BLAS).
Comment 2008-04-29: it is slightly harder this year (and in following years) since "Include (unoptimized) BLAS routines?" does not seem to work any longer. It used to work, however. For that reason I have put some of the hard-to-find routines on the student system. Look under /chalmers/groups/thomas_math/HPC/Lapack. You must still fetch dpotrf (to practice using Netlib).
Note: this part will be used in the next assignment as well.
To see how important a fast BLAS is, you should fetch and compile an excellent BLAS library, GotoBLAS. You must fetch and compile the package yourself, since, according to the license agreement, I am not allowed to make the library available to you. Start by registering (where it says: Not a member yet? click here to register.), then fetch the tar-file, unpack, and compile. When you compile the package, make sure you are not using the Intel Fortran compiler, ifort. Type make FC=gfortran, for example. Here is an article about the person, Kazushige Goto, behind the routines.
Use the Lapack library /usr/lib/liblapack.so that comes with your Linux distribution. What do you think of the performance? (Note that the performance depends on which Linux version you are using.) I have tested Intel's MKL as well, but in this case it is not faster than Lapack + GotoBLAS.
Make some nice plots where it is easy to compare all the tests (so do not plot one curve per page).
A Fortran-hint. Here are a few words about dynamic memory
allocation and matrices in C.
One can use the following theorem to construct a symmetric and positive definite matrix:
Theorem: if A is symmetric and
A(k, k) > |A(k, 1)| + |A(k, 2)| + ... + |A(k, k-1)| + |A(k, k+1)| + ... + |A(k, n)|, k = 1, 2, ..., n,
then A is positive definite.
So taking A(k, k) = 1 and small off-diagonal elements will work (do not take zeros).