Lapack
In this exercise you are going to compare the textbook Cholesky factorization routine with Lapack implementations of the same algorithm. The Cholesky factorization is the first step when solving Ax=b where A is a symmetric and positive definite matrix.
The LU-factorization of a square matrix A is the product A = LU where L is lower triangular (zeros above the diagonal) and U is upper triangular. When A is symmetric and positive definite it is possible to write A = CC^T, where C is lower triangular and ^T denotes transpose.
One way to write the Cholesky factorization algorithm (using Matlab notation) is shown below. The lower triangle of the original matrix A is replaced by the Cholesky factor C (it is common to replace things like this in order to save space).
n = length(A);
for k = 1:n
    A(k, k) = sqrt(A(k, k) - sum(A(k, 1:k-1).^2));
    for i = k+1:n
        A(i, k) = (A(i, k) - sum(A(i, 1:k-1) .* A(k, 1:k-1))) / A(k, k);
    end
end
For more details, see (almost) any standard textbook in numerical analysis. The factorization is available in Matlab through the command chol.
Implement the algorithm above (you may replace the sums by loops) in Fortran or C and run it on matrices of order 100, 200, ..., 5000. How many floating point operations (additions and multiplications) are used? Just count them in the algorithm above and express the number in terms of the dimension n. How many Gflop/s do you get (make a plot using Matlab)?
Do the same thing using the routine dpotrf from Lapack in the following three ways:
Fetch the necessary Lapack routines (Fortran versions) from Netlib (look under Browse). Find lapack and then click on "Browse, Download LAPACK routines with on-line documentation browser". Download DPOTRF + dependencies; you do not have to fetch the whole library.
Use the (slow) unoptimized BLAS in this first test: go to Netlib again and browse for blas. Fetch the appropriate routines, or the whole double precision library, blas.tgz, which may be easier.
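If you drive dpotrf from C rather than Fortran, remember that Lapack expects every argument passed by reference and the matrix stored column-major. A minimal sketch of the interface follows; the trailing underscore in dpotrf_ is the usual gfortran name-mangling convention, and the actual call is guarded here since it only links once you supply the Lapack routines (e.g. with -llapack), so treat this as an interface sketch rather than a finished driver.

```c
#include <stddef.h>

/* Fortran Lapack routine: Cholesky factorization of a symmetric positive
   definite matrix stored column-major with leading dimension lda. */
extern void dpotrf_(const char *uplo, const int *n, double *a,
                    const int *lda, int *info);

/* Column-major index: element (i, j) lives at a[i + j*lda]. */
static size_t cm(size_t i, size_t j, size_t lda)
{
    return i + j * lda;
}

/* Factor an n x n column-major matrix with dpotrf; returns Lapack's info
   (0 means success).  Guarded so the sketch also compiles where Lapack is
   not linked in; build with -DHAVE_LAPACK and link with -llapack to
   enable the real call. */
static int chol_lapack(double *a, int n)
{
#ifdef HAVE_LAPACK
    int info = 0;
    dpotrf_("L", &n, a, &n, &info);  /* "L": factor the lower triangle */
    return info;
#else
    (void)a; (void)n;
    return -1;  /* Lapack not linked in this build */
#endif
}
```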
To see how important a fast BLAS is, you should test an excellent BLAS library, OpenBLAS. Note that OpenBLAS contains the necessary Lapack routines as well. Important: when you compare the four methods, run OpenBLAS on one to four threads. When you use the student machine, compile for the Sandybridge architecture. Do not use g77 (the Fortran 77 compiler).
OpenBLAS is a continuation of the work of Kazushige Goto, who created GotoBLAS2 (which does not support the latest Intel and AMD CPUs).
Use the Lapack library /usr/lib64/liblapack.so that comes with the Linux distribution. (It is sufficient to link with -llapack; you do not have to supply a link path.) What do you think of the performance? (Note that the performance depends on what Linux version you are using.) Make some nice plots where it is easy to compare all the tests (so do not plot one curve per page): one plot showing time vs. n for all the cases, and one showing Gflop/s vs. n for all the cases.
A Fortran hint: see the pdf-file "Introduction to C, Fortran 90, FORTRAN 77, tcsh and bash" under the Diary as well. This pdf-file contains a similar discussion for C.
One can use the following theorem to construct a symmetric and positive definite matrix:
Theorem: if A is symmetric and
A(k, k) > |A(k, 1)| + |A(k, 2)| + ... + |A(k, k-1)| + |A(k, k+1)| + ... + |A(k, n)|, k = 1, 2, ..., n,
then A is positive definite.
So taking A(k, k) = 1 and small, nonzero off-diagonal elements will work (do not take zeros, since the factorization would then be trivial).
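One way to build such a test matrix in C, following the theorem (the name make_spd and the choice of off-diagonal value 1/(2n) are just one possibility):

```c
#include <stddef.h>

/* Fill a (row-major, n x n) with a symmetric, strictly diagonally dominant
   matrix: ones on the diagonal and equal small off-diagonal entries.  With
   off-diagonal value eps, the theorem requires 1 > (n-1)*eps, so eps = 1/(2n)
   gives a row sum of (n-1)/(2n) < 1 and hence a positive definite matrix. */
static void make_spd(double *a, size_t n)
{
    double eps = 1.0 / (2.0 * (double)n);
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++)
            a[i*n + j] = (i == j) ? 1.0 : eps;
}
```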