The Power Method using MPI on a simulated ring
Implement the row-partitioned power method from the lecture on network topologies. To simplify the implementation you can assume the following (although your program should work in the general case with small changes). Let β be the block size.
- You can assume that all the blocks are of the same size. Pick #p and β and set n = #p · β (then #p will always divide n). Start with #p = 2 and a small β (< 5, say).
- You may assume that A = diag(1, 2, 3, ..., n). This means that you can store A_p, the block of rows owned by process p, in a vector of length β, and the matrix-vector multiply is just one loop. The eigenvalues are 1, 2, ..., n, so the eigenvector you are looking for should have its first n-1 elements zero and the last element nonzero. You may not use the fact that A is diagonal in any other way; for example, each process should still receive the full-length vector in every iteration, and one should not have to change the Send/Receive part to handle a non-diagonal matrix.
- Choose a starting vector consisting of ones, (1, 1, 1, ..., 1)^T. This simplifies debugging since you always start with the same vector. It is also easy to write a (non-parallel) Matlab program producing the correct results, so you can compare.
- You do not have to implement a termination criterion (just run
10 steps, for example) or compute the eigenvalue.
- I have not shown you how to send a package consisting of several pieces (e.g. [p, µ_p, x_p] as one package), so you can send several messages instead. If you do want to send one message you can have a look at the Pack/Unpack routines. Since these are fairly complicated, it is much easier to do the same thing yourself, i.e. pack everything into a temporary array, send it, and unpack at the receiving end. Have a look at this Fortran program (Fortran 77).
- Do not forget that the processors are numbered from 0 to #p-1 in
MPI. In the lecture we numbered them from 1.
Here are a few things your program must do:
- It may only use the following four arrays (vectors): A and t of length β, µ (to store the µ-values) of length #p, and x of length n. You may use different names, of course, and an arbitrary number of scalar variables. There are different ways to implement the communication; you can use a buffer for the Send/Receive routines as well.
- It must simulate a ring topology. You may not use routines which are based on some form of broadcast (I want you to practice point-to-point communication). So processor p may only communicate with the processors with rank prev (say) and next. Since the identities (ranks) of the processors are 0, 1, ..., #p - 1, your program can easily compute prev and next given p and #p. (Note: since we have a ring, if p = 0 then prev = #p - 1 and next = p + 1, for example, and if p = #p - 1 then prev = p - 1 and next = 0.)
Fortran or C is OK.
Hint 1
Hint 2
If you implement the routine using the standard MPI_Send / MPI_Recv you may get problems. Why? One way to avoid the problems is to use MPI_Sendrecv.
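To spell out the hint: if every process first calls MPI_Send and then MPI_Recv, all of them may block in MPI_Send waiting for a matching receive (whether this deadlocks depends on MPI's internal buffering). MPI_Sendrecv performs the send and the receive as one combined operation, so the circular wait disappears. A hedged sketch of shifting the local block one step around the ring (BETA and the array names are assumptions; this fragment must be built and run with an MPI installation, e.g. mpicc/mpirun):

```c
#include <mpi.h>
#include <stdio.h>

#define BETA 4   /* block size beta, example value */

int main(int argc, char *argv[]) {
    int p, P;
    double t[BETA], recvbuf[BETA];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &p);
    MPI_Comm_size(MPI_COMM_WORLD, &P);

    int prev = (p - 1 + P) % P, next = (p + 1) % P;
    for (int i = 0; i < BETA; i++) t[i] = (double)(p * BETA + i + 1);

    /* Send t to next and receive the corresponding block from prev in one
       combined call; no process sits blocked in a bare MPI_Send. */
    MPI_Sendrecv(t,       BETA, MPI_DOUBLE, next, 0,
                 recvbuf, BETA, MPI_DOUBLE, prev, 0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);

    printf("rank %d received a block from rank %d\n", p, prev);
    MPI_Finalize();
    return 0;
}
```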
