The Power Method using MPI on a simulated ring
Implement the row-partitioned power method from the lecture on network topologies. To simplify the implementation you can assume the following (although your program should work in the general case with small changes). Let β be the block size.
- You can assume that all the blocks are of the same size. Pick #p and β and set n = #p · β (then #p will always divide n). Start with #p = 2 and a small β (< 5, say).
- You may assume that A = diag(1, 2, 3, ..., n). This means that you can store A_p, the block of rows owned by process p, in a vector of length β, and the matrix-vector multiply is just one loop. The eigenvalues are 1, 2, ..., n, so the eigenvector you are looking for should have its first n-1 elements zero and the last element nonzero. You may not use the fact that A is diagonal in any other way; for example, each process should still receive the full-length vector in every iteration, and one should not have to change the Send/Receive part to handle a non-diagonal matrix.
- Choose a starting vector consisting of ones, (1, 1, 1, ..., 1)^T. This simplifies debugging since you always start with the same vector. It is also easy to write a (non-parallel) Matlab program producing the correct results, so you can compare.
- You do not have to implement a termination criterion (just run
10 steps, for example) or compute the eigenvalue.
- I have not shown you how to send a package consisting of several pieces (e.g. [p, µ_p, x_p] as one package), so you can send several messages instead. If you do want to send one message you can have a look at the Pack/Unpack routines. Since these are fairly complicated, it is much easier to do the same thing yourself, i.e. pack everything into a temporary array, send it, and unpack at the receiving end. Have a look at this Fortran program (Fortran 77).
- Do not forget that the processors are numbered from 0 to #p-1 in
MPI. In the lecture we numbered them from 1.
Here are a few things your program must do:
- It may only use the following four arrays (vectors): A and t of length β, µ (to store the µ-values) of length #p, and x of length n. You may use different names, of course, and an arbitrary number of scalar variables. There are different ways to implement the communication; you can use a buffer for the Send/Receive routines as well.
- It must simulate a ring topology. You may not use routines which are based on some form of broadcast (I want you to practice point-to-point communication). So processor p may only communicate with the processors with rank prev (say) and next. Since the identities (ranks) of the processors are 0, 1, ..., #p - 1, your program can easily compute prev and next given p and #p. (Note: since we have a ring, if p = 0 then prev = #p - 1 and next = p + 1, for example, and if p = #p - 1 then prev = p - 1 and next = 0.)
Fortran or C is OK.
Hint 1
Hint 2
If you implement the routine using the standard MPI_Send / MPI_Recv you may get problems. Why? One way to avoid the problems is to use MPI_Sendrecv.
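To spell out the hint: if every process first calls MPI_Send and then MPI_Recv, all of them may block in MPI_Send waiting for a matching receive (whether this deadlocks depends on MPI's internal buffering). MPI_Sendrecv performs the send and the receive as one combined operation, so the circular wait disappears. A hedged sketch of shifting the local block one step around the ring (BETA and the array names are assumptions; this fragment must be built and run with an MPI installation, e.g. mpicc/mpirun):

```c
#include <mpi.h>
#include <stdio.h>

#define BETA 4   /* block size beta, example value */

int main(int argc, char *argv[]) {
    int p, P;
    double t[BETA], recvbuf[BETA];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &p);
    MPI_Comm_size(MPI_COMM_WORLD, &P);

    int prev = (p - 1 + P) % P, next = (p + 1) % P;
    for (int i = 0; i < BETA; i++) t[i] = (double)(p * BETA + i + 1);

    /* Send t to next and receive the corresponding block from prev in one
       combined call; no process sits blocked in a bare MPI_Send. */
    MPI_Sendrecv(t,       BETA, MPI_DOUBLE, next, 0,
                 recvbuf, BETA, MPI_DOUBLE, prev, 0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);

    printf("rank %d received a block from rank %d\n", p, prev);
    MPI_Finalize();
    return 0;
}
```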
