Practical details
There are several MPI-systems available. I have fetched and compiled the MPICH2-system. This is how you use it:
I have set up the distribution in /chalmers/sw/unsup64/mpich2-1.3.2p1. The binaries are located in the bin-subdirectory. Set your PATH so that the shell finds these versions of the binaries first. You should, for example, be able to type
which mpif90
and the shell should respond with
/chalmers/sw/unsup64/mpich2-1.3.2p1/bin/mpif90
Note that other MPI-distributions are installed on the system, but this one works best.
You will find documentation in the share-subdirectory (in particular in share/man and share/doc). If you are using a 32-bit system, replace every occurrence of unsup64 above with unsup.
To compile your program, type one of:
mpicc my_prog.c
or
mpic++ my_prog.cc
or
mpicxx my_prog.cc
or
mpif90 my_prog.f90
or
mpif77 my_prog.f
To run your program, use mpiexec. You can start more processes than you have CPUs, but then you do not get true parallel computing, and you should not expect the best speedup. In this example we run two (-n 2) processes:
mpiexec -n 2 ./a.out
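To judge whether a given -n gives more processes than CPUs, you can compare it with the number of cores on the machine. The following is only a sketch: the function names are hypothetical, and it assumes a system such as Linux with glibc where sysconf accepts _SC_NPROCESSORS_ONLN (a common extension, not something every Unix guarantees).

```c
#include <unistd.h>

/* Number of processor cores currently online, or -1 if unknown.
   Assumption: _SC_NPROCESSORS_ONLN is supported (true on Linux/glibc). */
long online_cores(void)
{
    return sysconf(_SC_NPROCESSORS_ONLN);
}

/* Returns 1 if num_procs processes on this machine means more processes
   than cores, 0 otherwise (also 0 when the core count is unknown). */
int is_oversubscribed(int num_procs)
{
    long cores = online_cores();
    return cores > 0 && num_procs > cores;
}
```

Calling is_oversubscribed(2) on a two-core machine returns 0, while is_oversubscribed(3) returns 1, in which case the processes have to share cores.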
The student-machines have two cores. If you want to run on several computers, create a hostfile (named hostfile in the example) containing a list of system names, one name per row, e.g.:
host1
host2:4
host3:2
meaning: run one process on host1, four on host2 and two on host3. The Linux-command hostname prints the appropriate name of a system. Make sure that all participating systems are either 32-bit or 64-bit; do not mix the two. This is how you start the seven processes:
mpiexec -f hostfile ./a.out
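The number seven is simply the sum of the per-host counts in the hostfile (1 + 4 + 2). As a sanity check before starting a large run, the sum can be computed with a small helper. This is a minimal sketch: count_hostfile_processes is a hypothetical function, not part of MPICH2, and it assumes that every non-empty line has the form name or name:count, with a missing count meaning one process.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not part of MPICH2): sum the process counts in
   hostfile-style text. Each non-empty line is "name" or "name:count";
   a line without ":count" is assumed to mean one process. */
int count_hostfile_processes(const char *text)
{
    int total = 0;
    const char *p = text;

    while (*p != '\0') {
        /* Length of the current line, not counting the newline. */
        size_t line_len = strcspn(p, "\n");

        if (line_len > 0) {
            /* Look for a ':' inside this line only. */
            const char *colon = memchr(p, ':', line_len);
            int n = 1;                    /* default: one process */
            if (colon != NULL) {
                n = atoi(colon + 1);      /* digits after the ':' */
            }
            total += n;
        }

        p += line_len;                    /* move to the end of the line */
        if (*p == '\n') {
            p++;                          /* skip the newline itself */
        }
    }
    return total;
}
```

For the text "host1\nhost2:4\nhost3:2\n" the function returns 7, matching the number of processes started by the command above.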
In the following example-program each process prints its rank and the name of the host it is executing on.
#include <stdio.h>
#include <unistd.h>   /* gethostname */
#include "mpi.h"

int main(int argc, char *argv[])
{
  int  my_rank;
  char host_name[50];

  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);   /* rank of this process */

  gethostname(host_name, sizeof host_name);
  host_name[sizeof host_name - 1] = '\0';    /* terminate if the name was truncated */

  printf("%2d %s\n", my_rank, host_name);

  MPI_Finalize();
  return 0;
}
Using the hostfile:
remote1.student.chalmers.se:2
remote2.student.chalmers.se:2
remote3.student.chalmers.se:2
One may get the following printout (I gave the mpiexec-command on remote1). The order of the lines may change between runs.
% mpiexec -f hostfile ./a.out
0 remote1.student.chalmers.se
4 remote3.student.chalmers.se
2 remote2.student.chalmers.se
1 remote1.student.chalmers.se
5 remote3.student.chalmers.se
3 remote2.student.chalmers.se
With gfortran you can use the subroutine hostnm (a GNU extension) to retrieve the hostname, e.g.:
program main
  character (len = 60) :: name
  call hostnm(name)
  print*, name
end
A debug hint:
MPI may be buffering the output from print-statements, which means that you may not see a debug-printout (fairly likely if you have an incorrect program, since it may stop before the buffer is written). In order to see the printout you can flush the print-buffers.
In Fortran:
print*, 'x = ', x ! for example
call flush ! force printout
In C:
printf("x = %e\n", x); // for example
fflush(stdout); // force printout
