Copyright 1998-2001, University of Notre Dame.
Authors: Jeffrey M. Squyres, Arun Rodrigues, and Brian Barrett with
         Kinis L. Meyer, M. D. McNally, and Andrew Lumsdaine

This file is part of the Notre Dame LAM implementation of MPI.

You should have received a copy of the License Agreement for the Notre
Dame LAM implementation of MPI along with the software; see the file
LICENSE.  If not, contact Office of Research, University of Notre
Dame, Notre Dame, IN 46556.

Redistribution and use in source and binary forms, with or without
modification, are permitted subject to the conditions specified in the
LICENSE file.

THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR
IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT,
INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
(INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING
IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
POSSIBILITY OF SUCH DAMAGE.

Additional copyrights may follow.


Mandelbrot is a simple example of the master/slave parallel
programming technique, written in C.  It runs one master process which
dynamically spawns any number of slaves.  Because the program
dynamically spawns slave processes, you only need to launch the
master.  The master writes the computed image to a file in Sun
rasterfile format.  Try viewing it with xv under X11.
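Each pixel's work is the standard escape-time iteration z -> z^2 + c,
and the output is a Sun rasterfile, whose header is eight big-endian
32-bit words.  The C sketch below shows both pieces in isolation.  The
function names (mandel_iters, ras_header) and the choice of an 8-bit
image with a 256-entry RGB colormap are assumptions for illustration;
this is not the code in master.c or slave.c.

```c
#include <stdint.h>

#define RAS_MAGIC     0x59a66a95u  /* Sun rasterfile magic number */
#define RT_STANDARD   1u           /* uncompressed image data */
#define RMT_EQUAL_RGB 1u           /* colormap: all reds, greens, blues */

/* Escape-time iteration for the point c = cr + ci*i.  Returns the
 * iteration at which |z| first exceeds 2, or max_iter if the point
 * never escapes (treated as inside the Mandelbrot set). */
int mandel_iters(double cr, double ci, int max_iter)
{
    double zr = 0.0, zi = 0.0;

    for (int k = 0; k < max_iter; k++) {
        if (zr * zr + zi * zi > 4.0)      /* |z|^2 > 4  <=>  |z| > 2 */
            return k;
        double tmp = zr * zr - zi * zi + cr;  /* z = z^2 + c */
        zi = 2.0 * zr * zi + ci;
        zr = tmp;
    }
    return max_iter;
}

/* Store a 32-bit value in big-endian byte order. */
static void put_be32(unsigned char *p, uint32_t v)
{
    p[0] = (unsigned char)(v >> 24);
    p[1] = (unsigned char)(v >> 16);
    p[2] = (unsigned char)(v >> 8);
    p[3] = (unsigned char)v;
}

/* Fill the 32-byte header for an 8-bit image with a 256-entry colormap.
 * Each scanline is padded to an even number of bytes (16-bit boundary). */
void ras_header(unsigned char hdr[32], uint32_t width, uint32_t height)
{
    uint32_t row_bytes = (width + 1u) & ~1u;
    uint32_t fields[8] = {
        RAS_MAGIC, width, height,
        8u,                  /* ras_depth: bits per pixel */
        row_bytes * height,  /* ras_length: image bytes incl. padding */
        RT_STANDARD,         /* ras_type */
        RMT_EQUAL_RGB,       /* ras_maptype */
        3u * 256u            /* ras_maplength: colormap size in bytes */
    };
    for (int i = 0; i < 8; i++)
        put_be32(hdr + 4 * i, fields[i]);
}
```

A complete file would follow this header with the 768 colormap bytes
(256 reds, then 256 greens, then 256 blues) and then `height`
scanlines, each padded to an even length.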

This application contains some degree of fault tolerance.  Slave
*nodes* can die and the application will continue with fewer slaves,
as long as at least one slave is alive.  If an individual slave
process dies while its node stays up, however, the entire job will
abort.  This example is aimed at showing that LAM/MPI can continue if
an entire node (including the LAM daemon on that node) crashes.  To
test this, try executing the 'tkill' program on a slave node while the
program is running.  This will kill the LAM daemon and the slave
process on that node.  (Do not run 'tkill' on the node with the
master.)

Note that this application is only an example, and is not a
full-featured fault-tolerant application.  For example, if a slave
dies, the master does not contain any extra logic to reassign the
lost work to a different slave.  As such, the resulting output image
may contain a "hole" showing the work that would have been performed
by the dead slave.  Making the master more robust is an exercise left
for the reader.  :-)
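One possible approach to that exercise, sketched below in C, is to
record which unit of work (here, one image row) each slave currently
holds and to push that row back onto a pending list when the slave is
found dead.  The row granularity, struct work_state, and the work_*
functions are hypothetical; they are not part of master.c, and the MPI
messaging itself is omitted.

```c
#include <stdbool.h>

#define MAX_ROWS   1024   /* hypothetical limit; nrows must not exceed it */
#define MAX_SLAVES 64
#define NO_WORK    (-1)

struct work_state {
    int  pending[MAX_ROWS];     /* stack of rows not yet handed out */
    int  npending;
    int  assigned[MAX_SLAVES];  /* row held by each slave, or NO_WORK */
    bool alive[MAX_SLAVES];
    int  nslaves;
};

void work_init(struct work_state *ws, int nrows, int nslaves)
{
    ws->npending = 0;
    for (int r = nrows - 1; r >= 0; r--)  /* reverse, so row 0 pops first */
        ws->pending[ws->npending++] = r;
    ws->nslaves = nslaves;
    for (int s = 0; s < nslaves; s++) {
        ws->assigned[s] = NO_WORK;
        ws->alive[s] = true;
    }
}

/* Hand the next pending row to slave s; returns the row or NO_WORK. */
int work_assign(struct work_state *ws, int s)
{
    if (!ws->alive[s] || ws->npending == 0)
        return NO_WORK;
    ws->assigned[s] = ws->pending[--ws->npending];
    return ws->assigned[s];
}

/* Slave s returned its result, so its row is finished. */
void work_complete(struct work_state *ws, int s)
{
    ws->assigned[s] = NO_WORK;
}

/* Slave s died: requeue its unfinished row for another slave. */
void work_slave_died(struct work_state *ws, int s)
{
    ws->alive[s] = false;
    if (ws->assigned[s] != NO_WORK) {
        ws->pending[ws->npending++] = ws->assigned[s];
        ws->assigned[s] = NO_WORK;
    }
}
```

Because every row is either pending, assigned to exactly one slave, or
finished, the pending stack can never hold more than nrows entries.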

This feature relies on the MPI implementation reporting errors from
MPI functions whose communicator includes a dead slave.  Since the
application creates a separate communicator for each slave, the master
can tell from a returned error which slave has died.  The application
cannot tolerate the untimely death of the master, although this could
be addressed with mirroring.
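MPI's default error handler, MPI_ERRORS_ARE_FATAL, aborts the job, so
a master that relies on returned errors has to install
MPI_ERRORS_RETURN (or a custom handler) on each slave's communicator,
via MPI_Errhandler_set in MPI-1 or MPI_Comm_set_errhandler in MPI-2.
The C sketch below models only the bookkeeping, with no MPI calls:
because slave s talks over its own channel s, a failed send on that
channel identifies slave s directly.  The send_fn callback stands in
for something like MPI_Send on that slave's communicator; it,
dispatch_round, and fake_send are hypothetical names.

```c
#include <stdbool.h>

/* Stand-in for sending one row to slave s over its private
 * communicator: returns 0 on success, nonzero on error, as an MPI
 * call does under MPI_ERRORS_RETURN. */
typedef int (*send_fn)(int s, int row);

/* Hand one row to each live slave, starting at first_row.  A failure
 * on slave s's channel marks exactly that slave dead, and the row it
 * was offered goes to the next live slave instead.  Returns the number
 * of slaves still alive; the caller must give up when it reaches 0. */
int dispatch_round(bool alive[], int nslaves, int first_row, send_fn send)
{
    int nalive = 0;
    int row = first_row;

    for (int s = 0; s < nslaves; s++) {
        if (!alive[s])
            continue;
        if (send(s, row) != 0) {
            alive[s] = false;   /* error on channel s => slave s is gone */
            continue;
        }
        row++;
        nalive++;
    }
    return nalive;
}

/* Example stand-in: pretend the node running slave 1 has crashed. */
static int fake_send(int s, int row)
{
    (void)row;
    return s == 1 ? -1 : 0;
}
```

With three slaves and fake_send, the first round marks slave 1 dead
and returns 2; later rounds skip slave 1 without attempting a send.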

Use "make" to compile this example.  Make will use mpicc to compile
both programs:

        mpicc -o master master.c
        mpicc -o slave slave.c

To run this program, first boot LAM across your cluster with the
"lamboot" command. Then, you can run the master program on one node
with mpirun:

	mpirun n0 ./master

or, since this program only needs an MPI_COMM_WORLD of size one, you
can launch "master" directly without mpirun (LAM must still be booted):

	./master

NOTE: This example requires that the executable "slave" be available
on all nodes.
