Provided by: yorick-mpy-common_2.2.04+dfsg1-12_all

NAME

       mpy - Message Passing Yorick

SYNOPSIS

       mpirun  -np  mp_size mpy [ -j pfile1.i [ -j pfile2.i [ ... ]]] [ -i file1.i [ -i file2.i [
       ... ]]]
       mpirun -np mp_size mpy -batch file.i

DESCRIPTION

       Yorick is an interpreted language like Basic or Lisp, but far faster. See  yorick  (1)  to
       learn more about it.
       Mpy  is  a  parallel  version  of Yorick based on the Message Passing Interface (MPI). The
       exact syntax for launching a parallel job depends on  your  MPI  environment.  It  may  be
       necessary to launch a special daemon before calling mpirun or an equivalent command.

   Explanations
       The mpy package interfaces yorick to the MPI parallel programming library.  MPI stands for
       Message Passing Interface; the idea is  to  connect  multiple  instances  of  yorick  that
       communicate among themselves via messages.  Mpy can either perform simple, highly parallel
       tasks as pure interpreted programs, or it can start and steer arbitrarily complex compiled
       packages  which are free to use the compiled MPI API.  The interpreted API is not intended
       to be an MPI wrapper; instead it is stripped to the bare minimum.

       This is version 2 of mpy (released in 2010); it is incompatible  with  version  1  of  mpy
       (released  in  the  mid 1990s), because version 1 had numerous design flaws making it very
       difficult to write programs free of race conditions, and impossible to scale  to  millions
       of  processors.  However, you can run most version 1 mpy programs under version 2 by doing
       mp_include,"mpy1.i" before you mp_include any file defining an mpy1 parallel task (that
       is, before any file containing a call to mp_task).

   Usage notes
       The  MPI  environment  is  not really specified by the standard; existing environments are
       very crude, and strongly favor non-interactive batch jobs.  The  number  of  processes  is
       fixed  before  MPI  begins;  each process has a rank, a number from 0 to one less than the
       number of processes.  You use the rank as an address to send  messages,  and  the  process
       receiving the message can probe to see which ranks have sent messages to it, and of course
       receive those messages.

       A major problem in writing a message  passing  program  is  handling  events  or  messages
       arriving in an unplanned order.  MPI guarantees only that a sequence of messages sent by
       rank A to rank B will arrive in the order sent.  There is no guarantee about the order  of
       arrival  of  those  messages  relative  to  messages  sent  to  B from a third rank C.  In
       particular, suppose A sends a message to B, then A sends a message to C (or even exchanges
       several  messages  with  C) which results in C sending a message to B.  The message from C
       may arrive at B before the message from A.  An MPI program which does not allow  for  this
       possibility has a bug called a "race condition".  Race conditions may be extremely subtle,
       especially when the number of processes is large.
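
       As a rough illustration of that ordering rule, the following Python sketch (a model only,
       not mpy or MPI code; the function name and message labels are made up for this example)
       lists every arrival order at rank B that keeps each sender's messages in send order.

```python
from itertools import permutations

def possible_arrivals(sent):
    """Return every arrival order at one receiver that keeps each sender's
    messages in the order they were sent.

    `sent` maps a (hypothetical) sender name to its messages, in send order.
    """
    tagged = [(src, i) for src, msgs in sent.items() for i in range(len(msgs))]
    orders = set()
    for perm in permutations(tagged):
        last = {}
        ok = True
        for src, i in perm:
            # Message i from a sender may only arrive right after message i-1.
            if i != last.get(src, -1) + 1:
                ok = False
                break
            last[src] = i
        if ok:
            orders.add(tuple(sent[src][i] for src, i in perm))
    return sorted(orders)

# Rank B is sent two messages by A and one by C.
orders = possible_arrivals({"A": ["a1", "a2"], "C": ["c1"]})
for order in orders:
    print(" ".join(order))
```

       In every listed order a1 precedes a2, but c1 may land before, between, or after them; a
       program that assumes only one of these orders has the race condition described above.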

       The basic mpy interpreted interface consists of two variables:
         mp_size   = number of processes
         mp_rank   = rank of this process
       and four functions:
         mp_send, to, msg;         // send msg to rank "to"
         msg = mp_recv(from);      // receive msg from rank "from"
         ranks = mp_probe(block);  // query senders of pending messages
         mp_exec, string;          // parse and execute string on every rank

       You call mp_exec on rank 0 to start a parallel task.  When the main program  thus  created
       finishes,  all  ranks  other  than  rank  0  return  to an idle loop, waiting for the next
       mp_exec.  Rank 0 picks up the next input line from stdin (that is, waits for input at  its
       prompt  in  an  interactive  session),  or  terminates  all  processes if no more input is
       available in a batch session.

       The mpy package modifies how yorick handles the #include parser directive, and the include
       and require functions.  Namely, if a parallel task is running (that is, a function started
       by mp_exec), these all become collective operations.  That is, rank  0  reads  the  entire
       file  contents,  and  sends  the  contents  to the other processes as an MPI message (like
       mp_exec of the file contents).  Every process other than rank 0  is  only  running  during
       parallel  tasks;  outside a parallel task when only rank 0 is running (and all other ranks
       are waiting for the next mp_exec), the #include directive  and  the  include  and  require
       functions return to their usual serial operation, affecting only rank 0.

       When  mpy  starts,  it  is in parallel mode, so that all the files yorick includes when it
       starts (the files in Y_SITE/i0) are  included  as  collective  operations.   Without  this
       feature,  every  yorick  process would attempt to open and read the startup include files,
       overloading the file system before mpy ever gets started.  Passing the contents  of  these
       files  as  MPI  messages  is  the  only  way to ensure there is enough bandwidth for every
       process to read the contents of a single file.

       The last file included at startup is either the file specified in the  -batch  option,  or
       the  custom.i  file.   To  avoid  problems with code in custom.i which may not be safe for
       parallel execution, mpy does not look for  custom.i,  but  for  custommp.i  instead.   The
       instructions  in  the  -batch  file or in custommp.i are executed in serial mode on rank 0
       only.  Similarly, mpy overrides the usual process_argv function,  so  that  -i  and  other
       command line options are processed only on rank 0 in serial mode.  The intent in all these
       cases is to make the -batch or custommp.i or -i include files execute only on rank  0,  as
       if you had typed them there interactively.  You are free to call mp_exec from any of these
       files to start parallel tasks, but the file itself is serial.

       An additional command line option is added to the usual set:
         mpy -j somefile.i
       includes somefile.i in parallel mode on all ranks (again, -i other.i includes other.i only
       on rank 0 in serial mode).  If there are multiple -j options, the parallel includes happen
       in command line order.  If -j and -i options are mixed, however, all  -j  includes  happen
       before any -i includes.
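
       The ordering rule can be pictured with a small Python sketch.  This is a model of the rule
       as stated above, not mpy's option parser; the file names are hypothetical, and keeping the
       -i files in command line order among themselves is an assumption.

```python
def include_order(argv):
    """Return (mode, filename) pairs in the order the includes would happen."""
    parallel, serial = [], []
    i = 0
    while i < len(argv):
        if argv[i] in ("-j", "-i") and i + 1 < len(argv):
            target = parallel if argv[i] == "-j" else serial
            target.append(argv[i + 1])
            i += 2
        else:
            i += 1
    # All -j includes (parallel, every rank) come before any -i include (rank 0 only).
    return [("parallel", f) for f in parallel] + [("serial", f) for f in serial]

order = include_order(["-i", "setup.i", "-j", "physics.i", "-i", "run.i", "-j", "mesh.i"])
for mode, name in order:
    print(mode, name)
```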

       As  a  side  effect of the complexity of include functions in mpy, the autoload feature is
       disabled; if your code actually triggers an include by calling an autoloaded function, mpy
       will halt with an error.  You must explicitly load any functions necessary for a parallel
       task using require function calls made inside the parallel task itself.

       The mp_send function can send any numeric yorick array  (types  char,  short,  int,  long,
       float,  double, or complex), or a scalar string value.  The process of sending the message
       via MPI preserves only the number of elements, so mp_recv produces only a scalar value  or
       a 1D array of values, no matter what dimensionality was passed to mp_send.

       The  mp_recv  function  requires  you  to  specify  the  sender of the message you mean to
       receive.  It blocks until a message actually arrives from  that  sender,  queuing  up  any
       messages  from  other  senders  that  may  arrive beforehand.  The queued messages will be
       retrieved in the order received when you call mp_recv for the matching sender.  The
       queuing feature makes it dramatically easier to avoid the simplest types of race condition
       when you write interpreted parallel programs.
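
       The queuing behavior can be modeled in a few lines of Python.  This is a toy model under
       stated assumptions, not mpy's implementation: the class and its names are invented here,
       and where a real mp_recv would block, the model raises an error instead.

```python
from collections import defaultdict, deque

class RecvQueue:
    """Toy model of one rank's receive side; messages arrive as (sender, msg)."""

    def __init__(self, arrivals):
        self.arrivals = deque(arrivals)    # network arrival order, not yet examined
        self.queued = defaultdict(deque)   # messages set aside, per sender

    def recv(self, sender):
        # Messages already set aside for this sender are returned first, oldest first.
        if self.queued[sender]:
            return self.queued[sender].popleft()
        # Otherwise consume arrivals, queuing those from other senders.
        while self.arrivals:
            src, msg = self.arrivals.popleft()
            if src == sender:
                return msg
            self.queued[src].append(msg)
        raise RuntimeError("model only: a real mp_recv would block here")

q = RecvQueue([(2, "c1"), (1, "a1"), (2, "c2"), (1, "a2")])
received = [q.recv(1), q.recv(2), q.recv(2), q.recv(1)]
print(received)
```

       The receiver asks for rank 1 first even though a message from rank 2 arrived earlier; that
       message is queued and handed over, in arrival order, on the later request for rank 2.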

       The mp_probe function returns the list of all the senders of queued messages  (or  nil  if
       the  queue is empty).  Call mp_probe(0) to return immediately, even if the queue is empty.
       Call mp_probe(1) to block if the queue is empty, returning only when at least one  message
       is  available for mp_recv.  Call mp_probe(2) to block until a new message arrives, even if
       some messages are currently available.

       The mp_exec function uses a logarithmic fanout - rank 0 sends  to  F  processes,  each  of
       which  sends  to  F more, and so on, until all processes have the message.  Once a process
       completes all its send operations, it parses and executes the  contents  of  the  message.
       The fanout algorithm reaches N processes in log to the base F of N steps.  The F processes
       rank 0 sends to are ranks 1, 2, 3, ..., F.  In general, the process with rank r  sends  to
       ranks r*F+1, r*F+2, ..., r*F+F (when these are at most N-1 for N processes).  This set
       is called the "staff" of rank r.  Ranks with r>0 receive the message from rank (r-1)/F
       (using integer division), which is called the "boss" of r.  The mp_exec call interoperates
       with the mp_recv queue;
       in other words, messages from a rank other than the boss during an mp_exec fanout will  be
       queued  for  later  retrieval  by mp_recv.  (Without this feature, any parallel task which
       used a message pattern  other  than  logarithmic  fanout  would  be  susceptible  to  race
       conditions.)
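
       The staff and boss formulas are easy to check numerically.  The Python sketch below only
       models the rank arithmetic given above; the function names are chosen for this example
       and are not part of the mpy API.

```python
def staff(r, fanout, nproc):
    """Ranks that rank r forwards to: r*F+1 ... r*F+F, limited to valid ranks."""
    first = r * fanout + 1
    return [s for s in range(first, first + fanout) if s < nproc]

def boss(r, fanout):
    """Rank that r receives from, (r-1)/F with integer division; rank 0 has none."""
    if r == 0:
        return None
    return (r - 1) // fanout

F, N = 16, 100
print(staff(0, F, N))   # ranks 1 through 16
print(staff(6, F, N))   # only 97, 98, 99 exist out of 97..112
print(boss(97, F))      # 6
```

       Every rank other than 0 appears in the staff of exactly one boss, which is why each
       process receives the message exactly once.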

       The  logarithmic fanout and its inward equivalent are so useful that mpy provides a couple
       of higher level functions that use the same fanout pattern as mp_exec:
         mp_handout, msg;
         total = mp_handin(value);
       To use mp_handout, rank 0 computes a msg, then all ranks call mp_handout, which sends  msg
       (an  output  on  all ranks other than 0) everywhere by the same fanout as mp_exec.  To use
       mp_handin, every process computes value, then calls mp_handin, which returns the sum of
       the caller's own value and the values handed in by all of its staff, so that on rank 0
       mp_handin returns the sum of the values from every process.
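
       The inward sum can be modeled as a recursion over the staff tree.  The following Python
       sketch is an illustration of the arithmetic only, assuming each rank adds its own value to
       the subtotals handed in by its staff; it is not how mpy is implemented, and the names are
       invented for this example.

```python
def staff(r, fanout, nproc):
    """Ranks r*F+1 ... r*F+F that exist among nproc processes."""
    first = r * fanout + 1
    return [s for s in range(first, first + fanout) if s < nproc]

def handin(r, values, fanout):
    """Subtotal seen on rank r: its own value plus the subtotals of its staff."""
    return values[r] + sum(handin(s, values, fanout)
                           for s in staff(r, fanout, len(values)))

values = [r + 1 for r in range(10)]   # rank r contributes r+1
fanout = 3
print(handin(0, values, fanout))      # total over every rank
print(handin(1, values, fanout))      # subtotal for rank 1 and its staff 4, 5, 6
```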

       You can call mp_handin as a function with no arguments to act as a  synchronization;  when
       rank 0 continues after such a call, you know that every other rank has reached that point.
       All parallel tasks (anything started with mp_exec) must finish with a call  to  mp_handin,
       or an equivalent guarantee that all processes have returned to an idle state when the task
       finishes on rank 0.

       You can retrieve or change the fanout parameter F using the mp_nfan function.  The default
       value is 16, which should be reasonable even for very large numbers of processes.
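
       To get a feel for how the fanout parameter affects the number of forwarding levels, the
       Python sketch below counts the boss-to-staff hops needed to reach the highest rank, using
       the boss formula (r-1)/F.  It is a model of the rank arithmetic only, with a function name
       made up for this example.

```python
def fanout_depth(nproc, fanout):
    """Number of hops from rank 0 down to the highest rank, nproc-1."""
    r, hops = nproc - 1, 0
    while r > 0:
        r = (r - 1) // fanout   # step up to the boss of r
        hops += 1
    return hops

for f in (2, 16):
    print(f, fanout_depth(1_000_000, f))
```

       With a million processes, a fanout of 16 needs 5 levels where a fanout of 2 needs 19.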

       One  special  parallel  task  is  called mp_connect, which you can use to feed interpreted
       command lines to any single non-0 rank, while all other ranks sit idle.  Rank 0 sits in  a
       loop  reading  the  keyboard and sending the lines to the "connected" rank, which executes
       them, and sends an acknowledgment back to rank 0.  You run the mp_disconnect  function  to
       complete the parallel task and drop back to rank 0.

       Finally,  a  note  about error recovery.  In the event of an error during a parallel task,
       mpy attempts to gracefully exit the mp_exec, so that when rank 0 returns, all other  ranks
       are known to be idle, ready for the next mp_exec.  This procedure will hang forever if any
       one of the processes is in an infinite loop, or otherwise in a state where it  will  never
       call  mp_send,  mp_recv,  or mp_probe, because MPI provides no means to send a signal that
       interrupts all processes.  (This is one of the  ways  in  which  the  MPI  environment  is
       "crude".)   The  rank 0 process is left with the rank of the first process that reported a
       fault, plus a count of the number of processes that faulted for a reason other than  being
       sent  a  message that another rank had faulted.  The first faulting process can enter dbug
       mode via mp_connect; use mp_disconnect or dbexit to drop back to serial mode on rank 0.

   Options
       -j file.i           includes the Yorick source file file.i as mpy starts in parallel  mode
                           on all ranks.  This is equivalent to the mp_include function after mpy
                           has started.

       -i file.i           includes the Yorick source file file.i as mpy starts, in serial  mode.
                           This is equivalent to the #include directive after mpy has started.

       -batch file.i       includes  the Yorick source file file.i as mpy starts, in serial mode.
                           Your customization file custommp.i, if any, is not read,  and  mpy  is
                           placed  in  batch  mode.   Use  the help command on the batch function
                           (help, batch) to find out more about batch mode.  In batch  mode,  all
                           errors  are fatal; normally, mpy will halt execution and wait for more
                           input after an error.

AUTHOR

       David H. Munro, Lawrence Livermore National Laboratory

FILES

       Mpy uses the same files as yorick, except that custom.i is replaced by custommp.i (located
       in /etc/yorick/mpy/ on Debian based systems) and the Y_SITE/i-start/ directory is ignored.

SEE ALSO

       yorick(1)