Changes for QMP Version 2

Modified: Nov 2, 2004

I.              API Clarifications

a.     Sending from / to overlapping buffers

Sending from overlapping buffers is allowed. Sending to overlapping buffers simultaneously from one or more nodes is undefined (considered an undetected error).

b.    Starting a second send (separate handle) to the same adjacent node before the first completes, or a second receive before the first completes, is allowed.

The second send/receive start function is allowed to block.  That is, the implementation is not required to implement an I/O queue to support this behavior.

c.     Starting a handle twice before the first operation completes is undefined.

d.    Disallow mixing single operations and multiple operations containing those single operations.  Example:

hnd[0] = QMP_declare_send_relative(...);
hnd[1] = QMP_declare_send_relative(...);
big = QMP_declare_multiple(hnd, 2);
....
QMP_start(hnd[0]);
....
QMP_start(hnd[1]);
....
QMP_wait(big);

Justification: mixed operations prevent full optimization of the multiple operation.

This mixed operation was proposed to optimize code behavior in waiting for operations to complete in any order.  This will now be addressed by a new function (below).

e.     Use of declare_multiple invalidates the individual I/O handles (thus making mixed mode impossible), and subsequent operations on the individual I/O handles are undefined.  declare_multiple will free the individual I/O handles (the user will not have to do this).

f.     Recursive use of declare_multiple is allowed (but invalidates the inputs) and the ordering of individual messages if multiple messages are going to / coming from a single destination

g.     For QMP_declare_logical_topology() add a clarification on what is legal:

"The implementation should re-order dimensions and support logical machines with a number of dimensions different from the allocated machine if the hardware can support it.  If re-ordering of dimensions occurs, the implementation must do the appropriate mapping for send relative operations.  On the QCDOC it is expected that only the logical machine declared in the environment will give success on this call (sizes must match exactly)."

 

 

h.    Eliminate use of types like QMP_u32_t in the spec except for needed opaque types.

i.      Change global_xor to xor_ulong(ulong * val).

j.      Use other standard types where appropriate (size_t for block size, off_t for stride)

k.    stride is allowed to be negative; stride is the difference in block starting addresses

                     (QCDOC to confirm they can live with this, possibly with software mapping)
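A minimal sketch of what a signed stride means when gathering blocks, assuming block i starts at base + i * stride. The helper name sketch_pack_strided is hypothetical and not part of QMP; it only illustrates that a negative stride walks backwards from base.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not a QMP routine): copy nblocks blocks of blksize
 * bytes each into a contiguous buffer. Block i starts at base + i * stride,
 * so a negative stride visits blocks at decreasing addresses. */
void sketch_pack_strided(char *dst, const char *base, size_t blksize,
                         int nblocks, ptrdiff_t stride)
{
    assert(dst != NULL && base != NULL && nblocks >= 0);
    for (int i = 0; i < nblocks; i++) {
        const char *src = base + (ptrdiff_t)i * stride;
        memcpy(dst + (size_t)i * blksize, src, blksize);
    }
}
```

With a negative stride, base must point at the highest-addressed block so that every block still lies inside the caller's buffer.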

l.      Drop use of QMP_SMP_ONE_ADDRESS and QMP_SMP_MULTIPLE_ADDRESS in QMP_init_msg_passing. This should be known from the environment: mpirun specifies how many processes per node to start.

Follow MPI_Init_thread:

Use QMP_thread_single / funneled / serialized / multi

2 parameters: requested, provided thread level
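A minimal sketch of the requested / provided negotiation, modeled on MPI_Init_thread. All names here (sketch_thread_level_t, SKETCH_SUPPORTED, sketch_init) are hypothetical stand-ins, and the sketch assumes an implementation that supports every level up to a fixed maximum.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the proposed thread levels, in increasing order. */
typedef enum {
    SKETCH_THREAD_SINGLE = 0,
    SKETCH_THREAD_FUNNELED,
    SKETCH_THREAD_SERIALIZED,
    SKETCH_THREAD_MULTI
} sketch_thread_level_t;

/* Assumption: this pretend implementation supports levels up to FUNNELED. */
#define SKETCH_SUPPORTED SKETCH_THREAD_FUNNELED

/* Report the requested level if it is supported, otherwise the highest
 * supported level. Returns 0 for success. */
int sketch_init(sketch_thread_level_t requested, sketch_thread_level_t *provided)
{
    assert(provided != NULL);
    *provided = (requested < SKETCH_SUPPORTED) ? requested : SKETCH_SUPPORTED;
    return 0;
}
```

The caller is expected to check the provided level and adapt, rather than assume the request was granted.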

m.   Clarify requirements for valid global reduction functions: the function must be associative, and the implied ordering of its arguments is by node rank.
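A small sketch of why associativity matters: an implementation is free to group the rank-ordered values differently (a linear chain, a tree), and only an associative function gives the same answer for every grouping. The function names are hypothetical; XOR stands in for a valid reduction and subtraction for an invalid one.

```c
#include <assert.h>

typedef unsigned long (*sketch_binop_t)(unsigned long, unsigned long);

/* Left-to-right fold in rank order: ((v0 op v1) op v2) op v3 ... */
unsigned long sketch_reduce_linear(const unsigned long *v, int n, sketch_binop_t op)
{
    assert(n > 0);
    unsigned long acc = v[0];
    for (int i = 1; i < n; i++)
        acc = op(acc, v[i]);
    return acc;
}

/* Balanced tree over the same rank order: the range [lo, hi) is reduced
 * as (left half) op (right half). */
unsigned long sketch_reduce_tree(const unsigned long *v, int lo, int hi, sketch_binop_t op)
{
    assert(hi > lo);
    if (hi - lo == 1)
        return v[lo];
    int mid = lo + (hi - lo) / 2;
    return op(sketch_reduce_tree(v, lo, mid, op),
              sketch_reduce_tree(v, mid, hi, op));
}

/* Associative: any grouping gives the same result. */
unsigned long sketch_xor(unsigned long a, unsigned long b) { return a ^ b; }

/* Not associative: the result depends on the grouping. */
unsigned long sketch_sub(unsigned long a, unsigned long b) { return a - b; }
```

Both reductions keep the operands in rank order; only the grouping differs between them.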

 

II.            API Modifications

a.     Add missing array length argument:

QMP_msgmem_t QMP_declare_strided_array_msgmem (void ** base, QMP_u32_t* blksize, QMP_u32_t *nblocks, QMP_u32_t* stride);

change to

QMP_msgmem_t QMP_declare_strided_array_msgmem (void ** base, size_t * blksize, int *nblocks, ptrdiff_t* stride, int n);
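To illustrate why the length argument is needed: without n, nothing tells the callee how many entries the parallel arrays hold. A minimal sketch, using a hypothetical helper that sums the payload described by n strided regions:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not a QMP routine): total payload in bytes of n
 * strided regions, where region i consists of nblocks[i] blocks of
 * blksize[i] bytes. The loop bound n is exactly the argument that the
 * original prototype lacked. */
size_t sketch_strided_array_bytes(const size_t *blksize, const int *nblocks, int n)
{
    assert(n >= 0);
    size_t total = 0;
    for (int i = 0; i < n; i++) {
        assert(nblocks[i] >= 0);
        total += blksize[i] * (size_t)nblocks[i];
    }
    return total;
}
```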

III.          API Additions

a.     add routine QMP_wait_all (handle *, int len)

This routine provides the requested capability to wait for I/O on all of the input handles without a deadlock.  (do we need wait_any ?)

b.    add routine QMP_abort()

void QMP_abort(int exit_code)

void QMP_abort_string(int exit_code, char* string)

The C++ binding has a null-defaulted second argument, which is a string.  This routine kills all processes and is used to terminate a job abnormally; it is like exit() but attempts some cleanup.

c.     add routine to specify additional application alignment requirements:

void *QMP_allocate_memory (size_t nb);  (replaces old allocate_aligned)

void *QMP_allocate_aligned_memory (size_t nb, int min_align, int flags);

If min_align = QMP_DEFAULT_MEMALIGN, the default alignment is used; min_align is specified in bytes

     flags = bool for type of memory, etc. QMP_DEFAULT_MEMFLAGS

     (bits to be defined later)
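One possible way to honor a minimum alignment on top of malloc, shown only as a sketch. The names sketch_allocate_aligned and sketch_free_aligned, the value chosen for the default-alignment constant, and the 16-byte fallback are all assumptions, and the flags argument is ignored because its bits are not yet defined.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-ins; the real constant values are not specified here. */
#define SKETCH_DEFAULT_MEMALIGN 0
#define SKETCH_FALLBACK_ALIGN 16

/* Return nb bytes aligned to at least min_align bytes (a power of two). */
void *sketch_allocate_aligned(size_t nb, int min_align, int flags)
{
    (void)flags; /* flag bits are still to be defined */
    assert(min_align >= 0);
    size_t align = (min_align == SKETCH_DEFAULT_MEMALIGN)
                       ? SKETCH_FALLBACK_ALIGN : (size_t)min_align;
    assert((align & (align - 1)) == 0);  /* must be a power of two */
    if (align < sizeof(void *))
        align = sizeof(void *);          /* keep the saved pointer aligned */

    /* Over-allocate: alignment slack plus room for the raw pointer. */
    void *raw = malloc(nb + align - 1 + sizeof(void *));
    if (raw == NULL)
        return NULL;

    uintptr_t start = (uintptr_t)raw + sizeof(void *);
    uintptr_t aligned = (start + align - 1) & ~(uintptr_t)(align - 1);
    ((void **)aligned)[-1] = raw;        /* remember what to pass to free() */
    return (void *)aligned;
}

/* Release memory obtained from sketch_allocate_aligned. */
void sketch_free_aligned(void *p)
{
    if (p != NULL)
        free(((void **)p)[-1]);
}
```

Memory from this sketch must be released with the matching free routine, since the returned pointer is generally not the one malloc produced.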

d.    add routine QMP_verbose() to set new (return old) verbosity level for implementation messages to stderr

   int  QMP_verbose(int)     
      0 = no messages
      1 = only a few terse messages
      4 = most verbose, lots of diagnostics

(current implementation is QMP_verbose(bool) with void return)
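The set-new / return-old convention makes it easy to change the level temporarily and restore it afterwards. A minimal sketch with a hypothetical sketch_verbose standing in for the proposed routine, assuming an initial level of 0:

```c
#include <assert.h>

/* Assumption: the library starts out silent. */
static int sketch_verbosity = 0;

/* Set a new verbosity level and return the previous one. */
int sketch_verbose(int level)
{
    assert(level >= 0);
    int old = sketch_verbosity;
    sketch_verbosity = level;
    return old;
}
```

A caller can then write old = sketch_verbose(4), run the section it wants diagnosed, and call sketch_verbose(old) to restore the earlier setting.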

e.     add convenience routines:

   int         QMP_printf  (const char* format,...);
   int         QMP_fprintf  (FILE* stream, const char* format,...);
   int         QMP_info  (int verbose, const char* format,...);    (to stdout)
   int         QMP_error  (const char* format,...);                      (to stderr)

{
   void      QMP_error_exit  (const char* format,...);
                                       Reports error messages and quits.
   void      QMP_fatal  (QMP_u32_t rank, const char* format,...);

}

These routines all tag output messages with the node number, the total number of nodes, and the hostname.

     “(node#): the_string”
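A sketch of how such tagging could be layered on the standard printf family. The helper sketch_format_tagged is hypothetical; it follows the format line above and prefixes only the node number, writing into a caller-supplied buffer so the result can be inspected.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical helper (not a QMP routine): write "(node): message" into
 * buf. Like vsnprintf, it returns the length the full string would have,
 * or a negative value on a formatting error. */
int sketch_format_tagged(char *buf, size_t size, int node, const char *format, ...)
{
    assert(buf != NULL && size > 0 && format != NULL);

    int n = snprintf(buf, size, "(%d): ", node);
    if (n < 0 || (size_t)n >= size)
        return n;  /* error, or the tag alone filled the buffer */

    va_list ap;
    va_start(ap, format);
    int m = vsnprintf(buf + n, size - (size_t)n, format, ap);
    va_end(ap);

    return (m < 0) ? m : n + m;
}
```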

f.     error handler function: ???

        QMP_errfuncptr_t  QMP_set_error_function (QMP_errfuncptr_t funcptr);

??? this function is called anytime an error is detected, prior to returning the error to the caller;

   QMP_status function (int code);
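Since the semantics here are still open, the following is only one possible reading: the setter returns the previously installed handler, and the library calls the handler before returning the error code to its caller. All names are hypothetical stand-ins, and an int is used in place of QMP_status.

```c
#include <assert.h>
#include <stddef.h>

typedef int sketch_status_t;  /* stand-in for QMP_status */
typedef sketch_status_t (*sketch_errfuncptr_t)(int code);

static sketch_errfuncptr_t sketch_handler = NULL;

/* Install a new handler; assumed here to return the previous one. */
sketch_errfuncptr_t sketch_set_error_function(sketch_errfuncptr_t funcptr)
{
    sketch_errfuncptr_t old = sketch_handler;
    sketch_handler = funcptr;
    return old;
}

/* A pretend library routine that detects an error: the handler runs first,
 * then the error code is returned to the caller as usual. */
sketch_status_t sketch_failing_operation(int code)
{
    assert(code != 0);
    if (sketch_handler != NULL)
        (void)sketch_handler(code);
    return code;
}

/* Example handler: remember the last error code seen. */
int sketch_last_seen = 0;
sketch_status_t sketch_record_error(int code)
{
    sketch_last_seen = code;
    return code;
}
```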

g.     add convenience routine

QMP_bool_t  QMP_is_primary_node  (void);

IV.          Implementation Changes

a.     drop support for QMP-GM

Justification: the performance advantages do not justify the cost of finishing the debugging of this implementation, because QMP-MPICH can support GM (Myrinet).  We no longer anticipate that GM will be a significant platform for QMP.  Instead, the major platforms will be:

                                              i.     MPICH (portability, support of legacy platforms, including Myrinet, and initial support of Infiniband)

                                             ii.     gigE mesh (since this cannot be supported by MPICH)

                                           iii.     QCDOC (ditto)

                                           iv.     Infiniband, if it appears that we can gain significant advantages over MPICH by such an implementation

b.    Finish QMP-MPICH and QMP-VIA and QMP-QCDOC

c.     Add a new implementation to support running on a single node without using QMP-MPICH (which requires an MPICH implementation) -- i.e. satisfy all references, but support no messaging.

d.    explicit support for profiling for all implementations:

                                              i.     Files QMP_P_MPI.h and QMP_P_GM.h (and others):
    Insert, near the top, a line with   #include "QMP_profiling.h"
    (conditioned by ifdef???)

                                             ii.     Add to the set of header files the file "QMP_profiling.h" which
    has contents such as:

    #pragma weak QMP_start = PQMP_start
    #undef QMP_start
    #define QMP_start PQMP_start

    This should be repeated for each QMP function of interest.
    It will enable profiling tools to define 'QMP_start' and let
    the tools do whatever they want to do before calling the real
    library function, redefined as 'PQMP_start'

                                           iii.     Add to the library the dummy function "int QMP_profcontrol (int level)" to set level and return previous level; 0=off, 1=on

                                           iv.     Add double QMP_get_total_qmp_time(void), which would return the total time spent in QMP.  This could just return 0 if QMP isn't profiled, but would return the correct result for a profiled library.  It could return the time in seconds, or we could use an int return for the time in some fraction of a second.  Maybe also add void QMP_reset_total_qmp_time(void).
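A sketch of the tool side of this scheme, collapsed into a single translation unit for illustration, so no weak symbols are needed. QMP_start and PQMP_start are mocked with an int handle (the real routine takes a message handle), and the sketch_ names are hypothetical. The tool's QMP_start counts and times each call, then forwards to the renamed library entry point.

```c
#include <assert.h>
#include <time.h>

/* Mock of the renamed library entry point; the int handle is a stand-in. */
int sketch_library_calls = 0;
int PQMP_start(int handle)
{
    assert(handle >= 0);
    sketch_library_calls++;
    return 0;
}

static int sketch_profiling_level = 1;  /* 0 = off, 1 = on */
static double sketch_total_time = 0.0;  /* CPU seconds, as measured by clock() */
int sketch_start_calls = 0;

/* Tool-provided wrapper: do the bookkeeping, then call the real function. */
int QMP_start(int handle)
{
    if (!sketch_profiling_level)
        return PQMP_start(handle);

    clock_t t0 = clock();
    int status = PQMP_start(handle);
    sketch_total_time += (double)(clock() - t0) / CLOCKS_PER_SEC;
    sketch_start_calls++;
    return status;
}

/* Set the profiling level and return the previous one. */
int sketch_profcontrol(int level)
{
    int old = sketch_profiling_level;
    sketch_profiling_level = (level != 0);
    return old;
}

/* Total time accumulated inside the wrapped calls. */
double sketch_get_total_qmp_time(void)
{
    return sketch_total_time;
}
```

In a real build the wrapper and the library would live in separate objects, with the weak alias letting an unprofiled program link without the tool.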