Hands-On Session Messaging Fundamentals #3:
MPI Message Performance
Objective: To better understand message passing performance.
The code for this exercise can be
found here. Instructions on how to
log into the remote machine and how to download the source code
to your working directory (using wget) can be found
here.
mpiPingPong.c is a basic ping-pong code. It times each of
100 repetitions of a ping-pong operation on a buffer of 64 ints
and prints the minimum and maximum times.
- Run the code and make sure it works. After running it several
times, do you observe a potential problem with timing this operation?
Hint: the overhead and resolution of the MPI timing
function used by the program, MPI_Wtime(), on the current
system are approximately 1e-06 s and 0.25e-06 s, respectively.
Rectify the timing problem (hint: measure the time
over all repetitions instead).
- Currently the code only does a ping-pong between processes 0 and 1
for a message containing 64 integers and measures the time using
MPI_Wtime(). Modify the code so that it runs with a
message length len from 1 to 4*1024*1024 integers in powers of 4
(i.e. 1, 4, 16, 64, 256, 1024, ...).
Have the code print out (on process 0) the average time per ping-pong
and the corresponding bandwidth for each message length.
As always, test the code interactively first.
Are the results what you expected?
- What latency and what peak bandwidth did you measure? How does
the bandwidth change with message length? Does the measured time,
as a function of message length, seem to fit a straight line?
- Further modify the code so that it measures the ping-pong time
between process 0 and all other processes in
MPI_COMM_WORLD for messages of 1, 1024 and
1048576 integers.
- Run your code on the batch system using 32 CPUs and complete the
following table:
Message size (ints) | Ping-pong time within a node | Ping-pong time between two nodes
1                   |                              |
1024                |                              |
1048576             |                              |
- What results did you expect to see? Are the results in line
with these expectations? If not, why not?
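The arithmetic behind the steps above can be sketched in plain C, without any MPI calls, so that it can be checked separately from the communication code. This is only an illustrative sketch, not the course solution: all function names are hypothetical, the bandwidth formula assumes that one ping-pong moves the message twice (there and back), the timer-error estimate is a rough model (one timer overhead plus one resolution step per timed interval), and the straight-line fit assumes the simple model time = latency + bytes / bandwidth.

```c
#include <stddef.h>

/* Largest message length in the sweep: 4*1024*1024 ints (from the exercise). */
#define MAX_LEN ((size_t)4 * 1024 * 1024)

/* Fill lens[] with 1, 4, 16, ... up to MAX_LEN; returns how many were written. */
size_t message_lengths(size_t *lens, size_t capacity)
{
    size_t n = 0;
    for (size_t len = 1; len <= MAX_LEN && n < capacity; len *= 4)
        lens[n++] = len;
    return n;
}

/* Average round-trip time when ONE timed interval spans all repetitions. */
double average_roundtrip(double t_start, double t_end, int reps)
{
    return (t_end - t_start) / reps;
}

/* Bandwidth in bytes/s. Assumption: a ping-pong transfers the message twice. */
double bandwidth(size_t len_ints, double roundtrip_s)
{
    return 2.0 * (double)(len_ints * sizeof(int)) / roundtrip_s;
}

/* Rough relative error of one timed interval that covers `reps` operations
 * of op_s seconds each (assumed model: overhead + resolution per interval). */
double timer_relative_error(double overhead_s, double resolution_s,
                            double op_s, int reps)
{
    return (overhead_s + resolution_s) / (op_s * reps);
}

/* Unweighted least-squares fit of one-way time = latency + bytes / bandwidth.
 * bytes[i] is the message size, time_s[i] the one-way time (round trip / 2).
 * Returns 0 on success, -1 if the data cannot be fitted with a positive slope.
 * Note: large messages dominate an unweighted fit, so the latency estimate
 * from real measurements may be poor. */
int fit_line(const double *bytes, const double *time_s, size_t n,
             double *latency_s, double *bytes_per_s)
{
    double sx = 0.0, sy = 0.0, sxx = 0.0, sxy = 0.0;
    for (size_t i = 0; i < n; i++) {
        sx  += bytes[i];
        sy  += time_s[i];
        sxx += bytes[i] * bytes[i];
        sxy += bytes[i] * time_s[i];
    }
    double dn = (double)n;
    double denom = dn * sxx - sx * sx;
    if (n < 2 || denom <= 0.0)
        return -1;
    double slope = (dn * sxy - sx * sy) / denom;   /* seconds per byte */
    if (slope <= 0.0)
        return -1;
    *latency_s = (sy - slope * sx) / dn;           /* intercept at 0 bytes */
    *bytes_per_s = 1.0 / slope;
    return 0;
}
```

In the MPI program, t_start and t_end would be two MPI_Wtime() readings taken on process 0 around the whole repetition loop. With the timer figures from the hint and a purely hypothetical ping-pong time of 2e-06 s, timer_relative_error gives roughly 60% for a single timed ping-pong but well under 1% for one interval spanning 100 repetitions, which is the motivation for timing the loop as a whole.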