Hands-On Session Messaging Fundamentals #4:
Simulating Routing Protocols

Objective: To better understand SF routing algorithms and the SF vs CT routing protocols.
The code for this exercise can be found here. Instructions on how to log into the remote machine and how to download the source code to your working directory (using wget) can be found here.

The one-to-all operation on a hypercube of dimension d can be specified as:
for each i = d-1,...,0, each process whose rank r satisfies r % (2w) = 0 sends the data to process r+w (and correspondingly the processes whose rank r satisfies r % (2w) = w receive the data from process r-w), where w = 2i.

oneAllSFhypercube.c is a program which simulates a one-to-all broadcast communication on a hypercube (with root process 0). It can be invoked by mpirun -np p ./oneAllSFhypercube n, where the number of processes p must be a power of 2, i.e. p=2d, and the message size is specified by n.

  1. Type the command make (to make both executables). Run the program with p=2 (d=1) and various n; you should see that it works. However, it currently only works for d=1: extend it so that it works for arbitrary d and test your code (errors will be reported if the final values in the messsage buffer are incorrect).

  2. Once working, submit the batch file batch_oneAllhyper. It compares the hypercube algorithm with a baseline (a synchronized pipelined broadcast, from the program oneAllFlits.c). The theory predicts the latter should be about p/d times slower (about 4 times for p=16 and 10 times for p=64). What do you observe?

oneAllFlits.c is a program which performs a (series of) pipelined broadcast(s) with root process 0. By default, it is synchronized in the sense that, when the message reaches the final process p-1, process p-1 sends process 0 an empty message so that the next broadcast cannot start before this (this enforces the measurement of end-to-end time for the whole operation). The program can be invoked by mpirun -np p ./oneAllFlits n nf, where nf is the number of flits the message is broken down into (default value 1).

  1. Run the program with various p and n. With nf=1 it should work correctly; with nf≥1 you should see errors reported. Fix the code so that it works for all valid nf≥1.

  2. Once working, submit the batch file batch_oneAllFlits. It compares the synchronized broadcasts of nf=1,8, and that of an unsynchronized broadcast (which could indicate an upper bound on the benefits from using flits). How does the benefit (if any) of breaking the message into flits vary with p and n? What is the reason that you can get a benefit in the first place?