## Hands-On Session Advanced Messaging #3: Collective Communication Algorithms

Objective: To better understand binary tree collective communication algorithms.
The code for this exercise can be found here.

The program global_sum.c generates a single random integer on each process and then seeks to sum the values together across all processes using a binary tree. The code works if the number of processes used, nproc, is a power of two, but fails otherwise. Inspect and run the code. Make sure you understand how it works.

1. Fix the code so that it works correctly even when the number of processes used is not a power of two. Hint: consider cutting off at nproc the tree for the next power of two.

2. (skip initially and complete later, if there is time)
The current program gives the value of the final sum only on process 0. Modify the code so that it gives the final value on any process that you chose. Have the rank of the chosen process passed to the summation function. For the purpose of this exercise make that process the one with a rank of nproc/2.

The program global_gather.c is similar to global_sum.c in that it has multiple processes that each generates a single integer of random value. However, instead of adding these values together we now wish to create a list of all these values on each process. This is a gather operation, and has been implemented in the code using a single MPI_Allgather(). Inspect and run the code. Make sure you understand what it is doing.

1. Re-write global_gather.c, replacing the call to MPI_Allgather() with a call to a function that you write that only uses MPI point-to-point send and receive calls.
You can base your function on the global_sum() function in global_sum.c. But now instead of summing integers as you go up the tree, you concatenate them in a list. Note that the amount of data sent doubles as you go up the tree. To simplify the problem you may assume that the number of processes is an integer power of 2. There are two approaches.
• The rightmost of an active pair sends to the leftmost, and then drops out. So far, this is like the global sum except we are concatenating at each point, not summing. The root eventually has the whole list and then broadcasts the completed list back down the tree.
• recursive doubling. The pairs of processes exchange data at each level, and the number of active pairs stays constant. At the final step, all processes should have the complete list.