Notes for Lab 5
- The initial value of `i` inside the parallel region begins at 0 and increases by 10 each iteration. The final value of the main program variable, `i=1`, is unchanged. This indicates that the variable `i` inside the region is a separate copy. It also indicates that a single thread executes all iterations, since `i` keeps increasing. The values of `j` are as expected for a serial loop.
- The final printed values of `j` increase in multiples of 10, and its initial value is that of the main program variable, indicating that `j` is a shared variable being updated by all threads. Access to a shared variable by multiple threads without protection is a potential race condition (updates by some threads may get lost). However, the initial value of `i` in the parallel region is 0 and its final value is 10, indicating that the threads have separate copies of `i`.
- The initial value of the thread-local variable is now 1 (and not some arbitrary value).
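These three observations all come down to OpenMP's data-sharing attributes. The following is a minimal sketch (not the lab program: the loop bound of 10 and the extra variable `k` are assumed here purely for illustration) of how a `private`, a `shared`, and a `firstprivate` variable behave:

```c
#include <omp.h>
#include <stdio.h>

int main(void) {
    int i = 1, j = 1, k = 1;
    #pragma omp parallel default(shared) private(i) firstprivate(k)
    {
        i = 0;                   /* private: each thread has its own copy,    */
        for (int n = 0; n < 10; n++) {
            i++;                 /*   and its updates never reach the original */
            j++;                 /* shared: all threads race on the one j      */
        }
        k++;                     /* firstprivate: starts at 1 in every thread  */
    }
    /* Here i is still 1 (the private copies were discarded), and j's value
       depends on how the unsynchronised updates interleaved. */
    printf("i=%d j=%d k=%d\n", i, j, k);
    return 0;
}
```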
- The final value of `j` with `p=14` threads is negative due to integer overflow. With smaller values of `p`, we see that its final value is `10*p!` (`p` factorial); for `p=14`, `10*14!` is about 8.7e11, far beyond the range of a 32-bit `int` (maximum 2,147,483,647), hence the wraparound to a negative value.
- With `OMP_DYNAMIC=true`, the number of threads in the parallel region is less than or equal to the number of threads requested. The exact number is implementation-defined and may depend upon any number of factors, up to and including your choice of compiler. Otherwise, the number of threads is usually equal to the requested value. See section 10.1.1 of the OpenMP 5.2 specification for details on how OpenMP selects the number of threads to create for a parallel region.
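One way to observe this (a sketch, not part of the lab code; the request of 8 threads is an arbitrary choice) is to enable dynamic adjustment from within the program and print how many threads the runtime actually grants:

```c
#include <omp.h>
#include <stdio.h>

int main(void) {
    omp_set_dynamic(1);        /* same effect as OMP_DYNAMIC=true           */
    omp_set_num_threads(8);    /* the request; the runtime may grant fewer  */
    #pragma omp parallel
    {
        #pragma omp single
        printf("granted %d of 8 requested threads\n", omp_get_num_threads());
    }
    return 0;
}
```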
- A parallelization of the iterations according to a block distribution is:

```c
#pragma omp parallel default(shared) reduction(+:sum) private(i)
{
    int id = omp_get_thread_num(), p = omp_get_num_threads();
    int first_i = id * (nele / p);                                 /* start of this thread's block    */
    int last_i  = first_i + nele / p + (id < p-1 ? 0 : nele % p);  /* last thread takes the remainder */
    i = first_i;
    do {                       /* each thread sums i = first_i+1 .. last_i */
        i++;
        sum += i;
    } while (i < last_i);
}
```
- The problem in parallelizing this loop is that there is an inter-iteration dependency on `xl`. This must be eliminated. One solution is to remove it by noting that `xl = i*xw`, and to make `xl` private:

```c
#pragma omp parallel for default(shared) reduction(+:area) \
    private(xl,xh,fxl,fxh)
for (i = 0; i < nele; i++) {
    xl = i*xw;                      /* recomputed from i: no carried dependency */
    xh = xl + xw;
    fxl = 1.0 / (1.0 + xl*xl);
    fxh = 1.0 / (1.0 + xh*xh);
    area += 0.5 * (fxl + fxh) * xw;
    // xl = xh;   -- the serial carry of xl is no longer needed
}
```
- The problem is that `nxtval()` is called within the parallel region and hence may be called concurrently, so several threads may try to perform `icount++` at the same time. This increment is not an atomic operation, so when it is executed concurrently some increments may be lost. Hence duplicate values of `mytask` may be added to `sum`, and the total can be greater than what it should be.
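To make the race concrete, here is a sketch of the pattern being described (the driver loop and the task limit of 1000 are assumptions; `icount`, `mytask`, and `sum` are the names used in the notes):

```c
#include <omp.h>
#include <stdio.h>

int icount = 0;                 /* shared task counter */

int nxtval(void) {
    icount++;                   /* unprotected read-modify-write */
    return icount;
}

int main(void) {
    long sum = 0;
    #pragma omp parallel reduction(+:sum)
    {
        int mytask;
        while ((mytask = nxtval()) <= 1000)
            sum += mytask;      /* lost increments => repeated task numbers */
    }
    /* With the race, sum can exceed the serial answer 1000*1001/2 = 500500. */
    printf("sum = %ld\n", sum);
    return 0;
}
```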
- It would seem that simply adding an atomic directive to `icount++` would suffice:

```c
int nxtval() {
    #pragma omp atomic
    icount++;
    return icount;   /* but this read is outside the atomic update */
}
```

However, the return statement may read a different (greater) value of `icount` than the one this thread's increment produced, since another thread may have incremented `icount` in between. We need the increment and the read of its result to happen as a single step, which can be done with a `critical` directive:

```c
int nxtval() {
    int lcount;
    #pragma omp critical
    {
        icount++;
        lcount = icount;
    }
    return lcount;
}
```

Or with an `atomic capture` directive:

```c
int nxtval() {
    int lcount;
    #pragma omp atomic capture
    lcount = ++icount;   /* pre-increment: capture the new value, matching the critical version */
    return lcount;
}
```
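Of the two correct versions, `atomic capture` is usually the cheaper choice: on most implementations it maps to a hardware fetch-and-add, whereas `critical` serialises the threads through a runtime lock. Either way, each call now returns a distinct task number.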