Notes for Lab 5
- The initial value of `i` inside the parallel region begins at 0 and increases by 10 each iteration. The final value of the main program variable, `i=1`, is unchanged. This indicates that the variable `i` inside the region is a separate copy. It also indicates that a single thread executes all iterations, since `i` keeps increasing. The values of `j` are as expected for a serial loop.
- The final printed values of `j` increase in multiples of 10, and its initial value is that of the main program variable, indicating that `j` is a shared variable being updated by all threads. Access to a shared variable by multiple threads without protection is a potential race condition (updates by some threads may get lost). However, the initial value of `i` in the parallel region is 0 and its final value is 10, indicating that the threads have separate copies of `i`.
- The initial value of the thread-local variable is now 1 (and not some arbitrary value).
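These three observations all come down to OpenMP's data-sharing attributes. The following is a minimal sketch (not the lab program: the loop bound of 10 and the extra variable `k` are assumed here purely for illustration) of how a `private`, a `shared`, and a `firstprivate` variable behave:

```c
#include <omp.h>
#include <stdio.h>

int main(void) {
    int i = 1, j = 1, k = 1;
    #pragma omp parallel default(shared) private(i) firstprivate(k)
    {
        i = 0;                   /* private: each thread has its own copy,    */
        for (int n = 0; n < 10; n++) {
            i++;                 /*   and its updates never reach the original */
            j++;                 /* shared: all threads race on the one j      */
        }
        k++;                     /* firstprivate: starts at 1 in every thread  */
    }
    /* Here i is still 1 (the private copies were discarded), and j's value
       depends on how the unsynchronised updates interleaved. */
    printf("i=%d j=%d k=%d\n", i, j, k);
    return 0;
}
```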
- The final value of `j` with `p=14` threads is negative due to integer overflow. With smaller values of `p`, we see that its final value is `10*p!` (`p` factorial); for `p=14`, `10*14!` is about 8.7e11, far beyond the range of a 32-bit `int` (maximum 2,147,483,647), hence the wraparound to a negative value.
- With `OMP_DYNAMIC=true`, the number of threads in the parallel region is less than or equal to the number of threads requested. The exact number is implementation-defined and may depend upon any number of factors, up to and including your choice of compiler. Otherwise, the number of threads is usually equal to the requested value. See section 10.1.1 of the OpenMP 5.2 specification for details on how OpenMP selects the number of threads to create for a parallel region.
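One way to observe this (a sketch, not part of the lab code; the request of 8 threads is an arbitrary choice) is to enable dynamic adjustment from within the program and print how many threads the runtime actually grants:

```c
#include <omp.h>
#include <stdio.h>

int main(void) {
    omp_set_dynamic(1);        /* same effect as OMP_DYNAMIC=true           */
    omp_set_num_threads(8);    /* the request; the runtime may grant fewer  */
    #pragma omp parallel
    {
        #pragma omp single
        printf("granted %d of 8 requested threads\n", omp_get_num_threads());
    }
    return 0;
}
```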
- A parallelization of the iterations according to a block distribution is:

```c
#pragma omp parallel default(shared) reduction(+:sum) private(i)
{
    int id = omp_get_thread_num(), p = omp_get_num_threads();
    int first_i = id * (nele / p);                                 /* start of this thread's block    */
    int last_i  = first_i + nele / p + (id < p-1 ? 0 : nele % p);  /* last thread takes the remainder */
    i = first_i;
    do {                       /* each thread sums i = first_i+1 .. last_i */
        i++;
        sum += i;
    } while (i < last_i);
}
```
- The problem in parallelizing this loop is that there is an inter-iteration dependency on `xl`. This must be eliminated. One solution is to remove it by noting that `xl = i*xw`, and to make `xl` private:

```c
#pragma omp parallel for default(shared) reduction(+:area) \
    private(xl,xh,fxl,fxh)
for (i = 0; i < nele; i++) {
    xl = i*xw;                      /* recomputed from i: no carried dependency */
    xh = xl + xw;
    fxl = 1.0 / (1.0 + xl*xl);
    fxh = 1.0 / (1.0 + xh*xh);
    area += 0.5 * (fxl + fxh) * xw;
    // xl = xh;   -- the serial carry of xl is no longer needed
}
```
- The problem is that `nxtval()` is called within the parallel region and hence may be called concurrently, so several threads may try to perform `icount++` at the same time. This increment is not an atomic operation, so when it is executed concurrently some increments may be lost. Hence duplicate values of `mytask` may be added to `sum`, and the total can be greater than what it should be.
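To make the race concrete, here is a sketch of the pattern being described (the driver loop and the task limit of 1000 are assumptions; `icount`, `mytask`, and `sum` are the names used in the notes):

```c
#include <omp.h>
#include <stdio.h>

int icount = 0;                 /* shared task counter */

int nxtval(void) {
    icount++;                   /* unprotected read-modify-write */
    return icount;
}

int main(void) {
    long sum = 0;
    #pragma omp parallel reduction(+:sum)
    {
        int mytask;
        while ((mytask = nxtval()) <= 1000)
            sum += mytask;      /* lost increments => repeated task numbers */
    }
    /* With the race, sum can exceed the serial answer 1000*1001/2 = 500500. */
    printf("sum = %ld\n", sum);
    return 0;
}
```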
- It would seem that simply adding an atomic directive to `icount++` would suffice:

```c
int nxtval() {
    #pragma omp atomic
    icount++;
    return icount;   /* but this read is outside the atomic update */
}
```

However, the return statement may read a different (greater) value of `icount` than the one this thread's increment produced, since another thread may have incremented `icount` in between. We need the increment and the read of its result to happen as a single step, which can be done with a `critical` directive:

```c
int nxtval() {
    int lcount;
    #pragma omp critical
    {
        icount++;
        lcount = icount;
    }
    return lcount;
}
```

Or with an `atomic capture` directive:

```c
int nxtval() {
    int lcount;
    #pragma omp atomic capture
    lcount = ++icount;   /* pre-increment: capture the new value, matching the critical version */
    return lcount;
}
```
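Of the two correct versions, `atomic capture` is usually the cheaper choice: on most implementations it maps to a hardware fetch-and-add, whereas `critical` serialises the threads through a runtime lock. Either way, each call now returns a distinct task number.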