In your ParallelPi directory, you should see a file named parallelPi.stf. This trace file was created automatically by Intel Trace Collector (ITC) as parallelPi executed. Once again, no modification within the parallelPi.c file was required.
After executing the parallelPi code, enter the following commands to open the trace file:
Exercise: Do this!
[agopu@bc81 agopu]$ cd ~/MPI_Tutorial/ParallelPi
[agopu@bc81 ParallelPi]$ traceanalyzer parallelPi.stf &
In parallelPi the only communication requirement comes at the conclusion of the program. Each processor completes its computation, then waits for processor 0 to gather the results, as shown by the blue arrows. After the MPI_Reduce function call completes, processor 0 normalizes and prints the results.
Do not be alarmed if your timeline shows one or more of the processors starting far ahead of one or more of the others. The parallelPi program has a much shorter run time than roundRobin, so the startup difference of a few milliseconds between processor 0 and processors 1, 2, and 3 takes up a visible share of the timeline. ITA exaggerates this difference because of the short total timespan. For longer, more realistic programs this startup difference is inconsequential.
Once again, follow the guidelines provided in Individual Processor activity - ITA Activity Chart to view parallelPi's activity.
A sample ITA activity chart associated with the trace file parallelPi.stf is shown below:
This histogram view provides an overall profile of how each processor spends its time. The bars appear from left to right in the same order as the function legend from top to bottom. You can immediately see that parallelPi is not an effective parallel program! Far too much of each processor's time is spent in MPI function calls and too little on actually computing and outputting the integral in the application itself (User_Code).