For the preceding traces of roundRobin and parallelPi, the source code was used unmodified. All that was needed to produce a trace for ITA was to link the program against the libVT.a library. Although this simple approach is helpful, it does not make use of the more powerful instrumentation capabilities of ITC.
The basic instrumentation of the ITC API provides the user with several routines that control profiling and record application-specific activities. In addition to profiling of the MPI function calls, this instrumentation permits profiling of loops, user function calls, and other portions of the user's source code.
Two simple routines need to be inserted into the user's source code to identify locations that need to be traced. These routines are:
int VT_begin (int code)
/* Code to be traced goes in between */
int VT_end (int code)
Here the integer code is a user-defined identifier for a particular state and activity. The integer return value is negative in case of an error. The user must define the state and activity associated with each code using:
int VT_symdef(int code, char *state, char *activity)
Again, the integer return value is negative in case of an error. The state character string names a specific loop or function, or any other clearly identified state of the executing program. The activity character string names the activity class to which that state belongs. For instance, a specific call to printf (to print results) may be given the state name print-results and the activity name I/O.
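The print-results example above can be sketched as follows, including a check of the negative return values. This is a minimal sketch, not ITC's own sample code: the helper names `vt_ok`, `define_print_state`, and `print_results_traced` are hypothetical, the code 201 is an arbitrary choice, and when `USE_VT` is 0 the three VT routines are replaced by stand-ins that always succeed, so that the sketch compiles without ITC installed.

```c
#include <stdio.h>

#ifndef USE_VT
#define USE_VT 0   /* assumption: build without ITC unless told otherwise */
#endif

#if USE_VT
#include "VT.h"    /* the real ITC declarations */
#else
/* Hypothetical stand-ins that always succeed (not part of ITC). */
static int VT_symdef(int code, const char *state, const char *activity)
{ (void)code; (void)state; (void)activity; return 0; }
static int VT_begin(int code) { (void)code; return 0; }
static int VT_end(int code)   { (void)code; return 0; }
#endif

enum { PRINT_RESULTS = 201 };   /* user-chosen code; the value is arbitrary */

/* Report a negative return value; returns 1 on success, 0 on error. */
static int vt_ok(int rc, const char *what)
{
    if (rc < 0) {
        fprintf(stderr, "%s failed with code %d\n", what, rc);
        return 0;
    }
    return 1;
}

/* Call once, after MPI_Init(): bind the code to a state and an activity. */
int define_print_state(void)
{
    return vt_ok(VT_symdef(PRINT_RESULTS, "print-results", "I/O"), "VT_symdef");
}

/* Bracket the printf with VT_begin/VT_end; returns 1 if both calls succeeded. */
int print_results_traced(double value)
{
    int ok = vt_ok(VT_begin(PRINT_RESULTS), "VT_begin");
    printf("result = %.15f\n", value);
    return vt_ok(VT_end(PRINT_RESULTS), "VT_end") && ok;
}
```

In a real program `define_print_state()` would be called once after `MPI_Init()`, and `print_results_traced()` wherever the results are printed.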
The ITC API's predefined activities are:
Some of the predefined states, and the activity classes they belong to, are shown below:
| State | Activity class |
|---|---|
| Various MPI function calls | MPI |
| User_Code | Application |
| TRACE_ON | VT_API |
We will not discuss the Highlight and Idle activity classes in this tutorial.
The modified version of parallelPi.c, parallelPiVt.c, with ITC instrumentation, is as follows. Note the macro USE_VT, which determines whether the compilation of parallelPiVt.c includes the Intel Trace Collector (a.k.a. VampirTrace) instrumentation.

    #include <stdio.h>
    #include <stdlib.h>
    #include <math.h>
    #include <mpi.h>

    #define USE_VT 1 /* 1 to call Intel Trace Collector API, 0 otherwise */
    #if USE_VT
    #include "VT.h"
    #endif

    const double pi = 3.1415926535897932385;

    int main( int argc, char *argv[] )
    {
        /* This is a parallelization of serialPi.c.                              */
        /* The for-loop is partitioned among the processors.                     */
        /* Function MPI_Reduce sums the processors' results for processor 0.     */
        /* Processor 0 outputs.                                                  */
        /*                                                                       */
        /* Remember, it is convenient to reference the typical processor in the  */
        /* first person, using "I" and "me" and "my." In other words,            */
        /* read through this program as if you were one of the processors.       */

        /* Arguments required for executing argv[0]: */
        /* 0 -> serialPi */
        /* 1 -> numberOfIntervals */

        /* My ID number and the total number of processors: */
        int myrank, numProcs;

        /* Variables for the computation. */
        int numberOfIntervals, interval;
        double intervalWidth, intervalMidPoint, totalArea, myArea = 0.0;

        /* For completeness I'll declare an MPI status variable, */
        /* although I don't plan to use it. */
        MPI_Status status;

        /* Initialize the MPI API. */
        /* But don't call any Intel Trace Collector API routines before MPI_Init() */
        /* or after MPI_Finalize(). */
    #if USE_VT
        VT_traceoff(); /* Don't want to trace MPI_Init() */
    #endif
        MPI_Init(&argc, &argv);

    #if USE_VT
        /* Now set up the Intel Trace Collector instrumentation. This involves defining */
        /* source code locations for beginning and ending events. */
        /* First define state code 200 associated with state "for-loop" and */
        /* activity "Calculation". */
        VT_symdef( 200, "for-loop", "Calculation" );
        /* and define state code 201 associated with state "print" and */
        /* activity "I/O". */
        VT_symdef( 201, "print", "I/O" );
        /* See below how we place the routines VT_begin(int code) and */
        /* VT_end(int code) to see what the states and activities are. */
        VT_traceon(); /* Switch tracing on */
    #endif

        /* Request my ID number: */
        MPI_Comm_rank(MPI_COMM_WORLD, &myrank);

        /* I'll also ask how many other processors are out there: */
        MPI_Comm_size(MPI_COMM_WORLD, &numProcs);

        /* Okay. The preparations have been made. */
        /* The number of intervals is a command line argument. */
        numberOfIntervals = atoi( argv[1] );

        /* Compute the interval width. */
        intervalWidth = 1.0 / numberOfIntervals;

        /* Now I'll compute my area. */
    #if USE_VT
        VT_begin(200); /* Start tracing the for-loop. */
    #endif
        for ( interval = myrank; interval < numberOfIntervals; interval += numProcs )
        {
            intervalMidPoint = (interval + 0.5) * intervalWidth;
            myArea += 4.0 / ( 1.0 + intervalMidPoint*intervalMidPoint );
        }
    #if USE_VT
        VT_end(200); /* Stop tracing the for-loop. */
    #endif

        /* Okay. Now I'll submit my results to the MPI API. */
        /* If I'm processor 0, I'll receive the total area. */
        /* Otherwise, I'm done. */
        /* No user-defined tracing is needed here. Intel Trace Collector automatically */
        /* traces MPI function calls. */
        MPI_Reduce( &myArea,
                    &totalArea,
                    1,
                    MPI_DOUBLE,
                    MPI_SUM,
                    0,
                    MPI_COMM_WORLD );

        /* If I'm processor 0, I need to normalize the total area and output. */
        if (myrank == 0)
        {
            totalArea *= intervalWidth;
    #if USE_VT
            VT_begin(201); /* Start tracing printing. */
    #endif
            printf( "The computed value of the integral is %.15f\n", totalArea );
            printf( "The exact value of the integral is %.15f\n", pi );
    #if USE_VT
            VT_end(201); /* Stop tracing printing. */
    #endif
        }

        /* Close the MPI API: */
        MPI_Finalize();
        return 0;
    }

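The listing repeats the `#if USE_VT` ... `#endif` guard around every tracing call. One possible way to reduce that repetition is to hide the guard behind a few macros. The sketch below assumes hypothetical macro names (`TRACE_SYMDEF`, `TRACE_BEGIN`, `TRACE_END`) that are not part of the ITC API, defaults `USE_VT` to 0 so that it compiles without ITC, and uses a serial version of the loop (hypothetical function `traced_midpoint_sum`) in place of the MPI program. The macros discard the VT return values, so this form only suits code that does not need to check for tracing errors.

```c
#ifndef USE_VT
#define USE_VT 0   /* 1 to call the Intel Trace Collector API, 0 otherwise */
#endif

#if USE_VT
#include "VT.h"
/* Hypothetical wrappers: forward to ITC and ignore the return value. */
#define TRACE_SYMDEF(code, state, activity) \
    ((void)VT_symdef((code), (state), (activity)))
#define TRACE_BEGIN(code) ((void)VT_begin(code))
#define TRACE_END(code)   ((void)VT_end(code))
#else
/* Without ITC the wrappers expand to statements that do nothing. */
#define TRACE_SYMDEF(code, state, activity) ((void)0)
#define TRACE_BEGIN(code) ((void)0)
#define TRACE_END(code)   ((void)0)
#endif

enum { FOR_LOOP = 200 };   /* same user-defined code as in the listing */

/* Serial version of the traced loop: one process handles every interval. */
double traced_midpoint_sum(int numberOfIntervals)
{
    double intervalWidth = 1.0 / numberOfIntervals;
    double area = 0.0;
    int interval;

    TRACE_BEGIN(FOR_LOOP);
    for (interval = 0; interval < numberOfIntervals; interval++) {
        double intervalMidPoint = (interval + 0.5) * intervalWidth;
        area += 4.0 / (1.0 + intervalMidPoint * intervalMidPoint);
    }
    TRACE_END(FOR_LOOP);

    return area * intervalWidth;
}
```

When built with `USE_VT` set to 1, `TRACE_SYMDEF(FOR_LOOP, "for-loop", "Calculation")` would still have to be issued once after `MPI_Init()`, exactly as `VT_symdef` is in the listing.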
In the following section, we will see what the ITA timelines and activity charts look like, with the added instrumentation code.
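As a rough guide to what the for-loop state (code 200) should cover on each process, the sketch below counts the iterations that a given rank executes in the loop `for (interval = myrank; interval < numberOfIntervals; interval += numProcs)`. The function name `iterations_for_rank` is hypothetical. If every iteration costs about the same, which is an assumption, ranks with equal counts should show for-loop states of similar length.

```c
#include <assert.h>

/* Number of loop iterations executed by rank `myrank` when the intervals
   0 .. numberOfIntervals-1 are dealt out cyclically to numProcs ranks. */
int iterations_for_rank(int myrank, int numProcs, int numberOfIntervals)
{
    assert(numProcs > 0 && myrank >= 0 && myrank < numProcs);

    if (numberOfIntervals <= myrank)
        return 0;   /* the loop body never runs for this rank */

    /* Ceiling of (numberOfIntervals - myrank) / numProcs. */
    return (numberOfIntervals - myrank + numProcs - 1) / numProcs;
}
```

For 10 intervals on 4 processes, ranks 0 and 1 each run 3 iterations (intervals 0, 4, 8 and 1, 5, 9) while ranks 2 and 3 each run 2, so the work differs by at most one iteration between ranks.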