
uprofile

Section: User Commands (1)
 

NAME

uprofile, kprofile - Profile a program (uprofile) or kernel (kprofile) with Alpha on-chip performance counters  

SYNOPSIS

uprofile [-v] [-quiet] [-dirname path] [-[no]pids] [-all|-each|-one] [-stride n] [-display|prof-option...] [statistic...] program [argument...]

kprofile [-v] [-quiet] [-dirname path] [-[no]pids] [-all|-each|-one] [-stride n] [-display|prof-option...] [-k kernel_name] [-t] [statistic...] [program [argument...]]


 

DESCRIPTION

The uprofile command uses the Alpha on-chip performance counters to produce a finely-grained program-counter profile of a user program. The command runs the program you specify with the arguments you specify, collecting the selected statistics on the program's process and its descendants. It writes the profile data to the umon.out file, by default. If the program calls shared libraries, those libraries are not profiled.

The kprofile command uses the Alpha on-chip performance counters to produce a detailed program-counter profile of the kernel. If you specify a program, kprofile runs the program with the arguments you specify, and it collects the selected statistics on the kernel for the duration of the program's execution. If you do not specify a program, kprofile collects the selected statistics on the kernel until you enter Ctrl/C. It writes the profile data to the kmon.out file, by default.

If you specify -display or any of the prof-options, the uprofile and kprofile commands display the profile by running the prof tool (with any specified prof-options).

You can also run the prof command separately to help analyze the data in the umon.out or kmon.out file. The following examples show how to invoke the prof command to analyze data in the respective files:

    % prof a.out umon.out
    % prof /vmunix kmon.out


 

OPERANDS

statistic
    The name of an event that your particular Alpha hardware can profile, as detailed in the STATISTICS section, below. If no statistic is named, machine cycles are counted, giving a CPU-time profile. One statistic can be specified for each of the hardware counters on your machine.

program
    The name of the executable to run while profiling operations are being performed.

argument
    An argument to pass to the program that is run. Multiple arguments can be specified, as needed by the program.
 

OPTIONS

Options can be abbreviated to three characters, except the prof-options, which can be abbreviated (usually to one character) as in a prof command. For example, -qui is interpreted as -quiet, but -q is interpreted as -quit. (See the -display option for the supported prof-options.)
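The abbreviation rule can be pictured with a short sketch. This is not the tool's actual parser: the resolve function and its matching order are assumptions made for illustration, and the prof-option list holds only the four prof-options that this page names explicitly.

```python
# Hypothetical sketch of the abbreviation rule; not the tool's real parser.
OWN_OPTIONS = ["-v", "-quiet", "-dirname", "-pids", "-nopids", "-all",
               "-each", "-one", "-stride", "-display", "-k", "-t"]
# Only the prof-options named explicitly in this page are listed here.
PROF_OPTIONS = ["-asm", "-heavy", "-lines", "-quit"]


def resolve(arg):
    """Return the full option name selected by a possibly abbreviated arg."""
    if arg in OWN_OPTIONS:
        return arg
    # uprofile/kprofile options need at least three characters after the dash.
    if len(arg) - 1 >= 3:
        own = [o for o in OWN_OPTIONS if o.startswith(arg)]
        if len(own) == 1:
            return own[0]
    # Anything shorter (or unmatched) is treated as a prof-style prefix.
    prof = [o for o in PROF_OPTIONS if o.startswith(arg)]
    if len(prof) == 1:
        return prof[0]
    raise ValueError("unknown or ambiguous option: " + arg)


print(resolve("-qui"))  # three characters: matches the uprofile option
print(resolve("-q"))    # one character: matches the prof-option
```

In this sketch the three-character check runs first, which is why -qui selects -quiet even though it is also a prefix of -quit.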

For options that specify a procedure name (proc), C++ procedures can omit the argument type list, though this will match all overloaded procedures with that name. To select a specific procedure, specify the full symbol name (as printed by the nm command). Symbol names containing spaces, *, and so on must be quoted.

-v
    Engages verbose mode, which prints some useful information about the program being profiled.

-quiet
    Prevents informational and progress messages from being printed.

-dirname path
    Specifies the directory path in which the profiling data file or files are created.

-[no]pids
    [Disables] or enables the addition of the process-id number to the name of the profiling data file or files.

-all|-each|-one
    Specifies which mode to use for profiling on multi-processor machines. Using the -all option (the default) aggregates the data for all CPUs into one umon.out file. Using the -each option collects separate profiles for each CPU and writes the output into a set of files named umon.out.n, where n is the CPU number. Using the -one option profiles only the current CPU. For the -one option to work, the uprofile or kprofile program must be run using the runon command.

-stride n
    Sets the granularity of the sample counts, where n is the number of consecutive instructions grouped together for each sample count. The default is -stride 4. The -asm, -heavy, and -lines prof-options need a separate sample count for each instruction (for their reports to be precise enough), so these options imply -stride 1. This makes the output file four times bigger than the default size. The -stride argument must be a power of two (for example, 1, 2, 4, 8).

-k kernel_name
    Overrides the name of the kernel to profile. (The default is the booted kernel.)

-t
    Enables triggered mode for kprofile. This option sets up all required information for running the performance counters, but does not invoke them. See the STATISTICS section for additional information.

-display
    Runs prof on the resulting profile data file(s).
    The following prof options are supported:

    -asm
        Reports the profile as an annotated disassembly.

    -exclude proc
        Excludes procedure proc and its descendants from the profile, but totals all procedures.

    -Exclude proc
        Excludes procedure proc and its descendants from the profile and from the total.

    -heavy
        Reports the lines that executed the most instructions.

    -lines
        Reports the profile per source line within each procedure.

    -merge file
        Merges all profile data files into file.

    -numbers
        Prints each procedure's starting line number.

    -only proc
        Includes only procedure proc in the profile, but totals all procedures.

    -Only proc
        Includes only procedure proc in the profile and in the total.

    -procedures
        Profiles the instructions executed in each procedure and the calls to procedures.

    -quit n
        Truncates the reports after n lines or after (cumulative) n percent of the whole.
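The effect of the -stride option on granularity and file size can be illustrated with a small sketch. It assumes only that Alpha instructions are 4 bytes wide. The function names and the idea of indexing buckets from a text-segment start address are hypothetical, and the sketch does not describe the real layout of the umon.out file.

```python
ALPHA_INSN_BYTES = 4  # Alpha instructions have a fixed 4-byte width


def is_valid_stride(n):
    """A stride must be a power of two (1, 2, 4, 8, ...)."""
    return n >= 1 and (n & (n - 1)) == 0


def bucket_index(pc, text_start, stride=4):
    """Index of the sample count that a program-counter value falls into."""
    if not is_valid_stride(stride):
        raise ValueError("stride must be a power of two")
    return (pc - text_start) // (ALPHA_INSN_BYTES * stride)


def bucket_count(text_bytes, stride=4):
    """Number of sample counts needed to cover text_bytes of code."""
    if not is_valid_stride(stride):
        raise ValueError("stride must be a power of two")
    instructions = text_bytes // ALPHA_INSN_BYTES
    return -(-instructions // stride)  # ceiling division


# With the default stride, four consecutive instructions share one count.
print(bucket_index(0x1000 + 12, 0x1000), bucket_index(0x1000 + 16, 0x1000))
# With -stride 1, the same code needs four times as many counts.
print(bucket_count(4096, stride=1) // bucket_count(4096, stride=4))
```

The four-to-one ratio in the last line corresponds to the statement that -stride 1 makes the output file four times bigger than the default.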
 

STATISTICS

You specify the statistics that you want to collect for the program being profiled in one or more statistic operands.

If you specify multiple statistics, uprofile and kprofile accumulate their results. You cannot then view the results of any single statistic separately. Because collected data is merged into a single buffer, interpretation of multiply collected statistics may be difficult.

The Alpha architecture implemented on your machine determines which statistics can be collected and the number of counters available for collecting multiple statistics at the same time. The implementation is indicated by the Alpha chip number, which can be displayed with the show config console command before booting Tru64 UNIX, or, after booting, by using the psrinfo -v command, or by calling getsysinfo (GSI_PROC_TYPE). Also, if the uprofile command is run without arguments, it will show how many counters and what statistics are available on your machine.

All of the chips in the EV4 family (21064 [EV4], 21064A [EV45], 21066/21068 [LCA4]) have two performance counter registers, each of which can be separately programmed. The statistics that each counter can collect are shown in the following table:


Counter 0 Stats    Counter 1 Stats

0disabled          1disabled
issues             dcache
pipedry            icache
loads              dualissues
pipefrozen         mispredicts
branches           floatops
cycles             intops
PALcycles          stores
nonissues          novictims
victims

All of the chips in the EV5 family (21164 [EV5], 21164A [EV56], and 21164PC [PCA56]) have three performance counter registers, each of which can be separately programmed. Some of the counters are common to all EV5 implementations, some are specific to EV5 and EV56, and some are specific to PCA56.

The statistics that each of the common EV5 counters can collect are shown in the following table:


Counter 0 Stats    Counter 1 Stats    Counter 2 Stats

0disabled          1disabled          2disabled
cycles0            nonissues          longstalls
issues             splitissue         pcmispredicts
                   pipedry            branchmispredicts
                   replay             icachemisses
                   singleissues       itbmisses
                   dualissues         dcacheldmisses
                   tripleissues       dtbmisses
                   quadissues         ldsmerged
                   flowchanges        ldureplays
                   intops             fullreplays
                   floatops           externalinput
                   loads              cycles2
                   stores             memorybarriers
                   icacheacc          lockedloads
                   dcacheacc

The statistics that each of the EV5- and EV56-specific counters can collect are shown in the following table:


Counter 1 Stats    Counter 2 Stats

scacheacc          scachemisses
scachereads        scachereadmisses
scachewrites1      scachewritemisses
scachevictim       scachesharedwrites
bcacheref          scachewrites2
bcachevictim       bcachemisses
sysreqs            systeminvalidates
                   systemreadrequests

The statistics that each of the PCA56-specific counters can collect are shown in the following table:


Counter 1 Stats         Counter 2 Stats

bcachereads             bcachedreads
bcachedreadhits         bcachereadhits
bcachedreadfills        bcachereadfills
bcachewrites            bcachewritehits
bcachecleanwritehits    bcachewritefills
bcachevictims           sysreadflushhits
readmisstwo             sysreadflushmisses
                        readmissthree

The EV6 chip has two performance counter registers, each of which can be separately programmed. The statistics that each of the EV6-specific counters can collect are shown in the following table:


Counter 0 Stats    Counter 1 Stats

0disabled          1disabled
cycles0            cycles1
retinst            retcondbranch
                   retdtb1miss
                   retdtb2miss
                   retitbmiss
                   retunaltrap
                   replay

The default is to gather cycle statistics in the 0th counter and to disable other counters.

The EV67 chip has two performance counter registers. The statistics that each of the EV67-specific counters can collect are shown in the following table. Any one statistic may be selected, or one of the listed combinations of statistics may be selected:


Counter 0 Stats    Counter 1 Stats

0disabled          1disabled
cycles0            replay
retinst            cycles1
retinst            bcachemisses

The default is to gather cycle statistics in the 0th counter and to disable other counters.

For descriptions of the statistics for all EV4, EV5, and EV6 implementations, refer to pfm(7).
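The counter counts given above for each chip family can be gathered into a small lookup, which is one way to check how many statistics may be requested at once. The table and function names below are hypothetical, and the sketch uses only the chip numbers and counter counts stated in this section. On a real system, running uprofile without arguments remains the authoritative source.

```python
# Counter registers per chip family, as stated in this section.
COUNTERS_BY_FAMILY = {"EV4": 2, "EV5": 3, "EV6": 2, "EV67": 2}

# Chip numbers listed in this section, mapped to their family.
FAMILY_BY_CHIP = {
    "21064": "EV4", "21064A": "EV4", "21066": "EV4", "21068": "EV4",
    "21164": "EV5", "21164A": "EV5", "21164PC": "EV5",
}


def max_statistics(chip_or_family):
    """Return how many statistics can be collected at the same time."""
    family = FAMILY_BY_CHIP.get(chip_or_family, chip_or_family)
    if family not in COUNTERS_BY_FAMILY:
        raise ValueError("unknown chip or family: " + chip_or_family)
    return COUNTERS_BY_FAMILY[family]


print(max_statistics("21164A"))  # an EV5-family chip
print(max_statistics("EV67"))
```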

You can disable any counter by specifying 0disabled, 1disabled, or 2disabled as the counter statistic. You can use this feature to isolate specific event types, such as loads, without extraneous data being generated. You cannot disable all counters at the same time, choose two statistics for the same counter, or disable a counter once its statistic is specified.
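The restrictions above can be expressed as a short validation sketch. It uses a subset of the EV4 table as its example data. The plan_counters function and its error messages are hypothetical and are not taken from the tool itself.

```python
# Subset of the EV4 table: statistic name -> counter number.
EV4_COUNTER = {
    "0disabled": 0, "issues": 0, "loads": 0, "branches": 0, "cycles": 0,
    "1disabled": 1, "dcache": 1, "icache": 1, "floatops": 1, "stores": 1,
}


def plan_counters(statistics, table=EV4_COUNTER):
    """Map each requested statistic to its counter, enforcing the rules."""
    if not statistics:
        statistics = ["cycles"]  # default: count cycles on counter 0
    chosen = {}
    for name in statistics:
        if name not in table:
            raise ValueError("unknown statistic: " + name)
        counter = table[name]
        if counter in chosen:
            # Covers two statistics on one counter, and disabling a
            # counter whose statistic was already specified.
            raise ValueError("counter %d already set to %s"
                             % (counter, chosen[counter]))
        chosen[counter] = name
    if all(name.endswith("disabled") for name in chosen.values()):
        raise ValueError("cannot disable all counters")
    return chosen


# Isolate data-cache events by disabling counter 0.
print(plan_counters(["0disabled", "dcache"]))
```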

When you specify no counter statistics, uprofile and kprofile count cycles on counter 0 by default, and display (through prof) a profile in terms of seconds used by each procedure in the program, except for any shared libraries.
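The conversion from cycle samples to seconds can be sketched as simple arithmetic. The sampling interval and clock rate used here are illustrative assumptions only. This page does not state the values that uprofile or prof actually use, and the function name is hypothetical.

```python
def seconds_from_samples(samples, cycles_per_sample, clock_hz):
    """Estimate CPU time from a count of cycle samples."""
    return samples * cycles_per_sample / clock_hz


# Assumed values for illustration: one sample every 65536 cycles
# on a 500 MHz processor.
CYCLES_PER_SAMPLE = 65536
CLOCK_HZ = 500_000_000

# Hypothetical per-procedure sample counts.
profile = {"main": 1000, "compute": 4000}
for proc, samples in profile.items():
    secs = seconds_from_samples(samples, CYCLES_PER_SAMPLE, CLOCK_HZ)
    print("%-10s %.6f s" % (proc, secs))
```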

For non-cycle statistics, the displayed profile shows the number of samples recorded, the sampling interval (events per second), and the total number of events that this implies. Most non-cycle statistics of the EV5 family CPUs are recorded about six cycles after the instruction that triggered the sample. So, when using prof's -asm or -lines option, the samples should be associated with one of the few previously executed instructions or lines. The icacheacc, icachemisses, and dtbmisses statistics are usually attributed precisely.

Because EV6 is an out-of-order machine, precise attribution is much more difficult.

To perform a detailed analysis of short sections of kernel code, use the kprofile command with triggered mode (invoked with the -t option). When you use this mode, kprofile performs all of the required setup for enabling the counters as normal, but does not invoke them. You can insert counter start or stop commands into the kernel code to be instrumented as follows:

    Turn counters on:   wrperfmon (PFOPT, 1)
    Turn counters off:  wrperfmon (0)

You can turn the counters on and off repeatedly to collect data over many iterations or multiple sections of code.

The macro PFOPT is defined in <sys/pfcntr.h>.
 

NOTES

The interrupt load that profiling places on the system may affect performance, but usually the effect is insignificant.

The kernel in use must have the pfm pseudo-device configured into it. To do this, add the following line to the kernel configuration file and rebuild the kernel:

pseudo-device pfm

The format of the data files produced by uprofile changed in DIGITAL UNIX V4.0 to support improved profile display in terms of the selected statistics. To convert the data files to the industry-standard format, at the expense of losing the names of the statistics, use the pdtostd command.
 

RESTRICTIONS

The EV4 victims and novictims statistics rely on the external performance counter pin connections as described in the EV4 chip specification. The DEC 3000/400, /500, /600, and /800 workstations have these connections. Attempts to display either of these statistics on other platforms (while allowed) will typically generate empty data.

The uprofile command is only supported on EV4 Pass 3 or later processors. Attempts to use it on a Pass 2 processor will gather PC samples for every process running on the system.

Using kprofile to generate statistics for a single command is only possible on EV4 Pass 3 or later processors. Attempts to do this on a Pass 2 processor will gather statistics for the entire system, as if no command had been specified.

Using kprofile with triggered mode also requires an EV4 Pass 3 or later processor and cannot be performed with per-process monitoring.

Only one tool can use the performance counters at a time. A message similar to ``the counter device is busy'' indicates that some other tool is using the performance counters (or has used them but not cleaned up properly). If you are sure no one else is using the performance counters, running uprofile/kprofile with superuser privilege will attempt to reset the busy status and proceed.
 

FILES

The performance counter device file.

umon.out
    The statistics file(s) generated by uprofile.

kmon.out
    The statistics file(s) generated by kprofile.

The statistics file(s) generated with the -pids option.

/vmunix
    The default kernel to profile.
 

SEE ALSO

pdtostd(1), pfm(7), prof(1), runon(1), psrinfo(1), sysconfig(8), autosysconfig(8)

Programmer's Guide


 

This document was created by man2html, using the manual pages.
Time: 02:43:05 GMT, October 02, 2010