prof, pixstats - Analyzes profile data
prof [options] [prog_name [PC-sampling_data_file]...]
prof -pixie [options] [prog_name [Addrs_file|Counts_file]...]
prof -pixstats [options] [prog_name [Addrs_file|Counts_file]...]
pixstats [options] [prog_name [Addrs_file|Counts_file]...]
For each prof option, you need to type only enough of the name to distinguish it from the other options. If you do not specify any options, prof uses -procedures by default. Always specify -pixie or -pixstats when you process .Addrs and .Counts files.
The prof command accepts the following options:
Causes the profiles for all shared libraries (if any) described in the data file(s) to be displayed, in addition to the profile for the executable.
Causes the profiler to print the assembly instructions for each subroutine along with the cycle counts for each instruction. The subroutines are sorted from highest cycle count to lowest. The instructions for each subroutine are printed in order; they are not sorted by cycle count.
prog_name
Name of the program executable to be profiled. This program should be compiled with the option to obtain more complete profiling information. If the default symbol table level (-g0) has been used, line number information, static procedure names, and file names are unavailable to the profiler.
PC-sampling_data_file
Name of a profiling data file (default mon.out) produced by executing a program that has been linked with the -p option.
Counts_file
Name of an instruction-counts file produced by executing a program that has been instrumented with pixie. If no Counts_file is specified, prog_name.Counts is used if found in the current working directory.
Addrs_file
Name of an instruction-address file produced when the executable or shared library is instrumented with pixie. By default, the path of each .Addrs file is recorded in the .Counts file, so the .Addrs files do not need to be specified. The order of precedence for finding an .Addrs file is as follows: path specified on the command line, current directory, directory of the object specified in a command line argument, directory where pixie created it.
The prof command analyzes one or more data files generated by the compiler's execution-profiling system and produces a listing. The prof command can also combine those data files or produce a feedback file that lets the optimizer take into account the program's run-time behavior during a subsequent compilation.
Profiling is a three-step process:
1. Compile the program.
2. Execute the program.
3. Run prof to analyze the data.
The compiler system provides two kinds of profiling:
PC-sampling
Interrupts the program periodically, recording the value of the program counter.
Basic-block counting
Divides the program into blocks delimited by labels, jump instructions, and branch instructions, and counts the number of times each block executes. This provides more detailed (line by line) information than PC-sampling.
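As an illustration of how PC-sampling data turns into a per-procedure profile, the following Python sketch attributes sampled program-counter values to procedures by address range. It is not the algorithm prof uses; the procedure table, addresses, and the function name attribute_samples are all hypothetical.

```python
from bisect import bisect_right
from collections import Counter


def attribute_samples(procedures, samples):
    """Count PC samples per procedure.

    procedures: list of (start_address, name) pairs sorted by start_address
                (hypothetical symbol table).
    samples:    iterable of sampled program-counter values.
    """
    starts = [start for start, _ in procedures]
    hits = Counter()
    for pc in samples:
        # Find the last procedure whose start address is <= pc.
        index = bisect_right(starts, pc) - 1
        if index >= 0:  # samples below the first procedure are ignored
            hits[procedures[index][1]] += 1
    return hits


# Hypothetical data: three procedures and five samples.
procedures = [(0x1000, "main"), (0x1040, "compute"), (0x10C0, "report")]
samples = [0x1004, 0x1050, 0x1058, 0x10B8, 0x10C4]

hits = attribute_samples(procedures, samples)
for name, count in hits.most_common():
    print(f"{name:10s} {count:3d} {100.0 * count / len(samples):5.1f}%")
```

A procedure that receives more samples is one in which the program counter was found more often, which is why the sample share approximates the share of run time.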
The uprofile and kprofile tools provide a third kind of profiling, performance counter sampling. The Alpha architecture on-chip performance counters are used in performance counter sampling.
The following sections describe how to perform the various kinds of profiling.
To use PC-sampling, compile your program with the -p option (strictly speaking, it is sufficient to use this option only when linking the program). Then, run the program containing the profiling startup routine that calls monstartup to allocate extra memory to hold the profiling data. If the program terminates normally or calls exit(2), it records the data in a file at the end of execution.
If your program uses shared libraries, note that only its call-shared portion is profiled in detail. Only the total time spent in each shared library is recorded. To individually profile all library routines a program uses, build the program with the -non_shared switch (by default, the compiler produces a call-shared object unless -non_shared is explicitly specified), or set the PROFFLAGS environment variable as described in the Environment Variables section.
After running your program, use prof to analyze the PC-sampling data file. For example:
cc -c myprog.c
cc -p -o myprog myprog.o
myprog (generates mon.out)
prof myprog mon.out
When you use prof for PC-sampling, the program name defaults to a.out. The PC-sampling data file name defaults to mon.out; if you specify more than one PC-sampling data file, prof reports the sum of the data.
You can use environment variables to change the default PC-sampling and profile data collection behavior. The variables are PROFDIR and PROFFLAGS. The general form for setting these variables is:
For C shell: setenv varname "value"
For Bourne shell: varname="value"; export varname
For Korn shell: export varname=value
In the preceding example, varname can be one of the following:
PROFDIR
This environment variable causes PC-sampling data files to be generated with unique file names in a specified directory.
You can use the PROFDIR and PROFFLAGS environment variables together. For more information, see the
To use basic-block counting, compile your program without the -p option. Use the pixie program to translate your program into a profiling version and generate a file (prog_name.Addrs) containing block addresses. Then, run the pixie version of the program, which (assuming the program terminates normally or calls exit(2)) generates a file (prog_name.Counts) containing block counts.
After running the pixie version of your program, use prof with the -pixie option to analyze the .Addrs and .Counts files. Notice that you must specify the name of your original program, not the name of the .pixie version. For example:
cc -c myprog.c
cc -o myprog myprog.o
pixie myprog (generates myprog.Addrs and myprog.pixie)
myprog.pixie (generates myprog.Counts)
prof -pixie myprog myprog.Addrs myprog.Counts
When you use prof with the -pixie option, the .Addrs file name defaults to prog_name.Addrs, and the .Counts file name defaults to prog_name.Counts. Note that, when the .Counts file name defaults to prog_name.Counts, prof does not attach any path prefix to prog_name, and it looks for the .Counts file in the current working directory. If you specify more than one .Counts file, prof reports the sum of the data.
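The summing of several .Counts files can be pictured with a small Python sketch. It does not read real .Counts files (their format is not described here); instead, each run is modeled as a hypothetical mapping from basic-block address to execution count, and sum_block_counts is an illustrative name.

```python
from collections import Counter


def sum_block_counts(runs):
    """Add up per-block execution counts from several runs.

    runs: iterable of {block_address: execution_count} mappings
          (a hypothetical in-memory stand-in for .Counts data).
    """
    total = Counter()
    for counts in runs:
        total.update(counts)  # Counter.update adds counts, it does not replace
    return total


# Hypothetical counts from two runs of the same instrumented program.
run1 = {0x1000: 1, 0x1010: 250, 0x1024: 249}
run2 = {0x1000: 1, 0x1010: 100, 0x1030: 7}

combined = sum_block_counts([run1, run2])
for address in sorted(combined):
    print(f"{address:#x} {combined[address]}")
```

A block that ran in only one of the runs keeps its single-run count in the combined data.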
For each shared library selected for profiling, the prof command searches for an .Addrs file in the following locations if the file location is not explicitly specified on the command line:
Current directory
Directory in which the object file is located, if the location of the object file is explicitly specified on the command line
Directory in which pixie created it, as recorded in the .Counts file
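The search order above can be sketched in Python as follows. This is only an illustration of the documented precedence, not prof's implementation; the function name, its parameters, the file name libdemo.so.Addrs, and the directories are hypothetical. The existence check is passed in as a parameter so that the sketch can be tried without touching the file system.

```python
import os


def find_addrs_file(name, explicit_path=None, cwd=".", object_dir=None,
                    recorded_dir=None, exists=os.path.isfile):
    """Return the first location at which the .Addrs file is found.

    explicit_path: location given on the command line, if any.
    cwd:           current working directory.
    object_dir:    directory of the object file, if it was given explicitly.
    recorded_dir:  directory recorded in the .Counts file (where pixie
                   created the .Addrs file).
    """
    # An explicitly specified location is used as is; no search is done.
    if explicit_path is not None:
        return explicit_path
    candidates = [cwd]
    if object_dir is not None:
        candidates.append(object_dir)
    if recorded_dir is not None:
        candidates.append(recorded_dir)
    for directory in candidates:
        path = os.path.join(directory, name)
        if exists(path):
            return path
    return None


# Hypothetical file system: the file exists only next to the object file.
present = {os.path.join("/build/lib", "libdemo.so.Addrs")}
print(find_addrs_file("libdemo.so.Addrs", cwd="/work",
                      object_dir="/build/lib", recorded_dir="/tmp/pixie",
                      exists=present.__contains__))
```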
For each selected shared library, the prof command searches for an object file in the following locations:
Directories specified in
Directory in which found it, as recorded file, if the
Standard library search directories, as searched by ld, if the option is not specified
Use the -pixstats option to get an alternative profile. All options of the previous version of the pixstats(1) command are recognized, for compatibility.
If a disassembly is requested, all basic blocks (or those whose execution count exceeds the -dislimit percentage of total instructions) are disassembled, in increasing address order. Each block is labeled with its procedure name and any offset from the start of the procedure. For each instruction, the relative estimated CPU cycle at which the instruction executes is printed, plus its source line, address, binary code, and assembly language. The total CPU cycles used by one execution of the block, the number of times it was executed, and its percentage of all instructions executed are printed at the end of the block, following any line reporting a non-zero delay caused to a follow-on block.
The main report begins with a record of the command line. This is followed by a summary of the program's behavior:
Total CPU cycles used by the profiled objects, plus the equivalent number of seconds
Total number of instructions executed
Total delay caused by instructions executed in the preceding basic block
Total integer and floating-point no-op, arithmetic and logical, logical, shift, load, store, load and store, load followed by load, load and store and fetch (data bus use), load and store relative to the stack or global pointers, floating-point, floating-point compare, conditional branch instructions executed (itemized). Also, total number of branch instructions executed whose target instruction is another branch; and total number of such branches that are estimated to be taken, rather than executing the next instruction in line.
Total basic blocks, procedure calls, and branches that skip a single instruction that were executed.
Next, some ratios are printed:
Stores : stores + loads
Instructions : basic block
Instructions : branches
Backward branches : branches
CPU cycles : procedure calls
Instructions : procedure calls
Integer no-ops : integer and floating-point no-ops
Floating-point no-ops : integer and floating-point no-ops
Floating-point pipeline interlocks : floating-point operators
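Each ratio is a simple quotient of two summary totals. The Python sketch below computes the first six of them from a dictionary of totals; the key names and numbers are hypothetical and do not reflect the layout of the actual report.

```python
def ratio(numerator, denominator):
    """Return numerator / denominator, or None when the denominator is 0."""
    return numerator / denominator if denominator else None


def summary_ratios(t):
    """Compute some of the report's ratios from a dict of totals."""
    return {
        "stores : stores + loads": ratio(t["stores"], t["stores"] + t["loads"]),
        "instructions : basic block": ratio(t["instructions"], t["basic_blocks"]),
        "instructions : branches": ratio(t["instructions"], t["branches"]),
        "backward branches : branches": ratio(t["backward_branches"], t["branches"]),
        "CPU cycles : procedure calls": ratio(t["cycles"], t["calls"]),
        "instructions : procedure calls": ratio(t["instructions"], t["calls"]),
    }


# Hypothetical totals for one profiled run.
totals = {
    "cycles": 15000, "instructions": 10000, "basic_blocks": 2000,
    "branches": 1250, "backward_branches": 500,
    "loads": 750, "stores": 250, "calls": 100,
}

ratios = summary_ratios(totals)
for label, value in ratios.items():
    print(f"{label:32s} {value:8.3f}")
```

For example, the instructions : basic block ratio is the average number of instructions executed per basic-block execution.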
Next, basic blocks are analyzed according to how many instructions they contain. For each size, pixstats reports the execution count, its percentage and cumulative percentage relative to both instructions and basic blocks, the number of instructions contained in blocks of that size, the percentage and cumulative percentage of this relative to all instructions, and the CPU-cycle cost per instruction of blocks of that size. Then, pixstats prints various averages and quartiles of basic block size, plus the largest basic block execution count encountered (to indicate the chance of integer overflow in the analysis).
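One way to read the block-size table is sketched below in Python: executed blocks are grouped by size, and percentages and cumulative percentages are computed relative to all block executions and all executed instructions. This is an interpretation of the description above, not the pixstats algorithm; the input data and the name block_size_table are hypothetical, and the cycle-cost column is omitted.

```python
from collections import defaultdict


def block_size_table(blocks):
    """Group block executions by block size.

    blocks: iterable of (instructions_in_block, execution_count) pairs.
    Returns rows of (size, executions, pct_blocks, cum_pct_blocks,
                     instructions, pct_instructions, cum_pct_instructions).
    """
    executions = defaultdict(int)
    for size, count in blocks:
        executions[size] += count
    total_blocks = sum(executions.values())
    total_instructions = sum(size * n for size, n in executions.items())

    rows, cum_blocks, cum_instructions = [], 0, 0
    for size in sorted(executions):
        n = executions[size]
        instructions = size * n
        cum_blocks += n
        cum_instructions += instructions
        rows.append((size, n,
                     100.0 * n / total_blocks,
                     100.0 * cum_blocks / total_blocks,
                     instructions,
                     100.0 * instructions / total_instructions,
                     100.0 * cum_instructions / total_instructions))
    return rows


# Hypothetical data: four blocks, two of which have the same size.
rows = block_size_table([(2, 60), (4, 50), (2, 40), (8, 50)])
for row in rows:
    print("%4d %6d %6.1f %6.1f %8d %6.1f %6.1f" % row)
```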
Next, pixstats analyzes the number of registers (integer and floating-point) that are saved on procedure entry (and restored on exit). It prints the number of procedure entries that save a given number of registers, and the percentage and cumulative percentage of this relative to all procedure entries, all registers saved, and all instructions executed. Finally, it prints some averages and ratios.
The next two tables contain information on the sizes of executed procedures' stack frames and the frequency of execution of each kind of instruction. Frame sizes are reported in ``bits''; for example, 6 bits means a 32- to 48-byte stack frame. The number, percentage, and cumulative percentage of executed calls to procedures with the given frame size is printed. Similarly, the execution count is printed for each machine instruction code, but this table is ordered by decreasing usage.
The next four tables are similar. They provide information about the size of literals used by various categories of Alpha instructions:
ADD, SUB, CMP instructions
AND, BIC, BIS, XOR, CMOV instructions
MUL instructions
SHIFT, EXT, INS, MSK, ZAP instructions
(Note that a table may be omitted if the program uses no literals for the particular instruction category.) For each of these tables, the size of the literal is reported in bits (for example, 4 bits means the literal is greater than or equal to 8 and less than 16).
The next six tables are similar. They contain information on the size of the memory displacement from a base register:
LDA displacement from 0 (used like a load immediate instruction)
LDAH displacement from 0 (used like a load immediate high)
Branch
SP-based load/store (load or store within a stack frame)
GP-based load/store (load or store within a global offset table)
All load or store instructions
Again, the ``size'' of the displacement is reported in bits; for example, 6 bits means a 32 to 63 byte displacement. For both positive displacements (in the ``0-extend'' column) and negative displacements (in the ``1-extend'' column), the execution count is printed along with percentage and cumulative percentage. The summed cumulative percentage is printed last (in the ``Total'' column).
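The "size in bits" used in these tables matches the number of bits needed to write the value in binary, so a size of n bits covers values from 2^(n-1) through 2^n - 1. The Python sketch below shows that mapping for non-negative values. How the report classifies a value of 0 or sizes a negative displacement is not stated here, so the handling of 0 is an assumption and negative values are rejected; the function names are illustrative.

```python
def size_in_bits(value):
    """Number of bits needed to represent a non-negative value."""
    if value < 0:
        raise ValueError("only non-negative values are handled in this sketch")
    return value.bit_length()  # 0 maps to 0 bits (an assumption)


def value_range(bits):
    """Smallest and largest value that have the given size in bits."""
    if bits == 0:
        return (0, 0)
    return (1 << (bits - 1), (1 << bits) - 1)


for value in (8, 15, 16, 32, 48, 63):
    print(value, size_in_bits(value))

print(value_range(4))  # the 4-bit example: 8 <= literal < 16
print(value_range(6))  # the 6-bit example: 32 to 63
```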
In the ``static'' analysis of instructions, each instruction is counted once per executed basic-block. The ``static'' distribution will be the same as the regular opcode distribution when -nocounts is specified. Following ``static'' totals for instructions and basic blocks, the number and percentage of each instruction code is listed.
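The difference between the regular and the "static" opcode distributions can be shown with a short Python sketch. The regular distribution weights each instruction by its block's execution count, while the "static" one counts it once per executed block. The block data and the name opcode_distributions are hypothetical, and this is an illustration rather than the pixstats implementation.

```python
from collections import Counter


def opcode_distributions(blocks):
    """Return (regular, static) opcode counts.

    blocks: iterable of (opcodes, execution_count) pairs, where opcodes is
            the list of instruction mnemonics in one basic block.
    """
    regular, static = Counter(), Counter()
    for opcodes, count in blocks:
        if count == 0:
            continue  # only executed blocks contribute
        for opcode in opcodes:
            regular[opcode] += count  # once per execution
            static[opcode] += 1       # once per executed block
    return regular, static


# Hypothetical blocks: a hot loop body, a rarely taken path, and dead code.
blocks = [
    (["ldq", "addq", "stq"], 1000),
    (["ldq", "bne"], 10),
    (["ret"], 0),
]

regular, static = opcode_distributions(blocks)
print("regular:", dict(regular))
print("static: ", dict(static))
```

If every block is given an execution count of 1, the two distributions coincide.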
The next two tables contain information on how many times each integer and floating-point register was accessed, plus its percentage, ordered by register number. For integer registers, the number and percent of uses as a base register in memory operations is also listed.
prints a flat profile of CPU cycles used by procedures. This includes the CPU cycles used by the procedure, the percentage of the total, the cumulative percentage, the number of instructions executed as part of the procedure, its average number of CPU cycles per instruction, the number of calls made to the procedure, the average number of CPU cycles per call, and the procedure name. If the object and source file names and line number are also printed.
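The columns of the flat profile can be derived from three per-procedure totals, as the Python sketch below shows. The input records, the field names, and the ordering by decreasing cycle count are assumptions made for illustration; this is not the report generator itself.

```python
def flat_profile(procedures):
    """Build flat-profile rows from per-procedure totals.

    procedures: list of dicts with the keys name, cycles, instructions,
                and calls (hypothetical input).
    """
    total_cycles = sum(p["cycles"] for p in procedures)
    rows, cumulative = [], 0
    # Assumed ordering: most expensive procedure first.
    for p in sorted(procedures, key=lambda p: p["cycles"], reverse=True):
        cumulative += p["cycles"]
        rows.append({
            "name": p["name"],
            "cycles": p["cycles"],
            "percent": 100.0 * p["cycles"] / total_cycles,
            "cum_percent": 100.0 * cumulative / total_cycles,
            "instructions": p["instructions"],
            "cycles_per_instruction":
                p["cycles"] / p["instructions"] if p["instructions"] else None,
            "calls": p["calls"],
            "cycles_per_call":
                p["cycles"] / p["calls"] if p["calls"] else None,
        })
    return rows


# Hypothetical totals for three procedures.
profile = flat_profile([
    {"name": "main", "cycles": 1000, "instructions": 800, "calls": 1},
    {"name": "compute", "cycles": 6000, "instructions": 4000, "calls": 100},
    {"name": "report", "cycles": 3000, "instructions": 2000, "calls": 10},
])
for row in profile:
    print(f"{row['cycles']:8d} {row['percent']:5.1f} {row['cum_percent']:5.1f} "
          f"{row['instructions']:8d} {row['calls']:6d} {row['name']}")
```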
After running the uprofile or kprofile utility to collect profiling data for your program or the kernel, respectively, run prof to examine the resulting mon.out or kmon.out file, as follows:
For uprofile output: prof prog_name mon.out
For kprofile output: prof /vmunix kmon.out
as for PC sampling, except that only the executable has a profile. Old performance counter sample data files, generated on versions of the operating system prior to DIGITAL UNIX Version 4.0, must be analyzed as if they contained PC-sampling data.
The -pixstats option models execution assuming a perfect memory system. Memory system events such as cache misses will increase execution time above the -pixstats predictions.
The set of statistics reported by the -pixstats option and the format of the report are the same as for previous versions of the pixstats command, but note the following:
The labels on disassembled basic blocks take the form no symbol is available) for an initial block and for subsequent blocks.
All reported cycles reflect CPU pipeline interlocks, so they usually do not match the reported instruction counts.
If not all the shared objects used by a program are profiled, the procedure-call counts may be smaller than the
Normal startup code
Startup code for PC-sampling
Library for PC-sampling
Default PC-sampling data file
Commands: as(1), atom(1), cc(1), gprof(1), dxprof(1). (dxprof(1) is available as an option.)
Functions: monitor(3), profil(2)