C/C++: Performance Profiling Tools
Sunday, 10 July 2011 14:25

I just finished an initial look at the 'gprof','sprof','valgrind –tool=callgrind', and 'oprofile' performance profiling tools for GNU/Linux. Below are some notes on how to use these tools for C/C++ applications.

gprof

gprof can be used by following the steps below:

  • Add '-pg' to your compile arguments.
    • g++ -O0 -g3 -Wall -c -fmessage-length=0 -pg -o main.o ./main.cpp
  • Add '-pg' to your linker arguments.
    • g++ -pg -o avltree ./main.o
  • Run your recompiled program (e.g. ./avltree).
    • Note: On exit the file gmon.out will be produced.
  • Execute 'gprof ./avltree ./gmon.out' to obtain profiling results
    • Note: its handy to redirect the output of gprof to a file

To use gprof a re-link of the application is required, just rebuilding your projects individual shared libraries is not enough. If you wish to profile only a single shared library you should consider the other tools below.

Some example gprof output,from my 'AVL tree' example code, is below (edited for this post). From the output expensive methods like 'compreString()' can be easily identified, along with frequently called methods such as 'getRight().

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total           
 time   seconds   seconds    calls   s/call   s/call  name    
	...
	...
  9.08      3.48     1.38 75025621     0.00     0.00  compareString(...)
  8.48      4.77     1.29  7257655     0.00     0.00  avltree::recursiveVerify(...)
	...
	...
  5.92      6.57     0.90 13063993     0.00     0.00  avltree::recursiveSearch(...)
  4.90      7.32     0.75 302800659     0.00     0.00  treenode::getRight() const
  4.77      8.04     0.73 88297148     0.00     0.00  treenode::getBalance()
  3.95      8.64     0.60  5806336     0.00     0.00  avltree::recursiveInsert(...)
	...
	...
  3.55      9.76     0.54 266483693     0.00     0.00  treenode::getLeft() const
  2.76     10.18     0.42 26765039     0.00     0.00  treenode::updateHeight()
  2.24     10.52     0.34 101153501     0.00     0.00  treenode::getData() const
	...
	...

Looking further at the grprof output I can also determine that 'compareString' is most often called by 'recursiveSearch', and that compareString is spending most of its time executing the '==' operator.

	...
index % time    self  children    called     name
	...
-----------------------------------------------
                0.12    0.32 6773666/75025621     avltree::recursiveRemove(...) [17]
                0.27    0.69 14488347/75025621     avltree::recursiveInsert(...) [13]
                0.48    1.25 26127811/75025621     avltree::recursiveVerify(...) [9]
                0.51    1.32 27635797/75025621     avltree::recursiveSearch(...) [12]
[6]     32.6    1.38    3.58 75025621         compareString(...) [6]
                0.90    2.10 75025621/75025621      std::operator==(...) [10]
                0.58    0.00 68493747/68493747     bool std::operator< (...) [25]
-----------------------------------------------
	...
sprof

'sprof' is a tool specifically for profiling shared libraries. The tool itself should be pretty easy to use, however from my reading('e.g. Profiling shared library on Linux using sprof '), and my own experience it doesn't produce useful data (see below).

  • export LD_PROFILE=libXXXX.so
  • export LD_PROFILE_OUTPUT=/home/jim/myapp
  • ./command
  • sprof libXXXX.so libXXXX.so.profile
  • Flat profile:
    
    Each sample counts as 0.01 seconds.
      %   cumulative   self              self     total
     time   seconds   seconds    calls  us/call  us/call  name
    
    index % time    self  children    called     name
    

valgrind –tool=callgrind

'valgrind -tool=callgrind' is the easiest tool to use, however it is painfully slow! An example run is below:

  • time valgrind --tool=callgrind ./avltree
    real	18m2.602s
    user	17m59.619s
    sys	0m2.512s
    
  • ls callgrind.out.12859
    -rw------- 1 jim jim 149096 Jul 10 09:53 callgrind.out.12859
    
  • callgrind_annotate callgrind.out.12859 > out.txt
    Profile data file 'callgrind.out.12859' (creator: callgrind-3.6.1)
    --------------------------------------------------------------------------------
    I1 cache: 
    D1 cache: 
    LL cache: 
    Timerange: Basic block 0 - 12249111796
    Trigger: Program termination
    Profiled target:  ./avltree (PID 12859, part 1)
    Events recorded:  Ir
    Events shown:     Ir
    Event sort order: Ir
    Thresholds:       99
    Include dirs:     
    User annotated:   
    Auto-annotation:  off
    
    --------------------------------------------------------------------------------
                Ir 
    --------------------------------------------------------------------------------
    50,062,080,936  PROGRAM TOTALS
    
    --------------------------------------------------------------------------------
               Ir  file:function
    --------------------------------------------------------------------------------
    4,391,057,808  ???:std::basic_string<...>::basic_string(...) [... /libstdc++.so.6.0.16]
    3,065,408,806  ../treenode.h:treenode<...>::getBalance() [./avltree]
    ...
    
    
    1,637,305,454  ../main.cpp:compareString(...) [./avltree]
    ...
    

Note: You will see that 'callgrind_annotate' shows how often code is called, and not how much time it takes.

oprofile

'oprofile' by far seems like the best tool, with the only limitation being that you require 'root' access to run it (by default). An example run is below:

  • opcontrol --no-vmlinux
  • opcontrol --separate=lib
    • If the binary spends little time in the main binary image itself then run this command.
  • opcontrol --start
  • Run your executable, for example '/home/jim/avltree/avltree'
  • opreport -l /home/jim/avltree/avltree
    • 	Overflow stats not available
      	CPU: Core 2, speed 2003 MHz (estimated)
      	...
      	samples  %        symbol name
      	44850    12.1150  std::char_traits::compare(...)
      	36391     9.8300  compareString(std::string, std::string)
      	26958     7.2820  bool std::operator< (...)
      	25658     6.9308  ... std::operator==(...)
      	24228     6.5445  avltree::recursiveVerify(...)
      	22579     6.0991  avltree::recursiveSearch(...)
      	20578     5.5586  treenode::getLeft() const
      	19916     5.3798  treenode::getBalance()
      	16931     4.5734  treenode::getRight() const
      	13264     3.5829  avltree::recursiveInsert(...)
      	8823      2.3833  treenode::getData() const
      	8227      2.2223  treenode::updateHeight()
      	
  • opannotate --source /home/jim/avltree/avltree
    • 		...
      		...
      			/* compareString(std::string, std::string) total:  36391  9.8300 */
      	  2180  0.5889 :int compareString(string str1, string str2) { 
      	 11391  3.0770 :	if (str1 == str2) {
      	               :		return 0;
      	 15123  4.0851 :	} else if (str1 < str2) {
      	  2092  0.5651 :		return -1;
      	               :	} else {
      	   400  0.1080 :		return 1;
      	               :	}
      	  5205  1.4060 :}
      		...
      		...
      	
  • opcontrol --shutdown