Profiling

From Free Pascal wiki
Jump to: navigation, search

English (en) français (fr) русский (ru)

Profiling is a performance analysis technique that can be used to find bottlenecks in your application. In particular it encompasses measuring the frequency and duration of function calls.

Profiling and optimization

Profiling can help to identify “critical” parts of your code, whose optimization could seriously improve the performance of your program. It is done best at times when you are preparing for a release and having a feature-freeze for your code. It is a waste of time and resources to optimize something that will be thrown away the next day during some code restructuring.

Be careful not to over-optimize your program. It is seldom worth it to spent days on getting only a 2% performance gain. It is mostly cutting down run-time with 30% in just half an hour of coding a few optimizations that will pay off.

Of course optimizing code that is reused very often is worth a little more effort for performance gain.

FreePascal profiler support

FreePascal has built-in support for generating executables suitable for (at least) two profilers:

  • gprof, the Gnu Profiler: enable with -pg.
    • Availability: Most FPC target platforms
    • Notes:
      • Support for using gprof under Linux is broken in FPC 2.2.0-2.4.0; it is fixed in FPC 2.6+
      • Under Windows, additional libraries are required to get gprof support working;
  • Valgrind, using the plug-in Callgrind: enable with -gv.
    • Availablility: Linux, Mac OS X 10.5 (10.6 in Valgrind development versions), and some other UNIXes.
    • Notes
      • Should work fine on all platforms supported by Valgrind, as it does not require any compiler support (other than that the profiled program should be dynamically linked, and that the compiler should not generate certain kinds of debug information -- both issues are taken care of by the -gv option)
      • The -gv option implicitly includes the CMem unit; thus you need to remove it from your uses clause

There's also an FPC native profiler project called FPProfiler. Because the profiler is native to FPC it should work on all platforms that FPC supports.

A Lazarus IDE addon which enables you to profile your project with one click is also available: LazProfiler

Using Valgrind/Callgrind

Valgrind was originally only a memory debugger, but now it also supports e.g. profiling using the Callgrind plug-in (which has replaced the Calltree plugin). To install Valgrind with Callgrind on Debian, run e.g.

 aptitude install valgrind valgrind-callgrind

To profile your program, you need to do three steps

  • Compile your program with the -gv switch.
  • Run your program using valgrind
  • Analyze the output with valgrind_annotate or the GUI tool KCacheGrind

Compile your program with

 fpc -gv myprogram.pas

Run Valgrind using e.g.

 valgrind --tool=callgrind ./myprogram --options --to --my --program

You can also profile only specific parts of your program. For example you can start the program with

 valgrind --instr-atstart=no ./myprogram --options --to --my --program

At the moment you want actual profiling to start, run

 callgrind_control -i on

To stop the profiling, run the same command with off instead of on. To get only profiling information when you are inside a particular function, you can pass the --toggle-collect=funcNamePrefix option to valgrind. Refer to the Valgrind documentation for further details.

Callgrind has created a trace file with a typical name like callgrind.out.12345. To get a human-readable statistical report, run e.g.

 callgrind_annotate callgrind.out.12345

Of course you can also experiment with the other tool plug-ins that ship with Valgrind. Cachegrind, for example, profiles the cache use. Other plug-ins can be used to profile or debug memory allocation or detect data race conditions.

Using KCacheGrind

Though the output of callgrind_annotate can definitely shed some light on the performance characteristics of your program, the GUI tool KCacheGrind can give you even more interesting information, presented in different forms: caller/callee lists, caller/callee maps ans call graphs. To install it on Debian, run

 aptitude install kcachegrind graphviz

(The graphviz package is required if you want KCacheGrind to generate call graphs.)

Example of KCacheGrind showing profiling information for Lazarus (Click to enlarge)

Run KCachegrind with a simple call to kcachegrind. You can now open a trace file using the GUI and all should be self-explanatory.

On the left, you will see a 'flat profile', which list for each function (in the top 100)

  • Incl: amount of time spent in the function, including callees (functions called by the function).
  • Self: amount of time spent within the function itself, excluding callees.
  • Called: the number of the times the function has been called.

Note: if you click on one of the column headers to sort the list, you might need to do a reload (F5) to ensure that the top 100 with respect to that column will be shown. Another hint: in the configuration options, you can make the list longer than only the top 100.

On the right, you'll see a number of tabs, allowing you to see lists and maps of callers and callees, as well as a caller graph, which is very useful to see in a glance who calls whom the most.

Memory Profiling

Heaptrc, LineInfo

To spot memory leaks, simply compile with -gh. This will compile as if you added the heaptrc unit as first unit to the uses section of your program. Heaptrc links into the memory manager of fpc and checks various things. Together with the -gl option (adding the lineinfo unit) it will give you useful information and backtraces, where a memory block was allocated. It supports some options that can be controlled via environment option HEAPTRC containing keywords:

  • keepreleased: Any freed memory, is kept and checked if overwritten.
  • disabled: disable tracing. This is useful for program compiled with heaptrc and should work normally.
  • nohalt: normally heaptrc stops on the first error it finds. Add this keyword if the program should continue (as it would normally do without heaptrc).
  • haltonnotreleased: Normally heaptrc prints for every not freed block a backtrace, which can take a long time. Add this keyword to stop after first reported mem leak.
  • log=<filename>: Instead of writing to stdout, you can give a log filename.

Note: heaptrc will conflict with cmem if you have to debug external libraries. Valgrind should be used in this case.

Valgrind Memcheck

There are mem leaks which heaptrc does not find. The valgrind program has a tool called memcheck, which does a lot more bookkeeping than heaptrc and finds some more bugs. The drawback is that valgrind is snail slow, so do this only if heaptrc could not help you. Here is an example:

 valgrind --tool=memcheck --leak-check=summary --log-file=log.txt ./project1

The log.txt will almost always find something, because it tries to check the used external libraries too and some common libraries are badly coded.

To get even more information run with

 valgrind --tool=memcheck --leak-check=full --log-file=log.txt --show-reachable=yes --num-callers=50 ./project1

How to run massif:

valgrind --tool=massif --run-libc-freeres=no  ./project1

On Mac OS X

Most of what is said above for linux also applies on Mac OS X. gprof is however not supported by FPC on Darwin/i386 and on Darwin/ARM at this time.

Shark

Shark is a sampling-based profiler. It supports time profiles, profiles based on performance counters (such as L2 cache misses), application-specific profiles and whole-system profiles. By default, it's installed at /Developer/Applications/Performance Tools/Shark.app

If you compile the profiled code with debug information (either Stabs or DWARF), Shark will display the (Pascal) source code in addition to the assembler code.

Shark only works for i386 and PowerPC. For 64 bit platforms, you have to use Instruments.app instead, which offers similar functionality.

Valgrind

Valgrind has been ported to Mac OS X/i386. For details, check valgrind's homepage.

If you experience the following issue:

 valgrind: ./test: cannot execute binary file

Chances are that your valgrind does only supports 64bit while your FPC program is compiled for 32bit. To make your application compatible simply add the switch -Px86_64.

FPProfiler

FPProfiler has been ported to Mac OS X as a fink package. For details see here: FPProfiler

Simple profiling using ifdefs and timing routines

Another, simpler, way of profiling is simply adding ifdefs and time counting. For counting the time the simples way is just using Now() from SysUtils. Other possibilities are NowUTC() from lazutf8sysutils or EpikTimer.

For example, to profile the following procedure:

procedure TMyClass.DoSomething;
begin
  ReallyDoSomething;
end;

One would add the following extra code around it:

function DateTimeToMilliseconds(aDateTime: TDateTime): Int64;
var
  TimeStamp: TTimeStamp;
begin
  {Call DateTimeToTimeStamp to convert DateTime to TimeStamp:}
  TimeStamp:= DateTimeToTimeStamp (aDateTime);
  {Multiply and add to complete the conversion:}
  Result:= TimeStamp.Time;
end;
 
procedure TMyClass.DoSomething;
var
  {$IFDEF VerboseProfiler}
  lTimeStart: TDateTime;
  {$ENDIF}
  ...
begin
  {$IFDEF VerboseProfiler}
  lTimeStart := Now();
  {$ENDIF}
 
  ReallyDoSomething;
 
  {$IFDEF VerboseProfiler}
  DebugLn(Format('[TMyClass.DoSomething] Duration: %d ms', [DateTimeToMilliseconds(Now() - lTimeStart)]));
  {$ENDIF}
end;

Activate the define to get the debug info in the DebugLn target