Difference between revisions of "Whole Program Optimization"

From Lazarus wiki
 
(23 intermediate revisions by 7 users not shown)
Line 1: Line 1:
=Overview=
+
{{Whole_Program_Optimization}}
  
Traditionally, compilers optimise a program procedure by procedure, or at best compilation unit per compilation unit. Whole program optimisation (''wpo'') means that the compiler considers all compilation units that make up a program or library and optimises them using the combined knowledge of how they are used together in this particular case.
+
== Overview ==
 +
Traditionally, compilers optimize a program procedure by procedure, or at best compilation unit per compilation unit.
 +
Whole program optimization, abbreviated WPO, means that the compiler considers all compilation units that make up a program or library and optimizes them using the comprehensive knowledge of how they are used together in this particular case.
  
The way ''wpo'' generally works is as follows:
+
The way WPO generally works is as follows:
* you compile the program normally, telling the compiler to store various bits of information into a feedback file
+
* you compile the program normally, telling the compiler to store various bits of information into a ''feedback file''
* you recompile the program (and optionally all units that it uses) with ''wpo'', providing the feedback file as extra input to the compiler
+
* you ''recompile'' the program (and optionally all units that it uses) with WPO, providing the feedback file as extra input to the compiler
  
In some implementations, the compiler generates some kind of intermediary code (e.g., byte code) and the linker performs all ''wpo'' along with the translation to the target ISA. In case of FPC however, the scheme followed is the one described above.
+
In some implementations, the compiler generates some kind of intermediary code (e.g., byte code) and the linker performs all WPO along with the translation to the target ISA.
 +
In case of FPC however, the scheme followed is the one described above.
  
=General principles=
+
== General principles ==
 +
A few general principles have been followed when designing the FPC implementation of WPO:
 +
* All information necessary to generate a WPO feedback file for a program is always stored in the PPU files. This means that you can e.g. use a generic RTL for WPO (even though the RTL itself will then not be optimized, your program and its units can be correctly optimized because the compiler knows everything it has to know about all RTL units);
 +
* The generated WPO feedback file is plain text. The idea is that it should be easy to inspect this file by hand, and to add information to it produced by external tools if desired (e.g., profile information);
 +
* The implementation of the WPO subsystem in the compiler is very modular, so it should be easy to plug in additional WPO information providers, or to choose at run time between different information providers for the same kind of information. At the same time, the interaction with the rest of the compiler is kept to a bare minimum to improve maintainability;
 +
* It is possible to generate a WPO feedback file while at the same time using another one as input. In some cases, using this second feedback file as input during a third compilation can further improve the results.
  
A few general principles have been followed when designing the FPC implementation of ''wpo'':
+
== How to use ==
* All information necessary to generate a ''wpo'' feedback file for a program is always stored in the ppu files. This means that you can e.g. use a generic RTL for ''wpo'' (even though the RTL itself will then not be optimised, your program and its units can be correctly optimised because the compiler knows everything it has to know about all RTL units);
+
=== Generate WPO feedback file ===
* The generated ''wpo'' feedback file is plain text. The idea is that it should be easy to inspect this file by hand, and to add information to it produced by external tools if desired (e.g., profile information);
+
First of all, compile your program (or library) and all of its units as you would normally do, except that when compiling the main program/library you add <syntaxhighlight lang="bash">-FW/path/to/feedbackfile.wpo -OW<selected_wpo_options></syntaxhighlight>.
* The implementation of the ''wpo'' subsystem in the compiler is very modular, so it should be easy to plug in additional ''wpo'' information providers, or to choose at run time between different information providers for the same kind of information. At the same time, the interaction with the rest of the compiler is kept to a bare minimum to improve maintainability;
+
The compiler will then, right after your program has been linked, collect all necessary information to perform the requested WPOs during a successive compilation run, and store this information in <syntaxhighlight lang="bash" inline>/path/to/feedbackfile.wpo</syntaxhighlight>.
* It is possible to generate a ''wpo'' feedback file while at the same time using another one as input. In some cases, using this second feedback file as input during a third compilation can further improve the results.
 
  
=How to use=
+
=== Use generated WPO feedback file ===
 +
To actually apply the WPOs, recompile the program/library and all or some of the units that it uses, using <syntaxhighlight lang="bash">-Fw/path/to/feedbackfile.wpo -Ow<selected_wpo_options></syntaxhighlight>, thereby pointing the compiler to the feedback file generated in the previous step.
 +
The compiler will then read the information collected about the program during the previous compiler run, and use it during the current compilation of units and/or program/library.
  
==Generate WPO feedback file==
+
Units not recompiled during the second pass will obviously not be optimized, but they will still work correctly when used together with the optimized units and program/library.
  
First of all, compile your program (or library) and all of its units as you would normally do, except that when compiling the main program/library you add <tt>-FW/path/to/feedbackfile.wpo -OW<selected_wpo_options></tt>. The compiler will then, right after your program has been linked, collect all necessary information to perform the requested ''wpo''s during a successive compilation run, and store this information in <tt>/path/to/feedbackfile.wpo</tt>
+
=== Concrete example ===
 +
The example below refers to [https://gitlab.com/freepascal.org/fpc/source/-/blob/main/tests/test/opt/twpo4.pp <tt>twpo4.pp</tt>].
  
==Use generated WPO feedback file==
+
Compile the program a first time, collecting feedback in <syntaxhighlight lang="bash" inline>twpo4-1.wpo</syntaxhighlight> in the current directory.
 +
The compiler will record which classes are created in the program (<tt>tchild1</tt> and <tt>tchild2</tt>), and after the linking step also that the <syntaxhighlight lang="pascal" inline>notcalled</syntaxhighlight> procedure is in fact not called.
  
To actually apply the ''wpo''s, recompile the program/library and all or some of the units that it uses, using <tt>-Fw/path/to/feedbackfile.wpo -Ow<selected_wpo_options></tt>, thereby pointing the compiler to the feedback file generated in the previous step. The compiler will then read the information collected about the program during the previous compiler run, and use it during the current compilation of units and/or program/library.
+
Afterwards, save the generated assembler code in <syntaxhighlight lang="bash" inline>twpo4.s1</syntaxhighlight> for later comparison.
 
 
Units not recompiled during the second pass will obviously not be optimised, but they will still work correctly when used together with the optimised units and program/library.
 
 
 
==Concrete example==
 
 
 
(the twpo4.pp program referenced below can be found at http://svn.freepascal.org/cgi-bin/viewvc.cgi/trunk/tests/test/opt/twpo4.pp?revision=12341&pathrev=12341)
 
 
 
Compile the program a first time, collecting feedback in twpo4-1.wpo in the current directory. The compiler will record which classes are created in the program (''tchild1'' and ''tchild2''), and after the linking step also that the ''notcalled'' procedure is in fact not called.
 
 
 
Afterwards, save the generated assembler code in twpo4.s1 for later comparison.
 
  
 
<pre>
 
<pre>
Line 49: Line 50:
 
</pre>
 
</pre>
  
Now compile the program a second time, using the information in twpo4-1.wpo and collecting new information in twpo4-2.wpo. At this point, the compiler knows that no instance of the ''tbase'' class is created in the program, and therefore it replaces all regular entries in its virtual method table with references to FPC_ABSTRACT_ERROR. Virtual class methods cannot be eliminated this way, since they can also be called without constructing an instance of this type.
+
Now compile the program a second time, using the information in <syntaxhighlight lang="bash" inline>twpo4-1.wpo</syntaxhighlight> and collecting new information in <syntaxhighlight lang="bash" inline>twpo4-2.wpo</syntaxhighlight>.
 +
At this point, the compiler knows that no instance of the <syntaxhighlight lang="pascal" inline>tbase</syntaxhighlight> class is created in the program, and therefore it replaces all regular entries in its virtual method table with references to <syntaxhighlight lang="pascal" inline>FPC_ABSTRACTERROR</syntaxhighlight>.
 +
Virtual class methods cannot be eliminated this way, since they can also be called without constructing an instance of this type.
  
Because the compiler knows from the previously generated feedback file that the ''notcalled'' procedure is never called, it will also record that only an instance of ''tchild1'' is created, as the ''tchild2'' instance was only created inside this ''notcalled'' procedure. For optimisation purposes, it will however still consider that an instance of ''tchild2'' may be created, as it uses the information from the feedback file generated during the previous compilation.
+
Because the compiler knows from the previously generated feedback file that the <syntaxhighlight lang="pascal" inline>notcalled</syntaxhighlight> procedure is never called, it will also record that only an instance of <syntaxhighlight lang="pascal" inline>tchild1</syntaxhighlight> is created, as the <syntaxhighlight lang="pascal" inline>tchild2</syntaxhighlight> instance was only created inside this <syntaxhighlight lang="pascal" inline>notcalled</syntaxhighlight> procedure.
 +
For optimization purposes, it will however still consider that an instance of <syntaxhighlight lang="pascal" inline>tchild2</syntaxhighlight> may be created, because it still relies on the feedback file generated during the previous compilation.
  
 
Afterwards, we again save the generated assembler code for later comparisons.
 
Afterwards, we again save the generated assembler code for later comparisons.
Line 63: Line 67:
 
Assembling program
 
Assembling program
 
Linking twpo4
 
Linking twpo4
66 lines compiled, 0.1 sec  
+
66 lines compiled, 0.1 sec
 
$ mv twpo4.s twpo4.s2
 
$ mv twpo4.s twpo4.s2
 
</pre>
 
</pre>
  
Compile the program a final time, using the information collected in twpo4-2.wpo. This time, the compiler knows that only a ''tchild1'' instance is created, and will therefore turn the ''bb.test'' call from a virtual method call into a static method call (because it knows that even though the type of ''bb'' is ''tbase'', in practice it can only be a ''tchild1'' since no ''tbase'' or other ''tbase''-descendent class instance is created).
+
Compile the program a final time, using the information collected in <syntaxhighlight lang="bash" inline>twpo4-2.wpo</syntaxhighlight>.
 +
This time, the compiler knows that only a <syntaxhighlight lang="pascal" inline>tchild1</syntaxhighlight> instance is created, and will therefore turn the <syntaxhighlight lang="pascal" inline>bb.test</syntaxhighlight> call from a virtual method call into a static method call (because it knows that even though the type of <syntaxhighlight lang="pascal" inline>bb</syntaxhighlight> is <syntaxhighlight lang="pascal" inline>tbase</syntaxhighlight>, in practice it can only be a <syntaxhighlight lang="pascal" inline>tchild1</syntaxhighlight> since no <syntaxhighlight lang="pascal" inline>tbase</syntaxhighlight> or other <syntaxhighlight lang="pascal" inline>tbase</syntaxhighlight>-descendant class instance is created).
  
 
<pre>
 
<pre>
$ ppn37 -Fwtwpo4-2.wpo -Owall -CX -XX -Xs- -al twpo4
+
$ ppn37 -Fwtwpo4-2.wpo -Owall -CX -XX -al twpo4
 
Free Pascal Compiler version 2.3.1 [2008/12/11] for i386
 
Free Pascal Compiler version 2.3.1 [2008/12/11] for i386
 
Copyright (c) 1993-2008 by Florian Klaempfl
 
Copyright (c) 1993-2008 by Florian Klaempfl
Line 77: Line 82:
 
Assembling program
 
Assembling program
 
Linking twpo4
 
Linking twpo4
66 lines compiled, 0.1 sec  
+
66 lines compiled, 0.1 sec
 
</pre>
 
</pre>
  
 
Now, let's have a look at the differences between the generated assembler files. After the first recompilation:
 
Now, let's have a look at the differences between the generated assembler files. After the first recompilation:
  
<pre>
+
<syntaxhighlight lang="bash">
 
$ diff -u twpo4.s1 twpo4.s2
 
$ diff -u twpo4.s1 twpo4.s2
 +
</syntaxhighlight>
 +
 +
<syntaxhighlight lang="diff">
 
--- twpo4.s1 2008-12-11 18:59:55.000000000 +0100
 
--- twpo4.s1 2008-12-11 18:59:55.000000000 +0100
 
+++ twpo4.s2 2008-12-11 19:00:15.000000000 +0100
 
+++ twpo4.s2 2008-12-11 19:00:15.000000000 +0100
Line 108: Line 116:
 
+ .long FPC_ABSTRACTERROR
 
+ .long FPC_ABSTRACTERROR
 
  .long 0
 
  .long 0
+
 
 
  .const_data
 
  .const_data
</pre>
+
</syntaxhighlight>
  
As you can see, all references to the virtual methods in the virtual method table of ''tbase'' have been removed, since the compiler knows they can never be referenced. This would allow the linker to throw these methods away if they weren't referenced anywhere else either, but that's not the case here except for ''tbase.test''.
+
As you can see, all references to the virtual methods in the virtual method table of <syntaxhighlight lang="pascal" inline>tbase</syntaxhighlight> have been removed, since the compiler knows they can never be referenced.
 +
This allows the linker to throw away these methods if they are not referenced anywhere else either.
 +
In our example, that is only true for <syntaxhighlight lang="pascal" inline>tbase.test</syntaxhighlight>.
  
 
Now let's look at the effect of the second recompilation:
 
Now let's look at the effect of the second recompilation:
  
<pre>
+
<syntaxhighlight lang="bash">
 
$ diff -u twpo4.s2 twpo4.s
 
$ diff -u twpo4.s2 twpo4.s
 +
</syntaxhighlight>
 +
 +
<syntaxhighlight lang="diff">
 
--- twpo4.s2 2008-12-11 19:00:15.000000000 +0100
 
--- twpo4.s2 2008-12-11 19:00:15.000000000 +0100
 
+++ twpo4.s 2008-12-11 19:00:29.000000000 +0100
 
+++ twpo4.s 2008-12-11 19:00:29.000000000 +0100
Line 143: Line 156:
 
  movl L_U_P$PROGRAM_A$non_lazy_ptr-Lj2(%ebx),%eax
 
  movl L_U_P$PROGRAM_A$non_lazy_ptr-Lj2(%ebx),%eax
 
  movl $2,(%eax)
 
  movl $2,(%eax)
</pre>
+
</syntaxhighlight>
  
 
As you can see, the two virtual method calls to ''bb.test'' (one inside the never called ''notcalled'' procedure, and one in the main program) have been replaced with calls to ''tchild1.test''.
 
As you can see, the two virtual method calls to ''bb.test'' (one inside the never called ''notcalled'' procedure, and one in the main program) have been replaced with calls to ''tchild1.test''.
  
Note that in practice, the first recompilation will usually already give you most gains. Also note that you could also recompile the entire run time library for this particular program, if you want.
+
Note that you could also recompile the entire run time library for this particular program, if you want.
 +
 
 +
=== Concrete Lazarus IDE example ===
 +
 
 +
To set up Whole Program Optimization in the Lazarus IDE, add two new Build Modes for each platform. For example:
 +
 
 +
* "Linux WPO Pass1"
 +
* "Linux WPO Pass2"
 +
* "Windows WPO Pass1"
 +
* "Windows WPO Pass2"
  
=When to use=
+
[[File:01 Add New Build Modes.png]]
  
Since whole program optimisation requires multiple compilations, it is advisable to only use this functionality when compiling a final release version.
+
See: [[IDE Window: Compiler Options#Build modes]].
 +
 +
Switch to Build mode "Linux WPO Pass1".  Under "Compiler Options | Custom Options", add the Custom Options "-OWall -FWtemplinux.wpo -Xs- -CX" (case sensitive, should be UPPERCASE "W").
  
=Available whole program optimisations=
+
[[File:02 Linux WPO Pass1.png]]
  
== All optimisations==
+
Switch to Build mode "Linux WPO Pass2".  Under "Compiler Options | Custom Options", add the Custom Options "-Owall -Fwtemplinux.wpo -Xs- -CX" (case sensitive, should be lowercase "w").
===Parameter===
 
-OWall/-Owall
 
  
===Effect===
+
[[File:03 Linux WPO Pass2.png]]
Enables all whole program optimisations described below.
 
  
===Limitations===
+
Switch to Build mode "Windows WPO Pass1".  Under "Compiler Options | Custom Options", add the Custom Options "-OWDEVIRTCALLS,OPTVMTS -FWtempwindows.wpo -CX -XX -Xs-" (case sensitive, should be UPPERCASE "W").
Not applicable.
 
  
 +
[[File:04 Windows WPO Pass1.png]]
  
==Whole Program Devirtualisation==
+
Switch to Build mode "Windows WPO Pass2".  Under "Compiler Options | Custom Options", add the Custom Options "-OwDEVIRTCALLS,OPTVMTS -Fwtempwindows.wpo -CX -XX -Xs-" (case sensitive, should be lowercase "w").
  
===Parameter===
+
[[File:05 Windows WPO Pass2.png]]
-OWdevirtcalls/-Owdevirtcalls
 
  
===Effect===
+
When you are happy with your Project Options settings, you might want to click the Export button, to save them to an XML file, which you can import into other projects later.
Changes virtual method calls into normal (static) method calls when the compiler can determine that a virtual method call will always go to the same static method. This makes such code both smaller and faster. In general, it's mainly an enabling optimisation for other optimisations, because it makes the program easier to analyse due to the fact that reduces indirect control flow.
 
  
===Limitations===
+
When you are ready to compile your program, select "Run | Compile Many Modes" from the IDE menu. Select both Pass1 and Pass2 so that they are compiled in exactly that order.
* The current implementation is context-insensitive. This means that the compiler only looks at the program as a whole and determines for each class type which methods can be devirtualised, rather than that it looks at each call statement and the surrounding code to determine whether or not this call can be devirtualised;
 
* The current implementation does not yet devirtualise interface method calls (not when calling them via an interface instance, nor when calling them via a class instance).
 
  
 +
[[File:06 Build Many Modes.png]]
  
==Optimise Virtual Method Tables==
+
If your Build Modes are out of sequence, you can simply compile each pass separately, in the proper order. Alternatively, after making a backup copy, you can carefully edit your LPI project file in a text editor to put the passes into the correct order.
  
===Parameter===
+
== When to use ==
-OWoptvmts/-Owoptvmts
 
  
===Effect===
+
Since whole program optimization requires multiple compilations, it is advisable to only use this functionality when compiling a final release version.
This optimisations looks at which class types can be instantiated in a program, and based on this information it replaces virtual method table (VMT) entries that can never be called with references to FPC_ABSTRACT error. This means that such methods, unless they are called directly via an ''inherited'' call from a child class/object, can be removed by the linker. It has little or no effect on speed, but can help reducing code size.
 
  
===Limitations===
+
Also keep in mind that once a unit has been compiled using ''wpo'' for a particular program, it has to be recompiled if you want to use it in another program.
* Such optimisations are not yet done for virtual class methods (but it should not be that difficult to add)
 
  
==Symbol liveness==
+
== Available whole program optimizations ==
 +
=== All optimizations ===
 +
; parameter
 +
: <syntaxhighlight lang="bash">
 +
-OWall/-Owall
 +
</syntaxhighlight>
 +
; effect
 +
: Enables all whole program optimizations described below.
 +
; limitations
 +
: The combined limitations of all optimizations described below.
 +
 
 +
=== Whole program devirtualization ===
 +
; parameter
 +
: <syntaxhighlight lang="bash">
 +
-OWdevirtcalls/-Owdevirtcalls
 +
</syntaxhighlight>
 +
; effect
 +
: Changes virtual method calls into normal (static) method calls when the compiler can determine that a virtual method call will always go to the same static method. This makes such code both smaller and faster. In general, it's mainly an enabling optimization for other optimizations, because it makes the program easier to analyse due to the fact that it reduces indirect control flow.
 +
; limitations
 +
* The current implementation is context-insensitive. This means that the compiler only looks at the program as a whole and determines for each class type which methods can be devirtualised, rather than looking at each call statement and the surrounding code to determine whether or not this call can be devirtualised;
 +
* The current implementation does not yet devirtualise interface method calls (not when calling them via an interface instance, nor when calling them via a class instance).
  
===Parameter===
 
-OWsymbolliveness/-Owsymbolliveness
 
  
===Effect===
+
=== Optimise virtual method tables ===
This parameter does not perform any optimisation by itself. It simply tells the compiler to record which functions/procedures were not removed by the linker in the final program. During a subsequent ''wpo'' pass, the compiler can then ignore the removed functions/procedures as far as ''wpo'' is concerned (e.g., if a particular class type is only constructed in one unused procedure, then ignoring this procedure can improve the effectiveness of the previous two optimisations).
+
; parameter
 +
: <syntaxhighlight lang="bash">
 +
-OWoptvmts/-Owoptvmts
 +
</syntaxhighlight>
 +
; effect
 +
: This optimization looks at which class types can be instantiated and which virtual methods can be called in a program, and based on this information it replaces virtual method table (VMT) entries that can never be called with references to FPC_ABSTRACTERROR. This means that such methods, unless they are called directly via an ''inherited'' call from a child class/object, can be removed by the linker. It has little or no effect on speed, but can help reduce code size.
 +
; limitations
 +
* Methods that are ''published'', or getters/setters of published properties, can never be optimized in this way, because they can always be referred to and called via the RTTI (which the compiler cannot detect).
 +
* Such optimizations are not yet done for virtual class methods (but it should not be that difficult to add)
  
===Limitations===
+
=== Symbol liveness ===
* This optimisation requires that the ''nm'' utility is installed on the system. For Linux binaries, ''objdump'' will also work. In the future, this information could also be extracted from the internal linkers for the platforms that it supports.
+
; parameter
* Collecting information for this optimisation (using -OWsymbolliveness) requires that smart linking is enabled  (-XX)  and that symbol stripping is disabled (-Xs-).
+
: <syntaxhighlight lang="bash">
 +
-OWsymbolliveness/-Owsymbolliveness
 +
</syntaxhighlight>
 +
; effect
 +
: This parameter does not perform any optimization by itself. It simply tells the compiler to record which functions/procedures were kept by the linker in the final program. During a subsequent ''wpo'' pass, the compiler can then ignore the removed functions/procedures as far as ''wpo'' is concerned (e.g., if a particular class type is only constructed in one unused procedure, then ignoring this procedure can improve the effectiveness of the previous two optimizations).
 +
; limitations
 +
* This optimization requires that the <tt>nm(1)</tt> utility is installed on the system. For Linux binaries, <tt>objdump(1)</tt> will also work. In the future, this information could also be extracted from the internal linker for the platforms that it supports.
 +
* Collecting information for this optimization (using -OWsymbolliveness) requires that smart linking is enabled (-XX) and that symbol stripping is disabled (-Xs-). When only using such previously collected information (using -Owsymbolliveness or -Owall), these limitations do not apply.
  
=Format of the wpo feedback file=
+
== Format of the WPO feedback file ==
 +
This information is mainly interesting if you want to add external data to the WPO feedback file, e.g. from a profiling tool. If you are just a user of the WPO functionality, you can ignore what follows.
  
This information is mainly interesting if you want to add external data to the ''wpo'' feedback file, e.g. from a profiling tool. If you are just a user of the ''wpo'' functionality, you can ignore what follows.
+
The file consists of comments and a number of sections.
 +
Comments are lines that start with a <syntaxhighlight lang="bash" inline>#</syntaxhighlight>.
 +
Each section starts with <syntaxhighlight lang="bash" inline>% </syntaxhighlight> (percent sign and space) followed by the name of the section (e.g., <tt>% contextinsensitive_devirtualization</tt>).
 +
After that, until either the end of the file or the next line starting with "<tt>% </tt>", a human-readable description of the format of this section follows first (in comments), and then the contents of the section itself.
  
The file consists of comments and a number of sections. Comments are lines that start with a <tt>#</tt>. Each section starts with "<tt>% </tt>" followed by the name of the section (e.g.,<tt>% contextinsensitive_devirtualization</tt>). After that, until either the end of the file or until the next line starting with with "<tt>% </tt>", first a human readable description follows of the format of this section (in comments), and then the contents of the section itself.
+
There are no rules for how the contents of a section should look, except that lines starting with <syntaxhighlight lang="bash" inline>#</syntaxhighlight> are reserved for comments and lines starting with <syntaxhighlight lang="bash" inline>% </syntaxhighlight> are reserved for section markers.
  
There are no rules for how the contents of a section should look, except that lines start with <tt>#</tt> are reserved for comments and lines starting with <tt>%  </tt> are reserved for section markers..
+
[[Category:FPC]]
 +
[[Category:Tutorials]]

Latest revision as of 16:17, 6 August 2022


Overview

Traditionally, compilers optimize a program procedure by procedure, or at best compilation unit per compilation unit. Whole program optimization, abbreviated WPO, means that the compiler considers all compilation units that make up a program or library and optimizes them using the comprehensive knowledge of how they are used together in this particular case.

The way WPO generally works is as follows:

  • you compile the program normally, telling the compiler to store various bits of information into a feedback file
  • you recompile the program (and optionally all units that it uses) with WPO, providing the feedback file as extra input to the compiler

In some implementations, the compiler generates some kind of intermediary code (e.g., byte code) and the linker performs all WPO along with the translation to the target ISA. In case of FPC however, the scheme followed is the one described above.

General principles

A few general principles have been followed when designing the FPC implementation of WPO:

  • All information necessary to generate a WPO feedback file for a program is always stored in the PPU files. This means that you can e.g. use a generic RTL for WPO (even though the RTL itself will then not be optimized, your program and its units can be correctly optimized because the compiler knows everything it has to know about all RTL units);
  • The generated WPO feedback file is plain text. The idea is that it should be easy to inspect this file by hand, and to add information to it produced by external tools if desired (e.g., profile information);
  • The implementation of the WPO subsystem in the compiler is very modular, so it should be easy to plug in additional WPO information providers, or to choose at run time between different information providers for the same kind of information. At the same time, the interaction with the rest of the compiler is kept to a bare minimum to improve maintainability;
  • It is possible to generate a WPO feedback file while at the same time using another one as input. In some cases, using this second feedback file as input during a third compilation can further improve the results.
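
Because the feedback file is plain text, it can be inspected with standard command-line tools. The following is only a minimal sketch: the function names `wpo_list_sections` and `wpo_strip_comments` are hypothetical, and it assumes the layout described in the feedback file format section of this page (comment lines start with `#`, section markers start with `% `).

```shell
#!/bin/sh
# Hypothetical helpers for inspecting a WPO feedback file by hand.
# Assumption: comments start with "#", section markers start with "% ".

# Print the name of every section found in the feedback file $1.
wpo_list_sections() {
    sed -n 's/^% //p' "$1"
}

# Print the feedback file $1 without its comment lines.
wpo_strip_comments() {
    grep -v '^#' "$1"
}

# Usage (the file name is an example):
#   wpo_list_sections feedbackfile.wpo
#   wpo_strip_comments feedbackfile.wpo
```

The helpers only read the file, so they are safe to run on a feedback file that will still be passed to the compiler afterwards.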

How to use

Generate WPO feedback file

First of all, compile your program (or library) and all of its units as you would normally do, except that when compiling the main program/library you add the following options:

-FW/path/to/feedbackfile.wpo -OW<selected_wpo_options>

The compiler will then, right after your program has been linked, collect all necessary information to perform the requested WPOs during a successive compilation run, and store this information in /path/to/feedbackfile.wpo.

Use generated WPO feedback file

To actually apply the WPOs, recompile the program/library and all or some of the units that it uses, adding the following options, which point the compiler to the feedback file generated in the previous step:

-Fw/path/to/feedbackfile.wpo -Ow<selected_wpo_options>

The compiler will then read the information collected about the program during the previous compiler run, and use it during the current compilation of units and/or program/library.

Units not recompiled during the second pass will obviously not be optimized, but they will still work correctly when used together with the optimized units and program/library.
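
The only difference between the two passes is the case of the letter W in the options. The sketch below makes that explicit. The helper names `wpo_collect_flags` and `wpo_apply_flags` are hypothetical, and the helpers only print the extra options rather than running the compiler.

```shell
#!/bin/sh
# Hypothetical helpers: they print the extra compiler options for each pass
# and do not invoke the compiler themselves.

# Pass 1: collect feedback. Upper-case W in both options (-FW, -OW).
# $1 = feedback file to write, $2 = comma-separated WPO options (e.g. "all")
wpo_collect_flags() {
    printf '%s\n' "-FW$1 -OW$2"
}

# Pass 2: apply feedback. Lower-case w in both options (-Fw, -Ow).
# $1 = feedback file to read, $2 = comma-separated WPO options
wpo_apply_flags() {
    printf '%s\n' "-Fw$1 -Ow$2"
}

wpo_collect_flags feedbackfile.wpo all   # prints: -FWfeedbackfile.wpo -OWall
wpo_apply_flags feedbackfile.wpo all     # prints: -Fwfeedbackfile.wpo -Owall
```

The printed options would be appended to whatever command line is normally used to build the main program or library.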

Concrete example

The example below refers to twpo4.pp.

Compile the program a first time, collecting feedback in twpo4-1.wpo in the current directory. The compiler will record which classes are created in the program (tchild1 and tchild2), and after the linking step also that the notcalled procedure is in fact not called.

Afterwards, save the generated assembler code in twpo4.s1 for later comparison.

$ ppn37 -FWtwpo4-1.wpo -OWall -CX -XX -Xs- -al twpo4
Free Pascal Compiler version 2.3.1 [2008/12/11] for i386
Copyright (c) 1993-2008 by Florian Klaempfl
Target OS: Darwin for i386
Compiling twpo4.pp
Assembling program
Linking twpo4
66 lines compiled, 0.6 sec
$ mv twpo4.s twpo4.s1

Now compile the program a second time, using the information in twpo4-1.wpo and collecting new information in twpo4-2.wpo. At this point, the compiler knows that no instance of the tbase class is created in the program, and therefore it replaces all regular entries in its virtual method table with references to FPC_ABSTRACTERROR. Virtual class methods cannot be eliminated this way, since they can also be called without constructing an instance of this type.

Because the compiler knows from the previously generated feedback file that the notcalled procedure is never called, it will also record that only an instance of tchild1 is created, as the tchild2 instance was only created inside this notcalled procedure. For optimization purposes, it will however still consider that an instance of tchild2 may be created, because it still relies on the feedback file generated during the previous compilation.

Afterwards, we again save the generated assembler code for later comparisons.

$ ppn37 -FWtwpo4-2.wpo -OWall -Fwtwpo4-1.wpo -Owall -CX -XX -Xs- -al twpo4
Free Pascal Compiler version 2.3.1 [2008/12/11] for i386
Copyright (c) 1993-2008 by Florian Klaempfl
Target OS: Darwin for i386
Compiling twpo4.pp
Assembling program
Linking twpo4
66 lines compiled, 0.1 sec
$ mv twpo4.s twpo4.s2

Compile the program a final time, using the information collected in twpo4-2.wpo. This time, the compiler knows that only a tchild1 instance is created, and will therefore turn the bb.test call from a virtual method call into a static method call (because it knows that even though the type of bb is tbase, in practice it can only be a tchild1 since no tbase or other tbase-descendant class instance is created).

$ ppn37 -Fwtwpo4-2.wpo -Owall -CX -XX -al twpo4
Free Pascal Compiler version 2.3.1 [2008/12/11] for i386
Copyright (c) 1993-2008 by Florian Klaempfl
Target OS: Darwin for i386
Compiling twpo4.pp
Assembling program
Linking twpo4
66 lines compiled, 0.1 sec
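
The three compiler runs above follow a fixed pattern: the first pass only writes a feedback file, every intermediate pass reads the previous file and writes a new one, and the final pass only reads. The following is a minimal sketch of that pattern which merely prints the command lines instead of running them. The helper name wpo_cmd and the FPC variable are made up for this illustration; fpc stands in for whatever compiler binary you use (ppn37 in the transcripts above).

```shell
# Sketch: print the command line for one pass of a multi-pass WPO build.
# Assumptions: FPC names your compiler binary; wpo_cmd is a hypothetical
# helper. The flags are the ones used in the transcripts above.
FPC="${FPC:-fpc}"

# usage: wpo_cmd <pass> <last-pass> <program>
wpo_cmd() {
    wc_pass=$1; wc_last=$2; wc_prog=$3
    wc_flags=""
    if [ "$wc_pass" -lt "$wc_last" ]; then
        # Not the final pass: collect feedback for the next pass.
        wc_flags="-FW${wc_prog}-${wc_pass}.wpo -OWall"
    fi
    if [ "$wc_pass" -gt 1 ]; then
        # Not the first pass: use the feedback written by the previous pass.
        wc_prev=$((wc_pass - 1))
        wc_flags="${wc_flags:+$wc_flags }-Fw${wc_prog}-${wc_prev}.wpo -Owall"
    fi
    if [ "$wc_pass" -lt "$wc_last" ]; then
        # Collecting passes keep the symbols (-Xs-), as in the transcripts.
        wc_flags="${wc_flags:+$wc_flags }-CX -XX -Xs-"
    else
        wc_flags="${wc_flags:+$wc_flags }-CX -XX"
    fi
    printf '%s %s -al %s\n' "$FPC" "$wc_flags" "$wc_prog"
}

# Print the three command lines of the example.
for p in 1 2 3; do wpo_cmd "$p" 3 twpo4; done
```

Printing rather than executing keeps the sketch harmless; to actually run a pass you could feed the printed line to your shell once you have checked it.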

Now, let's have a look at the differences between the generated assembler files. After the first recompilation:

$ diff -u twpo4.s1 twpo4.s2
--- twpo4.s1	2008-12-11 18:59:55.000000000 +0100
+++ twpo4.s2	2008-12-11 19:00:15.000000000 +0100
@@ -214,15 +214,15 @@
 	.long	0,0,0
 	.long	FPC_EMPTYINTF
 	.long	0
-	.long	_SYSTEM_TOBJECT_$__DESTROY
+	.long	FPC_ABSTRACTERROR
 	.long	_SYSTEM_TOBJECT_$__NEWINSTANCE$$TOBJECT
-	.long	_SYSTEM_TOBJECT_$__FREEINSTANCE
-	.long	_SYSTEM_TOBJECT_$__SAFECALLEXCEPTION$TOBJECT$POINTER$$LONGINT
-	.long	_SYSTEM_TOBJECT_$__DEFAULTHANDLER$formal
-	.long	_SYSTEM_TOBJECT_$__AFTERCONSTRUCTION
-	.long	_SYSTEM_TOBJECT_$__BEFOREDESTRUCTION
-	.long	_SYSTEM_TOBJECT_$__DEFAULTHANDLERSTR$formal
-	.long	_P$PROGRAM_TBASE_$__TEST
+	.long	FPC_ABSTRACTERROR
+	.long	FPC_ABSTRACTERROR
+	.long	FPC_ABSTRACTERROR
+	.long	FPC_ABSTRACTERROR
+	.long	FPC_ABSTRACTERROR
+	.long	FPC_ABSTRACTERROR
+	.long	FPC_ABSTRACTERROR
 	.long	0

 .const_data

As you can see, all references to the regular virtual methods in the virtual method table of tbase have been replaced with FPC_ABSTRACTERROR, since the compiler knows they can never be called through this table. Only the entry for the virtual class method NewInstance is kept, for the reason given earlier. This allows the linker to throw away these methods if they are not referenced anywhere else either. In our example, that is only true for tbase.test.
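
If you do not want to read the whole diff, a rough way to see the same effect is to count the data entries that point at FPC_ABSTRACTERROR in each assembler listing. This is only a sketch: count_abstract is a name made up here, and the pattern assumes the ".long" directive seen in the i386 listing above, so other targets may need a different pattern.

```shell
# Sketch: count the ".long FPC_ABSTRACTERROR" entries in an assembler file.
# count_abstract is a hypothetical helper, not part of FPC.
count_abstract() {
    # grep -c prints the number of matching lines. It exits with status 1
    # when that number is 0, so the status is masked for "set -e" callers.
    grep -Ec '^[[:space:]]*\.long[[:space:]]+FPC_ABSTRACTERROR' "$1" || true
}
```

Calling count_abstract on twpo4.s1 and then on twpo4.s2 should show a higher number for the second file if the VMT optimization was applied.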

Now let's look at the effect of the second recompilation:

$ diff -u twpo4.s2 twpo4.s
--- twpo4.s2	2008-12-11 19:00:15.000000000 +0100
+++ twpo4.s	2008-12-11 19:00:29.000000000 +0100
@@ -103,9 +103,7 @@
 	movl	%eax,-4(%ebp)
 # [54] bb.test;
 	movl	-4(%ebp),%eax
-	movl	-4(%ebp),%edx
-	movl	(%edx),%edx
-	call	*80(%edx)
+	call	L_P$PROGRAM_TCHILD1_$__TEST$stub
 # [55] bb.free;
 	movl	-4(%ebp),%eax
 	call	L_SYSTEM_TOBJECT_$__FREE$stub
@@ -139,10 +137,7 @@
 # [63] bb.test;
 	movl	L_U_P$PROGRAM_BB$non_lazy_ptr-Lj2(%ebx),%eax
 	movl	(%eax),%eax
-	movl	L_U_P$PROGRAM_BB$non_lazy_ptr-Lj2(%ebx),%edx
-	movl	(%edx),%edx
-	movl	(%edx),%edx
-	call	*80(%edx)
+	call	L_P$PROGRAM_TCHILD1_$__TEST$stub
 # [64] a:=2;
 	movl	L_U_P$PROGRAM_A$non_lazy_ptr-Lj2(%ebx),%eax
 	movl	$2,(%eax)

As you can see, the two virtual method calls to bb.test (one inside the never called notcalled procedure, and one in the main program) have been replaced with calls to tchild1.test.

Note that you could also recompile the entire run time library for this particular program, if you want.

Concrete Lazarus IDE example

To set up Whole Program Optimization in the Lazarus IDE, add two new Build Modes for each platform. For example:

  • "Linux WPO Pass1"
  • "Linux WPO Pass2"
  • "Windows WPO Pass1"
  • "Windows WPO Pass2"

[Screenshot: Add New Build Modes]

See: IDE Window: Compiler Options#Build modes.

Switch to Build mode "Linux WPO Pass1". Under "Compiler Options | Custom Options", add the Custom Options "-OWall -FWtemplinux.wpo -Xs- -CX" (case sensitive, should be UPPERCASE "W").

[Screenshot: Linux WPO Pass1]

Switch to Build mode "Linux WPO Pass2". Under "Compiler Options | Custom Options", add the Custom Options "-Owall -Fwtemplinux.wpo -Xs- -CX" (case sensitive, should be lowercase "w").

[Screenshot: Linux WPO Pass2]

Switch to Build mode "Windows WPO Pass1". Under "Compiler Options | Custom Options", add the Custom Options "-OWDEVIRTCALLS,OPTVMTS -FWtempwindows.wpo -CX -XX -Xs-" (case sensitive, should be UPPERCASE "W").

[Screenshot: Windows WPO Pass1]

Switch to Build mode "Windows WPO Pass2". Under "Compiler Options | Custom Options", add the Custom Options "-OwDEVIRTCALLS,OPTVMTS -Fwtempwindows.wpo -CX -XX -Xs-" (case sensitive, should be lowercase "w").

[Screenshot: Windows WPO Pass2]

When you are happy with your Project Options settings, you might want to click the Export button, to save them to an XML file, which you can import into other projects later.

When you are ready to compile your program, select "Run | Compile Many Modes" from the IDE menu. Select both Pass1 and Pass2 so that they are compiled in exactly that order.

[Screenshot: Build Many Modes]

If your Build Modes are out of sequence, you can simply compile each Pass separately, in the proper order. Alternatively, make a backup copy of your LPI project file and then carefully edit it in a text editor to put the Passes into the correct order.
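
The same ordering can also be enforced outside the IDE. The sketch below assumes that the lazbuild command-line tool shipped with Lazarus is on your PATH and that your version supports its --build-mode option; build_wpo_modes and the LAZBUILD variable are names made up for this example, and the mode names are the ones created above.

```shell
# Sketch: build the two WPO modes of one platform strictly in order.
# Assumptions: lazbuild is available and accepts --build-mode=<name>;
# build_wpo_modes is a hypothetical helper.
LAZBUILD="${LAZBUILD:-lazbuild}"

# usage: build_wpo_modes <project.lpi> <platform>
build_wpo_modes() {
    bw_project=$1; bw_platform=$2
    for bw_pass in Pass1 Pass2; do
        # Stop at the first failure: Pass2 needs the feedback file of Pass1.
        "$LAZBUILD" --build-mode="$bw_platform WPO $bw_pass" "$bw_project" || return 1
    done
}
```

For a project file called myproject.lpi (a placeholder name), build_wpo_modes myproject.lpi Linux would build "Linux WPO Pass1" followed by "Linux WPO Pass2".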

When to use

Since whole program optimization requires multiple compilations, it is advisable to only use this functionality when compiling a final release version.

Also keep in mind that once a unit has been compiled using WPO for a particular program, it has to be recompiled if you want to use it in another program.

Available whole program optimizations

All optimizations

parameter
-OWall/-Owall
effect
Enables all whole program optimizations described below.
limitations
The combined limitations of all optimizations described below.

Whole program devirtualization

parameter
-OWdevirtcalls/-Owdevirtcalls
effect
Changes virtual method calls into normal (static) method calls when the compiler can determine that a virtual method call will always go to the same static method. This makes such code both smaller and faster. In general, it is mainly an enabling optimization for other optimizations: by reducing indirect control flow, it makes the program easier to analyze.
limitations
  • The current implementation is context-insensitive. This means that the compiler only looks at the program as a whole and determines for each class type which methods can be devirtualized. It does not look at each call statement and the surrounding code to determine whether that particular call can be devirtualized;
  • The current implementation does not yet devirtualize interface method calls (neither when calling them via an interface instance, nor when calling them via a class instance).


Optimize virtual method tables

parameter
-OWoptvmts/-Owoptvmts
effect
This optimization looks at which class types can be instantiated and which virtual methods can be called in a program, and based on this information it replaces virtual method table (VMT) entries that can never be called with references to FPC_ABSTRACTERROR. This means that such methods, unless they are called directly via an inherited call from a child class/object, can be removed by the linker. It has little or no effect on speed, but can help reduce code size.
limitations
  • Methods that are published, or getters/setters of published properties, can never be optimized in this way, because they can always be referred to and called via the RTTI (which the compiler cannot detect).
  • Such optimizations are not yet done for virtual class methods (although this should not be difficult to add).

Symbol liveness

parameter
-OWsymbolliveness/-Owsymbolliveness
effect
This parameter does not perform any optimization by itself. It simply tells the compiler to record which functions/procedures were kept by the linker in the final program. During a subsequent WPO pass, the compiler can then ignore the removed functions/procedures as far as WPO is concerned (e.g., if a particular class type is only constructed in one unused procedure, then ignoring this procedure can improve the effectiveness of the previous two optimizations).
limitations
  • This option requires that the nm(1) utility is installed on the system. For Linux binaries, objdump(1) will also work. In the future, this information could also be extracted from the internal linker for the platforms that it supports.
  • Collecting information for this option (using -OWsymbolliveness) requires that smart linking is enabled (-XX) and that symbol stripping is disabled (-Xs-). When only using such previously collected information (using -Owsymbolliveness or -Owall), these limitations do not apply.
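
The two requirements above are easy to overlook in a build script, so a small pre-flight check can help. The sketch below is deliberately conservative: it insists that -XX and -Xs- are passed explicitly, as in the command lines used in this article, even though your configuration file might already provide them. The function names check_liveness_opts and have_symbol_tool are made up for this example.

```shell
# Sketch: sanity checks before a pass that collects symbol liveness data.
# Both helpers are hypothetical; they only inspect their arguments and PATH.

# usage: check_liveness_opts <compiler options...>
check_liveness_opts() {
    cl_has_xx=no; cl_has_xs=no
    for cl_opt in "$@"; do
        case $cl_opt in
            -XX)  cl_has_xx=yes ;;   # smart linking enabled
            -Xs-) cl_has_xs=yes ;;   # symbol stripping disabled
        esac
    done
    [ "$cl_has_xx" = yes ] || { echo "missing -XX (smart linking)" >&2; return 1; }
    [ "$cl_has_xs" = yes ] || { echo "missing -Xs- (keep symbols)" >&2; return 1; }
}

# Succeeds if nm is available, or objdump as the fallback that the text
# above mentions for Linux binaries only.
have_symbol_tool() {
    command -v nm >/dev/null 2>&1 || command -v objdump >/dev/null 2>&1
}
```

A build script could call both helpers before the first pass and abort with a clear message instead of producing a feedback file without liveness information.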

Format of the WPO feedback file

This information is mainly interesting if you want to add external data to the WPO feedback file, e.g. from a profiling tool. If you are just a user of the WPO functionality, you can ignore what follows.

The file consists of comments and a number of sections. Comments are lines that start with a #. Each section starts with "% " (percent sign and space) followed by the name of the section (e.g., "% contextinsensitive_devirtualization"). After that, until either the end of the file or the next line starting with "% ", there first follows a human-readable description of the format of this section (in comments), and then the contents of the section itself.

There are no rules for how the contents of a section should look, except that lines starting with # are reserved for comments and lines starting with % are reserved for section markers.
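
Because the format is line-oriented, standard text tools are enough to inspect a feedback file. The sketch below relies only on the two rules stated above (comment lines start with #, section markers start with "% "). The function names are made up for this example, and nothing is assumed about what the lines inside a section mean.

```shell
# Sketch: inspect a WPO feedback file with sed and awk.
# list_wpo_sections and wpo_section_body are hypothetical helpers.

# usage: list_wpo_sections <file>
# Prints the name of every section, one per line.
list_wpo_sections() {
    sed -n 's/^% //p' "$1"
}

# usage: wpo_section_body <file> <section-name>
# Prints the lines of one section, leaving out the comment lines.
wpo_section_body() {
    awk -v want="$2" '
        /^% / { inside = (substr($0, 3) == want); next }  # section marker
        /^#/  { next }                                    # comment line
        inside { print }
    ' "$1"
}
```

For example, list_wpo_sections twpo4-1.wpo would list the sections the compiler wrote during the first pass of the concrete example.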