Optimization

From Lazarus wiki
Revision as of 23:03, 13 July 2020 by Kirinn (talk | contribs) (Add mention of vectorisation)

For an overview of optimization possibilities, see Chapter 11 of the Free Pascal Programmer's Guide.

Free Pascal provides a set of straightforward compiler directives and command-line arguments that can considerably speed up your programs. (For debug builds, you may want to keep most code optimizations off for faster compilation and to avoid rare unintended side effects.)


Target Processor

By default, FPC selects a conservative minimum target for code generation, to maximize compatibility. Setting a higher target processor lets the compiler use instructions that are not available on older processors. How high to set the minimum target processor depends on your target audience. To set it, use "-Cp<CPU>". For example, on the x86 platform, -CpPENTIUMM gives the compiler plenty of room and still covers almost all users.

The compiler can also produce code generally favoring a particular processor, but without requiring that processor as a hard minimum. To set this, use "-Op<CPU>".

To check which processors your compiler version supports, run "fpc -ic".
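
As a rough sketch of how the two switches combine, the script below builds a command line for a 32-bit x86 target that requires a Pentium M as the hard minimum (-Cp) while tuning the generated code for a newer core (-Op). The file name myprog.pas is hypothetical, and COREI is an assumption: confirm that it appears in the output of "fpc -ic" for your compiler version, and substitute another value from that list if it does not.

```shell
#!/bin/sh
# Hypothetical source file; replace with your own program.
SRC="myprog.pas"

# -Cp sets the hard minimum CPU; -Op only tunes the code for a CPU.
# COREI is an assumption: check that "fpc -ic" lists it.
FPC_FLAGS="-CpPENTIUMM -OpCOREI"

# Show the command that would be run.
echo "fpc $FPC_FLAGS $SRC"

# Compile only if the compiler and the source file are actually present.
if command -v fpc >/dev/null 2>&1 && [ -f "$SRC" ]; then
    fpc -ic                   # list the processors this compiler supports
    # FPC_FLAGS is left unquoted on purpose so it splits into two arguments.
    fpc $FPC_FLAGS "$SRC"
fi
```

Keeping -Cp conservative and raising only -Op is the safer choice when you do not know what hardware your users run.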


Target FPU

If your 32-bit program uses a lot of single/double type variables, you may gain a significant speed boost from enabling SSE instructions. "-CfSSE" speeds up single operations, and "-CfSSE2" speeds up both single and double operations. On the amd64 platform both are probably already on by default.

To check which FPU instruction sets your compiler version supports, run "fpc -if".
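
One possible way to use that list is to add the FPU switch only when the installed compiler reports support for it. The sketch below assumes that "fpc -if" prints each instruction set as a separate word; the variable names are made up for illustration.

```shell
#!/bin/sh
FPU="SSE2"     # instruction set we would like to use
FPU_FLAG=""    # stays empty unless the compiler reports support

if command -v fpc >/dev/null 2>&1; then
    # "fpc -if" lists the supported FPU instruction sets.
    # Whole-word, case-insensitive matching is an assumption about the output layout.
    if fpc -if | grep -qiw "$FPU"; then
        FPU_FLAG="-Cf$FPU"
    fi
fi

# Prints the flag, or "(none)" if fpc is missing or does not list SSE2.
echo "FPU flag: ${FPU_FLAG:-(none)}"
```

The resulting value can then be passed to fpc together with your other options.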


Optimization Switches

You can enable general optimization groups using "-O1", "-O2", "-O3", and "-O4". Individual switches can be enabled with "-Oo<switch>".

As of 13 July 2020, the optimization groups (defined in /compiler/<arch>/cpuinfo.pas, which references /compiler/globtype.pas) are:

-O1: PEEPHOLE

-O2: O1 + REMOVEEMPTYPROCS + UNUSEDPARA + REGVAR + STACKFRAME + TAILREC + CSE

-O3: O2 + CONSTPROP + DFA + USELOADMODIFYSTORE + LOOPUNROLL

-O4: O3 + ORDERFIELDS + DEADVALUES + FASTMATH + USEEBP/USERBP

Ungrouped: UNCERTAIN, SIZE, STRENGTH, SCHEDULE, AUTOINLINE, DEADSTORE, FORCENOSTACKFRAME

WARNING: In FPC 3.2.0, the -O3 and -O4 options produce compilation errors on Windows in some cases. Furthermore, the ungrouped switches and the -O4 group have potential side effects and may break your code, so use them with care.

To check which switches your compiler version supports, run "fpc -io".
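
For illustration, a group and an individual switch can be combined on one command line. The sketch below enables the -O2 group and adds LOOPUNROLL, which otherwise belongs to -O3. The file name myprog.pas is hypothetical.

```shell
#!/bin/sh
# Hypothetical source file; replace with your own program.
SRC="myprog.pas"

# -O2 enables the level 2 group; -OoLOOPUNROLL adds a single switch on top.
OPT_FLAGS="-O2 -OoLOOPUNROLL"

# Show the command that would be run.
echo "fpc $OPT_FLAGS $SRC"

# Compile only if the compiler and the source file are actually present.
if command -v fpc >/dev/null 2>&1 && [ -f "$SRC" ]; then
    # OPT_FLAGS is left unquoted on purpose so it splits into two arguments.
    fpc $OPT_FLAGS "$SRC"
fi
```

Starting from a lower group and adding switches one at a time makes it easier to find the culprit if an optimization breaks your code.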

Optimization in Code

You can also control which optimizations are applied from within your source code, e.g. {$optimization noloopunroll}. The following switches are defined in compiler/globtype.pas:

LEVEL1, LEVEL2, LEVEL3, LEVEL4, REGVAR, UNCERTAIN, SIZE, STACKFRAME, PEEPHOLE, LOOPUNROLL, TAILREC, CSE, DFA, STRENGTH, SCHEDULE, AUTOINLINE, USEEBP, USERBP, ORDERFIELDS, FASTMATH, DEADVALUES, REMOVEEMPTYPROCS, CONSTPROP, DEADSTORE, FORCENOSTACKFRAME, USELOADMODIFYSTORE, UNUSEDPARA

Putting 'NO' in front of any of those options has the opposite effect.
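
As a minimal sketch, the script below writes a small test program that turns off loop unrolling in its source and then compiles it with -O3, the group that would normally enable LOOPUNROLL. The program name optdemo and its contents are made up for illustration, and the assumption is that the in-source directive overrides the command-line group for the code that follows it.

```shell
#!/bin/sh
# Write a small, hypothetical test program to the current directory.
cat > optdemo.pas <<'EOF'
program optdemo;

{$optimization noloopunroll}  { NO prefix: switch loop unrolling off }

var
  i, sum: Integer;
begin
  sum := 0;
  for i := 1 to 10 do
    sum := sum + i;
  WriteLn(sum);
end.
EOF

# Compile and run only if the compiler is installed.
if command -v fpc >/dev/null 2>&1; then
    fpc -O3 optdemo.pas && ./optdemo
fi
```

Putting the directive in the source keeps the exception next to the code it affects, instead of in the build script.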

Notes

To take optimization a step further, see Whole Program Optimization.

If you are interested in optimizing for size rather than speed, see Size Matters.

Vectorization is a powerful feature, but support for it in the compiler is still a work in progress. In the meantime, you can use SIMD instructions in inline assembly.