Optimization
Latest revision as of 02:39, 5 April 2021
For an overview of optimization possibilities, see Chapter 11 of the Free Pascal Programmer's Guide.
Free Pascal lets you speed up your programs considerably with a set of straightforward compiler directives or command-line arguments. (For debug builds, you may want to keep most code optimizations off to get faster compilation and avoid rare unintended side effects.)
Target Processor
By default, FPC selects a conservative minimum target for code generation, to maximize compatibility. A higher target processor lets the compiler use instructions that are not available on older processors. How high you set the minimum target processor depends on your target audience. To set it, use "-Cp<CPU>". For example, on the x86 platform, -CpPENTIUMM gives the compiler plenty of room and still covers almost all users.
The compiler can also produce code generally favoring a particular processor, but without requiring that processor as a hard minimum. To set this, use "-Op<CPU>".
To check which processors your compiler version supports, run "fpc -ic".
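As a rough sketch of how the two switches combine, the shell snippet below builds an fpc command line with a hard minimum CPU ("-Cp") and a tuning target ("-Op"). The helper name target_flags and the file name myprog.pas are made up for illustration, and using PENTIUMM for both values is only an assumption; pick values reported by "fpc -ic" for your compiler.

```shell
#!/bin/sh
# Hypothetical helper: print the -Cp (hard minimum) and -Op (tuning) flags.
# $1 = minimum CPU, $2 = CPU to tune for.
target_flags() {
    echo "-Cp$1 -Op$2"
}

# "myprog.pas" is a placeholder source file name.
SRC="myprog.pas"
CMD="fpc $(target_flags PENTIUMM PENTIUMM) $SRC"

# Show the command that would be run.
echo "$CMD"

# If fpc is installed, list the CPU names this compiler version accepts.
if command -v fpc >/dev/null 2>&1; then
    fpc -ic || true
fi
```

Keeping the minimum target low and raising only the "-Op" value is the more conservative choice, since the resulting binary should still start on older processors.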
Target FPU
If your 32-bit program uses a lot of single/double type variables, you may gain a significant speed boost from enabling SSE instructions. "-CfSSE" speeds up single operations, and "-CfSSE2" speeds up both single and double operations. On the amd64 platform both are probably already on by default.
To check which FPU instruction sets your compiler version supports, run "fpc -if".
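The choice between the two FPU switches can be sketched as a small shell helper. The function name fpu_flag and its "single"/"double" arguments are invented for this example; only the "-CfSSE" and "-CfSSE2" switches come from the text above.

```shell
#!/bin/sh
# Hypothetical helper: pick an FPU switch from the widest float type used.
#   single -> -CfSSE   (speeds up single operations)
#   double -> -CfSSE2  (speeds up single and double operations)
fpu_flag() {
    case "$1" in
        single) echo "-CfSSE" ;;
        double) echo "-CfSSE2" ;;
        *)      echo "" ;;   # unknown: leave the compiler default alone
    esac
}

# Print the switch for a program that uses double variables.
fpu_flag double

# If fpc is installed, list the FPU instruction sets it supports.
if command -v fpc >/dev/null 2>&1; then
    fpc -if || true
fi
```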
Optimization Switches
You can enable general optimization groups using "-O1", "-O2", "-O3", and "-O4". Individual switches can be enabled with "-Oo<switch>".
As of 13 July 2020, the optimization groups (defined in /compiler/<arch>/cpuinfo.pas, which references /compiler/globtype.pas) are:
-O1: PEEPHOLE
-O2: O1 + REMOVEEMPTYPROCS + UNUSEDPARA + REGVAR + STACKFRAME + TAILREC + CSE
-O3: O2 + CONSTPROP + DFA + USELOADMODIFYSTORE + LOOPUNROLL
-O4: O3 + ORDERFIELDS + DEADVALUES + FASTMATH + USEEBP/USERBP
Ungrouped: UNCERTAIN, SIZE, STRENGTH, SCHEDULE, AUTOINLINE, DEADSTORE, FORCENOSTACKFRAME
WARNING: In FPC 3.2.0, the -O3 and -O4 options produce compilation errors on Windows in some cases (see https://bugs.freepascal.org/view.php?id=37305). Furthermore, the ungrouped switches and the -O4 group have potential side effects and may break your code, so use them responsibly.
WARNING: In FPC 3.2.0 and 3.2.2, dead store optimization (DEADSTORE) is known to produce bad code in some cases (see https://bugs.freepascal.org/view.php?id=38698). Avoid using it for now.
To check which switches your compiler version supports, run "fpc -io".
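To illustrate how a group and individual switches fit together on one command line, here is a minimal shell sketch. The helper name opt_flags is hypothetical; the switch names are taken from the lists above, and adding CONSTPROP and LOOPUNROLL on top of -O2 is just an example, not a recommendation.

```shell
#!/bin/sh
# Hypothetical helper: build "-O<level>" followed by one "-Oo<switch>" per
# extra switch. $1 = group level (1..4), remaining args = switch names.
opt_flags() {
    flags="-O$1"
    shift
    for sw in "$@"; do
        flags="$flags -Oo$sw"
    done
    echo "$flags"
}

# Start from the -O2 group and enable two -O3 switches individually.
FLAGS=$(opt_flags 2 CONSTPROP LOOPUNROLL)
echo "$FLAGS"

# If fpc is installed, list the optimization switches it supports.
if command -v fpc >/dev/null 2>&1; then
    fpc -io || true
fi
```

Enabling single switches on top of a lower group is one way to get part of -O3 while staying clear of switches that cause trouble on a given compiler version.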
Optimization in Code
You can control which optimizations are applied from within your code, e.g. {$optimization noloopunroll}. The following switches are defined in compiler/globtype.pas:
LEVEL1, LEVEL2, LEVEL3, LEVEL4, REGVAR, UNCERTAIN, SIZE, STACKFRAME, PEEPHOLE, LOOPUNROLL, TAILREC, CSE, DFA, STRENGTH, SCHEDULE, AUTOINLINE, USEEBP, USERBP, ORDERFIELDS, FASTMATH, DEADVALUES, REMOVEEMPTYPROCS, CONSTPROP, DEADSTORE, FORCENOSTACKFRAME, USELOADMODIFYSTORE, UNUSEDPARA
Putting 'NO' in front of any of those options has the opposite effect, i.e. it turns that optimization off (for example, NOLOOPUNROLL).
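A minimal sketch of the in-code directive: the script below writes a tiny program that turns loop unrolling off with {$optimization noloopunroll} and, if fpc is available, compiles it with -O3 (the group that would otherwise enable LOOPUNROLL). The program name demo, the temporary directory, and the loop itself are placeholders chosen for this example.

```shell
#!/bin/sh
# Work in a throwaway directory so nothing is left in the current one.
WORKDIR=$(mktemp -d)

# Quote the heredoc delimiter so the shell does not expand "$optimization".
cat > "$WORKDIR/demo.pas" <<'EOF'
program demo;
{$optimization noloopunroll}
var
  i, total: Integer;
begin
  total := 0;
  { With the directive above, this loop should not be unrolled }
  for i := 1 to 10 do
    total := total + i;
  WriteLn(total);
end.
EOF

# Compile and run only when fpc is installed; otherwise just report it.
if command -v fpc >/dev/null 2>&1; then
    (cd "$WORKDIR" && fpc -O3 demo.pas && ./demo) || true
else
    echo "fpc not found; source written to $WORKDIR/demo.pas"
fi
```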
Notes
To take optimization a step further, see Whole Program Optimization.
If you are interested in optimizing for size rather than speed, see Size Matters.
Vectorization is a powerful feature, but still a work in progress. You can already use SIMD instructions in inline assembly.