Difference between revisions of "PowerPC64 Port"

From Lazarus wiki

Revision as of 15:27, 4 November 2005

The PowerPC64 port is in an early beta stage. It started as an entry for the Linux On Power contest.

This page contains a rough outline of what works and, especially, what does not, along with some notes about the port and other related information. More detailed information (about calling conventions and so on) will follow later.


Status

  • 2.1.x compiler support
  • Supports POWER4 and derivatives (G5) for Linux at the moment.
  • The compiler survives a make build :)
  • Failed test suite program count is around 20
  • The IDE is nearly okay now (editing works, but some minor issues remain), and many packages and their examples which are 64-bit big-endian safe already work (OpenGL, threading, GTK, GTK2, ...)
  • Debug info (Stabs) does not work correctly. This is probably because Stabs is meant for 32-bit computers only.

Building

Although the current compiler is still in beta and still has a few bugs, it can already be used by interested people. The easiest way to get a working compiler is to start on a G5 using the 32-bit PowerPC compiler; only the ppcppc binary is needed. Starting with a cross-compiled 64-bit binary works just as well, but requires more skill and is not described further here.

For those who are still interested, do the following:

Change into the directory where you have stored the current SVN sources from the current development branch, and enter the following command:

 make build PPC_TARGET=powerpc64 PP=<path_to_ppcppc>/ppcppc

If you want a much faster binary, add OPT=-O2r to the command line. After a while (and lots of compilation) you should be back at an empty command prompt, ready to install the compiler. Assuming that you compiled as non-root, do the following:

 su   (You will be asked for your root password)
 make install
 exit (To give up root access again)

Once this procedure has completed successfully, compiling a Pascal program should be as easy as typing

 fpc <pascal_sourcefile>

Register usage

The table below shows how the compiler uses the registers for code generation and what they mean. The "Volatile" column indicates whether a register (or register set) is volatile: the value of a volatile register is not preserved across function calls, while nonvolatile registers are automatically saved and restored by the function prolog and epilog. A value of "N/A" means that the register is handled in a special way and in general should not be modified by the programmer. This table is basically a summary of the information in chapter 3.2.1 of the ABI specification.

Register  Volatile  Description
r0        Y         Volatile register used in function prologs and constant loading (i.e. scratch register)
r1        N/A       Stack frame pointer. The stack must always be quadword (16 byte) aligned
r2        N/A       TOC pointer. Unused by FreePascal at the moment
r3        Y         Parameter register and return value register
r4-r10    Y         Registers used for function parameters
r11       Y         Register used in calls by pointer, otherwise used as scratch register
r12       Y         Register used for glink code and scratch register
r13       N/A       Reserved for use as system thread ID. Never touched by the FreePascal compiler
r14-r31   N         Registers used for local variables

f0        Y         Scratch register
f1        Y         Floating point parameter and return value register
f2-f13    Y         Floating point parameter registers
f14-f31   N         Registers used for local variables

LR        Y         Link register
CTR       Y         Loop counter register. FreePascal uses it for special code patterns, but not in common code
XER       Y         Fixed point exception register
FPSCR     Y         Floating point status and control register

CR0-CR1   Y         Volatile condition code register fields
CR2-CR4   N         Nonvolatile condition code register fields
CR5-CR7   Y         Volatile condition code register fields

Notes

  • The condition register is never saved to the stack in a function prolog, because FreePascal and its RTL only use the volatile parts of this register.
  • There is no support for VMX (AltiVec) extensions in FreePascal, so no register usage is given.
  • r0, r2, r11 and r12 may be destroyed during a function call, i.e. you cannot pass a value to the callee using these registers.
  • The stack is always aligned to 16 bytes as per ABI convention.
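
The 16-byte stack alignment rule comes down to a small piece of arithmetic. The following is only an illustrative sketch in Python, not code taken from the compiler; the helper name `align_frame_size` is made up for this example.

```python
STACK_ALIGNMENT = 16  # quadword alignment required for r1 (see the table above)

def align_frame_size(size):
    """Round a frame size in bytes up to the next multiple of 16."""
    return (size + STACK_ALIGNMENT - 1) & ~(STACK_ALIGNMENT - 1)

# A frame that needs 100 bytes actually occupies 112 bytes on the stack.
for requested in (0, 1, 16, 100, 112):
    print(requested, "->", align_frame_size(requested))
```

Because every frame size is a multiple of 16, r1 stays quadword aligned across nested calls as long as it was aligned at program start.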

Issues

This section is a kind of work diary, listing the problems currently under investigation (and those already solved) with the PowerPC64 compiler.

  • Stabs debug information is generated incorrectly. This may be related to Stabs not working at all in 2.1.x at the moment.
  • (solved) The IDE compiles, after manually compiling and linking it, but nothing but the clock (yay!) in the upper right corner works.
  • (solved) Linking to the C library does not work due to path problems in the linker script on the bi-arch system I use.
  • (solved) Parameter passing is not completely compatible with C (and/or the ABI spec), which makes linking to C libraries non-working at the moment.
  • OO exceptions do not work, SIGSEGV'ing after several Push/PopExceptAddr invocations.
  • There are some problems with packages/FCL due to a missing workaround for non-aligned accesses in some records. A workaround for this (instruction) limitation is being worked on.
  • Conditional jump offsets must be < 32k (the same issue that was fixed in rev. 1161 for PowerPC32).
  • -Or is currently broken.
  • Exception handling does not seem to work; there is an issue with the sigaction() call (signal handlers fail to register).
  • The exception handler crashes when invoking HandleErrorAddrFrame(), possibly because of incorrect internal structures.
  • Object messages do not work. This is an alignment problem in the generated internal structures.
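
The "< 32k" limit on conditional jumps in the list above follows from the instruction encoding: a PowerPC conditional branch has a 14-bit word displacement (a signed 16-bit byte offset), while an unconditional branch has a 24-bit word displacement. A common way around this is to invert the condition and branch over an unconditional jump. The sketch below illustrates that idea in Python. Whether the fix in rev. 1161 works exactly like this is an assumption, and the function names and instruction strings are invented for the example.

```python
COND_MIN, COND_MAX = -(1 << 15), (1 << 15) - 4      # bc: 14-bit word displacement
UNCOND_MIN, UNCOND_MAX = -(1 << 25), (1 << 25) - 4  # b: 24-bit word displacement

def fits(offset, lowest, highest):
    """True if a word-aligned byte offset lies within [lowest, highest]."""
    return offset % 4 == 0 and lowest <= offset <= highest

def plan_conditional_jump(offset):
    """Return the (schematic) instruction sequence needed to reach offset.

    For simplicity, the 4-byte distance between the two instructions of the
    long form is ignored when checking the range.
    """
    if fits(offset, COND_MIN, COND_MAX):
        return ["bc cond, target"]
    if fits(offset, UNCOND_MIN, UNCOND_MAX):
        # Skip the unconditional jump when the condition is false.
        return ["bc not-cond, skip", "b target", "skip:"]
    raise ValueError("target out of range for a relative branch")

print(plan_conditional_jump(1000))
print(plan_conditional_jump(40000))
```

The long form costs one extra instruction, so it is only worth emitting when the short form cannot reach the target.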

Of course, the compiler needs more testing in general. Another problem is performance, which is not too good yet. Programs compiled with the ppc32 compiler are usually faster, and the same program on a slower-clocked x86_32 runs circles around both.

All of this (except maybe the performance part :-) will be fixed for an "official" release.

32 bit compatibility

During porting, several compatibility problems of 32-bit PowerPC programs with the 32-bit emulation layer of Linux were found. They are:

  • The default cache line size is 128 bytes instead of 32 bytes. This affects some RTL routines which assume a 32-byte cache line length, namely the assembly fillchar() and move() methods. For this reason, most (except very trivial) FPC compiled programs immediately segfault on PowerPC64. Fixing this involves selecting cache line aware fillchar() and move() methods at program startup.
  • Signal handlers for exceptions are not registered properly by the RTL at program startup. This is due to Linux/32 on PowerPC64 using different syscall numbers than Linux/32 on PowerPC32. In particular, it uses the rt_sig* constants, not the old ones. This also affects OO exception handling, which does not work at all. (Note from oliebol: "Either ppc64 improving or fpc linux/x86 switching to rt_sig*").
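
To see why a wrong cache line assumption can corrupt memory, consider the following Python model. It assumes that the assembly routines clear memory with a cache-block-zero instruction (such as dcbz), which always zeroes one whole cache line; this page does not say so explicitly, so treat it as an assumption. All function names are invented for the illustration.

```python
def cache_block_zero(memory, addr, line_size):
    """Model of a cache-block-zero: clears the whole aligned line containing addr."""
    start = addr - addr % line_size
    end = min(start + line_size, len(memory))
    memory[start:end] = bytes(end - start)

def zero_fill(memory, addr, count, assumed_line, real_line):
    """Zero count bytes at addr, issuing one block-zero per *assumed* line."""
    for offset in range(0, count, assumed_line):
        cache_block_zero(memory, addr + offset, real_line)

def clobbered_bytes(assumed_line, real_line):
    """Count bytes zeroed beyond a 32-byte buffer at the start of memory."""
    memory = bytearray(b"\xff" * 256)
    zero_fill(memory, 0, 32, assumed_line, real_line)
    return memory[32:].count(0)

print(clobbered_bytes(assumed_line=32, real_line=32))   # 0
print(clobbered_bytes(assumed_line=32, real_line=128))  # 96
```

In this model, a routine written for 32-byte lines wipes 96 bytes behind the buffer on a machine with 128-byte lines, which would explain the immediate segfaults.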

There may be (and I think there are) more issues. These are only the ones I could reproducibly pinpoint at this time.

Optimizing for PowerPC64

There are a few rules of thumb to make PowerPC64 programs perform well. This section presents at least some of them. Of course, these are rather low-level; often a better algorithm is more advantageous than this sort of bean-counting.

  • Use at least the -Or (use register variables) compiler switch to minimize stores and loads; as a side effect, this also results in smaller executables. PowerPC64 is a RISC platform and as such does not have complex instructions that take memory operands. For example, using -Or more than halves the time required for cycling the compiler.
  • Align your data if possible. Unaligned access (especially for 64-bit loads and stores on addresses not divisible by four) performs poorly. This basically means: do not use the packed modifier for records unless required.
  • Use the appropriate data type, preferring 32-bit integers over 64-bit integers due to reduced memory bandwidth requirements. Using a data type smaller than 32 bits does not give any speed advantage (except maybe if you are bandwidth constrained) because there are only 32-bit and 64-bit variants of the "slow" integer arithmetic instructions (div, mul). (*)
  • The compiler automatically replaces integer division and integer modulo operations by a constant with a faster multiplication. Additionally, it uses shifts instead where possible, so you do not need to care too much about it. This can be activated using the -O2 compiler switch.
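
The division-by-constant replacement mentioned in the last item can be checked with plain integer arithmetic. The sketch below shows the general technique for unsigned 32-bit values and the divisor 10, using the well-known constant ceil(2^35 / 10); it is not the code FPC generates, and the function names are made up for this example.

```python
MAGIC_DIV10 = 0xCCCCCCCD  # ceil(2**35 / 10)
SHIFT_DIV10 = 35

def div10(x):
    """x div 10 for unsigned 32-bit x, using a multiplication and a shift."""
    return (x * MAGIC_DIV10) >> SHIFT_DIV10

def mod10(x):
    """x mod 10, derived from the quotient with one more multiply and subtract."""
    return x - div10(x) * 10

def div8(x):
    """Division by a power of two needs only a shift."""
    return x >> 3

for value in (0, 9, 10, 12345, 2**32 - 1):
    assert div10(value) == value // 10
    assert mod10(value) == value % 10
    assert div8(value) == value // 8
print(div10(12345), mod10(12345), div8(12345))  # 1234 5 1543
```

A multiply and a shift are typically much cheaper than a hardware divide, which is why this replacement pays off.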

(*) Although these optimizations are not implemented yet for the PowerPC64 platform, they surely will be =). Maybe these optimizations will require setting some optimization switch though, or will be disabled for some settings.
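
The alignment advice above can be made concrete with a small layout calculation. The Python sketch below compares a naturally aligned record with a packed one. It assumes that each field is aligned to its own size, which is a simplification and not necessarily the exact rule the compiler applies; the field names are invented.

```python
def layout(fields, packed=False):
    """Return (offsets, total_size) for a list of (name, size_in_bytes) fields."""
    offsets, position, largest = {}, 0, 1
    for name, size in fields:
        align = 1 if packed else size  # assumption: natural alignment = field size
        position = (position + align - 1) // align * align
        offsets[name] = position
        position += size
        largest = max(largest, align)
    # Pad the record so that arrays of it keep every element aligned.
    total = (position + largest - 1) // largest * largest
    return offsets, total

record = [("flag", 1), ("counter", 8), ("id", 2)]
print(layout(record))               # ({'flag': 0, 'counter': 8, 'id': 16}, 24)
print(layout(record, packed=True))  # ({'flag': 0, 'counter': 1, 'id': 9}, 11)
```

The packed record is smaller, but its 64-bit field starts at offset 1, which is exactly the kind of access that performs poorly. Ordering fields from largest to smallest often recovers most of the space without packing.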

More information

This section contains links and notes to further information about the PowerPC64 architecture. This includes links to the ABI specification, general processor descriptions, instruction set documentation and general compiler notes.

  • Ian Lance Taylor, 64-bit PowerPC ELF Application Binary Interface Supplement 1.7, Zenbu Labs IBM, 2004 (HTML) (PDF) - This was the original ABI specification the PowerPC64 compiler was developed with.
  • Ian Lance Taylor, 64-bit PowerPC ELF Application Binary Interface Supplement 1.9, Zenbu Labs IBM, 2004 (HTML) (No PDF) - An update to the previous ABI specification. However, it only seems to add some clarifications compared to 1.7.
  • PowerPC Architecture Book - This website contains links to the PowerPC architecture manuals, i.e. a complete description of the instruction sets, containing both privileged and unprivileged instructions. The next links are direct references to the three books: Book I (PDF), Book II (PDF), Book III (PDF)
  • PowerPC Compiler Writer's Guide - This website contains useful information for writing compilers for the PowerPC architecture, i.e. optimized code examples for common patterns. This manual has been written for the PowerPC32 architecture, but the techniques presented there can be easily ported over to the PowerPC64 platform.

Contact

Send a message to the fpc-devel mailing list, or look for "tom_at_work" in the IRC channel (or, of course, discuss it here).