Writing portable code regarding the processor architecture

There are several main issues when writing code that is portable across processor architectures: endianness, data alignment, and 32 vs. 64 Bit processors.

Endianness

Endianness is the order in which the processor stores values larger than one byte (e.g. 16/32/64-bit integers) in memory.

Generally there are two ways:

  1. Store the least significant byte at the lowest address; longint(4) is encoded as 04 00 00 00
  2. Store the most significant byte at the lowest address; longint(4) is encoded as 00 00 00 04

The first way is called little endian, and the second way big endian.

The byte order is generally fixed per processor family, but some processor families (e.g. ARM, PPC) can operate either big endian or little endian, depending on the board or system they are used in.

The best known little endian processor family is x86, the processor family used in PCs, and its brethren x86-64. Typical big endian processors are PPC (usually, see above note), and Motorola's m68k.

Since TCP/IP specifies that all protocol header structures that go over the wire must be big endian, this byte order is sometimes also referred to as network order.
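
As a minimal sketch, this is the byte swap a little endian host has to perform before putting, say, a 16-bit port number on the wire in network order (the function name Swap16 is made up for this example):

function Swap16(v: word): word;
begin
  // Exchange the high and the low byte.
  Swap16 := (v shr 8) or ((v and $FF) shl 8);
end;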

Endianness is important:

  1. when exchanging data between different architectures
  2. when accessing the same data sometimes as (an array of) a larger type, like integer, and sometimes as (an array of) bytes.

An example of the latter:


var x : ^longint;

begin
   new(x);
   x^:=5;
   writeln(chr(ord(pchar(x)^)+48));  // writes 5 or 0 depending on endianness
   dispose(x);
end.

On little endian machines (PCs), the above code will write 5 (since longint(5) is stored as 05 00 00 00 in memory), while on big endian machines (e.g. Powermacs) it will write 0 (since longint(5) is stored as 00 00 00 05 in memory).
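
If the goal is simply to get at the least significant byte, the memory layout does not need to be reinterpreted at all. A sketch of an endianness-independent version of the same program:

var x : ^longint;

begin
   new(x);
   x^:=5;
   writeln(chr((x^ and $FF)+48));  // always writes 5: the low byte is extracted arithmetically
   dispose(x);
end.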

To determine the endianness of the processor at compile time, use the ENDIAN_BIG or ENDIAN_LITTLE defines (or FPC_BIG_ENDIAN and FPC_LITTLE_ENDIAN starting from version 1.9), which Free Pascal defines automatically depending on the target processor.
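
A minimal sketch using these defines (assuming FPC 1.9 or later for the FPC_* names):

program whichendian;
begin
{$IFDEF FPC_LITTLE_ENDIAN}
  writeln('compiled for a little endian CPU');
{$ENDIF}
{$IFDEF FPC_BIG_ENDIAN}
  writeln('compiled for a big endian CPU');
{$ENDIF}
end.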

Alignment

Some processors generate hardware exceptions when data is badly aligned (e.g. Alpha or ARM). Sometimes these exceptions are caught and fixed by the OS using emulation, but this is very slow and should be avoided. Alignment can also cause records to have different sizes on different architectures, so always use sizeof(recordtype) as the size of a record. If you define a packed record, try to ensure that its data is naturally aligned, if possible. Some processors only have alignment requirements for certain types of data, such as floating point (e.g. older PowerPCs).

To check whether the CPU requires proper alignment, check the FPC_REQUIRES_PROPER_ALIGNMENT define (version 1.9 and higher). On 32 Bit CPUs this usually means that data up to a size of 4 bytes must be naturally aligned. If you need to access unaligned data, use the move procedure to copy it to an aligned location before processing it; move handles unaligned data properly.
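
A minimal sketch of storing and loading a longint at an arbitrary (possibly unaligned) offset inside a byte buffer:

var
  buf: array[0..15] of byte;
  value: longint;
begin
  fillchar(buf, sizeof(buf), 0);
  value := 5;
  // Do not dereference plongint(@buf[3]) directly on CPUs that require
  // proper alignment; copy through move instead.
  move(value, buf[3], sizeof(value));   // store at an unaligned offset
  value := 0;
  move(buf[3], value, sizeof(value));   // load it back into an aligned variable
  writeln(value);                       // writes 5
end.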

There are multiple strategies for aligning (see the sketch below):

  • align every field on a multiple of a certain value (typically a power of two: 1, 2, 4, 8; a value of 1 is equivalent to "packed")
  • pad before every field so that it is aligned on a multiple of its own size (a longint on 4 bytes, an int64 on 8 bytes, etc.). This is what C compilers typically do, which is why FPC calls it {$packrecords C}.

(for arrays or nested records, the size of their largest sub-unit is used)
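
A minimal sketch of how the packing strategy changes a record's size (the sizes in the comments assume a longint alignment of 4):

program packdemo;

{$packrecords 1}
type
  TPackedRec = record
    b: byte;
    l: longint;
  end;              // sizeof = 5: no padding at all

{$packrecords C}
type
  TCRec = record
    b: byte;
    l: longint;
  end;              // sizeof = 8: 3 padding bytes before l so it is aligned on 4

begin
  writeln('packed : ', sizeof(TPackedRec));
  writeln('C style: ', sizeof(TCRec));
end.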

Mac OS X's {$packrecords C} seems to pad the entire record at the end to make it a certain size. This is still being investigated, and will probably be fixed in the compiler.

32 Bit vs. 64 Bit

To achieve maximum compatibility with older code, FPC does not change the size of predefined data types like integer, longint or word when moving from 32 to 64 Bit. However, the size of a pointer is 8 bytes on a 64 bit architecture, so constructs like longint(pointer(p)) are doomed to crash on 64 bit architectures. To allow you to write portable code, the FPC system unit provides the types PtrInt and PtrUInt, signed and unsigned integer types that always have the same size as a pointer.
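
A minimal sketch: storing a pointer in an integer variable and reconstructing it, which works on both 32 and 64 bit targets because PtrUInt always matches the pointer size (while longint would truncate the address on 64 bit):

var
  p: pointer;
  address: PtrUInt;
begin
  getmem(p, 16);
  address := PtrUInt(p);         // safe cast: PtrUInt is as wide as a pointer
  writeln('pointer as integer: ', address);
  freemem(pointer(address));     // the pointer can be reconstructed losslessly
end.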

Keep in mind that the size change of the pointer type also affects record sizes. If you allocate records with hard-coded sizes instead of with new or with getmem(<x>, sizeof(<x>)), this will have to be fixed.
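
A minimal sketch: a record that contains a pointer is 8 bytes on a typical 32 bit target but 16 bytes on a 64 bit target (with default alignment), so only sizeof keeps the allocation correct everywhere:

type
  PNode = ^TNode;
  TNode = record
    data: longint;
    next: PNode;
  end;
var
  p: PNode;
begin
  getmem(p, sizeof(TNode));   // correct on any target
  // getmem(p, 8);            // wrong: too small on a 64 bit target
  freemem(p);
end.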

To keep the coherence of the Pascal language it would be better to change the size of Integer and Cardinal on 64 bit, as happened with the arrival of Delphi 2 and as the Delphi 7 help recommends: "The generic integer types are Integer and Cardinal; use these whenever possible, since they result in the best performance for the underlying CPU and operating system." In Delphi 1, Integer had 16 bits. The other integer types are called fundamental integer types (Shortint, Smallint, Longint, Int64, Byte, Word, and Longword) and they do not have to change size between different compiler implementations. To ease the transition of code from 32 to 64 bit it would be better to create a new directive, for example {$IntegerHas32bit ON}, as an interim solution. It would also be better to create a new type UInt64 instead of the unfamiliar PtrInt and PtrUInt. Note that the 64 bit compiler should not accept the construction longint(pointer(p)); it should give the message "Invalid type cast". --Wanderlan 15:42, 8 Feb 2005 (CET)

Calling conventions