Assembler and ABI Resources

From Lazarus wiki
Revision as of 17:22, 9 October 2011 by MarkMLl (talk | contribs) (→‎Assembler source formats: More text.)

The Assembler

The Free Pascal Compiler (FPC) translates Pascal source code into assembly language, which is then processed by an assembler running as a separate backend. Some other Pascal compilers generate object modules or executable programs directly, i.e. they do not require a separate assembler.

An assembler is itself an executable program that translates assembly language into an object module. In most cases the object modules are passed to a linker which then produces an executable program, although in some cases there are additional stages (code signing for secure operating systems, conversion to a binary for embedded systems and so on).

The ABI

The interface between an executable program and the underlying operating system is referred to as the Application Binary Interface or ABI. This includes the CPU's operating mode (e.g. whether word and address sizes default to 32 or 64 bits), operand alignment, function calling conventions, system call numbers, and a selection of constants (e.g. file open modes) and structures (e.g. as returned by the stat() function). It is also usually considered to include the format of the object modules, executable and library files.
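
Several of the properties listed above can be observed from a running program. The following is a minimal Python sketch, not part of FPC, that reports a few of them for whatever platform it runs on. The function name abi_summary is hypothetical and the choice of properties is illustrative only.

```python
import ctypes
import os
import sys


def abi_summary():
    """Collect a few ABI-dependent properties of the current platform."""
    return {
        # Width of a native pointer, i.e. the default address size.
        "pointer_bits": ctypes.sizeof(ctypes.c_void_p) * 8,
        # Width of the C 'long' type, which is not the same on every 64-bit ABI.
        "long_bits": ctypes.sizeof(ctypes.c_long) * 8,
        # Alignment in bytes that the C ABI requires for a double.
        "double_alignment": ctypes.alignment(ctypes.c_double),
        # Byte order of multi-byte values.
        "byte_order": sys.byteorder,
        # Numeric values of two file open mode constants.
        "O_RDONLY": os.O_RDONLY,
        "O_CREAT": os.O_CREAT,
    }


if __name__ == "__main__":
    for name, value in abi_summary().items():
        print(f"{name}: {value}")
```

Running the script on two different targets and comparing the output gives a quick, if very partial, impression of how their ABIs differ.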

The ABI differs grossly between operating systems: in general a program compiled for Windows will not run on Linux and vice versa. In addition, there is a significant amount of variation between different "flavours" of related operating systems. For example, not only do the system call numbers differ between SPARC Solaris and SPARC Linux, they also differ between SPARC Linux and x86 Linux.
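
To make the point about system call numbers concrete, here is a small Python sketch. It uses i386 Linux and x86-64 Linux as the pair being compared. That pair is an assumption chosen for illustration and is not the SPARC example given above. The table is copied by hand and covers only a handful of calls. The names SYSCALL_NUMBERS and differing_syscalls are hypothetical.

```python
# Linux system call numbers for two closely related ABIs (illustrative subset).
SYSCALL_NUMBERS = {
    "i386":   {"exit": 1,  "read": 3, "write": 4, "open": 5, "close": 6, "getpid": 20},
    "x86_64": {"exit": 60, "read": 0, "write": 1, "open": 2, "close": 3, "getpid": 39},
}


def differing_syscalls(abi_a, abi_b):
    """Return the names of calls whose numbers differ between two ABIs."""
    a, b = SYSCALL_NUMBERS[abi_a], SYSCALL_NUMBERS[abi_b]
    return sorted(name for name in a if name in b and a[name] != b[name])


if __name__ == "__main__":
    for name in differing_syscalls("i386", "x86_64"):
        print(name, SYSCALL_NUMBERS["i386"][name], SYSCALL_NUMBERS["x86_64"][name])
```

Even for the same operating system on the same processor family, every call in this subset has a different number, which is why code such as syscall.inc has to be written per target.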

Purpose of this note

In most cases FPC uses the GNU assembler (as or gas) as its backend. However, the assembly language syntax that it expects differs for each target CPU; the sections below give examples. The original incentive for this note was that the author (MarkMLl) needed to write an assembler reader for the MIPS processor and found no straightforward comparison of existing formats on which to base new code.

In addition, in some cases the details of the assembly language format or the ABI specification are available only to users registered with the relevant manufacturer. Where possible, links to unofficial mirrors are given below for casual reference.

Assembler source formats

Assembler source emitted by the compiler's code generator has to be (a subset of what is) acceptable to the assembler for the relevant target CPU. In addition, small portions of the RTL (e.g. prt0.as) are of necessity written in assembler, and some Pascal source files (e.g. syscall.inc) contain inline assembler which the compiler has to be able to parse before it is passed to the backend.

The list of CPUs below is taken from the compiler as of late 2011. Some of these are no longer supported, or exist merely as minimal stubs.

Alpha

This compiler exists only as a minimal stub.

ARM

This fragment is from FpSysCall alias FPC_SYSCALL6 in FPC's ./rtl/linux/arm/syscall.inc:

asm
  stmfd sp!,{r4,r5,r6}
  ldr  r4,param4
  ldr  r5,param5
  ldr  r6,param6
  bl FPC_SYSCALL
  ldmfd sp!,{r4,r5,r6}
end;

This fragment is from ret_from_fork in Linux's ./arch/arm/kernel/entry-common.S:

ENTRY(ret_from_fork)
        bl      schedule_tail
        get_thread_info tsk
        ldr     r1, [tsk, #TI_FLAGS]            @ check for syscall tracing
        mov     why, #1
        tst     r1, #_TIF_SYSCALL_TRACE         @ are we tracing syscalls?
        beq     ret_slow_syscall
        mov     r1, sp
        mov     r0, #1                          @ trace exit [IP = 1]
        bl      syscall_trace
        b       ret_slow_syscall
ENDPROC(ret_from_fork)

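Note that register names are r0, r1 etc. without a sigil, and that assignment is right-to-left, i.e. the destination operand comes first (mov r1, sp copies sp into r1). Immediate values are prefixed with # and comments start with @.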

AVR

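This fragment is from ret_from_fork in Linux's ./arch/avr32/kernel/entry-avr32b.S. Note that AVR32 is a 32-bit architecture distinct from the 8-bit AVR, so the fragment illustrates AVR32 syntax only: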

ret_from_fork:
        call   schedule_tail

        /* check for syscall tracing */
        get_thread_info r0
        ld.w    r1, r0[TI_flags]
        andl    r1, _TIF_ALLWORK_MASK, COH
        brne    syscall_exit_work
        rjmp    syscall_exit_cont

Note that register names are r0, r1 etc. without a sigil, and that assignment is right-to-left.

i386

This fragment is from FpSysCall alias FPC_SYSCALL6 in FPC's ./rtl/linux/i386/syscall.inc:

asm
        push  %ebx
        push  %edx
        push  %esi
        push  %edi
        push  %ebp
        push  %ecx
        cmp   $0, sysenter_supported
        jne   .LSysEnter
        movl  %edx,%ebx         // param1
        pop   %ecx              // param2
        movl  param3,%edx       // param3
        movl  param4,%esi       // param4
        movl  param5,%edi       // param5
        movl  param6,%ebp       // param6
        int   $0x80
        jmp   .LTail
  .LSysEnter:
        movl  %edx,%ebx         // param1
        pop   %ecx              // param2
        movl  param3,%edx       // param3
        movl  param4,%esi       // param4
        movl  param5,%edi       // param5
        movl  param6,%ebp       // param6
        call psysinfo
  .LTail:
        pop   %ebp
        pop   %edi
        pop   %esi
        pop   %edx
        pop   %ebx
        cmpl  $-4095,%eax
        jb    .LSyscOK
        negl  %eax
        call  seterrno
        movl  $-1,%eax
  .LSyscOK:
end;
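
The tail of the fragment above (from cmpl $-4095,%eax onwards) implements the Linux convention that a raw return value in the range -4095 to -1 denotes an error whose negation is the errno value. The following Python sketch models that check, assuming a 32-bit result register. The function name decode_syscall_result is hypothetical, and the sketch returns the errno value rather than storing it as seterrno does.

```python
MASK32 = 0xFFFFFFFF


def decode_syscall_result(raw):
    """Model the check at .LTail: return (result, errno_value).

    'raw' is the content of eax after the system call, treated as an
    unsigned 32-bit value as the 'jb' (unsigned below) branch does.
    """
    raw &= MASK32
    if raw < (-4095 & MASK32):
        # Below 0xFFFFF001: a normal result, which may be a large address.
        return raw, 0
    # Otherwise negate to obtain errno and report -1 to the caller.
    return -1, (-raw) & MASK32


if __name__ == "__main__":
    print(decode_syscall_result(3))           # a small file descriptor
    print(decode_syscall_result(0xFFFFFFFE))  # raw value -2
    print(decode_syscall_result(0xB7700000))  # a high address, not an error
```

The unsigned comparison matters: a signed test for "negative" would misreport a successful call that returns an address in the upper half of the address space.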

This fragment is from ret_from_fork in Linux's ./arch/x86/kernel/entry_32.S:

ENTRY(ret_from_fork)
        CFI_STARTPROC
        pushl %eax
        CFI_ADJUST_CFA_OFFSET 4
        call schedule_tail
        GET_THREAD_INFO(%ebp)
        popl %eax
        CFI_ADJUST_CFA_OFFSET -4
        pushl $0x0202                   # Reset kernel eflags
        CFI_ADJUST_CFA_OFFSET 4
        popfl
        CFI_ADJUST_CFA_OFFSET -4
        jmp syscall_exit
        CFI_ENDPROC
END(ret_from_fork)

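Note that register names are eax, ebx etc. with % as a sigil, and that assignment is left-to-right, i.e. the destination operand comes last (movl %edx,%ebx copies edx into ebx). Immediate values are prefixed with $.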
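
The difference in operand order between the ARM and i386 fragments is the kind of detail an assembler reader has to encode per target. The sketch below is a rough Python illustration and is not based on FPC's actual reader code. The SYNTAX table, its keys and the function names are all hypothetical, and only the comment markers and operand order seen in the fragments above are modelled.

```python
# Per-syntax settings: (comment marker, index of the destination operand).
SYNTAX = {
    "arm": ("@", 0),        # destination first
    "i386-att": ("#", -1),  # destination last
}


def operands(line, syntax):
    """Split the operand field of one instruction line on top-level commas."""
    marker, _ = SYNTAX[syntax]
    code = line.split(marker, 1)[0].strip()
    parts = code.split(None, 1)
    if len(parts) < 2:
        return []
    result, depth, current = [], 0, ""
    for ch in parts[1]:
        if ch in "[{(":
            depth += 1
        elif ch in "]})":
            depth -= 1
        if ch == "," and depth == 0:
            result.append(current.strip())
            current = ""
        else:
            current += ch
    result.append(current.strip())
    return result


def destination(line, syntax):
    """Return the operand in the destination position, or None."""
    ops = operands(line, syntax)
    return ops[SYNTAX[syntax][1]] if len(ops) >= 2 else None


if __name__ == "__main__":
    print(destination("mov     r1, sp", "arm"))
    print(destination("movl  %edx,%ebx", "i386-att"))
```

This only reports which operand occupies the destination position. Not every instruction writes to it: compare instructions such as cmpl modify flags only, and ARM store instructions transfer data in the opposite direction.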

IA-64

This compiler exists only as a minimal stub.

M68K

This compiler exists in FPC v1 but has never been ported to v2.

MIPS

PowerPC

PowerPC-64

SPARC

VIS

This compiler exists only as a minimal stub.

x86

Refer to i386 above.

x86-64

ABI references

The list of CPUs etc. is based on those found in the compiler (see above).

Alpha

ARM

AVR

i386

IA-64

M68K

MIPS

PowerPC

PowerPC-64

SPARC

VIS

x86

x86-64

Other resources

As a general point, there are some useful thoughts on binary disassembly at http://chdk.wikia.com/wiki/GPL_Disassembling for situations where IDA or an equivalent is not available.