WebAssembly/Threads

From Lazarus wiki

Thread support

This page collects information on the features needed for thread support in WebAssembly (in the browser).

Thread support consists of 4 parts:

  • Atomic instructions.
  • Shared memory and passive segments.
  • Thread Local Storage (threadvars)
  • Actually starting a thread.


Atomic instructions

The proposed specs 

When the Free Pascal RTL is compiled with -CTwasmthreads, the following RTL functions will use the new atomic instructions and thus should be thread safe in a multithreaded environment:

InterlockedDecrement
InterlockedIncrement
InterlockedExchange
InterlockedCompareExchange
InterlockedExchangeAdd

Note that these require proper alignment (4 bytes) of the target, otherwise they trap (i.e. terminate the program with a stack trace).
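
As a minimal sketch of how these functions are used, the program below applies three of them to a global Longint, which is naturally 4-byte aligned. The program name and variable names are illustrative only. A field inside a packed record is one case where the alignment requirement might not hold.

```pascal
program InterlockedDemo;

{$mode objfpc}

var
  // A plain global Longint is naturally aligned to 4 bytes.
  Counter: Longint = 0;
  Previous: Longint;

begin
  // Atomically add 1; Counter becomes 1.
  InterlockedIncrement(Counter);

  // Atomically add 10; the function returns the value before the addition.
  Previous := InterlockedExchangeAdd(Counter, 10);
  WriteLn('Before add: ', Previous, ', after add: ', Counter);

  // Store 100 only if Counter still equals 11 (the third argument is the
  // comparand); the function returns the value found before the exchange.
  Previous := InterlockedCompareExchange(Counter, 100, 11);
  WriteLn('Before CAS: ', Previous, ', after CAS: ', Counter);
end.
```

The calls themselves are the same on every FPC target. According to the text above, it is the -CTwasmthreads build of the RTL that maps them to the atomic instructions on WebAssembly.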

In addition to that, there are many more atomic functions available in the WebAssembly unit:

const
  { Special values for the TimeoutNanoseconds parameter of AtomicWait }
  awtInfiniteTimeout = -1;
  { AtomicWait result values }
  awrOk = 0;       { woken by another agent in the cluster }
  awrNotEqual = 1; { the loaded value did not match the expected value }
  awrTimedOut = 2; { not woken before timeout expired }

procedure AtomicFence; inline;

function AtomicLoad(constref Mem: Int8): Int8; inline;
function AtomicLoad(constref Mem: UInt8): UInt8; inline;
function AtomicLoad(constref Mem: Int16): Int16; inline;
function AtomicLoad(constref Mem: UInt16): UInt16; inline;
function AtomicLoad(constref Mem: Int32): Int32; inline;
function AtomicLoad(constref Mem: UInt32): UInt32; inline;
function AtomicLoad(constref Mem: Int64): Int64; inline;
function AtomicLoad(constref Mem: UInt64): UInt64; inline;

procedure AtomicStore(out Mem: Int8; Data: Int8); inline;
procedure AtomicStore(out Mem: UInt8; Data: UInt8); inline;
procedure AtomicStore(out Mem: Int16; Data: Int16); inline;
procedure AtomicStore(out Mem: UInt16; Data: UInt16); inline;
procedure AtomicStore(out Mem: Int32; Data: Int32); inline;
procedure AtomicStore(out Mem: UInt32; Data: UInt32); inline;
procedure AtomicStore(out Mem: Int64; Data: Int64); inline;
procedure AtomicStore(out Mem: UInt64; Data: UInt64); inline;

function AtomicAdd(var Mem: Int8; Data: Int8): Int8; inline;
function AtomicAdd(var Mem: UInt8; Data: UInt8): UInt8; inline;
function AtomicAdd(var Mem: Int16; Data: Int16): Int16; inline;
function AtomicAdd(var Mem: UInt16; Data: UInt16): UInt16; inline;
function AtomicAdd(var Mem: Int32; Data: Int32): Int32; inline;
function AtomicAdd(var Mem: UInt32; Data: UInt32): UInt32; inline;
function AtomicAdd(var Mem: Int64; Data: Int64): Int64; inline;
function AtomicAdd(var Mem: UInt64; Data: UInt64): UInt64; inline;

function AtomicSub(var Mem: Int8; Data: Int8): Int8; inline;
function AtomicSub(var Mem: UInt8; Data: UInt8): UInt8; inline;
function AtomicSub(var Mem: Int16; Data: Int16): Int16; inline;
function AtomicSub(var Mem: UInt16; Data: UInt16): UInt16; inline;
function AtomicSub(var Mem: Int32; Data: Int32): Int32; inline;
function AtomicSub(var Mem: UInt32; Data: UInt32): UInt32; inline;
function AtomicSub(var Mem: Int64; Data: Int64): Int64; inline;
function AtomicSub(var Mem: UInt64; Data: UInt64): UInt64; inline;

function AtomicAnd(var Mem: Int8; Data: Int8): Int8; inline;
function AtomicAnd(var Mem: UInt8; Data: UInt8): UInt8; inline;
function AtomicAnd(var Mem: Int16; Data: Int16): Int16; inline;
function AtomicAnd(var Mem: UInt16; Data: UInt16): UInt16; inline;
function AtomicAnd(var Mem: Int32; Data: Int32): Int32; inline;
function AtomicAnd(var Mem: UInt32; Data: UInt32): UInt32; inline;
function AtomicAnd(var Mem: Int64; Data: Int64): Int64; inline;
function AtomicAnd(var Mem: UInt64; Data: UInt64): UInt64; inline;

function AtomicOr(var Mem: Int8; Data: Int8): Int8; inline;
function AtomicOr(var Mem: UInt8; Data: UInt8): UInt8; inline;
function AtomicOr(var Mem: Int16; Data: Int16): Int16; inline;
function AtomicOr(var Mem: UInt16; Data: UInt16): UInt16; inline;
function AtomicOr(var Mem: Int32; Data: Int32): Int32; inline;
function AtomicOr(var Mem: UInt32; Data: UInt32): UInt32; inline;
function AtomicOr(var Mem: Int64; Data: Int64): Int64; inline;
function AtomicOr(var Mem: UInt64; Data: UInt64): UInt64; inline;

function AtomicXor(var Mem: Int8; Data: Int8): Int8; inline;
function AtomicXor(var Mem: UInt8; Data: UInt8): UInt8; inline;
function AtomicXor(var Mem: Int16; Data: Int16): Int16; inline;
function AtomicXor(var Mem: UInt16; Data: UInt16): UInt16; inline;
function AtomicXor(var Mem: Int32; Data: Int32): Int32; inline;
function AtomicXor(var Mem: UInt32; Data: UInt32): UInt32; inline;
function AtomicXor(var Mem: Int64; Data: Int64): Int64; inline;
function AtomicXor(var Mem: UInt64; Data: UInt64): UInt64; inline;

function AtomicExchange(var Mem: Int8; Data: Int8): Int8; inline;
function AtomicExchange(var Mem: UInt8; Data: UInt8): UInt8; inline;
function AtomicExchange(var Mem: Int16; Data: Int16): Int16; inline;
function AtomicExchange(var Mem: UInt16; Data: UInt16): UInt16; inline;
function AtomicExchange(var Mem: Int32; Data: Int32): Int32; inline;
function AtomicExchange(var Mem: UInt32; Data: UInt32): UInt32; inline;
function AtomicExchange(var Mem: Int64; Data: Int64): Int64; inline;
function AtomicExchange(var Mem: UInt64; Data: UInt64): UInt64; inline;

function AtomicCompareExchange(var Mem: Int8; Compare, Data: Int8): Int8; inline;
function AtomicCompareExchange(var Mem: UInt8; Compare, Data: UInt8): UInt8; inline;
function AtomicCompareExchange(var Mem: Int16; Compare, Data: Int16): Int16; inline;
function AtomicCompareExchange(var Mem: UInt16; Compare, Data: UInt16): UInt16; inline;
function AtomicCompareExchange(var Mem: Int32; Compare, Data: Int32): Int32; inline;
function AtomicCompareExchange(var Mem: UInt32; Compare, Data: UInt32): UInt32; inline;
function AtomicCompareExchange(var Mem: Int64; Compare, Data: Int64): Int64; inline;
function AtomicCompareExchange(var Mem: UInt64; Compare, Data: UInt64): UInt64; inline;

function AtomicWait(constref Mem: Int32; Compare: Int32; TimeoutNanoseconds: Int64): Int32; inline;
function AtomicWait(constref Mem: UInt32; Compare: UInt32; TimeoutNanoseconds: Int64): Int32; inline;
function AtomicWait(constref Mem: Int64; Compare: Int64; TimeoutNanoseconds: Int64): Int32; inline;
function AtomicWait(constref Mem: UInt64; Compare: UInt64; TimeoutNanoseconds: Int64): Int32; inline;

function AtomicNotify(constref Mem: Int32; Count: UInt32): UInt32; inline;
function AtomicNotify(constref Mem: UInt32; Count: UInt32): UInt32; inline;
function AtomicNotify(constref Mem: Int64; Count: UInt32): UInt32; inline;
function AtomicNotify(constref Mem: UInt64; Count: UInt32): UInt32; inline;
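
The declarations above are enough to build simple blocking primitives. The following is a rough sketch of a lock built from AtomicCompareExchange, AtomicWait and AtomicNotify. It only compiles for a WebAssembly target with thread support enabled. The names LockWord, AcquireLock and ReleaseLock and the 0/1 encoding are hypothetical. The sketch also assumes that AtomicCompareExchange returns the value found in memory before the exchange, as the underlying cmpxchg instruction does.

```pascal
program WaitNotifyDemo;

{$mode objfpc}

uses
  WebAssembly;

var
  // Hypothetical convention: 0 = unlocked, 1 = locked.
  LockWord: Int32 = 0;

procedure AcquireLock;
begin
  // Try to change 0 -> 1. Assumption: the result is the previous value,
  // so anything other than 0 means another thread holds the lock.
  while AtomicCompareExchange(LockWord, 0, 1) <> 0 do
    // Sleep while the word still reads 1. If it has changed already, the
    // call returns awrNotEqual at once and the loop retries.
    AtomicWait(LockWord, 1, awtInfiniteTimeout);
end;

procedure ReleaseLock;
begin
  // Mark the lock as free, then wake at most one waiting thread.
  AtomicStore(LockWord, 0);
  AtomicNotify(LockWord, 1);
end;

begin
  AcquireLock;
  WriteLn('In the critical section');
  ReleaseLock;
end.
```

Browsers generally do not allow the main thread to block, so a waiting primitive like this is only expected to work on threads hosted in a Web Worker.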

Shared memory and passive segments

First, the memory needs to be declared shared.

Secondly, the data segments need to be declared as passive segments, and the compiler should generate extra startup code that initializes them only once. Without this, instantiating the module on a new Web Worker (in order to start a new thread) would reset memory to its initial state, which is not what we want when starting a thread.

Some info: Shared Memory and Passive Segments

It turns out that this is all done by the LLVM linker (including the initialization startup code) when you pass the appropriate command line options. The compiler now passes these options to the linker when a program is compiled with -CTwasmthreads. As a side effect, such programs no longer work with "wasmtime run --enable-features threads", but that is because wasmtime's threads support is incomplete.

Thread Local Storage (threadvars)

Special consideration is needed to support threadvars.

More info here: Thread Local Storage

Code generation for threadvar access is now implemented in FPC. It follows the ABI convention for TLS from Emscripten. However, it causes the LLVM 14 linker to crash. The LLVM 15 (release candidate) linker from Emscripten seems to work.

Actually starting a thread

WebAssembly relies on the hosting environment to actually start threads.

Some extra info:

Unfortunately, the WASI Native Threads API proposal is very incomplete. Emscripten implements threads using a different API/ABI, but it's quite messy and poorly documented.

Starting a thread requires the following steps:

  • WebAssembly: allocating a block of memory for the stack and TLS (threadvar) block for the new thread. This needs to be done in a thread safe manner (can we use the heap?). TODO: How do we determine the stack size for the new thread? Do we use the same main stack size as specified by the {$M stacksize} directive?
  • WebAssembly: calling an external (JavaScript) function and passing it at least the following data, which the JavaScript code must forward to the new thread:
 - the start address of the new thread procedure
 - the args that need to be passed to the new thread procedure
 - the stack and TLS address
 - TODO: how/where do we determine the thread ID?
  • JavaScript: TBD...delegate the new thread to a Web Worker, pass the arguments, etc.
  • WebAssembly: a function that sets up the new thread by:
 - setting the stack pointer in linear memory (might need inline asm, an external wasm module or compiler magic code generation)
 - initializing the global variables that hold the TLS (threadvar) block (by calling __wasm_init_tls)
 - calling the actual thread function and passing its parameters
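
The data flow of the steps above can be sketched in plain Pascal. This is an illustration only, not the FPC RTL implementation. Every name in it (TThreadStartInfo, HostSpawnThread, ThreadEntry, the sizes) is hypothetical. It assumes the heap can be used for the stack and TLS blocks, which the list above leaves as an open question. The host call is replaced by a stand-in that runs the entry function on the calling thread, so the sketch compiles and runs on any FPC target.

```pascal
program ThreadStartSketch;

{$mode objfpc}

type
  TThreadFunc = function(Arg: Pointer): PtrInt;

  // Hypothetical descriptor holding the data the host must forward.
  PThreadStartInfo = ^TThreadStartInfo;
  TThreadStartInfo = record
    ThreadFunc: TThreadFunc; // start address of the thread procedure
    Arg: Pointer;            // argument for the thread procedure
    StackBlock: Pointer;     // memory reserved for the new stack
    StackSize: PtrUInt;
    TlsBlock: Pointer;       // memory reserved for the threadvars
  end;

const
  // Placeholder sizes; how to choose them is an open question above.
  SketchStackSize = 64 * 1024;
  SketchTlsSize = 256;

function Worker(Arg: Pointer): PtrInt;
begin
  Result := PLongint(Arg)^ * 2;
end;

// Step 4: runs on the new thread. Real code would first point the stack
// pointer at Info^.StackBlock and initialize TLS from Info^.TlsBlock
// (via __wasm_init_tls); both are omitted in this sketch.
function ThreadEntry(Info: PThreadStartInfo): PtrInt;
begin
  Result := Info^.ThreadFunc(Info^.Arg);
end;

// Steps 2 and 3: stand-in for the external JavaScript function that would
// hand Info to a Web Worker. Here it simply calls the entry point directly.
function HostSpawnThread(Info: PThreadStartInfo): PtrInt;
begin
  Result := ThreadEntry(Info);
end;

var
  StartInfo: TThreadStartInfo;
  Input: Longint = 21;

begin
  // Step 1: allocate the stack and TLS blocks (assumed to come from the heap).
  StartInfo.ThreadFunc := @Worker;
  StartInfo.Arg := @Input;
  StartInfo.StackSize := SketchStackSize;
  StartInfo.StackBlock := GetMem(SketchStackSize);
  StartInfo.TlsBlock := GetMem(SketchTlsSize);

  WriteLn('Worker returned ', HostSpawnThread(@StartInfo));

  FreeMem(StartInfo.TlsBlock);
  FreeMem(StartInfo.StackBlock);
end.
```

Bundling everything into one record means only a single pointer has to cross the WebAssembly/JavaScript boundary. Because the memory is shared, the new thread can read the record at the same address.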