Parallel procedures

From Lazarus wiki
Revision as of 15:33, 13 December 2008 by Msmat (talk | contribs) (Index 6 was missing and index 0 was redundant)


This page describes how to run single procedures in parallel using the MTProcs unit, which simplifies running procedures in parallel and simplifies implementing parallel algorithms.

Parallel procedures and methods are often found in parallel algorithms and some programming languages provide built-in support for them (e.g. OpenMP in gcc). See here for the plans adding such language features to FPC. Embedding these things into the language can save some typing and allows the compiler to generate code with less overhead. On the other hand there is no single algorithm to run code in parallel. In fact simple approaches often slows down the code. To get good results one must specify some parameters, which a compiler can not guess. For examples see the vast amount of settings and discussions of OpenMP. You need parallel algorithms. MTProcs helps to implement parallel algorithms.

Getting MTProcs

The unit mtprocs.pas is part of the multithreadprocslaz.lpk package, which can be downloaded via svn:

svn co multithreadprocs 

As always: Open the package multithreadprocslaz.lpk in the IDE once, so that it learns the path. To use the package in your project: Either use IDE Menu / Package / Open recent package / .../multithreadprocslaz.lpk / More / add to project to use it in your current project. Or add it via the project inspector.

Simple example

Here is a short example, that does not do anything useful, but to demonstrate how a parallel procedure looks like and how it is called:

<Delphi> program Test;

{$mode objfpc}{$H+}


 cthreads, cmem,

// a simple parallel procedure procedure DoSomethingParallel(Index: PtrInt; Data: Pointer; Item: TMultiThreadProcItem); var

 i: Integer;


 for i:=1 to Index*1000000 do ; // do some work



 ProcThreadPool.DoParallel(@DoSomethingParallel,1,5,nil); // address, startindex, endindex, optional data

end. </Delphi>

The output will be something like this:


Here are some short notes. The details follow later.

  • Multithreading needs under unix the unit cthreads as explained in Multithreaded Application Tutorial.
  • For speed reasons the cmem heap manager is recommended, although it does not make any difference in this example.
  • The parallel procedure DoSomethingParallel gets some fixed and predefined parameters.
  • Index defines what chunk of work should be done by this call.
  • Data is the pointer, that was given to ProcThreadPool.DoParallel as fourth parameter. It is optional and you are free to use it for anything.
  • Item can be used to access some of the more sophisticated features of the thread pool.
  • ProcThreadPool.DoParallel works like a normal procedure call. It returns when it has run completely - that means all threads have finished their work.
  • The output shows the typical multi threading behavior: the order of the calls is not determined. Several runs can result in different orders.


Running a procedure in parallel means:

  • a procedure or method is executed with an Index running from an arbitrary StartIndex to an arbitrary EndIndex.
  • One or more threads execute these index' in parallel. For example if the Index runs from 1 to 10 and there are 3 threads available in the pool, then 3 threads will run three different index at the same time. Every time a thread finishes one call (one index) it allocates the next index and runs it. The result may be: Thread 1 executes index 3,5,7, thread 2 executes 1,6,8,9 and thread 3 runs 2,4,10.
  • The number of threads may vary during run and there is no guarantee for a minimum of threads. In the worst case all index will be executed by one thread - the thread itself.
  • The maximum number of threads is initialized with a good guess for the current system. It can be manually set at any time via
  • You can set the maximum threads for each procedure.
  • A parallel procedure (or method) can call recursively parallel procedures (or methods).
  • Threads are reused, that means they are not destroyed and created for each index, but there is a global pool of threads. On a dual core processor there will be two threads doing all the work - the main thread and one extra thread in the pool.

Overhead, slow down

The overhead heavily depends on the system (number and types of cores, type of shared memory, speed of critical sections, cache size). Here are some general hints:

  • Each chunk of work (index) should take at least some milliseconds.
  • The overhead is independent of recursive levels of parallel procedures.

Multi threading overhead, which is independent of the MTProcs units, but simply results from todays computer architectures:

  • As soon as one thread is created your program become multi threaded and the memory managers must use critical sections, which slows down. So even if you do nothing with the thread your program might become slower.
  • The cmem heap manager is on some systems much faster for multi threading. In my benchmarks especially on intel systems and especially under OS X the speed difference can be more than 10 times.
  • Strings and interfaces are globally reference counted. Each access needs a critical section. Processing strings in multiple threads will therefore hardly give any speed up. Use PChars instead.
  • Each chunk of work (index) should work on a disjunctive part of memory to avoid cross cache updates.
  • Do not work on vast amounts of memory. On some systems one thread alone is fast enough to fill the memory bus speed. When the memory bus maximum speed is reached, any further thread will slow down instead of making it faster.

Setting the maximum number of threads for a procedure

You can specify the maximum number of threads for a procedure as the fifth parameter.

<Delphi> begin

 ProcThreadPool.DoParallel(@DoSomethingParallel,1,10,nil,2); // address, startindex, endindex, 
     // optional: data (here: nil), optional: maximum number of threads (here: 2)

end. </Delphi>

This can be useful, when the threads work on the same data, and a too many threads will create so many cache conflicts, that they slow down each other. Or when the algorithm uses a lot of WaitForIndex, so that only a few threads can actually work. Then the threads can be used for other tasks.

Wait for index / executing in order

Sometimes an Index depends on the result of a former Index. For example a common task is to first compute chunk 5 and then combine the result with the result of chunk 3. Use the WaitForIndex method for that:

<Delphi> procedure DoSomethingParallel(Index: PtrInt; Data: Pointer; Item: TMultiThreadProcItem); begin

 ... compute chunk number 'Index' ...
 if Index=5 then begin
   if not Item.WaitForIndex(3) then exit;
   ... compute ...

end; </Delphi>

WaitForIndex takes as an argument an Index that is lower than the current Index or a range. If it returns true, everything worked as expected. If it returns false, then an exception happened in one of the other threads.


If an exception occur in one of the threads, the other threads will finish normally, but will not start a new Index. The pool waits for all threads to finish and will then raise the exception. That's why you can use try..except like always:

<Delphi> try



 On E: Exception do ...

end; </Delphi>

If there are multiple exceptions, only the first exception will be raised. To handle all exceptions, add a try..except inside your parallel method.