Fast direct pixel access

From Lazarus wiki
Revision as of 21:12, 21 March 2011 by Chronos (talk | contribs) (optimization with inlining)

Introduction

Standard graphical LCL components provides Canvas object for common drawing. But most of available graphic routines have some overhead given by universality, platform independence and safety. To achieve best drawing speed it is necessary to use specialized bitmap structures and routines. There are some graphical libraries available for faster graphic processing with wide drawing functions. But for some specific usage it is necessary to create custom small library with data structures fitted for own application.


Direct pixel access in libraries is generally slowed-down by more factors:

  • Coordinate limits checking
  • Facility for automatic image updating and redrawing
  • Abstract program constructions as property, static and virtual methods, dynamic two dimensional arrays
  • Support for multiple pixel formats
  • Support for multiple platforms

Pixel format

We can use simple integer pixels.

<delphi>TFastBitmapPixel = Integer;</delphi>


Or more abstract pixels with separated components

<delphi>TFastBitmapPixelComponents = packed record

 Blue: Byte;
 Green: Byte;
 Red: Byte;
 Alpha: Byte;

end;</delphi>

Bitmap structure

Bitmap class should provide direct pixel access given by X, Y coordinate. But some graphic operation could be further optimized by not doing coordinate calculations for every pixel and rather do pixel pointer shifting by simple memory pointer addition. Some mass operation as filling rectangular region could be optimized using Move and FillChar functions.

Two dimensional dynamic array

This is native way to express two dimensional array in pascal. Internal structure is implemented as pointer to array of pointers to data because dynamic array is in fact pointer to array data. Then calculation of pixel position is matter of fetching pointer for rows and add horizontal position to it.

<delphi>interface

type

 TFastBitmap = class
 private
   function GetSize: TPoint;
   procedure SetSize(const AValue: TPoint);
 public
   Pixels: array of array of TFastBitmapPixel;
   property Size: TPoint read GetSize write SetSize;
 end;

implementation

{ TFastBitmap }

function TFastBitmap.GetSize: TPoint; begin

 Result.X := Length(Pixels);
 if Result.X > 0 then Result.Y := Length(Pixels[0])
   else Result.Y := 0;

end;

procedure TFastBitmap.SetSize(const AValue: TPoint); begin

 SetLength(Pixels, AValue.X, AValue.Y);

end;</delphi>

Raw dynamic memory

It is good to have whole bitmap in one compact memory area. Such memory block behave as video memory of video card. Position of pixels have to be calculated by using equation Y * Width + X with use of instructions for addition and multiplication. Access to pixels is pretty fast thanks to GetPixel and SetPixel methods inlining. But more instruction have to be used than in case of two dimensional dynamic array.

<delphi>interface

type

 PFastBitmapPixel = ^TFastBitmapPixel;
 TFastBitmap = class
 private
   FPixelsData: PByte;
   FSize: TPoint;
   function GetPixel(X, Y: Integer): TFastBitmapPixel; inline;
   procedure SetPixel(X, Y: Integer; const AValue: TFastBitmapPixel); inline;
   procedure SetSize(const AValue: TPoint);
 public
   constructor Create;
   destructor Destroy; override;
   property Size: TPoint read FSize write SetSize;
   property Pixels[X, Y: Integer]: TFastBitmapPixel read GetPixel write SetPixel;
 end;

implementation

{ TFastBitmap }

function TFastBitmap.GetPixel(X, Y: Integer): TFastBitmapPixel; begin

 Result := PFastBitmapPixel(FPixelsData + (Y * FSize.X + X) * SizeOf(TFastBitmapPixel))^;

end;

procedure TFastBitmap.SetPixel(X, Y: Integer; const AValue: TFastBitmapPixel); begin

 PFastBitmapPixel(FPixelsData + (Y * FSize.X + X) * SizeOf(TFastBitmapPixel))^ := AValue;

end;

procedure TFastBitmap.SetSize(const AValue: TPoint); begin

 if (FSize.X = AValue.X) and (FSize.Y = AValue.X) then Exit;
 FSize := AValue;
 FPixelsData := ReAllocMem(FPixelsData, FSize.X * FSize.Y * SizeOf(TFastBitmapPixel));

end;

constructor TFastBitmap.Create; begin

 Size := Point(0, 0);

end;

destructor TFastBitmap.Destroy; begin

 FreeMem(FPixelsData);
 inherited Destroy;

end;</delphi>

Pixel operation optimization

Basic line algorithm

This is naive form which is readable but with price of slower processing.

<delphi>procedure TFastBitmap.HorizontalLine(X, Y, Length: Integer; Color: TFastBitmapPixel); var

 I: Integer;

begin

 for I := 0 to Length - 1 do
   Pixels[X + I, Y] := Color;

end;</delphi>

Pointers

With use of pointers we can eliminate much of pixel address addition and multiplication by Pixels property access. Only fast increment operation is performed.

<delphi>procedure TFastBitmap.HorizontalLine(X, Y, Length: Integer; Color: TFastBitmapPixel); var

 I: Integer;
 P: PFastBitmapPixel;

begin

 P := PFastBitmapPixel(FPixelData + (Y * Size.X + X) * SizeOf(TFastBitmapPixel));
 for I := 0 to Length - 1 do begin
   P^ := Color;
   Inc(P);
 end;

end;</delphi>

Mass fill using FillDWord

Access using pointers and incrementation is fastest possible using conventional single operations. But most of todays CPU offer instructions for mass operations like MOVS, STOS for x86 architecture. Pixel size should be 1, 2 or 4 bytes to be able to use this optimization.

<delphi>procedure TFastBitmap.HorizontalLine(X, Y, Length: Integer; Color: TFastBitmapPixel); var

 I: Integer;
 P: PFastBitmapPixel;

begin

 P := PFastBitmapPixel(FPixelData + (Y * Size.X + X) * SizeOf(TFastBitmapPixel));
 FillDWord(P^, Length, Color);

end;</delphi>

Inlining

If code is notably smaller like SetPixel and GetPixel methods it is better to inline instructions rather than do push and pop operations on stact with execution of call and ret instruction. This optimization will be even significant if such operation is executed many times as pixel operations do.

<delphi>procedure TFastBitmap.HorizontalLine(X, Y, Length: Integer; Color: TFastBitmapPixel); inline; var

 I: Integer;
 P: PFastBitmapPixel;

begin

 P := PFastBitmapPixel(FPixelData + (Y * Size.X + X) * SizeOf(TFastBitmapPixel));
 FillDWord(P^, Length, Color);

end;</delphi>

DMA

If memory block have to be copied to another memory place or device memory DMA(Direct Memory Access) can be used. CPU doesn't have to be involved in copy operations and can do further processing. This kind of optimization can be used in OpenGL for copying data to video card memory.

Drawing bitmap on screen

In this test let assume that we have simple bitmap structure designed as two dimensional byte array where each pixel have 256 possible colors. This could be gray image or some palette mapped image. All image manipulation will be done with custom functions with direct pixel access. Thanks to custom data structure functions could be optimized for faster block memory operations if necessary.

To be able to display image on Form custom bitmap have to be copied to some TWinControl canvas area. Image have to be copied repeatedly if motion image is generated. Every bitmap copy in memory take some time. Then our aim is to do as low as possible copy operations and rather copy our bitmap to screen directly if possible.

You can draw image as fast as possible in simple loop: <delphi>repeat

 FastBitmapToBitmap(FastBitmap, Image1.Picture.Bitmap);
 Application.ProcessMessages;

until Terminated;</delphi>

Or draw image for example using Timer with defined drawing interval. Even if nothing is changed on bitmap there is no need to copy bitmap to screen so RedrawPending simple flag could be used. Thanks to delayed draw execution with calling Redraw method drawing of frames could be skipped.

<delphi>TForm1 = class(TForm) published

 procedure Timer1Execute(Sender: TObject);
 ...  

public

 RedrawPending: Boolean;
 Drawing: Boolean;
 FastBitmap: TFastBitmap;
 procedure Redraw;
 ...

end;

procedure TForm1.Redraw; begin

 RedrawPending := True;

end;

procedure TForm1.Timer1Execute(Sender: TObject); begin

 if (not Drawing) and RedrawPending then 
 try
   Drawing := True;
   CustomProcessing(FastBitmap);
   FastBitmapToBitmap(FastBitmap, Image1.Picture.Bitmap);        
 finally
   RedrawPending := False;
   Drawing := False;
 end;

end;</delphi>

Draw methods

TBitmap.Canvas.Pixels

This is most straighforward but slowest method.

<delphi>function FastBitmapToBitmap(FastBitmap: TFastBitmap; Bitmap: TBitmap); var

 X, Y: Integer;

begin

 for X := 0 to FastBitmap.Size.X - 1 do
   for Y := 0 to FastBitmap.Size.Y - 1 do
     Bitmap.Canvas.Pixels[X, Y] := FastBitmap.Pixels[X, Y] * $010101;  

end;</delphi>

TBitmap.Canvas.Pixels with Update locking

Previous method could be speeded up by update locking and thus redusing per pixel update and event signaling.

<delphi>function FastBitmapToBitmap(FastBitmap: TFastBitmap; Bitmap: TBitmap); var

 X, Y: Integer;

begin

 try
   Bitmap.BeginUpdate(True);
   for X := 0 to FastBitmap.Size.X - 1 do
     for Y := 0 to FastBitmap.Size.Y - 1 do
       Bitmap.Canvas.Pixels[X, Y] := FastBitmap.Pixels[X, Y] * $010101;  
 finally
   Bitmap.EndUpdate(False);
 end;

end;</delphi>

TLazIntfImage

TBitmap is general bitmap component compatible with Delphi and it use TColor type for pixels. But LCL provide another component better suited for image handling. This component provide faster access to pixels and in addition it supports alpha channel.

<delphi>uses

 ..., LCLType, LCLProc, LCLIntf;

function FastBitmapToBitmap(FastBitmap: TFastBitmap; Bitmap: TBitmap); var

 X, Y: Integer;
 TempIntfImage: TLazIntfImage;

begin

 try
   TempIntfImage := Bitmap.CreateIntfImage; // Temp image could be precreated and holded owning class
   for X := 0 to FastBitmap.Size.X - 1 do
     for Y := 0 to FastBitmap.Size.Y - 1 do begin
       TempIntfImage.Colors[X, Y] := TColorToFPColor(FastBitmap.Pixels[X, Y] * $010101);
     end;
   Bitmap.LoadFromIntfImage(TempIntfImage);
 finally
   TempIntfImage.Free;
 end;                           

end;</delphi>

TBGRABitmap.ScanLine

Using TBitmap.ScanLine was method used frequently on Delphi. But TBitmap.ScanLine is not supported by LCL. ScanLine property give access to memory starting point for each row raw data. Then direct manipulation with pixels is much faster than using Pixels property as no additional events is fired. But pixel data is dependent on pixel format, row byte alignment and other parameters.

There is graphic library BGRABitmap which allow access to scan lines. Overall speed of this method is pretty good. Drawing is done directly to Canvas of some TWinControl components like TForm of TPaintBox.

<delphi>uses

 ..., BGRABitmap, BGRABitmapTypes;

procedure FastBitmapToCanvas(FastBitmap: TFastBitmap; Canvas: TCanvas); var

 X, Y: Integer;
 P: PInteger; // PBGRAPixel;

begin

 with FastBitmap do
 for Y := 0 to Size.Y - 1 do begin
   P := PInteger(BGRABitmap.ScanLine[Y]);
   for X := 0 to Size.X - 1 do begin
     R^  := (Pixels[X, Y] * $010101) or $ff000000;
     // P^.Red := Pixels[X, Y];
     // P^.Green := Pixels[X, Y];
     // P^.Blue := Pixels[X, Y];
     // P^.Alpha := 255;
     Inc(P);
   end;
 end;
 BGRABitmap.InvalidateBitmap; // Changed by direct access
 BGRABitmap.Draw(Canvas, 0, 0, False);

end;</delphi>

BGRABitmap tutorial shows how to access directly to pixels

TBitmap.RawImage

This method is so far fastest in comparing to previous ones but more complicated as special care have to be given to bitmap data structure. Example assume that bitmap PixelFormat is pf24bit. Accessed raw data may differs across platforms.

<delphi>uses

 ..., GraphType;

function FastBitmapToBitmap(FastBitmap: TFastBitmap; Bitmap: TBitmap); var

 X, Y: Integer;
 PixelPtr: PInteger;
 PixelRowPtr: PInteger;
 P: TPixelFormat;
 RawImage: TRawImage;
 BytePerPixel: Integer;

begin

 try
   Bitmap.BeginUpdate(False);
   RawImage := Bitmap.RawImage;
   PixelRowPtr := PInteger(RawImage.Data);
   BytePerPixel := RawImage.Description.BitsPerPixel div 8;
   for Y := 0 to Size.Y - 1 do begin
     PixelPtr := PixelRowPtr;
     for X := 0 to Size.X - 1 do begin
       PixelPtr^ := Pixels[X, Y] * $010101;
       Inc(PByte(PixelPtr), BytePerPixel);
     end;
     Inc(PByte(PixelRowPtr), RawImage.Description.BytesPerLine);
   end;
 finally
   Bitmap.EndUpdate(False);
 end;  

end;</delphi>

OpenGL

OpenGL is mainly used for 3D complex modeling but it can be used for simple 2D accelerated graphics. We need initialized OpenGL, create one textured rectangle with screen resolution and set orthogonal view. Then we will able to fill texture by our custom converted bitmap data. But this method is not significantly faster then RawImage.Data method because all image data are copied with glTexImage2D function which mean slow copy using CPU.


<delphi>uses

 ..., GL, OpenGLContext;

var

 TextureId: GLuint;
 TextureData: Pointer; 
 OpenGLControl1: TOpenGLControl;

procedure InitGL; begin

 glMatrixMode(GL_PROJECTION);
 glLoadIdentity;
 glOrtho(0, OpenGLControl1.Width, OpenGLControl1.Height, 0, 0, 1);
 glMatrixMode(GL_MODELVIEW);
 glLoadIdentity();
 glDisable(GL_DEPTH_TEST);
 glViewport(0, 0, OpenGLControl1.Width, OpenGLControl1.Height);
 glGenTextures(1, @TextureId);
 glTexEnvf(GL_TEXTURE_ENV, GL_TEXTURE_ENV_MODE, GL_MODULATE); 

end;

function FastBitmapToBitmap(FastBitmap: TFastBitmap; OpenGLControl: TOpenGLControl); var

 X, Y: Integer;
 P: PInteger;
 R: PInteger;

const

 GL_CLAMP_TO_EDGE = $812F;

begin

 glClear(GL_COLOR_BUFFER_BIT or GL_DEPTH_BUFFER_BIT);
 P := OpenGLBitmap;
 with FastBitmap do
 for Y := 0 to Size.Y - 1 do begin
   R := P;
   for X := 0 to Size.X - 1 do begin
     R^  := (Pixels[X, Y] * $010101) or $ff000000;
     Inc(R);
   end;
   Inc(P, Size.X);
 end;
 glLoadIdentity;
 glTranslatef(-OpenGLControl.Width div 2, -OpenGLControl.Height div 2, 0.0);
 glEnable(GL_TEXTURE_2D);
 glBindTexture(GL_TEXTURE_2D, TextureId);
   //glTexParameterf(GL_TEXTURE_2D, GL_TEXTURE_WRAP_S, GL_CLAMP_TO_EDGE);
   //glTexParameterf(GL_TEXTURE_2D, GL_TEXTURE_WRAP_T, GL_CLAMP_TO_EDGE);
   glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);
   glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
   glTexImage2D(GL_TEXTURE_2D, 0, 4, OpenGLControl.Width, OpenGLControl.Height,
     0, GL_RGBA, GL_UNSIGNED_BYTE, OpenGLBitmap);
 glBegin(GL_QUADS);
   glColor3ub(255, 255, 255);
   glTexCoord2f(0, 0);
   glVertex3f(0, 0, 0);
   glTexCoord2f(OpenGLControl.Width div 2, 0);
   glVertex3f(OpenGLControl.Width, 0, 0);
   glTexCoord2f(OpenGLControl.Width div 2, OpenGLControl.Height div 2);
   glVertex3f(OpenGLControl.Width, OpenGLControl.Height, 0);
   glTexCoord2f(0, OpenGLControl.Height div 2);
   glVertex3f(0, OpenGLControl.Height, 0);
 glEnd();
 OpenGLControl.SwapBuffers;

end;</delphi>

OpenGL PBO

This method use asynchronous DMA transfer to copy texture data thus CPU is free to do further computations. It also eliminate one additional copy operation which is done by glTexImage2D in previous method. Method require GL_ARB_pixel_buffer_object extension.

<delphi>uses

 ..., GL, OpenGLContext;

var

 TextureId: GLuint;
 TextureData: Pointer; 
 OpenGLControl1: TOpenGLControl;
 pboIds: array[0..1] of GLuint;

procedure InitGL; var

 DataSize: Integer;

begin

 glMatrixMode(GL_PROJECTION);
 glLoadIdentity;
 glOrtho(0, OpenGLControl1.Width, OpenGLControl1.Height, 0, 0, 1);
 glMatrixMode(GL_MODELVIEW);
 glLoadIdentity();
 glDisable(GL_DEPTH_TEST);
 glViewport(0, 0, OpenGLControl1.Width, OpenGLControl1.Height);
 glGenTextures(1, @TextureId);
 glTexEnvf(GL_TEXTURE_ENV, GL_TEXTURE_ENV_MODE, GL_MODULATE); 
 OpenGLControl1.MakeCurrent;
 DataSize := OpenGLControl1.Width * OpenGLControl1.Height * SizeOf(Integer);
 if Load_GL_ARB_vertex_buffer_object then begin
   glGenBuffersARB(2, @pboIds);
   glBindBufferARB(GL_PIXEL_PACK_BUFFER_ARB, pboIds[0]);
   glBufferDataARB(GL_PIXEL_PACK_BUFFER_ARB, DataSize, Pointer(0), GL_STREAM_READ_ARB);
   glBindBufferARB(GL_PIXEL_PACK_BUFFER_ARB, pboIds[1]);
   glBufferDataARB(GL_PIXEL_PACK_BUFFER_ARB, DataSize, Pointer(0), GL_STREAM_READ_ARB);
 end else raise Exception.Create('GL_ARB_pixel_buffer_object not supported');
 glEnable(GL_TEXTURE_2D);
 glBindTexture(GL_TEXTURE_2D, TextureId);
   //glTexParameterf(GL_TEXTURE_2D, GL_TEXTURE_WRAP_S, GL_CLAMP_TO_EDGE);
   //glTexParameterf(GL_TEXTURE_2D, GL_TEXTURE_WRAP_T, GL_CLAMP_TO_EDGE);
   glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);
   glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
   glTexImage2D(GL_TEXTURE_2D, 0, 4, OpenGLControl1.Width, OpenGLControl1.Height,
     0, GL_RGBA, GL_UNSIGNED_BYTE, OpenGLBitmap);

end;

function FastBitmapToBitmap(FastBitmap: TFastBitmap; OpenGLControl: TOpenGLControl); var

 X, Y: Integer;
 P: PInteger;
 R: PInteger;
 Ptr: ^GLubyte;
 TextureShift: TPoint;
 TextureShift2: TPoint;

const

 GL_CLAMP_TO_EDGE = $812F;

begin

 // "index" is used to read pixels from framebuffer to a PBO
 // "nextIndex" is used to update pixels in the other PBO
 Index := (Index + 1) mod 2;
 NextIndex := (Index + 1) mod 2;
 glLoadIdentity;
 // bind the texture and PBO
 glBindTexture(GL_TEXTURE_2D, TextureId);
 glBindBufferARB(GL_PIXEL_UNPACK_BUFFER_ARB, pboIds[index]);
 // copy pixels from PBO to texture object
 // Use offset instead of ponter.
 glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, OpenGLControl.Width, OpenGLControl.Height,
   GL_BGRA, GL_UNSIGNED_BYTE, Pointer(0));


 // bind PBO to update texture source
 glBindBufferARB(GL_PIXEL_UNPACK_BUFFER_ARB, pboIds[nextIndex]);
 // Note that glMapBufferARB() causes sync issue.
 // If GPU is working with this buffer, glMapBufferARB() will wait(stall)
 // until GPU to finish its job. To avoid waiting (idle), you can call
 // first glBufferDataARB() with NULL pointer before glMapBufferARB().
 // If you do that, the previous data in PBO will be discarded and
 // glMapBufferARB() returns a new allocated pointer immediately
 // even if GPU is still working with the previous data.
 glBufferDataARB(GL_PIXEL_UNPACK_BUFFER_ARB, OpenGLControl.Width * OpenGLControl.Height * SizeOf(Integer), Pointer(0), GL_STREAM_DRAW_ARB);
 // map the buffer object into client's memory
 ptr := glMapBufferARB(GL_PIXEL_UNPACK_BUFFER_ARB, GL_WRITE_ONLY_ARB);
 if Assigned(ptr) then begin
   // update data directly on the mapped buffer
   P := PInteger(Ptr);
   with FastBitmap do
   for Y := 0 to Size.Y - 2 do begin
     R := P;
     for X := 0 to Size.X - 1 do begin
       R^ := NoSwapBRComponent(Pixels[X, Y]) or $ff000000;
       Inc(R);
     end;
     Inc(P, Size.X);
   end;
   glUnmapBufferARB(GL_PIXEL_PACK_BUFFER_ARB);
 end;
 // it is good idea to release PBOs with ID 0 after use.
 // Once bound with 0, all pixel operations are back to normal ways.
 glBindBufferARB(GL_PIXEL_UNPACK_BUFFER_ARB, 0);
 glClear(GL_COLOR_BUFFER_BIT or GL_DEPTH_BUFFER_BIT);
 glTranslatef(-OpenGLControl.Width / 2, -OpenGLControl.Height / 2, 0.0);
 glBindTexture(GL_TEXTURE_2D, TextureId);
 glBegin(GL_QUADS);
   glColor3ub(255, 255, 255);
   glTexCoord2f(0, 0);
   glVertex3f(0, TextureShift2.Y, 0);
   glTexCoord2f(OpenGLControl.Width div 2, 0);
   glVertex3f(OpenGLControl.Width, 0, 0);
   glTexCoord2f(OpenGLControl.Width div 2, OpenGLControl.Height div 2);
   glVertex3f(OpenGLControl.Width, OpenGLControl.Height, 0);
   glTexCoord2f(0, OpenGLControl.Height div 2);
   glVertex3f(0, OpenGLControl.Height, 0);
 glEnd();                   
 OpenGLControl.SwapBuffers;

end;</delphi>

Speed comparison

This table shows only raw benchmark results which are dependent on used computer hardware and operating system.

Method Frame duration [ms]
TBitmap.Canvas.Pixels 650
TBitmap.Canvas.Pixels with BeginUpdate and EndUpdate 120
TLazIntfImage  12
TBGRABitmap.ScanLine 2.9
TBitmap.RawImage 1.1
OpenGL 2.2
OpenGL PBO 1.6

See also