UTF8 Tools

From Lazarus wiki
Revision as of 00:50, 1 March 2015 by Alextp (talk | contribs) (moved Theodp to UTF8 Tools: name UTF8tools is from download name of file. old name strange.)

About

This code processes Unicode text and reports, for each Unicode character:

  • whether it is a letter
  • whether it is a digit
  • whether it is upper-case or lower-case
  • whether it is white space
  • whether it is punctuation
  • etc.

It also includes a class for reading and writing Unicode text from and to a TStream.

UTF-8 Tools

Purpose

Some tools for common problems with UTF-8 / Unicode.


  • charencstreams.pas: Load and save text from / to almost any text source, such as
    • ANSI, UTF-8, UTF-16, UTF-32
    • big or little endian
    • BOM or no BOM

Simple demo:

 fCES := TCharEncStream.Create;
 try
   fCES.LoadFromFile(OpenDialog1.FileName);
   Memo1.Text := fCES.UTF8Text;
 finally
   fCES.Free; // release the stream even if loading raises an exception
 end;
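
Text in the encodings listed above is commonly told apart by the byte order mark (BOM) at the start of the data. This page does not show how charencstreams.pas does this internally, so the following is only a minimal, self-contained sketch of BOM detection. The names TBomKind and DetectBom are hypothetical and are not part of utf8tools. Data without a BOM would need further heuristics, which the sketch does not attempt.

```pascal
program BomSketch;
{$mode objfpc}{$H+}

type
  // Hypothetical enumeration for this sketch; not taken from utf8tools.
  TBomKind = (bkNone, bkUTF8, bkUTF16LE, bkUTF16BE, bkUTF32LE, bkUTF32BE);

// Inspect the leading bytes of Data and report which BOM, if any, they form.
function DetectBom(const Data: RawByteString): TBomKind;
begin
  Result := bkNone;
  // UTF-32 is tested before UTF-16 because FF FE 00 00 also starts
  // with the UTF-16 little-endian mark FF FE.
  if (Length(Data) >= 4) and (Data[1] = #$FF) and (Data[2] = #$FE) and
     (Data[3] = #0) and (Data[4] = #0) then
    Result := bkUTF32LE
  else if (Length(Data) >= 4) and (Data[1] = #0) and (Data[2] = #0) and
     (Data[3] = #$FE) and (Data[4] = #$FF) then
    Result := bkUTF32BE
  else if (Length(Data) >= 3) and (Data[1] = #$EF) and (Data[2] = #$BB) and
     (Data[3] = #$BF) then
    Result := bkUTF8
  else if (Length(Data) >= 2) and (Data[1] = #$FF) and (Data[2] = #$FE) then
    Result := bkUTF16LE
  else if (Length(Data) >= 2) and (Data[1] = #$FE) and (Data[2] = #$FF) then
    Result := bkUTF16BE;
end;

var
  Sample: RawByteString;
begin
  // EF BB BF is the UTF-8 BOM, followed by three ASCII bytes.
  Sample := #$EF#$BB#$BF + 'abc';
  if DetectBom(Sample) = bkUTF8 then
    WriteLn('UTF-8 BOM found');
  if DetectBom('abc') = bkNone then
    WriteLn('No BOM found');
end.
```

One caveat of this approach: a UTF-16 little-endian text whose first character is U+0000 is indistinguishable from a UTF-32 little-endian BOM, so a BOM check is a heuristic rather than a proof of the encoding.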


  • character.pas: Get information about code points using the TCharacter class.

Demo

if TCharacter.IsLetter(s[i]) then s[i] := TCharacter.toLower(s[i]);


  • utf8scanner.pas: Access UTF-8 strings by code point index, use case statements on UTF-8 strings, and more.

Index demo

 s := TUTF8Scanner.Create(Memo1.Text);
 for i := 1 to s.Length do
   if TCharacter.IsLetter(s[i]) then s[i] := TCharacter.toLower(s[i]);
 Memo1.Text := s.UTF8String;
 s.Free;
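
Indexing by code point differs from indexing by byte because a UTF-8 code point occupies one to four bytes. This page does not show how TUTF8Scanner implements its index, so the following is only a minimal sketch of the idea using plain byte strings, assuming well-formed UTF-8 input. The names Utf8SeqLen, Utf8CodePointCount and Utf8CodePointAt are hypothetical and are not part of utf8tools.

```pascal
program Utf8IndexSketch;
{$mode objfpc}{$H+}

// Number of bytes in the sequence that starts with Lead (valid UTF-8 assumed).
function Utf8SeqLen(Lead: Byte): Integer;
begin
  if Lead < $80 then Result := 1        // 0xxxxxxx: ASCII
  else if Lead >= $F0 then Result := 4  // 11110xxx
  else if Lead >= $E0 then Result := 3  // 1110xxxx
  else if Lead >= $C0 then Result := 2  // 110xxxxx
  else Result := 1;                     // stray continuation byte: step over it
end;

// Count code points by walking the string one sequence at a time.
function Utf8CodePointCount(const S: RawByteString): Integer;
var
  BytePos: Integer;
begin
  Result := 0;
  BytePos := 1;
  while BytePos <= Length(S) do
  begin
    Inc(BytePos, Utf8SeqLen(Ord(S[BytePos])));
    Inc(Result);
  end;
end;

// Return the bytes of the code point at the 1-based position Index,
// or an empty string if Index is out of range.
function Utf8CodePointAt(const S: RawByteString; Index: Integer): RawByteString;
var
  BytePos, Current: Integer;
begin
  Result := '';
  BytePos := 1;
  Current := 1;
  while BytePos <= Length(S) do
  begin
    if Current = Index then
      Exit(Copy(S, BytePos, Utf8SeqLen(Ord(S[BytePos]))));
    Inc(BytePos, Utf8SeqLen(Ord(S[BytePos])));
    Inc(Current);
  end;
end;

var
  Sample: RawByteString;
begin
  // 'a', U+00F6 (bytes C3 B6), 'b': four bytes but three code points.
  Sample := 'a' + #$C3#$B6 + 'b';
  WriteLn('Bytes: ', Length(Sample));
  WriteLn('Code points: ', Utf8CodePointCount(Sample));
  WriteLn('Bytes in code point 2: ', Length(Utf8CodePointAt(Sample, 2)));
end.
```

Each lookup in this sketch walks the string from the start, so a loop over all indexes costs quadratic time. A scanner class can avoid that by remembering byte offsets, which is presumably one reason to wrap the string in an object.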

Case demo

 s := TUTF8Scanner.Create(Memo1.text);
 s.FindChars := 'öäü';
 repeat
   case s.FindIndex(s.Next) of
 {ö} 0: s.Replace('oe');
 {ä} 1: s.Replace('ae');
 {ü} 2: s.Replace('ue');
   end;
 until s.Done;
 Memo1.Text := s.UTF8String;
 s.free;

Download

Download utf8tools.zip