UTF8 Tools
From Lazarus wiki
About
This code processes Unicode text and reports, for each Unicode character, whether it is:
- a letter
- a digit
- upper case or lower case
- white space
- punctuation
- etc.
It also provides a class for reading and writing Unicode text from and to a TStream.
UTF-8 Tools
Purpose
Some tools for common problems with UTF-8 / Unicode.
- charencstreams.pas: Load and save data from/to almost any text source, such as:
  - ANSI, UTF-8, UTF-16, UTF-32
  - big or little endian
  - with or without BOM
Simple demo:
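fCES := TCharEncStream.Create;
try
  // Load the selected file and show its content as UTF-8 text.
  fCES.LoadFromFile(OpenDialog1.FileName);
  Memo1.Text := fCES.UTF8Text;
finally
  fCES.Free;  // released even if loading raises an exception
end;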
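To picture what "BOM or no BOM" involves, here is a minimal standalone sketch of byte order mark sniffing on a TStream. It is not taken from charencstreams.pas: the names TBomKind and DetectBom are hypothetical and exist only for this illustration. The byte patterns themselves are the standard Unicode BOM signatures.

```pascal
program BomSniff;
{$mode objfpc}{$H+}

uses
  Classes;

type
  // Hypothetical enumeration for this sketch; not a type from charencstreams.pas.
  TBomKind = (bkNone, bkUTF8, bkUTF16LE, bkUTF16BE, bkUTF32LE, bkUTF32BE);

// Looks at up to the first four bytes of the stream, then restores the position.
function DetectBom(AStream: TStream): TBomKind;
var
  Buf: array[0..3] of Byte;
  N: Integer;
  StartPos: Int64;
begin
  Result := bkNone;
  StartPos := AStream.Position;
  FillChar(Buf, SizeOf(Buf), 0);
  N := AStream.Read(Buf, SizeOf(Buf));
  AStream.Position := StartPos;

  // UTF-32 LE is tested before UTF-16 LE because FF FE 00 00 also starts with FF FE.
  // (UTF-16 LE text whose first character is U+0000 would be misread here.)
  if (N >= 4) and (Buf[0] = $FF) and (Buf[1] = $FE) and (Buf[2] = 0) and (Buf[3] = 0) then
    Result := bkUTF32LE
  else if (N >= 4) and (Buf[0] = 0) and (Buf[1] = 0) and (Buf[2] = $FE) and (Buf[3] = $FF) then
    Result := bkUTF32BE
  else if (N >= 3) and (Buf[0] = $EF) and (Buf[1] = $BB) and (Buf[2] = $BF) then
    Result := bkUTF8
  else if (N >= 2) and (Buf[0] = $FF) and (Buf[1] = $FE) then
    Result := bkUTF16LE
  else if (N >= 2) and (Buf[0] = $FE) and (Buf[1] = $FF) then
    Result := bkUTF16BE;
end;

const
  Utf8Bom: array[0..2] of Byte = ($EF, $BB, $BF);
var
  MS: TMemoryStream;
begin
  MS := TMemoryStream.Create;
  try
    // Write a UTF-8 BOM into an in-memory stream and sniff it.
    MS.WriteBuffer(Utf8Bom, SizeOf(Utf8Bom));
    MS.Position := 0;
    if DetectBom(MS) = bkUTF8 then
      WriteLn('UTF-8 BOM found')
    else
      WriteLn('no UTF-8 BOM');
  finally
    MS.Free;
  end;
end.
```

When no BOM is present, a loader has to fall back on heuristics or a caller-supplied default, which this sketch does not attempt.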
- character.pas: Get information about code points using the TCharacter class.
Demo
if TCharacter.IsLetter(s[i]) then s[i] := TCharacter.toLower(s[i]);
- utf8scanner.pas: Access UTF-8 strings by code point index, use case statements on UTF-8 strings, and more.
Index demo
s := TUTF8Scanner.Create(Memo1.Text);
try
  // Walk the text by code point index and lower-case every letter.
  for i := 1 to s.Length do
    if TCharacter.IsLetter(s[i]) then
      s[i] := TCharacter.ToLower(s[i]);
  Memo1.Text := s.UTF8String;
finally
  s.Free;
end;
Case demo
s := TUTF8Scanner.Create(Memo1.Text);
try
  // FindIndex returns the position of the current character within FindChars.
  s.FindChars := 'öäü';
  repeat
    case s.FindIndex(s.Next) of
      {ö} 0: s.Replace('oe');
      {ä} 1: s.Replace('ae');
      {ü} 2: s.Replace('ue');
    end;
  until s.Done;
  Memo1.Text := s.UTF8String;
finally
  s.Free;
end;
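Why index access needs a helper at all: in a UTF-8 string, the n-th character is generally not the n-th byte. The following standalone sketch shows one simple way to find a code point by index by walking the lead bytes. It is an illustration only and makes no claim about how TUTF8Scanner is implemented. Utf8SeqLen and CodePointAt are hypothetical names, and the sketch assumes well-formed UTF-8 input.

```pascal
program CodePointIndex;
{$mode objfpc}{$H+}

// Returns the byte length of the UTF-8 sequence that starts with lead byte B,
// or 1 for an unexpected byte so that scanning always makes progress.
function Utf8SeqLen(B: Byte): Integer;
begin
  if B < $80 then
    Result := 1
  else if (B and $E0) = $C0 then
    Result := 2
  else if (B and $F0) = $E0 then
    Result := 3
  else if (B and $F8) = $F0 then
    Result := 4
  else
    Result := 1;
end;

// Returns the AIndex-th code point (1-based) of S as a UTF-8 substring,
// or an empty string if AIndex is out of range.
function CodePointAt(const S: string; AIndex: Integer): string;
var
  BytePos, Count, Len: Integer;
begin
  Result := '';
  BytePos := 1;
  Count := 0;
  while BytePos <= Length(S) do
  begin
    Len := Utf8SeqLen(Ord(S[BytePos]));
    Inc(Count);
    if Count = AIndex then
    begin
      Result := Copy(S, BytePos, Len);
      Exit;
    end;
    Inc(BytePos, Len);
  end;
end;

const
  // 'Grüße' as explicit UTF-8 bytes, so the source file encoding does not matter.
  Bytes: array[0..6] of Byte = (Ord('G'), Ord('r'), $C3, $BC, $C3, $9F, Ord('e'));
var
  S: string;
begin
  SetString(S, PAnsiChar(@Bytes[0]), Length(Bytes));
  WriteLn(Length(S));                  // 7: Length counts bytes, not characters
  WriteLn(Length(CodePointAt(S, 3)));  // 2: the third code point, 'ü', takes two bytes
  WriteLn(CodePointAt(S, 5));          // e: the fifth code point sits at byte 7
end.
```

Each lookup in this sketch rescans from the start of the string, so it costs time proportional to the index. A scanner that keeps its current position can avoid that when moving through the text sequentially.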