UTF8 Tools

From Lazarus wiki

Revision as of 01:47, 1 March 2015

About

This code processes Unicode text (in UTF-8 or UTF-32) and can, for example, determine whether a Unicode character is:

  • a letter
  • a digit
  • upper-case or lower-case
  • white space
  • punctuation
  • etc.

It also provides a class to read and write Unicode text from and to a TStream.
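Converting a UTF-32 code point to UTF-8 follows a fixed bit-packing scheme. The following is a minimal Free Pascal sketch of that mapping, not code taken from utf8tools; the function name `CodePointToUTF8` is hypothetical.

```pascal
{$mode objfpc}{$H+}

{ Hypothetical helper (not part of utf8tools): encodes one UTF-32 code
  point as a UTF-8 byte sequence. Returns '' for surrogates
  (U+D800..U+DFFF) and for values above U+10FFFF. }
function CodePointToUTF8(CP: LongWord): AnsiString;
begin
  if CP < $80 then                     // 1 byte:  0xxxxxxx
    Result := Chr(CP)
  else if CP < $800 then               // 2 bytes: 110xxxxx 10xxxxxx
    Result := Chr($C0 or (CP shr 6)) +
              Chr($80 or (CP and $3F))
  else if (CP >= $D800) and (CP <= $DFFF) then
    Result := ''                       // surrogates are not valid scalar values
  else if CP < $10000 then             // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
    Result := Chr($E0 or (CP shr 12)) +
              Chr($80 or ((CP shr 6) and $3F)) +
              Chr($80 or (CP and $3F))
  else if CP <= $10FFFF then           // 4 bytes: 11110xxx + 3 continuation bytes
    Result := Chr($F0 or (CP shr 18)) +
              Chr($80 or ((CP shr 12) and $3F)) +
              Chr($80 or ((CP shr 6) and $3F)) +
              Chr($80 or (CP and $3F))
  else
    Result := '';                      // outside the Unicode code space
end;
```

For example, U+00E9 (é) becomes the two bytes $C3 $A9, and U+20AC (€) becomes $E2 $82 $AC.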

UTF-8 Tools

Purpose

Some tools for common problems with UTF-8 / Unicode.


  • charencstreams.pas: Load and save text from and to almost any source format, including:
    • ANSI, UTF-8, UTF-16, UTF-32
    • big or little endian
    • with or without a BOM
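For files that start with a byte order mark, the encoding and endianness can be read from the first few bytes. The sketch below illustrates only that idea, assuming the start of the file is available as a byte array; `TBOMKind` and `DetectBOM` are hypothetical names, not part of the charencstreams API, whose own detection may work differently. Files without a BOM need other checks (for example, whether the bytes form valid UTF-8), which are not shown.

```pascal
{$mode objfpc}{$H+}

type
  { Hypothetical type for this sketch; not the TCharEncStream API. }
  TBOMKind = (bomNone, bomUTF8, bomUTF16LE, bomUTF16BE, bomUTF32LE, bomUTF32BE);

{ Hypothetical helper: reports which BOM, if any, the buffer starts with.
  UTF-32 LE must be tested before UTF-16 LE because its BOM (FF FE 00 00)
  begins with the UTF-16 LE BOM (FF FE). Relies on FPC's default
  short-circuit evaluation so B[i] is only read when N is large enough. }
function DetectBOM(const B: array of Byte): TBOMKind;
var
  N: Integer;
begin
  N := Length(B);
  if (N >= 4) and (B[0] = $FF) and (B[1] = $FE) and (B[2] = 0) and (B[3] = 0) then
    Result := bomUTF32LE
  else if (N >= 4) and (B[0] = 0) and (B[1] = 0) and (B[2] = $FE) and (B[3] = $FF) then
    Result := bomUTF32BE
  else if (N >= 3) and (B[0] = $EF) and (B[1] = $BB) and (B[2] = $BF) then
    Result := bomUTF8
  else if (N >= 2) and (B[0] = $FF) and (B[1] = $FE) then
    Result := bomUTF16LE
  else if (N >= 2) and (B[0] = $FE) and (B[1] = $FF) then
    Result := bomUTF16BE
  else
    Result := bomNone;   // no BOM: encoding must be guessed another way
end;
```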

Simple demo:

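 fCES := TCharEncStream.Create;
 try
   fCES.LoadFromFile(OpenDialog1.FileName);  // read the file in its source encoding
   Memo1.Text := fCES.UTF8Text;              // the content as UTF-8
 finally
   fCES.Free;                                // released even if loading fails
 end;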


  • character.pas: Get information about code points using the TCharacter class.

Demo

 if TCharacter.IsLetter(s[i]) then       // lower-case s[i] if it is a letter
   s[i] := TCharacter.ToLower(s[i]);


  • utf8scanner.pas: Access UTF-8 strings by code point index, use case statements on UTF-8 strings, and more.

Index demo

 s := TUTF8Scanner.Create(Memo1.Text);
 try
   for i := 1 to s.Length do             // i counts code points, not bytes
     if TCharacter.IsLetter(s[i]) then
       s[i] := TCharacter.ToLower(s[i]);
   Memo1.Text := s.UTF8String;           // write the modified text back
 finally
   s.Free;
 end;
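Because UTF-8 is variable-width, the n-th character cannot be found by addressing the n-th byte. The sketch below shows the underlying idea, independent of TUTF8Scanner (whose internals may differ): walk the string from the start and use the lead byte of each sequence to find its length. `UTF8CodePointCount` and `UTF8CodePointAt` are hypothetical helpers, and the code assumes well-formed UTF-8 without validating it.

```pascal
{$mode objfpc}{$H+}

{ Hypothetical helpers, not part of utf8tools. Continuation bytes have the
  bit pattern 10xxxxxx; every other byte starts a new code point. }
function IsLeadByte(B: Byte): Boolean;
begin
  Result := (B and $C0) <> $80;
end;

{ Number of code points in S (assumes well-formed UTF-8). }
function UTF8CodePointCount(const S: AnsiString): Integer;
var
  I: Integer;
begin
  Result := 0;
  for I := 1 to Length(S) do
    if IsLeadByte(Ord(S[I])) then
      Inc(Result);
end;

{ Code point with 1-based index N, or $FFFFFFFF if N is out of range.
  Scans from the start, so each call costs O(byte length of S). }
function UTF8CodePointAt(const S: AnsiString; N: Integer): LongWord;
var
  I, Count, Len, K: Integer;
  B: Byte;
begin
  Result := $FFFFFFFF;
  Count := 0;
  I := 1;
  while I <= Length(S) do
  begin
    B := Ord(S[I]);
    // Sequence length is encoded in the lead byte.
    if B < $80 then Len := 1
    else if B >= $F0 then Len := 4
    else if B >= $E0 then Len := 3
    else Len := 2;
    Inc(Count);
    if Count = N then
    begin
      // Keep the payload bits of the lead byte ...
      case Len of
        1: Result := B;
        2: Result := B and $1F;
        3: Result := B and $0F;
      else
        Result := B and $07;
      end;
      // ... then append 6 bits from each continuation byte.
      for K := 1 to Len - 1 do
        if I + K <= Length(S) then
          Result := (Result shl 6) or (Ord(S[I + K]) and $3F);
      Exit;
    end;
    Inc(I, Len);
  end;
end;
```

Since every lookup with such a helper rescans the string, a loop over all indices costs O(n²); keeping the current byte position between calls avoids that.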

Case demo

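 s := TUTF8Scanner.Create(Memo1.Text);
 try
   s.FindChars := 'öäü';                 // characters to look for
   repeat
     // FindIndex: 0-based position in FindChars of the character from s.Next
     case s.FindIndex(s.Next) of
       {ö} 0: s.Replace('oe');
       {ä} 1: s.Replace('ae');
       {ü} 2: s.Replace('ue');
     end;
   until s.Done;
   Memo1.Text := s.UTF8String;
 finally
   s.Free;
 end;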
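For a fixed replacement table like this one, a different technique also works without a scanner: byte-wise SysUtils.StringReplace on the UTF-8 string. This is a sketch, not part of utf8tools, and `ReplaceUmlauts` is a hypothetical name. It is safe because a complete UTF-8 sequence can only match at a character boundary (lead and continuation bytes never coincide). Like the demo above, it handles only the three lower-case umlauts, and only in their precomposed form.

```pascal
{$mode objfpc}{$H+}
uses
  SysUtils;

{ Hypothetical helper (not part of utf8tools): transliterates the
  precomposed lower-case umlauts in a UTF-8 string. The umlauts are
  written as byte escapes so the result does not depend on the
  code page of the source file. }
function ReplaceUmlauts(const S: AnsiString): AnsiString;
const
  UmlautO: AnsiString = #$C3#$B6;  // ö (U+00F6)
  UmlautA: AnsiString = #$C3#$A4;  // ä (U+00E4)
  UmlautU: AnsiString = #$C3#$BC;  // ü (U+00FC)
begin
  Result := StringReplace(S, UmlautO, 'oe', [rfReplaceAll]);
  Result := StringReplace(Result, UmlautA, 'ae', [rfReplaceAll]);
  Result := StringReplace(Result, UmlautU, 'ue', [rfReplaceAll]);
end;
```

The scanner-based version remains the more general tool when the decision depends on each character, for example with TCharacter tests as in the index demo.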

Download

Download utf8tools.zip