Difference between revisions of "UTF8 Tools"
From Lazarus wiki
Revision as of 00:48, 1 March 2015
About
This code processes Unicode text and can determine, for a given Unicode character:
- whether it is a letter
- whether it is a digit
- whether it is upper case or lower case
- whether it is white space
- whether it is punctuation
- etc.
It also provides a class for reading Unicode text from, and writing it to, a TStream.
UTF-8 Tools
Purpose
Some tools for common problems with UTF-8 / Unicode.
- charencstreams.pas: Load and save text from/to almost any text encoding, such as
- ANSI, UTF-8, UTF-16, UTF-32
- big or little endian
- with or without a BOM
Simple demo:
fCES := TCharEncStream.Create;
try
  fCES.LoadFromFile(OpenDialog1.FileName);
  Memo1.Text := fCES.UTF8Text;
finally
  fCES.Free; // release the stream even if loading raises an exception
end;
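How a loader can tell these encodings apart when a BOM is present can be sketched in plain Free Pascal. This is only an illustration of the general technique, not the code used by charencstreams.pas; the function name DetectBOM is hypothetical and does not exist in the package.

```pascal
program BomSketch;
{$mode objfpc}{$H+}

// Hypothetical helper (an assumption, not part of charencstreams.pas):
// names the encoding implied by a byte order mark at the start of Data,
// or returns '' when no BOM is found.
function DetectBOM(const Data: array of Byte): string;
var
  N: Integer;
begin
  Result := '';
  N := Length(Data);
  // Check the 4-byte UTF-32 marks first: the UTF-32 LE mark starts with
  // the same two bytes (FF FE) as the UTF-16 LE mark.
  if (N >= 4) and (Data[0] = $FF) and (Data[1] = $FE)
     and (Data[2] = 0) and (Data[3] = 0) then
    Result := 'UTF-32 LE'
  else if (N >= 4) and (Data[0] = 0) and (Data[1] = 0)
     and (Data[2] = $FE) and (Data[3] = $FF) then
    Result := 'UTF-32 BE'
  else if (N >= 3) and (Data[0] = $EF) and (Data[1] = $BB)
     and (Data[2] = $BF) then
    Result := 'UTF-8'
  else if (N >= 2) and (Data[0] = $FF) and (Data[1] = $FE) then
    Result := 'UTF-16 LE'
  else if (N >= 2) and (Data[0] = $FE) and (Data[1] = $FF) then
    Result := 'UTF-16 BE';
end;

begin
  WriteLn(DetectBOM([$EF, $BB, $BF, $41]));       // prints UTF-8
  WriteLn(DetectBOM([$FF, $FE, $41, $00]));       // prints UTF-16 LE
  WriteLn(DetectBOM([$00, $00, $FE, $FF]));       // prints UTF-32 BE
  WriteLn('[', DetectBOM([$41, $42]), ']');       // prints [] (no BOM)
end.
```

Data without a BOM cannot be classified this way; a loader then has to fall back on heuristics or on an encoding chosen by the caller. Note also that the bytes FF FE 00 00 are ambiguous in principle (UTF-32 LE, or UTF-16 LE followed by a NUL character); the sketch resolves this in favour of UTF-32 LE.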
- character.pas: Get information about code points using the TCharacter class.
Demo
if TCharacter.IsLetter(s[i]) then s[i] := TCharacter.toLower(s[i]);
- utf8scanner.pas: Access UTF-8 strings by code point index, use case statements on UTF-8 strings and more...
Index demo
s := TUTF8Scanner.Create(Memo1.text);
for i := 1 to s.Length do
  if TCharacter.IsLetter(s[i]) then
    s[i] := TCharacter.toLower(s[i]);
Memo1.Text := s.UTF8String;
s.free;
Case demo
s := TUTF8Scanner.Create(Memo1.text);
s.FindChars := 'öäü';
repeat
  case s.FindIndex(s.Next) of
    {ö} 0: s.Replace('oe');
    {ä} 1: s.Replace('ae');
    {ü} 2: s.Replace('ue');
  end;
until s.Done;
Memo1.Text := s.UTF8String;
s.free;
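Indexing by code point matters because a plain byte index into a UTF-8 string can land in the middle of a multi-byte character. The sketch below shows the underlying idea using only the RTL. It is not the TUTF8Scanner implementation; SeqLen, CodePointCount and CodePointAt are hypothetical names introduced for illustration, and the code assumes well-formed UTF-8 input.

```pascal
program CodePointIndexSketch;
{$mode objfpc}{$H+}

// Byte length of the UTF-8 sequence introduced by Lead.
// Invalid lead bytes are treated as length 1 so scanning always advances.
function SeqLen(Lead: Byte): Integer;
begin
  if Lead < $80 then Result := 1
  else if (Lead and $E0) = $C0 then Result := 2
  else if (Lead and $F0) = $E0 then Result := 3
  else if (Lead and $F8) = $F0 then Result := 4
  else Result := 1;
end;

// Number of code points in the UTF-8 encoded string S.
function CodePointCount(const S: AnsiString): Integer;
var
  P: Integer;
begin
  Result := 0;
  P := 1;
  while P <= Length(S) do
  begin
    Inc(P, SeqLen(Ord(S[P])));
    Inc(Result);
  end;
end;

// Bytes of the Index-th code point (1-based), or '' if Index is out of range.
function CodePointAt(const S: AnsiString; Index: Integer): AnsiString;
var
  P, N: Integer;
begin
  Result := '';
  P := 1;
  N := 1;
  while P <= Length(S) do
  begin
    if N = Index then
    begin
      Result := Copy(S, P, SeqLen(Ord(S[P])));
      Exit;
    end;
    Inc(P, SeqLen(Ord(S[P])));
    Inc(N);
  end;
end;

var
  S: AnsiString;
begin
  // 'a', then U+00F6 (o with diaeresis) as the UTF-8 bytes C3 B6, then 'b'.
  S := 'a'#$C3#$B6'b';
  WriteLn(Length(S));                  // prints 4 (bytes)
  WriteLn(CodePointCount(S));          // prints 3 (code points)
  WriteLn(Length(CodePointAt(S, 2)));  // prints 2 (second code point is two bytes)
  WriteLn(CodePointAt(S, 3));          // prints b
end.
```

Each lookup in this sketch walks the string from the start, so it costs time proportional to the index. A scanner class can avoid that by remembering its current position between calls.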