UTF8 string class
Latest revision as of 17:36, 23 January 2022
What is TbUtf8?
The TbUtf8 library makes it easy to manipulate UTF-8 strings.
Problem
In Lazarus (Free Pascal), strings are UTF-8 encoded. However, the "String" type is nothing more than a dynamic byte array: Length returns the number of bytes in the array, not the number of characters. In UTF-8, a single character can be up to 4 bytes long, and even 7 bytes when combining characters are involved. An example illustrates this: 'Thomas' has 6 characters and is 6 bytes in size, whereas 'Thömäs' has 6 characters but is 8 bytes in size.
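The byte/character mismatch described above can be reproduced with plain Free Pascal, without TbUtf8. The following is only a minimal sketch: the helper CountCodePoints is a hypothetical name introduced for illustration, and it counts Unicode code points (by skipping UTF-8 continuation bytes), so a base letter followed by a combining mark would still count as two. The sample string is written as explicit bytes so that the result does not depend on the encoding of the source file.

```pascal
program Utf8LengthDemo;
{$mode objfpc}{$H+}

// Hypothetical helper (not part of TbUtf8): counts code points by
// skipping UTF-8 continuation bytes, which have the bit pattern 10xxxxxx.
function CountCodePoints(const S: String): Integer;
var
  i: Integer;
begin
  Result := 0;
  for i := 1 to Length(S) do
    if (Ord(S[i]) and $C0) <> $80 then
      Inc(Result);
end;

var
  Sample: String;
begin
  // 'Thömäs' as explicit UTF-8 bytes: ö = C3 B6, ä = C3 A4
  Sample := 'Th'#$C3#$B6'm'#$C3#$A4's';
  WriteLn('Bytes:       ', Length(Sample));
  WriteLn('Code points: ', CountCodePoints(Sample));
end.
```

Under these assumptions the program should report 8 bytes and 6 code points, matching the figures given for 'Thömäs'.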
Solution
With TbUtf8 you can easily modify and search UTF-8 strings that contain special and combining characters, such as "üäößẶặǺǻǼǽǞǟǍǎḂḃÞþÇçĆćĊċ...". Essentially, the library consists of a single UTF-8 string class (TIbUtf8).
Benefits
- TIbUtf8 descends from TInterfacedObject, so an instance held through an interface reference is reference counted and does not need to be released with Free.
- All indexes are character based.
- All returned characters are of type String.
- Returns the number of characters in the string.
- Returns the number of bytes in the string.
- Delete characters or character groups.
- Insertion of characters and character groups.
- Appending characters and character groups.
- Reading / writing of characters and character groups.
- Read from file / write to a file.
- Read from stream / write to a stream.
Disadvantages
- Since UTF-8 does not have a constant offset from character to character, searching for characters is much more complex. Iterating over the characters is about 20 times slower than with a plain String. (Comfort has its price.)
- Slightly more memory is required.
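To illustrate why character-based indexing costs more than byte indexing, the sketch below locates the byte position of the n-th character by scanning from the start of the string. It is not how TbUtf8 is implemented (the source does not say); ByteOffsetOfChar is a hypothetical helper written only to show that, without a constant offset per character, a lookup requires a linear walk over the bytes.

```pascal
program Utf8IndexDemo;
{$mode objfpc}{$H+}

// Hypothetical helper (not part of TbUtf8): returns the 1-based byte
// position at which the CharIndex-th code point starts, or 0 if the
// string contains fewer code points.
function ByteOffsetOfChar(const S: String; CharIndex: Integer): Integer;
var
  i, Seen: Integer;
begin
  Result := 0;
  Seen := 0;
  for i := 1 to Length(S) do
    // Every byte that is not a continuation byte (10xxxxxx) starts a code point.
    if (Ord(S[i]) and $C0) <> $80 then
    begin
      Inc(Seen);
      if Seen = CharIndex then
        Exit(i);
    end;
end;

var
  Sample: String;
begin
  // 'Thömäs' as explicit UTF-8 bytes: ö = C3 B6, ä = C3 A4
  Sample := 'Th'#$C3#$B6'm'#$C3#$A4's';
  WriteLn('Character 4 starts at byte ', ByteOffsetOfChar(Sample, 4));
  WriteLn('Character 6 starts at byte ', ByteOffsetOfChar(Sample, 6));
end.
```

With a fixed-width encoding the position would be a simple multiplication; here each lookup has to inspect every preceding byte, which is the kind of overhead the slowdown mentioned above refers to.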
Example
procedure Demo01;
var
  u: IbUtf8;
  i: Integer;
begin
  // Held as an interface reference, so no call to Free is needed.
  u := TIbUtf8.Create('Thömäß');
  // Indexes are character based, not byte based.
  for i := 1 to u.NumberOfChars do begin
    case u.Chars[i] of
      'ö': u.Chars[i] := 'o';
      'ä': u.Chars[i] := 'a';
      'ß': u.Chars[i] := 's';
    end;
  end;
  if u.Text = 'Thomas' then begin
    WriteLn('That''s right!');
  end;
end;
Download
- GitLab FpTuxe/TbUtf8 repository
FpTuxe/TbUtf8
- Git clone
git clone https://gitlab.com/FpTuxe/tbutf8.git
Installation
- Variant 1
- Start Lazarus and open your project.
- Lazarus->File->Open your workspace/tbutf8/src/tb_utf8.pas
- Lazarus->Project->Add Editor File to Project
- Variant 2
- Start Lazarus and open your project.
- Lazarus->Package->Open Package File (.lpk) your workspace/tbutf8/src/tbutf8.lpk
- Now, click Use->Add to Project
- Close the Package window.
Functional Description
The functional description can be found in the project folder "doc/".