Difference between revisions of "UTF8 string class/en"

From Lazarus wiki

Revision as of 14:26, 23 January 2022

What is TbUtf8?

Under construction!!

The TbUtf8 library makes it easy to modify UTF-8 strings.

Problem

In Lazarus (Free Pascal), strings are UTF-8 encoded. However, the "String" type is nothing more than a dynamic byte array: Length returns the number of bytes in the array, not the number of characters. In UTF-8, a single character can occupy up to 4 bytes, and a character built from combining marks can take even more (7 bytes, for example). An example illustrates this: 'Thomas' has 6 characters and is 6 bytes in size, whereas 'Thömäs' also has 6 characters but is 8 bytes in size.
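
The mismatch between bytes and characters can be reproduced with plain Free Pascal, without TbUtf8. The following is only a minimal sketch: `CountCodePoints` is a hypothetical helper written for this illustration (it is not part of TbUtf8), and it counts code points, so a character composed with combining marks would still be counted more than once. The accented string is spelled out as explicit UTF-8 bytes so that the result does not depend on the encoding of the source file.

```pascal
program Utf8LengthDemo;

{$mode objfpc}{$H+}

// Hypothetical helper: counts code points by skipping UTF-8
// continuation bytes, which always have the bit pattern 10xxxxxx.
function CountCodePoints(const S: string): Integer;
var
  i: Integer;
begin
  Result := 0;
  for i := 1 to Length(S) do
    if (Ord(S[i]) and $C0) <> $80 then
      Inc(Result);
end;

var
  Plain, Accented: string;
begin
  Plain := 'Thomas';
  // 'Thömäs' as raw UTF-8 bytes: ö = C3 B6, ä = C3 A4
  Accented := 'Th'#$C3#$B6'm'#$C3#$A4's';

  // Length reports bytes; the helper reports code points.
  WriteLn(Length(Plain), ' bytes, ', CountCodePoints(Plain), ' code points');
  WriteLn(Length(Accented), ' bytes, ', CountCodePoints(Accented), ' code points');
end.
```

The first line reports 6 bytes and 6 code points, the second 8 bytes and 6 code points. In a Lazarus project, the `UTF8Length` function from the LazUTF8 unit returns the same code point count.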

Solution

With TbUtf8 you can easily modify and search UTF-8 strings that contain special and combined characters, such as "üäößẶặǺǻǼǽǞǟǍǎḂḃÞþÇçĆćĊċ...". Essentially, the library consists of a UTF-8 string class (TIbUtf8).