Unicode use cases

From Lazarus wiki
Revision as of 13:20, 23 November 2008 by Sekelsenmat (talk | contribs)

Introduction

There is currently a lot of interest in implementing full Unicode support in fpc. This page is intended to describe situations in which developers face problems when dealing with Unicode characters and strings. To keep the information useful, each description should be as detailed as possible and include real code or examples when available.

Cases

Sqlite library requires filename to be encoded as UTF-8

The sqlite3 wrapper class provided by fpc (TSqlite3Dataset) stores the FileName property in a String (ansistring) and uses it to open the database through a sqlite function (sqlite3_open) that expects a UTF-8 encoded string. This works fine as long as the string is UTF-8 encoded or contains only ASCII characters. The problem is that the encoding varies with the source of the string: the LCL and the *nix rtl return UTF-8, while the Win32 rtl returns the current locale encoding. Some workarounds were tried:

  • Call UTF8Encode inside the FileName property setter
    This works when the string is not UTF-8. When the string is already encoded in UTF-8, UTF8Encode will corrupt it. Since there is no clean way to guess the encoding, this option is not viable.
  • Call UTF8Encode (or not) on the source string before setting the FileName property
    This handles the "strings coming from the LCL" case, since I know those are always UTF-8. But a string returned by an rtl function like GetAppConfigDir can lead to problems: on win32 systems with accented characters in the returned path, calling UTF8Encode is necessary, while on *nix systems it is unnecessary and dangerous.
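The corruption mentioned in the first workaround can be reproduced with a small sketch. It assumes the already-UTF-8 bytes get interpreted as single-byte characters before being encoded again. BytesAsLatin1 and DumpBytes are hypothetical helpers written for this example; BytesAsLatin1 stands in for the implicit ansistring to widestring conversion, whose exact behaviour depends on the platform and the installed widestring manager.

```pascal
program DoubleEncodeDemo;
{$mode objfpc}{$H+}

{ Hypothetical helper: widens every byte to one WideChar, which is what
  interpreting the bytes as Latin-1 amounts to. It stands in for the
  implicit locale conversion so the demo does not depend on the locale. }
function BytesAsLatin1(const S: AnsiString): WideString;
var
  i: Integer;
begin
  SetLength(Result, Length(S));
  for i := 1 to Length(S) do
    Result[i] := WideChar(Ord(S[i]));
end;

{ Hypothetical helper: prints the bytes of S in hexadecimal. }
procedure DumpBytes(const S: AnsiString);
var
  i: Integer;
begin
  for i := 1 to Length(S) do
    Write(HexStr(Ord(S[i]), 2), ' ');
  WriteLn;
end;

var
  Utf8Name, EncodedTwice: AnsiString;
begin
  { 'é' (U+00E9) is the two bytes C3 A9 in UTF-8. }
  Utf8Name := 'caf'#$C3#$A9'.db';

  { Treating C3 and A9 as the characters U+00C3 and U+00A9 and encoding
    them again yields C3 83 C2 A9, which is no longer 'é'. }
  EncodedTwice := UTF8Encode(BytesAsLatin1(Utf8Name));

  DumpBytes(Utf8Name);
  DumpBytes(EncodedTwice);
end.
```

The second dump should be two bytes longer than the first, and sqlite3_open would then look for a file whose name contains "Ã©" instead of "é".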

So, in this case, AFAIK, there's no way to write a cross-platform solution without using defines.
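For illustration, a define-based workaround might look like the sketch below. RtlPathToUTF8 is a hypothetical helper, not part of fpc or TSqlite3Dataset, and 'data.db' is a made-up file name. The sketch assumes the behaviour described above (the Win32 rtl returns the locale encoding, the *nix rtl returns UTF-8) and plain ansistrings that carry no encoding information of their own.

```pascal
program RtlPathDemo;
{$mode objfpc}{$H+}

uses
  SysUtils;

{ Hypothetical helper: returns a path obtained from the rtl as UTF-8,
  relying on the per-platform behaviour described on this page. }
function RtlPathToUTF8(const Path: AnsiString): AnsiString;
begin
  {$IFDEF WINDOWS}
  { Win32 rtl: the path is in the current locale encoding, so convert it. }
  Result := UTF8Encode(Path);
  {$ELSE}
  { *nix rtl: the path is assumed to be UTF-8 already; encoding it again
    would corrupt it. }
  Result := Path;
  {$ENDIF}
end;

var
  DbFile: AnsiString;
begin
  { Build the complete path first, then convert it once. }
  DbFile := RtlPathToUTF8(
    IncludeTrailingPathDelimiter(GetAppConfigDir(False)) + 'data.db');

  { DbFile is what would be assigned to the FileName property. }
  WriteLn(DbFile);
end.
```

Strings coming from the LCL must not go through this helper, since they are already UTF-8 on every platform; the caller still has to know where each string came from.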

See Also