Talk:LCL Unicode Support

From Lazarus wiki
Revision as of 11:09, 2 August 2011 by BigChimp (talk | contribs) (Formatting)

Questions July 2011

1. Widestrings and Ansistrings section: nowhere does it say what a widestring is, but it implies a UTF16 encoded Unicode string. Is that correct (for all platforms)? If so, I'll add it to that section.

2. FPC is not Unicode aware section: according to the article, the FCL and RTL are not Unicode aware. But obviously the string type IS Unicode aware (and, based on 1, so is the Widestring type). Is that correct? Is a string always UTF8 encoded, or does it depend on compiler switches? I can imagine that the shortstring compiler switch (forgot the name) leads to ANSI encoded strings...

3. Instructions for users section: "Usually the encoding is per-library (e.g. a dynamic library dll or a lazarus package)". Maybe a stupid question, but how can you tell whether a Lazarus component or package supports unicode? Go hunting through the source code for comments?

--BigChimp 14:39, 23 July 2011 (CEST)

Some questions / things unclear

I want to understand more and have the following question:

  • Suppose the LCL encodes all its strings using UTF8. Could you display, for instance, Cyrillic or Arabic on a Russian or Arabic version of win98, i.e. without using the *W winapi calls? As far as I know, displaying both together will be impossible then, because no code page supports both charsets. Please correct me if I am wrong.

Maybe it is best if we create a new widgetset, win32u, that uses the Unicode (*W) functions instead of the ANSI (*A) functions. Vincent 14:33, 20 May 2006 (CEST)

But remember that you can use one of these languages at a time on Win98. Suppose you are using languages such as French, Portuguese or Spanish. On Windows NT you can use the W functions and convert UTF-8 to UTF-16. On Win98 you *can* display the special characters, but you need to convert UTF-8 to the proper ISO code page. Other languages are also supported using different ISO encodings. This way you can show at least one group of languages at a time on Win98. This is how the TNT Delphi Components work, for example: on Windows 98 they show only the characters available in the environment. --Sekelsenmat 18:25, 22 May 2006 (CEST)
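
The per-platform conversion described above can be sketched in a few lines. This is only an illustration in Python, not LCL code: the function name `to_native` and the `has_wide_api` flag are made up for the example, and cp1252/cp1251 merely stand in for whatever code page a given Win98 system uses.

```python
def to_native(utf8_bytes: bytes, has_wide_api: bool,
              ansi_codepage: str = "cp1252") -> bytes:
    """Convert a UTF-8 string to what the platform API would expect.

    has_wide_api  -- hypothetical flag: True on WinNT (*W calls),
                     False on Win98 (*A calls)
    ansi_codepage -- assumed active code page on the Win98 side
    """
    text = utf8_bytes.decode("utf-8")
    if has_wide_api:
        # WinNT: UTF-16 little-endian, the form the *W functions take
        return text.encode("utf-16-le")
    # Win98: only characters present in the one active code page survive;
    # anything else raises UnicodeEncodeError here
    return text.encode(ansi_codepage)


french = "déjà vu".encode("utf-8")
russian = "привет".encode("utf-8")

nt_bytes = to_native(french, has_wide_api=True)
w98_western = to_native(french, has_wide_api=False, ansi_codepage="cp1252")
w98_russian = to_native(russian, has_wide_api=False, ansi_codepage="cp1251")

# Cyrillic cannot be represented on a Western European code page
try:
    to_native(russian, has_wide_api=False, ansi_codepage="cp1252")
    mixed_ok = True
except UnicodeEncodeError:
    mixed_ok = False
```

Each Win98 call succeeds only when the text matches the chosen code page, which is the "one group of languages at a time" restriction.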

No, it doesn't. Actually, it doesn't with the *W API calls either! Win98 doesn't support the *W calls itself; they are provided by the Unicode interface layer, i.e. unicows, which doesn't do anything special either: it just converts UTF-16 to ANSI and back where required.

We can do that new widgetset too; it may make the code clearer, and maybe we can have some special optimizations too! But somebody should make sure that the other parts of these two interfaces stay in sync. Also, as I did almost exactly the same for the wince interface, I can tell that the code needing conversion is no more than 5% of all the code, so we would be doing a lot of code duplication, which is not good!

Maybe we could have a common interface (shared between win32a, win32u and wince as well) and move the functions/code that need modification to another place. Roozbeh

The goal is to enable the LCL to support Unicode on WinNT+ while not breaking any existing code and not departing from the Lazarus spirit. This means relying on UTF-8 in strings internally, on either UTF-8 or ISO pages in application strings (the ISO-pages option being for backward compatibility; I would prefer a system-wide global variable so that the developer can define this at application start), and on either the *A or the *W calls (with appropriate conversions UTF-8<-->ISO and UTF-8/ISO<-->UTF-16), depending on the machine on which the developed application runs.

However, we have to be aware of the performance penalty: it is not just about the (resource) strings and string constants of the application GUI, but also, for instance, about DB-aware components. In this respect the Tnt approach is much better. However, achieving a uniform Unicode-supporting LCL (and RTL) for all Lazarus target platforms would, IMHO, really make the difference for Lazarus.

This means enabling the existing components to adapt their operation based on the capabilities of the Windows system on which the application is running. This also means (we should not forget this) Unicode-enabling/cloning/modifying lots of non-visual units too, all file system communication in the first place. Remember that on WinNT something like C:\äöüćčšбвгд\филе.txt is a valid path/file name! One can find lots of such stuff in Tnt.

Roozbeh, as you have already found, unicows just enables the mapping of *W API calls to *A API calls, using a single target ISO page (no mixing of Cyrillic and Arabic), on old systems for applications using *W, but does not provide any Unicode capability on those systems.
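
What "mapping to a single target page" means in practice can be shown with a small Python sketch. It does not call unicows; it only mimics the effect, on the assumption that characters missing from the target page are replaced with '?' (Python's `errors="replace"` behaviour is used as a stand-in, and `map_wide_to_ansi` is a made-up name).

```python
def map_wide_to_ansi(utf16_le: bytes, codepage: str) -> bytes:
    """Mimic a *W -> *A mapping: decode UTF-16, re-encode in one code page.

    Characters that the code page lacks become '?'. That replacement rule
    is an assumption of this sketch, chosen via errors="replace".
    """
    return utf16_le.decode("utf-16-le").encode(codepage, errors="replace")


# Cyrillic "абв", a space, then Arabic "marhaba" written with escapes
mixed_text = "абв " + "\u0645\u0631\u062d\u0628\u0627"
mixed = mixed_text.encode("utf-16-le")

on_cyrillic_system = map_wide_to_ansi(mixed, "cp1251")  # Arabic part is lost
on_arabic_system = map_wide_to_ansi(mixed, "cp1256")    # Cyrillic part is lost
```

Whichever page is chosen, one half of the string degrades, so the application passes Unicode in but the old system still shows only one script.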

Vincent, regarding the win32u option: I have thought of that myself, but am not sure. Roozbeh has made a good point, with which I agree in principle. Maybe cloning win32 to win32u and starting to work there would just be safer at first, while we learn and gather experience. Later we could merge, if that then seems better. Just my views. Borut 16:16, 22 May 2006 (CEST)


I think the Directory property of the FileListBox should be a UTF8 string too. The FileListBox should translate it to an ansistring when passing this information to the RTL. IMHO the conversion is the responsibility of the LCL <-> RTL interface and not of the user of the TFileListBox. Vincent 12:33, 25 April 2008 (CEST)

See revision 14961 for how this can be done. Vincent 12:52, 25 April 2008 (CEST)
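
The division of responsibility proposed above (UTF-8 in the component's public property, conversion done at the boundary to the RTL) could look roughly like the following. This is a Python sketch for illustration only and not the code from the revision mentioned above; the class `FileListBoxSketch`, the constant `SYSTEM_CODEPAGE` and the method `_rtl_directory` are invented names.

```python
SYSTEM_CODEPAGE = "cp1252"  # assumption: stands in for the OS ANSI code page


class FileListBoxSketch:
    """Hypothetical control: UTF-8 towards the user, ANSI towards the 'RTL'."""

    def __init__(self) -> None:
        self._directory_utf8 = b""

    @property
    def directory(self) -> bytes:
        # Users of the control always see UTF-8
        return self._directory_utf8

    @directory.setter
    def directory(self, value: bytes) -> None:
        value.decode("utf-8")  # raises if the caller did not pass valid UTF-8
        self._directory_utf8 = value

    def _rtl_directory(self) -> bytes:
        # The conversion happens here, at the component/RTL boundary,
        # so the user of the control never has to do it
        return self._directory_utf8.decode("utf-8").encode(SYSTEM_CODEPAGE)


box = FileListBoxSketch()
box.directory = "C:\\Bücher".encode("utf-8")
```

The caller only ever handles UTF-8; `_rtl_directory` is the single place that knows about the system code page.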