Difference between revisions of "Character and string types/es"
Revision as of 15:18, 22 March 2016
Free Pascal supports several character and string types. They range from single ANSI characters to Unicode strings, and also include pointer types. The types also differ in their encodings and in whether they are reference counted.
AnsiChar
A variable of type AnsiChar, also referred to as Char, is exactly 1 byte in size and contains one ANSI character.
a |
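The size claim above can be checked with a few lines. This is a minimal sketch; the program and variable names are made up for illustration.

```pascal
program AnsiCharDemo;
{$mode objfpc}{$H+}
var
  c: AnsiChar;
begin
  c := 'a';
  WriteLn(SizeOf(c)); // prints 1: an AnsiChar occupies one byte
  WriteLn(Ord(c));    // prints 97: the numeric code of 'a'
  WriteLn(c);         // prints a
end.
```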
Reference
WideChar
A variable of type WideChar, also referred to as UnicodeChar, is exactly 2 bytes in size and contains one Unicode character, or one part of one, in UTF-16 encoding. Note: 2 bytes are not enough to encode every Unicode code point. Therefore, 2 WideChars (a surrogate pair) may be needed to encode a single code point.
a |
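To illustrate why one code point may need two WideChars, the following sketch computes the UTF-16 surrogate pair for U+1F600 by hand. The arithmetic is the standard UTF-16 rule; the program and variable names are only illustrative.

```pascal
program SurrogateDemo;
{$mode objfpc}{$H+}
var
  cp: LongWord;
  u: UnicodeString;
begin
  WriteLn(SizeOf(WideChar)); // prints 2

  cp := $1F600;      // a code point above U+FFFF
  cp := cp - $10000; // UTF-16 encodes the offset from U+10000
  SetLength(u, 2);
  u[1] := WideChar(Word($D800 + (cp shr 10)));   // high surrogate
  u[2] := WideChar(Word($DC00 + (cp and $3FF))); // low surrogate

  WriteLn(Length(u));                     // prints 2: two WideChars, one code point
  WriteLn(HexStr(LongInt(Ord(u[1])), 4)); // prints D83D
  WriteLn(HexStr(LongInt(Ord(u[2])), 4)); // prints DE00
end.
```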
References
Array of Char
Early Pascal implementations that were in use before 1978 did not support a string type (with the exception of string constants). The only possibility to store strings in variables was the use of arrays of char. This approach has many disadvantages and is no longer recommended. It is, however, still supported to ensure backward-compatibility with ancient code.
Static Array of Char
type
  TOldString4 = array[0..3] of char;
var
  aOldString4: TOldString4;
begin
  // fill the four elements one by one
  aOldString4[0] := 'a';
  aOldString4[1] := 'b';
  aOldString4[2] := 'c';
  aOldString4[3] := 'd';
end.
The static array of char now has the content:
a | b | c | d |
Dynamic Array of Char
var
  aOldString: Array of Char;
begin
  SetLength(aOldString, 5); // five elements, indices 0..4
  aOldString[0] := 'a';
  aOldString[1] := 'b';
  aOldString[2] := 'c';
  aOldString[3] := 'd';
  // aOldString[4] is left untouched and stays #0
end.
The dynamic array of char now has the content:
a | b | c | d | #0 |
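The trailing #0 in the dynamic array comes from SetLength, which zero-initialises newly added elements. A small sketch (names are illustrative) that makes this visible:

```pascal
program DynCharArrayDemo;
{$mode objfpc}{$H+}
var
  a: array of Char;
begin
  SetLength(a, 5);    // new elements are zero-initialised
  a[0] := 'a';
  a[1] := 'b';
  a[2] := 'c';
  a[3] := 'd';

  WriteLn(Length(a)); // prints 5
  WriteLn(High(a));   // prints 4: dynamic arrays are always zero-based
  WriteLn(Ord(a[4])); // prints 0: the element that was never assigned
end.
```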
PChar
A variable of type PChar is basically a pointer to a Char, but it allows additional operations. PChars can be used to access C-style null-terminated strings, e.g. when interacting with certain operating system libraries or third-party software.
a | b | c | #0 |
^ |
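The "additional operations" include indexing and pointer arithmetic. The sketch below assumes a mode where PChar is a pointer to a 1-byte Char (as in {$mode objfpc}) and uses StrLen from SysUtils; the variable names are illustrative.

```pascal
program PCharDemo;
{$mode objfpc}{$H+}
uses
  SysUtils; // StrLen
var
  s: AnsiString;
  p: PChar;
begin
  s := 'abc';
  p := PChar(s);      // points at the first character of s

  WriteLn(StrLen(p)); // prints 3: characters before the terminating #0
  WriteLn(p[1]);      // prints b: a PChar can be indexed like an array
  Inc(p);             // pointer arithmetic: advance by one Char
  WriteLn(p^);        // prints b
  WriteLn(Ord(p[2])); // prints 0: the null terminator after 'c'
end.
```

The pointer is only valid while s is alive and unchanged, so code that hands a PChar to an external library should keep the source string around for the duration of the call.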
Reference
PWideChar
A variable of type PWideChar is a pointer to a WideChar variable.
a | b | c | #0 | #0 | |||
^ |
Reference
String
The type String may refer to ShortString or AnsiString, depending on the {$H} switch. If the switch is off ({$H-}), any string declaration defines a ShortString; its size is 255 characters unless specified otherwise. If the switch is on ({$H+}), a string declaration without a length specifier defines an AnsiString, and one with a length specifier defines a ShortString of that length. In mode delphiunicode, String is UnicodeString.
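The effect of the switch can be observed through SizeOf. In this sketch the type names TShort, TLong and TShort10 are hypothetical, and the switch is toggled between type sections purely for demonstration.

```pascal
program StringSwitchDemo;
{$mode objfpc}

{$H-}
type
  TShort = string;       // ShortString with the default maximum of 255 chars

{$H+}
type
  TLong = string;        // AnsiString
  TShort10 = string[10]; // a length specifier still gives a ShortString

begin
  WriteLn(SizeOf(TShort));                  // prints 256: length byte + 255 chars
  WriteLn(SizeOf(TLong) = SizeOf(Pointer)); // prints TRUE: only a pointer is stored
  WriteLn(SizeOf(TShort10));                // prints 11: length byte + 10 chars
end.
```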
Reference
ShortString
Short strings have a maximum length of 255 characters with the implicit codepage CP_ACP. The length is stored in the character at index 0.
#3 | a | b | c |
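Because the length lives at index 0, it can be read and even written directly. A minimal sketch (names are illustrative):

```pascal
program ShortStringDemo;
{$mode objfpc}{$H+}
var
  s: ShortString;
begin
  s := 'abc';
  WriteLn(SizeOf(s));  // prints 256: one length byte plus 255 characters
  WriteLn(Ord(s[0]));  // prints 3: the length byte

  s[0] := #2;          // overwriting the length byte truncates the string
  WriteLn(s);          // prints ab
  WriteLn(Length(s));  // prints 2
end.
```

Writing to s[0] is shown only to expose the layout; SetLength is the usual way to change the length.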
Reference
AnsiString
AnsiStrings are strings that have no length limit. They are reference counted and are guaranteed to be null terminated. Internally, a variable of type AnsiString is treated as a pointer: the actual content of the string is stored on the heap, where as much memory is allocated as is needed to store it.
a | b | c | #0 | ||||||||
RefCount | Length |
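The pointer nature and the reference counting can be observed indirectly. The sketch below compares the data pointers of two variables before and after a modification. It relies on the copy-on-write behaviour of FPC's AnsiString, and the variable names are illustrative.

```pascal
program AnsiStringDemo;
{$mode objfpc}{$H+}
var
  s1, s2: AnsiString;
begin
  s1 := 'abc';
  s2 := s1; // no character data is copied, both share one buffer

  WriteLn(SizeOf(s1) = SizeOf(Pointer)); // prints TRUE: the variable is a pointer
  WriteLn(Pointer(s1) = Pointer(s2));    // prints TRUE: same buffer
  WriteLn(Ord(PChar(s1)[Length(s1)]));   // prints 0: the null terminator

  s2[1] := 'X'; // writing makes s2 unique first (copy on write)
  WriteLn(Pointer(s1) = Pointer(s2));    // prints FALSE: s2 now has its own buffer
  WriteLn(s1);                           // prints abc
  WriteLn(s2);                           // prints Xbc
end.
```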
Reference
UnicodeString
Like AnsiStrings, UnicodeStrings are reference counted, null-terminated arrays, but they are implemented as arrays of WideChars instead of regular Chars.
a | b | c | #0 | #0 | |||||||||||
RefCount | Length |
Reference
UTF8String
In FPC 2.6.5 and below the type UTF8String was an alias to the type AnsiString. In FPC 2.7.1 and above it is defined as
UTF8String = type AnsiString(CP_UTF8);
It is meant to contain UTF-8 encoded strings (i.e. Unicode data), in which each code point takes 1 to 4 bytes. Note that String can also contain UTF-8 encoded characters.
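Since a UTF-8 character may take several bytes, Length on a UTF8String counts bytes, not characters. The sketch below builds the two-byte UTF-8 sequence for U+00E3 (ã) byte by byte, so the result does not depend on the encoding of the source file, and then decodes it with UTF8Decode from the System unit. Names are illustrative.

```pascal
program Utf8Demo;
{$mode objfpc}{$H+}
var
  s: UTF8String;
  u: UnicodeString;
begin
  SetLength(s, 2);
  s[1] := AnsiChar($C3); // first byte of the UTF-8 sequence for U+00E3
  s[2] := AnsiChar($A3); // second byte

  u := UTF8Decode(s);    // UTF-8 bytes -> UTF-16

  WriteLn(Length(s));                     // prints 2: two bytes
  WriteLn(Length(u));                     // prints 1: one UTF-16 code unit
  WriteLn(HexStr(LongInt(Ord(u[1])), 4)); // prints 00E3
end.
```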
Reference
UTF16String
The type UTF16String is an alias to the type WideString. In the LCL unit lclproc it is an alias to UnicodeString.
Reference
WideString
Variables of type WideString (used to represent Unicode character strings in COM applications) resemble those of type UnicodeString, but unlike them they are not reference-counted. On Windows they are allocated with a special Windows function which allows them to be used for OLE automation.
WideStrings consist of COM-compatible UTF-16 encoded data on Windows machines (UCS-2 on Windows 2000), and they are encoded as plain UTF-16 on Linux, Mac OS X and iOS.
a | b | c | #0 | #0 | |||||||
Length |
Reference
PShortString
A variable of type PShortString is a pointer that points to the first byte of a ShortString-type variable (which defines the length of the ShortString).
#3 | a | b | c |
^ |
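As a rough illustration, the byte a PShortString points to can be read through a PByte cast, which yields the stored length. The variable names are illustrative.

```pascal
program PShortStringDemo;
{$mode objfpc}{$H+}
var
  s: ShortString;
  p: PShortString;
begin
  s := 'abc';
  p := @s;

  WriteLn(PByte(p)^);  // prints 3: the first byte is the length
  WriteLn(p^);         // prints abc: dereferencing gives the whole string
  WriteLn(Length(p^)); // prints 3
end.
```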
Reference
PAnsiString
Variables of type PAnsiString are pointers to AnsiString-type variables. However, unlike PShortString-type variables they don't point to the first byte of the header, but to the first char of the AnsiString.
a | b | c | #0 | ||||||||
RefCount | Length | ^ |
Reference
PUnicodeString
Variables of type PUnicodeString are pointers to variables of type UnicodeString.
a | b | c | #0 | #0 | |||||||||||
RefCount | Length | ^ |
Reference
PWideString
Variables of type PWideString are pointers. They point to the first char of a WideString-typed variable.
a | b | c | #0 | #0 | |||||||
Length | ^ |
Reference
String constants
If you use only English constants, your strings work the same with all types, on all platforms and with all compiler versions. Non-English strings can be loaded via resourcestrings or from files. If you want to use non-English strings in code, read on.
There are various encodings for non-English strings. By default Lazarus saves Pascal files as UTF-8 without BOM. UTF-8 supports the full Unicode range. That means all string constants are stored in UTF-8. Lazarus also supports changing the encoding of a file to another encoding, for example your local codepage under Windows. The Windows codepage is limited to your current language group.
String Type, UTF-8 Source | With or without {$codepage utf8} | FPC 2.6.5 and below | FPC 2.7.1 and above | FPC 2.7.1+ with UTF8 as default CodePage |
---|---|---|---|---|
AnAnsiString:='ãü'; | Without | Needs UTF8ToAnsi in RTL/WinAPI. Ok in LCL | Needs UTF8ToAnsi in RTL/WinAPI. Ok in LCL | Ok in RTL/W-WinAPI/LCL. Needs UTF8ToWinCP in A-WinAPI. |
AnAnsiString:='ãü'; | With | System cp ok in RTL/WinAPI. Needs SysToUTF8 in LCL | Ok in RTL/WinAPI/LCL. Mixing with other strings converts to system cp | Ok in RTL/W-WinAPI/LCL. Needs UTF8ToWinCP in A-WinAPI |
AnUnicodeString:='ãü'; | Without | Wrong everywhere | Wrong everywhere | Wrong everywhere |
AnUnicodeString:='ãü'; | With | System cp ok in RTL/WinAPI. Needs UTF8Encode in LCL | Ok in RTL/WinAPI/LCL. Mixing with other strings converts to system cp | Ok in RTL/W-WinAPI/LCL. Needs UTF8ToWinCP in A-WinAPI |
AnUTF8String:='ãü'; | Without | Same as AnsiString | Wrong everywhere | Wrong everywhere |
AnUTF8String:='ãü'; | With | Same as AnsiString | Ok in RTL/WinAPI/LCL. Mixing with other strings converts to system cp | Ok in RTL/W-WinAPI/LCL. Needs UTF8ToWinCP in A-WinAPI. |
- W-WinAPI = Windows API "W" functions, UTF-16
- A-WinAPI = Windows API non "W" functions, 8bit system code page
- System CP = The 8bit system code page of the OS. For example code page 1252.
const
  c='ãü';
  cstring: string = 'ãü'; // see AnAnsiString:='ãü';
var
  s: string;
  u: UnicodeString;
begin
  s:=c; // same as s:='ãü';
  s:=cstring; // does not change encoding
  u:=c; // same as u:='ãü';
  u:=cstring; // fpc 2.6.1: converts from system cp to UTF-16, fpc 2.7.1+: depends on encoding of cstring
end.
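One way to check how the compiler interpreted a constant is to look at its length in code units. The sketch below covers the "With {$codepage utf8}" rows of the table for UnicodeString. It assumes the file is really saved as UTF-8 and that 'ãü' consists of the two precomposed characters U+00E3 and U+00FC; the names are illustrative.

```pascal
program CodepageDemo;
{$mode objfpc}{$H+}
{$codepage utf8} // tells the compiler the source file is UTF-8 encoded
var
  u: UnicodeString;
  raw: UTF8String;
begin
  u := 'ãü';             // the compiler converts the literal to UTF-16
  raw := UTF8Encode(u);  // back to UTF-8 bytes

  WriteLn(Length(u));    // prints 2: two UTF-16 code units
  WriteLn(Length(raw));  // prints 4: each character takes two bytes in UTF-8
end.
```

Without the {$codepage utf8} directive, the table above lists the UnicodeString assignment as wrong everywhere, so a length other than 2 for u is a quick hint that the directive or the file encoding is missing.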