Getting translation strings right

From Lazarus wiki
Revision as of 17:53, 20 March 2005 by Tom at work (talk | contribs)

This page contains some basic notes on getting translation strings right from the start, from the point of view of the original writer (most often the programmer).

"To get it right" in this context means that the translation strings have been prepared properly so that later translation is easier, and that the original version has been adapted to the basic requirements for using them.

Although this page tries to be as language neutral as possible, there is in fact a slight bias towards English. Feel free to extend the notes or discard those which do not apply to your situation (and maybe add your own to this page).


Take care to choose appropriate strings for a given situation.

To avoid problems, simply make sure that the original strings are okay in the first place - there are several reasons to do so:

  • The original strings are usually used as the default translation, so poor strings make a bad impression on any end user who happens to see the defaults.
  • Even worse, they make the work of the translator, who is responsible for conveying the original information in another language, unnecessarily hard. Just remember the "Garbage In, Garbage Out" principle, which applies perfectly here.

So try to make sure that your spelling is correct - use a dictionary if you are unsure.

Use understandable, well-known phrases for the situation you want to describe, and use them consistently. If you are unsure whether a phrase is common, put other programs in the same situation and examine their responses. Literature and help files are also often a good source for the exact technical terms or phrases. For example, the questions

Delete the file?
Erase the file?
Remove the file?
Wipe the file?
Zap the file?

all have a somewhat similar meaning, but when they are used interchangeably for no apparent reason, some readers will start reading unintended meaning into the differences. Translators are especially prone to this error: they often do not know the exact context of a particular message (e.g. where the message originates), may interpret simple word variations as an indication of important differences, and will then be tempted to choose uncommon (bad) translations.

Especially for error messages shown to the user: try to describe the problem itself in appropriate words, never the state of the program which led to the error message. The program state is likely useful for the person debugging the program, but not necessarily for the user. Users will shrug their shoulders and ignore the message in the best case, or choose another program in the worst case, only because the program was not able to give a proper problem description.

Give easily understandable descriptions. Unless it is really necessary, do not try to impress your audience with foreign or very technical words just to say that the current work has not yet been saved.

Especially try to avoid multiple negations within a single sentence; they are nearly always harder to read than their non-negated counterparts. An example could be

This component can not be dropped on non-TControls.

which, in its non-negated form

This component can only be dropped on TControls.

is certainly easier to read and understand.

Technical issues

Know existing possibilities

This section gives a short overview of the existing technical possibilities. These include the resourcestring construct, GNU gettext() and the format() function.

Resourcestrings, and GNU gettext

Free Pascal has some built-in language constructs for providing a way of handling constant strings. There is a special section of a unit called "resourcestring" which was specifically introduced for this reason. See the appropriate section of the FPC manual (prog.pdf, pg. 89ff or here) for the details.
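As a minimal sketch of the construct described above, the following program declares two resourcestrings and prints them. The identifiers rsDeleteFileQuestion and rsFileNotFound are hypothetical names chosen for this illustration; they do not come from any existing unit.

```pascal
program ResStringDemo;

{$mode objfpc}{$H+}

// Hypothetical message identifiers, chosen for this illustration only.
resourcestring
  rsDeleteFileQuestion = 'Delete the file?';
  rsFileNotFound = 'The file could not be found.';

begin
  // Resourcestrings are used like ordinary string constants.
  WriteLn(rsDeleteFileQuestion);
  WriteLn(rsFileNotFound);
end.
```

Without a translation loaded, the program simply prints the original strings, which is why those originals should already be of good quality.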

GNU gettext is a set of utilities for providing translations for your programs; see the FPC manual once more (prog.pdf, pg. 91ff or here).

Note: Be aware that GNU gettext has a conceptual flaw: it does not allow a single original string to be mapped to multiple translated strings.

The format() function

Translations are not limited to completely static strings: you can use the format() function of the sysutils unit. It replaces placeholders within a given text by their actual values, which are passed as the second parameter in an array. Example:

format('Tomorrow on %0:s there will be %1:s.', ['Sunday', 'sunshine']);

returns

Tomorrow on Sunday there will be sunshine.

In this case %0:s is a placeholder for the first (index 0) argument in the array of actual values ('Sunday'), and likewise %1:s for the second ('sunshine'). For an exact definition of the allowed placeholders and their syntax, see the FPC reference manual.
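The effect of the index can be sketched with a second template that uses the same arguments in the opposite order. ReorderedTemplate is a hypothetical rewording that stands in for a translation; it is only an assumption made for this illustration.

```pascal
program FormatIndexDemo;

{$mode objfpc}{$H+}

uses
  SysUtils;

const
  // Original template from the example above.
  OriginalTemplate = 'Tomorrow on %0:s there will be %1:s.';
  // Hypothetical reworded template standing in for a translation
  // that needs the arguments in the opposite order.
  ReorderedTemplate = 'Expect %1:s tomorrow, on %0:s.';

begin
  // The argument array is identical for both calls.
  WriteLn(Format(OriginalTemplate, ['Sunday', 'sunshine']));
  WriteLn(Format(ReorderedTemplate, ['Sunday', 'sunshine']));
end.
```

Because the placeholders carry their index, the calling code does not change when a template moves the arguments around.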

Some guidelines for the usage of the format() function

  • Try to use indexed placeholders even in the original strings, although they are optional. They allow translators to move the arguments easily within the sentence, which gives them more natural expressions (and is sometimes required to create a proper sentence at all).
  • Never compose a sentence out of more than one string. Always use the format() function from the sysutils unit to construct the complete string from placeholders at runtime. Translators will usually not be able to reconstruct the whole sentence; consider that there are often hundreds of such strings within a single translation database...
  • Do not format using whitespace. Simply move your label to the appropriate position in the first place. Spacing may break when the font changes, and seemingly superfluous spaces are in danger of being trimmed by the translator.

Note: format() does not interpret escaped control characters (such as C's "\n" or "\t"), and the tools based on GNU gettext, which for whatever reason is the translation system of choice, cannot handle unescaped control characters. Line breaks therefore have to be inserted programmatically.
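One possible way to insert a line break programmatically is to pass it as an ordinary format() argument, so that the translatable string itself contains no control character. This is only a sketch: rsUnsavedChanges and UnsavedChangesMessage are hypothetical names, and LineEnding is the platform line terminator constant provided by Free Pascal.

```pascal
program LineBreakDemo;

{$mode objfpc}{$H+}

uses
  SysUtils;

resourcestring
  // Hypothetical message; the placeholder %0:s marks where the line break goes.
  rsUnsavedChanges = 'The project has unsaved changes.%0:sSave them now?';

// Hypothetical helper that fills in the line break at runtime.
function UnsavedChangesMessage: string;
begin
  // LineEnding is the platform line terminator constant of Free Pascal.
  Result := Format(rsUnsavedChanges, [LineEnding]);
end;

begin
  WriteLn(UnsavedChangesMessage);
end.
```

With this approach the translator has to keep the %0:s placeholder in place, so a comment for translators explaining its purpose is advisable.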

English related

This section contains notes which particularly apply when using English as the base language for translations.

  • Make sure to reserve enough space where the text is output: English texts are almost always shorter (in characters) than their respective translations, so plan ahead. Experience shows that very short strings of only a few characters often almost double in length; the difference decreases as the strings get longer.
  • Avoid abbreviations in English. Besides shortening the already short strings even more, they cause severe problems in, for example, languages that use ideographic characters, where such abbreviations simply do not exist at all.
  • Since this is often an issue: in English, punctuation marks (full stop, comma, ...) are attached to the preceding word, or form a kind of word themselves if there is no preceding word (as in the case of an enumeration). In particular, a semicolon should always be followed by a space.

There was an error ! Please check file settings , compiler settings,... to fix this issue.

has horrible punctuation and therefore simply looks bad and is harder than usual to read. Consider that common line break algorithms break the line on whitespaces, possibly resulting in a single stop at the beginning of a line...

There was an error! Please check file settings, compiler settings, ... to fix this issue.

would probably be acceptable, at least as far as punctuation is concerned.