Internationalization (commonly abbreviated i18n) is a topic which covers many
areas: more than just translating UI strings, it involves changing settings and
defaults to match the customs and conventions of the locale a program is being
run in. Examples include days of the week, human name formats, and currencies.
Summary
- Design projects to be internationalized from the beginning.
- Use gettext (not intltool) for string translation.
- Remember that all strings are in UTF-8, and may contain multi-byte characters.
- Programs cannot reasonably implement changing locales at runtime.
Basics
Documenting the whole process of preparing a project for internationalization is beyond the scope of this document, but some good guides exist:
- GNOME developer translation guidelines
- gtkmm translation guidelines (aimed at C++ programmers, but widely applicable to C programmers)
- GLib internationalization API reference
It is important to prepare a project for internationalization early in its lifetime; otherwise non-internationalizable programming practices creep in and are hard to eliminate. One example is splitting a sentence across several separately translated strings, which prevents translators from reordering words to suit their language's grammar.
To add internationalization support to a project, follow the instructions here,
which can be summarised as adding the following to configure.ac:
AM_GNU_GETTEXT_VERSION([0.19])
AM_GNU_GETTEXT([external])
GETTEXT_PACKAGE=AC_PACKAGE_TARNAME
AC_DEFINE_UNQUOTED(GETTEXT_PACKAGE, ["$GETTEXT_PACKAGE"], [Define to the Gettext package name])
AC_SUBST(GETTEXT_PACKAGE)
Note that intltool is outdated, and we only need to use gettext.
Add po/Makefile.in to AC_CONFIG_FILES and po to SUBDIRS in Makefile.am. Then
create an empty po/POTFILES.in file (which will be modified when files are
marked for translation), an empty po/LINGUAS file (which will be modified when
extra translation languages are added), and create po/Makevars containing:
DOMAIN = $(PACKAGE)
COPYRIGHT_HOLDER =
MSGID_BUGS_ADDRESS =
EXTRA_LOCALE_CATEGORIES =
PO_DEPENDS_ON_POT = no
XGETTEXT_OPTIONS = \
--from-code=UTF-8 \
--keyword=_ --flag=_:1:pass-c-format \
--keyword=N_ --flag=N_:1:pass-c-format \
--flag=g_log:3:c-format --flag=g_logv:3:c-format \
--flag=g_error:1:c-format --flag=g_message:1:c-format \
--flag=g_critical:1:c-format --flag=g_warning:1:c-format \
--flag=g_print:1:c-format \
--flag=g_printerr:1:c-format \
--flag=g_strdup_printf:1:c-format --flag=g_strdup_vprintf:1:c-format \
--flag=g_printf_string_upper_bound:1:c-format \
--flag=g_snprintf:3:c-format --flag=g_vsnprintf:3:c-format \
--flag=g_string_sprintf:2:c-format \
--flag=g_string_sprintfa:2:c-format \
--flag=g_scanner_error:2:c-format \
--flag=g_scanner_warn:2:c-format
subdir = po
top_builddir = ..
These should be committed to git.
No other translation infrastructure files should be committed to git (see the module setup guidelines for more information), especially not the following:
po/ChangeLog
po/Makefile.in.in
po/POTFILES
po/stamp-it
po/*.mo
Unicode
All strings in GLib, unless otherwise specified, are in Unicode, encoded as UTF-8. They must be handled as such, which means all string manipulation must be done in terms of Unicode characters, rather than bytes. In many cases, string manipulation functions do not need to differentiate between the two; manual array indexing is a situation where you should be careful.
GLib provides a set of UTF-8-safe versions of standard C string manipulation functions, which should always be used instead of the standard C ones.
Sorting strings
When displaying sorted strings in the UI, care needs to be taken to ensure the
strings are sorted using Unicode algorithms, rather than plain ASCII
algorithms. This means using g_utf8_collate() rather than strcmp() to
establish an order between two strings.
Furthermore, if section headings need to be used for splitting a list into
alphabetical sections, they need to be generated using the current locale’s
alphabet, rather than just the A–Z English alphabet. One approach to doing
this would be to extract the first character of each item’s name (using
g_utf8_get_char_validated()) and then use it as a section heading if it’s
considered alphabetic for the current locale (using g_unichar_isalpha()).
Changing locale
Changing locale at runtime is not safe, as it requires calling setenv(),
which is explicitly not thread safe. It also involves more than just changing
UI strings: date formats, number formats, and the output of any code which is
predicated on those all change as well. The impacts of changing locale can be
far-reaching and subtle.
To change the locale of an application, the application has to be restarted.
Language identifiers
When referring to languages (e.g. in configuration files or preferences), always use the ISO-639 language codes, as used by gettext.
External links
- FAQ on Unicode and UTF-8
- Comprehensive introduction to Unicode
- Technical principles of Unicode
- GNOME developer translation guidelines
- gtkmm translation guidelines (aimed at C++ programmers, but widely applicable to C programmers)
- GLib internationalization API reference
- GLib Unicode API reference