Understand how to defuse localization in GNU Gettext
Closed, Resolved, Public

Description

Imagine a wild amount of incoming environment variables like LANG, LANGUAGE, LC_MESSAGES and so on.

How to defuse localization?

🔴 LANGUAGE=C: Even if this has maximum priority, it is too risky, since other specific things may read LC_stuff, as far as I understand.
🔴 LANG=C: This is supposed to be the default for missing LC_stuff, so it is not correct. Moreover, LC_ALL may be set and will take precedence - https://www.gnu.org/software/gettext/manual/html_node/Locale-Environment-Variables.html
🔶 LANG=C + LC_MESSAGES=C: It should work, since it was adopted by https://github.com/apache/subversion/blob/trunk/tools/client-side/bash_completion#L81 but LC_ALL may be set and would take precedence; also, LC_NUMERIC and the other categories will still be read.
🔶 unset LC_ALL + LC_MESSAGES=C: OK for parsing messages, and in theory LANG has lower precedence, but not OK for parsing numbers, times and other output controlled by the remaining LC_* variables.
✅ LC_ALL=C: Probably bombproof - in theory LC_ALL has the highest precedence, and the variable LANGUAGE is ignored if the locale is set to ‘C’. https://www.gnu.org/software/gettext/manual/html_node/The-LANGUAGE-variable.html#The-LANGUAGE-variable
✅ unset LANGUAGE + LC_ALL=C: Probably bombproof - and in theory, thanks to LC_ALL=C, LANG is not used to get the defaults of LC_stuff, including LC_MESSAGES.
✅✅ LC_ALL=C + LANG=C: Probably super bombproof, probably overkill; LC_ALL=C alone is probably already OK.
IMPORTANT: If something above is incorrect, please edit, add links, thanks 🌈
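
For illustration only, here is a minimal Python sketch of the "unset LANGUAGE + LC_ALL=C" option applied to a child process. The helper name `defused_env` is hypothetical and not taken from any project mentioned here; removing LANGUAGE is the belt-and-braces variant and is not claimed to be required.

```python
import os
import subprocess
import sys


def defused_env(base=None):
    """Return a copy of the environment with LANGUAGE removed and LC_ALL=C."""
    env = dict(os.environ if base is None else base)
    env.pop("LANGUAGE", None)  # optional extra safety, harmless if it was not set
    env["LC_ALL"] = "C"        # overrides every LC_* variable and LANG
    return env


if __name__ == "__main__":
    # Start a child process and print the LC_ALL value it actually receives.
    result = subprocess.run(
        [sys.executable, "-c", "import os; print(os.environ.get('LC_ALL'))"],
        env=defused_env(),
        capture_output=True,
        text=True,
        check=True,
    )
    print(result.stdout.strip())
```

The other variables (LANG, LC_MESSAGES and so on) are left untouched in the copy, since LC_ALL is expected to win over them.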

Event Timeline

LC_ALL=C is not risky on GNU systems.

In Gettext, guess_category_value() ignores LANGUAGE if the locale is C or starts with "C." (for example C.UTF-8). The locale is retrieved through Gnulib, which either calls uselocale() or setlocale(), or reads the environment variables with Gnulib's own implementation.
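
The check described above can be modelled roughly as follows. This is a simplified Python sketch of the behaviour, not Gettext's actual C code; both function names are hypothetical, and the real function deals with more details than shown here.

```python
def language_is_honored(locale_name):
    """LANGUAGE is skipped when the locale is "C" or starts with "C."."""
    return not (locale_name == "C" or locale_name.startswith("C."))


def message_language_candidates(locale_name, language_env):
    """Return the ordered list of languages to try for message lookup."""
    if language_env and language_is_honored(locale_name):
        # LANGUAGE is a colon-separated priority list, e.g. "sv:de".
        return [lang for lang in language_env.split(":") if lang]
    # Otherwise only the locale itself is considered.
    return [locale_name]
```

Under this model, a leftover LANGUAGE=it:en has no effect once the message locale resolves to C, which is the reason LC_ALL=C alone is considered enough.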

In Subversion, svn_cmdline_init() sets the locale with setlocale(LC_ALL, ""), which means both uselocale() and setlocale() would read the locale set by setlocale(3). Gnulib's own implementation follows the GNU C Library's precedence rules for setlocale(3), which are:

  1. first (regardless of category), the environment variable LC_ALL is inspected,
  2. next the environment variable with the same name as the category (see the category table in setlocale(3)), and
  3. finally the environment variable LANG.

The first existing environment variable is used. If its value is not a valid locale specification, the locale is unchanged, and setlocale() returns NULL.
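
The precedence rules quoted above can be sketched in Python as a small lookup. The function name `effective_locale` is hypothetical. Treating an empty value like an unset variable and falling back to "C" when nothing is set are assumptions of this sketch, and the validity check on the locale name is left out.

```python
def effective_locale(category, env):
    """Resolve one locale category: LC_ALL first, then the category, then LANG."""
    for name in ("LC_ALL", category, "LANG"):
        value = env.get(name)
        if value:  # assumption: an empty value counts as unset
            return value
    return "C"  # assumption: default when no variable is set


env = {"LANG": "it_IT.UTF-8", "LC_MESSAGES": "C"}
print(effective_locale("LC_MESSAGES", env))                   # LC_MESSAGES wins over LANG
print(effective_locale("LC_NUMERIC", env))                    # falls through to LANG
print(effective_locale("LC_NUMERIC", dict(env, LC_ALL="C")))  # LC_ALL wins over everything
```

This is why LC_MESSAGES=C on its own leaves LC_NUMERIC and the other categories following LANG, while LC_ALL=C pins every category at once.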

The details are implementation-dependent, but the Base Definitions volume of POSIX.1-2017 defines precedence as follows:

LC_ALL

This variable shall determine the values for all locale categories. The value of the LC_ALL environment variable has precedence over any of the other environment variables starting with LC_ (LC_COLLATE, LC_CTYPE, LC_MESSAGES, LC_MONETARY, LC_NUMERIC, LC_TIME) and the LANG environment variable.

I have not checked older standards or implementations, but for current POSIX-compliant and GNU systems, LC_ALL=C is certainly enough.

OK thanks! you persuaded me

So, if we are sure that LC_ALL=C also ignores LANGUAGE, feel free to claim this and set as resolved to inherit endless glory and fix this truth on a runic stone

This [LANG=C] is supposed to be the default for missing LC_stuff

in theory LANG is never read since it's never used for defaults of LC_stuff including LC_MESSAGES

@valerio.bozzolan You may want to fix the contradiction between these two statements.

Anyways, LC_ALL=C is enough and LC_ALL=C + LANG=C should be bombproof.

About this:

unset LC_ALL + LC_MESSAGES=C

I think it's not safe since other LC_STUFF can still be read 🤔

Depends on what we are parsing, but yes, that should be marked 🔶.

valerio.bozzolan assigned this task to l2dy.

OK. Well, thaanks \o/