Red Hat Bugzilla – Bug 77214
locale collation problem with LC_COLLATE=*.UTF-8
Last modified: 2016-11-24 09:59:56 EST
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (compatible; Konqueror/3; Linux)
Description of problem:
When the locale uses UTF-8, there is a problem
with collation. Try the following in bash:
The output of echo is:
which is entirely wrong.
If you have LC_COLLATE=C in the environment
the bug does not occur.
This bug will cause shell scripts, at least, to
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. run bash interactively
2. run "locale" to make sure "UTF-8" appears in the
LC_COLLATE variable, e.g.
locale | grep LC_COLLATE
and the output is something like:
3. run the following commands:
the output of echo is:
which is wrong
Actual Results: the output of echo is:
which is wrong
Expected Results: the output of echo should be:
If you set the environment variable LC_COLLATE=C,
the results of running the example are correct:
However, it must be set when the shell is started;
the example must then be run from this new shell.
Range expressions (e.g. [A-Z]) must behave according to the LC_COLLATE setting,
i.e. use dictionary order (AaBbCc or aAbBcC in most locales). This is mandated
by POSIX. If you need the old behavior, export LC_COLLATE=C.
The only problem is that bash doesn't react immediately to a changed setting
(it needs a new shell started as LC_COLLATE=C bash); this is fixed in
bash-2.05b-7 in rawhide.
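The range-expression behaviour described above can be checked with a small sketch like the following. The helper name matches_upper_range is hypothetical and not part of the original report. It starts a fresh bash with the locale already in its environment, so the result does not depend on whether the running shell reacts to a changed LC_COLLATE. LC_ALL is used because it overrides LC_COLLATE.

```shell
# Hypothetical helper (not from the original report): print whether a
# single character matches the range expression [A-Z] under a given locale.
# A new bash is started so the locale is in effect from shell startup.
matches_upper_range() {
    LC_ALL="$1" bash -c 'case "$1" in
        [A-Z]) echo "match" ;;
        *)     echo "no match" ;;
    esac' _ "$2"
}

matches_upper_range C B    # prints "match"
matches_upper_range C b    # prints "no match"
```

Calling the helper with a UTF-8 locale such as en_US.UTF-8 instead of C may report a match for b; the outcome depends on the installed locale data and the bash version, so no output is claimed here.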
I understand, thanks.
However, won't existing shell scripts break? Perhaps
LC_COLLATE should be set to "C" in /etc/profile.d/glib2.*
to avoid this? (Or something like it.)
This was spotted because a friend
of mine noticed that an existing shell script broke
under Red Hat 8.0.
You should set LC_COLLATE=C (or LC_ALL=C) right before invoking a program that
requires the C collation. For example, if you want to do an ASCII sort, you do
somecommand | LC_ALL=C sort
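As a concrete illustration of the per-command override above, here is a minimal sketch with made-up sample input. The wrapper name ascii_sort is hypothetical; it only packages the LC_ALL=C sort invocation so the setting applies to that one command and nothing else in the script.

```shell
# Hypothetical wrapper: sort stdin by raw byte value, regardless of the
# caller's locale. The LC_ALL=C assignment applies only to this sort call.
ascii_sort() {
    LC_ALL=C sort
}

# Sample input (made up for illustration).
printf 'b\nA\na\nB\n' | ascii_sort
# prints A, B, a, b (one per line): uppercase ASCII sorts before lowercase
```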
This is nothing new in 8.0 - exactly the same behaviour was there in the 7.x
series (though in that case the locales weren't en_US.UTF-8 etc., but en_US