Bug 77214 - locale collation problem with LC_COLLATE=*.UTF-8
Status: CLOSED NOTABUG
Product: Red Hat Linux
Classification: Retired
Component: glibc
Version: 8.0
Hardware: All
OS: Linux
Priority: medium
Severity: high
Assigned To: Jakub Jelinek
QA Contact: Brian Brock
Reported: 2002-11-03 07:20 EST by jar
Modified: 2016-11-24 09:59 EST
CC: 2 users

Doc Type: Bug Fix
Last Closed: 2002-11-03 08:38:47 EST


Attachments: None
Description jar 2002-11-03 07:20:02 EST
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (compatible; Konqueror/3; Linux)

Description of problem:
When your locale uses UTF-8, there is a problem
with collation.  Try the following in bash:

	mkdir c
	cd c
	> a
	> b
	echo [A-Z]

The output of echo is:
	b
which is entirely wrong.

If you have LC_COLLATE=C in the environment
the bug does not occur.

At the very least, this bug will cause shell
scripts to behave incorrectly.
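Here is a minimal sketch of the example above as a standalone script. It assumes bash is installed, along with a `mktemp` that supports `-d` (as on GNU/Linux). The script works in a throwaway directory so that no other files can affect the glob. The name `scratch` is only an illustrative variable. No expected output is shown because the result depends on the caller's locale and on the bash version (later bash releases added a `globasciiranges` option that makes ranges ASCII-based).

```shell
#!/bin/sh
# Recreate the report's directory in a fresh scratch location.
scratch=$(mktemp -d) || exit 1
cd "$scratch" || exit 1
: > a   # empty file "a" (same as "> a" in the report)
: > b   # empty file "b"
# Show which collation the environment selects (step 2 of the report);
# "|| true" keeps going if the locale utility is unavailable.
locale | grep LC_COLLATE || true
# Let bash expand the pattern; the single quotes stop this outer
# shell from expanding it first.
out=$(bash -c 'echo [A-Z]')
echo "[A-Z] expanded to: $out"
cd / && rm -r "$scratch"
```

In an aAbBcC locale with an affected bash, `$out` is `b`. Under C collation it stays the literal `[A-Z]`, because neither file name is an uppercase letter.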



Version-Release number of selected component (if applicable):


How reproducible:
Always

Steps to Reproduce:
1. run bash interactively
2. run "locale" to make sure "UTF-8" appears in the
   LC_COLLATE value, e.g.
	locale | grep LC_COLLATE
   and the output is something like:
	LC_COLLATE=en_US.UTF-8
3. run the following commands:
	mkdir c
	cd c
	> a
	> b
	echo [A-Z]	
the output of echo is:
	b
which is wrong
	

Actual Results:  the output of echo is:
	b
which is wrong

Expected Results:  the output of echo should be:
	[A-Z]

Additional info:

If you set the environment variable LC_COLLATE=C,
the example gives the correct result:
	[A-Z]
However, it must be set when the shell is started,
so run
	LC_COLLATE=C bash
and then run the example from this new shell.
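Starting a fresh bash with C collation can also be done for a single command, without opening an interactive subshell. The variable that selects collation is LC_COLLATE. This sketch also clears LC_ALL for the child. POSIX gives a non-empty LC_ALL precedence over every LC_* variable, so a set LC_ALL would silently defeat the workaround. The sketch assumes bash is on the PATH.

```shell
#!/bin/sh
# Recreate the report's directory in a scratch location.
scratch=$(mktemp -d) || exit 1
cd "$scratch" || exit 1
: > a
: > b
# Start a new bash with C collation from the outset. LC_ALL is set to
# the empty string, which POSIX treats like unset, so LC_COLLATE wins.
out=$(LC_ALL= LC_COLLATE=C bash -c 'echo [A-Z]')
# With byte-order ranges, neither "a" nor "b" matches, so the
# unmatched pattern is left as the literal text [A-Z].
echo "$out"
cd / && rm -r "$scratch"
```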
Comment 1 Miloslav Trmac 2002-11-03 08:17:46 EST
Range expressions (e.g. [A-Z]) must behave according to the LC_COLLATE setting,
i.e. use dictionary order (AaBbCc or aAbBcC in most locales). This is mandated
by POSIX; if you need the old behavior, export LC_COLLATE=C. (In aAbBcC order,
[A-Z] runs from A to Z, so it covers b but not a, which is why only b matched.)

The only problem is that bash doesn't pick up a change to LC_COLLATE in an
already-running shell (it has to be started as LC_COLLATE=C bash); this is
fixed in bash-2.05b-7 in Rawhide.
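A bash that includes the fix from Comment 1 does not need a restart. This sketch assumes bash-2.05b-7 or later. With such a bash, a script can switch its own collation before its first range glob. The inner script runs through `bash -c` only so that the example is self-contained. The `unset LC_ALL` line is an added precaution, since a set LC_ALL would override LC_COLLATE.

```shell
#!/bin/sh
scratch=$(mktemp -d) || exit 1
cd "$scratch" || exit 1
: > a
: > b
# One bash process: drop any LC_ALL override, switch collation to C,
# and only then expand the range pattern.
out=$(bash -c '
  unset LC_ALL
  export LC_COLLATE=C
  echo [A-Z]
')
echo "$out"   # the pattern stays literal: no file name is uppercase
cd / && rm -r "$scratch"
```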
Comment 2 jar 2002-11-03 08:38:40 EST
I understand, thanks.

However, won't existing shell scripts break?  Perhaps
LC_COLLATE should be set to "C" in /etc/profile.d/glib2.*
to avoid this (or something like it)?

This was spotted because a friend of mine noticed
that an existing shell script broke under Red Hat 8.0.
Comment 3 Jakub Jelinek 2002-11-03 10:05:27 EST
You should set LC_COLLATE=C (or LC_ALL=C) right before invoking a program that
requires the C collation. For example, if you want to do an ASCII sort, run
somecommand | LC_ALL=C sort
This is nothing new in 8.0; exactly the same behaviour was there in the 7.x
series (though there the locales weren't en_US.UTF-8 etc., but en_US
or en_US.ISO-8859-15).
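Comment 3's advice is to apply the C locale only to the one command that needs it. This can be illustrated with `sort`. In this sketch `printf` stands in for the comment's placeholder `somecommand`. Under the C locale `sort` compares bytes, so every uppercase ASCII letter sorts before every lowercase one. In a dictionary-order locale the same input would typically come out interleaved.

```shell
#!/bin/sh
# printf stands in for "somecommand"; LC_ALL=C applies only to sort,
# while the rest of the shell keeps its own locale.
sorted=$(printf '%s\n' b B a A | LC_ALL=C sort)
printf '%s\n' "$sorted"
# prints:
# A
# B
# a
# b
```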
