Bug 77214 - locale collation problem with LC_COLLATE=*.UTF-8
Summary: locale collation problem with LC_COLLATE=*.UTF-8
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Linux
Classification: Retired
Component: glibc
Version: 8.0
Hardware: All
OS: Linux
Priority: medium
Severity: high
Target Milestone: ---
Assignee: Jakub Jelinek
QA Contact: Brian Brock
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2002-11-03 12:20 UTC by jar
Modified: 2016-11-24 14:59 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2002-11-03 13:38:47 UTC
Embargoed:



Description jar 2002-11-03 12:20:02 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (compatible; Konqueror/3; Linux)

Description of problem:
When the locale uses UTF-8, there is a problem
with collation.  Try the following in bash:

	mkdir c
	cd c
	> a
	> b
	echo [A-Z]

The output of echo is:
	b
which is entirely wrong.

If you have LC_COLLATE=C in the environment
the bug does not occur.

This bug will cause shell scripts, at least, to
behave incorrectly.



Version-Release number of selected component (if applicable):


How reproducible:
Always

Steps to Reproduce:
1. run bash interactively
2. run "locale" to make sure "UTF-8" appears in the
   LC_COLLATE variable, e.g.
	locale | grep LC_COLLATE
   and the output is something like:
	LC_COLLATE=en_US.UTF-8
3. run the following commands:
	mkdir c
	cd c
	> a
	> b
	echo [A-Z]	
the output of echo is:
	b
which is wrong
	

Actual Results:  the output of echo is:
	b
which is wrong

Expected Results:  the output of echo should be:
	[A-Z]

Additional info:

If you set the environment variable LC_COLLATE=C
the result of running the example is correct:
	[A-Z]
However, it must be set when the shell is started,
so do
	LC_COLLATE=C bash
and then run the example from this new shell.

Comment 1 Miloslav Trmac 2002-11-03 13:17:46 UTC
Range expressions (i.e. [A-Z]) must behave according to the LC_COLLATE setting,
i.e. use dictionary order (AaBbCc or aAbBcC in most locales). This is mandated
by POSIX. If you need the old behavior, export LC_COLLATE=C.

The only problem is that bash doesn't pick up the change immediately (it has to
be started as LC_COLLATE=C bash), and this is fixed in bash-2.05b-7 in rawhide.

Comment 2 jar 2002-11-03 13:38:40 UTC
I understand, thanks. 
 
However, won't existing shell scripts break?  Perhaps
LC_COLLATE should be set to "C" in /etc/profile.d/glib2.*
to avoid this?  (Or something like it.)
 
The reason this was spotted is that a friend
of mine noticed an existing shell script broke
under Red Hat Linux 8.0.
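
For scripts that only need to test for upper-case letters, one option that nobody in this thread suggested is a POSIX character class, which does not depend on collation order. A minimal sketch follows; the function name is_upper_initial is hypothetical.

```shell
# Succeeds if the first character of $1 is an upper-case letter.
# [[:upper:]] is resolved through LC_CTYPE, not through the collation
# order, so lower-case letters never match it.
is_upper_initial() {
    case $1 in
        [[:upper:]]*) return 0 ;;
        *)            return 1 ;;
    esac
}

if is_upper_initial "Makefile"; then
    echo "Makefile starts with an upper-case letter"
fi
```

Whether this fits depends on the script: a range such as [A-F] has no character-class equivalent, and there setting LC_COLLATE=C is still the way to go.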


Comment 3 Jakub Jelinek 2002-11-03 15:05:27 UTC
You should set LC_COLLATE=C (or LC_ALL=C) right before invoking a program that
requires the C collation. For example, if you want to do an ASCII sort, you do
	somecommand | LC_ALL=C sort
This is nothing new in 8.0 - exactly the same behaviour was there in the 7.x
series (though in that case the locales weren't en_US.UTF-8 etc., but en_US
or en_US.ISO-8859-15).
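
A small, self-contained version of that pipeline, with printf standing in for somecommand (the input words and the variable name c_sorted are only illustrative):

```shell
# The LC_ALL=C prefix applies to this one sort process only; the rest of
# the script keeps whatever locale it was started with.
c_sorted=$(printf '%s\n' b A a B | LC_ALL=C sort)

echo "$c_sorted"
```

With C collation the lines come out in byte order: A, B, a, b.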

