Bug 19942 - LANG=en_US (the default) appears to fold lowercase to uppercase before sorting
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Linux
Classification: Retired
Component: glibc
Version: 7.0
Hardware: i386
OS: Linux
Priority: medium
Severity: high
Target Milestone: ---
Assignee: Jakub Jelinek
QA Contact: Aaron Brown
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2000-10-27 21:02 UTC by Ronald Cole
Modified: 2016-11-24 15:23 UTC

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2000-10-27 22:56:02 UTC
Embargoed:


Description Ronald Cole 2000-10-27 21:02:51 UTC
$ LANG= bash -c 'echo -e "a\nB" | sort'
B
a
$ LANG=en bash -c 'echo -e "a\nB" | sort'
B
a
$ LANG=en_US bash -c 'echo -e "a\nB" | sort'
a
B

!!!oops, that can't be right!!!   On AIX and HP-UX, all three commands
produce identical output (as I would expect no difference between "en" and
"en_US" in this regard).
The GNU C Library Reference Manual says that "[d]efining and installing
named locales [other than "C" or "POSIX"] is normally a responsibility of
... the person who installed the GNU C library".  I guess that would fall
on Red Hat for installing a broken locale and making it the default.

Comment 1 Jakub Jelinek 2000-10-28 06:17:00 UTC
Actually, it is right. Open any printed dictionary (be it English,
German, Norwegian or Czech) and see how the entries are sorted.
The fact that sorting has been broken on most OSes does not change that.
There is no such locale as en, so it falls back to C; that is why the
output of the first two commands is identical.
If you rely on ASCII sorting, use the C locale; if you want native-language
collation, use your own locale.
If AIX and HP-UX don't fold case, they are broken.
E.g. Solaris with the en_US locale sorts the same way as RHL 7.0.
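
For anyone who needs the byte-order result regardless of the system default, here is a minimal sketch of the advice above. The helper name sort_in_locale is made up for illustration; LC_ALL is used because it overrides LANG and every LC_* variable. The locale -a check assumes that utility is available on the system.

```shell
# Hypothetical helper: sort the two sample lines under an explicit locale.
# LC_ALL takes precedence over LANG and the individual LC_* variables.
sort_in_locale() {
    printf 'a\nB\n' | LC_ALL="$1" sort
}

# C locale: plain byte-order comparison, so 'B' (0x42) precedes 'a' (0x61).
c_result=$(sort_in_locale C)
printf '%s\n' "$c_result"

# Check whether a locale name is actually installed before relying on it.
# A name that is not installed leaves the program in the C locale.
if locale -a 2>/dev/null | grep -qx 'en'; then
    echo "locale 'en' is installed"
else
    echo "locale 'en' is not installed"
fi
```

Running sort_in_locale en_US instead gives dictionary-style ordering only if the en_US locale is installed on that machine.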

Comment 2 Ronald Cole 2000-10-28 22:13:16 UTC
Well, then the bug is Red Hat defaulting to LANG=en_US.  It should probably
default to either "C" or "POSIX", and users should change it to "en_US" if
that's what they want.  According to the GNU C Library Reference Manual, "C"
and "POSIX" are the only ones that can be considered "portable", as all
others are obviously vendor-supplied and therefore extensions.
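
A possible middle ground, sketched here as an assumption rather than anything proposed in this report: leave LANG at en_US but set only LC_COLLATE to C, so sorting is byte-ordered while the other locale categories are untouched. The function name sort_c_collation is made up for illustration, and LC_ALL must be unset because it would override LC_COLLATE.

```shell
# Hypothetical wrapper: keep LANG=en_US but force byte-order collation only.
sort_c_collation() {
    (
        # LC_ALL, if set, would override LC_COLLATE, so clear it in this subshell.
        unset LC_ALL
        LANG=en_US LC_COLLATE=C sort
    )
}

# 'B' sorts before 'a' because only the collation category uses the C locale.
mixed_result=$(printf 'a\nB\n' | sort_c_collation)
printf '%s\n' "$mixed_result"
```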

Comment 3 Ronald Cole 2000-10-28 23:15:59 UTC
I have entered bug #19973 against package "initscripts".

