Bug 17749 - sort generating incorrect sort order
Status: CLOSED NOTABUG
Product: Red Hat Linux
Classification: Retired
Component: textutils
Version: 6.2
Hardware: i386 Linux
Priority: medium
Severity: low
Assigned To: Bernhard Rosenkraenzer
Reported: 2000-09-20 12:48 EDT by Need Real Name
Modified: 2008-05-01 11:37 EDT (History)
Doc Type: Bug Fix
Last Closed: 2000-09-20 12:48:40 EDT
Description Need Real Name 2000-09-20 12:48:31 EDT
Using sort on my newsgroups generates the following order.
(co.politics shouldn't be at the end):
co.ads: 1-30011685
co.general: 1-30083370
co.jobs: 1-30314274
comp.home.automation: 1-30067541
comp.org.acm: 1-30000864
comp.os.linux.admin: 1-30008227
comp.os.linux.advocacy: 1-30284164
comp.os.linux.announce: 1-30008024
comp.os.linux.answers: 1-30002684
comp.os.linux.development: 1-30003740
comp.os.linux.hardware: 1-30149627
comp.os.linux.help: 1-30038282,30038297
comp.os.linux.misc: 1-30251481,30251490-30251491,30251496-30251497,30251551,30251557,30251562,30251577
comp.os.linux.networking: 1-30198051,30198159
comp.os.linux.questions: 1-30031078
comp.os.linux.redhat: 1-7299
comp.os.linux.setup: 1-30240083
comp.os.linux.x: 1-30095047
comp.os.linux.x.video: 1-30000128
comp.risks: 1-30000226
comp.sys.hp48: 1-30061373
comp.sys.ibm.pc.soundcard: 1-30002431
comp.sys.ibm.pc.soundcards: 1-30001688
comp.unix.admin: 1-30048211,30048254
comp.unix.advocacy: 1-30025765
comp.unix.dos-under-unix: 1-30000745
comp.unix.misc: 1-30016883
comp.unix.programmer: 1-30050456
comp.unix.questions: 1-30041951
comp.unix.shell: 1-30048270,30048328,30048335,30048418-30048420,30048422,30048432,30048450
comp.unix.unixware.announce: 1-30000022
comp.unix.user-friendly: 1-30001183
co.politics: 1-30022593,30022640,30022645,30022659
Comment 1 Bernhard Rosenkraenzer 2000-09-23 10:47:48 EDT
It's actually a feature (locale sorting).
Use

export LC_COLLATE="C"
or
whatever | LC_COLLATE="C" sort

to get the old behavior.
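
For illustration, a minimal sketch of the workaround applied to three group names taken from the report. It uses LC_ALL rather than LC_COLLATE on the assumption that LC_ALL may already be set in the environment, in which case it overrides LC_COLLATE. The variable names `names` and `sorted` are made up for this example.

```shell
# Three group names from the report, deliberately out of order.
names='comp.risks
co.politics
co.ads'

# LC_ALL takes precedence over LC_COLLATE and LANG, so this forces
# byte-order (C locale) comparison for this one sort invocation only.
sorted=$(printf '%s\n' "$names" | LC_ALL=C sort)

# In byte order '.' (0x2E) sorts before 'm' (0x6D), so both "co." groups
# come before "comp.risks".
printf '%s\n' "$sorted"
```

Because the assignment is a prefix on the sort command, the rest of the shell session keeps its locale.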
Comment 2 Ed Avis 2000-10-16 05:56:16 EDT
Setting $LANG causes the behaviour to change.  For example:

% echo -e "a\nB" | LANG= sort
B
a
% echo -e "a\nB" | LANG=en sort
B
a
% echo -e "a\nB" | LANG=en_US sort
a
B

Why should 'en' and 'en_US' be different?  (The same problem applies to en_GB,
en_AU; fr is okay but fr_FR is broken...)

There are a huge number of shell scripts, Makefiles and other programs out there
that expect sort to sort case-sensitively and in ASCII order.  They probably
don't expect that the sort order will change due to random environment
variables.  So I'd say that the default should be to keep the traditional sort
order (at least when the input data is ASCII) unless the user specifically asks
for something else.
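
A script that depends on the traditional order can protect itself by pinning the locale once, rather than relying on whatever the caller's environment happens to contain. A rough sketch, assuming the script does not also need localized messages or locale-aware character handling (LC_ALL=C switches those off too); the variable name `result` is only for this example:

```shell
# Pin the locale for everything this script runs from here on.
LC_ALL=C
export LC_ALL

# With the C locale, uppercase ASCII letters sort before lowercase ones,
# which is the behaviour older scripts and Makefiles were written against.
result=$(printf 'a\nB\n' | sort)
printf '%s\n' "$result"
```

If only the sort step needs byte order, prefixing that single command with LC_ALL=C is the narrower choice.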
Comment 3 Ed Avis 2000-10-17 07:14:28 EDT
The reason why 'en' is different to 'en_US' is that while the stuff in
/usr/share/locale/ is part of glibc, locale/en/ is generated by the package
'kpilot'!  I'm not sure why kpilot feels it needs a locale all to itself, but
the package probably needs fixing.  Similarly, some KDE packages have generated
an en_UK/ directory when the correct name should be en_GB/.  This is probably a
bug in KDE.

BTW, I strongly disagree with the resolution 'NOTABUG'.  Breaking the behaviour
of sort(1) is about as close to a major bug as you can get.  Although I realize
that POSIX mandates the brokenness.  The sheer number of bug reports submitted
about this should be an indication that sort is not doing the Right Thing.

The manual page could be a lot clearer about what is going on - it mentions
$LC_COLLATE but not $LANG.  Other things in the manual page like '-f fold lower
case to upper case' are also misleading if the sort order has already been
modified by a locale.
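
The interaction between $LANG and $LC_COLLATE follows the POSIX precedence rule: LC_ALL wins if it is set and non-empty, then the category variable (here LC_COLLATE), then LANG. The helper below is a hypothetical sketch of that rule, not part of sort or glibc, and its fallback to "C" when nothing is set is an assumption (POSIX leaves that default implementation-defined).

```shell
# Hypothetical helper: print the locale name that would govern collation,
# following the precedence LC_ALL > LC_COLLATE > LANG.
effective_collate() {
    if [ -n "${LC_ALL:-}" ]; then
        printf '%s\n' "$LC_ALL"
    elif [ -n "${LC_COLLATE:-}" ]; then
        printf '%s\n' "$LC_COLLATE"
    elif [ -n "${LANG:-}" ]; then
        printf '%s\n' "$LANG"
    else
        # Assumed default when no variable is set.
        printf '%s\n' "C"
    fi
}

# With LC_ALL and LC_COLLATE empty, LANG alone decides the sort order,
# which is why setting only $LANG changes what sort does.
LC_ALL= LC_COLLATE= LANG=POSIX effective_collate
```

The `locale` command reports the same information from the C library's point of view, including whether the named locale is actually installed.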
Comment 4 Need Real Name 2000-11-28 10:56:59 EST
Let's add a worse effect caused by the same default:
using a locale-aware shell (such as bash 2), what happens with
rm [A-Z]*

1) "en_US" locale settings:
Removes every file whose name starts with a letter, except if the letter is a lowercase z.

2) "C" locale settings:
Removes every file whose name starts with a capital letter.


Yes, this still works as documented, but choosing a default environment that
behaves so differently from what has been considered "normal behaviour" should
come with a big warning sticker taped in a visible place.
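
One way to sidestep the range problem is the POSIX character class [[:upper:]], which names the uppercase letters directly instead of relying on where A and Z fall in the collation order. A small sketch in a throwaway directory; the file names are invented, `mktemp -d` is assumed to be available (it is common but not required by POSIX), and echo stands in for rm so nothing is deleted by the pattern:

```shell
# Work in a scratch directory with two capitalised and two lowercase names.
demo_dir=$(mktemp -d)
touch "$demo_dir/Alpha" "$demo_dir/Zeta" "$demo_dir/beta" "$demo_dir/zulu"

# Preview what the pattern expands to before handing it to rm.
# [[:upper:]] matches uppercase letters only, whatever the collation order.
matched=$(cd "$demo_dir" && echo [[:upper:]]*)
printf '%s\n' "$matched"

# Clean up the scratch directory.
rm -r "$demo_dir"
```

Previewing a glob with echo before running rm is a cheap check whenever the locale of the environment is not known.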
