Red Hat Bugzilla – Bug 43564
/bin/sort is sorting by case-folded alphabetic order!
Last modified: 2005-10-31 17:00:50 EST
From Bugzilla Helper:
User-Agent: Mozilla/4.76 [en] (X11; U; Linux 2.4.2-2 i686)
Description of problem:
When run without any options, /bin/sort from the textutils-2.0.11-7 RPM
orders its input lines by comparing only their alphabetic characters,
case-folded. This is terrible! /bin/sort is a basic and essential
piece of Unix plumbing. I don't feel safe using any Unix which doesn't
have a properly working /bin/sort installed.
Steps to Reproduce:
1. Pipe the following text into /bin/sort (with no options)
Actual Results: /foobaa
Expected Results: /foo/Baz
I'm putting this down as "high" severity, because somebody could lose their
job for recommending Red Hat Linux with a basic utility bug like this.
Shell scripts which use sort are silently screwing up data all around
the world as we sit here.
Okay, so this is caused by LC_ALL not being set to POSIX in .bashrc, but that's
still a severe bug.
Read through http://mail.gnu.org/pipermail/bug-textutils/ to see the bullshit
that various GNU volunteers have had to deal with because of this bug (going
back to at least October 1999). I'm sure it's just an honest mistake, but it's
hard not to get angry at Red Hat about something like this, especially given
that the bug has existed for such a long time.
There is no point in getting angry. This is not a mistake or a bug; sort
works as advertised. See the (texinfo) docs:
Unless otherwise specified, all comparisons use the character
collating sequence specified by the `LC_COLLATE' locale.
The way sorting works is defined by the locale. As the author of the Slovak
locale in glibc (with the collating part copied from the Czech one), I know
these issues quite well. We use one of the fancier locales and our sorting
standard is actually unimplementable (it even requires the knowledge of the
Sort is a _text_ sorting utility, and it should sort exactly as the locale
prescribes. Shell scripts that corrupt data because of this are broken, and
the collating order is the lesser problem: not resetting LC_NUMERIC, or
grepping for particular strings in program output, can be even worse. There
are many hidden gotchas like this, e.g. [A-Z]* will match foo in some locales.
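To illustrate the point about scripts, here is a minimal sketch of a script that pins its own locale instead of inheriting the user's. The helper name starts_upper is made up for this example. With LC_ALL set to C, the range [A-Z] covers only the ASCII uppercase letters, so foo is not matched.

```shell
#!/bin/sh
# Pin the locale for the whole script so that collation, character ranges
# and number formatting do not depend on the invoking user's environment.
LC_ALL=C
export LC_ALL

# starts_upper: hypothetical helper; succeeds only if $1 begins with A-Z.
starts_upper() {
    case $1 in
        [A-Z]*) return 0 ;;
        *)      return 1 ;;
    esac
}

if starts_upper foo; then echo "foo matched"; else echo "foo did not match"; fi
if starts_upper Foo; then echo "Foo matched"; else echo "Foo did not match"; fi
```

LC_ALL takes precedence over the individual LC_* variables and over LANG, so this also covers the LC_NUMERIC case mentioned above.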
If you don't like it,
echo "LC_COLLATE=C" >>/etc/sysconfig/i18n
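A less invasive alternative to changing the system-wide setting is to override the locale only for the sort invocation that needs byte ordering. The following is a sketch, not a tested recommendation for this RPM; the wrapper name sort_bytes is hypothetical, and the two input lines are the ones from the result fields of the original report.

```shell
# sort_bytes: hypothetical wrapper that sorts stdin in plain byte order,
# whatever locale the caller happens to be running under.
sort_bytes() {
    # The assignment applies to this one sort process only.
    LC_ALL=C sort "$@"
}

# '/' (0x2F) sorts before 'b' (0x62), so /foo/Baz comes out first.
printf '%s\n' '/foobaa' '/foo/Baz' | sort_bytes
```

Note that LC_ALL, if it is set anywhere in the environment, overrides LC_COLLATE, so a LC_COLLATE=C line in /etc/sysconfig/i18n has no effect for users who export LC_ALL themselves.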