|Summary:||sort generating incorrect sort order|
|Product:||[Retired] Red Hat Linux||Reporter:||Need Real Name <a.f.3>|
|Component:||textutils||Assignee:||Bernhard Rosenkraenzer <bero>|
|Status:||CLOSED NOTABUG||QA Contact:|
|Fixed In Version:||Doc Type:||Bug Fix|
|Last Closed:||2000-09-20 16:48:40 UTC||Type:||---|
Description Need Real Name 2000-09-20 16:48:31 UTC
Using sort on my newsgroups generates the following order (co.politics shouldn't be at the end):

co.ads: 1-30011685
co.general: 1-30083370
co.jobs: 1-30314274
comp.home.automation: 1-30067541
comp.org.acm: 1-30000864
comp.os.linux.admin: 1-30008227
comp.os.linux.advocacy: 1-30284164
comp.os.linux.announce: 1-30008024
comp.os.linux.answers: 1-30002684
comp.os.linux.development: 1-30003740
comp.os.linux.hardware: 1-30149627
comp.os.linux.help: 1-30038282,30038297
comp.os.linux.misc: 1-30251481,30251490-30251491,30251496-30251497,30251551,30251557,30251562,30251577
comp.os.linux.networking: 1-30198051,30198159
comp.os.linux.questions: 1-30031078
comp.os.linux.redhat: 1-7299
comp.os.linux.setup: 1-30240083
comp.os.linux.x: 1-30095047
comp.os.linux.x.video: 1-30000128
comp.risks: 1-30000226
comp.sys.hp48: 1-30061373
comp.sys.ibm.pc.soundcard: 1-30002431
comp.sys.ibm.pc.soundcards: 1-30001688
comp.unix.admin: 1-30048211,30048254
comp.unix.advocacy: 1-30025765
comp.unix.dos-under-unix: 1-30000745
comp.unix.misc: 1-30016883
comp.unix.programmer: 1-30050456
comp.unix.questions: 1-30041951
comp.unix.shell: 1-30048270,30048328,30048335,30048418-30048420,30048422,30048432,30048450
comp.unix.unixware.announce: 1-30000022
comp.unix.user-friendly: 1-30001183
co.politics: 1-30022593,30022640,30022645,30022659
Comment 1 Bernhard Rosenkraenzer 2000-09-23 14:47:48 UTC
It's actually a feature (locale sorting). Use export LC_COLLATE="C", or run whatever | LC_COLLATE="C" sort, to get the old behavior.
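The per-command override suggested above can be sketched as follows, assuming a POSIX shell and a sort that honours the locale environment variables. The helper name sort_bytes is hypothetical and not part of textutils. This sketch sets LC_ALL rather than LC_COLLATE because LC_ALL, when set, takes precedence over both LC_COLLATE and LANG, so the override still applies if the environment already exports LC_ALL.

```shell
# sort_bytes (hypothetical helper): sort stdin or the named files in plain
# byte order, whatever locale the caller's environment selects.
sort_bytes() {
    # The assignment applies to this one sort invocation only.
    LC_ALL=C sort "$@"
}

# Sample lines taken from the report. In byte order '.' (0x2E) sorts before
# 'm' (0x6D), so co.politics comes out ahead of every comp.* group.
printf '%s\n' \
    'comp.risks: 1-30000226' \
    'co.politics: 1-30022593,30022640,30022645,30022659' \
    'co.ads: 1-30011685' | sort_bytes
```

The caller's own environment is left untouched, which may be preferable to a global export when only one pipeline depends on the traditional order.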
Comment 2 Ed Avis 2000-10-16 09:56:16 UTC
Setting $LANG causes the behaviour to change. For example:

% echo -e "a\nB" | LANG= sort
B
a
% echo -e "a\nB" | LANG=en sort
B
a
% echo -e "a\nB" | LANG=en_US sort
a
B

Why should 'en' and 'en_US' be different? (The same problem applies to en_GB, en_AU; fr is okay but fr_FR is broken...)

There are a huge number of shell scripts, Makefiles and other programs out there that expect sort to sort case-sensitively and in ASCII order. They probably don't expect that the sort order will change due to random environment variables. So I'd say that the default should be to keep the traditional sort order (at least when the input data is ASCII) unless the user specifically asks for something else.
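A script that depends on ASCII order could probe the behaviour of sort before relying on it. The following is only a sketch: the function name sort_is_bytewise is hypothetical, and the probe reuses the two-line "a"/"B" input from the example above.

```shell
# sort_is_bytewise (hypothetical helper): succeeds when sort, run under the
# current environment, places "B" before "a", as plain byte order does.
sort_is_bytewise() {
    [ "$(printf 'a\nB\n' | sort | head -n 1)" = "B" ]
}

if sort_is_bytewise; then
    echo "sort uses byte order here"
else
    echo "sort uses locale collation here; consider LC_ALL=C for this script"
fi
```

Which branch runs depends on the locale variables in effect and on the locales installed on the machine, so no particular output should be assumed.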
Comment 3 Ed Avis 2000-10-17 11:14:28 UTC
The reason why 'en' is different to 'en_US' is that while the stuff in /usr/share/locale/ is part of glibc, locale/en/ is generated by the package 'kpilot'! I'm not sure why kpilot feels it needs a locale all to itself, but the package probably needs fixing. Similarly, some KDE packages have generated an en_UK/ directory when the correct name should be en_GB/. This is probably a bug in KDE.

BTW, I strongly disagree with the resolution 'NOTABUG'. Breaking the behaviour of sort(1) is about as close to a major bug as you can get, although I realize that POSIX mandates the brokenness. The sheer number of bug reports submitted about this should be an indication that sort is not doing the Right Thing.

The manual page could be a lot clearer about what is going on: it mentions $LC_COLLATE but not $LANG. Other things in the manual page, like '-f fold lower case to upper case', are also misleading if the sort order has already been modified by a locale.
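To make the remark about -f concrete, here is a small sketch that pins the C locale so that -f is the only thing influencing how case is treated. The three-line input is made up for illustration.

```shell
# Byte order: every capital letter sorts before any lower-case letter.
printf '%s\n' b A C | LC_ALL=C sort       # prints A, C, b

# With -f, lower case is folded to upper case before comparing.
printf '%s\n' b A C | LC_ALL=C sort -f    # prints A, b, C
```

Under a locale whose collation already interleaves upper and lower case, the two commands may produce the same order, which is why the manual page wording can mislead.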
Comment 4 Need Real Name 2000-11-28 15:56:59 UTC
Let's add a worse effect caused by the same default. With a locale-aware shell (such as bash 2), what happens with rm [A-Z]*?

1) "en_US" locale settings: Removes every file whose name starts with a letter, except if the letter is a lowercase z.
2) "C" locale settings: Removes every file whose name starts with a capital letter.

Yes, this still works as documented, but choosing a default environment that works this differently from what has been considered "normal behaviour" should have a big warning sticker taped in a visible place.
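One way to avoid depending on how a range such as [A-Z] collates is to use the POSIX character class [[:upper:]] and to pin the locale for the expansion. The sketch below only lists the matches instead of removing anything; the function name list_upper and the file names are hypothetical.

```shell
# list_upper (hypothetical helper): print the names in directory $1 that
# start with an upper-case ASCII letter. The body runs in a subshell, so the
# cd and the locale change do not leak into the caller.
list_upper() (
    cd "$1" || exit 1
    LC_ALL=C
    export LC_ALL
    # [[:upper:]] is a character class, not a collation range. If nothing
    # matches, the shell leaves the pattern itself as the single argument.
    printf '%s\n' [[:upper:]]*
)

demo_dir=$(mktemp -d)
touch "$demo_dir/Alpha" "$demo_dir/beta" "$demo_dir/Zulu" "$demo_dir/zeta"
list_upper "$demo_dir"    # prints Alpha and Zulu only
rm -r "$demo_dir"
```

Printing the expansion first, and only then passing the same pattern to rm, gives a chance to notice a surprising match before anything is deleted.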