Red Hat Bugzilla – Full Text Bug Listing
|Summary:||sort man page needs to be updated to reflect i18n|
|Product:||[Fedora] Fedora||Reporter:||Harold Kornylak <kornylak>|
|Component:||coreutils||Assignee:||Ondrej Vasik <ovasik>|
|Status:||CLOSED NOTABUG||QA Contact:||Fedora Extras Quality Assurance <extras-qa>|
|Fixed In Version:||Doc Type:||Bug Fix|
|Last Closed:||2008-01-17 11:13:30 EST||Type:||---|
Description Harold Kornylak 2008-01-12 01:21:35 EST
Description of problem:
sort behaves as if the -b and -d options are always set. It ignores all leading blanks and special characters.

Version-Release number of selected component (if applicable):
May 25 2005 compile date of /bin/sort

How reproducible:
Always

Steps to Reproduce:
1. Create a file in which some lines have a leading space or a leading _.
2. Run sort on it.

Actual results:
The sort order ignores leading spaces.

Expected results:
Lines with a leading space or _ should sort to the top unless -b and -d are set.

Additional info:
Comment 1 Matthew Miller 2008-01-12 14:13:56 EST
Fedora Core 4 is no longer supported, but I'd like to make sure we get this problem solved for you. Can you try on a more recent release?
Comment 2 Harold Kornylak 2008-01-14 10:21:02 EST
I will post this to FC 8: I have verified the same problem on Fedora 6 and 8. Sort behaves as if the -b and -d options are set: if a blank or a special character is in column one of a specified field, sort looks ahead for the first alphanumeric character and uses that to begin the key. This is not in accordance with the man pages, and I need to be able to sort on the field I specified for my programs to work correctly.

How do I unselect this behavior? Specifically, I want column one to be ordered all on its own, and only then column two to be considered, unless I ask for -b or -d. So far the only workaround I have found is to sort each column as a separate field, such as -k 1.1,1.1 -k 1.2,1.2 etc.

Now sort results in: a a a _a A aa Z _Z

I want: a a _a _Z a aa A Z
Comment 3 Matthew Miller 2008-01-14 10:46:16 EST
I see what's going on. This is documented in the info page (type 'info sort') but not in the man page. The sort program (like all of the coreutils) is internationalized, which means it respects the LANG and LC_* variables. The relevant part here is:

    Unless otherwise specified, all comparisons use the character collating sequence specified by the `LC_COLLATE' locale.

The traditional Unix behavior is provided by the special "C" locale. Try running

    LC_COLLATE=C sort filetosort.txt

or more generally

    LANG=C sort filetosort.txt

You can of course set these environment variables in your startup scripts, or globally by editing /etc/sysconfig/i18n. (Note that this is standard behavior on proprietary Unix too, although many people haven't enabled internationalization.)
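As a rough illustration of the difference, the following sketch sorts a small made-up sample (not the reporter's actual data) twice: once under whatever locale is active, and once with the C locale forced for that single command. It uses LC_ALL=C rather than LC_COLLATE=C on the assumption that LC_ALL may already be set in the environment, in which case it would override LC_COLLATE and LANG.

```shell
# Illustrative sample only; the reporter's real file is not available.
tmp=$(mktemp)
printf '%s\n' 'a' ' a' '_a' 'A' 'aa' 'Z' '_Z' > "$tmp"

# Locale-aware order: the result depends on the active collation settings.
sort "$tmp"

# Byte-value order, forced for this one command only.
c_sorted=$(LC_ALL=C sort "$tmp")
printf '%s\n' "$c_sorted"

rm -f "$tmp"
```

In the C locale the comparison is by byte value, so a leading space sorts before uppercase letters, which sort before the underscore, which sorts before lowercase letters.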
Comment 4 Matthew Miller 2008-01-14 10:46:49 EST
Reopening, because it wouldn't hurt to fix the man page.
Comment 5 Ondrej Vasik 2008-01-17 11:13:30 EST
The man page of sort already says:

    *** WARNING *** The locale specified by the environment affects sort order. Set LC_ALL=C to get the traditional sort order that uses native byte values.

and

    The full documentation for sort is maintained as a Texinfo manual. If the info and sort programs are properly installed at your site, the command info sort should give you access to the complete manual.

This should usually be enough. In any case, a more complex man page would not be accepted by upstream (I think it is complex enough as it is), and I do not think it is a good idea to keep such patches only on the Red Hat side. The problem was explained via email and in duplicate bugzilla #428679. Closing as NOTABUG.