Bug 428494

Summary: sort man page needs to be updated to reflect i18n
Product: [Fedora] Fedora
Reporter: Harold Kornylak <kornylak>
Component: coreutils
Assignee: Ondrej Vasik <ovasik>
Status: CLOSED NOTABUG
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: medium
Priority: low
Version: rawhide
CC: mattdm, twaugh
Target Milestone: ---
Keywords: Reopened
Target Release: ---
Hardware: i386
OS: Linux
Doc Type: Bug Fix
Last Closed: 2008-01-17 11:13:30 EST

Description Harold Kornylak 2008-01-12 01:21:35 EST
Description of problem:
sort behaves as if the -b and -d options are always set: it ignores all
leading blanks and special characters.

Version-Release number of selected component (if applicable):
May 25 2005 compile date of /bin/sort

How reproducible:
Always


Steps to Reproduce:
1. Create a file in which some lines have a leading space or a leading _
2. Run sort on the file

Actual results:
sort order ignores leading spaces

Expected results:
Lines with a leading space or _ should sort to the top unless -b and -d are set


Additional info:
Comment 1 Matthew Miller 2008-01-12 14:13:56 EST
Fedora Core 4 is no longer supported, but I'd like to make sure we get this
problem solved for you. Can you try on a more recent release?
Comment 2 Harold Kornylak 2008-01-14 10:21:02 EST
I will post this against FC 8.
I have verified the same problem on Fedora 6 and 8:

Sort behaves as if the -b and -d options are set: if a blank or a special
character is in column one of a specified field, sort looks ahead for the
first alphanumeric character and uses that to begin the key.  This is not in
accordance with the man pages, and I need to be able to sort on the field I
specified for my programs to work correctly.
How do I unselect this behavior?  Specifically, I want column one to be
ordered all on its own, and only then to look at column two, unless I ask
for -b or -d.

So far the only workaround I have found is to sort each column as a separate
field, such as -k 1.1,1.1 -k 1.2,1.2, etc.

Currently, plain sort produces:

a
 a
  a
_a
A
aa
Z
_Z

I want:

  a
 a
_a
_Z
a
aa
A
Z 
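
The per-character key workaround mentioned in this comment can be tried on the
sample lines roughly as follows. This is only a sketch: the variable name
by_column is made up for illustration, three key positions are assumed to be
enough for this sample, and the resulting order still depends on the locale in
effect, so no particular output is claimed here.

```shell
# Sort the sample lines from this comment with one key per character position.
# Each -k F.C,F.C key covers exactly one character of field F (here field 1),
# so every column is compared on its own before the next one is looked at.
by_column=$(printf '%s\n' 'a' ' a' '  a' '_a' 'A' 'aa' 'Z' '_Z' |
  sort -k 1.1,1.1 -k 1.2,1.2 -k 1.3,1.3)

# Print the result; the order depends on the active LC_COLLATE setting.
printf '%s\n' "$by_column"
```

Because -b is not given, leading blanks count as characters of the field, so
position 1.1 really is the first character of the line.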
Comment 3 Matthew Miller 2008-01-14 10:46:16 EST
I see what's going on. This is documented in the info page (type 'info sort')
but not in the man page. The sort program (like all of coreutils) is
internationalized, which means it respects the LANG and LC_* variables. The
relevant part here is:

  Unless otherwise specified, all comparisons use the character collating
  sequence specified by the `LC_COLLATE' locale.

The traditional Unix behavior is provided by the special "C" locale. Try running

  LC_COLLATE=C sort filetosort.txt

or more generally

  LANG=C sort filetosort.txt

You can of course set these environment variables in your startup scripts, or
globally by editing /etc/sysconfig/i18n.
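
As a small illustration of the C locale behavior, the sample lines from
comment 2 can be piped through sort with the locale overridden for that one
command. The variable name sorted is made up for this sketch, and LC_ALL is
used instead of LC_COLLATE only so that the result does not depend on whatever
else is set in the environment.

```shell
# Sort the sample lines from comment 2 by plain byte value.
# LC_ALL=C applies to this single sort invocation only; the shell's own
# locale settings are left untouched.
sorted=$(printf '%s\n' 'a' ' a' '  a' '_a' 'A' 'aa' 'Z' '_Z' | LC_ALL=C sort)

printf '%s\n' "$sorted"
```

In the C locale this prints the lines in byte order:

  a
 a
A
Z
_Z
_a
a
aa

Space sorts first, then uppercase letters, then the underscore, then lowercase
letters. Note that this is not quite the order asked for in comment 2, where
the underscore lines come before all letters.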

(Note that this is standard behavior on proprietary Unix too, although many
people haven't enabled internationalization.)
Comment 4 Matthew Miller 2008-01-14 10:46:49 EST
Reopening, because it wouldn't hurt to fix the man page.
Comment 5 Ondrej Vasik 2008-01-17 11:13:30 EST
The sort man page already says:

*** WARNING *** The locale specified by the  environment  affects  sort
order.  Set LC_ALL=C to get the traditional sort order that uses native
byte values.

and 

 The full documentation for sort is maintained as a Texinfo manual.  If
 the info and sort programs are properly installed  at  your  site,  the
 command
        info sort
 should give you access to the complete manual.

This should usually be enough. In any case, a more complex man page would not
be accepted upstream (I think it is complex enough as it is), and I do not
think it is a good idea to carry such patches only on the Red Hat side.

The problem was explained via email and in the duplicate Bugzilla entry #428679.

Closing as NOTABUG.
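
For reference, the three variables mentioned in this report (LC_ALL,
LC_COLLATE and LANG) are consulted in a fixed order, which is why LANG=C alone
may have no effect when LC_COLLATE or LC_ALL is already set. The helper below
is a hypothetical sketch of that lookup order for the collation category; the
function name effective_collate is made up, and the final fallback to "C" is an
assumption, since the default when nothing is set is up to the implementation.

```shell
# Report which locale name would govern collation, following the usual
# precedence: LC_ALL first, then LC_COLLATE, then LANG.
# Empty values are treated the same as unset ones.
effective_collate() {
  if [ -n "${LC_ALL:-}" ]; then
    printf '%s\n' "$LC_ALL"
  elif [ -n "${LC_COLLATE:-}" ]; then
    printf '%s\n' "$LC_COLLATE"
  elif [ -n "${LANG:-}" ]; then
    printf '%s\n' "$LANG"
  else
    # Assumption: with nothing set, the default is taken to be "C".
    printf '%s\n' 'C'
  fi
}

# Show the setting that applies in the current environment.
effective_collate
```

This only inspects the environment; it does not check whether the named locale
is actually installed on the system.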