Bug 428494 - sort man page needs to be updated to reflect i18n
Summary: sort man page needs to be updated to reflect i18n
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Fedora
Classification: Fedora
Component: coreutils
Version: rawhide
Hardware: i386
OS: Linux
low
medium
Target Milestone: ---
Assignee: Ondrej Vasik
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2008-01-12 06:21 UTC by Harold Kornylak
Modified: 2008-01-17 16:13 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2008-01-17 16:13:30 UTC
Type: ---
Embargoed:



Description Harold Kornylak 2008-01-12 06:21:35 UTC
Description of problem:
sort behaves as if the -b and -d options are always set: it ignores all
leading blanks and special characters.

Version-Release number of selected component (if applicable):
May 25 2005 compile date of /bin/sort

How reproducible:
Always


Steps to Reproduce:
1. Create a file with some lines having a leading space or a leading _
2. Run sort on the file
  
Actual results:
sort order ignores leading spaces

Expected results:
lines with leading space or _ should sort to top unless -b and -d set


Additional info:

Comment 1 Matthew Miller 2008-01-12 19:13:56 UTC
Fedora Core 4 is no longer supported, but I'd like to make sure we get this
problem solved for you. Can you try on a more recent release?

Comment 2 Harold Kornylak 2008-01-14 15:21:02 UTC
I will post this to FC 8:
I have verified the same problem on Fedora 6 and 8:

Sort is set up to behave as if the -b and -d options are set, namely that if
a blank or a special character is in column one of a specified field, then
sort looks ahead for the first alphanumeric character and uses that to begin
the key.  This is not in accordance with the man pages, and I need to be
able to sort on the field I specified for my programs to work correctly.
How do I unselect this behavior?  Specifically, I want column one to be
ordered all on its own, and only then to look at column two, unless I ask
for -b or -d.

So far the only workaround I have found is to sort each column as a separate
field, such as -k 1.1,1.1 -k 1.2,1.2, etc.

Currently, sort produces:

a
 a
  a
_a
A
aa
Z
_Z

I want:

  a
 a
_a
_Z
a
aa
A
Z 


Comment 3 Matthew Miller 2008-01-14 15:46:16 UTC
I see what's going on. This is documented in the info page (type 'info sort')
but not in the man page. The sort program (like all of the coreutils) is
internationalized, which means it respects the LANG and LC_* variables. The
relevant part here is:

  Unless otherwise specified, all comparisons use the character collating sequence  
  specified by the `LC_COLLATE' locale.

The traditional Unix behavior is provided by the special "C" locale. Try running

  LC_COLLATE=C sort filetosort.txt

or more generally

  LANG=C sort filetosort.txt

You can of course set these environment variables in your startup scripts, or
globally by editing /etc/sysconfig/i18n.

(Note that this is standard behavior on proprietary Unix too, although many
people haven't enabled internationalization.)
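
To make the "C" locale ordering concrete, here is a minimal Python sketch that orders the sample lines from Comment 2 by raw byte values, which is what the traditional sort order amounts to. The helper name c_locale_sort is hypothetical and used only for illustration; this emulates the ordering in Python and does not call sort itself.

```python
# Sample lines from Comment 2; leading spaces and underscores are significant.
lines = ["a", " a", "  a", "_a", "A", "aa", "Z", "_Z"]


def c_locale_sort(items):
    """Order strings by their raw byte values (hypothetical helper).

    This mirrors the traditional byte-value ordering: no characters are
    skipped, and no locale-specific collation rules are applied.
    """
    return sorted(items, key=lambda s: s.encode("utf-8"))


for line in c_locale_sort(lines):
    # repr() keeps the leading spaces visible in the output.
    print(repr(line))
```

In byte order, the space (32) sorts before uppercase letters (65-90), which sort before the underscore (95), which sorts before lowercase letters (97-122). Leading blanks and underscores are therefore no longer ignored, but uppercase lines come before underscore and lowercase lines, which is not quite the same as the order requested in Comment 2.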

Comment 4 Matthew Miller 2008-01-14 15:46:49 UTC
Reopening, because it wouldn't hurt to fix the man page.

Comment 5 Ondrej Vasik 2008-01-17 16:13:30 UTC
The sort man page already states:

*** WARNING *** The locale specified by the  environment  affects  sort
order.  Set LC_ALL=C to get the traditional sort order that uses native
byte values.

and 

 The full documentation for sort is maintained as a Texinfo manual.  If
 the info and sort programs are properly installed  at  your  site,  the
 command
        info sort
 should give you access to the complete manual.

This should usually be enough. In any case, a more complex man page would not
be accepted upstream (I think it is complex enough as it is), and I do not
think it is a good idea to keep such patches only on the Red Hat side.

The problem was explained via email and in the duplicate Bugzilla bug #428679.

Closing NOTABUG for me...

