Bug 531355

Summary: sort join combination produces not in sorted order messages unless LANG=C
Product: [Fedora] Fedora
Reporter: Mike Hanafey <mike.hanafey>
Component: coreutils
Assignee: Ondrej Vasik <ovasik>
Status: CLOSED NOTABUG
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: medium
Docs Contact:
Priority: low
Version: 11
CC: kdudka, ovasik, p, twaugh
Hardware: All   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2009-10-30 09:25:36 EDT
Attachments:
2 example files. (flags: none)

Description Mike Hanafey 2009-10-27 16:25:37 EDT
Created attachment 366342 [details]
2 example files.

Description of problem:


Version-Release number of selected component (if applicable): coreutils-7.2-4.fc11.x86_64


How reproducible: Always


Steps to Reproduce:
Using the small example files provided as an attachment --

for l in en_US.UTF-8 en_US.utf8 en_US.iso88591 en_US.ISO-8859-1 C; do
  echo "-------------------------------------------------------------------------------------------------"
  echo "LANG=$l"
  export LANG=$l
  sort -o all-primary.csv all-primary.csv
  sort -o db-primary.csv db-primary.csv
  join -v 1 all-primary.csv db-primary.csv
done

  
Actual results:
-------------------------------------------------------------------------------------------------
LANG=en_US.UTF-8
Industrial CHO and Lipids
join: file 1 is not in sorted order
join: file 2 is not in sorted order
null
Root
-------------------------------------------------------------------------------------------------
LANG=en_US.utf8
Industrial CHO and Lipids
join: file 1 is not in sorted order
join: file 2 is not in sorted order
null
Root
-------------------------------------------------------------------------------------------------
LANG=en_US.iso88591
Industrial CHO and Lipids
join: file 1 is not in sorted order
join: file 2 is not in sorted order
null
Root
-------------------------------------------------------------------------------------------------
LANG=en_US.ISO-8859-1
Industrial CHO and Lipids
join: file 1 is not in sorted order
join: file 2 is not in sorted order
null
Root
-------------------------------------------------------------------------------------------------
LANG=C
Industrial CHO and Lipids
Root
null



Expected results:
No sort order messages.

Additional info:
Comment 1 Pádraig Brady 2009-10-27 17:04:13 EDT
This is, I think, because sort compares the whole line by default, whereas join uses only the first field. Specifically, in non-C locales the ' ' characters sort differently relative to other characters. If you want `join` to use the whole line, use -t'\0'.
Comment 2 Mike Hanafey 2009-10-30 09:25:36 EDT
Sorry, my mistake. Thought I was doing this on whole lines. The -t option was left off.
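
A minimal sketch of the fix discussed in the comments above, assuming a POSIX shell with `mktemp`, `sort`, and `join` available. The file names and sample lines are invented for illustration (the attachment's contents are not reproduced here). The idea is to sort and join under the same locale (`LC_ALL=C`) and with the same key. A tab is used as the field separator on the assumption that no line contains one, so each whole line is treated as a single field by both tools.

```shell
#!/bin/sh
# Hypothetical sample data; the real attachment contents are not shown in this report.
tmp=$(mktemp -d)
printf '%s\n' 'null' 'Root' 'Industrial CHO and Lipids' 'Industrial Oils' > "$tmp/all.txt"
printf '%s\n' 'Industrial Oils' > "$tmp/db.txt"

# Assumption: a tab never occurs in the data, so with -t TAB
# the first field is the entire line for both sort and join.
tab=$(printf '\t')

# Sort both files on the same key and in the same locale that join will use.
LC_ALL=C sort -t "$tab" -k 1,1 -o "$tmp/all.txt" "$tmp/all.txt"
LC_ALL=C sort -t "$tab" -k 1,1 -o "$tmp/db.txt" "$tmp/db.txt"

# Print lines of the first file that have no match in the second.
# stderr is captured too, so any "not in sorted order" message would show up here.
unmatched=$(LC_ALL=C join -t "$tab" -v 1 "$tmp/all.txt" "$tmp/db.txt" 2>&1)

printf '%s\n' "$unmatched"
rm -rf "$tmp"
```

With this sample data the script prints the three unmatched lines in byte order (Industrial CHO and Lipids, Root, null) and no sort-order messages. The -t'\0' suggested in Comment 1 is another way to make `join` key on the whole line.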