Bug 560127 - 'sort -V' sorts alpha parts incorrectly
Summary: 'sort -V' sorts alpha parts incorrectly
Keywords:
Status: CLOSED CANTFIX
Alias: None
Product: Fedora
Classification: Fedora
Component: coreutils
Version: 12
Hardware: All
OS: Linux
Priority: low
Severity: medium
Target Milestone: ---
Assignee: Kamil Dudka
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2010-01-29 22:47 UTC by Bruce Jerrick
Modified: 2010-09-28 10:13 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2010-09-28 10:13:14 UTC
Type: ---
Embargoed:



Description Bruce Jerrick 2010-01-29 22:47:04 UTC
Description of problem:
Given input with alpha parts with different lengths, a non-alphanum
separator, and a numeric part, 'sort -V' produces incorrect output.
It seems to be ignoring length differences of the alpha part even
when the separator and numeric parts are the same in all input.

Version-Release number of selected component (if applicable):
coreutils-7.6-8.fc12
coreutils-7.2-7.fc11

How reproducible:
100%

Steps to Reproduce:
sort -V
aaa.1
aaaa.1
^D

Actual results:
aaaa.1
aaa.1

Expected results:
aaa.1
aaaa.1

Additional info:

Without -V, the result is correct.
If the '.1'  is omitted, result is correct.
If just the '.' is omitted, result is correct.
Behavior is the same if '-' is used instead of '.' .

Comment 1 Kamil Dudka 2010-01-29 23:11:03 UTC
Thank you for filing the bug!

> Actual results:
> aaaa.1
> aaa.1

The core of the version-compare predicate of 'sort -V' is taken from 'dpkg', which behaves the same way:

$ dpkg --compare-versions aaaa.1 \< aaa.1 && echo LT
LT

It's partially documented within the 'filevercmp' gnulib module:

/* slightly modified verrevcmp function from dpkg
   S1, S2 - compared string
   S1_LEN, S2_LEN - length of strings to be scanned

   This implements the algorithm for comparison of version strings
   specified by Debian and now widely adopted.  The detailed
   specification can be found in the Debian Policy Manual in the
   section on the `Version' control field.  This version of the code
   implements that from s5.6.12 of Debian Policy v3.8.0.1
   http://www.debian.org/doc/debian-policy/ch-controlfields.html#s-f-Version */

As a first step, I propose improving the documentation on the coreutils side.
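
To make the behaviour easier to follow, here is a minimal Python sketch of the Debian-style comparison that the gnulib comment above refers to. It is an illustrative port written for this report, not the gnulib or dpkg source. The helper name order and the offset of 256 (UCHAR_MAX + 1 with 8-bit chars) are assumptions, and the extra handling that filevercmp does around verrevcmp is left out.

```python
from functools import cmp_to_key
import string


def order(ch):
    """Sort weight of a single character (assumed helper name)."""
    if ch in string.digits:
        return 0              # digits are compared later, as numbers
    if ch in string.ascii_letters:
        return ord(ch)        # letters sort before other characters
    if ch == "~":
        return -1             # tilde sorts before everything, even end of string
    return ord(ch) + 256      # e.g. '.' and '-' sort after all letters


def verrevcmp(s1, s2):
    """Return <0, 0 or >0, alternating non-digit and digit phases."""
    i = j = 0
    n1, n2 = len(s1), len(s2)
    while i < n1 or j < n2:
        first_diff = 0
        # Non-digit phase: compare character weights one by one.
        while (i < n1 and s1[i] not in string.digits) or \
              (j < n2 and s2[j] not in string.digits):
            c1 = order(s1[i]) if i < n1 else 0
            c2 = order(s2[j]) if j < n2 else 0
            if c1 != c2:
                return c1 - c2
            i += 1
            j += 1
        # Digit phase: skip leading zeros, then the longer number wins.
        while i < n1 and s1[i] == "0":
            i += 1
        while j < n2 and s2[j] == "0":
            j += 1
        while i < n1 and j < n2 and s1[i] in string.digits and s2[j] in string.digits:
            if not first_diff:
                first_diff = ord(s1[i]) - ord(s2[j])
            i += 1
            j += 1
        if i < n1 and s1[i] in string.digits:
            return 1
        if j < n2 and s2[j] in string.digits:
            return -1
        if first_diff:
            return first_diff
    return 0


# The reported case: '.' (weight 302) is compared against 'a' (weight 97).
print(sorted(["aaa.1", "aaaa.1"], key=cmp_to_key(verrevcmp)))
```

With this sketch, "aaa" still sorts before "aaaa" and "aaa1" before "aaaa1", because end of string and digits both weigh 0, which is less than any letter. Only a punctuation separator directly after the shorter alpha part reverses the order.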

> Expected results:
> aaa.1
> aaaa.1

I don't think we can change the implementation to produce the result above without breaking something else.  The key problem is that there are several different version schemes, and whichever one you choose, it will be a trade-off.

> If the '.1'  is omitted, result is correct.
> If just the '.' is omitted, result is correct.

Same as 'dpkg --compare-versions'.

> Behavior is the same if '-' is used instead of '.' .    

This seems to differ from 'dpkg --compare-versions', not yet investigated further.

Comment 2 Kamil Dudka 2010-03-03 10:43:14 UTC
(In reply to comment #0)
> Actual results:
> aaaa.1
> aaa.1

The comparison is decided here, at the first position where the strings differ ('.' in s1 versus 'a' in s2):

$ printf "aaa.1\naaaa.1\n" > in && gdb -q --args sort -V in
(gdb) b filevercmp
Breakpoint 1 at 0x409e52: file filevercmp.c, line 134.
(gdb) r
Breakpoint 1, filevercmp (s1=0x61b4d0 "aaa.1", s2=0x61b4d6 "aaaa.1") at filevercmp.c:134
134       int simple_cmp = strcmp (s1, s2);
(gdb) b 97
Breakpoint 2 at 0x409ca4: file filevercmp.c, line 97.
(gdb) c
Breakpoint 2, verrevcmp (s1=0x61b4d0 "aaa.1", s1_len=5, s2=0x61b4d6 "aaaa.1", s2_len=6) at filevercmp.c:97
97                  return s1_c - s2_c;
(gdb) print s1[s1_pos]
$1 = 46 '.'
(gdb) print s2[s2_pos]
$2 = 97 'a'
(gdb) print s1_c
$3 = 302
(gdb) print s2_c
$4 = 97
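
The two values printed by gdb can be reproduced with a small Python sketch of the per-character weighting. It assumes the weighting works as the session suggests: digits weigh 0, letters keep their ASCII code, '~' weighs -1, and every other character gets its code plus 256 (UCHAR_MAX + 1 with 8-bit chars). The function name order is an assumption made for illustration.

```python
import string


def order(ch):
    """Assumed per-character weight used in the non-digit phase."""
    if ch in string.digits:
        return 0
    if ch in string.ascii_letters:
        return ord(ch)
    if ch == "~":
        return -1
    return ord(ch) + 256


s1_c = order(".")   # character at s1[3] in "aaa.1"
s2_c = order("a")   # character at s2[3] in "aaaa.1"
print(s1_c, s2_c, s1_c - s2_c)   # 302 97 205
```

The positive difference means "aaa.1" compares greater than "aaaa.1", so it is printed second. A '-' separator weighs 45 + 256 = 301 and gives the same ordering.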

Comment 3 Kamil Dudka 2010-03-03 10:52:01 UTC
(In reply to comment #1)
> > Behavior is the same if '-' is used instead of '.' .    
> 
> This seems to differ from 'dpkg --compare-versions', not yet investigated
> further.    

The difference is in dpkg's preprocessing.  When '-' is used, the part after the hyphen (the Debian revision) is cut off before the string reaches verrevcmp():

$ gdb -q --args src/dpkg --compare-versions aaa.1 \< aaaa.1
(gdb) b verrevcmp
Breakpoint 1 at 0x41e390: file vercmp.c, line 42.
(gdb) r
Breakpoint 1, verrevcmp (val=0x7466c0 "aaa.1", ref=0x7466d0 "aaaa.1") at vercmp.c:42

$ gdb -q --args src/dpkg --compare-versions aaa-1 \< aaaa-1
(gdb) b verrevcmp
Breakpoint 1 at 0x41e390: file vercmp.c, line 42.
(gdb) r
Breakpoint 1, verrevcmp (val=0x7466c0 "aaa", ref=0x7466d0 "aaaa") at vercmp.c:42
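
The preprocessing seen in the two sessions above can be sketched in Python as follows. This is a simplified illustration of how a Debian version string is split into epoch, upstream version and revision, not dpkg's actual parser. The function name split_debian_version is hypothetical, and validation of the individual parts is omitted.

```python
def split_debian_version(version):
    """Split '[epoch:]upstream[-revision]' into its three parts (sketch)."""
    epoch_text, colon, rest = version.partition(":")
    if colon:
        epoch = int(epoch_text)       # text before the first ':' is the epoch
    else:
        epoch, rest = 0, version      # no epoch given
    # The revision is whatever follows the LAST hyphen, if there is one.
    upstream, hyphen, revision = rest.rpartition("-")
    if not hyphen:
        upstream, revision = rest, ""
    return epoch, upstream, revision


print(split_debian_version("aaa.1"))   # (0, 'aaa.1', '')
print(split_debian_version("aaa-1"))   # (0, 'aaa', '1')
```

Under this split, "aaa-1" and "aaaa-1" are compared as "aaa" against "aaaa" first, which is why dpkg orders them differently from 'sort -V', where the whole string including the '-' goes through the comparison.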

Comment 4 Kamil Dudka 2010-09-28 10:13:14 UTC
I have no idea how to fix this without breaking anything else.  Thanks for understanding!  Feel free to reopen as soon as you have a solution.

