Bug 119595

Summary: comm not comparing all lines
Product: [Fedora] Fedora
Reporter: Mark Komarinski <mkomarinski>
Component: coreutils
Assignee: Tim Waugh <twaugh>
Status: CLOSED NOTABUG
Severity: medium
Priority: medium    
Version: 1   
Target Milestone: ---   
Target Release: ---   
Hardware: i386   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2004-04-01 04:40:55 EST
Attachments:
Description                Flags
Sample data file           none
second sample data file    none

Description Mark Komarinski 2004-03-31 14:48:48 EST
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.6)
Gecko/20040124 Galeon/1.3.14

Description of problem:
I have two files (called one and two) containing lists of lines, and I
need to find the lines they have in common.  comm -12 one two matches
correctly up to a certain point, then it stops matching and treats the
remaining lines as unique to each file.

Deleting one line in file one fixes this problem, but I see no reason
why a change like that should have this effect.

Version-Release number of selected component (if applicable):
coreutils-5.0-34.1

How reproducible:
Always

Steps to Reproduce:
1. Copy file one and file two into local directory
2. comm -12 one two
3. Delete the line starting with " 998" in file one
4. comm -12 one two
    

Actual Results:
 980 999.000 0.000 C      61
 987 999.000 0.000 HB3    62
 988 999.000 0.000 QB     62
 991 999.000 0.000 HG3    62
 992 999.000 0.000 QG     62
 993 999.000 0.000 CD     62
 994 999.000 0.000 HE2    62
 995 999.000 0.000 C      62


Expected Results:
 980 999.000 0.000 C      61
 987 999.000 0.000 HB3    62
 988 999.000 0.000 QB     62
 991 999.000 0.000 HG3    62
 992 999.000 0.000 QG     62
 993 999.000 0.000 CD     62
 994 999.000 0.000 HE2    62
 995 999.000 0.000 C      62
1002 999.000 0.000 HB3    63
1003 999.000 0.000 QB     63
1004 999.000 0.000 CG     63
1007 999.000 0.000 QD2    63
1008 999.000 0.000 CD1    63


Additional info:

This is a subset of data from a larger file.  In the larger files,
removing the 998 line produces the expected results.

Comment 1 Mark Komarinski 2004-03-31 14:50:03 EST
Created attachment 99015
Sample data file

Comment 2 Mark Komarinski 2004-03-31 14:50:21 EST
Created attachment 99016
second sample data file

Comment 3 Tim Waugh 2004-04-01 04:40:55 EST
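These input files are not sorted, in that 'sort one' differs from 'cat
one'.  comm expects both inputs to be in the order that 'sort' produces,
which is not the same as numeric order, so it loses track of matching
lines once it reaches an out-of-order line.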

You can get the output you desire using:

comm -12 <(sort one) <(sort two) | sort -n
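
The <(...) form above is process substitution, which bash provides but plain POSIX sh does not. As a rough sketch of the same approach using temporary files, the script below first checks sortedness with sort -c and then intersects the sorted copies. The data is made up for illustration (it is not the attached files), the names tmp and common are hypothetical, and LC_ALL=C is set only so the result does not depend on the locale.

```shell
# Hypothetical sample data: in numeric order, which is NOT the order 'sort' produces.
export LC_ALL=C   # pin the collation order so the outcome is locale-independent
tmp=$(mktemp -d)
printf '%s\n' 8 9 10 11 > "$tmp/one"
printf '%s\n' 9 10 11 12 > "$tmp/two"

# sort -c exits non-zero when its input is not already in sort order.
if ! sort -c "$tmp/one" 2>/dev/null; then
    echo "one is not sorted"
fi

# Sort each input, keep only the lines common to both, then put the
# result back into numeric order for reading.
sort "$tmp/one" > "$tmp/one.sorted"
sort "$tmp/two" > "$tmp/two.sorted"
common=$(comm -12 "$tmp/one.sorted" "$tmp/two.sorted" | sort -n)
echo "$common"

rm -rf "$tmp"
```

With this sample data the script reports that one is not sorted and then prints 9, 10 and 11, one per line.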