Bug 119595 - comm not comparing all lines
Status: CLOSED NOTABUG
Product: Fedora
Classification: Fedora
Component: coreutils
Version: 1
Hardware: i386 Linux
Priority: medium
Severity: medium
Assigned To: Tim Waugh
Reported: 2004-03-31 14:48 EST by Mark Komarinski
Modified: 2007-11-30 17:10 EST (History)
Last Closed: 2004-04-01 04:40:55 EST
Attachments:
Sample data file (667 bytes, text/plain), 2004-03-31 14:50 EST, Mark Komarinski
Second sample data file (667 bytes, text/plain), 2004-03-31 14:50 EST, Mark Komarinski

Description Mark Komarinski 2004-03-31 14:48:48 EST
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.6)
Gecko/20040124 Galeon/1.3.14

Description of problem:
I have two files (called one and two), each containing a list of lines,
and I need to find the lines common to both.  comm -12 one two matches
correctly up to a certain point, then stops matching and treats each
remaining line as unique to one file or the other.

Deleting one line in file one fixes the problem, but I see no reason
why a change like that should have this effect.

Version-Release number of selected component (if applicable):
coreutils-5.0-34.1

How reproducible:
Always

Steps to Reproduce:
1. Copy file one and file two into a local directory
2. comm -12 one two
3. Delete the line starting with " 998" in file one
4. comm -12 one two

Actual Results:
 980 999.000 0.000 C      61
 987 999.000 0.000 HB3    62
 988 999.000 0.000 QB     62
 991 999.000 0.000 HG3    62
 992 999.000 0.000 QG     62
 993 999.000 0.000 CD     62
 994 999.000 0.000 HE2    62
 995 999.000 0.000 C      62


Expected Results:
 980 999.000 0.000 C      61
 987 999.000 0.000 HB3    62
 988 999.000 0.000 QB     62
 991 999.000 0.000 HG3    62
 992 999.000 0.000 QG     62
 993 999.000 0.000 CD     62
 994 999.000 0.000 HE2    62
 995 999.000 0.000 C      62
1002 999.000 0.000 HB3    63
1003 999.000 0.000 QB     63
1004 999.000 0.000 CG     63
1007 999.000 0.000 QD2    63
1008 999.000 0.000 CD1    63


Additional info:

The attached files are a subset of data from larger files.  In the
larger files, removing the 998 line produces the expected results.
Comment 1 Mark Komarinski 2004-03-31 14:50:03 EST
Created attachment 99015
Sample data file
Comment 2 Mark Komarinski 2004-03-31 14:50:21 EST
Created attachment 99016
second sample data file
Comment 3 Tim Waugh 2004-04-01 04:40:55 EST
These input files are not sorted, in that 'sort one' differs from 'cat
one'.

You can get the output you desire using:

comm -12 <(sort one) <(sort two) | sort -n
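
For anyone wanting to see the effect in isolation, here is a minimal sketch. It uses small made-up files, not the attachments from this bug, and pins LC_ALL=C so the collating order is predictable; the file names and contents are assumptions for illustration only. It checks sortedness with sort -c, runs comm on the unsorted files, then runs it again on sorted copies (the plain-sh equivalent of the process substitution above).

```shell
#!/bin/sh
# Hypothetical stand-in data; these are NOT the attachments from this bug.
LC_ALL=C; export LC_ALL     # pin the collating order so the result is reproducible

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

# Both files are in numeric order, which is not the order sort and comm use.
printf '%s\n' 7 9 10 11 > "$tmp/one"
printf '%s\n' 7 8 10 11 > "$tmp/two"

# sort -c exits non-zero when its input is not already sorted.
if sort -c "$tmp/one" 2>/dev/null; then one_sorted=yes; else one_sorted=no; fi

# Unsorted input: with this data comm finds no matches after the first
# out-of-order line. Recent GNU comm also warns on stderr and exits
# non-zero, hence the redirect and the "|| true".
naive=$(comm -12 "$tmp/one" "$tmp/two" 2>/dev/null || true)

# Sort both inputs first, then restore numeric order in the result.
# In bash this matches: comm -12 <(sort one) <(sort two) | sort -n
sort "$tmp/one" > "$tmp/one.sorted"
sort "$tmp/two" > "$tmp/two.sorted"
fixed=$(comm -12 "$tmp/one.sorted" "$tmp/two.sorted" | sort -n)

printf 'file one sorted: %s\n' "$one_sorted"
printf 'unsorted input:  %s\n' "$(echo $naive)"
printf 'sorted input:    %s\n' "$(echo $fixed)"
```

With this data the first comm call reports only 7, while the call on sorted copies reports 7, 10 and 11. The lines 10 and 11 are present in both files but are missed because "9" collates after "10" and "11".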