Red Hat Bugzilla – Bug 49449
The textutils utility comm is broken
Last modified: 2016-11-24 09:58:27 EST
From Bugzilla Helper:
User-Agent: Mozilla/4.77 [en] (X11; U; Linux 2.4.5 i686)
Description of problem:
The comm utility, which is part of the RH 7.1 textutils-2.0.11-7 package, is
broken. comm is failing to correctly identify unique or common entries in
its input files.
Here is a quick example:
The output of the comm command below should be all the common entries
contained in both tmp.1 and tmp.2.
comm -12 tmp.1 tmp.2
Where tmp.1 contains:
and tmp.2 contains:
The output should be (and is under Solaris and RH 6.2):
But RH 7.1 comm is returning:
I.e., nothing is output after the first unique entry in tmp.1 is found. This
is only one example; comm is also failing in many other modes.
Steps to Reproduce:
1. Use comm with the two tmp files included above
This seems to be a glibc issue - still happens after recompiling textutils
packages from 6.x on 7.x systems...
Jakub, any idea? (Might be one of the locale changes)
AFAICT, this is the correct behavior. According to the comm man page, it
does a *line-by-line* comparison. The two files provided in your bug report
are not common beyond the eighth line -- so 10, 11, and 13 should not be
returned by "$ comm -12 tmp.1 tmp.2".
FWIW, I just verified the same behaviour on a 6.2 system:
$ rpm -q textutils
$ cat /etc/redhat-release
Red Hat Linux release 6.2 (Zoot)
$ comm -12 tmp.1 tmp.2
I even downgraded the version on the 6.2 system to 2.0a-2 (the version
originally shipped w/ the distribution) and the results were the same.
Does this make sense?
Created attachment 24767
OK - I could attach tmp.1 but not tmp.2; there is some weirdness in Bugzilla. To
reproduce my problem you need to use the files as given, i.e. keep the right
justification of the integers. Then you will see that comm in RH 7.1 gives
different results from RH 6.2, Solaris 2.6 and HP-UX 11. My reading of the comm
documentation is that comm -12 should return the lines common to both files (I
am not a comm expert :-)). Anyway, my concern is that Linux utilities should not
change their behaviour from one release to the next unless bugs are fixed. If
this is not the case, will I need to start testing the >1000 scripts I use on a
regular basis each time I upgrade any of our RH Linux packages? If this is a
bug fix then so be it; I will be surprised and disappointed. The question remains:
why is the behaviour different from every other *NIX I could test?
You are exactly right. When I used right-justification, version 2.0a-2 did
return different results than the newer versions....
I noticed that when I sorted the files prior to running comm, it correctly
identified the common entries in the files regardless of the justification
of the contents.
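The sort-first approach described above can be sketched as a small shell function. This is only a minimal sketch, not something taken from the report; the name common_lines is hypothetical. Because sort and comm inherit the same environment, they agree on the collating sequence whatever the locale is.

```shell
#!/bin/sh
# Hypothetical helper: print the lines common to two files,
# regardless of the order the lines appear in on input.
common_lines() {
    a=$(mktemp) || return 1
    b=$(mktemp) || { rm -f "$a"; return 1; }
    # sort and comm both run with the caller's locale settings,
    # so the sorted copies are in the order comm expects.
    sort "$1" > "$a"
    sort "$2" > "$b"
    comm -12 "$a" "$b"
    rm -f "$a" "$b"
}
```

Calling it as "common_lines tmp.1 tmp.2" would print the shared lines in the current locale's sort order, which may differ from the order in the original files.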
comm behaviour in 7.1 is correct.
The info page for comm says:
Before `comm' can be used, the input files must be sorted using the
collating sequence specified by the `LC_COLLATE' locale. If an input
file ends in a non-newline character, a newline is silently appended.
The `sort' command with no options always outputs a file that is
suitable input to `comm'.
Your example files are sorted using the "C" collating sequence;
under most other collating sequences they are unsorted.
Just check what sort does with your input files to see.
Running LC_COLLATE=C comm -12 tmp.1 tmp.2
will give you the results you expect.
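As a hedged illustration of the C-locale workaround: the contents of tmp.1 and tmp.2 are not reproduced in this report, so the sketch below builds hypothetical stand-in files of right-justified integers. It sets LC_ALL rather than LC_COLLATE because LC_ALL, when present in the environment, takes precedence over LC_COLLATE.

```shell
#!/bin/sh
# Hypothetical stand-ins for tmp.1 and tmp.2: right-justified integers,
# listed in "C" collating order (a space sorts before any digit).
dir=$(mktemp -d) || exit 1
printf '%2d\n' 1 2 3 5 8 10 11 13 > "$dir/tmp.1"
printf '%2d\n' 1 2 4 5 8 10 11 13 > "$dir/tmp.2"

# Force the "C" collating sequence for this one command only.
result=$(LC_ALL=C comm -12 "$dir/tmp.1" "$dir/tmp.2")
printf '%s\n' "$result"

rm -rf "$dir"
```

With these stand-in files, the lines present in both files are printed, including the two-digit entries at the end.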
For example, with LC_COLLATE=en_US (or LC_ALL=en_US, or LANG=en_US if
neither is set), sorting the files will give 10 after 1, followed by 11,
13, 2, etc. So, if comm is run with that LC_COLLATE, its input should be
sorted that way.
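One way to check whether a file is already sorted the way comm expects is sort -c, which tests the order under the current locale without writing any sorted output. A minimal sketch follows; the function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the file is already sorted
# according to the collating sequence of the current locale.
is_sorted_for_comm() {
    # sort -c only checks the order; it exits non-zero at the first
    # out-of-order line. Its diagnostic message is discarded here.
    sort -c "$1" 2>/dev/null
}
```

It could be used as: is_sorted_for_comm tmp.1 || echo "tmp.1 needs sorting for this locale"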