From Bugzilla Helper:
User-Agent: Mozilla/4.77 [en] (X11; U; Linux 2.4.5 i686)

Description of problem:
The comm utility, which is part of the RH 7.1 textutils-2.0.11-7 package, is broken. comm fails to correctly identify unique or common entries in the files given. Here is a quick example. The command below should output all the entries common to both tmp.1 and tmp.2:

    comm -12 tmp.1 tmp.2

where tmp.1 contains:

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    13

and tmp.2 contains:

    1
    2
    3
    4
    5
    6
    7
    8
    10
    11
    13

The output should be (and is under Solaris and RH 6.2):

    1
    2
    3
    4
    5
    6
    7
    8
    10
    11
    13

But RH 7.1 comm returns:

    1
    2
    3
    4
    5
    6
    7
    8

I.e., nothing is output after the first unique entry in tmp.1 is found. This is only one example; comm also fails in many other modes.

How reproducible:
Always

Steps to Reproduce:
1. Use comm with the two tmp files included above

Additional info:
This seems to be a glibc issue - still happens after recompiling textutils packages from 6.x on 7.x systems... Jakub, any idea? (Might be one of the locale changes)
AFAICT, this is the correct behavior. According to the comm man page, it does a *line-by-line* comparison. The two files provided in your bug report are not common beyond the eighth line, so 10, 11, and 13 should not be returned by "$ comm -12 tmp.1 tmp.2".

FWIW, I just verified the same behaviour on a 6.2 system:

    $ rpm -q textutils
    textutils-2.0e-6
    $ cat /etc/redhat-release
    Red Hat Linux release 6.2 (Zoot)
    $ comm -12 tmp.1 tmp.2
    1
    2
    3
    4
    5
    6
    7
    8

I even downgraded the version on the 6.2 system to 2.0a-2 (the version originally shipped w/ the distribution) and the results were the same. Does this make sense?
Created attachment 24767: tmp.1
OK - I could attach tmp.1 but not tmp.2; there is some weirdness in Bugzilla. To reproduce my problem you need to use the files as given, i.e. keep the right justification of the integers. Then you will see that comm in RH 7.1 gives different results from RH 6.2, Solaris 2.6 and HP-UX 11.

My reading of the comm documentation is that comm -12 should return the lines common to both files (I am not a comm expert :-)).

Anyway, my concern is that Linux utilities should not change their behaviour from one release to the next unless bugs are fixed. If that is not the case, I will need to start testing the >1000 scripts I use on a regular basis each time I upgrade any of our RH Linux packages. If this is a bug fix then so be it; I will be surprised and disappointed. The question remains why the behaviour is different from every other *NIX I could test.
You are exactly right. When I used right-justification, version 2.0a-2 did return different results than the newer versions.... I noticed that when I sorted the files prior to running comm, it correctly identified the common entries in the files regardless of the justification of the contents.
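The observation above (sort first, then compare) can be sketched as follows. This is only an illustration under assumptions: the scratch directory and the left.* file names are hypothetical, the lists are the left-justified variant of the ones in the report, and LC_ALL=C is picked simply so that sort and comm agree on one collating sequence.

```shell
#!/bin/sh
# Hypothetical scratch directory and file names (not from the report).
dir=$(mktemp -d)

# Left-justified variant of the two lists from the report.
printf '%d\n' 1 2 3 4 5 6 7 8 9 10 11 13 > "$dir/left.1"
printf '%d\n' 1 2 3 4 5 6 7 8 10 11 13   > "$dir/left.2"

# Use one locale for both sort and comm so they agree on line order.
LC_ALL=C
export LC_ALL

sort "$dir/left.1" > "$dir/left.1.sorted"
sort "$dir/left.2" > "$dir/left.2.sorted"

# Count the lines common to both sorted files (tr strips wc's padding).
common_count=$(comm -12 "$dir/left.1.sorted" "$dir/left.2.sorted" | wc -l | tr -d ' ')
echo "common lines: $common_count"

rm -rf "$dir"
```

Every line of the second list also appears in the first, so all of its entries should be reported as common once both inputs are sorted under the locale comm runs in.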
comm behaviour in 7.1 is correct. The info page for comm says:

    Before `comm' can be used, the input files must be sorted using the collating sequence specified by the `LC_COLLATE' locale. If an input file ends in a non-newline character, a newline is silently appended. The `sort' command with no options always outputs a file that is suitable input to `comm'.

Your example files are sorted according to the "C" collating sequence; under most other collating sequences they are unsorted. Just check what sort does with your input files to see. Running

    LC_COLLATE=C comm -12 tmp.1 tmp.2

will give you the results you expect.

E.g. with LC_COLLATE=en_US (or LC_ALL=en_US, or LANG=en_US if neither of the other two is set), sorting the files will give 10 after 1, followed by 11, 13, 2, etc. So, if comm is run with such an LC_COLLATE, the input should be sorted that way.
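A minimal sketch of the check described above, under assumptions: the files are recreated with a two-character right-justified field (the thread only says the integers were right-justified, so the width is a guess), the scratch directory is hypothetical, and LC_ALL=C is used in place of LC_COLLATE=C because LC_ALL overrides LC_COLLATE when both are set.

```shell
#!/bin/sh
# Hypothetical scratch directory; field width of 2 is an assumption.
dir=$(mktemp -d)
printf '%2d\n' 1 2 3 4 5 6 7 8 9 10 11 13 > "$dir/tmp.1"
printf '%2d\n' 1 2 3 4 5 6 7 8 10 11 13   > "$dir/tmp.2"

# sort -c only checks the order: it exits 0 if the file is already sorted
# under the active collating sequence and non-zero otherwise.
if LC_ALL=C sort -c "$dir/tmp.1" && LC_ALL=C sort -c "$dir/tmp.2"; then
    c_sorted=yes
else
    c_sorted=no
fi
echo "sorted under the C locale: $c_sorted"

# Run comm under the same collating sequence the files are sorted in.
common=$(LC_ALL=C comm -12 "$dir/tmp.1" "$dir/tmp.2")
printf '%s\n' "$common"

rm -rf "$dir"
```

In the C locale a space collates before any digit, so " 9" comes before "10" and the right-justified files count as sorted as given. Running the same sort -c check under another locale is a quick way to see whether the inputs need re-sorting before comm is used there.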