Bug 449850 - long diff times when comparing files with only newline differences.
Status: CLOSED CURRENTRELEASE
Product: Fedora
Classification: Fedora
Component: diffutils
Version: 7
Hardware: All
OS: Linux
Priority: low
Severity: medium
Assigned To: Tim Waugh
QA Contact: Fedora Extras Quality Assurance
Blocks: F7Update
Reported: 2008-06-03 17:36 EDT by Joseph Millman
Modified: 2008-06-09 23:11 EDT
Fixed In Version: 2.8.1-16.1.fc7
Doc Type: Bug Fix
Last Closed: 2008-06-09 23:11:27 EDT

Attachments:
self-contained example test of two ~5MB files. See script for instructions. (4.20 MB, application/octet-stream)
2008-06-03 17:36 EDT, Joseph Millman

Description Joseph Millman 2008-06-03 17:36:17 EDT
Description of problem:
A diff between two identical files (one using DOS newlines, the other UNIX
newlines) using the -w flag to ignore the whitespace differences takes an
inordinately long time.

Version-Release number of selected component (if applicable):
Bug observed on a number of Red Hat platforms.

How reproducible:
very high

Steps to Reproduce:
1. Create a text file. A high number of lines is better than longer lines.
2. Make a copy of it with the newlines converted (UNIX to DOS, or DOS to UNIX).
3. `diff -w $FILE_1 $FILE_2`
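
The steps above can be sketched as a small script. This is only an illustration: the line count, the file names, and the generated content are assumptions, not the reporter's attached test files, and the elapsed time it prints depends entirely on the diff build in use.

```shell
#!/bin/sh
# Sketch of the reproduction steps. The line count and file names are
# illustrative assumptions, not values taken from the report.
set -eu

line_count=${LINE_COUNT:-100000}
workdir=$(mktemp -d)
unix_file="$workdir/unix.txt"
dos_file="$workdir/dos.txt"

# Step 1: a text file with many short lines and UNIX (LF) newlines.
awk -v n="$line_count" 'BEGIN { for (i = 1; i <= n; i++) print "line " i }' > "$unix_file"

# Step 2: the same content with DOS (CRLF) newlines.
awk '{ printf "%s\r\n", $0 }' "$unix_file" > "$dos_file"

# Step 3: compare while ignoring whitespace, and report the elapsed seconds.
start=$(date +%s)
if diff -w "$dos_file" "$unix_file" > /dev/null; then
    result="no differences reported"
else
    result="differences reported"
fi
end=$(date +%s)
echo "diff -w: $result after $((end - start)) s"
```

Setting `LINE_COUNT` to a larger value before running the script makes the file bigger; on an affected diffutils build the elapsed time should then climb much faster than the file size.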
  
Actual results:

The output of `time diff -w $DOS_FILE $UNIX_FILE` with the Red Hat-supplied diff was:
real    6m21.679s
user    3m54.609s
sys     0m0.117s


Expected results:

The output of `time diff -w $DOS_FILE $UNIX_FILE` with GNU diff from source was:
real    0m5.954s
user    0m0.179s
sys     0m0.037s


Additional info:

The expected results have been corroborated on Debian 4, Ubuntu 8, Solaris SPARC
8, 9, Mac OS X 10.5.2, and FreeBSD 6.2 using their GNU diff, or their
vendor-specific diff.

This behavior has been replicated on RHEL 3, 4, 5.1, CentOS 4.5, Mandriva 2008,
and OpenSUSE 10. It has not been tried on Fedora 9, but the existing pattern
leads me to believe it will occur there too.

The run time appears to grow far faster than linearly with file size: a 50MB
file of similar format to the supplied example files takes no less than
~10 hours to diff on a computer faster than 2.0GHz. During this time the diff
process maxes out a single CPU/HT/core but uses no more RAM than the two files
plus working space.
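
As a rough check of that scaling, the arithmetic below assumes that the 6m21s timing was measured on the attached ~5MB files (the report does not state this explicitly) and that run time grows with the square of the file size (an assumption for illustration, not something the report establishes).

```shell
#!/bin/sh
# Back-of-the-envelope estimate under an assumed quadratic growth model.
base_seconds=381   # 6m21s real time reported for the Red Hat supplied diff
size_ratio=10      # 50MB file versus the ~5MB example files (assumed baseline)

# Quadratic model: time scales with the square of the size ratio.
predicted=$((base_seconds * size_ratio * size_ratio))
hours=$((predicted / 3600))
minutes=$(((predicted % 3600) / 60))

echo "predicted: ${predicted} s (${hours} h ${minutes} min)"
```

The estimate of 10 h 35 min is in line with the "no less than ~10 hours" observation, so the numbers are consistent with quadratic growth, though two data points cannot confirm it.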

There appears to be no performance difference between physical hardware and a
virtualized environment.
Comment 1 Joseph Millman 2008-06-03 17:36:18 EDT
Created attachment 308292
self-contained example test of two ~5MB files. See script for instructions.
Comment 2 Tim Waugh 2008-06-04 05:21:36 EDT
Thanks for the report.
I ran the test on:

* Red Hat Enterprise Linux 4 with RHBA-2008-0120 applied
(http://rhn.redhat.com/errata/RHBA-2008-0120.html)
* Red Hat Enterprise Linux 5.2 (which includes RHBA-2008-0068,
http://rhn.redhat.com/errata/RHBA-2008-0068.html)
* Fedora 9
* Fedora 8 with updates
* Fedora 7 with updates

Only Fedora 7 exhibited this behaviour.  Changing version to 7.
Comment 3 Fedora Update System 2008-06-04 05:30:12 EDT
diffutils-2.8.1-16.1.fc7 has been submitted as an update for Fedora 7
Comment 4 Fedora Update System 2008-06-06 03:48:59 EDT
diffutils-2.8.1-16.1.fc7 has been pushed to the Fedora 7 testing repository.  If problems still persist, please make note of it in this bug report.
If you want to test the update, you can install it with
 su -c 'yum --enablerepo=updates-testing update diffutils'
You can provide feedback for this update here: http://admin.fedoraproject.org/updates/F7/FEDORA-2008-5014
Comment 5 Fedora Update System 2008-06-09 23:11:20 EDT
diffutils-2.8.1-16.1.fc7 has been pushed to the Fedora 7 stable repository.  If problems still persist, please make note of it in this bug report.
