Bug 449850 - long diff times when comparing files with only newline differences.
Summary: long diff times when comparing files with only newline differences.
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Fedora
Classification: Fedora
Component: diffutils
Version: 7
Hardware: All
OS: Linux
Priority: low
Severity: medium
Target Milestone: ---
Assignee: Tim Waugh
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks: F7Update
Reported: 2008-06-03 21:36 UTC by Joseph Millman
Modified: 2008-06-10 03:11 UTC (History)
CC List: 0 users

Fixed In Version: 2.8.1-16.1.fc7
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2008-06-10 03:11:27 UTC
Type: ---
Embargoed:


Attachments
self-contained example test of two ~5MB files. See script for instructions. (4.20 MB, application/octet-stream)
2008-06-03 21:36 UTC, Joseph Millman

Description Joseph Millman 2008-06-03 21:36:17 UTC
Description of problem:
Running diff with the -w flag (to ignore whitespace differences) on two files
that are identical except for their line endings (one uses DOS newlines, the
other UNIX newlines) takes an inordinately long time.

Version-Release number of selected component (if applicable):
Bug observed on a number of Red Hat platforms.

How reproducible:
very high

Steps to Reproduce:
1. Create a text file. Many short lines reproduce the problem better than fewer long lines.
2. Make a copy of it with the other newline convention (DOS to UNIX, or UNIX to DOS).
3. `diff -w $FILE_1 $FILE_2`
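
The steps above can be sketched as a small script. This is only an illustration, not the attached test: the helper names (`make_pair`, `time_diff`), the file names, the line contents, and the line count of 20000 are all assumptions chosen for the example, and the attached files may be shaped differently.

```shell
#!/bin/sh
# Sketch: build a many-line UNIX-newline file and a DOS-newline copy of it.
# make_pair, time_diff, the paths and the line count are hypothetical.

make_pair() {
    # $1 = number of lines, $2 = UNIX output path, $3 = DOS output path
    awk -v n="$1" 'BEGIN { for (i = 1; i <= n; i++) printf "line %d of sample text\n", i }' > "$2"
    # Emit each line again with a carriage return before the newline.
    awk '{ printf "%s\r\n", $0 }' "$2" > "$3"
}

time_diff() {
    # $1 = DOS file, $2 = UNIX file; same command as step 3, wrapped in time.
    time diff -w "$1" "$2"
}

workdir=$(mktemp -d)
make_pair 20000 "$workdir/unix.txt" "$workdir/dos.txt"
echo "created $workdir/unix.txt and $workdir/dos.txt"
```

Calling `time_diff "$workdir/dos.txt" "$workdir/unix.txt"` afterwards runs the comparison. Since the two files differ only in line endings, `diff -w` is expected to print nothing. Raising the line count should make any slowdown easier to see.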
  
Actual results:

The output of `time diff -w $DOS_FILE $UNIX_FILE` with the Red Hat-supplied diff was:
real    6m21.679s
user    3m54.609s
sys     0m0.117s


Expected results:

The output of `time diff -w $DOS_FILE $UNIX_FILE` with GNU diff from source was:
real    0m5.954s
user    0m0.179s
sys     0m0.037s


Additional info:

The expected results have been corroborated on Debian 4, Ubuntu 8, Solaris SPARC
8, 9, Mac OS X 10.5.2, and FreeBSD 6.2 using their GNU diff, or their
vendor-specific diff.

This behavior has been replicated in RHEL 3, 4, 5.1, CentOS 4.5, Mandriva 2008,
and OpenSUSE 10. I have not tried Fedora Core 9, but the existing pattern
leads me to believe the bug exists there too.

The run time appears to grow much faster than linearly with file size: a 50MB
file of similar format to the supplied example files takes no less than ~10
hours to diff on a computer faster than 2.0GHz. During this time the diff
process maxes out a single CPU/HT/core but occupies no more RAM than the two
files plus working space.
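
As a rough check on the scaling, the two timings in this report (6m21.679s for the ~5MB pair and the ~10 hour lower bound for a ~50MB pair) can be turned into an exponent estimate. This is only a sketch: the helper name `scaling_exponent` is hypothetical, and it assumes the two runs are comparable and that run time follows a simple power law in file size, neither of which has been verified.

```shell
# Estimate k in "time ~ size^k" from two (size, time) data points.
# Assumption: a power-law fit is meaningful for just two points.
scaling_exponent() {
    # $1, $2 = file sizes in MB; $3, $4 = wall-clock times in seconds
    awk -v s1="$1" -v s2="$2" -v t1="$3" -v t2="$4" \
        'BEGIN { printf "%.2f\n", log(t2 / t1) / log(s2 / s1) }'
}

# 5MB pair: 6m21.679s = 381.679 s; 50MB pair: at least ~10 hours = 36000 s.
scaling_exponent 5 50 381.679 36000
```

An exponent near 2 would suggest roughly quadratic growth in run time. Because the 10 hour figure is a lower bound, the estimate is a lower bound as well.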

There appears to be no performance difference between physical hardware or a
virtualized environment.

Comment 1 Joseph Millman 2008-06-03 21:36:18 UTC
Created attachment 308292
self-contained example test of two ~5MB files. See script for instructions.

Comment 2 Tim Waugh 2008-06-04 09:21:36 UTC
Thanks for the report.
I ran the test on:

* Red Hat Enterprise Linux 4 with RHBA-2008-0120 applied
(http://rhn.redhat.com/errata/RHBA-2008-0120.html)
* Red Hat Enterprise Linux 5.2 (which includes RHBA-2008-0068,
http://rhn.redhat.com/errata/RHBA-2008-0068.html)
* Fedora 9
* Fedora 8 with updates
* Fedora 7 with updates

Only Fedora 7 exhibited this behaviour.  Changing version to 7.


Comment 3 Fedora Update System 2008-06-04 09:30:12 UTC
diffutils-2.8.1-16.1.fc7 has been submitted as an update for Fedora 7

Comment 4 Fedora Update System 2008-06-06 07:48:59 UTC
diffutils-2.8.1-16.1.fc7 has been pushed to the Fedora 7 testing repository.  If problems still persist, please make note of it in this bug report.
If you want to test the update, you can install it with
`su -c 'yum --enablerepo=updates-testing update diffutils'`.
You can provide feedback for this update here: http://admin.fedoraproject.org/updates/F7/FEDORA-2008-5014

Comment 5 Fedora Update System 2008-06-10 03:11:20 UTC
diffutils-2.8.1-16.1.fc7 has been pushed to the Fedora 7 stable repository.  If problems still persist, please make note of it in this bug report.

