Bug 154283

Summary: diffutils can produce invalid output with dos files
Product: [Fedora] Fedora Reporter: Toshio Kuratomi <toshio>
Component: patchAssignee: Tim Waugh <twaugh>
Status: CLOSED RAWHIDE QA Contact: Mike McLean <mikem>
Severity: low Docs Contact:
Priority: low    
Version: 4Keywords: Patch
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: 2.5.4-25 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2005-08-11 08:03:47 EDT Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Bug Depends On:    
Bug Blocks: 150221    
Attachments:
Description Flags
Patch to stop scanning on linefeed so dos files generate good diffs
none
First file to use with diff
none
Second file to use with diff
none
Patch to fix --show-c-function with dos files
none
Have patch properly ignore \r at the end of a unidiff @@ line.
none
Decide to strip based on non-@@ header lines none

Description Toshio Kuratomi 2005-04-08 19:57:15 EDT
Description of problem:
While using quilt on a project that uses dos line endings, I found that diff
--show-c-function can generate output which will cause patch to fail when run later.

Version-Release number of selected component (if applicable):
diffutils 2.8.1-14
patch 2.5.4-20


How reproducible: Always


Steps to Reproduce:
1.  diff -u --show-c-function test.c.bak test.bak > test.patch
2.
3.
  
Actual results:
Note that the line with the function name on it is in this form:
  @@ -4,5 +4,5 @@ int tester()^M

Expected results:
  @@ -4,5 +4,5 @@ int tester()

Additional info:
Test files test.c and test.c.bak as well as a patch to fix the problem attached.
 Patch has also been submitted to bug-gnu-utils@gnu.org
Comment 1 Toshio Kuratomi 2005-04-08 19:57:15 EDT
Created attachment 112893 [details]
Patch to stop scanning on linefeed so dos files generate good diffs
Comment 2 Toshio Kuratomi 2005-04-08 20:53:09 EDT
Created attachment 112895 [details]
First file to use with diff
Comment 3 Toshio Kuratomi 2005-04-08 20:56:01 EDT
Created attachment 112898 [details]
Second file to use with diff

Run diff -u --show-c-function test.c.bak test.c
to generate output that is problematic for use with patch.
Comment 4 Toshio Kuratomi 2005-04-09 15:41:25 EDT
Created attachment 112916 [details]
Patch to fix --show-c-function with dos files

Here's a better version of the patch to diffutils after a few comments on
bug-gnu-utils@gnu.org
Comment 5 Toshio Kuratomi 2005-04-09 15:50:45 EDT
It was also noted and verified on bug-gnu-utils that patch-2.5.9 does not have
problems with the output generated here.  The latest stable, 2.5.4 does, however.  

Still waiting to hear whether diff outputting \r in that line is officially
considered a bug or if the changes to patch mean this is a non-issue.
Comment 6 Toshio Kuratomi 2005-04-10 16:16:02 EDT
Created attachment 112929 [details]
Have patch properly ignore \r at the end of a unidiff @@ line.

The diffutils maintainer thinks the output from diff should be valid so this
looks like a patch bug instead.  Here's my first attempt at a backport from
patch-2.5.9.  There's quite a few other changes between 2.5.4 and 2.5.9 so this
might not be all that needs to be done but so far it has generated correct
output for me.

Will someone retarget this against patch or should I open a separate bug there?
Comment 7 Toshio Kuratomi 2005-04-20 14:09:23 EDT
Here's the results of my research on the patch/diffutils bug:
ChangeLog:
* In 1998, patch-2.5.2, Stripping of CR for Context diff's was added.
* In 2003, pre-patch-2.5.9, It was realized that stripping was occurring on ed
and unidiffs as well as context diffs (and this was sometimes undesirable).

Codewise:
Stripping of CR's was achieved in pch.c by looking at the patch generated lines
preceeding the patched code.  For context diff's this is:
  **************
  *** 64,70 ****

For unidiffs::
  @@ -64,7 +64,7 @@

If the last line had a trailing \r it was decided that this hunk of the patch
file needed to have CR's stripped.  I'm assuming this is meant to take care of
the case where a patch was generated on a system where the end of line sequence
was CRLF and the diff utility added the CR's to the end of the patch lines even
though the source code did not contain them.

Problems arise when a unidiff is generated against a CRLF file and we use
--show-c-function.  In this case, the code contained in the diff correctly
contains trailing CR's (without which, the lines won't match when patching.) 
However the detection code will find CR's at the end of the displayed c-function
names.  For context diff's there is a second header line that resets the
detection code after the line containing the c-function name.  But for unidiffs,
the name causes a false match in the detection code::

  ************** test_f()^M
  *** 64,70 ****

For unidiffs::
  @@ -64,7 +64,7 @@ test_f()^M

In patch 2.5.9, checking for CR's was simply removed when the patch is a unidiff
or ed patch.

The backport included earlier is complete as it removes the ability for patch to
scan for CR's inside of unidiffs.  It does not do the same for ed diffs but if
that's a requisite for getting this patch applied, I'll be willing to backport
and test that as well.
Comment 8 Tim Waugh 2005-04-29 12:52:02 EDT
Thanks.  Fixed package is patch-2.5.4-23.
Comment 9 Tim Waugh 2005-05-04 06:31:41 EDT
This breaks for patches with DOS CRLF newlines.  Reverting.

We'll pick up the real fix from the next stable release instead.
Comment 10 Toshio Kuratomi 2005-05-04 09:46:05 EDT
Do you have an example of breakage?  I've been testing with CRLF patches without
problems.  I'd like to diagnose where this is failing.

Thanks and sorry I missed something.
Comment 11 Tim Waugh 2005-05-04 09:55:44 EDT
The e2fsprogs package fails to build with patch-2.5.4-23.
Comment 12 Toshio Kuratomi 2005-05-04 11:33:31 EDT
Well, patch-2.5.9 fails on e2fsprogs-1.1589.patch as well.  So it looks like my
backport captures the 2.5.9 changes... but there's a problem at that level upstream.

Basically, e2fsprogs-1.1589.patch appears to have been created with a version of
diff that changes EOL in the patch body to the native OS's EOL. (Could still be
diff's behaviour -- I can't test ATM).  patch-2.5.4 handles this by looking for
^M in the header information of each patch hunk.  If it find ^M then it sets
itself up to strip ^M from the patch body, which readjusts the patch to match
the text being patched.  patch-2.5.9 takes a different approach.  For unidiffs
it assumes that the patch body will always be a verbatim patch of the source and
ignores ^M in the header.

The subject of this bug interacts with this when files which actually have CRLF
are diffed.  With diff -u --show-c-functions (and, I suspect, when diff is run
under dos), ^M sometimes shows up in the patch header.  This leads to patch
rejections because the patch body has the ^M erroneously stripped so lines do
not match.

I'll have to take this issue back upstream to find out what the desired
behaviour is and then I'll report back here.
Comment 13 Toshio Kuratomi 2005-05-06 17:49:21 EDT
Maybe too late for FC4, but I've got a new patch that works for e2fsprogs and my
test cases.  It's based on the following patch against 2.5.9 sent to me by the
patch maintainer::

2003-07-02  Paul Eggert  <eggert@twinsun.com>

        * pch.c (intuit_diff_type): If a unified-diff header line contains
        trailing CR, strip CR from each body line.  This corrects a bug
        introduced in the 2003-05-18 patch.  Bug reported by Andreas
        Gruenbacher.

--- pch.c       2003/05/20 14:03:17     1.44
+++ pch.c       2003/07/02 22:19:21     1.45
@@ -1,6 +1,6 @@
 /* reading patches */
 
-/* $Id: pch.c,v 1.44 2003/05/20 14:03:17 eggert Exp $ */
+/* $Id: pch.c,v 1.45 2003/07/02 22:19:21 eggert Exp $ */
 
 /* Copyright (C) 1986, 1987, 1988 Larry Wall
 
@@ -366,10 +366,16 @@ intuit_diff_type (void)
        if (!stars_last_line && strnEQ(s, "*** ", 4))
            name[OLD] = fetchname (s+4, strippath, &p_timestamp[OLD]);
        else if (strnEQ(s, "+++ ", 4))
+         {
            /* Swap with NEW below.  */
            name[OLD] = fetchname (s+4, strippath, &p_timestamp[OLD]);
+           p_strip_trailing_cr = strip_trailing_cr;
+         }
        else if (strnEQ(s, "Index:", 6))
+         {
            name[INDEX] = fetchname (s+6, strippath, (time_t *) 0);
+           p_strip_trailing_cr = strip_trailing_cr;
+         }
        else if (strnEQ(s, "Prereq:", 7)) {
            for (t = s + 7;  ISSPACE ((unsigned char) *t);  t++)
              continue;
@@ -409,6 +415,7 @@ intuit_diff_type (void)
                    p_timestamp[NEW] = timestamp;
                    p_rfc934_nesting = (t - s) >> 1;
                  }
+               p_strip_trailing_cr = strip_trailing_cr;
              }
          }
        if ((diff_type == NO_DIFF || diff_type == ED_DIFF) &&
Comment 14 Toshio Kuratomi 2005-05-06 18:03:24 EDT
Created attachment 114107 [details]
Decide to strip based on non-@@ header lines

This patch removes scanning for CR at the end of a @@ header line due to the
possibility of CR occuring there when --show-c-functions was specified to diff.
 Instead it scans whenever  Index, ===, +++, or --- header lines are found.

This means the following diff's should all be handled correctly::
1) unidiff's of unix EOL files
2) unidiff's of unix EOL files in which the diff-file has had CR's appended
(the e2fsprogs problem.)
3) unidiff's of DOS EOL files where diff -up was used (the notecase problem.)