Bug 54571

Summary: Bug in GCC 2.96 series (all available versions tested)
Product: [Retired] Red Hat Linux Reporter: Erich Boleyn <erich>
Component: gccAssignee: Jakub Jelinek <jakub>
Status: CLOSED ERRATA QA Contact: Brian Brock <bbrock>
Severity: high Docs Contact:
Priority: medium    
Version: 7.1CC: sandeen, t8m
Target Milestone: ---   
Target Release: ---   
Hardware: i386   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2001-10-19 16:46:16 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Erich Boleyn 2001-10-12 17:45:55 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:0.9.4) Gecko/20010917

Description of problem:
GCC 2.96 -81 (RH 7.1 release), -85 (RH 7.1 update), -88 and -99 (rawhide
versions including the most recently available) all have a bug where they
produce bad code on certain sequences, and the SGI XFS guys porting their
code to linux have been bitten by what seems to be this bug several times
in compiling kernels and subtle bugs that arise from the bad code produced.

They've worked around problems that seemed to be caused by this before, but
this time I analyzed the assembly output for them from a bunch of different
compiler versions.  This problem does *not* occur in kgcc nor in the GCC
3.0.1 version I just got from RawHide.

Note that it may be related to use of 64-bit values, as the areas of code
where this has shown up seem to be using long long types a lot.

Problem seems to be that there are some operations done to what appear to
be register spill locations, but the spill and reload are never
performed...  so it seems these locations were allocated for spill/reload,
but then wires got crossed somewhere.  A trivial change to put in a
temporary seemed to reduce the register pressure or something and get it to
generate correct code...  but tracking down these kinds of problems is bad
news.

(source and example provided below)


Version-Release number of selected component (if applicable):


How reproducible:
Always

Steps to Reproduce:
1. Get current XFS kernel code tree from publically available CVS at
oss.sgi.com.  You could also grab it directly, with appropriate ".config"
file, from "http://www.uruk.org/linux-2.4.13-pre1-xfs.tar.gz".  This is a
kernel source tree.  I could probably prune it for you, but haven't taken
the time to do that yet.

2. Unpack in "/usr/src" (or re-run "make dep" I think to update the paths),
and compile (with 2.96 any version, but -85 was the one I made my analysis
on) source file "fs/xfs/xfs_fsops.c" to assembly, and call this the "fixed"
version.  Grab diff/patch at URL
"http://oss.sgi.com/cgi-bin/cvsweb.cgi/linux-2.4-xfs/linux/fs/xfs/xfs_fsops.c.diff?r1=1.69&r2=1.70"
and apply in reverse (file in tree is already patched), then produce
assembly output again and call this the "original" version.

3. Compare "original" to "fixed" version, and note that the assembly output
of the "original" just before the second "call pagebuf_get", in the
parameters being calculated to be pushed on the stack, is incorrect (it
seems to do some of the calculations on register spill locations with no
spill/reload being performed, weird).  The assembly for the "fixed" one is
correct.


Actual Results:  
   "original version", just before the second "call pagebuf_get":

        sall    %cl, %eax
        testb   $32, %cl
        cmovne  %eax, %edx
        cmovne  %edi, %eax

    [no spill?!?]

  -->   addl    $2, 16(%esp)
  -->   adcl    $0, 20(%esp)

    [no reload!?!]

        shldl   $9, %eax, %edx
        sall    $9, %eax
        pushl   %edx
        pushl   %eax



Expected Results:  
NOTE, there are some extra instructions in the middle due to the
re-arrangement of the parameter calculations in the source code, but that's
perfectly normal.

The "S->" and "R->" labeled instructions point out the register spill and
reload that is not happening above.

   "fixed version", just before the second "call pagebuf_get":

        sall    %cl, %eax
        testb   $32, %cl
        cmovne  %eax, %edx
        cmovne  %ebx, %eax
  S->   movl    %eax, 8(%esp)
  -->   addl    $2, 8(%esp)
  S->   movl    %edx, 12(%esp)
  -->   adcl    $0, 12(%esp)
        pushl   $8708
        movl    80(%esp), %ecx
        sall    $9, %ecx
        pushl   %ecx
  R->   movl    16(%esp), %eax
  R->   movl    20(%esp), %edx
        shldl   $9, %eax, %edx
        sall    $9, %eax
        pushl   %edx
        pushl   %eax



Additional info:

Comment 1 Jakub Jelinek 2001-10-12 17:48:58 UTC
Can you please provide xfs_fsops.i (e.g. by adding -save-temps to gcc command
line used to compile it) and also gcc options used?
Thanks.

Comment 2 Erich Boleyn 2001-10-12 18:18:05 UTC
The compiler command-line invoked by the kernel compile is (go to the "fs/xfs"
subdirectory):

gcc -D__KERNEL__ -I/usr/src/linux-2.4.13-pre1/include  -Wall -Wstrict-prototypes
-Wno-trigraphs -O2 -fno-strict-aliasing -fno-common -fomit-frame-pointer -pipe
-mpreferred-stack-boundary=2 -march=i686  -I. -I/usr/src/linux-2.4.13-pre1/fs 
-funsigned-char  -c xfs_fsops.c

The patched "original" file (remember, the one in the source tree is the "fixed"
one with the workaround) is at: "http://www.uruk.org/xfs_fsops.orig.c".

The output of "-save-temps" are in:
"http://www.uruk.org/xfs_fsops.i" and "http://www.uruk.org/xfs_fsops.orig.i".
Is this enough?  Let me know if you need more.


Comment 3 Erich Boleyn 2001-10-12 19:07:50 UTC
Whoops, forgot to note exactly which sub-version I made those "-save-temps"
files with (output of "gcc -v"):

Reading specs from /usr/lib/gcc-lib/i386-redhat-linux/2.96/specs
gcc version 2.96 20000731 (Red Hat Linux 7.1 2.96-99)

Also, for convenience, I created the assembly output files and put them in the
same place as the other files:
"http://www.uruk.org/xfs_fsops.s" and "http://www.uruk.org/xfs_fsops.orig.s"


Comment 4 Erich Boleyn 2001-10-12 19:10:17 UTC
OK, somehow the severity got changed to "enhancement"...  sorry for the extra
cruft messages, changing it back to "high".

Comment 5 Jakub Jelinek 2001-10-19 16:46:11 UTC
I've simplified this into:
/* { dg-do run { i?86-*-* } */
/* { dg-options "-O2 -fomit-frame-pointer -march=i686" } */

typedef struct {
  void *s1a;
} s1;
typedef struct {
  unsigned int s2a;
  unsigned long long s2b;
  unsigned int s2c;
  unsigned int s2d;
  unsigned short s2e;
  unsigned char s2f;
  s1 *s2g;
  unsigned char s2h;
} s2;
typedef struct
{
  unsigned int s3a;
  unsigned int s3b;
  unsigned int s3c;
  unsigned int s3d;
  unsigned int s3e[2];
  unsigned int s3f[3];
  unsigned int s3g;
  unsigned int s3h;
  unsigned int s3i;
  unsigned int s3j;
  unsigned int s3k;
} s3;
typedef struct
{
  unsigned int s4a;
  unsigned int s4b;
  unsigned int s4c;
  unsigned int s4d;
  unsigned int s4e;
  unsigned int s4f;
  unsigned int s4g;
  unsigned int s4h;
  unsigned int s4i;
  unsigned int s4j;
  unsigned int s4k[64];
} s4;
typedef struct
{
  unsigned int s5a;
  unsigned int s5b;
} s5;
typedef struct
{
  unsigned int s6a;
  unsigned short s6b;
  unsigned short s6c;
  unsigned int s6d;
  unsigned int s6e;
} s6;
typedef struct
{
  unsigned long long s7a;
  unsigned int s7b;
} s7;

char buffer[1024];

void f1 (void *x, int y, unsigned int z)
{
}

static inline const unsigned int f2 (unsigned int x)
{
  asm("" : "=r" (x) : "0" (x));
  return x;
}

int f3 (s2 *x, void *y, long long z, s1 **p)
{
  return 0;
}

void *f4(s2 *x, unsigned int y)
{
  return 0;
}

int f5(void *x)
{
  return 0;
}

int f6(s2 *x, s1 *y)
{
  static int i;

  return i++;
}

s1 *foo (void *x, long long y, unsigned int z, int v)
{
  static s1 a;
  static int b;

  if (v != 0x2204 || x != buffer)
    abort ();
  if (y != 0x2c5e780000000200LL + b)
    abort ();
  b += 0x200;
  a.s1a = (char *) x + 32;
  return &a;
}

int test (s2 *x, s7 *y)
{
  s1 *v1;
  s3 *v3;
  s4 *v4;
  s5 *v5;
  s6 *v6;
  unsigned int a, b, c;
  int d, e, f, g, h, i;
  unsigned int j, k;
  unsigned long long l, m, n;
  void *o;

  l = y->s7a;
  h = y->s7b;
  f = h - x->s2f;
  g = f3 (x, x->s2g, (l << x->s2h) - 1, &v1);
  if (g)
    return g;

  m = l;
  j = m + 1;
  m = y->s7a - x->s2b;
  k = x->s2d;
  o = f4 (x, 14);
  g = f5 (o);
  if (g)
    return g;

  i = ((unsigned long long) x->s2e + 511) >> 9;
  d = x->s2a;

  n = 0;
  for (a = j - 1; a >= k; a--, m -= b) {
    v1 = foo (x->s2g->s1a,
              ((((unsigned long long) a * x->s2c) << x->s2h) + 1LL) << 9,
              i << 9, 0x2204);
    v3 = (s3 *)v1->s1a;
    f1 (v3, 0, x->s2e);
    v3->s3a = 0x58414746;
    v3->s3a = f2 (v3->s3a);
    v3->s3b = 1;
    v3->s3b = f2 (v3->s3b);
    v3->s3c = a;
    v3->s3c = f2 (v3->s3c);
    if (a == j - 1)
      b = l - a * (unsigned long long) x->s2c;
    else
      b = x->s2c;
    v3->s3d = b;
    v3->s3d = f2 (v3->s3d);
    v3->s3e[0] = (3LL >> x->s2h) + 1;
    v3->s3e[0] = f2 (v3->s3e[0]);
    v3->s3e[1] = (3LL >> x->s2h) + 2;
    v3->s3e[1] = f2 (v3->s3e[1]);
    v3->s3f[0] = 1;
    v3->s3f[0] = f2 (v3->s3f[0]);
    v3->s3f[1] = 1;
    v3->s3f[1] = f2 (v3->s3f[1]);
    v3->s3g = 0;
    v3->s3g = f2 (v3->s3g);
    v3->s3h = 512 / sizeof (unsigned int) - 1;
    v3->s3h = f2 (v3->s3h);
    v3->s3i = 0;
    v3->s3i = f2 (v3->s3i);
    c = b - ((unsigned int) (3LL >> x->s2h)) - 4;
    v3->s3j = c;
    v3->s3j = f2 (v3->s3j);
    v3->s3k = c;
    v3->s3k = f2 (v3->s3k);
    g = f6 (x, v1);
    if (g)
      goto l0;

    v1 = foo (x->s2g->s1a,
              ((((unsigned long long) a * x->s2c) << x->s2h) + 2LL) << 9,
              i << 9, 0x2204);
    v4 = (s4 *)v1->s1a;
    f1 (v4, 0, x->s2e);
    v4->s4a = 0x58414749;
    v4->s4a = f2 (v4->s4a);
    v4->s4b = 1;
    v4->s4b = f2 (v4->s4b);
    v4->s4c = a;
    v4->s4c = f2 (v4->s4c);
    v4->s4d = b;
    v4->s4d = f2 (v4->s4d);
    v4->s4e = 0;
    v4->s4e = f2 (v4->s4e);
    v4->s4f = (3LL >> x->s2h) + 3;
    v4->s4f = f2 (v4->s4f);
    v4->s4g = 1;
    v4->s4g = f2 (v4->s4g);
    v4->s4h = 0;
    v4->s4h = f2 (v4->s4h);
    v4->s4i = -1;
    v4->s4i = f2 (v4->s4i);
    v4->s4j = -1;
    v4->s4j = f2 (v4->s4j);
    for (e = 0; e < 64; e++)
      {
        v4->s4k[e] = -1;
        v4->s4k[e] = f2 (v4->s4k[e]);
      }
    g = f6 (x, v1);
    if (g)
      goto l0;

    v1 = foo (x->s2g->s1a,
              (((unsigned long long) a * x->s2c
                + (unsigned int) ((3LL >> x->s2h) + 1)) << x->s2h) << 9,
              (((unsigned long long) d + 511) >> 9) << 9, 0x2204);
    v6 = (s6 *)v1->s1a;
    f1 (v6, 0, d);
    v6->s6a = 0x41425442;
    v6->s6a = f2 (v6->s6a);
    v6->s6b = 0;
    v6->s6b = f2 (v6->s6b);
    v6->s6c = 1;
    v6->s6c = f2 (v6->s6c);
    v6->s6d = -1;
    v6->s6d = f2 (v6->s6d);
    v6->s6e = -1;
    v6->s6e = f2 (v6->s6e);
    v5 = (s5 *) ((char *) v6 + 24);
    v5->s5a = (3LL >> x->s2h) + 4;
    v5->s5a = f2 (v5->s5a);
    v5->s5b = b - v5->s5a;
    g = f6 (x, v1);
    if (g)
      goto l0;
  }
  return 0;

l0:
  return g;
}

s2 a;
s7 b;
s1 c;

int main (void)
{
  b.s7a = 66;
  a.s2c = 0xabcdef;
  a.s2g = &c;
  a.s2h = 33;
  c.s1a = buffer;
  if (test (&a, &b) != 1)
    abort ();
  exit (0);
}

. It is some reload problem where reload first decides to put a DImode
pseudo into %eax/%edx pair but later forgets about it and uses a stack slot
for it.

Comment 6 Jakub Jelinek 2001-11-28 12:50:59 UTC
Should be fixed in gcc-2.96-100 and above.

Comment 7 Erich Boleyn 2001-12-12 21:36:45 UTC
Tested it on my original code/problem and a recent kernel/etc. and all seems
good now.  You might want to release an updated gcc for 7.1 and 7.2.

Comment 8 Bill Nottingham 2002-07-26 21:47:29 UTC
An errata has been issued which should help the problem described in this bug report. 
This report is therefore being closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files, please follow the link below. You may reopen 
this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2002-055.html