From Bugzilla Helper: User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:0.9.4) Gecko/20010917 Description of problem: GCC 2.96 -81 (RH 7.1 release), -85 (RH 7.1 update), -88 and -99 (rawhide versions including the most recently available) all have a bug where they produce bad code on certain sequences, and the SGI XFS guys porting their code to linux have been bitten by what seems to be this bug several times in compiling kernels and subtle bugs that arise from the bad code produced. They've worked around problems that seemed to be caused by this before, but this time I analyzed the assembly output for them from a bunch of different compiler versions. This problem does *not* occur in kgcc nor in the GCC 3.0.1 version I just got from RawHide. Note that it may be related to use of 64-bit values, as the areas of code where this has shown up seem to be using long long types a lot. Problem seems to be that there are some operations done to what appear to be register spill locations, but the spill and reload are never performed... so it seems these locations were allocated for spill/reload, but then wires got crossed somewhere. A trivial change to put in a temporary seemed to reduce the register pressure or something and get it to generate correct code... but tracking down these kinds of problems is bad news. (source and example provided below) Version-Release number of selected component (if applicable): How reproducible: Always Steps to Reproduce: 1. Get current XFS kernel code tree from publically available CVS at oss.sgi.com. You could also grab it directly, with appropriate ".config" file, from "http://www.uruk.org/linux-2.4.13-pre1-xfs.tar.gz". This is a kernel source tree. I could probably prune it for you, but haven't taken the time to do that yet. 2. Unpack in "/usr/src" (or re-run "make dep" I think to update the paths), and compile (with 2.96 any version, but -85 was the one I made my analysis on) source file "fs/xfs/xfs_fsops.c" to assembly, and call this the "fixed" version. Grab diff/patch at URL "http://oss.sgi.com/cgi-bin/cvsweb.cgi/linux-2.4-xfs/linux/fs/xfs/xfs_fsops.c.diff?r1=1.69&r2=1.70" and apply in reverse (file in tree is already patched), then produce assembly output again and call this the "original" version. 3. Compare "original" to "fixed" version, and note that the assembly output of the "original" just before the second "call pagebuf_get", in the parameters being calculated to be pushed on the stack, is incorrect (it seems to do some of the calculations on register spill locations with no spill/reload being performed, weird). The assembly for the "fixed" one is correct. Actual Results: "original version", just before the second "call pagebuf_get": sall %cl, %eax testb $32, %cl cmovne %eax, %edx cmovne %edi, %eax [no spill?!?] --> addl $2, 16(%esp) --> adcl $0, 20(%esp) [no reload!?!] shldl $9, %eax, %edx sall $9, %eax pushl %edx pushl %eax Expected Results: NOTE, there are some extra instructions in the middle due to the re-arrangement of the parameter calculations in the source code, but that's perfectly normal. The "S->" and "R->" labeled instructions point out the register spill and reload that is not happening above. "fixed version", just before the second "call pagebuf_get": sall %cl, %eax testb $32, %cl cmovne %eax, %edx cmovne %ebx, %eax S-> movl %eax, 8(%esp) --> addl $2, 8(%esp) S-> movl %edx, 12(%esp) --> adcl $0, 12(%esp) pushl $8708 movl 80(%esp), %ecx sall $9, %ecx pushl %ecx R-> movl 16(%esp), %eax R-> movl 20(%esp), %edx shldl $9, %eax, %edx sall $9, %eax pushl %edx pushl %eax Additional info:
Can you please provide xfs_fsops.i (e.g. by adding -save-temps to gcc command line used to compile it) and also gcc options used? Thanks.
The compiler command-line invoked by the kernel compile is (go to the "fs/xfs" subdirectory): gcc -D__KERNEL__ -I/usr/src/linux-2.4.13-pre1/include -Wall -Wstrict-prototypes -Wno-trigraphs -O2 -fno-strict-aliasing -fno-common -fomit-frame-pointer -pipe -mpreferred-stack-boundary=2 -march=i686 -I. -I/usr/src/linux-2.4.13-pre1/fs -funsigned-char -c xfs_fsops.c The patched "original" file (remember, the one in the source tree is the "fixed" one with the workaround) is at: "http://www.uruk.org/xfs_fsops.orig.c". The output of "-save-temps" are in: "http://www.uruk.org/xfs_fsops.i" and "http://www.uruk.org/xfs_fsops.orig.i". Is this enough? Let me know if you need more.
Whoops, forgot to note exactly which sub-version I made those "-save-temps" files with (output of "gcc -v"): Reading specs from /usr/lib/gcc-lib/i386-redhat-linux/2.96/specs gcc version 2.96 20000731 (Red Hat Linux 7.1 2.96-99) Also, for convenience, I created the assembly output files and put them in the same place as the other files: "http://www.uruk.org/xfs_fsops.s" and "http://www.uruk.org/xfs_fsops.orig.s"
OK, somehow the severity got changed to "enhancement"... sorry for the extra cruft messages, changing it back to "high".
I've simplified this into: /* { dg-do run { i?86-*-* } */ /* { dg-options "-O2 -fomit-frame-pointer -march=i686" } */ typedef struct { void *s1a; } s1; typedef struct { unsigned int s2a; unsigned long long s2b; unsigned int s2c; unsigned int s2d; unsigned short s2e; unsigned char s2f; s1 *s2g; unsigned char s2h; } s2; typedef struct { unsigned int s3a; unsigned int s3b; unsigned int s3c; unsigned int s3d; unsigned int s3e[2]; unsigned int s3f[3]; unsigned int s3g; unsigned int s3h; unsigned int s3i; unsigned int s3j; unsigned int s3k; } s3; typedef struct { unsigned int s4a; unsigned int s4b; unsigned int s4c; unsigned int s4d; unsigned int s4e; unsigned int s4f; unsigned int s4g; unsigned int s4h; unsigned int s4i; unsigned int s4j; unsigned int s4k[64]; } s4; typedef struct { unsigned int s5a; unsigned int s5b; } s5; typedef struct { unsigned int s6a; unsigned short s6b; unsigned short s6c; unsigned int s6d; unsigned int s6e; } s6; typedef struct { unsigned long long s7a; unsigned int s7b; } s7; char buffer[1024]; void f1 (void *x, int y, unsigned int z) { } static inline const unsigned int f2 (unsigned int x) { asm("" : "=r" (x) : "0" (x)); return x; } int f3 (s2 *x, void *y, long long z, s1 **p) { return 0; } void *f4(s2 *x, unsigned int y) { return 0; } int f5(void *x) { return 0; } int f6(s2 *x, s1 *y) { static int i; return i++; } s1 *foo (void *x, long long y, unsigned int z, int v) { static s1 a; static int b; if (v != 0x2204 || x != buffer) abort (); if (y != 0x2c5e780000000200LL + b) abort (); b += 0x200; a.s1a = (char *) x + 32; return &a; } int test (s2 *x, s7 *y) { s1 *v1; s3 *v3; s4 *v4; s5 *v5; s6 *v6; unsigned int a, b, c; int d, e, f, g, h, i; unsigned int j, k; unsigned long long l, m, n; void *o; l = y->s7a; h = y->s7b; f = h - x->s2f; g = f3 (x, x->s2g, (l << x->s2h) - 1, &v1); if (g) return g; m = l; j = m + 1; m = y->s7a - x->s2b; k = x->s2d; o = f4 (x, 14); g = f5 (o); if (g) return g; i = ((unsigned long long) x->s2e + 511) >> 9; d = x->s2a; n = 0; for (a = j - 1; a >= k; a--, m -= b) { v1 = foo (x->s2g->s1a, ((((unsigned long long) a * x->s2c) << x->s2h) + 1LL) << 9, i << 9, 0x2204); v3 = (s3 *)v1->s1a; f1 (v3, 0, x->s2e); v3->s3a = 0x58414746; v3->s3a = f2 (v3->s3a); v3->s3b = 1; v3->s3b = f2 (v3->s3b); v3->s3c = a; v3->s3c = f2 (v3->s3c); if (a == j - 1) b = l - a * (unsigned long long) x->s2c; else b = x->s2c; v3->s3d = b; v3->s3d = f2 (v3->s3d); v3->s3e[0] = (3LL >> x->s2h) + 1; v3->s3e[0] = f2 (v3->s3e[0]); v3->s3e[1] = (3LL >> x->s2h) + 2; v3->s3e[1] = f2 (v3->s3e[1]); v3->s3f[0] = 1; v3->s3f[0] = f2 (v3->s3f[0]); v3->s3f[1] = 1; v3->s3f[1] = f2 (v3->s3f[1]); v3->s3g = 0; v3->s3g = f2 (v3->s3g); v3->s3h = 512 / sizeof (unsigned int) - 1; v3->s3h = f2 (v3->s3h); v3->s3i = 0; v3->s3i = f2 (v3->s3i); c = b - ((unsigned int) (3LL >> x->s2h)) - 4; v3->s3j = c; v3->s3j = f2 (v3->s3j); v3->s3k = c; v3->s3k = f2 (v3->s3k); g = f6 (x, v1); if (g) goto l0; v1 = foo (x->s2g->s1a, ((((unsigned long long) a * x->s2c) << x->s2h) + 2LL) << 9, i << 9, 0x2204); v4 = (s4 *)v1->s1a; f1 (v4, 0, x->s2e); v4->s4a = 0x58414749; v4->s4a = f2 (v4->s4a); v4->s4b = 1; v4->s4b = f2 (v4->s4b); v4->s4c = a; v4->s4c = f2 (v4->s4c); v4->s4d = b; v4->s4d = f2 (v4->s4d); v4->s4e = 0; v4->s4e = f2 (v4->s4e); v4->s4f = (3LL >> x->s2h) + 3; v4->s4f = f2 (v4->s4f); v4->s4g = 1; v4->s4g = f2 (v4->s4g); v4->s4h = 0; v4->s4h = f2 (v4->s4h); v4->s4i = -1; v4->s4i = f2 (v4->s4i); v4->s4j = -1; v4->s4j = f2 (v4->s4j); for (e = 0; e < 64; e++) { v4->s4k[e] = -1; v4->s4k[e] = f2 (v4->s4k[e]); } g = f6 (x, v1); if (g) goto l0; v1 = foo (x->s2g->s1a, (((unsigned long long) a * x->s2c + (unsigned int) ((3LL >> x->s2h) + 1)) << x->s2h) << 9, (((unsigned long long) d + 511) >> 9) << 9, 0x2204); v6 = (s6 *)v1->s1a; f1 (v6, 0, d); v6->s6a = 0x41425442; v6->s6a = f2 (v6->s6a); v6->s6b = 0; v6->s6b = f2 (v6->s6b); v6->s6c = 1; v6->s6c = f2 (v6->s6c); v6->s6d = -1; v6->s6d = f2 (v6->s6d); v6->s6e = -1; v6->s6e = f2 (v6->s6e); v5 = (s5 *) ((char *) v6 + 24); v5->s5a = (3LL >> x->s2h) + 4; v5->s5a = f2 (v5->s5a); v5->s5b = b - v5->s5a; g = f6 (x, v1); if (g) goto l0; } return 0; l0: return g; } s2 a; s7 b; s1 c; int main (void) { b.s7a = 66; a.s2c = 0xabcdef; a.s2g = &c; a.s2h = 33; c.s1a = buffer; if (test (&a, &b) != 1) abort (); exit (0); } . It is some reload problem where reload first decides to put a DImode pseudo into %eax/%edx pair but later forgets about it and uses a stack slot for it.
Should be fixed in gcc-2.96-100 and above.
Tested it on my original code/problem and a recent kernel/etc. and all seems good now. You might want to release an updated gcc for 7.1 and 7.2.
An errata has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHBA-2002-055.html