From Bugzilla Helper:
User-Agent: Mozilla/4.0 (compatible; MSIE 5.14; Mac_PowerPC)

Description of problem:
I have installed the latest gcc and glibc on all our systems. On a number of heavily network-loaded systems I am seeing stack traces (do_IRQ), and the machines crash randomly or, worse, just hang. I first suspected the network drivers and/or entropy handling, but the problem now appears to be narrowed down to a bug in the compiler, glibc, or something related that shows up when you build a custom kernel. I reverted to a binary kernel (2.4.18-27.7.xsmp) and the problem went away. Of course this is not optimal: we have always been able to compile custom kernels successfully in the past, and it means we cannot add any patches or other changes, since a custom kernel now fails. I am about 90% sure it is related to the compiler, but who knows.

Version-Release number of selected component (if applicable):
gcc-2.96-113

How reproducible:
Always

Steps to Reproduce:
1. compile custom kernel and install
2. apply heavy network load
3. thud/hang

Actual Results: stack traces in syslog and machine hangs

Expected Results: machine shouldn't have been crashing

Additional info:
I have not had a chance to test this with gcc-2.96-112, but that was the compiler version I previously used to build the 2.4.18-18.7.x kernels, which didn't exhibit this bug, so I am guessing that something in gcc-2.96-113 (and its associated software) is broken for kernel compiles. Note this also seems to affect Red Hat 7.2 (at that compiler version).
We see this bug too. For us, it shows up using NFS alone, or in combination with mvfs.o. The kernel stack traces occur when the system takes an IRQ while the current process has traversed five successive symlinks. The kernel refuses to service the IRQ because there is less than 1KiB of kernel stack left. Recompiling with gcc-2.96-112 fixes the problem. The latest errata kernel for 7.x is vulnerable to this issue because it was built with gcc-2.96-113.
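For anyone trying to picture the "less than 1KiB of kernel stack left" condition, here is a rough user-space sketch in C of that kind of headroom check. It is not the kernel's code: the 8 KiB stack size, the `reserved` space at the bottom of the stack area, and the function names are all assumptions made up for illustration. Only the 1 KiB threshold comes from the description above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption for illustration: an 8 KiB, size-aligned per-process stack area. */
#define STACK_SIZE   8192u
/* The "less than 1KiB left" threshold mentioned in the report. */
#define MIN_HEADROOM 1024u

/*
 * Hypothetical helper: bytes still free below the stack pointer.
 * The stack grows downward, so the offset of sp within its aligned
 * area is the space left above the bottom of that area. `reserved`
 * is an assumed amount of per-task bookkeeping stored at the bottom.
 */
size_t stack_bytes_left(uintptr_t sp, size_t reserved)
{
    size_t offset;

    assert(reserved < STACK_SIZE);
    offset = (size_t)(sp & (STACK_SIZE - 1));
    return offset > reserved ? offset - reserved : 0;
}

/* Hypothetical helper: nonzero when the headroom is below the threshold. */
int irq_would_be_refused(uintptr_t sp, size_t reserved)
{
    return stack_bytes_left(sp, reserved) < MIN_HEADROOM;
}
```

With these assumed numbers, a deep call chain (such as a path lookup through several nested symlinks) that pushes the stack pointer to within 1 KiB of the reserved area would trip the check as soon as an interrupt arrives.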
Created attachment 95269
Perl script to deadlock kernels built with gcc-2.96-113

Run this script with '--jobs=40 --net' with cwd in an NFS mount that has rsize=wsize=16KiB or greater. Start 3-8 large scp transfers out from the box. Watch the console. If you have a serial console, the system will deadlock. If not, it may stay up, but you will still see the stack traces.
Jason, or someone, could you please change the summary of this bug to read something like "gcc-2.96-113 produces broken kernels (do_IRQ kernel stack trace deadlocks)"? I might have found this bug three weeks ago, when I started investigating, if it had had a summary like that. 8)
This bug lists gcc-2.96-112 as safe, but one of my 7.2 systems running the 2.4.18-27 kernel rebuilt with gcc-2.96-112 just died with the do_IRQ: stack overflow errors.
As a sanity check, what does 'strings /boot/vmlinux-2.4.18-27.7x | grep gcc' show? Also, do_IRQ refusing to service an interrupt because there's not enough space on the stack can happen for other reasons. Can you make the fault happen with the attached Perl script?
Red Hat today released kernel-2.4.20-27.7, which fixes bug #108092, one of the dependent bugs of this bug. 'strings /boot/vmlinux-2.4.20-27.7 | grep gcc' shows that the new kernel was built with gcc 2.96-126, which isn't released AFAIK. Here's hoping Red Hat releases this gcc version before end-of-life of 7.3. (Or, alternatively, *after* that date. 8)
On a hint supplied by Erling Jacobsen over on bug #108092, I applied gcc-strict-alias-optimization2.patch from the RHEL 2.1 gcc-2.96-124 source RPM to the 7.3 gcc-2.96-113 sources. The patch applied cleanly, and a kernel built with the resulting gcc did not crash when subjected to a variant of my kernel stack crash script. It appears this patch addresses the problem this bug refers to. Knowing this is not as helpful as it could be, however, because the effect of applying this patch in isolation isn't known.
Yes, the networking code is known to have strict-aliasing violations. The kernel makefiles have been updated since then to always supply -fno-strict-aliasing.
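To illustrate the kind of violation meant here, a minimal sketch in C. The function names are made up for the example and nothing in it is taken from the kernel's networking code. The first function reads a `uint32_t` object through a `uint16_t` pointer, which the C aliasing rules do not allow; a compiler that optimizes on strict aliasing may assume the store and the loads never touch the same memory and reorder them. The second function does the same computation via `memcpy`, which is well defined.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical example of a strict-aliasing violation: *word is a
 * uint32_t object, but it is read back through a uint16_t lvalue.
 * With strict aliasing enabled the compiler is allowed to treat the
 * store and the two loads as unrelated, so the result is undefined.
 */
uint32_t fold_punned(uint32_t *word, uint32_t value)
{
    uint16_t *halves = (uint16_t *)word;    /* aliasing violation */

    *word = value;
    return (uint32_t)halves[0] + halves[1]; /* may observe stale data */
}

/*
 * Same computation without the violation: memcpy copies the object's
 * bytes into a properly typed array, so the loads are well defined.
 */
uint32_t fold_memcpy(uint32_t *word, uint32_t value)
{
    uint16_t halves[2];

    assert(word != NULL);
    *word = value;
    memcpy(halves, word, sizeof halves);
    return (uint32_t)halves[0] + halves[1];
}
```

Building code like `fold_punned` with `-fno-strict-aliasing` tells gcc not to make that assumption, which is why passing the flag unconditionally sidesteps this class of problem without having to audit every cast.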