87659 – gcc-2.96-113 produces broken kernels (creating DO_IRQ kernel stack trace deadlocks)

Keywords:
Status:	CLOSED NOTABUG
Alias:	None
Product:	Red Hat Linux
Classification:	Retired
Component:	gcc
Sub Component:
Version:	7.3
Hardware:	i686
OS:	Linux
Priority:	medium
Severity:	high
Target Milestone:	---
Assignee:	Jakub Jelinek
QA Contact:	David Lawrence
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	92002 108092

Reported:	2003-03-31 23:00 UTC by jason andrade
Modified:	2007-04-18 16:52 UTC
CC List:	2 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2004-10-03 19:19:26 UTC
Embargoed:

Attachments
Perl script to deadlock kernels built with gcc-2.96-113 (3.50 KB, text/plain) 2003-10-17 17:15 UTC, Howard Owen

Description jason andrade 2003-03-31 23:00:30 UTC

From Bugzilla Helper:
User-Agent: Mozilla/4.0 (compatible; MSIE 5.14; Mac_PowerPC)

Description of problem:
I have installed the latest gcc and glibc on all our systems.  On a number 
of systems under heavy network load I am seeing do_IRQ stack traces, and 
the machines crash randomly or, worse, just hang.

After first suspecting the network drivers and/or entropy handling, I have 
narrowed it down to what appears to be a bug in the compiler or glibc 
that shows up when you build a custom kernel.

i reverted to a binary kernel (2.4.18-27.7.xsmp) and the problem went 
away.  

of course this is not optimal since we have always been able to 
successfully compile custom kernels in the past and this also means we 
cannot add any patches or other changes as a custom kernel now fails..

i am about 90% sure it is related to the compiler but who knows.. 

Version-Release number of selected component (if applicable):
gcc-2.96-113

How reproducible:
Always

Steps to Reproduce:
1. compile custom kernel and install
2. apply heavy network load
3. thud/hang
    

Actual Results:  stack traces in syslog and machine hangs

Expected Results:  machine shouldn't have been crashing

Additional info:

I have not had a chance to test this with gcc-2.96-112, but since that
was the previous compiler version, which I used to compile the 2.4.18-18.7.x 
kernels that didn't exhibit this bug, I am guessing that something in 
gcc-2.96-113 (and its associated software) is broken for kernel compiles.

note this seems to also affect redhat 7.2 (at that compiler version..)

Comment 1 Howard Owen 2003-10-17 17:04:33 UTC

We see this bug too. For us, it shows up using NFS alone, or in combination with
mvfs.o. The kernel stack traces occur when the system takes an IRQ while the
current process has traversed five successive symlinks. The kernel refuses to
service the IRQ because there is less than 1KiB of kernel stack left.
Recompiling with gcc-2.96-112 fixes the problem.
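
To make the "less than 1KiB of kernel stack left" condition concrete, here is a rough user-space model of that kind of headroom check. It is only a sketch, not the kernel's actual code: the 8 KiB stack size, the size-aligned stack area, and the names stack_bytes_left, irq_would_be_refused and reserved_at_bottom are all assumptions made for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed values for illustration; the real kernel's sizes may differ. */
#define THREAD_SIZE  8192u  /* assumed size of one kernel stack area */
#define MIN_HEADROOM 1024u  /* the "less than 1KiB" threshold from the comment */

/*
 * Bytes left between the stack pointer and the usable bottom of its stack.
 * Assumes the stack grows downward inside an area aligned to THREAD_SIZE,
 * with reserved_at_bottom bytes at the base that must not be overwritten
 * (a hypothetical parameter standing in for per-task data kept there).
 */
uint32_t stack_bytes_left(uintptr_t sp, uint32_t reserved_at_bottom)
{
    uint32_t offset = (uint32_t)(sp & (THREAD_SIZE - 1u));
    return offset > reserved_at_bottom ? offset - reserved_at_bottom : 0u;
}

/* True when the remaining headroom is below the threshold. */
bool irq_would_be_refused(uintptr_t sp, uint32_t reserved_at_bottom)
{
    return stack_bytes_left(sp, reserved_at_bottom) < MIN_HEADROOM;
}
```

With a stack pointer 1280 bytes above the base of its area, irq_would_be_refused returns false when nothing is reserved and true once 512 bytes are reserved at the bottom, since only 768 bytes remain. Deep call chains such as nested symlink resolution push the stack pointer toward that limit.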

The latest errata kernel for 7.x is vulnerable to this issue because it was
built with gcc-2.96-113

Comment 2 Howard Owen 2003-10-17 17:15:33 UTC

Created attachment 95269
Perl script to deadlock kernels built with gcc-2.96-113


Run this script with '--jobs=40 --net' with cwd in an NFS mount that has
rsize=wsize=16KiB or greater.  Start 3-8 large scp transfers out from the box.
Watch the console. If you have a serial console, the system will deadlock. If
not, it may stay up, but you will still see the stack traces.

Comment 3 Howard Owen 2003-10-17 22:29:19 UTC

Jason, or someone, could you please change the summary of this bug to read
something like "gcc-2.96-113 produces broken kernels (DO_IRQ kernel stack trace
deadlocks)"? I might have found this bug three weeks ago, when I started
investigating, if it had had a summary like that. 8)

Comment 4 John R 2003-11-10 08:17:24 UTC

This bug lists gcc-2.96-112 as safe, but one of my 7.2 systems running
the 2.4.18-27 kernel rebuilt with gcc-2.96-112 just died with the
do_IRQ: stack overflow errors.

Comment 5 Howard Owen 2003-11-10 18:43:37 UTC

As a sanity check, what does 'strings /boot/vmlinux-2.4.18-27.7x |
grep gcc' show?

Also, do_IRQ refusing to service an interrupt because there's not
enough space on the stack can happen for other reasons. Can you make
the fault happen with the attached Perl script?

Comment 6 Howard Owen 2003-12-23 20:57:16 UTC

Red Hat today released kernel-2.4.20-27.7, which fixes bug #108092,
one of the dependent bugs of this bug. 'strings
/boot/vmlinux-2.4.20-27.7 | grep gcc' shows that the new kernel was
built with gcc 2.96-126, which isn't released AFAIK. Here's hoping Red
Hat releases this gcc version before end-of-life of 7.3. (Or,
alternatively, *after* that date. 8)

Comment 7 Howard Owen 2003-12-31 22:59:17 UTC

On a hint supplied by Erling Jacobsen over on bug #108092, I applied
gcc-strict-alias-optimization2.patch from the RHEL 2.1 gcc-2.96-124
source RPM to the 7.3 gcc-2.96-113 sources. The patch applied cleanly,
and a kernel built with the resulting gcc failed to crash when
subjected to a variant of my kernel stack crash script.

It appears this patch addresses the problem this bug refers to.
Knowing this is not as helpful as it could be however, because the
effect of applying this patch in isolation isn't known.

Comment 8 Richard Henderson 2004-10-03 19:19:26 UTC

Yes, the networking code is known to have strict-aliasing violations.
The kernel makefiles have been updated since then to always supply
-fno-strict-aliasing.
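
For readers unfamiliar with the term, the following is a minimal sketch of what a strict-aliasing violation looks like in C. It is a generic illustration, not the kernel networking code in question, and the function names are made up for the example.

```c
#include <stdint.h>
#include <string.h>

/*
 * Violating pattern: the memory holds uint16_t objects but is read through
 * a uint32_t lvalue. Under strict aliasing the compiler may assume the two
 * types never overlap, so the result is undefined behavior and can change
 * with optimization level. Shown for contrast only; do not call it.
 */
uint32_t read_u32_aliasing(const uint16_t *halves)
{
    return *(const uint32_t *)halves;
}

/*
 * Conforming alternative: copy the bytes into a uint32_t object.
 * Compilers typically turn this memcpy into a single load.
 */
uint32_t read_u32_memcpy(const uint16_t *halves)
{
    uint32_t value;
    memcpy(&value, halves, sizeof value);
    return value;
}
```

Building with -fno-strict-aliasing tells gcc not to make the type-based assumption, so code written in the first style behaves as its author expected. Rewriting it in the second style removes the dependency on that flag.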
