Bug 108092 - 2.4.20-20.7 has kernel stack trace deadlocks - gcc-2.96-113
Status: CLOSED ERRATA
Product: Red Hat Linux
Classification: Retired
Component: kernel
Version: 7.3
Hardware: i686 Linux
Priority: high
Severity: medium
Assigned To: Dave Jones
QA Contact: Brian Brock
Keywords: Security
Duplicates: 91566
Depends On: 87659
Reported: 2003-10-27 11:39 EST by Howard Owen
Modified: 2015-01-04 17:03 EST
Doc Type: Bug Fix
Last Closed: 2004-01-04 22:22:07 EST


Attachments
Script to deadlock kernels built with gcc-2.96-113 (3.50 KB, text/plain)
2003-10-27 11:44 EST, Howard Owen
Sample console messages when kernel deadlocks. (7.86 KB, text/plain)
2003-10-27 11:44 EST, Howard Owen

Description Howard Owen 2003-10-27 11:39:17 EST
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686) Gecko/20030807 Galeon/1.3.5

Description of problem:
Kernels built with gcc-2.96-113 are vulnerable to a deadlock when the current
process has traversed 5 successive symlinks on an NFS file system, and the
kernel takes an IRQ. There being less than 1KiB remaining on the process's kernel
stack, the service routine prints a stack trace rather than servicing the
interrupt. If the system console is on a serial port, this results in more IRQs
from the UART, which results in a deadlock.

Somehow, gcc-2.96-113 makes this condition far more likely. Kernels built with
gcc-2.96-112 or gcc3 do not exhibit the problem. The default rsize/wsize of 4KiB
also makes the problem less likely to occur. With rsize=wsize=16KiB, the problem
shows up reliably.

Version-Release number of selected component (if applicable):
kernel-2.4.20-20.7 

How reproducible:
Always

Steps to Reproduce:
1. On an NFS file system with rsize=wsize=16KiB, run the attached Perl script
with --jobs=30 --net.
2. Start three to eight large transfers off the box using scp
3. Watch the serial console
    

Actual Results:  Kernel stack trace messages appear on the serial console and
the system deadlocks

Expected Results:  Load should go up to above 30. The large copies should run to
completion. The system should stay up.

Additional info: Sample stack trace
Comment 1 Howard Owen 2003-10-27 11:44:04 EST
Created attachment 95517 [details]
Script to deadlock kernels built with gcc-2.96-113

With cwd in a NFS mount with rsize=wsize=16KiB, run this script with --jobs=30
--net

Start 3-8 large file transfers off the box. (I use scp with a 500MiB file.)
Watch the serial console. The system will deadlock before the transfers
complete.
Comment 2 Howard Owen 2003-10-27 11:44:53 EST
Created attachment 95518 [details]
Sample console messages when kernel deadlocks.
Comment 3 Howard Owen 2003-10-27 11:50:08 EST
I placed the severity at "security" because this is essentially a
denial-of-service attack on the affected kernel. A local user with normal
privileges can deadlock the system.
Comment 4 Joshua Jensen 2003-12-23 14:21:10 EST
I noticed that a new kernel errata,
https://rhn.redhat.com/errata/RHBA-2003-394.html, mentions this bug...
but it doesn't say why.  Does this mean that the kernel *wasn't*
compiled with gcc-2.96-113?  If so, what version does Red Hat recommend?
Comment 5 Howard Owen 2003-12-23 14:33:21 EST
The new kernel was compiled with gcc-2.96-126, which is an unreleased
version. I haven't tested this kernel yet, but I will soon. Since they
call out this bug, I'm assuming the unreleased gcc addresses the issue.

But if you want to build your own kernel, the workaround of
downgrading to gcc-2.96-112 is the only solution I'm aware of. Perhaps
Red Hat will fix bug #87659 by releasing the updated gcc before 7.3
end-of-life next week. The fix for this bug is well over half a loaf,
however, since most installations won't be running custom kernels.
Comment 6 Erling Jacobsen 2003-12-29 17:10:11 EST
I haven't found a gcc-2.96-124 myself, but I _did_ find a SRPM of
gcc-2.96-124 as an update to the 2.1 enterprise version of RHL.
One of the changes from -113 to -124 is apparently a fix for some
"excessive stack usage caused by the -fno-strict-aliasing patch".
Doesn't that sound relevant? I'm no expert, but I think it would
be interesting to take the relevant new patches from gcc-2.96-124
and stick them into gcc-2.96-113, rebuild gcc, and use that to rebuild
the kernel.
Comment 7 Howard Owen 2003-12-31 17:52:42 EST
The patch you are apparently referring to,
gcc-strict-alias-optimization2.patch, seems to address the problem
when applied to the gcc-2.96-113 SRPM. At least, a variant of my crash
script doesn't crash a kernel built with the resulting gcc. The patch
applied cleanly and the gcc build went smoothly. However, I'm not in a
position to judge if this patch, applied in isolation, is a good
general fix for production systems.

One for Progeny, I guess.
Comment 8 Dave Jones 2004-01-04 23:16:49 EST
*** Bug 91566 has been marked as a duplicate of this bug. ***
