Red Hat Bugzilla – Bug 113148
SMP kernel deadlock (?) as NFS client
Last modified: 2015-01-04 17:04:16 EST
Description of problem:
After a few minutes of NFS client activity, the system deadlocks. The only response from the system is to ping (no VGA, mouse, CTRL-ALT-DEL, etc.). The system must be power cycled to reboot.
Reproducible on the uniprocessor kernel also; it just takes longer.
Version-Release number of selected component (if applicable):
errata kernels 2135, 2138, 2140
Steps to Reproduce:
1. mount server:/somefs /mnt/nfs
2. tar -cvf /dev/null /mnt/nfs
3. wait a few minutes
Actual Results: system deadlocks
Expected Results: system should complete the copy, and continue to live.
Current FC1 load, with all current errata (as of 8 Jan 04) applied, including kernel-2.4.22-1.2140.
Machines tested include a Dell PowerEdge Workstation 330 (uniprocessor Xeon) and an IBM IntelliStation Z Pro (uniprocessor Xeon w/ hyperthreading).
Suggestions for better methods of data collection are welcome; I can't get an oops since the screen blanks... :/
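The reproduction steps above can be scripted so that a log on local disk records how far the run got before the hang. This is only a sketch: the function name stress_read, its arguments, and the log path are illustrative and not from the report. It also pipes the archive through cat rather than writing to /dev/null directly, on the assumption that GNU tar skips reading file contents when the archive is /dev/null; that changes the workload slightly compared with step 2.

```shell
#!/bin/sh
# Hypothetical helper (not from the report): read a directory tree
# repeatedly and log each pass, so the last log line that survives a
# power cycle shows roughly when the machine hung.
stress_read() {
    dir=$1      # tree to read, e.g. the NFS mount point
    passes=$2   # number of read passes
    log=$3      # log file on a local (non-NFS) filesystem
    i=1
    while [ "$i" -le "$passes" ]; do
        echo "pass $i start $(date '+%s')" >> "$log"
        sync    # flush the log line to disk before the heavy I/O starts
        # Pipe through cat so tar really reads the file data.
        tar -cf - "$dir" 2>/dev/null | cat > /dev/null
        i=$((i + 1))
    done
    echo "completed $passes passes" >> "$log"
}

# Example invocation (paths are assumptions):
# stress_read /mnt/nfs 100 /var/tmp/nfs-stress.log
```

If the log stops at "pass N start" with no later line, the hang happened during pass N.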
I have a dual 2.2 Xeon (hyperthreading disabled) with FC1.0, kernel 2115.
It worked for over 25 days with NFS and SMB. It finally hung last
night after I disconnected the cable to the NFS server and ran df.
A message was posted to the Fedora mailing list. I am upgrading to the 2140
kernel and will re-test (and will use Stress). Please CC me on
email@example.com: from reading your report it sounds like 2115 was
OK, and this is a regression in the errata kernels. Is that correct?
Wade, any luck with the errata kernel?
It's possible there are two bugs here. There are a number of other
similar bugs with SMP deadlocks.
Not sure if 2115 was good or not. Will try that, and vanilla 2.4.22.
Should I build 2.4.22 with gcc from gcc-3.3.2-1.i386.rpm or from gcc32-3.2.3
Another data point. It may be during unmount that hangs occur - if that happens to
tickle any neurons...
Use gcc32 for kernel builds, though you could just grab the binaries
in this case.
Sounds like this may be another instance of bug #109497
I built a 2.4.24 kernel (but used stock gcc 3.3). It has been up
since yesterday, but not heavily loaded. Should I rebuild using
gcc32? I assume this is simply editing 2.4.24/Makefile and changing
HOSTCC = gcc32, correct?
Anything I should turn off in BIOS (e.g., USB, ACPI, Hyperthread)?
If there are other SMP deadlocks in the current "stable" kernel, any
idea what they are and what the schedule is to resolve them? Also, I assume I
should enable nmi_watchdog=1 and SysRq?
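For the nmi_watchdog=1 and SysRq question, a minimal sketch of checking and enabling both from a shell follows. The function names and the optional path argument are illustrative; nmi_watchdog=1 itself has to be added to the kernel boot line in the boot loader configuration, and writing to /proc/sys/kernel/sysrq needs root.

```shell
#!/bin/sh
# Hypothetical helpers (names are not from the thread).

# Return 0 if the given kernel command line turns the NMI watchdog on.
nmi_watchdog_enabled() {
    case " $1 " in
        *" nmi_watchdog=0 "*) return 1 ;;   # explicitly disabled
        *" nmi_watchdog="*)   return 0 ;;   # any other value counts as on
        *)                    return 1 ;;   # parameter not present
    esac
}

# Enable the magic SysRq key by writing 1 to the sysctl file.
# The path argument exists only so the function can be tried on a scratch file.
enable_sysrq() {
    ctl=${1:-/proc/sys/kernel/sysrq}
    if [ -w "$ctl" ]; then
        echo 1 > "$ctl"
    else
        echo "cannot write $ctl (not root?)" >&2
        return 1
    fi
}

# Example use on a running system:
# nmi_watchdog_enabled "$(cat /proc/cmdline)" && echo "NMI watchdog requested"
# enable_sysrq
```

With SysRq enabled, Alt+SysRq+T (task list) and Alt+SysRq+P (registers) may still produce output during a hang, though with a blanked screen that output would likely need to go to a serial console to be readable.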
Also: I would recommend this bug be marked as a duplicate of bug
109497 as that bug thread discusses similar problems (I posted there
BTW, this reminds me of the SMP problems I had in the 2.2.13-15
*** This bug has been marked as a duplicate of 109497 ***
Changed to 'CLOSED' state since 'RESOLVED' has been deprecated.