Bug 113148 - SMP kernel deadlock (?) as NFS client
Status: CLOSED DUPLICATE of bug 109497
Product: Fedora
Classification: Fedora
Component: kernel
Version: 1
Hardware: i686
OS: Linux
Priority: medium; Severity: high
Assigned To: Dave Jones
QA Contact: Brian Brock
Reported: 2004-01-08 16:58 EST by ncb
Modified: 2015-01-04 17:04 EST
CC List: 3 users

Doc Type: Bug Fix
Last Closed: 2006-02-21 14:00:39 EST


Attachments: None
Description ncb 2004-01-08 16:58:03 EST
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (Macintosh; U; PPC Mac OS X; en-us) AppleWebKit/106.2 (KHTML, like Gecko) Safari/100.1

Description of problem:
After a few minutes of NFS client activity, the system deadlocks. The only response from the system is to ping (no VGA, mouse, Ctrl-Alt-Del, etc.). The system must be power-cycled to reboot.

Reproducible on the uniprocessor kernel as well; it just takes longer.

Version-Release number of selected component (if applicable):
errata kernels 2135, 2138, 2140

How reproducible:
Always

Steps to Reproduce:
1. mount server:/somefs /mnt/nfs
2. tar -cvf /dev/null /mnt/nfs
3. wait a few minutes
    

Actual Results:  system deadlocks

Expected Results:  system should complete the copy, and continue to live.

Additional info:

Current FC1 installation with all updates (as of 8 Jan 04) applied, including kernel-2.4.22-1.2140.

Machines tested include a Dell PowerEdge Workstation 330 (uniprocessor Xeon) and an IBM IntelliStation Z Pro (uniprocessor Xeon with Hyper-Threading).

Suggestions for better methods of data collection are welcome; I can't capture an oops because the screen blanks. :/
Comment 1 Wade Hampton 2004-01-12 10:18:43 EST
I have a dual 2.2 GHz Xeon (Hyper-Threading disabled) with FC1 and kernel
2115. It worked for over 25 days with NFS and SMB. It finally hung last
night after I disconnected the cable to the NFS server and ran df.
The message was posted to the Fedora mailing list. I am upgrading to the
2140 kernel and will re-test (and will use stress). Please CC me on the
resolution.
Comment 2 Dave Jones 2004-01-14 10:17:11 EST
ncb@0cc.gatech.edu: from reading your report, it sounds like 2115 was
OK and this is a regression in the errata kernels. Is that correct?

Wade, any luck with the errata kernel ?

It's possible there are two bugs here. There are a number of other
similar bugs with SMP deadlocks.
Comment 3 ncb 2004-01-14 10:32:08 EST
Not sure if 2115 was good or not.  Will try that, and vanilla 2.4.22.

Should I build 2.4.22 with gcc from gcc-3.3.2-1.i386.rpm or from gcc32-3.2.3-6.i386.rpm?


Another data point: the hangs may occur during unmount, if that happens to
tickle any neurons...
Comment 4 Dave Jones 2004-01-14 10:56:00 EST
Use gcc32 for kernel builds, though you could just grab the binaries
in this case.

Sounds like this may be another instance of bug #109497
Comment 5 Wade Hampton 2004-01-14 13:25:08 EST
I built a 2.4.24 kernel (but used stock gcc 3.3).  It has been up
since yesterday, but not heavily loaded.  Should I rebuild using
gcc32?  I assume this is simply editing 2.4.24/Makefile and changing
HOSTCC = gcc32, correct?

Anything I should turn off in BIOS (e.g., USB, ACPI, Hyperthread)?

If there are other SMP deadlocks in the current "stable" kernel, any
idea what they are and when they are scheduled to be resolved? Also, I
assume I should enable nmi_watchdog=1 and SysRq?

Also:  I would recommend this bug be marked as a duplicate of bug
109497 as that bug thread discusses similar problems (I posted there
as well).

BTW, this reminds me of the SMP problems I had in the 2.2.13-15
timeframe....
Comment 6 Dave Jones 2004-01-14 13:58:11 EST

*** This bug has been marked as a duplicate of 109497 ***
Comment 7 Red Hat Bugzilla 2006-02-21 14:00:39 EST
Changed to 'CLOSED' state since 'RESOLVED' has been deprecated.
