Bug 145550 - NFS lockups/hangs on RHEL3 x86_64 U4
Status: CLOSED DUPLICATE of bug 138182
Product: Red Hat Enterprise Linux 3
Classification: Red Hat
Component: kernel
Version: 3.0
Hardware: x86_64 Linux
Priority: medium  Severity: high
Assigned To: Steve Dickson
QA Contact: Brian Brock
Reported: 2005-01-19 12:15 EST by Greg Baker
Modified: 2007-11-30 17:07 EST
CC List: 5 users

Doc Type: Bug Fix
Last Closed: 2005-10-04 16:18:22 EDT


Attachments
Output from SysRq (59.71 KB, text/plain)
2005-01-19 12:17 EST, Greg Baker
Output from SysRq attached (47.86 KB, text/plain)
2005-01-19 12:20 EST, Greg Baker
Output from SysRq attached: (47.86 KB, text/plain)
2005-01-19 12:22 EST, Greg Baker
Additional system information (12.50 KB, text/plain)
2005-01-19 12:23 EST, Greg Baker
Revertion of Sillydelete patch (2.16 KB, text/plain)
2005-01-24 09:45 EST, Steve Dickson
perl script & C++ combo to hang nfs, plus systrace (62.80 KB, text/plain)
2005-04-27 15:39 EDT, Bob Manson

Description Greg Baker 2005-01-19 12:15:53 EST
Description of problem:

We have a group of 10 systems that run distributed gcc builds. When these
systems are under heavy load, the builds (make processes) hang in a
blocked, deadlocked state.

The NFS servers do not show excessive load or any problems.

NFS gcc source: NetApp Release 6.4.5P2
NFS build source: NetApp Release 6.5.2R1P9

The 10 build clients have identical hardware and configuration.
I'll be attaching the following from a system that is exhibiting this
behavior:

# lspci -vv
# lsmod
# cat /proc/meminfo
# cat /proc/cpuinfo
# uname -a

as well as as much console output as I can capture.

Unfortunately, we cannot recreate the hang on demand.

Please let me know if there is any other information I can provide
that would be helpful. 

Version-Release number of selected component (if applicable):

Red Hat Enterprise Linux 3 Update 4 x86_64
Comment 1 Greg Baker 2005-01-19 12:17:32 EST
Created attachment 109975
Output from SysRq

Output from SysRq attached:

SysRq : Show CPUs
SysRq : Show State
SysRq : Show Memory
SysRq : Crashing the kernel by request
Comment 2 Greg Baker 2005-01-19 12:20:15 EST
Created attachment 109976
Output from SysRq attached

SysRq : Show CPUs
SysRq : Show State
SysRq : Show Memory
Comment 3 Greg Baker 2005-01-19 12:22:03 EST
Created attachment 109977
Output from SysRq attached:

SysRq : Show CPUs
SysRq : Show State
SysRq : Show Memory
Comment 4 Greg Baker 2005-01-19 12:23:12 EST
Created attachment 109978
Additional system information

From the system "sif029":

# lspci -vv
# lsmod
# cat /proc/meminfo
# cat /proc/cpuinfo
# uname -a
Comment 5 James Bourne 2005-01-19 12:31:05 EST
Possibly related: we have a system that uses QLogic FC HBAs connected to a
CX600. Local access works fine, but over NFS with an SMP kernel the system
crashes (multiple crash dumps have been uploaded; see Service Request 366184).

We found that a UP kernel resolved the oops. The current crash dump and logs
are included in that service request. This happens with all kernels up to and
including the latest errata. For a long time the server would only hang, but
the latest SMP kernels actually oops and produce a netdump.
Comment 6 Greg Baker 2005-01-23 13:18:49 EST
Could this be related?

https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=138182
Comment 7 Steve Dickson 2005-01-24 09:45:35 EST
Created attachment 110124
Revertion of Sillydelete patch

Yes, it appears rpciod is hanging in the same places as
in bz# 138182. Please try the attached patch, which should
eliminate the hang.
Comment 8 Greg Baker 2005-01-27 11:43:06 EST
Two days, no hangs... looks good so far
Comment 9 Steve Dickson 2005-01-27 12:01:59 EST
Cool... thanks for the update!
Comment 10 Bob Manson 2005-04-27 15:39:02 EDT
Created attachment 113735
perl script & C++ combo to hang nfs, plus systrace

We have several rooms full of Dell Optiplex GX260s running RHEL3. I grabbed
the latest kernel (2.4.21-27.0.4.ELsmp #1 SMP), and I can get NFS to hang
reliably by compiling the attached C++ program and using the attached perl
script to run the resulting binary. I've also attached a systrace of the hung
machine. Steve Dickson's attachment above (id=110124) seems to fix the problem,
although I had expected the patch to be incorporated into this release.
Comment 12 Ernie Petrides 2005-10-04 16:18:22 EDT
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2005-294.html


Please see bug 138182 comment #47 for more details.


*** This bug has been marked as a duplicate of 138182 ***
