Bug 145550 - NFS lockups/hangs on RHEL3 x86_64 U4
Keywords:
Status: CLOSED DUPLICATE of bug 138182
Alias: None
Product: Red Hat Enterprise Linux 3
Classification: Red Hat
Component: kernel
Version: 3.0
Hardware: x86_64
OS: Linux
Priority: medium
Severity: high
Target Milestone: ---
Assignee: Steve Dickson
QA Contact: Brian Brock
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2005-01-19 17:15 UTC by Greg Baker
Modified: 2007-11-30 22:07 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2005-10-04 20:18:22 UTC
Target Upstream Version:
Embargoed:


Attachments
Output from SysRq (59.71 KB, text/plain)
2005-01-19 17:17 UTC, Greg Baker
Output from SysRq attached (47.86 KB, text/plain)
2005-01-19 17:20 UTC, Greg Baker
Output from SysRq attached: (47.86 KB, text/plain)
2005-01-19 17:22 UTC, Greg Baker
Additional system information (12.50 KB, text/plain)
2005-01-19 17:23 UTC, Greg Baker
Revertion of Sillydelete patch (2.16 KB, text/plain)
2005-01-24 14:45 UTC, Steve Dickson
perl script & C++ combo to hang nfs, plus systrace (62.80 KB, text/plain)
2005-04-27 19:39 UTC, Bob Manson

Description Greg Baker 2005-01-19 17:15:53 UTC
Description of problem:

We have a group of 10 systems to distribute builds (gcc).  When these
systems are under heavy load, the builds (make process) hang in a
blocked, deadlocked state.
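
A hung build of this kind usually shows up as processes stuck in uninterruptible sleep (state "D"). The following is only a rough sketch of how such tasks could be listed on a current Linux system by reading /proc/<pid>/stat; it assumes a Python 3 interpreter is available (which was not the case on a stock RHEL3 install), and the function names parse_stat and blocked_tasks are made up for illustration.

```python
import os


def parse_stat(text):
    """Parse the start of a /proc/<pid>/stat line into (pid, comm, state)."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')', so split on the LAST closing parenthesis.
    lpar = text.index("(")
    rpar = text.rindex(")")
    pid = int(text[:lpar].strip())
    comm = text[lpar + 1:rpar]
    state = text[rpar + 1:].split()[0]
    return pid, comm, state


def blocked_tasks(proc_root="/proc"):
    """Return (pid, comm) for every process currently in state 'D'."""
    tasks = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a process directory
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited meanwhile, or the line was unreadable
        if state == "D":
            tasks.append((pid, comm))
    return tasks


if __name__ == "__main__":
    for pid, comm in blocked_tasks():
        print(pid, comm)
```

A "D" entry that persists across several runs (for example make or rpciod) would be consistent with the hang described here, but it does not by itself identify the cause.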

The NFS servers do not show excessive load or any problems.

NFS gcc source: NetApp Release 6.4.5P2
NFS build source: NetApp Release 6.5.2R1P9

The 10 build clients have identical hardware and configuration.
I'll be attaching the following from a system that is exhibiting this
behavior:

# lspci -vv
# lsmod
# cat /proc/meminfo
# cat /proc/cpuinfo
# uname -a

I will also attach as much console output as I can capture.
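
For anyone gathering the same data from another affected client, the commands listed above could be collected into a single report along these lines. This is a minimal sketch, assuming Python 3 on the machine in question; the names COMMANDS, run and report are hypothetical, and a command that is not installed is noted in the report rather than aborting the run.

```python
import subprocess

# The same commands listed in the description above.
COMMANDS = [
    ["lspci", "-vv"],
    ["lsmod"],
    ["cat", "/proc/meminfo"],
    ["cat", "/proc/cpuinfo"],
    ["uname", "-a"],
]


def run(cmd):
    """Run one command and return its combined stdout/stderr as text."""
    try:
        result = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            universal_newlines=True,
        )
    except OSError as exc:
        # Raised, for example, when the command is not installed.
        return "not available: {}\n".format(exc)
    return result.stdout


def report(commands=COMMANDS):
    """Return one text report with a '# command' header per section."""
    sections = []
    for cmd in commands:
        sections.append("# {}\n{}".format(" ".join(cmd), run(cmd)))
    return "\n".join(sections)


if __name__ == "__main__":
    print(report())
```

Redirecting the output to a file (for example "python3 collect.py > sysinfo.txt", where collect.py is whatever name the script was saved under) gives a single text attachment.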

Unfortunately, we cannot recreate the hang on-demand.

Please let me know if there is any other information I can provide
that would be helpful. 

Version-Release number of selected component (if applicable):

Red Hat Enterprise Linux 3 Update 4 x86_64

Comment 1 Greg Baker 2005-01-19 17:17:32 UTC
Created attachment 109975
Output from SysRq

Output from SysRq attached:

SysRq : Show CPUs
SysRq : Show State
SysRq : Show Memory
SysRq : Crashing the kernel by request

Comment 2 Greg Baker 2005-01-19 17:20:15 UTC
Created attachment 109976
Output from SysRq attached

SysRq : Show CPUs
SysRq : Show State
SysRq : Show Memory

Comment 3 Greg Baker 2005-01-19 17:22:03 UTC
Created attachment 109977
Output from SysRq attached:

SysRq : Show CPUs
SysRq : Show State
SysRq : Show Memory

Comment 4 Greg Baker 2005-01-19 17:23:12 UTC
Created attachment 109978
Additional system information

From the system "sif029":

# lspci -vv
# lsmod
# cat /proc/meminfo
# cat /proc/cpuinfo
# uname -a

Comment 5 James Bourne 2005-01-19 17:31:05 UTC
Perhaps related: we have a system that uses QLogic FC HBAs connected
to a CX600.  Locally it works fine, but over NFS with an SMP kernel
the system crashes (multiple crash dumps uploaded; see Service
Request 366184).

We found that running a UP kernel resolved the oops issue.  The
current crash dump and logs are included in that service request.
This happens with all kernels up to and including the latest errata.
For a long time the server would only hang, but the latest SMP
kernels actually oops and produce a netdump.

Comment 6 Greg Baker 2005-01-23 18:18:49 UTC
Could this be related?

https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=138182

Comment 7 Steve Dickson 2005-01-24 14:45:35 UTC
Created attachment 110124
Revertion of Sillydelete patch

Yes, it appears rpciod is hanging in the same places as
in bz# 138182. Please try the attached patch, which should
eliminate the hang.

Comment 8 Greg Baker 2005-01-27 16:43:06 UTC
Two days, no hangs... looks good so far

Comment 9 Steve Dickson 2005-01-27 17:01:59 UTC
Cool... thanks for the update!

Comment 10 Bob Manson 2005-04-27 19:39:02 UTC
Created attachment 113735
perl script & C++ combo to hang nfs, plus systrace

We have several rooms full of Dell Optiplex GX260s running RHEL3.  I grabbed
the latest kernel (2.4.21-27.0.4.ELsmp #1 SMP) and I can get NFS to hang
reliably by compiling the attached C++ and using the attached perl script to
run the C++ binary.  I've also attached a systrace of the hung machine.  Steve
Dickson's attachment above (id=110124) seems to fix the problem, but I
expected the patch to have been incorporated into this release already.

Comment 12 Ernie Petrides 2005-10-04 20:18:22 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2005-294.html


Please see bug 138182 comment #47 for more details.


*** This bug has been marked as a duplicate of 138182 ***

