Bug 166345 - HA NFS Cluster Problem
Summary: HA NFS Cluster Problem
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 3
Classification: Red Hat
Component: kernel
Version: 3.0
Hardware: All
OS: Linux
Priority: high
Severity: high
Target Milestone: ---
Assignee: Steve Dickson
QA Contact: Cluster QE
URL:
Whiteboard:
Depends On:
Blocks: 168424
 
Reported: 2005-08-19 16:01 UTC by Issue Tracker
Modified: 2007-11-30 22:07 UTC
CC List: 6 users

Fixed In Version: RHSA-2006-0144
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2006-03-15 16:25:19 UTC
Target Upstream Version:
Embargoed:


Attachments
Upstream patch that fixes deadlock in lockd (1.86 KB, patch)
2005-08-23 21:01 UTC, Steve Dickson
Updated Patch (1.27 KB, patch)
2005-08-27 13:44 UTC, Steve Dickson
Updated Patch (1.69 KB, patch)
2005-08-30 12:36 UTC, Steve Dickson


Links
System ID: Red Hat Product Errata RHSA-2006:0144
Private: 0
Priority: qe-ready
Status: SHIPPED_LIVE
Summary: Moderate: Updated kernel packages available for Red Hat Enterprise Linux 3 Update 7
Last Updated: 2006-03-15 05:00:00 UTC

Description Issue Tracker 2005-08-19 16:01:11 UTC
Escalated to Bugzilla from IssueTracker

Comment 12 Wendy Cheng 2005-08-23 19:42:15 UTC
This issue has been boiled down to the following:

I. The problem starts with AIX (NFS) clients doing heavy NFS I/O that brings
down the NFS server *interface* - i.e. the server is still responsive and
accessing the filesystem locally on the server works fine, but the NFS exports
are no longer accessible. According to the customer, it can be recreated at
will in their environment.

II. From the sysrq-m output taken during the fault, I don't see any memory
issue on the "down" server.

III. From sysrq-t, there are three things to watch out for:

III-1: This box has the IBM multi-path driver (mpp) - I would need IBM support
to help us interpret the mpp threads' backtraces (are they in a normal
wait-for-work path or in a fault-handling path?). At this moment, I assume
they are in a normal wait-for-work path.

Aug 22 13:30:20 fdxfs02 kernel: mppFailback   S 00000001  4820    31      1    
       32    30 (L-TLB)
Aug 22 13:30:20 fdxfs02 kernel: Call Trace:   [<c0123e24>] schedule [kernel]
0x2f4 (0xf6c09f50)
Aug 22 13:30:20 fdxfs02 kernel: [<f8933234>] mppLnx_failback_sem [mpp_Vhba] 0x0
(0xf6c09f84)
Aug 22 13:30:20 fdxfs02 kernel: [<f893323c>] mppLnx_failback_sem [mpp_Vhba] 0x8
(0xf6c09f90)
Aug 22 13:30:20 fdxfs02 kernel: [<c010ae9a>] __down_interruptible [kernel] 0x8a
(0xf6c09f94)
Aug 22 13:30:20 fdxfs02 kernel: [<f8933240>] mppLnx_failback_sem [mpp_Vhba] 0xc
(0xf6c09fa4)
Aug 22 13:30:20 fdxfs02 kernel: [<f8933240>] mppLnx_failback_sem [mpp_Vhba] 0xc
(0xf6c09fa8)
Aug 22 13:30:20 fdxfs02 kernel: [<f8938750>] mppLnxFailbackScanContext
[mpp_Vhba] 0x10 (0xf6c09fb4)
Aug 22 13:30:20 fdxfs02 kernel: [<c010af67>] __down_failed_interruptible
[kernel] 0x7 (0xf6c09fcc)
Aug 22 13:30:20 fdxfs02 kernel: [<f8933234>] mppLnx_failback_sem [mpp_Vhba] 0x0
(0xf6c09fd0)
Aug 22 13:30:20 fdxfs02 kernel: [<f892d639>] mppLnx_setCheckCondition [mpp_Vhba]
0x249 (0xf6c09fd8)
Aug 22 13:30:20 fdxfs02 kernel: [<f8938750>] mppLnxFailbackScanContext
[mpp_Vhba] 0x10 (0xf6c09fdc)
Aug 22 13:30:20 fdxfs02 kernel: [<f893039b>] .rodata.str1.1 [mpp_Vhba] 0x7c7
(0xf6c09fe0)
Aug 22 13:30:20 fdxfs02 kernel: [<f892c6a0>] mppLnx_failback_handler [mpp_Vhba]
0x0 (0xf6c09fe8)
Aug 22 13:30:20 fdxfs02 kernel: [<c01095ad>] kernel_thread_helper [kernel] 0x5
(0xf6c09ff0)

III-2: All nfsds are hanging, waiting for hash_lock, and the while loop is
unbreakable. This piece of code could certainly use some improvement, but I'm
not going to fuss about it at this moment. The real issue here is the lockd
hang (described in III-3). Since all nfsds are hung in exp_readlock(), no
client can access this server.
                                                                                
void
exp_readlock(void)
{
      while (hash_lock || want_lock)
              sleep_on(&hash_wait);
      hash_count++;
}
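
Just to illustrate the "could use some improvement" remark above: this is only
a sketch on my part, not the patch for this bug, but the open-coded sleep_on()
loop could hypothetically be replaced with the kernel's wait_event() macro
(available in 2.4 via linux/sched.h), which re-checks the condition with proper
wait-queue handling:

void
exp_readlock(void)
{
      /* sleep until no writer holds the hash lock and none is waiting */
      wait_event(hash_wait, !hash_lock && !want_lock);
      hash_count++;
}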

Aug 22 13:30:25 fdxfs02 kernel: nfsd          D 00000000  3392  2484      1    
     2485  2483 (L-TLB)
Aug 22 13:30:25 fdxfs02 kernel: Call Trace:   [<c0123e24>] schedule [kernel]
0x2f4 (0xf6761f38)
Aug 22 13:30:25 fdxfs02 kernel: [<f8f43040>] hash_wait [nfsd] 0x0 (0xf6761f6c)
Aug 22 13:30:25 fdxfs02 kernel: [<c01246e2>] sleep_on [kernel] 0x52 (0xf6761f7c)
Aug 22 13:30:25 fdxfs02 kernel: [<f8f43040>] hash_wait [nfsd] 0x0 (0xf6761f9c)
Aug 22 13:30:25 fdxfs02 kernel: [<f8f372fa>] exp_readlock [nfsd] 0x2a (0xf6761fac)
Aug 22 13:30:25 fdxfs02 kernel: [<f8f2f3a4>] nfsd [nfsd] 0x1a4 (0xf6761fb0)
Aug 22 13:30:25 fdxfs02 kernel: [<f8f2f200>] nfsd [nfsd] 0x0 (0xf6761fe0)
Aug 22 13:30:25 fdxfs02 kernel: [<c01095ad>] kernel_thread_helper [kernel] 0x5
(0xf6761ff0)

III-3: lockd hangs - it looks like a deadlock! I haven't figured out which
semaphore it is waiting on, or why.

Aug 22 13:30:23 fdxfs02 kernel: lockd         D 00000001  3872  2262      1    
     2284  2261 (L-TLB)
Aug 22 13:30:23 fdxfs02 kernel: Call Trace:   [<c0123e24>] schedule [kernel]
0x2f4 (0xf681ddc0)
Aug 22 13:30:23 fdxfs02 kernel: [<c010adb3>] __down [kernel] 0x73 (0xf681de04)
Aug 22 13:30:23 fdxfs02 kernel: [<f8ecb5ab>] rpc_call_sync_Rsmp_c357b490
[sunrpc] 0xcb (0xf681de1c)
Aug 22 13:30:23 fdxfs02 kernel: [<c010af5c>] __down_failed [kernel] 0x8 (0xf681de38)
Aug 22 13:30:23 fdxfs02 kernel: [<f8ee5b7f>] .text.lock.svclock [lockd] 0x5
(0xf681de48)
Aug 22 13:30:23 fdxfs02 kernel: [<c029f267>] vsnprintf [kernel] 0x207 (0xf681de50)
Aug 22 13:30:23 fdxfs02 kernel: [<f8ef03b8>] nlm_files [lockd] 0x18 (0xf681de58)
Aug 22 13:30:23 fdxfs02 kernel: [<f8ee7334>] nlm_traverse_files [lockd] 0x144
(0xf681de64)
Aug 22 13:30:23 fdxfs02 kernel: [<f8ee74c0>] nlmsvc_mark_resources [lockd] 0x20
(0xf681de84)
Aug 22 13:30:23 fdxfs02 kernel: [<f8ee3ff5>] nlm_gc_hosts [lockd] 0x45 (0xf681de90)
Aug 22 13:30:23 fdxfs02 kernel: [<f8eec662>] .rodata.str1.1 [lockd] 0x39
(0xf681de98)
Aug 22 13:30:23 fdxfs02 kernel: [<f8ee396b>] nlm_lookup_host [lockd] 0x8b
(0xf681deb0)
Aug 22 13:30:23 fdxfs02 kernel: [<f8eec657>] .rodata.str1.1 [lockd] 0x2e
(0xf681deb8)
Aug 22 13:30:23 fdxfs02 kernel: [<f8ef0138>] nlm_hosts [lockd] 0x78 (0xf681decc)
Aug 22 13:30:23 fdxfs02 kernel: [<f8ee38d0>] nlmsvc_lookup_host [lockd] 0x30
(0xf681dee4)
Aug 22 13:30:23 fdxfs02 kernel: [<f8ee5a15>] nlmsvc_create_block [lockd] 0xb5
(0xf681def8)
Aug 22 13:30:23 fdxfs02 kernel: [<c0179f24>] posix_test_lock [kernel] 0x84
(0xf681df08)
Aug 22 13:30:23 fdxfs02 kernel: [<f8ee4e1a>] nlmsvc_lock [lockd] 0xca (0xf681df1c)
Aug 22 13:30:23 fdxfs02 kernel: [<f8eea4c7>] nlm4svc_retrieve_args [lockd] 0xc7
(0xf681df38)
Aug 22 13:30:23 fdxfs02 kernel: [<f8eef2b8>] nlmsvc_version4 [lockd] 0x0
(0xf681df5c)
Aug 22 13:30:23 fdxfs02 kernel: [<f8eea6dc>] nlm4svc_proc_lock [lockd] 0xac
(0xf681df60)
Aug 22 13:30:23 fdxfs02 kernel: [<f8eefc88>] nlmsvc_procedures4 [lockd] 0x48
(0xf681df84)
Aug 22 13:30:23 fdxfs02 kernel: [<f8ed3548>] svc_process_Rsmp_462cdaea [sunrpc]
0x318 (0xf681df8c)
Aug 22 13:30:23 fdxfs02 kernel: [<f8ee43fb>] lockd [lockd] 0x1ab (0xf681dfc4)
Aug 22 13:30:23 fdxfs02 kernel: [<f8ee4250>] lockd [lockd] 0x0 (0xf681dfe0)
Aug 22 13:30:23 fdxfs02 kernel: [<c01095ad>] kernel_thread_helper [kernel] 0x5
(0xf681dff0)
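
Reading the trace bottom-up, my working assumption (the eventual patch is the
authoritative answer) is that nlmsvc_lock() already holds a per-file semaphore
when nlmsvc_create_block() -> nlmsvc_lookup_host() drops into nlm_gc_hosts(),
and the garbage-collection path (nlmsvc_mark_resources() ->
nlm_traverse_files()) then tries to take that same semaphore again, so lockd
deadlocks against itself. A minimal user-space analogue of that single-thread
self-deadlock pattern (hypothetical names, not lockd code; the program hangs by
design when run):

#include <stdio.h>
#include <semaphore.h>

static sem_t f_sema;                /* stands in for the per-file semaphore */

static void mark_resources(void)
{
        sem_wait(&f_sema);          /* second down() on the same semaphore... */
        /* ...never reached: the holder is this very thread */
        sem_post(&f_sema);
}

static void service_lock_request(void)
{
        sem_wait(&f_sema);          /* first down(): legitimate owner */
        mark_resources();           /* the GC path re-enters and blocks forever */
        sem_post(&f_sema);
}

int main(void)
{
        sem_init(&f_sema, 0, 1);
        service_lock_request();     /* hangs, just like lockd in the trace above */
        printf("never printed\n");
        return 0;
}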

Comment 14 Steve Dickson 2005-08-23 21:01:13 UTC
Created attachment 118023 [details]
Upstream patch that fixes deadlock in lockd

Comment 15 Wendy Cheng 2005-08-24 14:28:48 UTC
Steve, I checked the patch and this is exactly one of the problems. Thanks.

Comment 18 Lon Hohberger 2005-08-26 16:17:45 UTC
The STONITH message appears because the customer is not using power switches.
Hence, the cluster software completely disclaims data integrity, because it
cannot ensure that the failed node has actually been cut off.

(Not a bug)

The kernel panics, however, are definitely a bug.

Comment 21 Steve Dickson 2005-08-27 13:44:22 UTC
Created attachment 118188 [details]
Updated Patch 

During our internal review process, a locking inconsistency was found in the
original patch, so please re-test with this updated patch. Thanks...

Comment 24 Steve Dickson 2005-08-30 12:36:14 UTC
Created attachment 118248 [details]
Updated Patch

Again, through our review process it was determined that the extra locking
around blocked locks is not needed, since the locking process is
single-threaded. So those locks were removed. Please test to ensure that
removing those locks does not cause any regression...

Comment 33 Ernie Petrides 2005-10-06 19:55:17 UTC
A fix for this problem is queued for the next interim U7 build.


Comment 34 Ernie Petrides 2005-10-08 02:10:01 UTC
A fix for this problem has just been committed to the RHEL3 U7
patch pool this evening (in kernel version 2.4.21-37.5.EL).


Comment 38 Red Hat Bugzilla 2006-03-15 16:25:19 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2006-0144.html


