Bug 166345

Summary: HA NFS Cluster Problem
Product: Red Hat Enterprise Linux 3
Component: kernel
Version: 3.0
Status: CLOSED ERRATA
Severity: high
Priority: high
Hardware: All
OS: Linux
Reporter: Issue Tracker <tao>
Assignee: Steve Dickson <steved>
QA Contact: Cluster QE <mspqa-list>
CC: kanderso, lwang, petrides, rkenna, steved, tao
Fixed In Version: RHSA-2006-0144
Doc Type: Bug Fix
Last Closed: 2006-03-15 16:25:19 UTC
Bug Blocks: 168424
Attachments:
  Upstream patch that fixes deadlock in lockd (flags: none)
  Updated Patch (flags: none)
  Updated Patch (flags: none)

Description Issue Tracker 2005-08-19 16:01:11 UTC
Escalated to Bugzilla from IssueTracker

Comment 12 Wendy Cheng 2005-08-23 19:42:15 UTC
This issue has been boiled down to the following:
                                                                                
I. The problem starts with AIX (NFS) clients doing heavy NFS I/O, which brings
down the NFS server *interface* - i.e. the server is still responsive and
accessing the filesystem locally on the server works fine, but the NFS exports
are no longer accessible. According to the customer, the problem can be
recreated at will in their environment.

II. From the sysrq-m output taken during the fault, I don't see any memory
issue on the "down" server.

III. From sysrq-t, there are three things to note:
                                                                                
III-1: This box has the IBM multi-path driver (mpp) - I would need IBM support
to help explain the mpp threads' tracebacks (are they in a normal wait-for-work
path or in a fault-handling path?). For the moment, I assume they are in a
normal wait-for-work path.

Aug 22 13:30:20 fdxfs02 kernel: mppFailback   S 00000001  4820    31      1    
       32    30 (L-TLB)
Aug 22 13:30:20 fdxfs02 kernel: Call Trace:   [<c0123e24>] schedule [kernel]
0x2f4 (0xf6c09f50)
Aug 22 13:30:20 fdxfs02 kernel: [<f8933234>] mppLnx_failback_sem [mpp_Vhba] 0x0
(0xf6c09f84)
Aug 22 13:30:20 fdxfs02 kernel: [<f893323c>] mppLnx_failback_sem [mpp_Vhba] 0x8
(0xf6c09f90)
Aug 22 13:30:20 fdxfs02 kernel: [<c010ae9a>] __down_interruptible [kernel] 0x8a
(0xf6c09f94)
Aug 22 13:30:20 fdxfs02 kernel: [<f8933240>] mppLnx_failback_sem [mpp_Vhba] 0xc
(0xf6c09fa4)
Aug 22 13:30:20 fdxfs02 kernel: [<f8933240>] mppLnx_failback_sem [mpp_Vhba] 0xc
(0xf6c09fa8)
Aug 22 13:30:20 fdxfs02 kernel: [<f8938750>] mppLnxFailbackScanContext
[mpp_Vhba] 0x10 (0xf6c09fb4)
Aug 22 13:30:20 fdxfs02 kernel: [<c010af67>] __down_failed_interruptible
[kernel] 0x7 (0xf6c09fcc)
Aug 22 13:30:20 fdxfs02 kernel: [<f8933234>] mppLnx_failback_sem [mpp_Vhba] 0x0
(0xf6c09fd0)
Aug 22 13:30:20 fdxfs02 kernel: [<f892d639>] mppLnx_setCheckCondition [mpp_Vhba]
0x249 (0xf6c09fd8)
Aug 22 13:30:20 fdxfs02 kernel: [<f8938750>] mppLnxFailbackScanContext
[mpp_Vhba] 0x10 (0xf6c09fdc)
Aug 22 13:30:20 fdxfs02 kernel: [<f893039b>] .rodata.str1.1 [mpp_Vhba] 0x7c7
(0xf6c09fe0)
Aug 22 13:30:20 fdxfs02 kernel: [<f892c6a0>] mppLnx_failback_handler [mpp_Vhba]
0x0 (0xf6c09fe8)
Aug 22 13:30:20 fdxfs02 kernel: [<c01095ad>] kernel_thread_helper [kernel] 0x5
(0xf6c09ff0)

III-2: All nfsd threads are hanging, waiting for hash_lock, and the while loop
is unbreakable. This piece of code could certainly use some improvement, but I'm
not going to fuss about it at the moment. The real issue here is the lockd hang
(described in III-3). Since all nfsds are stuck in exp_readlock(), no one can
access this server.
                                                                                
void
exp_readlock(void)
{
      while (hash_lock || want_lock)
              sleep_on(&hash_wait);
      hash_count++;
}
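
For context on why the loop above never exits: readers sleep on hash_wait
whenever a writer holds hash_lock or has announced itself via want_lock, and
they are only woken when that writer releases the lock. Below is a minimal
sketch of the writer side for illustration - it is not the exact RHEL3
fs/nfsd/export.c source, and it assumes the same file-scope variables
(hash_lock, want_lock, hash_count, hash_wait) as the reader above.

/*
 * Illustrative sketch only - not the actual RHEL3 code.  A writer
 * announces itself (want_lock), waits for readers to drain, then holds
 * hash_lock exclusively until it unlocks and wakes the waiters.
 */
int
exp_writelock(void)
{
      want_lock++;                    /* new exp_readlock() callers now sleep */
      while (hash_count || hash_lock)
              sleep_on(&hash_wait);   /* wait for readers / another writer */
      hash_lock = 1;
      want_lock--;
      return 0;
}

void
exp_unlock(void)
{
      hash_lock = 0;
      wake_up(&hash_wait);            /* only here do sleeping nfsds resume */
}

If whatever took (or is still waiting for) the write lock never finishes -
presumably because it is itself stuck behind the lockd hang described in
III-3 - every nfsd sleeps on hash_wait indefinitely, which matches the
traces below.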

Aug 22 13:30:25 fdxfs02 kernel: nfsd          D 00000000  3392  2484      1    
     2485  2483 (L-TLB)
Aug 22 13:30:25 fdxfs02 kernel: Call Trace:   [<c0123e24>] schedule [kernel]
0x2f4 (0xf6761f38)
Aug 22 13:30:25 fdxfs02 kernel: [<f8f43040>] hash_wait [nfsd] 0x0 (0xf6761f6c)
Aug 22 13:30:25 fdxfs02 kernel: [<c01246e2>] sleep_on [kernel] 0x52 (0xf6761f7c)
Aug 22 13:30:25 fdxfs02 kernel: [<f8f43040>] hash_wait [nfsd] 0x0 (0xf6761f9c)
Aug 22 13:30:25 fdxfs02 kernel: [<f8f372fa>] exp_readlock [nfsd] 0x2a (0xf6761fac)
Aug 22 13:30:25 fdxfs02 kernel: [<f8f2f3a4>] nfsd [nfsd] 0x1a4 (0xf6761fb0)
Aug 22 13:30:25 fdxfs02 kernel: [<f8f2f200>] nfsd [nfsd] 0x0 (0xf6761fe0)
Aug 22 13:30:25 fdxfs02 kernel: [<c01095ad>] kernel_thread_helper [kernel] 0x5
(0xf6761ff0)

III-3: The lockd thread hangs - it looks like a deadlock! I haven't figured out
yet which semaphore it is waiting on, or why.

Aug 22 13:30:23 fdxfs02 kernel: lockd         D 00000001  3872  2262      1    
     2284  2261 (L-TLB)
Aug 22 13:30:23 fdxfs02 kernel: Call Trace:   [<c0123e24>] schedule [kernel]
0x2f4 (0xf681ddc0)
Aug 22 13:30:23 fdxfs02 kernel: [<c010adb3>] __down [kernel] 0x73 (0xf681de04)
Aug 22 13:30:23 fdxfs02 kernel: [<f8ecb5ab>] rpc_call_sync_Rsmp_c357b490
[sunrpc] 0xcb (0xf681de1c)
Aug 22 13:30:23 fdxfs02 kernel: [<c010af5c>] __down_failed [kernel] 0x8 (0xf681de38)
Aug 22 13:30:23 fdxfs02 kernel: [<f8ee5b7f>] .text.lock.svclock [lockd] 0x5
(0xf681de48)
Aug 22 13:30:23 fdxfs02 kernel: [<c029f267>] vsnprintf [kernel] 0x207 (0xf681de50)
Aug 22 13:30:23 fdxfs02 kernel: [<f8ef03b8>] nlm_files [lockd] 0x18 (0xf681de58)
Aug 22 13:30:23 fdxfs02 kernel: [<f8ee7334>] nlm_traverse_files [lockd] 0x144
(0xf681de64)
Aug 22 13:30:23 fdxfs02 kernel: [<f8ee74c0>] nlmsvc_mark_resources [lockd] 0x20
(0xf681de84)
Aug 22 13:30:23 fdxfs02 kernel: [<f8ee3ff5>] nlm_gc_hosts [lockd] 0x45 (0xf681de90)
Aug 22 13:30:23 fdxfs02 kernel: [<f8eec662>] .rodata.str1.1 [lockd] 0x39
(0xf681de98)
Aug 22 13:30:23 fdxfs02 kernel: [<f8ee396b>] nlm_lookup_host [lockd] 0x8b
(0xf681deb0)
Aug 22 13:30:23 fdxfs02 kernel: [<f8eec657>] .rodata.str1.1 [lockd] 0x2e
(0xf681deb8)
Aug 22 13:30:23 fdxfs02 kernel: [<f8ef0138>] nlm_hosts [lockd] 0x78 (0xf681decc)
Aug 22 13:30:23 fdxfs02 kernel: [<f8ee38d0>] nlmsvc_lookup_host [lockd] 0x30
(0xf681dee4)
Aug 22 13:30:23 fdxfs02 kernel: [<f8ee5a15>] nlmsvc_create_block [lockd] 0xb5
(0xf681def8)
Aug 22 13:30:23 fdxfs02 kernel: [<c0179f24>] posix_test_lock [kernel] 0x84
(0xf681df08)
Aug 22 13:30:23 fdxfs02 kernel: [<f8ee4e1a>] nlmsvc_lock [lockd] 0xca (0xf681df1c)
Aug 22 13:30:23 fdxfs02 kernel: [<f8eea4c7>] nlm4svc_retrieve_args [lockd] 0xc7
(0xf681df38)
Aug 22 13:30:23 fdxfs02 kernel: [<f8eef2b8>] nlmsvc_version4 [lockd] 0x0
(0xf681df5c)
Aug 22 13:30:23 fdxfs02 kernel: [<f8eea6dc>] nlm4svc_proc_lock [lockd] 0xac
(0xf681df60)
Aug 22 13:30:23 fdxfs02 kernel: [<f8eefc88>] nlmsvc_procedures4 [lockd] 0x48
(0xf681df84)
Aug 22 13:30:23 fdxfs02 kernel: [<f8ed3548>] svc_process_Rsmp_462cdaea [sunrpc]
0x318 (0xf681df8c)
Aug 22 13:30:23 fdxfs02 kernel: [<f8ee43fb>] lockd [lockd] 0x1ab (0xf681dfc4)
Aug 22 13:30:23 fdxfs02 kernel: [<f8ee4250>] lockd [lockd] 0x0 (0xf681dfe0)
Aug 22 13:30:23 fdxfs02 kernel: [<c01095ad>] kernel_thread_helper [kernel] 0x5
(0xf681dff0)
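
Filtering the obviously stale frames, the stack above appears to read, from the
bottom up: lockd -> svc_process -> nlm4svc_proc_lock -> nlmsvc_lock ->
nlmsvc_create_block -> nlmsvc_lookup_host -> nlm_lookup_host -> nlm_gc_hosts ->
nlmsvc_mark_resources -> nlm_traverse_files, which then blocks in __down (via
the .text.lock.svclock contention stub). In other words, the single lockd
thread appears to block on a semaphore that was already taken further up the
same call chain when host garbage collection ran inline from the lock-request
path - a self-deadlock. The stand-alone program below only models that pattern;
which semaphore is actually involved has not been confirmed, so the per-file
semaphore here is an assumption for illustration:

/*
 * Stand-alone model of the failure pattern inferred from the trace:
 * one thread takes a non-recursive semaphore, then re-enters the same
 * semaphore through an inline garbage-collection path and blocks
 * forever.  This is NOT lockd source code; build with: cc -pthread.
 */
#include <semaphore.h>
#include <stdio.h>

static sem_t f_sema;                 /* plays a per-file semaphore        */

static void mark_resources(void)     /* plays nlmsvc_mark_resources()     */
{
        sem_wait(&f_sema);           /* same semaphore -> blocks forever  */
        /* ... walk and mark the file's locks ... */
        sem_post(&f_sema);
}

static void lookup_host(void)        /* plays nlm_lookup_host()           */
{
        mark_resources();            /* gc runs inline from the lookup    */
}

static void svc_lock(void)           /* plays nlmsvc_lock()               */
{
        sem_wait(&f_sema);           /* lock the file                     */
        lookup_host();               /* never returns                     */
        sem_post(&f_sema);
}

int main(void)
{
        sem_init(&f_sema, 0, 1);
        svc_lock();                  /* hangs by design, like lockd did   */
        puts("never reached");
        return 0;
}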

Comment 14 Steve Dickson 2005-08-23 21:01:13 UTC
Created attachment 118023 [details]
Upstream patch that fixes deadlock in lockd

Comment 15 Wendy Cheng 2005-08-24 14:28:48 UTC
Steve, I checked the patch and this is exactly one of the problems. Thanks.

Comment 18 Lon Hohberger 2005-08-26 16:17:45 UTC
The STONITH message appears because the customer is not using power switches.
Hence, the cluster software completely disclaims data-integrity guarantees,
because it cannot ensure that the failed node has actually been cut off.

(Not a bug)

The kernel panics, though, are definitely a bug.

Comment 21 Steve Dickson 2005-08-27 13:44:22 UTC
Created attachment 118188 [details]
Updated Patch 

During our internal review process, a locking inconsistency was
found in the original patch, so please re-test with this updated patch. Thanks.

Comment 24 Steve Dickson 2005-08-30 12:36:14 UTC
Created attachment 118248 [details]
Updated Patch

Again, through our review process it was determined that the
extra locking around blocked locks is not needed, since the
locking process is single-threaded. Those locks have therefore
been removed. Please test to ensure that removing them does not
cause any regression.

Comment 33 Ernie Petrides 2005-10-06 19:55:17 UTC
A fix for this problem is queued for the next interim U7 build.


Comment 34 Ernie Petrides 2005-10-08 02:10:01 UTC
A fix for this problem has just been committed to the RHEL3 U7
patch pool this evening (in kernel version 2.4.21-37.5.EL).


Comment 38 Red Hat Bugzilla 2006-03-15 16:25:19 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2006-0144.html