Bug 166345 - HA NFS Cluster Problem
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 3
Classification: Red Hat
Component: kernel
Version: 3.0
Hardware: All
OS: Linux
Priority: high
Severity: high
Assigned To: Steve Dickson
QA Contact: Cluster QE
Blocks: 168424
Reported: 2005-08-19 12:01 EDT by Issue Tracker
Modified: 2007-11-30 17:07 EST
CC List: 6 users

Fixed In Version: RHSA-2006-0144
Doc Type: Bug Fix
Last Closed: 2006-03-15 11:25:19 EST

Attachments
Upstream patch that fixes deadlock in lockd (1.86 KB, patch)
2005-08-23 17:01 EDT, Steve Dickson
Updated Patch (1.27 KB, patch)
2005-08-27 09:44 EDT, Steve Dickson
Updated Patch (1.69 KB, patch)
2005-08-30 08:36 EDT, Steve Dickson

Description Issue Tracker 2005-08-19 12:01:11 EDT
Escalated to Bugzilla from IssueTracker
Comment 12 Wendy Cheng 2005-08-23 15:42:15 EDT
This issue has been boiled down to the following:
                                                                                
I. The problem starts with AIX (NFS) clients doing heavy NFS I/O that brings
down the NFS server *interface* - i.e. the server itself is still responsive and
accessing the filesystem locally on the server works fine, but the NFS exports
are no longer accessible. According to the customer, it can be recreated at will
in their environment.

II. From the sysrq-m output taken during the system fault, I don't see any
memory issue on the "down" server.

III. From sysrq-t, there are three things to watch out for:
                                                                                
III-1: This box has the IBM multi-path driver (mpp) - I would need IBM support
to help us explain the mpp threads' tracebacks (are they in a normal
wait-for-work path or in a fault-handling path?). At this moment, I assume they
are in a normal wait-for-work path.

Aug 22 13:30:20 fdxfs02 kernel: mppFailback   S 00000001  4820    31      1    
       32    30 (L-TLB)
Aug 22 13:30:20 fdxfs02 kernel: Call Trace:   [<c0123e24>] schedule [kernel]
0x2f4 (0xf6c09f50)
Aug 22 13:30:20 fdxfs02 kernel: [<f8933234>] mppLnx_failback_sem [mpp_Vhba] 0x0
(0xf6c09f84)
Aug 22 13:30:20 fdxfs02 kernel: [<f893323c>] mppLnx_failback_sem [mpp_Vhba] 0x8
(0xf6c09f90)
Aug 22 13:30:20 fdxfs02 kernel: [<c010ae9a>] __down_interruptible [kernel] 0x8a
(0xf6c09f94)
Aug 22 13:30:20 fdxfs02 kernel: [<f8933240>] mppLnx_failback_sem [mpp_Vhba] 0xc
(0xf6c09fa4)
Aug 22 13:30:20 fdxfs02 kernel: [<f8933240>] mppLnx_failback_sem [mpp_Vhba] 0xc
(0xf6c09fa8)
Aug 22 13:30:20 fdxfs02 kernel: [<f8938750>] mppLnxFailbackScanContext
[mpp_Vhba] 0x10 (0xf6c09fb4)
Aug 22 13:30:20 fdxfs02 kernel: [<c010af67>] __down_failed_interruptible
[kernel] 0x7 (0xf6c09fcc)
Aug 22 13:30:20 fdxfs02 kernel: [<f8933234>] mppLnx_failback_sem [mpp_Vhba] 0x0
(0xf6c09fd0)
Aug 22 13:30:20 fdxfs02 kernel: [<f892d639>] mppLnx_setCheckCondition [mpp_Vhba]
0x249 (0xf6c09fd8)
Aug 22 13:30:20 fdxfs02 kernel: [<f8938750>] mppLnxFailbackScanContext
[mpp_Vhba] 0x10 (0xf6c09fdc)
Aug 22 13:30:20 fdxfs02 kernel: [<f893039b>] .rodata.str1.1 [mpp_Vhba] 0x7c7
(0xf6c09fe0)
Aug 22 13:30:20 fdxfs02 kernel: [<f892c6a0>] mppLnx_failback_handler [mpp_Vhba]
0x0 (0xf6c09fe8)
Aug 22 13:30:20 fdxfs02 kernel: [<c01095ad>] kernel_thread_helper [kernel] 0x5
(0xf6c09ff0)

III-2: All nfsds are hanging, waiting for hash_lock, and the while loop is
unbreakable. This piece of code could certainly use some improvement, but I'm
not going to fuss about it at this moment; the real issue here is the lockd
hang (described in III-3). Since all nfsds hang in exp_readlock(), no one can
access this server. (A sketch of an alternative wait follows the nfsd trace
below.)
                                                                                
void
exp_readlock(void)
{
      while (hash_lock || want_lock)
              sleep_on(&hash_wait);
      hash_count++;
}

Aug 22 13:30:25 fdxfs02 kernel: nfsd          D 00000000  3392  2484      1    
     2485  2483 (L-TLB)
Aug 22 13:30:25 fdxfs02 kernel: Call Trace:   [<c0123e24>] schedule [kernel]
0x2f4 (0xf6761f38)
Aug 22 13:30:25 fdxfs02 kernel: [<f8f43040>] hash_wait [nfsd] 0x0 (0xf6761f6c)
Aug 22 13:30:25 fdxfs02 kernel: [<c01246e2>] sleep_on [kernel] 0x52 (0xf6761f7c)
Aug 22 13:30:25 fdxfs02 kernel: [<f8f43040>] hash_wait [nfsd] 0x0 (0xf6761f9c)
Aug 22 13:30:25 fdxfs02 kernel: [<f8f372fa>] exp_readlock [nfsd] 0x2a (0xf6761fac)
Aug 22 13:30:25 fdxfs02 kernel: [<f8f2f3a4>] nfsd [nfsd] 0x1a4 (0xf6761fb0)
Aug 22 13:30:25 fdxfs02 kernel: [<f8f2f200>] nfsd [nfsd] 0x0 (0xf6761fe0)
Aug 22 13:30:25 fdxfs02 kernel: [<c01095ad>] kernel_thread_helper [kernel] 0x5
(0xf6761ff0)
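
A note on the exp_readlock() loop quoted above: the generic concern with an
open-coded sleep_on() loop is the window between testing hash_lock/want_lock
and actually going to sleep, where a wakeup can be missed. Below is a minimal
sketch of how the same wait could be written with the wait_event() helper -
assuming that helper is available in this 2.4 tree, and reusing the existing
hash_wait/hash_lock/want_lock/hash_count declarations in fs/nfsd/export.c.
This is only an illustration of the "improvement" hinted at above, not the fix
for this bug.

/*
 * Illustration only (not the shipped fix): the same wait expressed with
 * wait_event(), which re-tests the condition inside the waitqueue
 * machinery instead of the open-coded test-then-sleep_on() loop.
 */
void
exp_readlock(void)
{
        wait_event(hash_wait, !hash_lock && !want_lock);
        hash_count++;
}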

III-3: The lockd thread hangs - it looks like a deadlock! I haven't figured out
yet which semaphore it is waiting on and why. (See the sketch after the trace
below for the general pattern.)

Aug 22 13:30:23 fdxfs02 kernel: lockd         D 00000001  3872  2262      1    
     2284  2261 (L-TLB)
Aug 22 13:30:23 fdxfs02 kernel: Call Trace:   [<c0123e24>] schedule [kernel]
0x2f4 (0xf681ddc0)
Aug 22 13:30:23 fdxfs02 kernel: [<c010adb3>] __down [kernel] 0x73 (0xf681de04)
Aug 22 13:30:23 fdxfs02 kernel: [<f8ecb5ab>] rpc_call_sync_Rsmp_c357b490
[sunrpc] 0xcb (0xf681de1c)
Aug 22 13:30:23 fdxfs02 kernel: [<c010af5c>] __down_failed [kernel] 0x8 (0xf681de38)
Aug 22 13:30:23 fdxfs02 kernel: [<f8ee5b7f>] .text.lock.svclock [lockd] 0x5
(0xf681de48)
Aug 22 13:30:23 fdxfs02 kernel: [<c029f267>] vsnprintf [kernel] 0x207 (0xf681de50)
Aug 22 13:30:23 fdxfs02 kernel: [<f8ef03b8>] nlm_files [lockd] 0x18 (0xf681de58)
Aug 22 13:30:23 fdxfs02 kernel: [<f8ee7334>] nlm_traverse_files [lockd] 0x144
(0xf681de64)
Aug 22 13:30:23 fdxfs02 kernel: [<f8ee74c0>] nlmsvc_mark_resources [lockd] 0x20
(0xf681de84)
Aug 22 13:30:23 fdxfs02 kernel: [<f8ee3ff5>] nlm_gc_hosts [lockd] 0x45 (0xf681de90)
Aug 22 13:30:23 fdxfs02 kernel: [<f8eec662>] .rodata.str1.1 [lockd] 0x39
(0xf681de98)
Aug 22 13:30:23 fdxfs02 kernel: [<f8ee396b>] nlm_lookup_host [lockd] 0x8b
(0xf681deb0)
Aug 22 13:30:23 fdxfs02 kernel: [<f8eec657>] .rodata.str1.1 [lockd] 0x2e
(0xf681deb8)
Aug 22 13:30:23 fdxfs02 kernel: [<f8ef0138>] nlm_hosts [lockd] 0x78 (0xf681decc)
Aug 22 13:30:23 fdxfs02 kernel: [<f8ee38d0>] nlmsvc_lookup_host [lockd] 0x30
(0xf681dee4)
Aug 22 13:30:23 fdxfs02 kernel: [<f8ee5a15>] nlmsvc_create_block [lockd] 0xb5
(0xf681def8)
Aug 22 13:30:23 fdxfs02 kernel: [<c0179f24>] posix_test_lock [kernel] 0x84
(0xf681df08)
Aug 22 13:30:23 fdxfs02 kernel: [<f8ee4e1a>] nlmsvc_lock [lockd] 0xca (0xf681df1c)
Aug 22 13:30:23 fdxfs02 kernel: [<f8eea4c7>] nlm4svc_retrieve_args [lockd] 0xc7
(0xf681df38)
Aug 22 13:30:23 fdxfs02 kernel: [<f8eef2b8>] nlmsvc_version4 [lockd] 0x0
(0xf681df5c)
Aug 22 13:30:23 fdxfs02 kernel: [<f8eea6dc>] nlm4svc_proc_lock [lockd] 0xac
(0xf681df60)
Aug 22 13:30:23 fdxfs02 kernel: [<f8eefc88>] nlmsvc_procedures4 [lockd] 0x48
(0xf681df84)
Aug 22 13:30:23 fdxfs02 kernel: [<f8ed3548>] svc_process_Rsmp_462cdaea [sunrpc]
0x318 (0xf681df8c)
Aug 22 13:30:23 fdxfs02 kernel: [<f8ee43fb>] lockd [lockd] 0x1ab (0xf681dfc4)
Aug 22 13:30:23 fdxfs02 kernel: [<f8ee4250>] lockd [lockd] 0x0 (0xf681dfe0)
Aug 22 13:30:23 fdxfs02 kernel: [<c01095ad>] kernel_thread_helper [kernel] 0x5
(0xf681dff0)
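
Reading the call trace above, the pattern it suggests (an interpretation only;
the upstream patch attached in comment #14 below is the authoritative fix) is a
single-thread self-deadlock: nlmsvc_lock() takes a per-file semaphore, and the
nlmsvc_lookup_host() -> nlm_gc_hosts() -> nlmsvc_mark_resources() ->
nlm_traverse_files() chain on the same call path then tries to down() a
semaphore that is already held. Since lockd is a single kernel thread, nothing
is left to release it. The following is a self-contained toy sketch of that
pattern with hypothetical names - it is not the lockd code itself:

/*
 * Hypothetical illustration of the suspected pattern: one thread takes a
 * non-recursive lock and then, deeper in the same call chain, tries to
 * take it again.  The second acquisition can never succeed, because the
 * only thread that could release the lock is the one now blocked on it.
 */
#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t file_lock = PTHREAD_MUTEX_INITIALIZER;

static void gc_hosts(void)                   /* stand-in for nlm_gc_hosts()    */
{
        printf("gc: trying to take file_lock again...\n");
        pthread_mutex_lock(&file_lock);      /* blocks forever - self-deadlock */
        pthread_mutex_unlock(&file_lock);
}

static void handle_lock_request(void)        /* stand-in for nlmsvc_lock()     */
{
        pthread_mutex_lock(&file_lock);      /* first (legitimate) acquisition */
        gc_hosts();                          /* GC triggered on the same path  */
        pthread_mutex_unlock(&file_lock);    /* never reached                  */
}

int main(void)
{
        handle_lock_request();               /* the single "lockd" thread      */
        printf("never printed\n");
        return 0;
}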
Comment 14 Steve Dickson 2005-08-23 17:01:13 EDT
Created attachment 118023 [details]
Upstream patch that fixes deadlock in lockd
Comment 15 Wendy Cheng 2005-08-24 10:28:48 EDT
Steve, I checked the patch and this is exactly one of the problems. Thanks.
Comment 18 Lon Hohberger 2005-08-26 12:17:45 EDT
The STONITH message appears because the customer is not using power switches.
Hence, the cluster software completely disclaims all data integrity, because it
can't ensure that the failed node has actually been cut off.

(Not a bug)

Kernel panics... def. a bug.
Comment 21 Steve Dickson 2005-08-27 09:44:22 EDT
Created attachment 118188 [details]
Updated Patch 

During our internal review process, a locking inconsistency was
found in the original patch, so please re-test with this updated patch. Thanks...
Comment 24 Steve Dickson 2005-08-30 08:36:14 EDT
Created attachment 118248 [details]
Updated Patch

Again through our review process, it was deemed that the
extra locking around blocked locks is not needed, since
the lock-handling process is single-threaded. So those locks were
removed. Please test to ensure the removal of those locks
does not cause any regression...
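
To spell out that reasoning (an interpretation of the comment above, not text
from the patch itself): because all NLM request handling runs in the single
lockd kernel thread, manipulation of the blocked-lock list is already
serialized by that thread, so wrapping it in an extra semaphore adds nothing
and only creates more chances for the kind of self-deadlock seen earlier. A
toy, self-contained sketch of the idea, again with hypothetical names:

/*
 * Toy illustration (hypothetical names): when exactly one thread ever
 * touches a list, its insertions and removals are naturally serialized
 * and need no lock of their own.
 */
#include <stdio.h>

struct block {
        struct block *next;
        int           id;
};

static struct block *blocked_list;           /* analogue of the blocked-lock list */

static void add_block(struct block *b)       /* only ever called from one thread  */
{
        b->next = blocked_list;              /* no lock needed: single caller     */
        blocked_list = b;
}

int main(void)
{
        struct block b1 = { .next = NULL, .id = 1 };

        add_block(&b1);
        printf("head id = %d\n", blocked_list->id);
        return 0;
}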
Comment 33 Ernie Petrides 2005-10-06 15:55:17 EDT
A fix for this problem is queued for the next interim U7 build.
Comment 34 Ernie Petrides 2005-10-07 22:10:01 EDT
A fix for this problem has just been committed to the RHEL3 U7
patch pool this evening (in kernel version 2.4.21-37.5.EL).
Comment 38 Red Hat Bugzilla 2006-03-15 11:25:19 EST
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2006-0144.html
