Bug 487026

Summary: GFS: gfs_grow causes lock_dlm: exxonfs: gdlm_lock 2,17 err=-16
Product: [Retired] Red Hat Cluster Suite
Reporter: Nate Straz <nstraz>
Component: GFS-kernel
Assignee: David Teigland <teigland>
Status: CLOSED ERRATA
QA Contact: Cluster QE <mspqa-list>
Severity: high
Priority: high
Version: 4
CC: edamato, mkarg, rpeterso, tao, teigland
Hardware: All
OS: Linux
Doc Type: Bug Fix
Clone Of: 438268
Last Closed: 2009-05-18 21:10:03 UTC
Bug Depends On: 438268

Description Nate Straz 2009-02-23 18:26:22 UTC
+++ This bug was initially created as a clone of Bug #438268 +++

While running gfs_grow tests on 4.8 I started seeing these messages:

dlm: grow1: process_lockqueue_reply id 10371 state 0
dlm: grow1: process_lockqueue_reply id 302b4 state 0
dlm: grow1: process_lockqueue_reply id 301f8 state 0
dlm: grow1: process_lockqueue_reply id 5011e state 0
dlm: grow1: process_lockqueue_reply id 40195 state 0
lock_dlm: lm_dlm_cancel 2,4b flags 80
dlm: grow1: (10920) dlm_unlock: a590313 busy 2
lock_dlm: lm_dlm_cancel rv -16 2,4b flags 40080
lock_dlm: lm_dlm_cancel 2,4b flags 80
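
For readers decoding the return values: the `rv -16` above and the `err=-16` in the summary follow the kernel convention of returning a negated errno. A minimal sketch in Python, assuming a Linux host where errno 16 is EBUSY; the `decode_kernel_rv` helper is a hypothetical name used only for illustration and is not part of any tool mentioned in this report:

```python
import errno
import os


def decode_kernel_rv(rv):
    """Map a negative kernel-style return value to (symbolic name, message)."""
    code = -rv
    # errno.errorcode maps numeric errno values to their symbolic names.
    name = errno.errorcode.get(code, "UNKNOWN")
    return name, os.strerror(code)


name, message = decode_kernel_rv(-16)
print(name, "-", message)
```

Read this way, the cancel request was refused because the lock was busy, which is consistent with the `dlm_unlock: a590313 busy 2` line logged just before it.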

The messages eventually turned into:

dlm: grow1: cancel reply ret 0
dlm: grow1: process_lockqueue_reply id a590313 state 0
Unable to handle kernel paging request at virtual address 00100100
 printing eip:
829e1a1a
*pde = 00004001
Oops: 0000 [#1]
SMP 
Modules linked in: lock_dlm(U) dm_cmirror(U) gnbd(U) lock_nolock(U) gfs(U) lock_harness(U) dlm(U) cman(U) md5 ipv6 parport_pc lp parport autofs4 sunrpc cpufreq_powersave loop button battery ac uhci_hcd hw_random e1000 floppy sg dm_snapshot dm_zero dm_mirror ext3 jbd dm_mod qla2300 ata_piix libata qla2xxx scsi_transport_fc sd_mod scsi_mod
CPU:    1
EIP:    0060:[<829e1a1a>]    Not tainted VLI
EFLAGS: 00010202   (2.6.9-80.ELhugemem) 
EIP is at process_lockqueue+0xd4/0x122 [dlm]
eax: 00000542   ebx: 001000d0   ecx: 829fd8e8   edx: 04433200
esi: 829fd8e8   edi: 001000d0   ebp: 00000000   esp: 70b40fb0
ds: 007b   es: 007b   ss: 0068
Process dlm_astd (pid: 8794, threadinfo=70b40000 task=80dacb30)
Stack: 00000000 829fd8a8 00000000 00000000 829e1b61 829e1cc6 70b40000 74167ea4 
       0213414d fffffffc ffffffff ffffffff 021340da 00000000 00000000 00000000 
       021041f5 74167e9c 00000000 00000000 
Call Trace:
 [<829e1b61>] dlm_astd+0x0/0x1a9 [dlm]
 [<829e1cc6>] dlm_astd+0x165/0x1a9 [dlm]
 [<0213414d>] kthread+0x73/0x9b
 [<021340da>] kthread+0x0/0x9b
 [<021041f5>] kernel_thread_helper+0x5/0xb
Code: 47 44 00 00 31 c9 ba 6b 00 00 00 b8 20 26 9f 82 e8 9c f1 73 7f e8 58 23 8f 7f 89 f1 f0 ff 0d e8 d8 9f 82 0f 88 2e 05 00 00 89 fb <8b> 7f 30 8d 43 30 83 ef 30 e9 5d ff ff ff b9 e8 d8 9f 82 f0 ff 
 <0>Fatal exception: panic in 5 seconds
Kernel panic - not syncing: Fatal exception
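
As a reading aid for the oops above, the faulting address can be reproduced from the register dump. The sketch below is one interpretation, not a confirmed root cause. It assumes that the marked opcode bytes `<8b> 7f 30` decode to `mov 0x30(%edi),%edi`, and that `LIST_POISON1` has the value 0x00100100 defined in `include/linux/list.h` on 32-bit kernels of this era. All other values are copied from the log.

```python
# Values copied from the oops above.
FAULT_ADDR = 0x00100100  # "virtual address 00100100"
EDI = 0x001000D0         # "edi: 001000d0"

# Assumption: "<8b> 7f 30" is mov 0x30(%edi),%edi, a load from edi + 0x30.
DISPLACEMENT = 0x30

# Assumption: LIST_POISON1 value written into entry->next by list_del()
# on 32-bit kernels of this era.
LIST_POISON1 = 0x00100100


def effective_address(base, disp):
    """Compute a 32-bit effective address with wraparound."""
    return (base + disp) & 0xFFFFFFFF


ea = effective_address(EDI, DISPLACEMENT)
print(hex(ea), ea == FAULT_ADDR, ea == LIST_POISON1)
```

If those assumptions hold, the load targeted exactly the poison value that `list_del()` leaves behind, which would suggest that `process_lockqueue` walked onto a list entry that had already been removed. The actual fix is the commit referenced in Comment 3.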

I talked with Bob and Dave about this and they thought it was the same as bug #438268 from RHEL5.

Package versions:
GFS-kernel-2.6.9-81.2.el4
kernel-2.6.9-80.EL
dlm-kernel-2.6.9-58.1.el4

Comment 1 Nate Straz 2009-02-23 18:35:41 UTC
How Reproducible:

Easily, using the new growfs test, which runs a lighter sequential I/O load while growing GFS file systems with a 1k block size.

Comment 2 Robert Peterson 2009-02-24 14:22:05 UTC
Dave did the fix for the original problem and POSTed it.  I'm
assuming he'll do the crosswrite to 4.x, so I'm reassigning to him.

Comment 3 David Teigland 2009-02-24 17:19:28 UTC
Pushed to the RHEL4 branch, commit 5a6349be0bdba75d2b1cc90e5c5861d2661a6304

Comment 5 David Teigland 2009-04-15 19:34:52 UTC
*** Bug 495968 has been marked as a duplicate of this bug. ***

Comment 6 Nate Straz 2009-04-17 17:45:24 UTC
Verified against:

GFS-kernel-hugemem-2.6.9-84.2.el4.i686
dlm-kernel-hugemem-2.6.9-58.4.el4.i686
kernel-hugemem-2.6.9-87.EL.i686

Comment 8 errata-xmlrpc 2009-05-18 21:10:03 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2009-1045.html