Description of problem:
Quite often when umounting a GFS filesystem on my 8-node cluster one node will oops as shown below. This hangs any further DLM operations in the cluster until that node is rebooted.

Version-Release number of selected component (if applicable):

How reproducible:
Very easily on my 8-node i686 cluster

Steps to Reproduce:
1. Mount a GFS filesystem on all 8 nodes
2. umount it on all 8 nodes
3. repeat until it oopses (this usually happens very quickly for me)

Actual results:
Oops on at least one node.

Expected results:
Clean umount on all nodes.

Additional info:
Oops text:

BUG: unable to handle kernel paging request at virtual address db79d830
 printing eip:
c01f54d1
*pde = 0006e067
*pte = 1b79d000
Oops: 0000 [#1]
SMP DEBUG_PAGEALLOC
Modules linked in: lock_nolock lock_dlm dlm gfs2 configfs sctp ipv6 dm_round_robin iscsi_tcp libiscsi scsi_transport_iscsi dm_multipath
CPU:    0
EIP:    0060:[<c01f54d1>]    Not tainted VLI
EFLAGS: 00010206   (2.6.19-rc3 #2)
EIP is at kref_put+0x55/0x7c
eax: db79d830   ebx: db79d830   ecx: c9014f40   edx: c8d25f60
esi: c01f4f7d   edi: d4ea67dc   ebp: d241ff14   esp: d241fefc
ds: 007b   es: 007b   ss: 0068
Process dlm_controld (pid: 3871, ti=d241e000 task=d2bdd5b0 task.ti=d241e000)
Stack: d2bdd5b0 bf9e2726 d241ff20 00000046 00000000 db79d818 d241ff24 c01f4939
       db79d830 c01f4f7d d241ff3c c01914d1 db79d818 00000008 c8d37dc8 c9014f40
       d241ff70 c015ad7c c8d37dc8 c9014f40 00000000 00000000 00000000 c8d25f60
Call Trace:
 [<c0103d85>] show_trace_log_lvl+0x26/0x3c
 [<c0103e38>] show_stack_log_lvl+0x9d/0xa5
 [<c01041e8>] show_registers+0x1af/0x249
 [<c01045a9>] die+0x1dd/0x2c6
 [<c0111932>] do_page_fault+0x488/0x562
 [<c03294b9>] error_code+0x39/0x40
 [<c01f4939>] kobject_put+0x1f/0x21
 [<c01914d1>] sysfs_release+0x31/0x7d
 [<c015ad7c>] __fput+0xdc/0x1a7
 [<c015ae5e>] fput+0x17/0x19
 [<c01586d7>] filp_close+0x61/0x6a
 [<c0158f0d>] sys_close+0x7c/0xb1
 [<c0102ecd>] sysenter_past_esp+0x56/0x8d
 =======================
Code: 75 29 c7 44 24 0c 55 b9 39 c0 c7 44 24 08 35 00 00 00 c7 44 24 04 5e f0 34 c0 c7 04 24 ba c7 33 c0 e8 38 49 f2 ff e8 b0 e9 f0 ff <8b> 03 48 74 0c 90 ff 0b 0f 94 c0 31 d2 84 c0 74 0a 89 1c 24 ff
EIP: [<c01f54d1>] kref_put+0x55/0x7c SS:ESP 0068:d241fefc
Created attachment 140135
Patch to fix (this has gone upstream to Steve)
Devel ACK and posting for beta2 blocker status. The problem has not yet shown up in QE testing but will most likely impact the mount-stress tests. The patch has been posted to rhkernel-list, and we would like it considered for the final beta2 kernel respin.
Moved to RHEL5 beta and dlm-kernel. Also provided pm_ack.
QE ack for RHEL5B2.
Changing component to kernel for patch tracking.
in kernel-2.6.18-1.2744.el5
Patch confirmed in 2.6.18-1.2747.el5.