Description of problem:
I was working on a fencing problem when I discovered this kernel crash. The kernel crashes into kdb with a dlm problem.

Steps to reproduce:
service ccsd start
service cman start
service clvmd start
service clvmd stop
service cman stop

Console output:
Starting portlock: ip_tables: (C) 2000-2002 Netfilter core team
[ OK ]
CMAN <CVS> (built May 9 2007 14:54:51) installed
CMAN: quorum regained, resuming activity
DLM <CVS> (built May 9 2007 14:55:00) installed
WARNING: dlm_emergency_shutdown
WARNING: dlm_emergency_shutdown
slab error in kmem_cache_destroy(): cache `dlm_lkb': Can't free all objects

Call Trace:
    <ffffffff8016191f>{kmem_cache_destroy+202}
    <ffffffffa02440ac>{:dlm:dlm_memory_exit+37}
    <ffffffffa024bbf9>{:dlm:cleanup_module+23}
    <ffffffff8014dc54>{sys_delete_module+479}
    <ffffffff80110c61>{error_exit+0}
    <ffffffff801101c6>{system_call+126}

CMAN <CVS> (built May 9 2007 14:54:51) installed
SLAB: cache with size 232 has lost its name
CMAN: quorum regained, resuming activity
kmem_cache_create: duplicate cache dlm_lkb
Kernel BUG at slab:1453
invalid operand: 0000 [1] SMP

Entering kdb (current=0x00000100e1b477f0, pid 8489) on processor 0 Oops: <NULL>
due to oops @ 0xffffffff801623b8
    r15 = 0xffffffffa024dd47      r14 = 0x0000010000000000
    r13 = 0x0000000000000000      r12 = 0xffffffff8048a0e0
    rbp = 0x00000100e3cec880      rbx = 0x00000100e3cecb70
    r11 = 0x0000000000000001      r10 = 0x0000000100000000
     r9 = 0x00000100e3cecb70       r8 = 0xffffffff803e5ac8
    rax = 0x000000000000002b      rcx = 0xffffffff803e5ac8
    rdx = 0xffffffff803e5ac8      rsi = 0x0000000000000246
    rdi = 0xffffffff8048a0e0 orig_rax = 0xffffffffffffffff
    rip = 0xffffffff801623b8       cs = 0x0000000000000010
 eflags = 0x0000000000010202      rsp = 0x00000100e15d7ec0
     ss = 0x00000100e15d6000    &regs = 0x00000100e15d7e28
[0]kdb> [forced to `spy' mode by cwsupport]
[0]kdb> bt
Stack traceback for pid 8489
0x00000100e1b477f0 8489 8445 1 0 R 0x00000100e1b47bf0 *modprobe
RSP           RIP                Function (args)
0x100e15d7ec0 0xffffffff801623b8 kmem_cache_create+0x532
0x100e15d7f38 0xffffffffa0243fb0 [dlm]dlm_memory_init+0x80
0x100e15d7f48 0xffffffffa025c01a [dlm]init_module+0x1a
0x100e15d7f58 0xffffffff8014f739 sys_init_module+0x116
The suggested fix is to add the following at the end of lockspace.c:release_lockspace():

                 }
         }
+        spin_lock(&ls->ls_trash_spin);
+printk("release_lockspace: %d on the trash list\n",ls->ls_trash_count);
+        if (ls->ls_trash_count) {
+                struct dlm_lkb *lkb1, *lkb2;
+                list_for_each_entry_safe(lkb1, lkb2, &ls->ls_trash_list,
+                                         lkb_idtbl_list) {
+                        list_del(&lkb1->lkb_idtbl_list);
+                        free_lkb(lkb1);
+                }
+        }
+        spin_unlock(&ls->ls_trash_spin);
+        astd_resume();
You're evidently running with a debugging patch that was used while working on bug 199673 (patch in comment 16 of that bug). That debugging patch is definitely not suitable for general use and appears to be the cause of your problems. You should remove that patch and update to the most recent version of the dlm.
Please reopen this bug if there's still a problem after removing the patch.