Description of problem:
I made a programming mistake and forgot to fill in the lksb arg to dlm_ls_query_wait() with valid info. The lksb contained whatever garbage happened to be on the stack, and calling dlm_ls_query_wait() caused:

kernel BUG at include/asm/spinlock.h:199!
invalid operand: 0000 [#1]
SMP
Modules linked in: lock_dlm(U) dlm(U) cman(U) lock_harness(U) md5 ipv6 parport_pc lp parport autofs4 sunrpc e1000 microcode dm_mod uhci_hcd ehci_hcd button battery ac ext3 jbd qla2300 qla2xxx scsi_transport_fc sd_mod scsi_mod
CPU:    3
EIP:    0060:[<c02c4e7d>]    Tainted: GF VLI
EFLAGS: 00010213   (2.6.9-5.ELsmp)
EIP is at _read_lock+0x9/0x1d
eax: f6263fb8   ebx: f7c86c00   ecx: 00000000   edx: 00d04ffc
esi: 00d04ffc   edi: 00063fb0   ebp: f6c73a10   esp: f6108f00
ds: 007b   es: 007b   ss: 0068
Process dlmlock (pid: 2826, threadinfo=f6108000 task=f6642d30)
Stack: f8a6d3c1 f71f8520 f6c6d988 ffffffea f8a7518b f6108f5c f7c86c00 00000000
       00000038 f6c6d980 f71f8520 f6c6d980 f71f8530 f6c73a10 f8a6bfa6 f71f8520
       f8a6b3a3 f6c6d980 f7d89c80 f6c73a00 f7d89c80 bfee6ad0 00000068 f8a6c4cc
Call Trace:
 [<f8a6d3c1>] find_lock_by_id+0x1a/0x36 [dlm]
 [<f8a7518b>] dlm_query+0x63/0x24d [dlm]
 [<f8a6bfa6>] do_user_query+0x122/0x13e [dlm]
 [<f8a6b3a3>] ast_routine+0x0/0x130 [dlm]
 [<f8a6c4cc>] dlm_write+0x143/0x1ae [dlm]
 [<c01556ec>] vfs_write+0xb6/0xe2
 [<c01557b6>] sys_write+0x3c/0x62
 [<c02c62a3>] syscall_call+0x7/0xb
Code: 5b c3 81 78 04 ed 1e af de 74 08 0f 0b cf 00 6c 74 2d c0 f0 81 28 00 00 00 01 74 05 e8 69 ee ff ff c3 81 78 04 ed 1e af de 74 08 <0f> 0b c7 00 6c 74 2d c0 f0 83 28 01 79 05 e8 6c ee ff ff c3 81
 <0>Fatal exception: panic in 5 seconds
Kernel panic - not syncing: Fatal exception

Version-Release number of selected component (if applicable):
Kernel 2.6.9-5.ELsmp on an i686
DLM <CVS> (built Jan 10 2005 16:19:02) installed

How reproducible:
Every Time

Steps to Reproduce:
1. Call dlm_ls_query_wait() with a bogus lksb.

Expected Results:
dlm_ls_query_wait() should return an error, or maybe even nonsense results if by some chance the dlm_lksb happened to hold some valid info, but it should not tip over the node.
Hah, the check for a valid lock ID came rather late in the function call: we attempted to take a spinlock before the check!

Checking in lkb.c;
/cvs/cluster/cluster/dlm-kernel/src/lkb.c,v  <--  lkb.c
new revision: 1.4; previous revision: 1.3
done
Checking in lkb.c;
/cvs/cluster/cluster/dlm-kernel/src/lkb.c,v  <--  lkb.c
new revision: 1.3.2.1; previous revision: 1.3
done
Have not seen this since the fix.