Bug 144809

Summary: Kernel panic when calling dlm_ls_query_wait() with a bogus lksb
Product: [Retired] Red Hat Cluster Suite
Component: dlm
Version: 4
Hardware: i686
OS: Linux
Status: CLOSED NEXTRELEASE
Severity: medium
Priority: medium
Reporter: Dean Jansa <djansa>
Assignee: Christine Caulfield <ccaulfie>
QA Contact: Cluster QE <mspqa-list>
CC: cluster-maint
Doc Type: Bug Fix
Last Closed: 2005-11-30 18:55:19 UTC

Description Dean Jansa 2005-01-11 17:43:09 UTC
Description of problem:

I made a programming mistake: I forgot to fill in the lksb argument to
dlm_ls_query_wait() with valid info. The lksb contained whatever
garbage happened to be on the stack, and calling dlm_ls_query_wait()
caused:

kernel BUG at include/asm/spinlock.h:199!
invalid operand: 0000 [#1]
SMP
Modules linked in: lock_dlm(U) dlm(U) cman(U) lock_harness(U) md5 ipv6
parport_pc lp parport autofs4 sunrpc e1000 microcode dm_mod uhci_hcd
ehci_hcd button battery ac ext3 jbd qla2300 qla2xxx scsi_transport_fc
sd_mod scsi_mod
CPU:    3
EIP:    0060:[<c02c4e7d>]    Tainted: GF     VLI
EFLAGS: 00010213   (2.6.9-5.ELsmp)
EIP is at _read_lock+0x9/0x1d
eax: f6263fb8   ebx: f7c86c00   ecx: 00000000   edx: 00d04ffc
esi: 00d04ffc   edi: 00063fb0   ebp: f6c73a10   esp: f6108f00
ds: 007b   es: 007b   ss: 0068
Process dlmlock (pid: 2826, threadinfo=f6108000 task=f6642d30)
Stack: f8a6d3c1 f71f8520 f6c6d988 ffffffea f8a7518b f6108f5c f7c86c00 00000000
       00000038 f6c6d980 f71f8520 f6c6d980 f71f8530 f6c73a10 f8a6bfa6 f71f8520
       f8a6b3a3 f6c6d980 f7d89c80 f6c73a00 f7d89c80 bfee6ad0 00000068 f8a6c4cc
Call Trace:
 [<f8a6d3c1>] find_lock_by_id+0x1a/0x36 [dlm]
 [<f8a7518b>] dlm_query+0x63/0x24d [dlm]
 [<f8a6bfa6>] do_user_query+0x122/0x13e [dlm]
 [<f8a6b3a3>] ast_routine+0x0/0x130 [dlm]
 [<f8a6c4cc>] dlm_write+0x143/0x1ae [dlm]
 [<c01556ec>] vfs_write+0xb6/0xe2
 [<c01557b6>] sys_write+0x3c/0x62
 [<c02c62a3>] syscall_call+0x7/0xb
Code: 5b c3 81 78 04 ed 1e af de 74 08 0f 0b cf 00 6c 74 2d c0 f0 81
28 00 00 00 01 74 05 e8 69 ee ff ff c3 81 78 04 ed 1e af de 74 08 <0f>
0b c7 00 6c 74 2d c0 f0 83 28 01 79 05 e8 6c ee ff ff c3 81
 <0>Fatal exception: panic in 5 seconds
Kernel panic - not syncing: Fatal exception



Version-Release number of selected component (if applicable):

Kernel 2.6.9-5.ELsmp on an i686
DLM <CVS> (built Jan 10 2005 16:19:02) installed


How reproducible:

Every Time

Steps to Reproduce:
1. Call dlm_ls_query_wait() with a bogus (uninitialized) lksb.


Expected Results:

dlm_ls_query_wait() should return an error, or maybe even nonsense
results if by some chance the dlm_lksb happened to contain valid info,
but it should not tip over the node.

Comment 1 Christine Caulfield 2005-01-18 10:47:55 UTC
Hah, the check for a valid lock ID was rather late in the function
call; we attempted to get a spinlock before the check!

Checking in lkb.c;
/cvs/cluster/cluster/dlm-kernel/src/lkb.c,v  <--  lkb.c
new revision: 1.4; previous revision: 1.3
done
Checking in lkb.c;
/cvs/cluster/cluster/dlm-kernel/src/lkb.c,v  <--  lkb.c
new revision: 1.3.2.1; previous revision: 1.3
done
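
For readers without the CVS tree at hand, the ordering problem described
above can be illustrated with a small user-space sketch. This is not the
code from lkb.c: the table layout, the toy_* names, the bucket count and
the error codes are all assumptions made for illustration. The only
detail taken from this report is the magic constant, which matches the
little-endian bytes "ed 1e af de" in the oops "Code:" line. The point is
just that the ID is range-checked before any lock derived from it is
touched.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_TABLE_SIZE 16u          /* hypothetical bucket count */
#define TOY_LOCK_MAGIC 0xdeaf1eedu  /* matches "ed 1e af de" in the oops */

/* Toy stand-in for a debug rwlock that carries a magic number. */
struct toy_rwlock {
    uint32_t magic;
    int readers;
};

/* Hypothetical lock-table bucket; lkid == 0 means the slot is empty. */
struct toy_bucket {
    struct toy_rwlock lock;
    uint32_t lkid;
};

static struct toy_bucket toy_table[TOY_TABLE_SIZE];

static void toy_table_init(void)
{
    for (size_t i = 0; i < TOY_TABLE_SIZE; i++) {
        toy_table[i].lock.magic = TOY_LOCK_MAGIC;
        toy_table[i].lock.readers = 0;
        toy_table[i].lkid = 0;
    }
}

/* Aborts on a bad magic value, loosely mimicking the kernel BUG check. */
static void toy_read_lock(struct toy_rwlock *l)
{
    assert(l->magic == TOY_LOCK_MAGIC);
    l->readers++;
}

static void toy_read_unlock(struct toy_rwlock *l)
{
    l->readers--;
}

/*
 * Look up a lock by ID. The bucket index is validated BEFORE the bucket's
 * lock is taken, so a garbage ID yields -EINVAL instead of a lock
 * operation on memory outside the table.
 */
static int toy_find_lock_by_id(uint32_t lkid, struct toy_bucket **out)
{
    uint32_t idx = lkid & 0xFFFFu;  /* assumed ID-to-bucket mapping */

    if (out == NULL || idx >= TOY_TABLE_SIZE)
        return -EINVAL;

    toy_read_lock(&toy_table[idx].lock);
    int found = (lkid != 0 && toy_table[idx].lkid == lkid);
    toy_read_unlock(&toy_table[idx].lock);

    if (!found)
        return -ENOENT;

    *out = &toy_table[idx];
    return 0;
}
```

With this ordering, a caller that passes stack garbage as the lock ID
gets an error return rather than reaching the lock's magic check.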


Comment 2 Dean Jansa 2005-11-30 18:55:19 UTC
Have not seen this since the fix.