Bug 144809

Summary:	Kernel panic when calling dlm_ls_query_wait() with a bogus lksb
Product:	[Retired] Red Hat Cluster Suite	Reporter:	Dean Jansa <djansa>
Component:	dlm	Assignee:	Christine Caulfield <ccaulfie>
Status:	CLOSED NEXTRELEASE	QA Contact:	Cluster QE <mspqa-list>
Severity:	medium	Docs Contact:
Priority:	medium
Version:	4	CC:	cluster-maint
Target Milestone:	---
Target Release:	---
Hardware:	i686
OS:	Linux
Whiteboard:
Fixed In Version:		Doc Type:	Bug Fix
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2005-11-30 18:55:19 UTC	Type:	---

Description Dean Jansa 2005-01-11 17:43:09 UTC

Description of problem:

I made a programming mistake and forgot to fill in the lksb argument
to dlm_ls_query_wait() with valid info.  The lksb contained whatever
garbage happened to be on the stack, and calling dlm_ls_query_wait()
caused:

kernel BUG at include/asm/spinlock.h:199!
invalid operand: 0000 [#1]
SMP
Modules linked in: lock_dlm(U) dlm(U) cman(U) lock_harness(U) md5 ipv6
parport_pc lp parport autofs4 sunrpc e1000 microcode dm_mod uhci_hcd
ehci_hcd button battery ac ext3 jbd qla2300 qla2xxx scsi_transport_fc
sd_mod scsi_mod
CPU:    3
EIP:    0060:[<c02c4e7d>]    Tainted: GF     VLI
EFLAGS: 00010213   (2.6.9-5.ELsmp)
EIP is at _read_lock+0x9/0x1d
eax: f6263fb8   ebx: f7c86c00   ecx: 00000000   edx: 00d04ffc
esi: 00d04ffc   edi: 00063fb0   ebp: f6c73a10   esp: f6108f00
ds: 007b   es: 007b   ss: 0068
Process dlmlock (pid: 2826, threadinfo=f6108000 task=f6642d30)
Stack: f8a6d3c1 f71f8520 f6c6d988 ffffffea f8a7518b f6108f5c f7c86c00 00000000
       00000038 f6c6d980 f71f8520 f6c6d980 f71f8530 f6c73a10 f8a6bfa6 f71f8520
       f8a6b3a3 f6c6d980 f7d89c80 f6c73a00 f7d89c80 bfee6ad0 00000068 f8a6c4cc
Call Trace:
 [<f8a6d3c1>] find_lock_by_id+0x1a/0x36 [dlm]
 [<f8a7518b>] dlm_query+0x63/0x24d [dlm]
 [<f8a6bfa6>] do_user_query+0x122/0x13e [dlm]
 [<f8a6b3a3>] ast_routine+0x0/0x130 [dlm]
 [<f8a6c4cc>] dlm_write+0x143/0x1ae [dlm]
 [<c01556ec>] vfs_write+0xb6/0xe2
 [<c01557b6>] sys_write+0x3c/0x62
 [<c02c62a3>] syscall_call+0x7/0xb
Code: 5b c3 81 78 04 ed 1e af de 74 08 0f 0b cf 00 6c 74 2d c0 f0 81
28 00 00 00 01 74 05 e8 69 ee ff ff c3 81 78 04 ed 1e af de 74 08 <0f>
0b c7 00 6c 74 2d c0 f0 83 28 01 79 05 e8 6c ee ff ff c3 81
 <0>Fatal exception: panic in 5 seconds
Kernel panic - not syncing: Fatal exception



Version-Release number of selected component (if applicable):

Kernel 2.6.9-5.ELsmp on an i686
DLM <CVS> (built Jan 10 2005 16:19:02) installed


How reproducible:

Every Time

Steps to Reproduce:
1. Call dlm_ls_query_wait() with a bogus lksb.
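
For anyone hitting the same thing from userspace, here is a rough sketch
of the caller-side habit that avoids passing stack garbage: zero the
status block and set the lock ID explicitly before any query. The struct
below is a stand-in written for this note, not copied from libdlm.h; its
field layout, the helper names, and treating a lock ID of 0 as "unset"
are all assumptions.

```c
#include <stdint.h>
#include <string.h>

/* Stand-in for libdlm's struct dlm_lksb; the field layout is an assumption. */
struct lksb_sketch {
    int      sb_status;
    uint32_t sb_lkid;
    char     sb_flags;
    char    *sb_lvbptr;
};

/* Zero the block so it never carries whatever was on the stack. */
static void lksb_sketch_init(struct lksb_sketch *lksb)
{
    memset(lksb, 0, sizeof(*lksb));
}

/*
 * Prepare a block for a query on an existing lock.
 * Returns 0 on success, -1 if no lock ID was supplied
 * (0 is treated as "unset" in this sketch only).
 */
static int lksb_sketch_prepare_query(struct lksb_sketch *lksb, uint32_t lkid)
{
    lksb_sketch_init(lksb);
    if (lkid == 0)
        return -1;
    lksb->sb_lkid = lkid;
    return 0;
}
```

The prepared block would then be handed to dlm_ls_query_wait() in place
of the uninitialized one.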


Expected Results:

dlm_ls_query_wait() should return an error, or maybe even nonsense
results if by some chance the dlm_lksb had some valid info, but it
should not tip over the node.

Comment 1 Christine Caulfield 2005-01-18 10:47:55 UTC

Hah, the check for a valid lock ID came rather late in the function
call; we attempted to take a spinlock before the check!
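
The ordering problem can be sketched in plain userspace C as below. This
is not the code in lkb.c: the types, the table size, the sketch_ names,
and the idea that the low 16 bits of the lock ID pick a hash bucket are
assumptions made for illustration. The point is only that the range
check has to happen before anything touches the bucket's lock, since a
garbage ID otherwise indexes past the table.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_TBL_SIZE 8u   /* hypothetical number of hash buckets */

struct sketch_lkb {
    uint32_t id;
    struct sketch_lkb *next;
};

struct sketch_bucket {
    int lock_taken;              /* counts how often the bucket lock was taken */
    struct sketch_lkb *head;
};

struct sketch_lockspace {
    struct sketch_bucket tbl[SKETCH_TBL_SIZE];
};

/* Look up a lock by ID; the low 16 bits choose the bucket (assumed layout). */
static struct sketch_lkb *
sketch_find_lock_by_id(struct sketch_lockspace *ls, uint32_t lkid)
{
    uint32_t bucket = lkid & 0xFFFFu;
    struct sketch_lkb *lkb;

    /* Validate first: an out-of-range bucket must never reach the lock. */
    if (bucket >= SKETCH_TBL_SIZE)
        return NULL;

    ls->tbl[bucket].lock_taken++;        /* stands in for read_lock() */
    for (lkb = ls->tbl[bucket].head; lkb != NULL; lkb = lkb->next)
        if (lkb->id == lkid)
            break;
    /* read_unlock() would go here */
    return lkb;
}
```

With the check first, a bogus ID simply yields NULL, which the caller
can turn into an error return instead of a panic.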

Checking in lkb.c;
/cvs/cluster/cluster/dlm-kernel/src/lkb.c,v  <--  lkb.c
new revision: 1.4; previous revision: 1.3
done
Checking in lkb.c;
/cvs/cluster/cluster/dlm-kernel/src/lkb.c,v  <--  lkb.c
new revision: 1.3.2.1; previous revision: 1.3
done

Comment 2 Dean Jansa 2005-11-30 18:55:19 UTC

Have not seen this since the fix.