Bug 130665 - DLM: Assertion failed on line 973 of file /usr/src/cluster/dlm-kernel/src/lockqueue.c
Keywords:
Status: CLOSED WORKSFORME
Alias: None
Product: Red Hat Cluster Suite
Classification: Retired
Component: dlm
Version: 4
Hardware: i686
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: David Teigland
QA Contact: GFS Bugs
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2004-08-23 16:12 UTC by Dean Jansa
Modified: 2009-04-16 20:29 UTC

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2005-01-05 15:40:58 UTC
Embargoed:



Description Dean Jansa 2004-08-23 16:12:44 UTC
Running:  
 
iogen -o -m random -s write,writev,readv -t 1b -T1000b 10000b:tfile1 | doio -avk
 
on a 6 node cluster produced: 
 
DLM:  Assertion failed on line 973 of file /usr/src/cluster/dlm-kernel/src/lockqueue.c
DLM:  assertion:  "lkb" 
DLM:  time = 12776900 
dlm: reply 
rh_cmd 5 
rh_lkid 25403dc 
lockstate 3989 
nodeid 64 
status 0 
lkid f509be98 
nodeid 5 
 
------------[ cut here ]------------ 
kernel BUG at /usr/src/cluster/dlm-kernel/src/lockqueue.c:973! 
invalid operand: 0000 [#1] 
SMP 
Modules linked in: gnbd lock_gulm lock_nolock lock_dlm dlm cman gfs lock_harness ipv6 parport_pc lp parport autofs4 sunrpc e1000 floppy sg microcode dm_mod uhci_hcd ehci_hcd button battery asus_acpi ac ext3 jbd qla2300 qla2xxx scsi_transport_fc sd_mod scsi_mod
CPU:    0 
EIP:    0060:[<f8a8a685>]    Not tainted 
EFLAGS: 00010246   (2.6.7) 
EIP is at process_cluster_request+0x7e5/0xd30 [dlm] 
eax: 00000001   ebx: 00000000   ecx: f7473e4c   edx: 0000872a 
esi: c237f800   edi: f7473f04   ebp: 00000000   esp: f7473e48 
ds: 007b   es: 007b   ss: 0068 
Process dlm_recvd (pid: 3684, threadinfo=f7472000 task=f62f5160) 
Stack: f8a97e00 00000005 f8a98f8c f8a97dfc 00c2f5c4 f63b2c90 0000003c f63b2b6c
       00000000 00000000 f5cc1cc4 00000005 c035afc0 f7473fa4 f7473fa4 c02d0128
       00000fc4 00000040 00004000 f7473e98 00000000 c035ca80 00001000 f625b500
Call Trace: 
 [<c02d0128>] inet_recvmsg+0x48/0x70 
 [<c0285f1c>] sock_recvmsg+0xbc/0xc0 
 [<f8a8e8c3>] midcomms_process_incoming_buffer+0x173/0x250 [dlm] 
 [<c0285f1c>] sock_recvmsg+0xbc/0xc0 
 [<f8a8c202>] receive_from_sock+0x142/0x320 [dlm] 
 [<f8a8d239>] process_sockets+0xa9/0xd0 [dlm] 
 [<f8a8d52d>] dlm_recvd+0x9d/0xf0 [dlm] 
 [<f8a8d490>] dlm_recvd+0x0/0xf0 [dlm] 
 [<c01042b5>] kernel_thread_helper+0x5/0x10 
 
Code: 0f 0b cd 03 8c 8f a9 f8 e9 e8 fa ff ff 8b 57 0c 89 f0 e8 14 
 
 
Version-Release number of selected component (if applicable): 
DLM <CVS> (built Aug 20 2004 13:05:57) installed  
 
How reproducible: 
Didn't try 
 
Steps to Reproduce: 
1. Run iogen -o -m random -s write,writev,readv -t 1b -T1000b 10000b:tfile1 | doio -avk on all 6 nodes of a six-node cluster.
2. Wait several hours.
     
 
Additional info: 
 
This was hit while attempting to verify 126757.
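
For triage, the debug dump in the description can be pulled into structured form so that reports from different nodes are easier to compare. The following is a minimal sketch, assuming the dump layout shown in this report; the function name parse_dlm_dump is hypothetical and not part of any DLM tooling. Field values are kept as raw strings because the report does not say whether they are decimal or hexadecimal, and fields are kept as an ordered list because nodeid appears twice.

```python
import re

# Matches the assertion header, even when the path wraps onto the next line.
ASSERT_RE = re.compile(r"Assertion failed on line (\d+) of file\s+(\S+)")
# Matches "name value" dump lines such as "rh_lkid 25403dc".
FIELD_RE = re.compile(r"^(\w+) ([0-9a-f]+)$")


def parse_dlm_dump(text):
    """Return ((line, path) or None, [(name, value), ...]) from a pasted dump."""
    match = ASSERT_RE.search(text)
    location = (int(match.group(1)), match.group(2)) if match else None
    fields = []
    for raw in text.splitlines():
        m = FIELD_RE.match(raw.strip())
        if m:
            # Keep duplicates (nodeid occurs twice) and keep values as strings.
            fields.append((m.group(1), m.group(2)))
    return location, fields


# Sample copied from the description of this bug.
SAMPLE = """\
DLM:  Assertion failed on line 973 of file
/usr/src/cluster/dlm-kernel/src/lockqueue.c
DLM:  assertion:  "lkb"
DLM:  time = 12776900
dlm: reply
rh_cmd 5
rh_lkid 25403dc
lockstate 3989
nodeid 64
status 0
lkid f509be98
nodeid 5
"""

if __name__ == "__main__":
    location, fields = parse_dlm_dump(SAMPLE)
    print(location)
    for name, value in fields:
        print(name, value)
```

Lines that do not fit the "name value" shape (the DLM: prefixed lines and the "dlm: reply" marker) are skipped rather than guessed at.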

Comment 1 David Teigland 2004-08-25 06:25:38 UTC
I ran this for about 24 hours on 8 nodes without a problem.  I have
4 SMP machines I can also try.

Comment 2 Dean Jansa 2004-08-25 13:40:42 UTC
I'm sorry, I should have noted that this was on SMP. 
 

Comment 3 David Teigland 2004-09-15 14:19:12 UTC
I've been running this for 6 hours on my 4 SMP machines with no
problem.  I'll let it continue running.

Comment 4 David Teigland 2004-09-16 08:56:09 UTC
Over 24 hours on 4 SMP machines and nothing.  I'll let this
one sit until someone can reproduce it.

Comment 5 Kiersten (Kerri) Anderson 2004-11-04 15:17:26 UTC
Updated with the proper version and component name.

Comment 6 Dean Jansa 2005-01-05 15:40:58 UTC
Ran for 19 hours, and did not hit this again. 
 

