Bug 128679 - DLM: Assertion failed on line 328 of file /usr/src/cluster/dlm-kernel/src/lockqueue.c
Status: CLOSED CURRENTRELEASE
Product: Red Hat Cluster Suite
Classification: Red Hat
Component: gfs
Version: 4
Hardware: i686 Linux
Priority: medium
Severity: medium
Assigned To: David Teigland
QA Contact: Cluster QE
Duplicates: 129151
Reported: 2004-07-27 17:20 EDT by Dean Jansa
Modified: 2010-01-11 21:55 EST

Doc Type: Bug Fix
Last Closed: 2004-08-19 16:34:38 EDT


Description Dean Jansa 2004-07-27 17:20:18 EDT
 
Description of problem: 
 
Random overlapping I/O from multiple nodes causes the assertion. 
 
On boron: 
 
 ./d_iogen -P boron -N gfs -R 
../var/share/resource_files/tank-cluster.xml -m random -s 
read,write,readv,writev -t 1b -T 100b -F 1000b:f1,1000b:f2,1000b:f3 
 
On tank-01 -- tank-06: 
 
d_doio -P boron -N gfs -a -k -m 100 -n 10 -r 100 
 
Tank-01 hit the assertion: 
Jul 27 16:12:55 tank-01 kernel: ------------[ cut here ]------------ 
Jul 27 16:12:55 tank-01 kernel: kernel BUG at 
/usr/src/cluster/dlm-kernel/src/lockqueue.c:328! 
Jul 27 16:12:55 tank-01 kernel: invalid operand: 0000 [#1] 
Jul 27 16:12:55 tank-01 kernel: Modules linked in: gnbd lock_gulm 
lock_nolock lock_dlm dlm cman gfs lock_harness ipv6 parport_pc lp 
parport autofs4 sunrpc e1000 floppy sg microcode uhci_hcd ehci_hcd 
button battery asus_acpi ac ext3 jbd dm_mod qla2300 qla2xxx 
scsi_transport_fc sd_mod scsi_mod 
Jul 27 16:12:55 tank-01 kernel: CPU:    0 
Jul 27 16:12:55 tank-01 kernel: EIP:    0060:[<f8a76b75>]    Not 
tainted 
Jul 27 16:12:55 tank-01 kernel: EFLAGS: 00010286   (2.6.7) 
Jul 27 16:12:55 tank-01 kernel: EIP is at 
process_lockqueue_reply+0x565/0x690 [dlm] 
Jul 27 16:12:55 tank-01 kernel: eax: 00000001   ebx: 00000001   ecx: 
00000000   edx: f5805e0c 
Jul 27 16:12:55 tank-01 kernel: esi: f531ce40   edi: f5805ef4   ebp: 
f078a2a0   esp: f5805e08 
Jul 27 16:12:55 tank-01 kernel: ds: 007b   es: 007b   ss: 0068 
Jul 27 16:12:55 tank-01 kernel: Process dlm_recvd (pid: 4035, 
threadinfo=f5804000 task=f5809330) 
Jul 27 16:12:55 tank-01 kernel: Stack: f8a84810 00000148 f8a8595c 
f8a85b1c 00964e85 f7feb338 00000003 00000000 
Jul 27 16:12:55 tank-01 kernel:        f531ce40 f7feb338 f5805ef4 
00000000 f8a77c48 f5804000 00000001 00000000 
Jul 27 16:12:55 tank-01 kernel:        00000078 f583eae0 00000078 
f583e9cc 00000000 00000000 f078aa24 00000003 
Jul 27 16:12:55 tank-01 kernel: Call Trace: 
Jul 27 16:12:55 tank-01 kernel:  [<f8a77c48>] 
process_cluster_request+0x6e8/0xd30 [dlm] 
Jul 27 16:12:55 tank-01 kernel:  [<c02b0e98>] inet_recvmsg+0x48/0x70 
Jul 27 16:12:55 tank-01 kernel:  [<c026bd0c>] sock_recvmsg+0xbc/0xc0 
Jul 27 16:12:55 tank-01 kernel:  [<f8a7b8d3>] 
midcomms_process_incoming_buffer+0x173/0x250 [dlm] 
Jul 27 16:12:55 tank-01 kernel:  [<c011e809>] __do_softirq+0x79/0x80 
Jul 27 16:12:55 tank-01 kernel:  [<c026bd0c>] sock_recvmsg+0xbc/0xc0 
Jul 27 16:12:55 tank-01 kernel:  [<f8a795f1>] 
receive_from_sock+0x141/0x300 [dlm] 
Jul 27 16:12:55 tank-01 kernel:  [<c011e809>] __do_softirq+0x79/0x80 
Jul 27 16:12:55 tank-01 kernel:  [<c0117e67>] 
recalc_task_prio+0x97/0x190 
Jul 27 16:12:55 tank-01 kernel:  [<f8a7a49b>] 
process_sockets+0x7b/0xa0 [dlm] 
Jul 27 16:12:55 tank-01 kernel:  [<f8a7a70e>] dlm_recvd+0x9e/0xf0 
[dlm] 
Jul 27 16:12:55 tank-01 kernel:  [<f8a7a670>] dlm_recvd+0x0/0xf0 
[dlm] 
Jul 27 16:12:55 tank-01 kernel:  [<c010429d>] 
kernel_thread_helper+0x5/0x18 
Jul 27 16:12:55 tank-01 kernel: 
Jul 27 16:12:55 tank-01 kernel: Code: 0f 0b 48 01 5c 59 a8 f8 e9 95 
fb ff ff e8 f9 f1 ff ff e8 c4 
 
 
Lock_Harness <CVS> (built Jul 27 2004 11:42:58) installed 
GFS <CVS> (built Jul 27 2004 11:43:25) installed 
CMAN <CVS> (built Jul 27 2004 11:42:37) installed 
DLM <CVS> (built Jul 27 2004 11:42:53) installed 
Lock_DLM (built Jul 27 2004 11:43:01) installed 
 
 
 
How reproducible: 
Didn't try 
 
Comment 1 Dean Jansa 2004-07-28 10:59:50 EDT
This is reproducible: 
 
Tank-03: 
 
------------[ cut here ]------------ 
kernel BUG at /usr/src/cluster/dlm-kernel/src/lockqueue.c:328! 
invalid operand: 0000 [#1] 
Modules linked in: gnbd lock_gulm lock_nolock lock_dlm dlm cman gfs 
lock_harness ipv6 parport_pc lp parport autofs4 sunrpc e1000 floppy 
sg microcode uhci_hcd ehci_hcd button battery asus_acpi ac ext3 jbd 
dm_mod qla2300 qla2xxx scsi_transport_fc sd_mod scsi_mod 
CPU:    0 
EIP:    0060:[<f8a76b75>]    Not tainted 
EFLAGS: 00010286   (2.6.7) 
EIP is at process_lockqueue_reply+0x565/0x690 [dlm] 
eax: 00000001   ebx: 00000001   ecx: 00000000   edx: f5b65e0c 
esi: f548e1e0   edi: f5b65ef4   ebp: f5492d9c   esp: f5b65e08 
ds: 007b   es: 007b   ss: 0068 
Process dlm_recvd (pid: 3593, threadinfo=f5b64000 task=f5b693b0) 
Stack: f8a84810 00000148 f8a8595c f8a85b1c 00054eaf f7d62738 
00000002 00000000 
       f548e1e0 f7d62738 f5b65ef4 00000000 f8a77c48 f5b64000 
00000001 00000000 
       0000003c f5b4cee0 0000003c f5b4cdcc 00000000 00000000 
f54924f0 00000002 
Call Trace: 
 [<f8a77c48>] process_cluster_request+0x6e8/0xd30 [dlm] 
 [<c02b0e98>] inet_recvmsg+0x48/0x70 
 [<c026bd0c>] sock_recvmsg+0xbc/0xc0 
 [<f8a7b8d3>] midcomms_process_incoming_buffer+0x173/0x250 [dlm] 
 [<c0117fba>] activate_task+0x5a/0x70 
 [<c026bd0c>] sock_recvmsg+0xbc/0xc0 
 [<f8a795f1>] receive_from_sock+0x141/0x300 [dlm] 
 [<f8a7a49b>] process_sockets+0x7b/0xa0 [dlm] 
 [<f8a7a70e>] dlm_recvd+0x9e/0xf0 [dlm] 
 [<f8a7a670>] dlm_recvd+0x0/0xf0 [dlm] 
 [<c010429d>] kernel_thread_helper+0x5/0x18 
 
Code: 0f 0b 48 01 5c 59 a8 f8 e9 95 fb ff ff e8 f9 f1 ff ff e8 c4 
 
 
tank-06: 
------------[ cut here ]------------ 
kernel BUG at /usr/src/cluster/dlm-kernel/src/lockqueue.c:328! 
invalid operand: 0000 [#1] 
Modules linked in: gnbd lock_gulm lock_nolock lock_dlm dlm cman gfs 
lock_harness ipv6 parport_pc lp parport autofs4 sunrpc e1000 floppy 
sg microcode uhci_hcd ehci_hcd button battery asus_acpi ac ext3 jbd 
dm_mod qla2300 qla2xxx scsi_transport_fc sd_mod scsi_mod 
CPU:    0 
EIP:    0060:[<e02fcb75>]    Not tainted 
EFLAGS: 00010286   (2.6.7) 
EIP is at process_lockqueue_reply+0x565/0x690 [dlm] 
eax: 00000001   ebx: 00000001   ecx: 00000000   edx: dafd1e0c 
esi: d898fa20   edi: dafd1ef4   ebp: d8993a24   esp: dafd1e08 
ds: 007b   es: 007b   ss: 0068 
Process dlm_recvd (pid: 3606, threadinfo=dafd0000 task=dab21930) 
Stack: e030a810 00000148 e030b95c e030bb1c 0005be75 dae71638 
00000002 00000000 
       d898fa20 dae71638 dafd1ef4 00000000 e02fdc48 dafd0000 
00000001 00000000 
       0000003c da9f26e0 0000003c da9f25cc 00000000 00000000 
d8993ab8 00000002 
Call Trace: 
 [<e02fdc48>] process_cluster_request+0x6e8/0xd30 [dlm] 
 [<c02b0e98>] inet_recvmsg+0x48/0x70 
 [<c026bd0c>] sock_recvmsg+0xbc/0xc0 
 [<e03018d3>] midcomms_process_incoming_buffer+0x173/0x250 [dlm] 
 [<c026bd0c>] sock_recvmsg+0xbc/0xc0 
 [<e02ff5f1>] receive_from_sock+0x141/0x300 [dlm] 
 [<c0117e67>] recalc_task_prio+0x97/0x190 
 [<e030049b>] process_sockets+0x7b/0xa0 [dlm] 
 [<e030070e>] dlm_recvd+0x9e/0xf0 [dlm] 
 [<e0300670>] dlm_recvd+0x0/0xf0 [dlm] 
 [<c010429d>] kernel_thread_helper+0x5/0x18 
 
Code: 0f 0b 48 01 5c b9 30 e0 e9 95 fb ff ff e8 f9 f1 ff ff e8 c4 
 
Comment 2 Dean Jansa 2004-07-28 14:02:17 EDT
FWIW -  You can trim down the syscall list given to d_iogen to just 
"read" and you will still hit this assertion. 
 
./d_iogen -P boron -N gfs -R  
../var/share/resource_files/tank-cluster.xml -m random -s read -t 1b 
-T 100b -F 1000b:f1,1000b:f2,1000b:f3  
Comment 3 Dean Jansa 2004-07-28 15:31:33 EDT
One more tidbit - 
 
If I run single-process I/O from multiple nodes to a single file I 
don't hit this (or haven't hit it yet).  But as soon as one of the 
nodes has multiple processes locking the file (along with all the 
other nodes), you trip the assert. 
 
For example, I can trip this assert with accordion (a little easier 
to set up than d_iogen/d_doio).  Like this: 
 
1:  Start a single process on all the nodes: 
   accordion -L fcntl -s 1024000 -e 4097 -t -m 100 acc1 acc2 acc3 acc4 
 
2:  With the above running, start a multi-process run on any node (it 
does not need to be on all of them): 
 
accordion -p 10 -L fcntl -s 1024000 -e 4097 -t -m 100 acc1 acc2 acc3 acc4 
 
You will trip the assert. 
 
And it looks like only fcntl does this; flock seems OK (but I only 
tried a simple case). 
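 
For reference, a minimal sketch of the access pattern that second 
accordion invocation generates (hypothetical code, not the actual 
test source; the file name, range size, and iteration counts are 
illustrative) would be ten processes hammering contended fcntl range 
locks on one file.  Run against a file on a shared GFS mount, this is 
the multiple-processes-per-node pattern described above: 
 
/* Sketch only: N processes on one node taking and dropping POSIX
 * (fcntl) write locks on a shared set of ranges of one file. */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

static void hammer(const char *path, int iters)
{
    int fd = open(path, O_RDWR | O_CREAT, 0644);
    if (fd < 0) { perror("open"); exit(1); }
    struct flock fl = { .l_whence = SEEK_SET, .l_len = 4097 };
    for (int i = 0; i < iters; i++) {
        fl.l_start = (i % 100) * 4097;   /* 100 shared ranges; all
                                            processes contend on them */
        fl.l_type = F_WRLCK;
        if (fcntl(fd, F_SETLKW, &fl) < 0) { perror("F_SETLKW"); exit(1); }
        fl.l_type = F_UNLCK;             /* release immediately */
        fcntl(fd, F_SETLKW, &fl);
    }
    close(fd);
}

int main(int argc, char **argv)
{
    const char *path = argc > 1 ? argv[1] : "acc1";
    for (int p = 0; p < 10; p++)         /* like accordion -p 10 */
        if (fork() == 0) { hammer(path, 1000); _exit(0); }
    while (wait(NULL) > 0)               /* reap all children */
        ;
    return 0;
}
 
Per the observation above, swapping the fcntl calls for flock() did 
not reproduce the problem, which points at the plock path rather than 
GFS's own lock traffic. 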
 
Comment 4 David Teigland 2004-07-29 05:12:11 EDT
This is a situation I've known about and have had on my list to work
on for a while.  GFS's locking behavior avoids this problem, but
plocks are handled in lock_dlm, which doesn't follow the same rules,
making this issue immediately relevant again.  I'm working on a fix
which will also solve another related problem that doesn't seem to
have arisen yet.

The crux of the problem is many locks being requested at once on a
single resource.  The initial lookup step is not properly serialized
when all the requests run into the first stage together.
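 
A toy model of that unserialized first stage may make the race easier
to see.  This is illustrative only, not the dlm source; the names
(res_nodeid, installs) and the sleep standing in for the lookup round
trip are assumptions.  Several requesters all observe "master unknown"
before any lookup reply returns, so each reply handler then tries to
resolve the master:
 
/* Sketch of the race: many concurrent requests for one resource all
 * miss in the lookup stage, and every one of them then "resolves"
 * the master -- where exactly one should. */
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static volatile int res_nodeid = -1;  /* -1 = master unknown */
static int installs = 0;              /* how many replies resolved it */

static void *request(void *arg)
{
    /* Stage 1: the unserialized check.  Every thread can get here
     * and see "unknown" before any reply has come back. */
    if (res_nodeid == -1) {
        usleep(1000);                 /* stand-in for the lookup round trip */
        /* Stage 2: the reply handler installs the master.  Without
         * serialization, every thread that passed the check does so. */
        res_nodeid = 0;
        __sync_fetch_and_add(&installs, 1);
    }
    return NULL;
}

int main(void)
{
    pthread_t t[10];
    for (int i = 0; i < 10; i++)
        pthread_create(&t[i], NULL, request, NULL);
    for (int i = 0; i < 10; i++)
        pthread_join(t[i], NULL);
    printf("master resolved %d times (want exactly 1)\n", installs);
    return 0;
}
 
Built with gcc -pthread, the count is almost always greater than 1;
duplicate resolution of this kind is plausibly the inconsistent state
the lockqueue.c assertions catch.
 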
Comment 5 David Teigland 2004-08-03 05:11:43 EDT
This accordion test runs fine for me with today's checkin.
Comment 6 Corey Marthaler 2004-08-04 10:52:27 EDT
*** Bug 129151 has been marked as a duplicate of this bug. ***
Comment 7 Dean Jansa 2004-08-04 15:11:24 EDT
I was able to hit this again today after a fresh checkout: 
 
GFS <CVS> (built Aug  4 2004 10:53:42) installed 
CMAN <CVS> (built Aug  4 2004 10:52:55) installed 
DLM <CVS> (built Aug  4 2004 10:53:10) installed 
Lock_DLM (built Aug  4 2004 10:53:20) installed 
Lock_Nolock <CVS> (built Aug  4 2004 10:53:18) installed 
Gulm v6.0.0 (built Aug  4 2004 10:53:28) installed 
 
accordion alone didn't trip it (or I didn't wait long enough for it 
to hit).  I was able to hit it by running: 
 
1) iogen -f buffered -m sequential -s read,write,readv,writev -t 1b 
-T 4b 4b:/mnt/gfs0/rwbufsmall | doio -n 2 -avk & 
2) accordion -p 10 -L fcntl -s 1024000 -e 4097 -t -m 100 acc1 acc2 & 
3) iogen -S 4728 -o -m random -s read,write,readv,writev -t 1b 
-T1000b 10000b:tfile1 10000b:tfile2 10000b:tfile3 | doio -avk -m 
1000 & 
4)  d_doio -P boron -N gfs -a -k -m 100 -n 10 -r 100 & 
  (With ./d_iogen -P boron -N gfs -R  
../var/share/resource_files/tank-cluster.xml -m random -t 1b -T 100b 
-F 1000b:f1,1000b:f2,1000b:f3 running on boron) 
 
(I.e., run the test cases from Corey's duped bug and these together.) 
--------------- 
 
DLM:  Assertion failed on line 342 of file 
/usr/src/cluster/dlm-kernel/src/lockqueue.c 
DLM:  assertion:  "rsb->res_nodeid == 0" 
DLM:  time = 368073 
dlm: lkb 
id 100054 
remid 0 
flags 0 
status 0 
rqmode 3 
grmode -1 
nodeid 4294967295 
lqstate 0 
lqflags 0 
dlm: rsb 
name "       7         817e229" 
nodeid 1 
flags 0 
ref 4 
dlm: reply 
rh_cmd 5 
rh_lkid 100054 
lockstate 0 
nodeid 3 
status 0 
lkid f779de88 
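 
One detail worth noting in the dump above: the lkb's "nodeid 
4294967295" is almost certainly -1 printed as unsigned, presumably a 
"master unknown" sentinel, even though the rsb already reports nodeid 
1.  A two-line C check confirms the rendering: 
 
#include <stdio.h>
int main(void)
{
    int nodeid = -1;                   /* "master unknown" sentinel */
    printf("%u\n", (unsigned)nodeid);  /* prints 4294967295 */
    return 0;
}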
 
 
------------[ cut here ]------------ 
kernel BUG at /usr/src/cluster/dlm-kernel/src/lockqueue.c:342! 
invalid operand: 0000 [#1] 
Modules linked in: gnbd lock_gulm lock_nolock lock_dlm dlm cman gfs 
lock_harness ipv6 parport_pc lp parport autofs4 sunrpc e1000 floppy 
sg microcode dm_mod uhci_hcd ehci_hcd button battery asus_acpi ac 
ext3 jbd qla2300 qla2xxx scsi_transport_fc sd_mod scsi_mod 
CPU:    0 
EIP:    0060:[<f8a76d70>]    Not tainted 
EFLAGS: 00010286   (2.6.7) 
EIP is at process_lockqueue_reply+0x610/0x730 [dlm] 
eax: 00000001   ebx: 00000001   ecx: 00000000   edx: f7751e0c 
esi: f51a1cb4   edi: f509bbe0   ebp: f7751ef4   esp: f7751e08 
ds: 007b   es: 007b   ss: 0068 
Process dlm_recvd (pid: 3694, threadinfo=f7750000 task=f77ace30) 
Stack: f8a84a34 00000156 f8a85bc8 f8a84ae7 00059dc9 f77a7538 
00000006 00000078 
       f51a1cb4 f77a7538 f7751ef4 00000078 f8a77dc5 f7750000 
00000001 00000000 
       000000b4 f74feee0 000000b4 f74fedcc 00000000 00000000 
f509bec4 00000006 
Call Trace: 
 [<f8a77dc5>] process_cluster_request+0x6e5/0xd30 [dlm] 
 [<c02b0e98>] inet_recvmsg+0x48/0x70 
 [<c026bd0c>] sock_recvmsg+0xbc/0xc0 
 [<f8a7ba53>] midcomms_process_incoming_buffer+0x173/0x250 [dlm] 
 [<c011e809>] __do_softirq+0x79/0x80 
 [<c026bd0c>] sock_recvmsg+0xbc/0xc0 
 [<f8a79771>] receive_from_sock+0x141/0x300 [dlm] 
 [<c0117e67>] recalc_task_prio+0x97/0x190 
 [<f8a7a61b>] process_sockets+0x7b/0xa0 [dlm] 
 [<f8a7a88e>] dlm_recvd+0x9e/0xf0 [dlm] 
 [<f8a7a7f0>] dlm_recvd+0x0/0xf0 [dlm] 
 [<c010429d>] kernel_thread_helper+0x5/0x18 
 
Code: 0f 0b 56 01 c8 5b a8 f8 e9 ea fa ff ff 8d 76 00 0f ba 6f 24 
 
 
Comment 8 Corey Marthaler 2004-08-04 15:15:52 EDT
I also reproduced the above scenario on my cluster after a fresh build. 
Comment 9 Dean Jansa 2004-08-04 17:33:30 EDT
I just hit this running only: 
(on boron) d_iogen -P boron -N gfs -R 
../var/share/resource_files/tank-cluster.xml -m random -s 
read,write,readv,writev -t 1 -T 1000b -F 
10000b:tfile1,10000b:tfile2,1000b:file3 
 
(on all 6 tank nodes) d_doio -P boron -N gfs -a -k -m 1000 -n 10 -r 
1000 
 
So I guess you needn't run all of the above to hit it after all. 
 
DLM:  Assertion failed on line 342 of file 
/usr/src/cluster/dlm-kernel/src/lockc 
DLM:  assertion:  "rsb->res_nodeid == 0" 
DLM:  time = 725820 
dlm: lkb 
id c0018 
remid 0 
flags 0 
status 0 
rqmode 3 
grmode -1 
nodeid 4294967295 
lqstate 0 
lqflags 0 
dlm: rsb 
name "       7              7d" 
nodeid 1 
flags 0 
ref 2 
dlm: reply 
rh_cmd 5 
rh_lkid c0018 
lockstate 0 
nodeid 5 
status 0 
lkid f75b3e88 
 
------------[ cut here ]------------ 
kernel BUG at /usr/src/cluster/dlm-kernel/src/lockqueue.c:342! 
invalid operand: 0000 [#1] 
Modules linked in: lock_dlm dlm cman gfs lock_harness ipv6 
parport_pc lp parpord 
CPU:    0 
EIP:    0060:[<e02fcd70>]    Not tainted 
EFLAGS: 00010286   (2.6.7) 
EIP is at process_lockqueue_reply+0x610/0x730 [dlm] 
eax: 00000001   ebx: 00000001   ecx: 00000000   edx: dabcfe0c 
esi: d830b918   edi: d8fea8fc   ebp: dabcfef4   esp: dabcfe08 
ds: 007b   es: 007b   ss: 0068 
Process dlm_recvd (pid: 2267, threadinfo=dabce000 task=db633330) 
Stack: e030aa34 00000156 e030bbc8 e030aae7 000b133c df781a38 
00000006 00000000 
       d830b918 df781a38 dabcfef4 00000000 e02fddc5 dabce000 
00000001 00000000 
       0000003c db0d6ae0 0000003c db0d69cc 00000000 00000000 
d8fead08 00000006 
Call Trace: 
 [<e02fddc5>] process_cluster_request+0x6e5/0xd30 [dlm] 
 [<c02b0e98>] inet_recvmsg+0x48/0x70 
 [<c026bd0c>] sock_recvmsg+0xbc/0xc0 
 [<e0301a53>] midcomms_process_incoming_buffer+0x173/0x250 [dlm] 
 [<c026bd0c>] sock_recvmsg+0xbc/0xc0 
 [<e02ff771>] receive_from_sock+0x141/0x300 [dlm] 
 [<c0117e67>] recalc_task_prio+0x97/0x190 
 [<e030061b>] process_sockets+0x7b/0xa0 [dlm] 
 [<e030088e>] dlm_recvd+0x9e/0xf0 [dlm] 
 [<e03007f0>] dlm_recvd+0x0/0xf0 [dlm] 
 [<c010429d>] kernel_thread_helper+0x5/0x18 
 
Code: 0f 0b 56 01 c8 bb 30 e0 e9 ea fa ff ff 8d 76 00 0f ba 6f 24 
 
Comment 10 David Teigland 2004-08-04 23:48:38 EDT
Yes, the "partial fix" I checked in only fixed this for the accordion
test.  I believe I now have a solution to all these, but I want to
verify a bit more before checking in.  There are a couple things going
on here that are very closely related but require different
solutions.

In fact, with the current code in cvs, a simple writeread test should
fail -- these problems are also related to the data compare errors that
plagued us with accordion and writeread a couple weeks ago.
Comment 11 David Teigland 2004-08-06 05:03:33 EDT
I expect these tests should work ok (or get much farther) after the
changes I made yesterday.  These tests make heavy use of lots of 
plocks; the dlm wasn't prepared to deal with that much concurrent
locking on one resource (it's still not terribly robust, which we'll
be working on).
Comment 12 Dean Jansa 2004-08-06 14:28:24 EDT
The tests ran for ~2 hours until I hit: 
 
 DLM <CVS> (built Aug  6 2004 10:01:20) 
 
(on boron) d_iogen -P boron -N gfs -R  
../var/share/resource_files/tank-cluster.xml -m random -s  
read,write,readv,writev -t 1 -T 1000b -F  
10000b:tfile1,10000b:tfile2,1000b:file3  
  
(on all 6 tank nodes) d_doio -P boron -N gfs -a -k -m 1000 -n 10 -r  
1000  
 
(The second oops is probably just fallout from the first, but I'll 
add it anyway.) 
 
Unable to handle kernel NULL pointer dereference at virtual address 
00000004 
 printing eip: 
f8ba8f22 
*pde = 00000000 
Oops: 0002 [#1] 
Modules linked in: gnbd lock_gulm lock_nolock lock_dlm dlm cman gfs 
lock_harness ipv6 parport_pc lp parport autofs4 sunrpc e1000 floppy 
sg microcode uhci_hcd ehci_hcd button battery asus_acpi ac ext3 jbd 
dm_mod qla2300 qla2xxx scsi_transport_fc sd_mod scsi_mod 
CPU:    0 
EIP:    0060:[<f8ba8f22>]    Not tainted 
EFLAGS: 00010283   (2.6.7) 
EIP is at dlm_async+0x222/0x2e0 [lock_dlm] 
eax: 00000000   ebx: 00000000   ecx: f5278fa8   edx: 00000000 
esi: f7fc3338   edi: f5ca8000   ebp: f5278f58   esp: f5ca9f84 
ds: 007b   es: 007b   ss: 0068 
Process lock_dlm (pid: 3973, threadinfo=f5ca8000 task=f5cb4eb0) 
Stack: f8ba9d31 f7fc3368 f7fc3390 00001000 00000000 f5c7b7c8 
f5cb4eb0 00000000 
       f5cb4eb0 c0118850 00000000 00000000 f5cb4eb0 f65d8280 
f5ca8000 00000000 
       f5cb4eb0 c0118850 00100100 00200200 00000000 00000000 
00000000 f8ba8d00 
Call Trace: 
 [<c0118850>] default_wake_function+0x0/0x10 
 [<c0118850>] default_wake_function+0x0/0x10 
 [<f8ba8d00>] dlm_async+0x0/0x2e0 [lock_dlm] 
 [<c010429d>] kernel_thread_helper+0x5/0x18 
 
Code: 89 50 04 89 02 c7 41 04 00 02 20 00 c7 01 00 01 10 00 0f ba 
 <1>Unable to handle kernel NULL pointer dereference at virtual 
address 00000004 
 printing eip: 
f8ba8f22 
*pde = 00000000 
Oops: 0002 [#2] 
Modules linked in: gnbd lock_gulm lock_nolock lock_dlm dlm cman gfs 
lock_harness ipv6 parport_pc lp parport autofs4 sunrpc e1000 floppy 
sg microcode uhci_hcd ehci_hcd button battery asus_acpi ac ext3 jbd 
dm_mod qla2300 qla2xxx scsi_transport_fc sd_mod scsi_mod 
CPU:    0 
EIP:    0060:[<f8ba8f22>]    Not tainted 
EFLAGS: 00010283   (2.6.7) 
EIP is at dlm_async+0x222/0x2e0 [lock_dlm] 
eax: 00000000   ebx: 00000000   ecx: f5278fa8   edx: 00000000 
esi: f7fc3338   edi: f5caa000   ebp: f5278f58   esp: f5cabf84 
ds: 007b   es: 007b   ss: 0068 
Process lock_dlm (pid: 3972, threadinfo=f5caa000 task=f5cb4930) 
Stack: f8ba9d31 f7fc3368 f7fc3390 0012451c 00000000 f5da5d68 
00000000 00000000 
       f5cb4930 c0118850 00000000 00000000 f5cb4930 f5c95d40 
c0105c12 00000000 
       f5cb4930 c0118850 00100100 00200200 00000000 00000000 
00000000 f8ba8d00 
Call Trace: 
 [<c0118850>] default_wake_function+0x0/0x10 
 [<c0105c12>] ret_from_fork+0x6/0x14 
 [<c0118850>] default_wake_function+0x0/0x10 
 [<f8ba8d00>] dlm_async+0x0/0x2e0 [lock_dlm] 
 [<c010429d>] kernel_thread_helper+0x5/0x18 
 
Code: 89 50 04 89 02 c7 41 04 00 02 20 00 c7 01 00 01 10 00 0f ba 
 
Comment 13 David Teigland 2004-08-07 00:11:12 EDT
OK, I get the same thing.  This is a new bug, I believe.
Comment 14 David Teigland 2004-08-11 00:20:29 EDT
Fixed in cvs.  I've run d_iogen/d_doio on 7 nodes for many hours.
Comment 15 Dean Jansa 2004-08-19 16:34:38 EDT
Ran d_iogen/d_doio with the new code for 5 hours; no issues. 
 
Comment 16 Kiersten (Kerri) Anderson 2004-11-16 14:07:30 EST
Updating version to the right level in the defects.  Sorry for the storm.
