Bug 126537 - File corruption with IO from multiple nodes
Summary: File corruption with IO from multiple nodes
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Cluster Suite
Classification: Retired
Component: gfs
Version: 3
Hardware: i386
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Ken Preslan
QA Contact: Derek Anderson
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2004-06-22 22:01 UTC by Dean Jansa
Modified: 2010-01-12 02:52 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2004-07-22 14:19:38 UTC
Embargoed:



Description Dean Jansa 2004-06-22 22:01:11 UTC
Description of problem: 
While running accordion I hit file corruption, using both flocks and 
fcntl locks: 
 
With fcntl locks: 
 
cmd run on all 6 nodes in my cluster (in the gfs fs): 
 
accordion -s 409600 -e 4096 -m 100 accfile1 accfile2 acccfile3 
 
accordion starting: 
Iterations:      0 
Run time:        0s 
Lock type:       fcntl 
File size:       409600 
Extend size:     4096 
Random truncate: No 
Use lseek:       No 
Random seed:     4270 
Filelist: 
---------------------------------------------------- 
/mnt/gfs0/accfile1 
/mnt/gfs0/accfile2 
/mnt/gfs0/acccfile3 
accordion (4270) completed 100 operations - 5.79 ops/sec 
accordion (4270) completed 100 operations - 5.22 ops/sec 
accordion (4270) completed 100 operations - 5.87 ops/sec 
accordion (4270) completed 100 operations - 6.86 ops/sec 
*** DATA COMPARISON ERROR accfile1 *** 
Corrupt regions follow - unprintable chars are represented as '.' 
----------------------------------------------------------------- 
corrupt bytes starting at file offset 4 
    1st 32 expected bytes:  70:tank-06:accordion*W:4270:tank 
    1st 32 actual bytes:    50:tank-04:accordion*W:4250:tank 
 
 
With flocks: 
accordion -L flock -s 409600 -e 4096 -m 100 accfile1 accfile2 acccfile3 
 
accordion starting: 
Iterations:      0 
Run time:        0s 
Lock type:       flock 
File size:       409600 
Extend size:     4096 
Random truncate: No 
Use lseek:       No 
Random seed:     4327 
Filelist: 
---------------------------------------------------- 
/mnt/gfs0/accfile1 
/mnt/gfs0/accfile2 
/mnt/gfs0/acccfile3 
 
accordion (4327) completed 100 operations - 20.62 ops/sec 
accordion (4327) completed 100 operations - 16.69 ops/sec 
accordion (4327) completed 100 operations - 17.01 ops/sec 
accordion (4327) completed 100 operations - 17.24 ops/sec 
accordion (4327) completed 100 operations - 15.50 ops/sec 
 
*** DATA COMPARISON ERROR accfile1 *** 
Corrupt regions follow - unprintable chars are represented as '.' 
----------------------------------------------------------------- 
corrupt bytes starting at file offset 3 
    1st 32 expected bytes:  327:tank-01:accordion*W:4327:tan 
    1st 32 actual bytes:    246:tank-04:accordion*W:4246:tan 
 
child (4327) exited with status 1 
 
 
Version-Release number of selected component (if applicable): 
GFS <CVS> (built Jun 17 2004 10:53:57) installed  
 
How reproducible: 
Always 
 
Steps to Reproduce: 
1. Run the command lines above.  To reuse the same random seed, take 
the seed from the output and pass it with the -S flag.  This is not 
required; the corruption reproduces without it as well. 
     
 
Expected Results:  accordion -- 
Open a file, lock it, optionally truncate the file, write a chunk of 
data to the end of the file, check the write, unlock, close. 
Optionally use lseek to extend the file, writing only a single byte 
at the end, then check the write, unlock and close. 
The file will never grow larger than the requested size.  If 
extending the file would make it grow larger than the requested size, 
truncate the file and start over. 
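
For readers without access to accordion, the sequence described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not accordion's actual source: the names accordion_step and whole_file_lock are hypothetical, the chunk is filled with a single byte value rather than accordion's tagged pattern, and only the fcntl lock path (no flock, no lseek-extend mode, no random truncate) is shown.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: set (F_WRLCK) or clear (F_UNLCK) a blocking
 * fcntl lock that covers the whole file. */
static int whole_file_lock(int fd, short type)
{
    struct flock fl;

    memset(&fl, 0, sizeof(fl));
    fl.l_type = type;
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;               /* 0 means "through end of file" */
    return fcntl(fd, F_SETLKW, &fl);
}

/* One pass: returns 0 if the read-back matches, 1 on a data mismatch,
 * -1 on a system call or allocation failure. */
int accordion_step(const char *path, off_t max_size, size_t extend, char fill)
{
    char *wbuf = malloc(extend);
    char *rbuf = malloc(extend);
    off_t end;
    int rc = -1;
    int fd = open(path, O_RDWR | O_CREAT, 0644);

    if (fd < 0 || wbuf == NULL || rbuf == NULL)
        goto out;
    if (whole_file_lock(fd, F_WRLCK) < 0)
        goto out;

    end = lseek(fd, 0, SEEK_END);
    if (end < 0)
        goto unlock;
    if (end + (off_t)extend > max_size) {
        /* The extend would exceed the limit: truncate and start over. */
        if (ftruncate(fd, 0) < 0 || lseek(fd, 0, SEEK_SET) < 0)
            goto unlock;
        end = 0;
    }

    memset(wbuf, fill, extend);
    if (write(fd, wbuf, extend) != (ssize_t)extend)
        goto unlock;
    /* Write check: seek back to where the chunk started and re-read it. */
    if (lseek(fd, end, SEEK_SET) < 0 ||
        read(fd, rbuf, extend) != (ssize_t)extend)
        goto unlock;
    rc = memcmp(wbuf, rbuf, extend) != 0;

unlock:
    whole_file_lock(fd, F_UNLCK);
out:
    if (fd >= 0)
        close(fd);
    free(wbuf);
    free(rbuf);
    return rc;
}
```

Calling accordion_step() in a loop on every node against the same path, with a different fill byte per node, approximates the workload; a return value of 1 while the lock is held corresponds to the DATA COMPARISON ERROR reported above.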
 
 
 
Additional info:

Comment 1 Ken Preslan 2004-06-23 14:14:09 UTC
Ok, this is 2.6, I presume?  Exactly which kernel?





Comment 2 Dean Jansa 2004-06-23 14:34:24 UTC
2.6.7 
 
GFS <CVS> (built Jun 17 2004 10:53:57) installed 
CMAN V2.0.1 (built Jun 17 2004 11:14:22) installed 
DLM (built Jun 17 2004 11:14:35) installed 
Lock_DLM (built Jun 17 2004 10:54:06) installed 
Lock_Nolock <CVS> (built Jun 17 2004 10:54:17) installed 
Gulm v6.0.0 (built Jun 17 2004 10:54:14) installed 
 

Comment 3 Dean Jansa 2004-06-25 21:33:24 UTC
Ken, 
 
FWIW I can hit this with iogen/doio as well.  I'm working on 
narrowing down the needed syscalls, I'll update if I can get it 
to trip on anything less than the below list.  
 
(I should note, I have reproduced it with a single file on the 
iogen line as well) 
 
[root@tank-06 gfs0]# iogen -o -m random -s read,write,readv,writev -t 1b -T1000b 10000b:tfile1 10000b:tfile2 10000b:tfile3 | doio -avk 
 
iogen starting up with the following: 
 
Out-pipe:              stdout 
Iterations:            Infinite 
Seed:                  4728 
Offset-Mode:           random 
Overlap Flag:          on 
Mintrans:              512         (1 blocks) 
Maxtrans:              512000      (1000 blocks) 
O_RAW/O_SSD Multiple:  (Determined by device) 
Syscalls:              read write readv writev 
Aio completion types:  none 
Flags:                 buffered sync 
 
Test Files: 
 
Path                                          Length    iou   raw iou file
                                              (bytes) (bytes) (bytes) type
-----------------------------------------------------------------------------
/mnt/gfs0/tfile1                              5120000       1     512 regular
/mnt/gfs0/tfile2                              5120000       1     512 regular
/mnt/gfs0/tfile3                              5120000       1     512 regular
 
doio ( 4729) 10:55:39 
--------------------- 
*** DATA COMPARISON ERROR *** 
check_file(/mnt/gfs0/tfile2, 4825886, 100421, M:4729:tank-06:doio*, 20, 0) failed 
 
Comparison fd is 3, with open flags 0 
Corrupt regions follow - unprintable chars are represented as '.' 
----------------------------------------------------------------- 
corrupt bytes starting at file offset 4825886 
    1st 32 expected bytes:  M:4729:tank-06:doio*M:4729:tank- 
    1st 32 actual bytes:    :doio*K:4710:tank-05:doio*K:4710 
 
Request number 3908 
          fd 12 is file /mnt/gfs0/tfile2 - open flags are 010001 O_WRONLY,O_SYNC, 
          write done at file offset 4825886 - pattern is M (0115) 
          number of requests is 1, strides per request is 1 
          i/o byte count = 100421 
          memory alignment is unaligned 
 
syscall:  writev(12, (iov on stack), 1) 
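
The "actual bytes" lines are useful because the data carries a tag naming its writer. Judging only from the output above, each tag appears to have the form <pattern char>:<pid>:<host>:<program>* repeated through the written region; that layout is an inference from this log, not taken from the doio or accordion source. Under that assumption, a small hypothetical helper (find_writer_tag, not part of either tool) could pull the first complete tag out of a corrupt sample to identify which node and process the stale data came from:

```c
#include <stdio.h>
#include <string.h>

/* Fields of one tag, assuming the inferred layout
 * <pattern char>:<pid>:<host>:<program>*  */
struct writer_tag {
    char pattern;
    long pid;
    char host[64];
    char prog[32];
};

/* Scan a NUL-terminated sample for the first complete tag.
 * Returns 0 and fills *out on success, -1 if no complete tag is found. */
int find_writer_tag(const char *sample, struct writer_tag *out)
{
    for (const char *p = sample; *p != '\0'; p++) {
        int consumed = 0;

        /* %n is stored only if the trailing literal '*' matched, so
         * consumed > 0 confirms the tag was complete, not truncated. */
        if (sscanf(p, "%c:%ld:%63[^:*]:%31[^:*]*%n",
                   &out->pattern, &out->pid, out->host, out->prog,
                   &consumed) == 4 && consumed > 0)
            return 0;
    }
    return -1;
}
```

Applied to the actual bytes in this report, the helper would report pattern K, pid 4710 on tank-05, while the expected data belonged to pid 4729 on tank-06.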
 
 

Comment 4 David Teigland 2004-06-28 10:04:35 UTC
I wrote a simpler test that reproduces the same effect (using flock,
running on 4 nodes).
It's on homer at ~teigland/writeread.c.  In short it's:

for (;;) {
  sprintf(wbuf, "%s.%u", hostname, i);
  while (lock_file(fd, lock_type) < 0) ;
  lseek(fd, 0, SEEK_SET);
  write(fd, wbuf, len);
  lseek(fd, 0, SEEK_SET);
  read(fd, rbuf, len);
  if (memcmp(wbuf, rbuf, len))
    die("memcmp error\n write: %s\n read %s\n", wbuf, rbuf);
  unlock_file(fd, lock_type);
}

The problem with this test is that it often causes one of two
different panics before running long enough to see a memcmp error.
The first panic is in the dlm (process_asts) and the second is in
gfs (1879 of glock.c).  Who knows how related any of these might be.

This is the first reliable way I've found to trigger the dlm ast bug
so I'll work on that first.  Maybe that'll fix the gfs assertion and
just maybe the write/read mismatch.
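
Since writeread.c itself is not attached, here is a self-contained sketch of one iteration of the loop in this comment. It is an approximation under assumptions, not the original file: write_read_once and BUF_LEN are hypothetical names, the lock is a blocking flock(2) exclusive lock rather than the retry loop around lock_file(), and the iteration counter is passed in by the caller because the excerpt does not show where i is incremented.

```c
#include <stdio.h>
#include <string.h>
#include <sys/file.h>
#include <sys/types.h>
#include <unistd.h>

#define BUF_LEN 128

/* One locked write-then-read-back pass at offset 0.
 * Returns 0 if the data read back matches what was written,
 * 1 on a mismatch, -1 on a system call error. */
int write_read_once(int fd, const char *host, unsigned int iter)
{
    char wbuf[BUF_LEN];
    char rbuf[BUF_LEN];
    int rc = -1;

    /* Zero-fill so the whole buffer compares deterministically. */
    memset(wbuf, 0, sizeof(wbuf));
    snprintf(wbuf, sizeof(wbuf), "%s.%u", host, iter);

    if (flock(fd, LOCK_EX) < 0)
        return -1;

    if (lseek(fd, 0, SEEK_SET) == (off_t)-1)
        goto unlock;
    if (write(fd, wbuf, BUF_LEN) != BUF_LEN)
        goto unlock;
    if (lseek(fd, 0, SEEK_SET) == (off_t)-1)
        goto unlock;
    if (read(fd, rbuf, BUF_LEN) != BUF_LEN)
        goto unlock;

    /* With a working cluster lock nobody else can write in between. */
    rc = memcmp(wbuf, rbuf, BUF_LEN) != 0;

unlock:
    flock(fd, LOCK_UN);
    return rc;
}
```

Each node would open the same file on the shared filesystem and call write_read_once() in a loop with its own hostname and an increasing counter; a return value of 1 corresponds to the memcmp error described above.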



Comment 5 David Teigland 2004-06-28 12:21:32 UTC
I just realized I've been doing all this testing without the flock
patch applied to my kernel (and I've been using flocks not plocks).


Comment 6 David Teigland 2004-06-28 13:01:34 UTC
The last comment was wrong.  The flock patch is 00001 so I am
using it.

Comment 7 Christine Caulfield 2004-07-15 07:24:21 UTC
After much head scratching and work, Dave and I think we've fixed this
properly now.

Comment 8 Dean Jansa 2004-07-19 22:04:20 UTC
I attempted to check this fix out, and hit the following while 
running: 
 
accordion -s 409600 -e 4096 -m 100 accfile1 accfile2 acccfile3 
 
tank-01:  
Jul 19 16:47:06 tank-01 kernel: ------------[ cut here ]------------ 
Jul 19 16:47:06 tank-01 kernel: kernel BUG at cluster/dlm/locking.c:584! 
Jul 19 16:47:06 tank-01 kernel: invalid operand: 0000 [#1] 
Jul 19 16:47:06 tank-01 kernel: Modules linked in: gnbd lock_gulm lock_nolock lock_dlm dlm cman gfs lock_harness ipv6 parport_pc lp parport autofs4 sunrpc e1000 floppy sg microcode dm_mod uhci_hcd ehci_hcd button battery asus_acpi ac ext3 jbd qla2300 qla2xxx scsi_transport_fc sd_mod scsi_mod 
Jul 19 16:47:06 tank-01 kernel: CPU:    0 
Jul 19 16:47:06 tank-01 kernel: EIP:    0060:[<f8a72b9e>]    Not tainted 
Jul 19 16:47:06 tank-01 kernel: EFLAGS: 00010282   (2.6.7) 
Jul 19 16:47:06 tank-01 kernel: EIP is at remote_stage2+0x21e/0x240 [dlm] 
Jul 19 16:47:06 tank-01 kernel: eax: 00000001   ebx: f2566aa4   ecx: 00000000   edx: f7675e04 
Jul 19 16:47:06 tank-01 kernel: esi: 00000001   edi: f599b000   ebp: c335d238   esp: f7675e00 
Jul 19 16:47:06 tank-01 kernel: ds: 007b   es: 007b   ss: 0068 
Jul 19 16:47:06 tank-01 kernel: Process dlm_recvd (pid: 3788, threadinfo=f7674000 task=f76790b0) 
Jul 19 16:47:06 tank-01 kernel: Stack: f8a81f17 00000006 f8a81f01 f8a81f41 0044129b f241cf39 00000000 00000006 
Jul 19 16:47:06 tank-01 kernel:        f241cec4 00000000 c335d238 f599b000 00000006 f8a752e0 f76b1c44 f7674000 
Jul 19 16:47:06 tank-01 kernel:        00000001 00000000 00000067 f76b1ee0 00000067 f76b1dcc f7675f94 00000000 
Jul 19 16:47:06 tank-01 kernel: Call Trace: 
Jul 19 16:47:06 tank-01 kernel:  [<f8a752e0>] process_cluster_request+0x160/0xd40 [dlm] 
Jul 19 16:47:06 tank-01 kernel:  [<c02b0e98>] inet_recvmsg+0x48/0x70 
Jul 19 16:47:06 tank-01 kernel:  [<c026bd0c>] sock_recvmsg+0xbc/0xc0 
Jul 19 16:47:06 tank-01 kernel:  [<f8a79313>] midcomms_process_incoming_buffer+0x173/0x250 [dlm] 
Jul 19 16:47:06 tank-01 kernel:  [<c0136af3>] __alloc_pages+0x2f3/0x340 
Jul 19 16:47:06 tank-01 kernel:  [<c026bd0c>] sock_recvmsg+0xbc/0xc0 
Jul 19 16:47:06 tank-01 kernel:  [<f8a77011>] receive_from_sock+0x141/0x310 [dlm] 
Jul 19 16:47:06 tank-01 kernel:  [<c0117e67>] recalc_task_prio+0x97/0x190 
Jul 19 16:47:06 tank-01 kernel:  [<f8a77ec7>] process_sockets+0x57/0x80 [dlm] 
Jul 19 16:47:06 tank-01 kernel:  [<f8a7813e>] dlm_recvd+0x9e/0xf0 [dlm] 
Jul 19 16:47:06 tank-01 kernel:  [<f8a780a0>] dlm_recvd+0x0/0xf0 [dlm] 
Jul 19 16:47:06 tank-01 kernel:  [<c010429d>] kernel_thread_helper+0x5/0x18 
Jul 19 16:47:06 tank-01 kernel: 
Jul 19 16:47:06 tank-01 kernel: Code: 0f 0b 48 02 01 1f a8 f8 c7 04 24 c4 2f a8 f8 e8 3e 85 6a c7 
Jul 19 16:47:06 tank-01 kernel:  <6>CMAN: bad generation number 14 in HELLO message, expected 10 
Jul 19 16:47:06 tank-01 kernel: CMAN: bad generation number 15 in HELLO message, expected 10 
 
tank-02: 
CMAN: quorum lost, blocking activity 
dlm: got connection from 4 
 
tank-03: 
------------[ cut here ]------------ 
kernel BUG at cluster/dlm/locking.c:584! 
invalid operand: 0000 [#1] 
Modules linked in: gnbd lock_gulm lock_nolock lock_dlm dlm cman gfs 
lock_harness ipv6 parport_pc lp parport autofs4 sunrpc e1000 floppy 
sg microcode dm_mod uhci_hcd ehci_hcd button battery asus_acpi ac 
ext3 jbd qla2300 qla2xxx scsi_transport_fc sd_mod scsi_mod 
CPU:    0 
EIP:    0060:[<f8a72b9e>]    Not tainted 
EFLAGS: 00010282   (2.6.7) 
EIP is at remote_stage2+0x21e/0x240 [dlm] 
eax: 00000001   ebx: f57f357c   ecx: 00000000   edx: f6b25e04 
esi: 00000001   edi: f7643000   ebp: f6dfcd38   esp: f6b25e00 
ds: 007b   es: 007b   ss: 0068 
Process dlm_recvd (pid: 3712, threadinfo=f6b24000 task=f6b29430) 
Stack: f8a81f17 00000002 f8a81f01 f8a81f41 004b1232 f57f7bc1 
00000000 00000002 
       f57f7b4c 00000000 f6dfcd38 f7643000 00000002 f8a752e0 
f6b50044 f6b24000 
       00000001 00000000 000000b7 f6b502e0 000000b7 f6b501cc 
f6b25f94 00000000 
Call Trace: 
 [<f8a752e0>] process_cluster_request+0x160/0xd40 [dlm] 
 [<c02b0e98>] inet_recvmsg+0x48/0x70 
 [<c026bd0c>] sock_recvmsg+0xbc/0xc0 
 [<f8a79313>] midcomms_process_incoming_buffer+0x173/0x250 [dlm] 
 [<c0136af3>] __alloc_pages+0x2f3/0x340 
 [<c026bd0c>] sock_recvmsg+0xbc/0xc0 
 [<f8a77011>] receive_from_sock+0x141/0x310 [dlm] 
 [<c0117e67>] recalc_task_prio+0x97/0x190 
 [<f8a77ec7>] process_sockets+0x57/0x80 [dlm] 
 [<f8a7813e>] dlm_recvd+0x9e/0xf0 [dlm] 
 [<f8a780a0>] dlm_recvd+0x0/0xf0 [dlm] 
 [<c010429d>] kernel_thread_helper+0x5/0x18 
 
Code: 0f 0b 48 02 01 1f a8 f8 c7 04 24 c4 2f a8 f8 e8 3e 85 6a c7 
 <4>CMAN: no HELLO from tank-02.lab.msp.redhat.com, removing from 
the cluster 
 
 
tank-04: 
------------[ cut here ]------------ 
kernel BUG at kernel/timer.c:405! 
invalid operand: 0000 [#1] 
Modules linked in: gnbd lock_gulm lock_nolock lock_dlm dlm cman gfs 
lock_harness ipv6 parport_pc lp parport autofs4 sunrpc e1000 floppy 
sg microcode dm_mod uhci_hcd ehci_hcd button battery asus_acpi ac 
ext3 jbd qla2300 qla2xxx scsi_transport_fc sd_mod scsi_mod 
CPU:    0 
EIP:    0060:[<c0121b10>]    Not tainted 
EFLAGS: 00010006   (2.6.7) 
EIP is at cascade+0x40/0x50 
eax: f5a65e10   ebx: c03b5a28   ecx: c03b5a28   edx: c03b5a28 
esi: c03b59f8   edi: c03b5180   ebp: 0000000e   esp: c0367f40 
ds: 007b   es: 007b   ss: 0068 
Process swapper (pid: 0, threadinfo=c0366000 task=c0312a40) 
Stack: 00000000 c03b4ea8 c0367f54 c0367f54 c01220d1 c0367f54 
c0367f54 c0122217 
       00000001 c03b4ea8 0000000a c0314e24 c011e809 00000046 
c0364a00 00000000 
       c011e837 00000000 c01077c5 00000000 c0367fac c0314e24 
c0366000 00099100 
Call Trace: 
 [<c01220d1>] run_timer_softirq+0xe1/0x150 
 [<c0122217>] do_timer+0xc7/0xd0 
 [<c011e809>] __do_softirq+0x79/0x80 
 [<c011e837>] do_softirq+0x27/0x30 
 [<c01077c5>] do_IRQ+0xd5/0x110 
 [<c0105e6c>] common_interrupt+0x18/0x20 
 [<c0104053>] default_idle+0x23/0x40 
 [<c01040e4>] cpu_idle+0x34/0x40 
 [<c03685e2>] start_kernel+0x162/0x1a0 
 [<c0368330>] unknown_bootoption+0x0/0x120 
 
Code: 0f 0b 95 01 ea e4 2d c0 eb dd 8d b6 00 00 00 00 56 53 83 ec 
 <0>Kernel panic: Fatal exception in interrupt 
In interrupt handler - not syncing 
 
 
tank-05: 
CMAN: node tank-03.lab.msp.redhat.com is not responding - removing 
from the cluster 
CMAN: quorum lost, blocking activity 
 
 
tank-06: 
CMAN: quorum lost, blocking activity 
 

Comment 9 Dean Jansa 2004-07-19 22:26:31 UTC
Re-ran the same test to see if the above was reproducible: 
 
tank-02: 
 
------------[ cut here ]------------ 
kernel BUG at cluster/dlm/locking.c:584! 
invalid operand: 0000 [#1] 
Modules linked in: gfs lock_dlm dlm cman lock_harness ipv6 
parport_pc lp parport autofs4 sunrpc e1000 floppy sg microcode 
dm_mod uhci_hcd ehci_hcd button battery asus_acpi ac ext3 jbd 
qla2300 qla2xxx scsi_transport_fc sd_mod scsi_mod 
CPU:    0 
EIP:    0060:[<f8c37b9e>]    Not tainted 
EFLAGS: 00010282   (2.6.7) 
EIP is at remote_stage2+0x21e/0x240 [dlm] 
eax: 00000001   ebx: f4c8ee40   ecx: 00000000   edx: f7713e04 
esi: 00000004   edi: f7589000   ebp: f7cd2a38   esp: f7713e00 
ds: 007b   es: 007b   ss: 0068 
Process dlm_recvd (pid: 2317, threadinfo=f7712000 task=f77b57b0) 
Stack: f8c46f17 00000006 f8c46f01 f8c46f41 000527f5 f4cae315 
00000000 00000006 
       f4cae2a0 00000000 f7cd2a38 f7589000 00000006 f8c3a2e0 
f755bc44 f7712000 
       00000001 00000000 000000b7 f755bee0 000000b7 f755bdcc 
f7713f94 00000000 
Call Trace: 
 [<f8c3a2e0>] process_cluster_request+0x160/0xd40 [dlm] 
 [<c02b0e98>] inet_recvmsg+0x48/0x70 
 [<c026bd0c>] sock_recvmsg+0xbc/0xc0 
 [<f8c3e313>] midcomms_process_incoming_buffer+0x173/0x250 [dlm] 
 [<c0136af3>] __alloc_pages+0x2f3/0x340 
 [<c026bd0c>] sock_recvmsg+0xbc/0xc0 
 [<f8c3c011>] receive_from_sock+0x141/0x310 [dlm] 
 [<c0117e67>] recalc_task_prio+0x97/0x190 
 [<f8c3cec7>] process_sockets+0x57/0x80 [dlm] 
 [<f8c3d13e>] dlm_recvd+0x9e/0xf0 [dlm] 
 [<f8c3d0a0>] dlm_recvd+0x0/0xf0 [dlm] 
 [<c010429d>] kernel_thread_helper+0x5/0x18 
 
Code: 0f 0b 48 02 01 6f c4 f8 c7 04 24 c4 7f c4 f8 e8 3e 35 4e c7 
 

Comment 10 Dean Jansa 2004-07-20 14:57:32 UTC
FWIW - 
 
I re-ran using flocks: 
 
 accordion -L flock -s 409600 -e 4096 -m 100 accfile1 accfile2 
accfile3 
 
tank-04: 
name "       3         d7d457c" flags 0 nodeid 6 ref 1 
grant queue 
000403e8 gr 0 rq -1 flg 8 sts 2 node 6 remid 301a8 lq 0,c 
name "       4               0" flags 0 nodeid 1 ref 1 
grant queue 
00010184 gr 3 rq -1 flg 0 sts 2 node 1 remid 102c5 lq 0,0 
name "       3         5706894" flags 4 nodeid 0 ref 2 
grant queue 
00020041 gr 5 rq -1 flg 8 sts 2 node 0 remid 0 lq 0,1c 
0004006e gr 0 rq -1 flg 2008 sts 2 node 1 remid 9027e lq 0,c 
name "       8             3e8" flags 4 nodeid 0 ref 6 
grant queue 
000403ac gr 3 rq -1 flg 2008 sts 2 node 4 remid 5006c lq 0,1c 
00030117 gr 3 rq -1 flg 2008 sts 2 node 1 remid 202df lq 0,1c 
000203be gr 3 rq -1 flg 8 sts 2 node 0 remid 0 lq 0,1c 
0001032b gr 3 rq -1 flg 2008 sts 2 node 2 remid 1001a lq 0,1c 
000102b2 gr 3 rq -1 flg 2008 sts 2 node 6 remid 10319 lq 0,1c 
000600f0 gr 3 rq -1 flg 2008 sts 2 node 5 remid 40113 lq 0,1c 
name "       3         ad99210" flags 0 nodeid 5 ref 1 
grant queue 
0005026f gr 5 rq -1 flg 8 sts 2 node 5 remid 50247 lq 0,8 
name "       3         d7a45d3" flags 0 nodeid 6 ref 1 
grant queue 
000200df gr 0 rq -1 flg 8 sts 2 node 6 remid 30122 lq 0,c 
name "       3         acc9389" flags 0 nodeid 5 ref 1 
grant queue 
00020058 gr 0 rq -1 flg 8 sts 2 node 5 remid 302ab lq 0,c 
name "       3         572685a" flags 4 nodeid 0 ref 1 
grant queue 
0006021a gr 5 rq -1 flg 8 sts 2 node 0 remid 0 lq 0,9 
name "       1               1" flags 0 nodeid 1 ref 1 
grant queue 
0001022f gr 3 rq -1 flg 0 sts 2 node 1 remid 1028f lq 0,0 
name "      10               2" flags 4 nodeid 0 ref 1 
grant queue 
00010034 gr 3 rq -1 flg 8 sts 2 node 0 remid 0 lq 0,4c 
name "       3         2b2b81a" flags 0 nodeid 2 ref 1 
grant queue 
00010294 gr 0 rq -1 flg 8 sts 2 node 2 remid 200e8 lq 0,c 
name "       3         acb93a6" flags 0 nodeid 5 ref 1 
grant queue 
00030146 gr 5 rq -1 flg 8 sts 2 node 5 remid 202d0 lq 0,8 
name "       3         d7945f0" flags 0 nodeid 6 ref 1 
grant queue 
00020265 gr 5 rq -1 flg 8 sts 2 node 6 remid 30342 lq 0,8 
name "       2              16" flags 0 nodeid 1 ref 1 
grant queue 
00010094 gr 3 rq -1 flg 0 sts 2 node 1 remid 10359 lq 0,0 
name "       3         5676999" flags 4 nodeid 0 ref 3 
grant queue 
00020199 gr 0 rq 5 flg 8 sts 2 node 0 remid 0 lq 0,1d 
0005022a gr 5 rq -1 flg 2008 sts 2 node 6 remid 3030e lq 0,8 
000401f7 gr 0 rq -1 flg 2008 sts 2 node 1 remid 2004a lq 0,c 
name "       5              1a" flags 0 nodeid 1 ref 1 
grant queue 
0001039f gr 3 rq -1 flg 0 sts 2 node 1 remid 102a9 lq 0,0 
name "       3           80579" flags 0 nodeid 1 ref 1 
grant queue 
0002023c gr 5 rq -1 flg 8 sts 2 node 1 remid 40117 lq 0,8 
name "       3         56d68eb" flags 4 nodeid 0 ref 3 
grant queue 
0002007c gr 0 rq 5 flg 8 sts 2 node 0 remid 0 lq 0,1d 
000501b5 gr 5 rq -1 flg 2008 sts 2 node 2 remid 3011a lq 0,8 
000401d0 gr 0 rq -1 flg 2008 sts 2 node 6 remid 90288 lq 0,c 
name "       3         569695f" flags 4 nodeid 0 ref 4 
grant queue 
000102f5 gr 0 rq 5 flg 8 sts 2 node 0 remid 0 lq 0,1d 
00010306 gr 5 rq -1 flg 2008 sts 2 node 2 remid 203fc lq 0,8 
00010010 gr 0 rq -1 flg 2008 sts 2 node 6 remid 703db lq 0,c 
000303b1 gr 0 rq -1 flg 2008 sts 2 node 1 remid 201cc lq 0,c 
name "       5              7d" flags 0 nodeid 1 ref 1 
grant queue 
00010378 gr 3 rq -1 flg 0 sts 2 node 1 remid 1009a lq 0,0 
name "       3         d854494" flags 0 nodeid 6 ref 1 
grant queue 
000902e1 gr 5 rq -1 flg 8 sts 2 node 6 remid 80324 lq 0,8 
name "       3         5626a2a" flags 4 nodeid 0 ref 3 
grant queue 
000100cd gr 0 rq 5 flg 8 sts 2 node 0 remid 0 lq 0,1d 
0001003e gr 5 rq -1 flg 2008 sts 2 node 4 remid 100a0 lq 0,1c 
000301bf gr 0 rq -1 flg 2008 sts 2 node 6 remid 102a3 lq 0,c 
name "       3         821e0e8" flags 0 nodeid 4 ref 1 
grant queue 
0002030f gr 0 rq -1 flg 8 sts 2 node 4 remid 50358 lq 0,c 
name "       3          100491" flags 0 nodeid 1 ref 1 
grant queue 
00050034 gr 0 rq -1 flg 8 sts 2 node 1 remid 8037c lq 0,c 
name "       3         ac893fd" flags 0 nodeid 5 ref 1 
grant queue 
0003037b gr 5 rq -1 flg 8 sts 2 node 5 remid 3024d lq 0,8 
name "       3         81ae1b3" flags 0 nodeid 4 ref 1 
grant queue 
0002008a gr 0 rq -1 flg 8 sts 2 node 4 remid 203cc lq 0,c 
name "       3           605b3" flags 0 nodeid 1 ref 1 
grant queue 
0002024c gr 5 rq -1 flg 8 sts 2 node 1 remid 200e8 lq 0,8 
name "       3         56569d3" flags 4 nodeid 0 ref 4 
grant queue 
00010125 gr 0 rq 5 flg 8 sts 2 node 0 remid 0 lq 0,1d 
00050305 gr 5 rq -1 flg 2008 sts 2 node 1 remid 701c6 lq 0,8 
0002033f gr 0 rq -1 flg 2008 sts 2 node 6 remid 500a2 lq 0,c 
000203dc gr 0 rq -1 flg 2008 sts 2 node 2 remid 20157 lq 0,c 
name "       3           e04cb" flags 0 nodeid 1 ref 1 
grant queue 
000603c8 gr 5 rq -1 flg 8 sts 2 node 1 remid 500a7 lq 0,8 
name "       3         ad392be" flags 0 nodeid 5 ref 1 
grant queue 
00020173 gr 5 rq -1 flg 8 sts 2 node 5 remid 40224 lq 0,8 
name "       3         819e1d0" flags 0 nodeid 4 ref 1 
grant queue 
00020312 gr 0 rq -1 flg 8 sts 2 node 4 remid 10126 lq 0,c 
name "       5              16" flags 0 nodeid 1 ref 1 
grant queue 
000102a7 gr 3 rq -1 flg 0 sts 2 node 1 remid 102b0 lq 0,0 
name "       3         ad492a1" flags 0 nodeid 5 ref 1 
grant queue 
000700cd gr 0 rq -1 flg 8 sts 2 node 5 remid 501ce lq 0,c 
name "       5              18" flags 0 nodeid 1 ref 1 
grant queue 
000100a7 gr 3 rq -1 flg 0 sts 2 node 1 remid 20024 lq 0,0 
name "       3         817e20a" flags 0 nodeid 4 ref 1 
grant queue 
00020313 gr 0 rq -1 flg 8 sts 2 node 4 remid 20163 lq 0,c 
name "       3         56b6925" flags 4 nodeid 0 ref 3 
grant queue 
000200bd gr 5 rq -1 flg 8 sts 2 node 0 remid 0 lq 0,1c 
00020164 gr 0 rq -1 flg 2008 sts 2 node 6 remid 702eb lq 0,c 
000102fc gr 0 rq -1 flg 2008 sts 2 node 2 remid 2018f lq 0,c 
name "       3         ada91f3" flags 0 nodeid 5 ref 1 
grant queue 
000400cf gr 0 rq -1 flg 8 sts 2 node 5 remid 803d8 lq 0,c 
name "       3         acd936c" flags 0 nodeid 5 ref 1 
grant queue 
000302e7 gr 0 rq -1 flg 8 sts 2 node 5 remid 20054 lq 0,c 
name "       3         d7c4599" flags 0 nodeid 6 ref 1 
grant queue 
0002002f gr 0 rq -1 flg 8 sts 2 node 6 remid 40157 lq 0,c 
name "       3           3060a" flags 0 nodeid 1 ref 1 
grant queue 
0002010f gr 5 rq -1 flg 8 sts 2 node 1 remid 2004e lq 0,8 
name "       3         825e074" flags 0 nodeid 4 ref 1 
grant queue 
00040155 gr 5 rq -1 flg 8 sts 2 node 4 remid 5010c lq 0,8 
name "       2              7d" flags 0 nodeid 1 ref 1 
grant queue 
00040004 gr 3 rq -1 flg 0 sts 2 node 1 remid 600bd lq 0,0 
name "       3         815e244" flags 0 nodeid 4 ref 1 
grant queue 
00040050 gr 5 rq -1 flg 8 sts 2 node 4 remid 100d9 lq 0,8 
name "       3         2beb6be" flags 0 nodeid 2 ref 1 
grant queue 
0002012d gr 0 rq -1 flg 8 sts 2 node 2 remid 2015e lq 0,c 
name "       3         818e1ed" flags 0 nodeid 4 ref 1 
grant queue 
000100d5 gr 5 rq -1 flg 8 sts 2 node 4 remid 20249 lq 0,8 
name "       2              1a" flags 0 nodeid 1 ref 1 
grant queue 
00010127 gr 3 rq -1 flg 0 sts 2 node 1 remid 10191 lq 0,0 
name "       5         d77462f" flags 0 nodeid 6 ref 1 
grant queue 
0001026f gr 3 rq -1 flg 0 sts 2 node 6 remid 1020c lq 0,1 
name "       3         81fe122" flags 0 nodeid 4 ref 1 
grant queue 
00040011 gr 5 rq -1 flg 8 sts 2 node 4 remid 3035e lq 0,8 
name "       3         568697c" flags 4 nodeid 0 ref 3 
grant queue 
000403b8 gr 0 rq 5 flg 8 sts 2 node 0 remid 0 lq 0,1d 
0001016d gr 5 rq -1 flg 2008 sts 2 node 6 remid 50142 lq 0,8 
00020359 gr 0 rq -1 flg 2008 sts 2 node 1 remid 4037e lq 0,c 
name "       3         5716877" flags 4 nodeid 0 ref 2 
grant queue 
0001007e gr 0 rq 5 flg 8 sts 2 node 0 remid 0 lq 0,1d 
00050112 gr 5 rq -1 flg 2008 sts 2 node 6 remid a0143 lq 0,8 
name "       5         2b0b859" flags 0 nodeid 6 ref 1 
grant queue 
00010179 gr 3 rq -1 flg 0 sts 2 node 6 remid 103da lq 0,1 
name "       3         56669b6" flags 4 nodeid 0 ref 3 
grant queue 
00020035 gr 5 rq -1 flg 8 sts 2 node 0 remid 0 lq 0,1c 
00010045 gr 0 rq -1 flg 2008 sts 2 node 2 remid 70257 lq 0,c 
0001030e gr 0 rq -1 flg 2008 sts 2 node 6 remid 1019e lq 0,c 
name "       3         ad8922d" flags 0 nodeid 5 ref 1 
grant queue 
0005003d gr 5 rq -1 flg 8 sts 2 node 5 remid 3033f lq 0,8 
name "       3         2b4b7e0" flags 0 nodeid 2 ref 1 
grant queue 
00030190 gr 0 rq -1 flg 8 sts 2 node 2 remid 30003 lq 0,c 
name "       8             3e9" flags 4 nodeid 0 ref 6 
grant queue 
0005006c gr 3 rq -1 flg 2008 sts 2 node 4 remid 50017 lq 0,1c 
00070009 gr 3 rq -1 flg 2008 sts 2 node 1 remid 40380 lq 0,1c 
000402fa gr 3 rq -1 flg 8 sts 2 node 0 remid 0 lq 0,1c 
000101d7 gr 3 rq -1 flg 2008 sts 2 node 2 remid 10196 lq 0,1c 
0001011f gr 3 rq -1 flg 2008 sts 2 node 6 remid 10316 lq 0,1c 
000202a3 gr 3 rq -1 flg 2008 sts 2 node 5 remid 503c9 lq 0,1c 
name "       3         816e227" flags 0 nodeid 4 ref 1 
grant queue 
000201a7 gr 5 rq -1 flg 8 sts 2 node 4 remid 100a4 lq 0,8 
name "       3         826e057" flags 0 nodeid 4 ref 1 
grant queue 
00080040 gr 5 rq -1 flg 8 sts 2 node 4 remid 603df lq 0,8 
name "       3         56a6942" flags 4 nodeid 0 ref 3 
grant queue 
00020277 gr 0 rq 5 flg 8 sts 2 node 0 remid 0 lq 0,1d 
00020133 gr 5 rq -1 flg 2008 sts 2 node 6 remid 4014d lq 0,1c 
000101d1 gr 0 rq -1 flg 2008 sts 2 node 2 remid 303cd lq 0,c 
name "       3         d864477" flags 0 nodeid 6 ref 1 
grant queue 
0002005c gr 0 rq -1 flg 8 sts 2 node 6 remid c02be lq 0,c 
name "       3           405ed" flags 0 nodeid 1 ref 1 
grant queue 
00010054 gr 5 rq -1 flg 8 sts 2 node 1 remid 10214 lq 0,8 
name "       1               2" flags 0 nodeid 1 ref 1 
grant queue 
00010124 gr 3 rq -1 flg 0 sts 2 node 1 remid 101c3 lq 0,0 
name "       3           f04ae" flags 0 nodeid 1 ref 1 
grant queue 
0004011c gr 5 rq -1 flg 8 sts 2 node 1 remid b02ee lq 0,8 
name "       3         56e68ce" flags 4 nodeid 0 ref 3 
grant queue 
000401e1 gr 0 rq 5 flg 8 sts 2 node 0 remid 0 lq 0,1d 
0006001a gr 5 rq -1 flg 2008 sts 2 node 6 remid 90312 lq 0,1c 
00040045 gr 0 rq -1 flg 2008 sts 2 node 2 remid 201ac lq 0,c 
name "       3         2b3b7fd" flags 0 nodeid 2 ref 1 
grant queue 
000200f6 gr 0 rq -1 flg 8 sts 2 node 2 remid 300ff lq 0,c 
name "       4         8145c00" flags 4 nodeid 0 ref 1 
grant queue 
0001026b gr 5 rq -1 flg 0 sts 2 node 0 remid 0 lq 0,0 
name "       3         828e01d" flags 0 nodeid 4 ref 1 
grant queue 
0002032c gr 0 rq -1 flg 8 sts 2 node 4 remid 202f3 lq 0,c 
name "       6              7d" flags 0 nodeid 5 ref 2 
wait queue 
000601db gr -1 rq 5 flg 0 sts 1 node 5 remid 0 lq 3,1 
name "       2              19" flags 0 nodeid 1 ref 1 
grant queue 
000102f9 gr 3 rq -1 flg 0 sts 2 node 1 remid 10038 lq 0,0 
name "       5         814dc05" flags 0 nodeid 6 ref 1 
grant queue 
0001023e gr 3 rq -1 flg 0 sts 2 node 6 remid 10222 lq 0,1 
name "       2              17" flags 0 nodeid 1 ref 1 
grant queue 
00010217 gr 3 rq -1 flg 0 sts 2 node 1 remid 10224 lq 0,0 
name "       2         2b0b859" flags 0 nodeid 6 ref 1 
grant queue 
0001015d gr 3 rq -1 flg 0 sts 2 node 6 remid 20030 lq 0,1 
name "       3         ad292db" flags 0 nodeid 5 ref 1 
grant queue 
000203f6 gr 5 rq -1 flg 8 sts 2 node 5 remid 40080 lq 0,8 
name "       3         2bab732" flags 0 nodeid 2 ref 1 
grant queue 
000402b1 gr 0 rq -1 flg 8 sts 2 node 2 remid 10282 lq 0,c 
name "       3         d8244eb" flags 0 nodeid 6 ref 1 
grant queue 
000202bc gr 0 rq -1 flg 8 sts 2 node 6 remid 40281 lq 0,c 
name "       3         d7b45b6" flags 0 nodeid 6 ref 1 
grant queue 
00020063 gr 5 rq -1 flg 8 sts 2 node 6 remid 40020 lq 0,8 
name "       2         d77462f" flags 0 nodeid 6 ref 1 
grant queue 
0002021d gr 3 rq -1 flg 0 sts 2 node 6 remid 70333 lq 0,0 
name "       3         ac993e0" flags 0 nodeid 5 ref 1 
grant queue 
0002034b gr 5 rq -1 flg 8 sts 2 node 5 remid 10191 lq 0,8 
name "       3         d814508" flags 0 nodeid 6 ref 1 
grant queue 
000401b5 gr 5 rq -1 flg 8 sts 2 node 6 remid b0129 lq 0,8 
name "       3         56f68b1" flags 4 nodeid 0 ref 3 
grant queue 
000603e2 gr 5 rq -1 flg 8 sts 2 node 0 remid 0 lq 0,1c 
00040204 gr 0 rq -1 flg 2008 sts 2 node 1 remid 902bb lq 0,c 
00040275 gr 0 rq -1 flg 2008 sts 2 node 6 remid 200ac lq 0,c 
name "       3         56469f0" flags 4 nodeid 0 ref 2 
grant queue 
000101d0 gr 5 rq -1 flg 8 sts 2 node 0 remid 0 lq 0,1c 
000103e6 gr 0 rq -1 flg 2008 sts 2 node 2 remid 201c2 lq 0,c 
name "       3           9055c" flags 0 nodeid 1 ref 1 
grant queue 
00050189 gr 5 rq -1 flg 8 sts 2 node 1 remid b017c lq 0,8 
name "       3         d8444b1" flags 0 nodeid 6 ref 1 
grant queue 
0002001d gr 5 rq -1 flg 8 sts 2 node 6 remid b02be lq 0,8 
name "       3         56c6908" flags 4 nodeid 0 ref 3 
grant queue 
00020377 gr 0 rq 5 flg 8 sts 2 node 0 remid 0 lq 0,1d 
000400de gr 5 rq -1 flg 2008 sts 2 node 2 remid 201cb lq 0,8 
00030347 gr 0 rq -1 flg 2008 sts 2 node 6 remid 503a7 lq 0,c 
name "       3         2bdb6db" flags 0 nodeid 2 ref 1 
grant queue 
000301cd gr 5 rq -1 flg 8 sts 2 node 2 remid 40177 lq 0,8 
name "       3         2b9b74f" flags 0 nodeid 2 ref 1 
grant queue 
00010379 gr 0 rq -1 flg 8 sts 2 node 2 remid 2011e lq 0,c 
name "       3         d78460d" flags 0 nodeid 6 ref 1 
grant queue 
00020394 gr 5 rq -1 flg 8 sts 2 node 6 remid 200ff lq 0,8 
name "       5              19" flags 0 nodeid 1 ref 1 
grant queue 
00010146 gr 3 rq -1 flg 0 sts 2 node 1 remid 10250 lq 0,0 
name "       2         814dc05" flags 0 nodeid 6 ref 1 
grant queue 
000300f7 gr 3 rq -1 flg 0 sts 2 node 6 remid 60166 lq 0,0 
name "       3           70596" flags 0 nodeid 1 ref 1 
grant queue 
000100e6 gr 0 rq -1 flg 8 sts 2 node 1 remid 40010 lq 0,c 
name "       5              17" flags 0 nodeid 1 ref 1 
grant queue 
00010288 gr 3 rq -1 flg 0 sts 2 node 1 remid 1014b lq 0,0 
name "       3         81ee13f" flags 0 nodeid 4 ref 1 
grant queue 
0002036e gr 5 rq -1 flg 8 sts 2 node 4 remid 20072 lq 0,8 
name "       3         5636a0d" flags 4 nodeid 0 ref 3 
grant queue 
000101ac gr 0 rq 5 flg 8 sts 2 node 0 remid 0 lq 0,1d 
00020047 gr 5 rq -1 flg 2008 sts 2 node 2 remid 1009b lq 0,8 
00020343 gr 0 rq -1 flg 2008 sts 2 node 4 remid 102d4 lq 0,c 
name "xkJyBKlyJsdHbujW5QyCE5OELvGiFwiaFKRGYHfbv1W0Wa71NWtV0zzmuAVi6QfQ" flags 0 nodeid 2 ref 1 
grant queue 
00010268 gr 1 rq -1 flg 0 sts 2 node 2 remid 101fe lq 0,1 
al to 5 
gfs0 send einval to 5 
gfs0 send einval to 5 
gfs0 send einval to 5 
gfs0 rq 5 40187 "       6         d77462f" 
gfs0 send lu 40187 to 6 
gfs0 lu rep 40187 fr 6 2 
gfs0 send rq 40187 to 2 
gfs0 un 402d5 ref 1 flg 0 nodeid 1/-1 "       2              7d 
gfs0 send un 402d5 to 1 
gfs0 rq 3 300f7 "       2         814dc05" 
gfs0 send lu 300f7 to 6 
gfs0 lu rep 300f7 fr 6 6 
gfs0 send rq 300f7 to 6 
gfs0 rq 5 40098 "       6         814dc05" 
gfs0 send lu 40098 to 6 
gfs0 lu rep 40098 fr 6 1 
gfs0 send rq 40098 to 1 
gfs0 rq 5 30184 "       6         d77462f" 
gfs0 send lu 30184 to 6 
gfs0 lu rep 30184 fr 6 2 
gfs0 send rq 30184 to 2 
gfs0 rq 5 50105 "       6         d77462f" 
gfs0 send lu 50105 to 6 
gfs0 lu rep 50105 fr 6 2 
gfs0 send rq 50105 to 2 
gfs0 rq 3 40004 "       2              7d" 
gfs0 send lu 40004 to 2 
gfs0 lu rep 40004 fr 2 1 
gfs0 send rq 40004 to 1 
gfs0 rq 5 601db "       6              7d" 
gfs0 send lu 601db to 6 
gfs0 lu rep 601db fr 6 5 
gfs0 send rq 601db to 5 
gfs0 rq 5 from 4 4035e "       6              7d" 
 
DLM:  Assertion failed on line 584 of file cluster/dlm/locking.c 
DLM:  assertion:  "rsb->res_nodeid == 0" 
DLM:  time = 2690348 
dlm: lkb 
id 4035e 
remid 80294 
flags 2000 
status 0 
rqmode 5 
grmode -1 
nodeid 4 
lqstate 0 
lqflags 1 
dlm: request 
rh_cmd 2 
rh_lkid 80294 
remlkid 0 
flags 1 
status 0 
rqmode 5 
nodeid 4 
 
------------[ cut here ]------------ 
kernel BUG at cluster/dlm/locking.c:584! 
invalid operand: 0000 [#1] 
Modules linked in: gfs lock_dlm dlm cman lock_harness ipv6 
parport_pc lp parport autofs4 sunrpc e1000 floppy sg microcode 
dm_mod uhci_hcd ehci_hcd button battery asus_acpi ac ext3 jbd 
qla2300 qla2xxx scsi_transport_fc sd_mod scsi_mod 
CPU:    0 
EIP:    0060:[<f8c37b9e>]    Not tainted 
EFLAGS: 00010282   (2.6.7) 
EIP is at remote_stage2+0x21e/0x240 [dlm] 
eax: 00000001   ebx: c7976810   ecx: 00000000   edx: f7435e04 
esi: 00000005   edi: c23b6000   ebp: f7cf4138   esp: f7435e00 
ds: 007b   es: 007b   ss: 0068 
Process dlm_recvd (pid: 2453, threadinfo=f7434000 task=f74390b0) 
Stack: f8c46f17 00000004 f8c46f01 f8c46f41 00290d2c c79a4a05 
00000000 00000004 
       c79a4990 00000000 f7cf4138 c23b6000 00000004 f8c3a2e0 
f745d444 f7434000 
       00000001 00000000 00000067 f745d6e0 00000067 f745d5cc 
f7435f94 00000000 
Call Trace: 
 [<f8c3a2e0>] process_cluster_request+0x160/0xd40 [dlm] 
 [<c02b0e98>] inet_recvmsg+0x48/0x70 
 [<c026bd0c>] sock_recvmsg+0xbc/0xc0 
 [<f8c3e313>] midcomms_process_incoming_buffer+0x173/0x250 [dlm] 
 [<c0136af3>] __alloc_pages+0x2f3/0x340 
 [<c026bd0c>] sock_recvmsg+0xbc/0xc0 
 [<f8c3c011>] receive_from_sock+0x141/0x310 [dlm] 
 [<c0117e67>] recalc_task_prio+0x97/0x190 
 [<f8c3cec7>] process_sockets+0x57/0x80 [dlm] 
 [<f8c3d13e>] dlm_recvd+0x9e/0xf0 [dlm] 
 [<f8c3d0a0>] dlm_recvd+0x0/0xf0 [dlm] 
 [<c010429d>] kernel_thread_helper+0x5/0x18 
 
 
 

Comment 11 David Teigland 2004-07-21 10:51:00 UTC
I reproduced this today and have checked in the fix.  A slight
modification was needed to the recent change that addressed
the data error.

Comment 12 Dean Jansa 2004-07-21 18:22:48 UTC
Did you check this into CVS?  I just built this morning and
hit:
 
GFS <CVS> (built Jul 21 2004 09:56:16) installed 
CMAN <CVS> (built Jul 21 2004 10:17:04) installed 
DLM <CVS> (built Jul 21 2004 10:17:17) installed 
Lock_DLM (built Jul 21 2004 10:27:08) installed 
 
tank-04: 
------------[ cut here ]------------ 
kernel BUG at cluster/dlm/locking.c:584! 
invalid operand: 0000 [#1] 
Modules linked in: gnbd lock_gulm lock_nolock lock_dlm dlm cman gfs 
lock_harness ipv6 parport_pc lp parport autofs4 sunrpc e1000 floppy 
sg microcode uhci_hcd ehci_hcd button battery asus_acpi ac ext3 jbd 
dm_mod qla2300 qla2xxx scsi_transport_fc sd_mod scsi_mod 
CPU:    0 
EIP:    0060:[<f8a70b9e>]    Not tainted 
EFLAGS: 00010282   (2.6.7) 
EIP is at remote_stage2+0x21e/0x240 [dlm] 
eax: 00000001   ebx: f509257c   ecx: 00000000   edx: f74b5e04 
esi: 00000004   edi: f7355000   ebp: f7f77738   esp: f74b5e00 
ds: 007b   es: 007b   ss: 0068 
Process dlm_recvd (pid: 3582, threadinfo=f74b4000 task=c23ff330) 
Stack: f8a7ff17 00000001 f8a7ff01 f8a7ff41 005f4d69 f50e2f39 
00000000 00000001 
       f50e2ec4 00000000 f7f77738 f7355000 00000001 f8a732e0 
f76eac44 f74b4000 
       00000001 00000000 000000b7 f76eaee0 000000b7 f76eadcc 
f74b5f94 00000000 
Call Trace: 
 [<f8a732e0>] process_cluster_request+0x160/0xd40 [dlm] 
 [<c02b0e98>] inet_recvmsg+0x48/0x70 
 [<c026bd0c>] sock_recvmsg+0xbc/0xc0 
 [<f8a77313>] midcomms_process_incoming_buffer+0x173/0x250 [dlm] 
 [<c0136af3>] __alloc_pages+0x2f3/0x340 
 [<c026bd0c>] sock_recvmsg+0xbc/0xc0 
 [<f8a75011>] receive_from_sock+0x141/0x310 [dlm] 
 [<c0117e67>] recalc_task_prio+0x97/0x190 
 [<f8a75ec7>] process_sockets+0x57/0x80 [dlm] 
 [<f8a7613e>] dlm_recvd+0x9e/0xf0 [dlm] 
 [<f8a760a0>] dlm_recvd+0x0/0xf0 [dlm] 
 [<c010429d>] kernel_thread_helper+0x5/0x18 
 
Code: 0f 0b 48 02 01 ff a7 f8 c7 04 24 c4 0f a8 f8 e8 3e a5 6a c7 
 <6>CMAN: killed by STARTTRANS or NOMINATE 
CMAN: we are leaving the cluster 
SM: send_nodeid_message error -107 to 1 
SM: send_nodeid_message error -107 to 4 
SM: send_nodeid_message error -107 to 3 
SM: send_broadcast_message error -107 
SM: send_broadcast_message error -107 
0 5,5 id 1f03da sts -65538 
un 11,d774630 id 13011c cur 5 0 
qc 11,d774630 5,5 id 13011c sts -65538 
ex punlock 3751 error 0 
en plock 3751 7,d774630 
lk 11,d774630 id 0 -1,5 0 
qc 11,d774630 -1,5 id 1c03ce sts 0 
req 7,d774630 ex 0-7fffffffffffffff 3751 w 1 
lk 7,d774630 id 0 -1,5 0 
un 11,d774630 id 1c03ce cur 5 0 
qc 11,d774630 5,5 id 1c03ce sts -65538 
lk 2,d774630 id 2c0133 5,3 45 
qc 2,d774630 5,3 id 2c0133 sts 0 
un 2,d774630 id 2c0133 cur 3 0 
qc 2,d774630 3,3 id 2c0133 sts -65538 
qc 7,d774630 -1,5 id 1e017d sts 0 
ex plock 3751 error 0 
lk 2,d774630 id 0 -1,3 0 
qc 2,d774630 -1,3 id 1d03ac sts 0 
lk 2,d774630 id 1d03ac 3,5 54 
qc 2,d774630 3,5 id 1d03ac sts 0 
en punlock 3751 7,d774630 
lk 11,d774630 id 0 -1,5 0 
qc 11,d774630 -1,5 id 15008a sts 0 
remove 7,d774630 3751 
un 7,d774630 id 1e017d cur 5 0 
qc 7,d774630 5,5 id 1e017d sts -65538 
un 11,d774630 id 15008a cur 5 0 
qc 11,d774630 5,5 id 15008a sts -65538 
ex punlock 3751 error 0 
lk 2,ac59459 id 0 -1,3 0 
lk 2,d774630 id 1d03ac 5,3 45 
qc 2,d774630 5,3 id 1d03ac sts 0 
qc 2,ac59459 -1,3 id 28024c sts 0 
en plock 3751 7,ac59459 
lk 11,ac59459 id 0 -1,5 0 
qc 11,ac59459 -1,5 id 1b0366 sts 0 
req 7,ac59459 ex 0-7fffffffffffffff 3751 w 1 
lk 7,ac59459 id 0 -1,5 0 
un 11,ac59459 id 1b0366 cur 5 0 
qc 11,ac59459 5,5 id 1b0366 sts -65538 
qc 7,ac59459 -1,5 id 230026 sts 0 
ex plock 3751 error 0 
lk 2,ac59459 id 28024c 3,5 54 
qc 2,ac59459 3,5 id 28024c sts 0 
en punlock 3751 7,ac59459 
lk 11,ac59459 id 0 -1,5 0 
qc 11,ac59459 -1,5 id 1e0078 sts 0 
remove 7,ac59459 3751 
un 7,ac59459 id 230026 cur 5 0 
qc 7,ac59459 5,5 id 230026 sts -65538 
un 11,ac59459 id 1e0078 cur 5 0 
qc 11,ac59459 5,5 id 1e0078 sts -65538 
ex punlock 3751 error 0 
en plock 3751 7,ac59459 
lk 11,ac59459 id 0 -1,5 0 
qc 11,ac59459 -1,5 id 2f02a1 sts 0 
req 7,ac59459 ex 0-7fffffffffffffff 3751 w 1 
lk 7,ac59459 id 0 -1,5 0 
un 11,ac59459 id 2f02a1 cur 5 0 
qc 11,ac59459 5,5 id 2f02a1 sts -65538 
qc 7,ac59459 -1,5 id 2001e9 sts 0 
ex plock 3751 error 0 
en punlock 3751 7,ac59459 
lk 11,ac59459 id 0 -1,5 0 
qc 11,ac59459 -1,5 id 1a01a0 sts 0 
remove 7,ac59459 3751 
un 7,ac59459 id 2001e9 cur 5 0 
qc 7,ac59459 5,5 id 2001e9 sts -65538 
un 11,ac59459 id 1a01a0 cur 5 0 
qc 11,ac59459 5,5 id 1a01a0 sts -65538 
ex punlock 3751 error 0 
en plock 3751 7,d774630 
lk 11,d774630 id 0 -1,5 0 
qc 11,d774630 -1,5 id 140110 sts 0 
req 7,d774630 ex 0-7fffffffffffffff 3751 w 1 
lk 7,d774630 id 0 -1,5 0 
un 11,d774630 id 140110 cur 5 0 
qc 11,d774630 5,5 id 140110 sts -65538 
un 2,d774630 id 1d03ac cur 3 0 
qc 2,d774630 3,3 id 1d03ac sts -65538 
lk 2,ac59459 id 28024c 5,3 45 
qc 2,ac59459 5,3 id 28024c sts 0 
un 2,ac59459 id 28024c cur 3 0 
qc 2,ac59459 3,3 id 28024c sts -65538 
qc 7,d774630 -1,5 id 210023 sts 0 
ex plock 3751 error 0 
lk 2,d774630 id 0 -1,3 0 
qc 2,d774630 -1,3 id 1a035d sts 0 
lk 2,d774630 id 1a035d 3,5 54 
qc 2,d774630 3,5 id 1a035d sts 0 
lk 2,d774630 id 1a035d 5,3 45 
qc 2,d774630 5,3 id 1a035d sts 0 
en punlock 3751 7,d774630 
lk 11,d774630 id 0 -1,5 0 
qc 11,d774630 -1,5 id 24004a sts 0 
remove 7,d774630 3751 
un 7,d774630 id 210023 cur 5 0 
qc 7,d774630 5,5 id 210023 sts -65538 
un 11,d774630 id 24004a cur 5 0 
qc 11,d774630 5,5 id 24004a sts -65538 
ex punlock 3751 error 0 
lk 2,ac59459 id 0 -1,3 0 
qc 2,ac59459 -1,3 id 180353 sts 0 
en plock 3751 7,ac59459 
lk 11,ac59459 id 0 -1,5 0 
qc 11,ac59459 -1,5 id 22024f sts 0 
req 7,ac59459 ex 0-7fffffffffffffff 3751 w 1 
lk 7,ac59459 id 0 -1,5 0 
un 11,ac59459 id 22024f cur 5 0 
qc 11,ac59459 5,5 id 22024f sts -65538 
qc 7,ac59459 -1,5 id 200385 sts 0 
ex plock 3751 error 0 
lk 2,ac59459 id 180353 3,5 54 
qc 2,ac59459 3,5 id 180353 sts 0 
en punlock 3751 7,ac59459 
lk 11,ac59459 id 0 -1,5 0 
qc 11,ac59459 -1,5 id 1b00cb sts 0 
remove 7,ac59459 3751 
un 7,ac59459 id 200385 cur 5 0 
qc 7,ac59459 5,5 id 200385 sts -65538 
un 11,ac59459 id 1b00cb cur 5 0 
qc 11,ac59459 5,5 id 1b00cb sts -65538 
ex punlock 3751 error 0 
en plock 3751 7,d774630 
lk 11,d774630 id 0 -1,5 0 
qc 11,d774630 -1,5 id 1a00de sts 0 
req 7,d774630 ex 0-7fffffffffffffff 3751 w 1 
lk 7,d774630 id 0 -1,5 0 
un 11,d774630 id 1a00de cur 5 0 
qc 11,d774630 5,5 id 1a00de sts -65538 
lk 8,3e8 id 170285 3,5 5c 
 
lock_dlm:  Assertion failed on line 388 of file 
/usr/src/cluster/gfs-kernel/src/dlm/lock.c 
lock_dlm:  assertion:  "!error" 
lock_dlm:  time = 6251402 
gfs0: num=8,3e8 err=-22 cur=3 req=5 lkf=5c 
 
Kernel panic: lock_dlm:  Record message above and reboot. 
 
 
tank-06: 
------------[ cut here ]------------ 
kernel BUG at cluster/dlm/locking.c:584! 
invalid operand: 0000 [#1] 
Modules linked in: gnbd lock_gulm lock_nolock lock_dlm dlm cman gfs 
lock_harness ipv6 parport_pc lp parport autofs4 sunrpc e1000 floppy 
sg microcode uhci_hcd ehci_hcd button battery asus_acpi ac ext3 jbd 
dm_mod qla2300 qla2xxx scsi_transport_fc sd_mod scsi_mod 
CPU:    0 
EIP:    0060:[<e02f6b9e>]    Not tainted 
EFLAGS: 00010282   (2.6.7) 
EIP is at remote_stage2+0x21e/0x240 [dlm] 
eax: 00000001   ebx: d80081e0   ecx: 00000000   edx: da37be04 
esi: 00000004   edi: dd3dd000   ebp: daf06d38   esp: da37be00 
ds: 007b   es: 007b   ss: 0068 
Process dlm_recvd (pid: 3597, threadinfo=da37a000 task=da33f3b0) 
Stack: e0305f17 00000006 e0305f01 e0305f41 006204c7 d80198dd 
00000000 00000006 
       d8019868 00000000 daf06d38 dd3dd000 00000006 e02f92e0 
da46fc44 da37a000 
       00000001 00000000 00000067 da46fee0 00000067 da46fdcc 
da37bf94 00000000 
Call Trace: 
 [<e02f92e0>] process_cluster_request+0x160/0xd40 [dlm] 
 [<c02b0e98>] inet_recvmsg+0x48/0x70 
 [<c026bd0c>] sock_recvmsg+0xbc/0xc0 
 [<e02fd313>] midcomms_process_incoming_buffer+0x173/0x250 [dlm] 
 [<e00bba68>] scsi_softirq+0xa8/0xd0 [scsi_mod] 
 [<c0136af3>] __alloc_pages+0x2f3/0x340 
 [<c026bd0c>] sock_recvmsg+0xbc/0xc0 
 [<e02fb011>] receive_from_sock+0x141/0x310 [dlm] 
 [<c0117e67>] recalc_task_prio+0x97/0x190 
 [<e02fbec7>] process_sockets+0x57/0x80 [dlm] 
 [<e02fc13e>] dlm_recvd+0x9e/0xf0 [dlm] 
 [<e02fc0a0>] dlm_recvd+0x0/0xf0 [dlm] 
 [<c010429d>] kernel_thread_helper+0x5/0x18 
 
Code: 0f 0b 48 02 01 5f 30 e0 c7 04 24 c4 6f 30 e0 e8 3e 45 e2 df 
 <6>CMAN: killed by STARTTRANS or NOMINATE 
CMAN: we are leaving the cluster 
SM: send_nodeid_message error -107 to 4 
SM: send_nodeid_message error -107 to 3 
SM: send_nodeid_message error -107 to 6 
SM: send_broadcast_message error -107 
SM: send_broadcast_message error -107 
SM: send_broadcast_message error -107 
id 3c0188 sts 0 
req 7,d77462f ex 0-7fffffffffffffff 3768 w 1 
lk 7,d77462f id 0 -1,5 0 
un 11,d77462f id 3c0188 cur 5 0 
qc 7,d77462f -1,5 id 4000a2 sts 0 
qc 11,d77462f 5,5 id 3c0188 sts -65538 
ex plock 3768 error 0 
lk 2,d77462f id 4b0396 3,5 54 
qc 2,d77462f 3,5 id 4b0396 sts 0 
en punlock 3768 7,d77462f 
lk 11,d77462f id 0 -1,5 0 
qc 11,d77462f -1,5 id 4d0395 sts 0 
remove 7,d77462f 3768 
un 7,d77462f id 4000a2 cur 5 0 
qc 7,d77462f 5,5 id 4000a2 sts -65538 
un 11,d77462f id 4d0395 cur 5 0 
qc 11,d77462f 5,5 id 4d0395 sts -65538 
ex punlock 3768 error 0 
lk 2,ac59459 id 0 -1,3 0 
qc 2,ac59459 -1,3 id 3a0133 sts 0 
un 2,ac59459 id 3a0133 cur 3 0 
qc 2,ac59459 3,3 id 3a0133 sts -65538 
lk 2,ac59459 id 0 -1,3 0 
qc 2,ac59459 -1,3 id 3f002a sts 0 
en plock 3768 7,ac59459 
lk 11,ac59459 id 0 -1,5 0 
qc 11,ac59459 -1,5 id 3c027c sts 0 
req 7,ac59459 ex 0-7fffffffffffffff 3768 w 1 
lk 7,ac59459 id 0 -1,5 0 
un 11,ac59459 id 3c027c cur 5 0 
qc 7,ac59459 -1,5 id 51012d sts 0 
qc 11,ac59459 5,5 id 3c027c sts -65538 
ex plock 3768 error 0 
lk 2,ac59459 id 3f002a 3,5 54 
qc 2,ac59459 3,5 id 3f002a sts 0 
lk 2,d77462f id 4b0396 5,3 45 
qc 2,d77462f 5,3 id 4b0396 sts 0 
en punlock 3768 7,ac59459 
lk 11,ac59459 id 0 -1,5 0 
qc 11,ac59459 -1,5 id 4a01e3 sts 0 
remove 7,ac59459 3768 
un 7,ac59459 id 51012d cur 5 0 
qc 7,ac59459 5,5 id 51012d sts -65538 
un 11,ac59459 id 4a01e3 cur 5 0 
qc 11,ac59459 5,5 id 4a01e3 sts -65538 
ex punlock 3768 error 0 
en plock 3768 7,d77462f 
lk 11,d77462f id 0 -1,5 0 
qc 11,d77462f -1,5 id 4803b5 sts 0 
req 7,d77462f ex 0-7fffffffffffffff 3768 w 1 
lk 7,d77462f id 0 -1,5 0 
un 11,d77462f id 4803b5 cur 5 0 
qc 7,d77462f -1,5 id 3f00a5 sts 0 
qc 11,d77462f 5,5 id 4803b5 sts -65538 
ex plock 3768 error 0 
lk 2,d77462f id 4b0396 3,5 54 
qc 2,d77462f 3,5 id 4b0396 sts 0 
en punlock 3768 7,d77462f 
lk 11,d77462f id 0 -1,5 0 
lk 2,d77462f id 4b0396 5,3 45 
qc 2,d77462f 5,3 id 4b0396 sts 0 
qc 11,d77462f -1,5 id 4a035d sts 0 
remove 7,d77462f 3768 
un 7,d77462f id 3f00a5 cur 5 0 
qc 7,d77462f 5,5 id 3f00a5 sts -65538 
un 11,d77462f id 4a035d cur 5 0 
qc 11,d77462f 5,5 id 4a035d sts -65538 
ex punlock 3768 error 0 
en plock 3768 7,d77462f 
lk 11,d77462f id 0 -1,5 0 
qc 11,d77462f -1,5 id 390166 sts 0 
req 7,d77462f ex 0-7fffffffffffffff 3768 w 1 
lk 7,d77462f id 0 -1,5 0 
un 11,d77462f id 390166 cur 5 0 
qc 11,d77462f 5,5 id 390166 sts -65538 
un 2,d77462f id 4b0396 cur 3 0 
qc 2,d77462f 3,3 id 4b0396 sts -65538 
qc 7,d77462f -1,5 id 3f0165 sts 0 
ex plock 3768 error 0 
lk 2,d77462f id 0 -1,3 0 
qc 2,d77462f -1,3 id 460156 sts 0 
lk 2,d77462f id 460156 3,5 54 
qc 2,d77462f 3,5 id 460156 sts 0 
lk 2,ac59459 id 3f002a 5,3 45 
qc 2,ac59459 5,3 id 3f002a sts 0 
un 2,ac59459 id 3f002a cur 3 0 
qc 2,ac59459 3,3 id 3f002a sts -65538 
lk 2,d77462f id 460156 5,3 45 
qc 2,d77462f 5,3 id 460156 sts 0 
en punlock 3768 7,d77462f 
lk 11,d77462f id 0 -1,5 0 
qc 11,d77462f -1,5 id 490292 sts 0 
remove 7,d77462f 3768 
un 7,d77462f id 3f0165 cur 5 0 
qc 7,d77462f 5,5 id 3f0165 sts -65538 
un 11,d77462f id 490292 cur 5 0 
qc 11,d77462f 5,5 id 490292 sts -65538 
ex punlock 3768 error 0 
lk 2,d774630 id 0 -1,3 0 
un 2,d77462f id 460156 cur 3 0 
qc 2,d77462f 3,3 id 460156 sts -65538 
qc 2,d774630 -1,3 id 370067 sts 0 
en plock 3768 7,d774630 
lk 11,d774630 id 0 -1,5 0 
qc 11,d774630 -1,5 id 47023d sts 0 
req 7,d774630 ex 0-7fffffffffffffff 3768 w 1 
lk 7,d774630 id 0 -1,5 0 
un 11,d774630 id 47023d cur 5 0 
qc 7,d774630 -1,5 id 3e031c sts 0 
qc 11,d774630 5,5 id 47023d sts -65538 
ex plock 3768 error 0 
lk 2,d774630 id 370067 3,5 54 
qc 2,d774630 3,5 id 370067 sts 0 
en punlock 3768 7,d774630 
lk 11,d774630 id 0 -1,5 0 
qc 11,d774630 -1,5 id 4903a0 sts 0 
remove 7,d774630 3768 
un 7,d774630 id 3e031c cur 5 0 
qc 7,d774630 5,5 id 3e031c sts -65538 
un 11,d774630 id 4903a0 cur 5 0 
qc 11,d774630 5,5 id 4903a0 sts -65538 
ex punlock 3768 error 0 
lk 2,d77462f id 0 -1,3 0 
qc 2,d77462f -1,3 id 4b0227 sts 0 
en plock 3768 7,d77462f 
lk 11,d77462f id 0 -1,5 0 
qc 11,d77462f -1,5 id 48008c sts 0 
req 7,d77462f ex 0-7fffffffffffffff 3768 w 1 
lk 7,d77462f id 0 -1,5 0 
un 11,d77462f id 48008c cur 5 0 
qc 11,d77462f 5,5 id 48008c sts -65538 
lk 8,3e8 id 4803ac 3,5 5c 
 
lock_dlm:  Assertion failed on line 388 of file 
/usr/src/cluster/gfs-kernel/src/dlm/lock.c 
lock_dlm:  assertion:  "!error" 
lock_dlm:  time = 6429423 
gfs0: num=8,3e8 err=-22 cur=3 req=5 lkf=5c 
 
Kernel panic: lock_dlm:  Record message above and reboot. 
 
 

Comment 13 Dean Jansa 2004-07-21 19:21:08 UTC
Scratch the above -- I had a build error in the cluster tree.
The modules were installed with the new timestamp, but were not
rebuilt correctly.
 
Will re-verify once I get the build error figured out... 

Comment 14 Dean Jansa 2004-07-22 14:19:38 UTC
I ran overnight without hitting any assertions... 
 

