Description of problem:
While running accordion I hit file corruption, using both flocks and fcntl locks.

With fcntl locks, the following command was run on all 6 nodes in my cluster (in the gfs fs):

accordion -s 409600 -e 4096 -m 100 accfile1 accfile2 acccfile3
accordion starting:
  Iterations: 0
  Run time: 0s
  Lock type: fcntl
  File size: 409600
  Extend size: 4096
  Random truncate: No
  Use lseek: No
  Random seed: 4270
  Filelist:
  ----------------------------------------------------
  /mnt/gfs0/accfile1
  /mnt/gfs0/accfile2
  /mnt/gfs0/acccfile3
accordion (4270) completed 100 operations - 5.79 ops/sec
accordion (4270) completed 100 operations - 5.22 ops/sec
accordion (4270) completed 100 operations - 5.87 ops/sec
accordion (4270) completed 100 operations - 6.86 ops/sec
*** DATA COMPARISON ERROR accfile1 ***
Corrupt regions follow - unprintable chars are represented as '.'
-----------------------------------------------------------------
corrupt bytes starting at file offset 4
    1st 32 expected bytes:  70:tank-06:accordion*W:4270:tank
    1st 32 actual bytes:    50:tank-04:accordion*W:4250:tank

With flocks:

accordion -L flock -s 409600 -e 4096 -m 100 accfile1 accfile2 acccfile3
accordion starting:
  Iterations: 0
  Run time: 0s
  Lock type: flock
  File size: 409600
  Extend size: 4096
  Random truncate: No
  Use lseek: No
  Random seed: 4327
  Filelist:
  ----------------------------------------------------
  /mnt/gfs0/accfile1
  /mnt/gfs0/accfile2
  /mnt/gfs0/acccfile3
accordion (4327) completed 100 operations - 20.62 ops/sec
accordion (4327) completed 100 operations - 16.69 ops/sec
accordion (4327) completed 100 operations - 17.01 ops/sec
accordion (4327) completed 100 operations - 17.24 ops/sec
accordion (4327) completed 100 operations - 15.50 ops/sec
*** DATA COMPARISON ERROR accfile1 ***
Corrupt regions follow - unprintable chars are represented as '.'
-----------------------------------------------------------------
corrupt bytes starting at file offset 3
    1st 32 expected bytes:  327:tank-01:accordion*W:4327:tan
    1st 32 actual bytes:    246:tank-04:accordion*W:4246:tan
child (4327) exited with status 1

Version-Release number of selected component (if applicable):
GFS <CVS> (built Jun 17 2004 10:53:57) installed

How reproducible:
Always

Steps to Reproduce:
1. Use the command lines above. To reuse the same random seed, take the seed from the output and pass it with the -S flag. This is not needed; you hit the corruption without it as well.

Expected Results:
accordion -- Open a file, lock it, optionally truncate the file, write a chunk of data to the end of the file, check the write, unlock, close. Optionally use lseek to extend the file, writing only a single byte at the end, then check the write, unlock and close. The file will never grow larger than the requested size. If extending the file would make it grow larger than the requested size, truncate the file and start over.

Additional info:
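For readers who do not have accordion at hand, the cycle described under Expected Results can be sketched roughly as below. This is not accordion's source. The function names `lock_whole_file` and `accordion_cycle`, the whole-file fcntl lock, and the single fill byte are all assumptions made for illustration (the real tool writes a tagged pattern such as `tank-06:accordion*W:4270`).

```c
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: take (F_WRLCK) or drop (F_UNLCK) a blocking
 * fcntl lock that covers the whole file. */
static int lock_whole_file(int fd, short type)
{
    struct flock fl;

    memset(&fl, 0, sizeof(fl));
    fl.l_type = type;
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0; /* 0 = through end of file, however far it grows */
    return fcntl(fd, F_SETLKW, &fl);
}

/* Hypothetical single accordion-style cycle: open, lock, append `extend`
 * bytes (truncating first if the file would exceed `max_size`), read the
 * chunk back and compare, unlock, close.
 * Returns 0 if the data read back matches, -1 on error or mismatch. */
int accordion_cycle(const char *path, off_t max_size, size_t extend, char fill)
{
    int rc = -1;
    int fd = -1;
    char *wbuf = NULL, *rbuf = NULL;
    struct stat st;

    assert(path != NULL);
    if (extend == 0 || (off_t)extend > max_size)
        return -1;

    wbuf = malloc(extend);
    rbuf = malloc(extend);
    fd = open(path, O_RDWR | O_CREAT, 0644);
    if (!wbuf || !rbuf || fd < 0)
        goto out;
    memset(wbuf, fill, extend);

    if (lock_whole_file(fd, F_WRLCK) < 0)
        goto out;

    if (fstat(fd, &st) == 0) {
        off_t end = st.st_size;

        /* Growing past the requested size: truncate and start over. */
        if (end + (off_t)extend > max_size)
            end = (ftruncate(fd, 0) == 0) ? 0 : -1;

        /* Write the chunk at end of file, then verify it under the lock. */
        if (end >= 0 &&
            pwrite(fd, wbuf, extend, end) == (ssize_t)extend &&
            pread(fd, rbuf, extend, end) == (ssize_t)extend &&
            memcmp(wbuf, rbuf, extend) == 0)
            rc = 0;
    }

    lock_whole_file(fd, F_UNLCK);
out:
    if (fd >= 0)
        close(fd);
    free(wbuf);
    free(rbuf);
    return rc;
}
```

Called in a loop from several nodes against the same file, for example `accordion_cycle("/mnt/gfs0/accfile1", 409600, 4096, 'W')`, a -1 return caused by the compare step would correspond to the DATA COMPARISON ERROR shown above, since the lock is supposed to keep other writers out between the write and the read.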
Ok, this is 2.6, I presume? Exactly which kernel?
2.6.7

GFS <CVS> (built Jun 17 2004 10:53:57) installed
CMAN V2.0.1 (built Jun 17 2004 11:14:22) installed
DLM (built Jun 17 2004 11:14:35) installed
Lock_DLM (built Jun 17 2004 10:54:06) installed
Lock_Nolock <CVS> (built Jun 17 2004 10:54:17) installed
Gulm v6.0.0 (built Jun 17 2004 10:54:14) installed
Ken, FWIW I can hit this with iogen/doio as well. I'm working on narrowing down the needed syscalls; I'll update if I can get it to trip on anything less than the list below. (I should note that I have reproduced it with a single file on the iogen line as well.)

[root@tank-06 gfs0]# iogen -o -m random -s read,write,readv,writev -t 1b -T1000b 10000b:tfile1 10000b:tfile2 10000b:tfile3 | doio -avk

iogen starting up with the following:

Out-pipe:              stdout
Iterations:            Infinite
Seed:                  4728
Offset-Mode:           random
Overlap Flag:          on
Mintrans:              512 (1 blocks)
Maxtrans:              512000 (1000 blocks)
O_RAW/O_SSD Multiple:  (Determined by device)
Syscalls:              read write readv writev
Aio completion types:  none
Flags:                 buffered sync

Test Files:

Path                 Length     iou       raw iou   file
                     (bytes)    (bytes)   (bytes)   type
-----------------------------------------------------------------------------
/mnt/gfs0/tfile1     5120000    1         512       regular
/mnt/gfs0/tfile2     5120000    1         512       regular
/mnt/gfs0/tfile3     5120000    1         512       regular

doio ( 4729) 10:55:39
---------------------
*** DATA COMPARISON ERROR ***
check_file(/mnt/gfs0/tfile2, 4825886, 100421, M:4729:tank-06:doio*, 20, 0) failed

Comparison fd is 3, with open flags 0
Corrupt regions follow - unprintable chars are represented as '.'
-----------------------------------------------------------------
corrupt bytes starting at file offset 4825886
    1st 32 expected bytes:  M:4729:tank-06:doio*M:4729:tank-
    1st 32 actual bytes:    :doio*K:4710:tank-05:doio*K:4710
Request number 3908
fd 12 is file /mnt/gfs0/tfile2 - open flags are 010001 O_WRONLY,O_SYNC,
write done at file offset 4825886 - pattern is M (0115)
number of requests is 1, strides per request is 1
i/o byte count = 100421
memory alignment is unaligned

syscall:  writev(12, (iov on stack), 1)
I wrote a simpler test which reproduced the same effect (using flock running on 4 nodes). It's on homer ~teigland/writeread.c

In short it's:

    for (;;) {
        sprintf(wbuf, "%s.%u", hostname, i);

        while (lock_file(fd, lock_type) < 0)
            ;

        lseek(fd, 0, SEEK_SET);
        write(fd, wbuf, len);

        lseek(fd, 0, SEEK_SET);
        read(fd, rbuf, len);

        if (memcmp(wbuf, rbuf, len))
            die("memcmp error\n write: %s\n read %s\n", wbuf, rbuf);

        unlock_file(fd, lock_type);
    }

The problem with this test is that it often causes one of two different panics before running long enough to see a memcmp error. The first panic is in the dlm (process_asts) and the second is in gfs (1879 of glock.c). Who knows how related any of these might be. This is the first reliable way I've found to trigger the dlm ast bug so I'll work on that first. Maybe that'll fix the gfs assertion and just maybe the write/read mismatch.
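For anyone without access to homer, a self-contained approximation of that loop might look like the following. It is a sketch, not the original writeread.c: the function name `write_read_check`, the bounded iteration count, and the use of flock(2) directly in place of `lock_file()`/`unlock_file()` are assumptions.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/file.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical stand-in for the loop above: under an exclusive flock,
 * write "<tag>.<i>" at offset 0, read it back, and compare.
 * Returns 0 if every iteration matched, -1 on I/O error or mismatch. */
int write_read_check(const char *path, const char *tag, unsigned iterations)
{
    char wbuf[128], rbuf[128];
    int rc = 0;
    int fd;

    assert(path != NULL && tag != NULL);

    fd = open(path, O_RDWR | O_CREAT, 0644);
    if (fd < 0)
        return -1;

    for (unsigned i = 0; i < iterations && rc == 0; i++) {
        int n = snprintf(wbuf, sizeof(wbuf), "%s.%u", tag, i);
        size_t len;

        if (n <= 0 || (size_t)n >= sizeof(wbuf)) {
            rc = -1; /* tag too long for the buffer */
            break;
        }
        len = (size_t)n;

        /* Retry only if the lock call was interrupted by a signal. */
        while (flock(fd, LOCK_EX) < 0) {
            if (errno != EINTR) {
                close(fd);
                return -1;
            }
        }

        if (lseek(fd, 0, SEEK_SET) < 0 ||
            write(fd, wbuf, len) != (ssize_t)len ||
            lseek(fd, 0, SEEK_SET) < 0 ||
            read(fd, rbuf, len) != (ssize_t)len) {
            rc = -1;
        } else if (memcmp(wbuf, rbuf, len) != 0) {
            /* Someone else's data showed up while we held the lock. */
            fprintf(stderr, "memcmp error\n write: %.*s\n read:  %.*s\n",
                    (int)len, wbuf, (int)len, rbuf);
            rc = -1;
        }

        flock(fd, LOCK_UN);
    }

    close(fd);
    return rc;
}
```

Run on each node with the node's hostname as the tag and a shared path on the gfs mount, e.g. `write_read_check("/mnt/gfs0/wrfile", "tank-06", 100000)`. On a local filesystem it should always return 0; a mismatch on gfs would point at the same lock/cache coherency problem as the accordion failure.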
I just realized I've been doing all this testing without the flock patch applied to my kernel (and I've been using flocks not plocks).
The last comment was wrong. The flock patch is 00001 so I am using it.
After much head scratching and work, Dave and I think we've fixed this properly now.
I attempted to check this fix out, and hit the following while running:

accordion -s 409600 -e 4096 -m 100 accfile1 accfile2 acccfile3

tank-01:
Jul 19 16:47:06 tank-01 kernel: ------------[ cut here ]------------
Jul 19 16:47:06 tank-01 kernel: kernel BUG at cluster/dlm/locking.c:584!
Jul 19 16:47:06 tank-01 kernel: invalid operand: 0000 [#1]
Jul 19 16:47:06 tank-01 kernel: Modules linked in: gnbd lock_gulm lock_nolock lock_dlm dlm cman gfs lock_harness ipv6 parport_pc lp parport autofs4 sunrpc e1000 floppy sg microcode dm_mod uhci_hcd ehci_hcd button battery asus_acpi ac ext3 jbd qla2300 qla2xxx scsi_transport_fc sd_mod scsi_mod
Jul 19 16:47:06 tank-01 kernel: CPU: 0
Jul 19 16:47:06 tank-01 kernel: EIP: 0060:[<f8a72b9e>] Not tainted
Jul 19 16:47:06 tank-01 kernel: EFLAGS: 00010282 (2.6.7)
Jul 19 16:47:06 tank-01 kernel: EIP is at remote_stage2+0x21e/0x240 [dlm]
Jul 19 16:47:06 tank-01 kernel: eax: 00000001 ebx: f2566aa4 ecx: 00000000 edx: f7675e04
Jul 19 16:47:06 tank-01 kernel: esi: 00000001 edi: f599b000 ebp: c335d238 esp: f7675e00
Jul 19 16:47:06 tank-01 kernel: ds: 007b es: 007b ss: 0068
Jul 19 16:47:06 tank-01 kernel: Process dlm_recvd (pid: 3788, threadinfo=f7674000 task=f76790b0)
Jul 19 16:47:06 tank-01 kernel: Stack: f8a81f17 00000006 f8a81f01 f8a81f41 0044129b f241cf39 00000000 00000006
Jul 19 16:47:06 tank-01 kernel:        f241cec4 00000000 c335d238 f599b000 00000006 f8a752e0 f76b1c44 f7674000
Jul 19 16:47:06 tank-01 kernel:        00000001 00000000 00000067 f76b1ee0 00000067 f76b1dcc f7675f94 00000000
Jul 19 16:47:06 tank-01 kernel: Call Trace:
Jul 19 16:47:06 tank-01 kernel: [<f8a752e0>] process_cluster_request+0x160/0xd40 [dlm]
Jul 19 16:47:06 tank-01 kernel: [<c02b0e98>] inet_recvmsg+0x48/0x70
Jul 19 16:47:06 tank-01 kernel: [<c026bd0c>] sock_recvmsg+0xbc/0xc0
Jul 19 16:47:06 tank-01 kernel: [<f8a79313>] midcomms_process_incoming_buffer+0x173/0x250 [dlm]
Jul 19 16:47:06 tank-01 kernel: [<c0136af3>] __alloc_pages+0x2f3/0x340
Jul 19 16:47:06 tank-01 kernel: [<c026bd0c>] sock_recvmsg+0xbc/0xc0
Jul 19 16:47:06 tank-01 kernel: [<f8a77011>] receive_from_sock+0x141/0x310 [dlm]
Jul 19 16:47:06 tank-01 kernel: [<c0117e67>] recalc_task_prio+0x97/0x190
Jul 19 16:47:06 tank-01 kernel: [<f8a77ec7>] process_sockets+0x57/0x80 [dlm]
Jul 19 16:47:06 tank-01 kernel: [<f8a7813e>] dlm_recvd+0x9e/0xf0 [dlm]
Jul 19 16:47:06 tank-01 kernel: [<f8a780a0>] dlm_recvd+0x0/0xf0 [dlm]
Jul 19 16:47:06 tank-01 kernel: [<c010429d>] kernel_thread_helper+0x5/0x18
Jul 19 16:47:06 tank-01 kernel:
Jul 19 16:47:06 tank-01 kernel: Code: 0f 0b 48 02 01 1f a8 f8 c7 04 24 c4 2f a8 f8 e8 3e 85 6a c7
Jul 19 16:47:06 tank-01 kernel: <6>CMAN: bad generation number 14 in HELLO message, expected 10
Jul 19 16:47:06 tank-01 kernel: CMAN: bad generation number 15 in HELLO message, expected 10

tank-02:
CMAN: quorum lost, blocking activity
dlm: got connection from 4

tank-03:
------------[ cut here ]------------
kernel BUG at cluster/dlm/locking.c:584!
invalid operand: 0000 [#1]
Modules linked in: gnbd lock_gulm lock_nolock lock_dlm dlm cman gfs lock_harness ipv6 parport_pc lp parport autofs4 sunrpc e1000 floppy sg microcode dm_mod uhci_hcd ehci_hcd button battery asus_acpi ac ext3 jbd qla2300 qla2xxx scsi_transport_fc sd_mod scsi_mod
CPU: 0
EIP: 0060:[<f8a72b9e>] Not tainted
EFLAGS: 00010282 (2.6.7)
EIP is at remote_stage2+0x21e/0x240 [dlm]
eax: 00000001 ebx: f57f357c ecx: 00000000 edx: f6b25e04
esi: 00000001 edi: f7643000 ebp: f6dfcd38 esp: f6b25e00
ds: 007b es: 007b ss: 0068
Process dlm_recvd (pid: 3712, threadinfo=f6b24000 task=f6b29430)
Stack: f8a81f17 00000002 f8a81f01 f8a81f41 004b1232 f57f7bc1 00000000 00000002
       f57f7b4c 00000000 f6dfcd38 f7643000 00000002 f8a752e0 f6b50044 f6b24000
       00000001 00000000 000000b7 f6b502e0 000000b7 f6b501cc f6b25f94 00000000
Call Trace:
 [<f8a752e0>] process_cluster_request+0x160/0xd40 [dlm]
 [<c02b0e98>] inet_recvmsg+0x48/0x70
 [<c026bd0c>] sock_recvmsg+0xbc/0xc0
 [<f8a79313>] midcomms_process_incoming_buffer+0x173/0x250 [dlm]
 [<c0136af3>] __alloc_pages+0x2f3/0x340
 [<c026bd0c>] sock_recvmsg+0xbc/0xc0
 [<f8a77011>] receive_from_sock+0x141/0x310 [dlm]
 [<c0117e67>] recalc_task_prio+0x97/0x190
 [<f8a77ec7>] process_sockets+0x57/0x80 [dlm]
 [<f8a7813e>] dlm_recvd+0x9e/0xf0 [dlm]
 [<f8a780a0>] dlm_recvd+0x0/0xf0 [dlm]
 [<c010429d>] kernel_thread_helper+0x5/0x18

Code: 0f 0b 48 02 01 1f a8 f8 c7 04 24 c4 2f a8 f8 e8 3e 85 6a c7
<4>CMAN: no HELLO from tank-02.lab.msp.redhat.com, removing from the cluster

tank-04:
------------[ cut here ]------------
kernel BUG at kernel/timer.c:405!
invalid operand: 0000 [#1]
Modules linked in: gnbd lock_gulm lock_nolock lock_dlm dlm cman gfs lock_harness ipv6 parport_pc lp parport autofs4 sunrpc e1000 floppy sg microcode dm_mod uhci_hcd ehci_hcd button battery asus_acpi ac ext3 jbd qla2300 qla2xxx scsi_transport_fc sd_mod scsi_mod
CPU: 0
EIP: 0060:[<c0121b10>] Not tainted
EFLAGS: 00010006 (2.6.7)
EIP is at cascade+0x40/0x50
eax: f5a65e10 ebx: c03b5a28 ecx: c03b5a28 edx: c03b5a28
esi: c03b59f8 edi: c03b5180 ebp: 0000000e esp: c0367f40
ds: 007b es: 007b ss: 0068
Process swapper (pid: 0, threadinfo=c0366000 task=c0312a40)
Stack: 00000000 c03b4ea8 c0367f54 c0367f54 c01220d1 c0367f54 c0367f54 c0122217
       00000001 c03b4ea8 0000000a c0314e24 c011e809 00000046 c0364a00 00000000
       c011e837 00000000 c01077c5 00000000 c0367fac c0314e24 c0366000 00099100
Call Trace:
 [<c01220d1>] run_timer_softirq+0xe1/0x150
 [<c0122217>] do_timer+0xc7/0xd0
 [<c011e809>] __do_softirq+0x79/0x80
 [<c011e837>] do_softirq+0x27/0x30
 [<c01077c5>] do_IRQ+0xd5/0x110
 [<c0105e6c>] common_interrupt+0x18/0x20
 [<c0104053>] default_idle+0x23/0x40
 [<c01040e4>] cpu_idle+0x34/0x40
 [<c03685e2>] start_kernel+0x162/0x1a0
 [<c0368330>] unknown_bootoption+0x0/0x120

Code: 0f 0b 95 01 ea e4 2d c0 eb dd 8d b6 00 00 00 00 56 53 83 ec
<0>Kernel panic: Fatal exception in interrupt
In interrupt handler - not syncing

tank-05:
CMAN: node tank-03.lab.msp.redhat.com is not responding - removing from the cluster
CMAN: quorum lost, blocking activity

tank-06:
CMAN: quorum lost, blocking activity
Re-ran the same test to see if the above was reproducible:

tank-02:
------------[ cut here ]------------
kernel BUG at cluster/dlm/locking.c:584!
invalid operand: 0000 [#1]
Modules linked in: gfs lock_dlm dlm cman lock_harness ipv6 parport_pc lp parport autofs4 sunrpc e1000 floppy sg microcode dm_mod uhci_hcd ehci_hcd button battery asus_acpi ac ext3 jbd qla2300 qla2xxx scsi_transport_fc sd_mod scsi_mod
CPU: 0
EIP: 0060:[<f8c37b9e>] Not tainted
EFLAGS: 00010282 (2.6.7)
EIP is at remote_stage2+0x21e/0x240 [dlm]
eax: 00000001 ebx: f4c8ee40 ecx: 00000000 edx: f7713e04
esi: 00000004 edi: f7589000 ebp: f7cd2a38 esp: f7713e00
ds: 007b es: 007b ss: 0068
Process dlm_recvd (pid: 2317, threadinfo=f7712000 task=f77b57b0)
Stack: f8c46f17 00000006 f8c46f01 f8c46f41 000527f5 f4cae315 00000000 00000006
       f4cae2a0 00000000 f7cd2a38 f7589000 00000006 f8c3a2e0 f755bc44 f7712000
       00000001 00000000 000000b7 f755bee0 000000b7 f755bdcc f7713f94 00000000
Call Trace:
 [<f8c3a2e0>] process_cluster_request+0x160/0xd40 [dlm]
 [<c02b0e98>] inet_recvmsg+0x48/0x70
 [<c026bd0c>] sock_recvmsg+0xbc/0xc0
 [<f8c3e313>] midcomms_process_incoming_buffer+0x173/0x250 [dlm]
 [<c0136af3>] __alloc_pages+0x2f3/0x340
 [<c026bd0c>] sock_recvmsg+0xbc/0xc0
 [<f8c3c011>] receive_from_sock+0x141/0x310 [dlm]
 [<c0117e67>] recalc_task_prio+0x97/0x190
 [<f8c3cec7>] process_sockets+0x57/0x80 [dlm]
 [<f8c3d13e>] dlm_recvd+0x9e/0xf0 [dlm]
 [<f8c3d0a0>] dlm_recvd+0x0/0xf0 [dlm]
 [<c010429d>] kernel_thread_helper+0x5/0x18

Code: 0f 0b 48 02 01 6f c4 f8 c7 04 24 c4 7f c4 f8 e8 3e 35 4e c7
FWIW - I re-ran using flocks: accordion -L flock -s 409600 -e 4096 -m 100 accfile1 accfile2 accfile3 tank-04: name " 3 d7d457c" flags 0 nodeid 6 ref 1 grant queue 000403e8 gr 0 rq -1 flg 8 sts 2 node 6 remid 301a8 lq 0,c name " 4 0" flags 0 nodeid 1 ref 1 grant queue 00010184 gr 3 rq -1 flg 0 sts 2 node 1 remid 102c5 lq 0,0 name " 3 5706894" flags 4 nodeid 0 ref 2 grant queue 00020041 gr 5 rq -1 flg 8 sts 2 node 0 remid 0 lq 0,1c 0004006e gr 0 rq -1 flg 2008 sts 2 node 1 remid 9027e lq 0,c name " 8 3e8" flags 4 nodeid 0 ref 6 grant queue 000403ac gr 3 rq -1 flg 2008 sts 2 node 4 remid 5006c lq 0,1c 00030117 gr 3 rq -1 flg 2008 sts 2 node 1 remid 202df lq 0,1c 000203be gr 3 rq -1 flg 8 sts 2 node 0 remid 0 lq 0,1c 0001032b gr 3 rq -1 flg 2008 sts 2 node 2 remid 1001a lq 0,1c 000102b2 gr 3 rq -1 flg 2008 sts 2 node 6 remid 10319 lq 0,1c 000600f0 gr 3 rq -1 flg 2008 sts 2 node 5 remid 40113 lq 0,1c name " 3 ad99210" flags 0 nodeid 5 ref 1 grant queue 0005026f gr 5 rq -1 flg 8 sts 2 node 5 remid 50247 lq 0,8 name " 3 d7a45d3" flags 0 nodeid 6 ref 1 grant queue 000200df gr 0 rq -1 flg 8 sts 2 node 6 remid 30122 lq 0,c name " 3 acc9389" flags 0 nodeid 5 ref 1 grant queue 00020058 gr 0 rq -1 flg 8 sts 2 node 5 remid 302ab lq 0,c name " 3 572685a" flags 4 nodeid 0 ref 1 grant queue 0006021a gr 5 rq -1 flg 8 sts 2 node 0 remid 0 lq 0,9 name " 1 1" flags 0 nodeid 1 ref 1 grant queue 0001022f gr 3 rq -1 flg 0 sts 2 node 1 remid 1028f lq 0,0 name " 10 2" flags 4 nodeid 0 ref 1 grant queue 00010034 gr 3 rq -1 flg 8 sts 2 node 0 remid 0 lq 0,4c name " 3 2b2b81a" flags 0 nodeid 2 ref 1 grant queue 00010294 gr 0 rq -1 flg 8 sts 2 node 2 remid 200e8 lq 0,c name " 3 acb93a6" flags 0 nodeid 5 ref 1 grant queue 00030146 gr 5 rq -1 flg 8 sts 2 node 5 remid 202d0 lq 0,8 name " 3 d7945f0" flags 0 nodeid 6 ref 1 grant queue 00020265 gr 5 rq -1 flg 8 sts 2 node 6 remid 30342 lq 0,8 name " 2 16" flags 0 nodeid 1 ref 1 grant queue 00010094 gr 3 rq -1 flg 0 sts 2 node 1 remid 10359 lq 0,0 
name " 3 5676999" flags 4 nodeid 0 ref 3 grant queue 00020199 gr 0 rq 5 flg 8 sts 2 node 0 remid 0 lq 0,1d 0005022a gr 5 rq -1 flg 2008 sts 2 node 6 remid 3030e lq 0,8 000401f7 gr 0 rq -1 flg 2008 sts 2 node 1 remid 2004a lq 0,c name " 5 1a" flags 0 nodeid 1 ref 1 grant queue 0001039f gr 3 rq -1 flg 0 sts 2 node 1 remid 102a9 lq 0,0 name " 3 80579" flags 0 nodeid 1 ref 1 grant queue 0002023c gr 5 rq -1 flg 8 sts 2 node 1 remid 40117 lq 0,8 name " 3 56d68eb" flags 4 nodeid 0 ref 3 grant queue 0002007c gr 0 rq 5 flg 8 sts 2 node 0 remid 0 lq 0,1d 000501b5 gr 5 rq -1 flg 2008 sts 2 node 2 remid 3011a lq 0,8 000401d0 gr 0 rq -1 flg 2008 sts 2 node 6 remid 90288 lq 0,c name " 3 569695f" flags 4 nodeid 0 ref 4 grant queue 000102f5 gr 0 rq 5 flg 8 sts 2 node 0 remid 0 lq 0,1d 00010306 gr 5 rq -1 flg 2008 sts 2 node 2 remid 203fc lq 0,8 00010010 gr 0 rq -1 flg 2008 sts 2 node 6 remid 703db lq 0,c 000303b1 gr 0 rq -1 flg 2008 sts 2 node 1 remid 201cc lq 0,c name " 5 7d" flags 0 nodeid 1 ref 1 grant queue 00010378 gr 3 rq -1 flg 0 sts 2 node 1 remid 1009a lq 0,0 name " 3 d854494" flags 0 nodeid 6 ref 1 grant queue 000902e1 gr 5 rq -1 flg 8 sts 2 node 6 remid 80324 lq 0,8 name " 3 5626a2a" flags 4 nodeid 0 ref 3 grant queue 000100cd gr 0 rq 5 flg 8 sts 2 node 0 remid 0 lq 0,1d 0001003e gr 5 rq -1 flg 2008 sts 2 node 4 remid 100a0 lq 0,1c 000301bf gr 0 rq -1 flg 2008 sts 2 node 6 remid 102a3 lq 0,c name " 3 821e0e8" flags 0 nodeid 4 ref 1 grant queue 0002030f gr 0 rq -1 flg 8 sts 2 node 4 remid 50358 lq 0,c name " 3 100491" flags 0 nodeid 1 ref 1 grant queue 00050034 gr 0 rq -1 flg 8 sts 2 node 1 remid 8037c lq 0,c name " 3 ac893fd" flags 0 nodeid 5 ref 1 grant queue 0003037b gr 5 rq -1 flg 8 sts 2 node 5 remid 3024d lq 0,8 name " 3 81ae1b3" flags 0 nodeid 4 ref 1 grant queue 0002008a gr 0 rq -1 flg 8 sts 2 node 4 remid 203cc lq 0,c name " 3 605b3" flags 0 nodeid 1 ref 1 grant queue 0002024c gr 5 rq -1 flg 8 sts 2 node 1 remid 200e8 lq 0,8 name " 3 56569d3" flags 4 nodeid 0 
ref 4 grant queue 00010125 gr 0 rq 5 flg 8 sts 2 node 0 remid 0 lq 0,1d 00050305 gr 5 rq -1 flg 2008 sts 2 node 1 remid 701c6 lq 0,8 0002033f gr 0 rq -1 flg 2008 sts 2 node 6 remid 500a2 lq 0,c 000203dc gr 0 rq -1 flg 2008 sts 2 node 2 remid 20157 lq 0,c name " 3 e04cb" flags 0 nodeid 1 ref 1 grant queue 000603c8 gr 5 rq -1 flg 8 sts 2 node 1 remid 500a7 lq 0,8 name " 3 ad392be" flags 0 nodeid 5 ref 1 grant queue 00020173 gr 5 rq -1 flg 8 sts 2 node 5 remid 40224 lq 0,8 name " 3 819e1d0" flags 0 nodeid 4 ref 1 grant queue 00020312 gr 0 rq -1 flg 8 sts 2 node 4 remid 10126 lq 0,c name " 5 16" flags 0 nodeid 1 ref 1 grant queue 000102a7 gr 3 rq -1 flg 0 sts 2 node 1 remid 102b0 lq 0,0 name " 3 ad492a1" flags 0 nodeid 5 ref 1 grant queue 000700cd gr 0 rq -1 flg 8 sts 2 node 5 remid 501ce lq 0,c name " 5 18" flags 0 nodeid 1 ref 1 grant queue 000100a7 gr 3 rq -1 flg 0 sts 2 node 1 remid 20024 lq 0,0 name " 3 817e20a" flags 0 nodeid 4 ref 1 grant queue 00020313 gr 0 rq -1 flg 8 sts 2 node 4 remid 20163 lq 0,c name " 3 56b6925" flags 4 nodeid 0 ref 3 grant queue 000200bd gr 5 rq -1 flg 8 sts 2 node 0 remid 0 lq 0,1c 00020164 gr 0 rq -1 flg 2008 sts 2 node 6 remid 702eb lq 0,c 000102fc gr 0 rq -1 flg 2008 sts 2 node 2 remid 2018f lq 0,c name " 3 ada91f3" flags 0 nodeid 5 ref 1 grant queue 000400cf gr 0 rq -1 flg 8 sts 2 node 5 remid 803d8 lq 0,c name " 3 acd936c" flags 0 nodeid 5 ref 1 grant queue 000302e7 gr 0 rq -1 flg 8 sts 2 node 5 remid 20054 lq 0,c name " 3 d7c4599" flags 0 nodeid 6 ref 1 grant queue 0002002f gr 0 rq -1 flg 8 sts 2 node 6 remid 40157 lq 0,c name " 3 3060a" flags 0 nodeid 1 ref 1 grant queue 0002010f gr 5 rq -1 flg 8 sts 2 node 1 remid 2004e lq 0,8 name " 3 825e074" flags 0 nodeid 4 ref 1 grant queue 00040155 gr 5 rq -1 flg 8 sts 2 node 4 remid 5010c lq 0,8 name " 2 7d" flags 0 nodeid 1 ref 1 grant queue 00040004 gr 3 rq -1 flg 0 sts 2 node 1 remid 600bd lq 0,0 name " 3 815e244" flags 0 nodeid 4 ref 1 grant queue 00040050 gr 5 rq -1 flg 8 sts 2 node 
4 remid 100d9 lq 0,8 name " 3 2beb6be" flags 0 nodeid 2 ref 1 grant queue 0002012d gr 0 rq -1 flg 8 sts 2 node 2 remid 2015e lq 0,c name " 3 818e1ed" flags 0 nodeid 4 ref 1 grant queue 000100d5 gr 5 rq -1 flg 8 sts 2 node 4 remid 20249 lq 0,8 name " 2 1a" flags 0 nodeid 1 ref 1 grant queue 00010127 gr 3 rq -1 flg 0 sts 2 node 1 remid 10191 lq 0,0 name " 5 d77462f" flags 0 nodeid 6 ref 1 grant queue 0001026f gr 3 rq -1 flg 0 sts 2 node 6 remid 1020c lq 0,1 name " 3 81fe122" flags 0 nodeid 4 ref 1 grant queue 00040011 gr 5 rq -1 flg 8 sts 2 node 4 remid 3035e lq 0,8 name " 3 568697c" flags 4 nodeid 0 ref 3 grant queue 000403b8 gr 0 rq 5 flg 8 sts 2 node 0 remid 0 lq 0,1d 0001016d gr 5 rq -1 flg 2008 sts 2 node 6 remid 50142 lq 0,8 00020359 gr 0 rq -1 flg 2008 sts 2 node 1 remid 4037e lq 0,c name " 3 5716877" flags 4 nodeid 0 ref 2 grant queue 0001007e gr 0 rq 5 flg 8 sts 2 node 0 remid 0 lq 0,1d 00050112 gr 5 rq -1 flg 2008 sts 2 node 6 remid a0143 lq 0,8 name " 5 2b0b859" flags 0 nodeid 6 ref 1 grant queue 00010179 gr 3 rq -1 flg 0 sts 2 node 6 remid 103da lq 0,1 name " 3 56669b6" flags 4 nodeid 0 ref 3 grant queue 00020035 gr 5 rq -1 flg 8 sts 2 node 0 remid 0 lq 0,1c 00010045 gr 0 rq -1 flg 2008 sts 2 node 2 remid 70257 lq 0,c 0001030e gr 0 rq -1 flg 2008 sts 2 node 6 remid 1019e lq 0,c name " 3 ad8922d" flags 0 nodeid 5 ref 1 grant queue 0005003d gr 5 rq -1 flg 8 sts 2 node 5 remid 3033f lq 0,8 name " 3 2b4b7e0" flags 0 nodeid 2 ref 1 grant queue 00030190 gr 0 rq -1 flg 8 sts 2 node 2 remid 30003 lq 0,c name " 8 3e9" flags 4 nodeid 0 ref 6 grant queue 0005006c gr 3 rq -1 flg 2008 sts 2 node 4 remid 50017 lq 0,1c 00070009 gr 3 rq -1 flg 2008 sts 2 node 1 remid 40380 lq 0,1c 000402fa gr 3 rq -1 flg 8 sts 2 node 0 remid 0 lq 0,1c 000101d7 gr 3 rq -1 flg 2008 sts 2 node 2 remid 10196 lq 0,1c 0001011f gr 3 rq -1 flg 2008 sts 2 node 6 remid 10316 lq 0,1c 000202a3 gr 3 rq -1 flg 2008 sts 2 node 5 remid 503c9 lq 0,1c name " 3 816e227" flags 0 nodeid 4 ref 1 grant queue 
000201a7 gr 5 rq -1 flg 8 sts 2 node 4 remid 100a4 lq 0,8 name " 3 826e057" flags 0 nodeid 4 ref 1 grant queue 00080040 gr 5 rq -1 flg 8 sts 2 node 4 remid 603df lq 0,8 name " 3 56a6942" flags 4 nodeid 0 ref 3 grant queue 00020277 gr 0 rq 5 flg 8 sts 2 node 0 remid 0 lq 0,1d 00020133 gr 5 rq -1 flg 2008 sts 2 node 6 remid 4014d lq 0,1c 000101d1 gr 0 rq -1 flg 2008 sts 2 node 2 remid 303cd lq 0,c name " 3 d864477" flags 0 nodeid 6 ref 1 grant queue 0002005c gr 0 rq -1 flg 8 sts 2 node 6 remid c02be lq 0,c name " 3 405ed" flags 0 nodeid 1 ref 1 grant queue 00010054 gr 5 rq -1 flg 8 sts 2 node 1 remid 10214 lq 0,8 name " 1 2" flags 0 nodeid 1 ref 1 grant queue 00010124 gr 3 rq -1 flg 0 sts 2 node 1 remid 101c3 lq 0,0 name " 3 f04ae" flags 0 nodeid 1 ref 1 grant queue 0004011c gr 5 rq -1 flg 8 sts 2 node 1 remid b02ee lq 0,8 name " 3 56e68ce" flags 4 nodeid 0 ref 3 grant queue 000401e1 gr 0 rq 5 flg 8 sts 2 node 0 remid 0 lq 0,1d 0006001a gr 5 rq -1 flg 2008 sts 2 node 6 remid 90312 lq 0,1c 00040045 gr 0 rq -1 flg 2008 sts 2 node 2 remid 201ac lq 0,c name " 3 2b3b7fd" flags 0 nodeid 2 ref 1 grant queue 000200f6 gr 0 rq -1 flg 8 sts 2 node 2 remid 300ff lq 0,c name " 4 8145c00" flags 4 nodeid 0 ref 1 grant queue 0001026b gr 5 rq -1 flg 0 sts 2 node 0 remid 0 lq 0,0 name " 3 828e01d" flags 0 nodeid 4 ref 1 grant queue 0002032c gr 0 rq -1 flg 8 sts 2 node 4 remid 202f3 lq 0,c name " 6 7d" flags 0 nodeid 5 ref 2 wait queue 000601db gr -1 rq 5 flg 0 sts 1 node 5 remid 0 lq 3,1 name " 2 19" flags 0 nodeid 1 ref 1 grant queue 000102f9 gr 3 rq -1 flg 0 sts 2 node 1 remid 10038 lq 0,0 name " 5 814dc05" flags 0 nodeid 6 ref 1 grant queue 0001023e gr 3 rq -1 flg 0 sts 2 node 6 remid 10222 lq 0,1 name " 2 17" flags 0 nodeid 1 ref 1 grant queue 00010217 gr 3 rq -1 flg 0 sts 2 node 1 remid 10224 lq 0,0 name " 2 2b0b859" flags 0 nodeid 6 ref 1 grant queue 0001015d gr 3 rq -1 flg 0 sts 2 node 6 remid 20030 lq 0,1 name " 3 ad292db" flags 0 nodeid 5 ref 1 grant queue 000203f6 gr 5 rq -1 
flg 8 sts 2 node 5 remid 40080 lq 0,8 name " 3 2bab732" flags 0 nodeid 2 ref 1 grant queue 000402b1 gr 0 rq -1 flg 8 sts 2 node 2 remid 10282 lq 0,c name " 3 d8244eb" flags 0 nodeid 6 ref 1 grant queue 000202bc gr 0 rq -1 flg 8 sts 2 node 6 remid 40281 lq 0,c name " 3 d7b45b6" flags 0 nodeid 6 ref 1 grant queue 00020063 gr 5 rq -1 flg 8 sts 2 node 6 remid 40020 lq 0,8 name " 2 d77462f" flags 0 nodeid 6 ref 1 grant queue 0002021d gr 3 rq -1 flg 0 sts 2 node 6 remid 70333 lq 0,0 name " 3 ac993e0" flags 0 nodeid 5 ref 1 grant queue 0002034b gr 5 rq -1 flg 8 sts 2 node 5 remid 10191 lq 0,8 name " 3 d814508" flags 0 nodeid 6 ref 1 grant queue 000401b5 gr 5 rq -1 flg 8 sts 2 node 6 remid b0129 lq 0,8 name " 3 56f68b1" flags 4 nodeid 0 ref 3 grant queue 000603e2 gr 5 rq -1 flg 8 sts 2 node 0 remid 0 lq 0,1c 00040204 gr 0 rq -1 flg 2008 sts 2 node 1 remid 902bb lq 0,c 00040275 gr 0 rq -1 flg 2008 sts 2 node 6 remid 200ac lq 0,c name " 3 56469f0" flags 4 nodeid 0 ref 2 grant queue 000101d0 gr 5 rq -1 flg 8 sts 2 node 0 remid 0 lq 0,1c 000103e6 gr 0 rq -1 flg 2008 sts 2 node 2 remid 201c2 lq 0,c name " 3 9055c" flags 0 nodeid 1 ref 1 grant queue 00050189 gr 5 rq -1 flg 8 sts 2 node 1 remid b017c lq 0,8 name " 3 d8444b1" flags 0 nodeid 6 ref 1 grant queue 0002001d gr 5 rq -1 flg 8 sts 2 node 6 remid b02be lq 0,8 name " 3 56c6908" flags 4 nodeid 0 ref 3 grant queue 00020377 gr 0 rq 5 flg 8 sts 2 node 0 remid 0 lq 0,1d 000400de gr 5 rq -1 flg 2008 sts 2 node 2 remid 201cb lq 0,8 00030347 gr 0 rq -1 flg 2008 sts 2 node 6 remid 503a7 lq 0,c name " 3 2bdb6db" flags 0 nodeid 2 ref 1 grant queue 000301cd gr 5 rq -1 flg 8 sts 2 node 2 remid 40177 lq 0,8 name " 3 2b9b74f" flags 0 nodeid 2 ref 1 grant queue 00010379 gr 0 rq -1 flg 8 sts 2 node 2 remid 2011e lq 0,c name " 3 d78460d" flags 0 nodeid 6 ref 1 grant queue 00020394 gr 5 rq -1 flg 8 sts 2 node 6 remid 200ff lq 0,8 name " 5 19" flags 0 nodeid 1 ref 1 grant queue 00010146 gr 3 rq -1 flg 0 sts 2 node 1 remid 10250 lq 0,0 name " 2 
814dc05" flags 0 nodeid 6 ref 1 grant queue 000300f7 gr 3 rq -1 flg 0 sts 2 node 6 remid 60166 lq 0,0 name " 3 70596" flags 0 nodeid 1 ref 1 grant queue 000100e6 gr 0 rq -1 flg 8 sts 2 node 1 remid 40010 lq 0,c name " 5 17" flags 0 nodeid 1 ref 1 grant queue 00010288 gr 3 rq -1 flg 0 sts 2 node 1 remid 1014b lq 0,0 name " 3 81ee13f" flags 0 nodeid 4 ref 1 grant queue 0002036e gr 5 rq -1 flg 8 sts 2 node 4 remid 20072 lq 0,8 name " 3 5636a0d" flags 4 nodeid 0 ref 3 grant queue 000101ac gr 0 rq 5 flg 8 sts 2 node 0 remid 0 lq 0,1d 00020047 gr 5 rq -1 flg 2008 sts 2 node 2 remid 1009b lq 0,8 00020343 gr 0 rq -1 flg 2008 sts 2 node 4 remid 102d4 lq 0,c name "xkJyBKlyJsdHbujW5QyCE5OELvGiFwiaFKRGYHfbv1W0Wa71NWtV0zzmuAVi6QfQ" flags 0 nodeid 2 ref 1 grant queue 00010268 gr 1 rq -1 flg 0 sts 2 node 2 remid 101fe lq 0,1 al to 5 gfs0 send einval to 5 gfs0 send einval to 5 gfs0 send einval to 5 gfs0 rq 5 40187 " 6 d77462f" gfs0 send lu 40187 to 6 gfs0 lu rep 40187 fr 6 2 gfs0 send rq 40187 to 2 gfs0 un 402d5 ref 1 flg 0 nodeid 1/-1 " 2 7d gfs0 send un 402d5 to 1 gfs0 rq 3 300f7 " 2 814dc05" gfs0 send lu 300f7 to 6 gfs0 lu rep 300f7 fr 6 6 gfs0 send rq 300f7 to 6 gfs0 rq 5 40098 " 6 814dc05" gfs0 send lu 40098 to 6 gfs0 lu rep 40098 fr 6 1 gfs0 send rq 40098 to 1 gfs0 rq 5 30184 " 6 d77462f" gfs0 send lu 30184 to 6 gfs0 lu rep 30184 fr 6 2 gfs0 send rq 30184 to 2 gfs0 rq 5 50105 " 6 d77462f" gfs0 send lu 50105 to 6 gfs0 lu rep 50105 fr 6 2 gfs0 send rq 50105 to 2 gfs0 rq 3 40004 " 2 7d" gfs0 send lu 40004 to 2 gfs0 lu rep 40004 fr 2 1 gfs0 send rq 40004 to 1 gfs0 rq 5 601db " 6 7d" gfs0 send lu 601db to 6 gfs0 lu rep 601db fr 6 5 gfs0 send rq 601db to 5 gfs0 rq 5 from 4 4035e " 6 7d" DLM: Assertion failed on line 584 of file cluster/dlm/locking.c DLM: assertion: "rsb->res_nodeid == 0" DLM: time = 2690348 dlm: lkb id 4035e remid 80294 flags 2000 status 0 rqmode 5 grmode -1 nodeid 4 lqstate 0 lqflags 1 dlm: request rh_cmd 2 rh_lkid 80294 remlkid 0 flags 1 status 0 rqmode 5 nodeid 
4 ------------[ cut here ]------------ kernel BUG at cluster/dlm/locking.c:584! invalid operand: 0000 [#1] Modules linked in: gfs lock_dlm dlm cman lock_harness ipv6 parport_pc lp parport autofs4 sunrpc e1000 floppy sg microcode dm_mod uhci_hcd ehci_hcd button battery asus_acpi ac ext3 jbd qla2300 qla2xxx scsi_transport_fc sd_mod scsi_mod CPU: 0 EIP: 0060:[<f8c37b9e>] Not tainted EFLAGS: 00010282 (2.6.7) EIP is at remote_stage2+0x21e/0x240 [dlm] eax: 00000001 ebx: c7976810 ecx: 00000000 edx: f7435e04 esi: 00000005 edi: c23b6000 ebp: f7cf4138 esp: f7435e00 ds: 007b es: 007b ss: 0068 Process dlm_recvd (pid: 2453, threadinfo=f7434000 task=f74390b0) Stack: f8c46f17 00000004 f8c46f01 f8c46f41 00290d2c c79a4a05 00000000 00000004 c79a4990 00000000 f7cf4138 c23b6000 00000004 f8c3a2e0 f745d444 f7434000 00000001 00000000 00000067 f745d6e0 00000067 f745d5cc f7435f94 00000000 Call Trace: [<f8c3a2e0>] process_cluster_request+0x160/0xd40 [dlm] [<c02b0e98>] inet_recvmsg+0x48/0x70 [<c026bd0c>] sock_recvmsg+0xbc/0xc0 [<f8c3e313>] midcomms_process_incoming_buffer+0x173/0x250 [dlm] [<c0136af3>] __alloc_pages+0x2f3/0x340 [<c026bd0c>] sock_recvmsg+0xbc/0xc0 [<f8c3c011>] receive_from_sock+0x141/0x310 [dlm] [<c0117e67>] recalc_task_prio+0x97/0x190 [<f8c3cec7>] process_sockets+0x57/0x80 [dlm] [<f8c3d13e>] dlm_recvd+0x9e/0xf0 [dlm] [<f8c3d0a0>] dlm_recvd+0x0/0xf0 [dlm] [<c010429d>] kernel_thread_helper+0x5/0x18
I reproduced this today and have checked in the fix. A slight modification was needed to the recent change that addressed the data error.
Did you check this into cvs? I just built this morning and hit: GFS <CVS> (built Jul 21 2004 09:56:16) installed CMAN <CVS> (built Jul 21 2004 10:17:04) installed DLM <CVS> (built Jul 21 2004 10:17:17) installed Lock_DLM (built Jul 21 2004 10:27:08) installed tank-04: ------------[ cut here ]------------ kernel BUG at cluster/dlm/locking.c:584! invalid operand: 0000 [#1] Modules linked in: gnbd lock_gulm lock_nolock lock_dlm dlm cman gfs lock_harness ipv6 parport_pc lp parport autofs4 sunrpc e1000 floppy sg microcode uhci_hcd ehci_hcd button battery asus_acpi ac ext3 jbd dm_mod qla2300 qla2xxx scsi_transport_fc sd_mod scsi_mod CPU: 0 EIP: 0060:[<f8a70b9e>] Not tainted EFLAGS: 00010282 (2.6.7) EIP is at remote_stage2+0x21e/0x240 [dlm] eax: 00000001 ebx: f509257c ecx: 00000000 edx: f74b5e04 esi: 00000004 edi: f7355000 ebp: f7f77738 esp: f74b5e00 ds: 007b es: 007b ss: 0068 Process dlm_recvd (pid: 3582, threadinfo=f74b4000 task=c23ff330) Stack: f8a7ff17 00000001 f8a7ff01 f8a7ff41 005f4d69 f50e2f39 00000000 00000001 f50e2ec4 00000000 f7f77738 f7355000 00000001 f8a732e0 f76eac44 f74b4000 00000001 00000000 000000b7 f76eaee0 000000b7 f76eadcc f74b5f94 00000000 Call Trace: [<f8a732e0>] process_cluster_request+0x160/0xd40 [dlm] [<c02b0e98>] inet_recvmsg+0x48/0x70 [<c026bd0c>] sock_recvmsg+0xbc/0xc0 [<f8a77313>] midcomms_process_incoming_buffer+0x173/0x250 [dlm] [<c0136af3>] __alloc_pages+0x2f3/0x340 [<c026bd0c>] sock_recvmsg+0xbc/0xc0 [<f8a75011>] receive_from_sock+0x141/0x310 [dlm] [<c0117e67>] recalc_task_prio+0x97/0x190 [<f8a75ec7>] process_sockets+0x57/0x80 [dlm] [<f8a7613e>] dlm_recvd+0x9e/0xf0 [dlm] [<f8a760a0>] dlm_recvd+0x0/0xf0 [dlm] [<c010429d>] kernel_thread_helper+0x5/0x18 Code: 0f 0b 48 02 01 ff a7 f8 c7 04 24 c4 0f a8 f8 e8 3e a5 6a c7 <6>CMAN: killed by STARTTRANS or NOMINATE CMAN: we are leaving the cluster SM: send_nodeid_message error -107 to 1 SM: send_nodeid_message error -107 to 4 SM: send_nodeid_message error -107 to 3 SM: send_broadcast_message error 
-107
SM: send_broadcast_message error -107
0 5,5 id 1f03da sts -65538
un 11,d774630 id 13011c cur 5 0
qc 11,d774630 5,5 id 13011c sts -65538
ex punlock 3751 error 0
en plock 3751 7,d774630
lk 11,d774630 id 0 -1,5 0
qc 11,d774630 -1,5 id 1c03ce sts 0
req 7,d774630 ex 0-7fffffffffffffff 3751 w 1
lk 7,d774630 id 0 -1,5 0
un 11,d774630 id 1c03ce cur 5 0
qc 11,d774630 5,5 id 1c03ce sts -65538
lk 2,d774630 id 2c0133 5,3 45
qc 2,d774630 5,3 id 2c0133 sts 0
un 2,d774630 id 2c0133 cur 3 0
qc 2,d774630 3,3 id 2c0133 sts -65538
qc 7,d774630 -1,5 id 1e017d sts 0
ex plock 3751 error 0
lk 2,d774630 id 0 -1,3 0
qc 2,d774630 -1,3 id 1d03ac sts 0
lk 2,d774630 id 1d03ac 3,5 54
qc 2,d774630 3,5 id 1d03ac sts 0
en punlock 3751 7,d774630
lk 11,d774630 id 0 -1,5 0
qc 11,d774630 -1,5 id 15008a sts 0
remove 7,d774630 3751
un 7,d774630 id 1e017d cur 5 0
qc 7,d774630 5,5 id 1e017d sts -65538
un 11,d774630 id 15008a cur 5 0
qc 11,d774630 5,5 id 15008a sts -65538
ex punlock 3751 error 0
lk 2,ac59459 id 0 -1,3 0
lk 2,d774630 id 1d03ac 5,3 45
qc 2,d774630 5,3 id 1d03ac sts 0
qc 2,ac59459 -1,3 id 28024c sts 0
en plock 3751 7,ac59459
lk 11,ac59459 id 0 -1,5 0
qc 11,ac59459 -1,5 id 1b0366 sts 0
req 7,ac59459 ex 0-7fffffffffffffff 3751 w 1
lk 7,ac59459 id 0 -1,5 0
un 11,ac59459 id 1b0366 cur 5 0
qc 11,ac59459 5,5 id 1b0366 sts -65538
qc 7,ac59459 -1,5 id 230026 sts 0
ex plock 3751 error 0
lk 2,ac59459 id 28024c 3,5 54
qc 2,ac59459 3,5 id 28024c sts 0
en punlock 3751 7,ac59459
lk 11,ac59459 id 0 -1,5 0
qc 11,ac59459 -1,5 id 1e0078 sts 0
remove 7,ac59459 3751
un 7,ac59459 id 230026 cur 5 0
qc 7,ac59459 5,5 id 230026 sts -65538
un 11,ac59459 id 1e0078 cur 5 0
qc 11,ac59459 5,5 id 1e0078 sts -65538
ex punlock 3751 error 0
en plock 3751 7,ac59459
lk 11,ac59459 id 0 -1,5 0
qc 11,ac59459 -1,5 id 2f02a1 sts 0
req 7,ac59459 ex 0-7fffffffffffffff 3751 w 1
lk 7,ac59459 id 0 -1,5 0
un 11,ac59459 id 2f02a1 cur 5 0
qc 11,ac59459 5,5 id 2f02a1 sts -65538
qc 7,ac59459 -1,5 id 2001e9 sts 0
ex plock 3751 error 0
en punlock 3751 7,ac59459
lk 11,ac59459 id 0 -1,5 0
qc 11,ac59459 -1,5 id 1a01a0 sts 0
remove 7,ac59459 3751
un 7,ac59459 id 2001e9 cur 5 0
qc 7,ac59459 5,5 id 2001e9 sts -65538
un 11,ac59459 id 1a01a0 cur 5 0
qc 11,ac59459 5,5 id 1a01a0 sts -65538
ex punlock 3751 error 0
en plock 3751 7,d774630
lk 11,d774630 id 0 -1,5 0
qc 11,d774630 -1,5 id 140110 sts 0
req 7,d774630 ex 0-7fffffffffffffff 3751 w 1
lk 7,d774630 id 0 -1,5 0
un 11,d774630 id 140110 cur 5 0
qc 11,d774630 5,5 id 140110 sts -65538
un 2,d774630 id 1d03ac cur 3 0
qc 2,d774630 3,3 id 1d03ac sts -65538
lk 2,ac59459 id 28024c 5,3 45
qc 2,ac59459 5,3 id 28024c sts 0
un 2,ac59459 id 28024c cur 3 0
qc 2,ac59459 3,3 id 28024c sts -65538
qc 7,d774630 -1,5 id 210023 sts 0
ex plock 3751 error 0
lk 2,d774630 id 0 -1,3 0
qc 2,d774630 -1,3 id 1a035d sts 0
lk 2,d774630 id 1a035d 3,5 54
qc 2,d774630 3,5 id 1a035d sts 0
lk 2,d774630 id 1a035d 5,3 45
qc 2,d774630 5,3 id 1a035d sts 0
en punlock 3751 7,d774630
lk 11,d774630 id 0 -1,5 0
qc 11,d774630 -1,5 id 24004a sts 0
remove 7,d774630 3751
un 7,d774630 id 210023 cur 5 0
qc 7,d774630 5,5 id 210023 sts -65538
un 11,d774630 id 24004a cur 5 0
qc 11,d774630 5,5 id 24004a sts -65538
ex punlock 3751 error 0
lk 2,ac59459 id 0 -1,3 0
qc 2,ac59459 -1,3 id 180353 sts 0
en plock 3751 7,ac59459
lk 11,ac59459 id 0 -1,5 0
qc 11,ac59459 -1,5 id 22024f sts 0
req 7,ac59459 ex 0-7fffffffffffffff 3751 w 1
lk 7,ac59459 id 0 -1,5 0
un 11,ac59459 id 22024f cur 5 0
qc 11,ac59459 5,5 id 22024f sts -65538
qc 7,ac59459 -1,5 id 200385 sts 0
ex plock 3751 error 0
lk 2,ac59459 id 180353 3,5 54
qc 2,ac59459 3,5 id 180353 sts 0
en punlock 3751 7,ac59459
lk 11,ac59459 id 0 -1,5 0
qc 11,ac59459 -1,5 id 1b00cb sts 0
remove 7,ac59459 3751
un 7,ac59459 id 200385 cur 5 0
qc 7,ac59459 5,5 id 200385 sts -65538
un 11,ac59459 id 1b00cb cur 5 0
qc 11,ac59459 5,5 id 1b00cb sts -65538
ex punlock 3751 error 0
en plock 3751 7,d774630
lk 11,d774630 id 0 -1,5 0
qc 11,d774630 -1,5 id 1a00de sts 0
req 7,d774630 ex 0-7fffffffffffffff 3751 w 1
lk 7,d774630 id 0 -1,5 0
un 11,d774630 id 1a00de cur 5 0
qc 11,d774630 5,5 id 1a00de sts -65538
lk 8,3e8 id 170285 3,5 5c
lock_dlm: Assertion failed on line 388 of file /usr/src/cluster/gfs-kernel/src/dlm/lock.c
lock_dlm: assertion: "!error"
lock_dlm: time = 6251402
gfs0: num=8,3e8 err=-22 cur=3 req=5 lkf=5c
Kernel panic: lock_dlm: Record message above and reboot.

tank-06:
------------[ cut here ]------------
kernel BUG at cluster/dlm/locking.c:584!
invalid operand: 0000 [#1]
Modules linked in: gnbd lock_gulm lock_nolock lock_dlm dlm cman gfs lock_harness ipv6 parport_pc lp parport autofs4 sunrpc e1000 floppy sg microcode uhci_hcd ehci_hcd button battery asus_acpi ac ext3 jbd dm_mod qla2300 qla2xxx scsi_transport_fc sd_mod scsi_mod
CPU: 0
EIP: 0060:[<e02f6b9e>] Not tainted
EFLAGS: 00010282 (2.6.7)
EIP is at remote_stage2+0x21e/0x240 [dlm]
eax: 00000001 ebx: d80081e0 ecx: 00000000 edx: da37be04
esi: 00000004 edi: dd3dd000 ebp: daf06d38 esp: da37be00
ds: 007b es: 007b ss: 0068
Process dlm_recvd (pid: 3597, threadinfo=da37a000 task=da33f3b0)
Stack: e0305f17 00000006 e0305f01 e0305f41 006204c7 d80198dd 00000000 00000006
       d8019868 00000000 daf06d38 dd3dd000 00000006 e02f92e0 da46fc44 da37a000
       00000001 00000000 00000067 da46fee0 00000067 da46fdcc da37bf94 00000000
Call Trace:
 [<e02f92e0>] process_cluster_request+0x160/0xd40 [dlm]
 [<c02b0e98>] inet_recvmsg+0x48/0x70
 [<c026bd0c>] sock_recvmsg+0xbc/0xc0
 [<e02fd313>] midcomms_process_incoming_buffer+0x173/0x250 [dlm]
 [<e00bba68>] scsi_softirq+0xa8/0xd0 [scsi_mod]
 [<c0136af3>] __alloc_pages+0x2f3/0x340
 [<c026bd0c>] sock_recvmsg+0xbc/0xc0
 [<e02fb011>] receive_from_sock+0x141/0x310 [dlm]
 [<c0117e67>] recalc_task_prio+0x97/0x190
 [<e02fbec7>] process_sockets+0x57/0x80 [dlm]
 [<e02fc13e>] dlm_recvd+0x9e/0xf0 [dlm]
 [<e02fc0a0>] dlm_recvd+0x0/0xf0 [dlm]
 [<c010429d>] kernel_thread_helper+0x5/0x18
Code: 0f 0b 48 02 01 5f 30 e0 c7 04 24 c4 6f 30 e0 e8 3e 45 e2 df
<6>CMAN: killed by STARTTRANS or NOMINATE
CMAN: we are leaving the cluster
SM: send_nodeid_message error -107 to 4
SM: send_nodeid_message error -107 to 3
SM: send_nodeid_message error -107 to 6
SM: send_broadcast_message error -107
SM: send_broadcast_message error -107
SM: send_broadcast_message error -107
id 3c0188 sts 0
req 7,d77462f ex 0-7fffffffffffffff 3768 w 1
lk 7,d77462f id 0 -1,5 0
un 11,d77462f id 3c0188 cur 5 0
qc 7,d77462f -1,5 id 4000a2 sts 0
qc 11,d77462f 5,5 id 3c0188 sts -65538
ex plock 3768 error 0
lk 2,d77462f id 4b0396 3,5 54
qc 2,d77462f 3,5 id 4b0396 sts 0
en punlock 3768 7,d77462f
lk 11,d77462f id 0 -1,5 0
qc 11,d77462f -1,5 id 4d0395 sts 0
remove 7,d77462f 3768
un 7,d77462f id 4000a2 cur 5 0
qc 7,d77462f 5,5 id 4000a2 sts -65538
un 11,d77462f id 4d0395 cur 5 0
qc 11,d77462f 5,5 id 4d0395 sts -65538
ex punlock 3768 error 0
lk 2,ac59459 id 0 -1,3 0
qc 2,ac59459 -1,3 id 3a0133 sts 0
un 2,ac59459 id 3a0133 cur 3 0
qc 2,ac59459 3,3 id 3a0133 sts -65538
lk 2,ac59459 id 0 -1,3 0
qc 2,ac59459 -1,3 id 3f002a sts 0
en plock 3768 7,ac59459
lk 11,ac59459 id 0 -1,5 0
qc 11,ac59459 -1,5 id 3c027c sts 0
req 7,ac59459 ex 0-7fffffffffffffff 3768 w 1
lk 7,ac59459 id 0 -1,5 0
un 11,ac59459 id 3c027c cur 5 0
qc 7,ac59459 -1,5 id 51012d sts 0
qc 11,ac59459 5,5 id 3c027c sts -65538
ex plock 3768 error 0
lk 2,ac59459 id 3f002a 3,5 54
qc 2,ac59459 3,5 id 3f002a sts 0
lk 2,d77462f id 4b0396 5,3 45
qc 2,d77462f 5,3 id 4b0396 sts 0
en punlock 3768 7,ac59459
lk 11,ac59459 id 0 -1,5 0
qc 11,ac59459 -1,5 id 4a01e3 sts 0
remove 7,ac59459 3768
un 7,ac59459 id 51012d cur 5 0
qc 7,ac59459 5,5 id 51012d sts -65538
un 11,ac59459 id 4a01e3 cur 5 0
qc 11,ac59459 5,5 id 4a01e3 sts -65538
ex punlock 3768 error 0
en plock 3768 7,d77462f
lk 11,d77462f id 0 -1,5 0
qc 11,d77462f -1,5 id 4803b5 sts 0
req 7,d77462f ex 0-7fffffffffffffff 3768 w 1
lk 7,d77462f id 0 -1,5 0
un 11,d77462f id 4803b5 cur 5 0
qc 7,d77462f -1,5 id 3f00a5 sts 0
qc 11,d77462f 5,5 id 4803b5 sts -65538
ex plock 3768 error 0
lk 2,d77462f id 4b0396 3,5 54
qc 2,d77462f 3,5 id 4b0396 sts 0
en punlock 3768 7,d77462f
lk 11,d77462f id 0 -1,5 0
lk 2,d77462f id 4b0396 5,3 45
qc 2,d77462f 5,3 id 4b0396 sts 0
qc 11,d77462f -1,5 id 4a035d sts 0
remove 7,d77462f 3768
un 7,d77462f id 3f00a5 cur 5 0
qc 7,d77462f 5,5 id 3f00a5 sts -65538
un 11,d77462f id 4a035d cur 5 0
qc 11,d77462f 5,5 id 4a035d sts -65538
ex punlock 3768 error 0
en plock 3768 7,d77462f
lk 11,d77462f id 0 -1,5 0
qc 11,d77462f -1,5 id 390166 sts 0
req 7,d77462f ex 0-7fffffffffffffff 3768 w 1
lk 7,d77462f id 0 -1,5 0
un 11,d77462f id 390166 cur 5 0
qc 11,d77462f 5,5 id 390166 sts -65538
un 2,d77462f id 4b0396 cur 3 0
qc 2,d77462f 3,3 id 4b0396 sts -65538
qc 7,d77462f -1,5 id 3f0165 sts 0
ex plock 3768 error 0
lk 2,d77462f id 0 -1,3 0
qc 2,d77462f -1,3 id 460156 sts 0
lk 2,d77462f id 460156 3,5 54
qc 2,d77462f 3,5 id 460156 sts 0
lk 2,ac59459 id 3f002a 5,3 45
qc 2,ac59459 5,3 id 3f002a sts 0
un 2,ac59459 id 3f002a cur 3 0
qc 2,ac59459 3,3 id 3f002a sts -65538
lk 2,d77462f id 460156 5,3 45
qc 2,d77462f 5,3 id 460156 sts 0
en punlock 3768 7,d77462f
lk 11,d77462f id 0 -1,5 0
qc 11,d77462f -1,5 id 490292 sts 0
remove 7,d77462f 3768
un 7,d77462f id 3f0165 cur 5 0
qc 7,d77462f 5,5 id 3f0165 sts -65538
un 11,d77462f id 490292 cur 5 0
qc 11,d77462f 5,5 id 490292 sts -65538
ex punlock 3768 error 0
lk 2,d774630 id 0 -1,3 0
un 2,d77462f id 460156 cur 3 0
qc 2,d77462f 3,3 id 460156 sts -65538
qc 2,d774630 -1,3 id 370067 sts 0
en plock 3768 7,d774630
lk 11,d774630 id 0 -1,5 0
qc 11,d774630 -1,5 id 47023d sts 0
req 7,d774630 ex 0-7fffffffffffffff 3768 w 1
lk 7,d774630 id 0 -1,5 0
un 11,d774630 id 47023d cur 5 0
qc 7,d774630 -1,5 id 3e031c sts 0
qc 11,d774630 5,5 id 47023d sts -65538
ex plock 3768 error 0
lk 2,d774630 id 370067 3,5 54
qc 2,d774630 3,5 id 370067 sts 0
en punlock 3768 7,d774630
lk 11,d774630 id 0 -1,5 0
qc 11,d774630 -1,5 id 4903a0 sts 0
remove 7,d774630 3768
un 7,d774630 id 3e031c cur 5 0
qc 7,d774630 5,5 id 3e031c sts -65538
un 11,d774630 id 4903a0 cur 5 0
qc 11,d774630 5,5 id 4903a0 sts -65538
ex punlock 3768 error 0
lk 2,d77462f id 0 -1,3 0
qc 2,d77462f -1,3 id 4b0227 sts 0
en plock 3768 7,d77462f
lk 11,d77462f id 0 -1,5 0
qc 11,d77462f -1,5 id 48008c sts 0
req 7,d77462f ex 0-7fffffffffffffff 3768 w 1
lk 7,d77462f id 0 -1,5 0
un 11,d77462f id 48008c cur 5 0
qc 11,d77462f 5,5 id 48008c sts -65538
lk 8,3e8 id 4803ac 3,5 5c
lock_dlm: Assertion failed on line 388 of file /usr/src/cluster/gfs-kernel/src/dlm/lock.c
lock_dlm: assertion: "!error"
lock_dlm: time = 6429423
gfs0: num=8,3e8 err=-22 cur=3 req=5 lkf=5c
Kernel panic: lock_dlm: Record message above and reboot.
Scratch the above: I had a build error in the cluster tree. The modules were installed with the new timestamp, but were not rebuilt correctly. I will re-verify once I get the build error figured out.
I ran overnight without hitting any assertions...