Description of problem:
While running "cman_tool join" on my 8 nodes at the same time, I ended up panicking the kernel with the following message:

<6>CMAN: Error queueing request to port 1: -12
Kernel panic - not syncing: membership stopped responding

The node that panicked was the only one that had been able to start up. I'm not sure what was happening on the other nodes (all I know is that they were not listed in /proc/cluster/nodes, as my script paused until they were registered). Once the node panicked, the other 7 nodes were able to join the cluster.

Version-Release number of selected component (if applicable):
kernel-2.6.9-1.906_EL
cman-kernel-2.6.9-3.3
ccs-0.9-0
cman-1.0-0.pre5.0

How reproducible:
I've seen it a few times, but always when it also produced bug #142853. I bumped /proc/cluster/config/cman/transition_restarts to 500 and was able to see this panic without the BUG() in bug #142853.

Steps to Reproduce:
(same process as bug #142853)
1. run "cman_tool join" on 8 nodes simultaneously
2. spin chamber
3. pull trigger

Actual results:
panic

Expected results:
no panic

Additional info:
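For anyone trying to reproduce step 1 without the init script, a minimal sketch of firing the joins at roughly the same moment follows. It assumes passwordless ssh to each node; the host names node1 through node8 and the helper names join_command and join_all are hypothetical and not part of the original report. Only the "cman_tool join" command itself comes from the steps above.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

# Hypothetical host names; substitute the real cluster members.
NODES = [f"node{i}" for i in range(1, 9)]


def join_command(node):
    """Build the ssh command that runs "cman_tool join" on one node."""
    return ["ssh", node, "cman_tool", "join"]


def join_all(nodes, run=subprocess.run):
    """Start the join on every node at once; return (node, exit status) pairs."""
    def join(node):
        return node, run(join_command(node)).returncode

    # One worker per node so that no join waits behind another.
    with ThreadPoolExecutor(max_workers=len(nodes)) as pool:
        return list(pool.map(join, nodes))
```

Calling join_all(NODES) should start all eight joins within a few milliseconds of each other, which is probably close enough to the simultaneous boot case. The run parameter exists only so the ssh call can be swapped out for a stub when trying the script away from the cluster.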
Created attachment 108628
cman init script that will produce this bug

This is the cman init script that will produce this bug (note: there is also another script that starts ccs). Copy this to /etc/rc.d/init.d, then run "chkconfig cman on". Repeat for all nodes in the cluster, then reboot them all at the same time. My nodes are all pretty much the same, so they all take about the same time to boot.
I suspect this is the same bug as #142853 & #133512
Not sure if this is related or if it is its own bug, but I got a kernel oops joining 9 nodes at the same time, which was later followed by the panic listed above:

------------[ cut here ]------------
kernel BUG at /usr/src/sources/cluster-RHEL4/cman-kernel/src/membership.c:260!
invalid operand: 0000 [#1]
Modules linked in: cman(U) sunrpc md5 ipv6 dm_mod button battery ac uhci_hcd ehci_hcd e1000 floppy ext3 jbd qla2300 qla2xxx scsi_transport_fc sd_mod scsi_mod
CPU: 0
EIP: 0060:[<e02a5d28>] Not tainted VLI
EFLAGS: 00010246 (2.6.9-1.906_EL)
EIP is at set_nodeid+0x25b/0x365 [cman]
eax: 00000014 ebx: e02bc300 ecx: e02b06fe edx: dd048ecc
esi: e02bc64c edi: dd72b9c8 ebp: 00000007 esp: dd048ec8
ds: 007b es: 007b ss: 0068
Process cman_memb (pid: 8294, threadinfo=dd048000 task=dc1dc2b0)
Stack: e02b06fe dd72b9c0 e02b06e9 dd7d8fe0 e02b06c6 00000007 000dd596 dc312c00
       e02bc300 e02bc64c dd72b9c8 dc312c00 e02a75ed 00000012 01000001 e02bc644
       e02bc620 e02bc644 00000014 e02bc634 e02a7777 00000007 00000001 0000000e
Call Trace:
 [<e02a75ed>] add_new_node+0x199/0x23b [cman]
 [<e02a7777>] add_node_from_starttrans+0x30/0x97 [cman]
 [<e02a7e3e>] do_process_starttrans+0x16d/0x24d [cman]
 [<e02a87ff>] do_process_hello+0x6b/0x112 [cman]
 [<e02a6963>] do_membership_packet+0x134/0x1c0 [cman]
 [<e02a8b95>] dispatch_messages+0x85/0xb2 [cman]
 [<e02a5768>] membership_kthread+0x3f8/0x58b [cman]
 [<c011b9fa>] default_wake_function+0x0/0xc
 [<e02a5370>] membership_kthread+0x0/0x58b [cman]
 [<c01041d9>] kernel_thread_helper+0x5/0xb
Code: e8 43 a3 e7 df a1 5c d2 2b e0 8b 04 a8 ff 70 08 68 e9 06 2b e0 e8 2e a3 e7 df 8b 44 24 14 ff 70 08 68 fe 06 2b e0 e8 1d a3 e7 df <0f> 0b 04 01 85 05 2b e0 83 c4 18 81 3d 40 d2 2b e0 3c 4b 24 1d
CMAN: Error queueing request to port 1: -12
Kernel panic - not syncing: membership stopped responding
That's 133512, and I think this one is too.
*** This bug has been marked as a duplicate of 133512 ***
Changed to 'CLOSED' state since 'RESOLVED' has been deprecated.