142984 – CMAN: Error queueing request to port 1: -12 -- kernel panic

Summary: CMAN: Error queueing request to port 1: -12 -- kernel panic

Keywords:
Status:	CLOSED DUPLICATE of bug 133512
Alias:	None
Product:	Red Hat Cluster Suite
Classification:	Retired
Component:	cman
Sub Component:
Version:	4
Hardware:	All
OS:	Linux
Priority:	medium
Severity:	medium
Target Milestone:	---
Assignee:	Christine Caulfield
QA Contact:	Cluster QE
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2004-12-15 16:21 UTC by Adam "mantis" Manthei
Modified:	2009-04-16 19:59 UTC
CC List:	1 user
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2006-02-21 19:07:42 UTC
Embargoed:

Attachments
cman init script that will produce this bug (2.22 KB, text/plain) 2004-12-15 16:26 UTC, Adam "mantis" Manthei

Description Adam "mantis" Manthei 2004-12-15 16:21:44 UTC

Description of problem:
While running "cman_tool join" on my 8 nodes at the same time, I ended up
panicking the kernel with the following message:

    <6>CMAN: Error queueing request to port 1: -12
    Kernel panic - not syncing: membership stopped responding

The node that panicked was the only one that had been able to start up.  I'm not
sure what was happening on the other nodes (all I know is that they were not
listed in /proc/cluster/nodes as my script paused until they were registered). 
Once the node panicked, the other 7 nodes were able to join the cluster.
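
Editorial note on the error code: the -12 in the CMAN message looks like a negated Linux errno value, which would make it ENOMEM (out of memory). The report does not state this, so treat it as an assumption. A minimal sketch for checking the mapping with Python's standard library (the helper name describe_kernel_errno is made up for illustration):

```python
import errno
import os


def describe_kernel_errno(code: int) -> str:
    """Map a negated kernel-style return code (e.g. -12) to an errno name and message."""
    number = abs(code)
    # errno.errorcode maps numeric values to symbolic names on the local platform.
    name = errno.errorcode.get(number, "UNKNOWN")
    return f"{name}: {os.strerror(number)}"


# Assumption: the value printed by CMAN follows the usual negative-errno convention.
print(describe_kernel_errno(-12))
```

The mapping is taken from the machine running the snippet, so it should be run on a Linux host for the result to be meaningful here.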

Version-Release number of selected component (if applicable):
kernel-2.6.9-1.906_EL
cman-kernel-2.6.9-3.3
ccs-0.9-0
cman-1.0-0.pre5.0

How reproducible:
I've seen it a few times, but always when it also produced bug #142853.  I
bumped /proc/cluster/config/cman/transition_restarts to 500 and was able to see
this panic without the BUG() in bug #142853.

Steps to Reproduce:
( same process as bug #142853 )
1. run "cman_tool join" on 8 nodes simultaneously
2. spin chamber
3. pull trigger
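
Editorial note on step 1: one way to get the joins to start at nearly the same moment is to release one thread per node from a shared barrier. This is only a sketch and not the reporter's method (the reporter used an init script at boot). It assumes passwordless ssh to each node as a user allowed to run cman_tool; the function name join_simultaneously and the injectable run parameter are hypothetical.

```python
import subprocess
import threading


def join_simultaneously(hosts, run=subprocess.run):
    """Run "cman_tool join" on every host, releasing all workers at once."""
    if not hosts:
        return {}

    barrier = threading.Barrier(len(hosts))
    results = {}

    def worker(host):
        # Block until every worker thread is ready, so the joins start together.
        barrier.wait()
        # Assumption: ssh access is already set up; this is not from the report.
        results[host] = run(["ssh", host, "cman_tool", "join"])

    threads = [threading.Thread(target=worker, args=(h,)) for h in hosts]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Calling join_simultaneously with a list of eight host names would match the scenario above. The run parameter exists so the launcher can be exercised with a stub instead of real ssh.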

  
Actual results:
panic

Expected results:
no panic

Additional info:

Comment 1 Adam "mantis" Manthei 2004-12-15 16:26:37 UTC

Created attachment 108628
cman init script that will produce this bug

This is the cman init script that will produce this bug (note, there is also
another script that starts ccs).

Copy this to /etc/rc.d/init.d, then run "chkconfig cman on".  Repeat for all
nodes in the cluster, then reboot them all at the same time.  My nodes are all
pretty much the same, so they all take about the same time to boot.

Comment 2 Christine Caulfield 2004-12-20 15:45:49 UTC

I suspect this is the same bug as #142853 & #133512

Comment 3 Adam "mantis" Manthei 2005-01-11 01:11:51 UTC

Not sure if this is related or if it is its own bug, but I got a kernel
oops joining 9 nodes at the same time, which was later followed by the
panic listed above.

------------[ cut here ]------------
kernel BUG at /usr/src/sources/cluster-RHEL4/cman-kernel/src/membership.c:260!
invalid operand: 0000 [#1]
Modules linked in: cman(U) sunrpc md5 ipv6 dm_mod button battery ac uhci_hcd ehci_hcd e1000 floppy ext3 jbd qla2300 qla2xxx scsi_transport_fc sd_mod scsi_mod
CPU:    0
EIP:    0060:[<e02a5d28>]    Not tainted VLI
EFLAGS: 00010246   (2.6.9-1.906_EL)
EIP is at set_nodeid+0x25b/0x365 [cman]
eax: 00000014   ebx: e02bc300   ecx: e02b06fe   edx: dd048ecc
esi: e02bc64c   edi: dd72b9c8   ebp: 00000007   esp: dd048ec8
ds: 007b   es: 007b   ss: 0068
Process cman_memb (pid: 8294, threadinfo=dd048000 task=dc1dc2b0)
Stack: e02b06fe dd72b9c0 e02b06e9 dd7d8fe0 e02b06c6 00000007 000dd596 dc312c00
       e02bc300 e02bc64c dd72b9c8 dc312c00 e02a75ed 00000012 01000001 e02bc644
       e02bc620 e02bc644 00000014 e02bc634 e02a7777 00000007 00000001 0000000e
Call Trace:
 [<e02a75ed>] add_new_node+0x199/0x23b [cman]
 [<e02a7777>] add_node_from_starttrans+0x30/0x97 [cman]
 [<e02a7e3e>] do_process_starttrans+0x16d/0x24d [cman]
 [<e02a87ff>] do_process_hello+0x6b/0x112 [cman]
 [<e02a6963>] do_membership_packet+0x134/0x1c0 [cman]
 [<e02a8b95>] dispatch_messages+0x85/0xb2 [cman]
 [<e02a5768>] membership_kthread+0x3f8/0x58b [cman]
 [<c011b9fa>] default_wake_function+0x0/0xc
 [<e02a5370>] membership_kthread+0x0/0x58b [cman]
 [<c01041d9>] kernel_thread_helper+0x5/0xb
Code: e8 43 a3 e7 df a1 5c d2 2b e0 8b 04 a8 ff 70 08 68 e9 06 2b e0 e8 2e a3 e7 df 8b 44 24 14 ff 70 08 68 fe 06 2b e0 e8 1d a3 e7 df <0f> 0b 04 01 85 05 2b e0 83 c4 18 81 3d 40 d2 2b e0 3c 4b 24 1d

CMAN: Error queueing request to port 1: -12
Kernel panic - not syncing: membership stopped responding
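
Editorial note on reading the trace: each Call Trace entry has the form [<address>] symbol+0xOFFSET/0xSIZE [module], where OFFSET is the position inside the function and SIZE is the function's length. A small sketch for pulling those fields out of a pasted oops; the regular expression and the parse_trace name are illustrative, not part of any cman tooling.

```python
import re

# Matches entries such as:
#  [<e02a75ed>] add_new_node+0x199/0x23b [cman]
# The trailing [module] part is optional (core kernel symbols have none).
TRACE_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\][ \t]+(?P<sym>\w+)"
    r"\+0x(?P<off>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
    r"(?:[ \t]+\[(?P<mod>\w+)\])?"
)


def parse_trace(text):
    """Return one dict per call-trace entry found in text."""
    frames = []
    for m in TRACE_RE.finditer(text):
        frames.append({
            "address": int(m["addr"], 16),
            "symbol": m["sym"],
            "offset": int(m["off"], 16),
            "size": int(m["size"], 16),
            "module": m["mod"],  # None when no module is listed
        })
    return frames


sample = """\
 [<e02a75ed>] add_new_node+0x199/0x23b [cman]
 [<c011b9fa>] default_wake_function+0x0/0xc
"""
for frame in parse_trace(sample):
    print(frame["symbol"], hex(frame["offset"]), frame["module"])
```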

Comment 4 Christine Caulfield 2005-01-11 08:53:20 UTC

That's 133512, and I think this one is too.

Comment 5 Christine Caulfield 2005-01-11 15:55:33 UTC


*** This bug has been marked as a duplicate of 133512 ***

Comment 6 Red Hat Bugzilla 2006-02-21 19:07:42 UTC

Changed to 'CLOSED' state since 'RESOLVED' has been deprecated.
