Bug 142984 - CMAN: Error queueing request to port 1: -12 -- kernel panic
Summary: CMAN: Error queueing request to port 1: -12 -- kernel panic
Keywords:
Status: CLOSED DUPLICATE of bug 133512
Alias: None
Product: Red Hat Cluster Suite
Classification: Retired
Component: cman
Version: 4
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Christine Caulfield
QA Contact: Cluster QE
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2004-12-15 16:21 UTC by Adam "mantis" Manthei
Modified: 2009-04-16 19:59 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2006-02-21 19:07:42 UTC
Embargoed:


Attachments
cman init script that will produce this bug (2.22 KB, text/plain)
2004-12-15 16:26 UTC, Adam "mantis" Manthei

Description Adam "mantis" Manthei 2004-12-15 16:21:44 UTC
Description of problem:
While running "cman_tool join" on my 8 nodes at the same time, I ended up
panicking the kernel with the following message:

    <6>CMAN: Error queueing request to port 1: -12
    Kernel panic - not syncing: membership stopped responding

The node that panicked was the only one that had been able to start up.  I'm not
sure what was happening on the other nodes (all I know is that they were not
listed in /proc/cluster/nodes as my script paused until they were registered). 
Once the node panicked, the other 7 nodes were able to join the cluster.
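
For readers decoding the message above: assuming the -12 is a negated errno value, as is the usual convention for kernel return codes, it corresponds to ENOMEM. The helper name below is hypothetical and only illustrates the lookup.

```python
import errno
import os


def describe_kernel_rc(rc):
    """Map a negative errno-style return code to (symbolic name, message).

    Assumption: rc follows the kernel convention of returning -errno.
    """
    code = -rc
    name = errno.errorcode.get(code, "UNKNOWN")
    return name, os.strerror(code)


name, message = describe_kernel_rc(-12)
print(name, "-", message)
```

If that reading is right, the request to port 1 could not be queued because a memory allocation failed.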

Version-Release number of selected component (if applicable):
kernel-2.6.9-1.906_EL
cman-kernel-2.6.9-3.3
ccs-0.9-0
cman-1.0-0.pre5.0

How reproducible:
I've seen it a few times, but always when it also produced bug #142853.  I
bumped /proc/cluster/config/cman/transition_restarts to 500 and was able to see
this panic without the BUG() in bug #142853.
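
A minimal sketch of raising the tunable mentioned above. It assumes the proc file can be read and written as plain text by root; the function name is hypothetical, and only the path and the value 500 come from this report.

```python
from pathlib import Path

# Path taken from the report; only present on a node with cman loaded.
TRANSITION_RESTARTS = Path("/proc/cluster/config/cman/transition_restarts")


def set_transition_restarts(value, path=TRANSITION_RESTARTS):
    """Write a new limit to the tunable and return its previous contents."""
    path = Path(path)
    previous = path.read_text().strip()
    path.write_text(f"{value}\n")
    return previous


# Usage on a cluster node, as root:
#   old = set_transition_restarts(500)
```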

Steps to Reproduce:
( same process as bug #142853 )
1. run "cman_tool join" on 8 nodes simultaneously
2. spin chamber
3. pull trigger
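
Step 1 could be driven from a single control host roughly as follows. This is only a sketch: the host names node01 to node08 and the use of passwordless ssh as root are assumptions, not details from this report. Only the "cman_tool join" command itself comes from the steps above.

```python
import subprocess


def build_join_commands(nodes):
    """Return one ssh argv per node, each running 'cman_tool join'."""
    return [["ssh", node, "cman_tool", "join"] for node in nodes]


def join_all(nodes):
    """Start every join before waiting on any, so they run concurrently."""
    procs = [subprocess.Popen(cmd) for cmd in build_join_commands(nodes)]
    return [proc.wait() for proc in procs]


# Usage (hypothetical host names):
#   nodes = [f"node{i:02d}" for i in range(1, 9)]
#   exit_codes = join_all(nodes)
```

Launching all the processes first and only then waiting keeps the joins as close to simultaneous as ssh latency allows.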

  
Actual results:
panic

Expected results:
no panic

Additional info:

Comment 1 Adam "mantis" Manthei 2004-12-15 16:26:37 UTC
Created attachment 108628
cman init script that will produce this bug

This is the cman init script that will produce this bug (note: there is also
another script that starts ccs).

Copy this to /etc/rc.d/init.d, then run "chkconfig cman on".  Repeat for all
nodes in the cluster, then reboot them all at the same time.  My nodes
are all pretty much the same, so they all take about the same time to boot.

Comment 2 Christine Caulfield 2004-12-20 15:45:49 UTC
I suspect this is the same bug as #142853 & #133512

Comment 3 Adam "mantis" Manthei 2005-01-11 01:11:51 UTC
Not sure if this is related or if it is its own bug, but I got a kernel
oops joining 9 nodes at the same time, which was later followed by the
panic listed above.

------------[ cut here ]------------
kernel BUG at /usr/src/sources/cluster-RHEL4/cman-kernel/src/membership.c:260!
invalid operand: 0000 [#1]
Modules linked in: cman(U) sunrpc md5 ipv6 dm_mod button battery ac uhci_hcd ehci_hcd e1000 floppy ext3 jbd qla2300 qla2xxx scsi_transport_fc sd_mod scsi_mod
CPU:    0
EIP:    0060:[<e02a5d28>]    Not tainted VLI
EFLAGS: 00010246   (2.6.9-1.906_EL)
EIP is at set_nodeid+0x25b/0x365 [cman]
eax: 00000014   ebx: e02bc300   ecx: e02b06fe   edx: dd048ecc
esi: e02bc64c   edi: dd72b9c8   ebp: 00000007   esp: dd048ec8
ds: 007b   es: 007b   ss: 0068
Process cman_memb (pid: 8294, threadinfo=dd048000 task=dc1dc2b0)
Stack: e02b06fe dd72b9c0 e02b06e9 dd7d8fe0 e02b06c6 00000007 000dd596 dc312c00
       e02bc300 e02bc64c dd72b9c8 dc312c00 e02a75ed 00000012 01000001 e02bc644
       e02bc620 e02bc644 00000014 e02bc634 e02a7777 00000007 00000001 0000000e
Call Trace:
 [<e02a75ed>] add_new_node+0x199/0x23b [cman]
 [<e02a7777>] add_node_from_starttrans+0x30/0x97 [cman]
 [<e02a7e3e>] do_process_starttrans+0x16d/0x24d [cman]
 [<e02a87ff>] do_process_hello+0x6b/0x112 [cman]
 [<e02a6963>] do_membership_packet+0x134/0x1c0 [cman]
 [<e02a8b95>] dispatch_messages+0x85/0xb2 [cman]
 [<e02a5768>] membership_kthread+0x3f8/0x58b [cman]
 [<c011b9fa>] default_wake_function+0x0/0xc
 [<e02a5370>] membership_kthread+0x0/0x58b [cman]
 [<c01041d9>] kernel_thread_helper+0x5/0xb
Code: e8 43 a3 e7 df a1 5c d2 2b e0 8b 04 a8 ff 70 08 68 e9 06 2b e0 e8 2e a3 e7 df 8b 44 24 14 ff 70 08 68 fe 06 2b e0 e8 1d a3 e7 df <0f> 0b 04 01 85 05 2b e0 83 c4 18 81 3d 40 d2 2b e0 3c 4b 24 1d

CMAN: Error queueing request to port 1: -12
Kernel panic - not syncing: membership stopped responding
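
When comparing this oops with the ones in the related bugs, it may help to pull the call chain out of the trace mechanically. The sketch below parses frames in the "[<addr>] symbol+0xoff/0xsize [module]" form seen above. The regular expression and function name are illustrative and have only been matched against the frames in this comment.

```python
import re

# One frame looks like: " [<e02a75ed>] add_new_node+0x199/0x23b [cman]"
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+"
    r"(?P<symbol>\w+)\+0x(?P<offset>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
    r"(?:\s+\[(?P<module>\w+)\])?"
)


def parse_frames(trace_text):
    """Return one dict per call-trace frame found in trace_text."""
    frames = []
    for line in trace_text.splitlines():
        match = FRAME_RE.search(line)
        if match:
            frames.append({
                "addr": int(match.group("addr"), 16),
                "symbol": match.group("symbol"),
                "offset": int(match.group("offset"), 16),
                "size": int(match.group("size"), 16),
                "module": match.group("module"),  # None for core kernel symbols
            })
    return frames


SAMPLE = """\
 [<e02a75ed>] add_new_node+0x199/0x23b [cman]
 [<e02a7777>] add_node_from_starttrans+0x30/0x97 [cman]
 [<c011b9fa>] default_wake_function+0x0/0xc
"""

for frame in parse_frames(SAMPLE):
    print(frame["module"] or "kernel", frame["symbol"])
```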

Comment 4 Christine Caulfield 2005-01-11 08:53:20 UTC
That's 133512, and I think this one is too.

Comment 5 Christine Caulfield 2005-01-11 15:55:33 UTC

*** This bug has been marked as a duplicate of 133512 ***

Comment 6 Red Hat Bugzilla 2006-02-21 19:07:42 UTC
Changed to 'CLOSED' state since 'RESOLVED' has been deprecated.

