Bug 126526 - cman panics in membership.c when attempting to gain membership
Status: CLOSED CURRENTRELEASE
Product: Red Hat Cluster Suite
Classification: Red Hat
Component: gfs
Version: 4
Platform: i686 Linux
Priority: medium
Severity: medium
Assigned To: Christine Caulfield
QA Contact: Derek Anderson
Reported: 2004-06-22 16:54 EDT by Corey Marthaler
Modified: 2010-01-11 21:52 EST

Doc Type: Bug Fix
Last Closed: 2004-08-24 14:18:06 EDT


Attachments: None
Description Corey Marthaler 2004-06-22 16:54:41 EDT
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (compatible; Konqueror/3.1; Linux)

Description of problem:
I've seen this panic a few times while attempting to bring a cluster up
with cman_tool join:

CMAN V2.0.1 (built Jun 17 2004 11:57:54) installed
DLM (built Jun 17 2004 11:58:07) installed
Lock_DLM (built Jun 17 2004 11:37:29) installed
CMAN: quorum regained, resuming activity
------------[ cut here ]------------
kernel BUG at cluster/cman/membership.c:611!
invalid operand: 0000 [#1]
Modules linked in: lock_dlm dlm cman lock_harness ipv6 parport_pc lp parport autofs4 sunrpc e1000 floppy sg microcode dm_mod uhci_hcd ehci_hcd button battery asus_acpi ac ext3 jbd qla2300 qla2xxx scsi_transport_fc sd_mod scsi_mod
CPU:    0
EIP:    0060:[<e02a0980>]    Not tainted
EFLAGS: 00010246   (2.6.7)
EIP is at send_joinconf+0x10/0x70 [cman]
eax: 00000000   ebx: e02b4fa0   ecx: c03b59e8   edx: e02b5bd4
esi: db831f70   edi: 00000004   ebp: ffffffff   esp: db831f10
ds: 007b   es: 007b   ss: 0068
Process cman_memb (pid: 2279, threadinfo=db830000 task=dc1f59b0)
Stack: 00000246 e02b4fa0 db831f70 00000004 e02b4fa0 e02a2646 e02b4fa0 e02b4fa0
       db831f78 00000002 e02a125e c0117e67 00000002 c0000000 db831f70 db831f78
       00000000 e02a3ada 00000040 db830000 00000000 dc0a1200 e02b4fa0 000005da
Call Trace:
 [<e02a2646>] do_process_viewack+0x126/0x220 [cman]
 [<e02a125e>] do_membership_packet+0x12e/0x1c0 [cman]
 [<c0117e67>] recalc_task_prio+0x97/0x190
 [<e02a3ada>] dispatch_messages+0xda/0x100 [cman]
 [<e02a01ef>] membership_kthread+0x18f/0x3e0 [cman]
 [<c0105c12>] ret_from_fork+0x6/0x14
 [<c0118850>] default_wake_function+0x0/0x10
 [<e02a0060>] membership_kthread+0x0/0x3e0 [cman]
 [<c010429d>] kernel_thread_helper+0x5/0x18

Code: 0f 0b 63 02 e5 d9 2a e0 89 44 24 0c ba 02 00 00 00 b9 00 00



Version-Release number of selected component (if applicable):


How reproducible:
Sometimes

Steps to Reproduce:
1. ccsd
2. cman_tool join

    

Additional info:
Comment 1 Christine Caulfield 2004-06-23 02:46:04 EDT
Can you give a little more information? A copy of cluster.xml and a
count of how many nodes are in the cluster (at the time, and
potentially) would be useful.

Comment 2 Corey Marthaler 2004-06-23 11:08:15 EDT
I'll try to add more info next time I see this panic. 
 
Here is the cluster.xml file I used. This has also been seen on
another cluster; both clusters have 6 nodes.
 
<?xml version="1.0"?> 
<cluster name="morph-cluster" config_version="1"> 
 
<cman> 
</cman> 
 
<dlm> 
</dlm> 
 
<nodes> 
        <node name="morph-01" votes="1"> 
                <fcdriver>qla2300</fcdriver> 
                <fence> 
                        <method name="single"> 
                                <device name="apc" port="1:1"/> 
                        </method> 
                </fence> 
        </node> 
        <node name="morph-02" votes="1"> 
                <fcdriver>qla2300</fcdriver> 
                <fence> 
                        <method name="single"> 
                                <device name="apc" port="1:2"/> 
                        </method> 
                </fence> 
        </node> 
        <node name="morph-03" votes="1"> 
                <fcdriver>qla2300</fcdriver> 
                <fence> 
                        <method name="single"> 
                                <device name="apc" port="1:3"/> 
                        </method> 
                </fence> 
        </node> 
        <node name="morph-04" votes="1"> 
                <fcdriver>qla2300</fcdriver> 
                <fence> 
                        <method name="single"> 
                                <device name="apc" port="1:4"/> 
                        </method> 
                </fence> 
        </node> 
        <node name="morph-05" votes="1"> 
                <fcdriver>qla2300</fcdriver> 
                <fence> 
                        <method name="single"> 
                                <device name="apc" port="1:5"/> 
                        </method> 
                </fence> 
        </node> 
        <node name="morph-06" votes="1"> 
                <fcdriver>qla2300</fcdriver> 
                <fence> 
                        <method name="single"> 
                                <device name="apc" port="1:6"/> 
                        </method> 
                </fence> 
        </node> 
 
</nodes> 
 
 
<fence_devices> 
        <device name="apc" agent="fence_apc" ipaddr="morph-apc" 
login="apc" passwd="apc"/> 
</fence_devices> 
 
 
<rm> 
</rm> 
 
</cluster> 
 
Comment 3 Christine Caulfield 2004-06-24 06:17:39 EDT
I've checked what I hope is a fix into BitKeeper; the cset is 1.1683.
Comment 4 Dean Jansa 2004-07-14 17:12:22 EDT
I hit this again today, built from a cluster CVS tree I checked out
on July 13. 6-node cluster as above.
 
tank-01: 
kernel BUG at /usr/src/cluster/cman-kernel/src/membership.c:611! 
invalid operand: 0000 [#1] 
Modules linked in: gnbd lock_gulm lock_nolock lock_dlm dlm cman gfs 
lock_harnesd 
CPU:    0 
EIP:    0060:[<f8a589d0>]    Not tainted 
EFLAGS: 00010246   (2.6.7) 
EIP is at send_joinconf+0x10/0x70 [cman] 
eax: 00000000   ebx: 00000002   ecx: c03b5ab0   edx: f8a6f3d4 
esi: ffffffff   edi: 00000000   ebp: 00000000   esp: f54a5e94 
ds: 007b   es: 007b   ss: 0068 
Process cman_memb (pid: 4345, threadinfo=f54a4000 task=f54b7230) 
Stack: 00000246 00000002 ffffffff 00000000 00000002 f8a5a3c5 f584bf58 00000000
       c03150d8 f70ef894 c0118897 00000000 f584bf64 f8a6e51c ffffffff 00000000
       f70ef894 f8a5960c f54a5ef4 00000286 00000000 00000000 f8a6e51c f8a7ae0d
Call Trace: 
 [<f8a5a3c5>] do_process_startack+0x125/0x300 [cman] 
 [<c0118897>] __wake_up_common+0x37/0x70 
 [<f8a5960c>] start_transition+0x1ec/0x290 [cman] 
 [<f8a7ae0d>] cman_callback+0x1d/0x20 [dlm] 
 [<f8a5981e>] a_node_just_died+0x16e/0x190 [cman] 
 [<f8a5b681>] do_process_leave+0x61/0x90 [cman] 
 [<f8a5921c>] do_membership_packet+0x9c/0x1c0 [cman] 
 [<c0117e67>] recalc_task_prio+0x97/0x190 
 [<f8a5bb6a>] dispatch_messages+0xda/0x100 [cman] 
 [<f8a5823f>] membership_kthread+0x18f/0x3e0 [cman] 
 [<c0105c12>] ret_from_fork+0x6/0x14 
 [<c0118850>] default_wake_function+0x0/0x10 
 [<f8a580b0>] membership_kthread+0x0/0x3e0 [cman] 
 [<c010429d>] kernel_thread_helper+0x5/0x18 
 
Code: 0f 0b 63 02 64 45 a6 f8 89 44 24 0c ba 02 00 00 00 b9 00 00 
 
tank-03: 
------------[ cut here ]------------ 
kernel BUG at /usr/src/cluster/cman-kernel/src/membership.c:611! 
invalid operand: 0000 [#1] 
Modules linked in: gnbd lock_gulm lock_nolock lock_dlm dlm cman gfs 
lock_harnesd 
CPU:    0 
EIP:    0060:[<f8a589d0>]    Not tainted 
EFLAGS: 00010246   (2.6.7) 
EIP is at send_joinconf+0x10/0x70 [cman] 
eax: 00000000   ebx: 00000002   ecx: c03b5a68   edx: f8a6f3d4 
esi: ffffffff   edi: 00000000   ebp: 00000000   esp: c2233e94 
ds: 007b   es: 007b   ss: 0068 
Process cman_memb (pid: 4344, threadinfo=c2232000 task=c229f1b0) 
Stack: 00000246 00000002 ffffffff 00000000 00000002 f8a5a3c5 f6045f58 00000000
       c03150d8 c22fb494 c0118897 00000000 f6045f64 f8a6e51c ffffffff 00000000
       c22fb494 f8a5960c c2233ef4 00000286 00000000 00000000 f8a6e51c f8a7ae0d
Call Trace: 
 [<f8a5a3c5>] do_process_startack+0x125/0x300 [cman] 
 [<c0118897>] __wake_up_common+0x37/0x70 
 [<f8a5960c>] start_transition+0x1ec/0x290 [cman] 
 [<f8a7ae0d>] cman_callback+0x1d/0x20 [dlm] 
 [<f8a5981e>] a_node_just_died+0x16e/0x190 [cman] 
 [<f8a5b681>] do_process_leave+0x61/0x90 [cman] 
 [<f8a5921c>] do_membership_packet+0x9c/0x1c0 [cman] 
 [<c0117e67>] recalc_task_prio+0x97/0x190 
 [<f8a5bb6a>] dispatch_messages+0xda/0x100 [cman] 
 [<f8a5823f>] membership_kthread+0x18f/0x3e0 [cman] 
 [<c0105c12>] ret_from_fork+0x6/0x14 
 [<c0118850>] default_wake_function+0x0/0x10 
 [<f8a580b0>] membership_kthread+0x0/0x3e0 [cman] 
 [<c010429d>] kernel_thread_helper+0x5/0x18 
 
Code: 0f 0b 63 02 64 45 a6 f8 89 44 24 0c ba 02 00 00 00 b9 00 00 
 
 
tank-04: 
 
CMAN: too many transition restarts - will die 
 
tank-06: 
CMAN: too many transition restarts - will die 
 
 
Comment 5 Christine Caulfield 2004-07-20 04:03:39 EDT
Thanks for the excellent reports on this. It should be fixed now.

The underlying cause may still exist but that's covered by #126991

Checking in membership.c;
/cvs/cluster/cluster/cman-kernel/src/membership.c,v  <--  membership.c
new revision: 1.5; previous revision: 1.4
done
Comment 6 Dean Jansa 2004-08-24 14:18:06 EDT
Seems good now. I didn't hit this after a few hours of ripping
down and rebuilding a cluster.
Comment 7 Kiersten (Kerri) Anderson 2004-11-16 14:07:00 EST
Updating version to the right level in the defects.  Sorry for the storm.
