Bug 126526
| Summary: | cman panics in membership.c when attempting to gain membership | | |
|---|---|---|---|
| Product: | [Retired] Red Hat Cluster Suite | Reporter: | Corey Marthaler <cmarthal> |
| Component: | gfs | Assignee: | Christine Caulfield <ccaulfie> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Derek Anderson <danderso> |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | | |
| Version: | 4 | CC: | djansa |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | i686 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2004-08-24 18:18:06 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description
Corey Marthaler
2004-06-22 20:54:41 UTC
Can you give a little more information? A copy of cluster.xml and a count of how many nodes are in the cluster (at the time, and potentially) would be useful.

---

I'll try to add more info next time I see this panic. Here is the cluster.xml file I used. This has also been seen on another cluster; both clusters have 6 nodes.

```xml
<?xml version="1.0"?>
<cluster name="morph-cluster" config_version="1">
  <cman></cman>
  <dlm></dlm>
  <nodes>
    <node name="morph-01" votes="1">
      <fcdriver>qla2300</fcdriver>
      <fence>
        <method name="single">
          <device name="apc" port="1:1"/>
        </method>
      </fence>
    </node>
    <node name="morph-02" votes="1">
      <fcdriver>qla2300</fcdriver>
      <fence>
        <method name="single">
          <device name="apc" port="1:2"/>
        </method>
      </fence>
    </node>
    <node name="morph-03" votes="1">
      <fcdriver>qla2300</fcdriver>
      <fence>
        <method name="single">
          <device name="apc" port="1:3"/>
        </method>
      </fence>
    </node>
    <node name="morph-04" votes="1">
      <fcdriver>qla2300</fcdriver>
      <fence>
        <method name="single">
          <device name="apc" port="1:4"/>
        </method>
      </fence>
    </node>
    <node name="morph-05" votes="1">
      <fcdriver>qla2300</fcdriver>
      <fence>
        <method name="single">
          <device name="apc" port="1:5"/>
        </method>
      </fence>
    </node>
    <node name="morph-06" votes="1">
      <fcdriver>qla2300</fcdriver>
      <fence>
        <method name="single">
          <device name="apc" port="1:6"/>
        </method>
      </fence>
    </node>
  </nodes>
  <fence_devices>
    <device name="apc" agent="fence_apc" ipaddr="morph-apc" login="apc" passwd="apc"/>
  </fence_devices>
  <rm></rm>
</cluster>
```

---

I've checked into BitKeeper what I hope is a fix. cset is 1.1683.

---

I hit this again today, built from a cluster CVS tree I checked out July 13. 6-node cluster as above.

tank-01:

```
kernel BUG at /usr/src/cluster/cman-kernel/src/membership.c:611!
invalid operand: 0000 [#1]
Modules linked in: gnbd lock_gulm lock_nolock lock_dlm dlm cman gfs lock_harnesd
CPU:    0
EIP:    0060:[<f8a589d0>]    Not tainted
EFLAGS: 00010246   (2.6.7)
EIP is at send_joinconf+0x10/0x70 [cman]
eax: 00000000   ebx: 00000002   ecx: c03b5ab0   edx: f8a6f3d4
esi: ffffffff   edi: 00000000   ebp: 00000000   esp: f54a5e94
ds: 007b   es: 007b   ss: 0068
Process cman_memb (pid: 4345, threadinfo=f54a4000 task=f54b7230)
Stack: 00000246 00000002 ffffffff 00000000 00000002 f8a5a3c5 f584bf58 00000000
       c03150d8 f70ef894 c0118897 00000000 f584bf64 f8a6e51c ffffffff 00000000
       f70ef894 f8a5960c f54a5ef4 00000286 00000000 00000000 f8a6e51c f8a7ae0d
Call Trace:
 [<f8a5a3c5>] do_process_startack+0x125/0x300 [cman]
 [<c0118897>] __wake_up_common+0x37/0x70
 [<f8a5960c>] start_transition+0x1ec/0x290 [cman]
 [<f8a7ae0d>] cman_callback+0x1d/0x20 [dlm]
 [<f8a5981e>] a_node_just_died+0x16e/0x190 [cman]
 [<f8a5b681>] do_process_leave+0x61/0x90 [cman]
 [<f8a5921c>] do_membership_packet+0x9c/0x1c0 [cman]
 [<c0117e67>] recalc_task_prio+0x97/0x190
 [<f8a5bb6a>] dispatch_messages+0xda/0x100 [cman]
 [<f8a5823f>] membership_kthread+0x18f/0x3e0 [cman]
 [<c0105c12>] ret_from_fork+0x6/0x14
 [<c0118850>] default_wake_function+0x0/0x10
 [<f8a580b0>] membership_kthread+0x0/0x3e0 [cman]
 [<c010429d>] kernel_thread_helper+0x5/0x18
Code: 0f 0b 63 02 64 45 a6 f8 89 44 24 0c ba 02 00 00 00 b9 00 00
```

tank-03:

```
------------[ cut here ]------------
kernel BUG at /usr/src/cluster/cman-kernel/src/membership.c:611!
invalid operand: 0000 [#1]
Modules linked in: gnbd lock_gulm lock_nolock lock_dlm dlm cman gfs lock_harnesd
CPU:    0
EIP:    0060:[<f8a589d0>]    Not tainted
EFLAGS: 00010246   (2.6.7)
EIP is at send_joinconf+0x10/0x70 [cman]
eax: 00000000   ebx: 00000002   ecx: c03b5a68   edx: f8a6f3d4
esi: ffffffff   edi: 00000000   ebp: 00000000   esp: c2233e94
ds: 007b   es: 007b   ss: 0068
Process cman_memb (pid: 4344, threadinfo=c2232000 task=c229f1b0)
Stack: 00000246 00000002 ffffffff 00000000 00000002 f8a5a3c5 f6045f58 00000000
       c03150d8 c22fb494 c0118897 00000000 f6045f64 f8a6e51c ffffffff 00000000
       c22fb494 f8a5960c c2233ef4 00000286 00000000 00000000 f8a6e51c f8a7ae0d
Call Trace:
 [<f8a5a3c5>] do_process_startack+0x125/0x300 [cman]
 [<c0118897>] __wake_up_common+0x37/0x70
 [<f8a5960c>] start_transition+0x1ec/0x290 [cman]
 [<f8a7ae0d>] cman_callback+0x1d/0x20 [dlm]
 [<f8a5981e>] a_node_just_died+0x16e/0x190 [cman]
 [<f8a5b681>] do_process_leave+0x61/0x90 [cman]
 [<f8a5921c>] do_membership_packet+0x9c/0x1c0 [cman]
 [<c0117e67>] recalc_task_prio+0x97/0x190
 [<f8a5bb6a>] dispatch_messages+0xda/0x100 [cman]
 [<f8a5823f>] membership_kthread+0x18f/0x3e0 [cman]
 [<c0105c12>] ret_from_fork+0x6/0x14
 [<c0118850>] default_wake_function+0x0/0x10
 [<f8a580b0>] membership_kthread+0x0/0x3e0 [cman]
 [<c010429d>] kernel_thread_helper+0x5/0x18
Code: 0f 0b 63 02 64 45 a6 f8 89 44 24 0c ba 02 00 00 00 b9 00 00
```

On the other nodes:

```
tank-04: CMAN: too many transition restarts - will die
tank-06: CMAN: too many transition restarts - will die
```

---

Thanks for the excellent reports on this. It should be fixed now. The underlying cause may still exist, but that is covered by bug #126991.

```
Checking in membership.c;
/cvs/cluster/cluster/cman-kernel/src/membership.c,v  <--  membership.c
new revision: 1.5; previous revision: 1.4
done
```

---

Seems good now... I didn't hit this after a few hours of ripping down and rebuilding a cluster.

---

Updating version to the right level in the defects. Sorry for the storm.