Bug 139958 - shot all nodes but one, and it ends up Oops'ing
Status: CLOSED CURRENTRELEASE
Product: Red Hat Cluster Suite
Classification: Red Hat
Component: cman
Version: 4
Hardware: i686 Linux
Priority: medium
Severity: medium
Assigned To: Christine Caulfield
QA Contact: Cluster QE
 
Reported: 2004-11-18 17:15 EST by Corey Marthaler
Modified: 2009-04-16 15:59 EDT (History)
Doc Type: Bug Fix
Last Closed: 2005-01-18 18:05:01 EST
Attachments: None

Description Corey Marthaler 2004-11-18 17:15:43 EST
Description of problem:
6-node cluster; all nodes were shot except morph-02, which then hit the following kernel BUG:

CMAN: node morph-05 is not responding - removing from the cluster
CMAN: node morph-04 is not responding - removing from the cluster
------------[ cut here ]------------
kernel BUG at /usr/src/cluster/cman-kernel/src/membership.c:670!
invalid operand: 0000 [#1]
SMP
Modules linked in: gnbd lock_gulm lock_nolock lock_dlm dlm cman gfs lock_harness ipv6 parport_pc lp parport autofs4 sunrpc e1000 microcode uhci_hcd ehci_hcd button battery ac ext3 jbd dm_mod qla2300 qla2xxx scsi_transport_fc sd_mod scsi_mod
CPU:    0
EIP:    0060:[<f8a64360>]    Not tainted VLI
EFLAGS: 00010246   (2.6.9)
EIP is at send_joinconf+0x10/0x80 [cman]
eax: 00000000   ebx: 00000006   ecx: c201d200   edx: 00000246
esi: f8a79904   edi: ffffffff   ebp: 00000000   esp: f6b8df14
ds: 007b   es: 007b   ss: 0068
Process cman_memb (pid: 3698, threadinfo=f6b8c000 task=f745d7b0)
Stack: 00384c73 00000006 f8a79904 ffffffff 00000006 f8a65e7c f8a7a080 f6b8df54
       c011d467 f6ce2670 00000000 f740f880 c012aa54 00000000 00000246 00387383
       f8a797c8 f8a79904 ffffffff f6ff9f80 f8a6502e 00000000 00000000 00220000
Call Trace:
 [<f8a65e7c>] do_process_startack+0x12c/0x3d0 [cman]
 [<c011d467>] recalc_task_prio+0x97/0x190
 [<c012aa54>] __mod_timer+0xf4/0x130
 [<f8a6502e>] start_transition+0x1ee/0x2a0 [cman]
 [<f8a65204>] a_node_just_died+0x124/0x190 [cman]
 [<f8a63afc>] membership_kthread+0x40c/0x450 [cman]
 [<c0105ec2>] ret_from_fork+0x6/0x14
 [<c011f2d0>] default_wake_function+0x0/0x10
 [<f8a636f0>] membership_kthread+0x0/0x450 [cman]
 [<c01042b5>] kernel_thread_helper+0x5/0x10
Code: 04 24 60 04 a7 f8 e8 50 ec 6b c7 eb ee 8d b4 26 00 00 00 00 8d bc 27 00 00 00 00 83 ec 14 a1 40 a7 a7 f8 89 5c 24 10 85 c0 75 08 <0f> 0b 9e 02 a0 04 a7 f8 89 44 24 0c ba 02 00 00 00 b9 00 00 01
dlm: connecting to 6
dlm: connecting to 5
dlm: connecting to 5
dlm: connecting to 6
dlm: connecting to 3
dlm: connecting to 3
dlm: connecting to 4
dlm: connecting to 4
dlm: connecting to 4

Version-Release number of selected component (if applicable):
CMAN <CVS> (built Nov 17 2004 16:54:33) installed

How reproducible:
Didn't try
Comment 1 Christine Caulfield 2004-11-19 06:43:52 EST
joining_nodeid wasn't being cleared in one corner case.

Checking in membership.c;
/cvs/cluster/cluster/cman-kernel/src/membership.c,v  <--  membership.c
new revision: 1.34; previous revision: 1.33
done
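
For readers unfamiliar with this class of bug, here is a minimal user-space sketch of how a state variable left uncleared on one path can lead to a consistency check firing later. It is not the cman code: the pairing of joining_nodeid with a joining_node pointer, every function name, and the clear_id switch are assumptions made for illustration. Only the variable name joining_nodeid and the fact that send_joinconf hit a BUG() come from this report.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified model; NOT the actual membership.c logic. */
struct node { int id; int alive; };

enum joinconf_result {
    JOINCONF_INCONSISTENT = -1, /* stale id; real code might BUG() here */
    JOINCONF_IDLE = 0,          /* no join in progress, nothing to send */
    JOINCONF_SENT = 1           /* confirmation would be sent */
};

static struct node *joining_node; /* node being admitted, or NULL */
static int joining_nodeid;        /* its id, or 0 when no join is active */

static void begin_join(struct node *n)
{
    joining_node = n;
    joining_nodeid = n->id;
}

/* clear_id = 0 models the assumed buggy path, 1 the assumed fixed path. */
static void node_died(struct node *n, int clear_id)
{
    n->alive = 0;
    if (joining_node == n) {
        joining_node = NULL;
        if (clear_id)
            joining_nodeid = 0; /* the step the corner case skipped */
    }
}

static enum joinconf_result try_send_joinconf(void)
{
    if (joining_nodeid == 0)
        return JOINCONF_IDLE;
    if (joining_node == NULL)
        return JOINCONF_INCONSISTENT;
    return JOINCONF_SENT;
}

/* Start a join, kill the joining node, then try to confirm the join. */
static enum joinconf_result simulate(int clear_id)
{
    struct node n = { 5, 1 };

    begin_join(&n);
    node_died(&n, clear_id);
    return try_send_joinconf();
}
```

In this model, simulate(0) leaves a stale id behind and reports the inconsistent state, while simulate(1) clears it and reports that no join is in progress. Whether revision 1.34 follows the same shape is not stated in this report.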
Comment 2 Corey Marthaler 2005-01-18 18:05:01 EST
Fix verified.
