Bug 143538 - Ooops while joining cluster (in cman_comms)
Summary: Ooops while joining cluster (in cman_comms)
Keywords:
Status: CLOSED NEXTRELEASE
Alias: None
Product: Red Hat Cluster Suite
Classification: Retired
Component: cman
Version: 4
Hardware: i686
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Christine Caulfield
QA Contact: Cluster QE
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2004-12-21 23:54 UTC by Corey Marthaler
Modified: 2009-04-16 19:59 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2005-03-14 17:52:53 UTC
Embargoed:



Description Corey Marthaler 2004-12-21 23:54:32 UTC
Description of problem:
I was continually joining and leaving cluster membership on all nodes
in a 6-node cluster. I eventually hit this oops on morph-06 after
starting ccsd again and then running cman_tool join on all nodes.

Dec 21 17:43:18 morph-06 ccsd[4215]: Starting ccsd DEVEL.1103653322:
Dec 21 17:43:18 morph-06 sshd(pam_unix)[4205]: session closed for user root
Unable to handle kernel NULL pointer dereference at virtual address 00000159
 printing eip:
e029ce44
*pde = 00000000
Oops: 0000 [#1]
SMP
Modules linked in: cman ipv6 parport_pc lp parport autofs4 sunrpc e1000 microcode dm_mod uhci_hcd ehci_hcd button battery ac ext3 jbd qla2300 qla2xxx scsi_transport_fc sd_mod scsi_mod
CPU:    0
EIP:    0060:[<e029ce44>]    Not tainted VLI
EFLAGS: 00010282   (2.6.9)
EIP is at __sendmsg+0xb4/0x680 [cman]
eax: 00000001   ebx: 00000000   ecx: 00000000   edx: 00000000
esi: da6f6380   edi: daa05f98   ebp: daa05f60   esp: daa05e84
ds: 007b   es: 007b   ss: 0068
Process cman_comms (pid: 4227, threadinfo=daa04000 task=da620430)
Stack: 00000001 daa05eac c011f419 00000001 00000000 00000001 00000001 db2ccb70
       db2ccb70 00000000 00000001 00000000 00000000 daa05e90 daa05ea4 00000000
       00200000 01000000 daa05f74 daa05f7c 00000000 00000000 c011d467 00000001
Call Trace:
 [<c011f419>] __wake_up_sync+0x49/0x70
 [<c011d467>] recalc_task_prio+0x97/0x190
 [<c014cfbb>] zap_pmd_range+0x4b/0x70
 [<c0120ae0>] autoremove_wake_function+0x0/0x50
 [<c028f4c6>] kernel_recvmsg+0x36/0x50
 [<e029a150>] receive_message+0x70/0xe0 [cman]
 [<e029d79b>] send_queued_message+0x9b/0xa0 [cman]
 [<e029a3be>] cluster_kthread+0x1fe/0x300 [cman]
 [<c011f2d0>] default_wake_function+0x0/0x10
 [<e029a1c0>] cluster_kthread+0x0/0x300 [cman]
 [<c01042b5>] kernel_thread_helper+0x5/0x10
Code: ff ff ff 8b 40 04 89 85 50 ff ff ff 8b 95 70 ff ff ff 8b 52 04 85 d2 0f 85 9a 05 00 00 c6 85 57 ff ff ff 00 85 f6 74 10 8b 46 14 <0f> b6 80 58 01 00 00 88 85 57 ff ff ff ba 6b 00 00 00 b8 fb f0
Dec 21 17:43:25 morph-06 sshd(pam_unix)[4217]: session opened for user root by (uid=0)
Dec 21 17:43:26 morph-06 kernel: CMAN: Waiting to join or form a Linux-cluster


Version-Release number of selected component (if applicable):
cman_tool DEVEL.1103653244 (built Dec 21 2004 12:21:57)

How reproducible:
Didn't try

Comment 1 Christine Caulfield 2005-01-10 11:41:34 UTC
The queued_messages list was not being cleaned up at shutdown, so it's
possible that cman might try to send a queued message from a previous
incarnation, with hilarious results.

Checking in cnxman.c;
/cvs/cluster/cluster/cman-kernel/src/cnxman.c,v  <--  cnxman.c
new revision: 1.44; previous revision: 1.43
done

Checking in cnxman.c;
/cvs/cluster/cluster/cman-kernel/src/cnxman.c,v  <--  cnxman.c
new revision: 1.42.2.1; previous revision: 1.42
done
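
For readers without the CVS diff to hand, the idea behind the fix can be
sketched in userspace C. This is only an illustration and not the actual
cnxman.c change: the struct names, the incarnation field, and the
queue_message() and flush_queue() helpers are all hypothetical, and the
real code uses kernel list primitives rather than a hand-rolled list.
The point is simply that the shutdown path empties the queue so nothing
from an old incarnation is left to be sent after the next join.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for one queued outgoing message. */
struct queued_msg {
    struct queued_msg *next;
    int incarnation;      /* cluster session that queued the message */
    char payload[32];
};

/* Hypothetical stand-in for the queued_messages list. */
struct msg_queue {
    struct queued_msg *head;
    struct queued_msg *tail;
    int count;
};

/* Append a message; returns 0 on success, -1 if allocation fails. */
int queue_message(struct msg_queue *q, int incarnation, const char *payload)
{
    struct queued_msg *m = calloc(1, sizeof(*m));

    if (m == NULL)
        return -1;
    m->incarnation = incarnation;
    /* calloc zeroed the buffer, so the copy stays NUL-terminated. */
    strncpy(m->payload, payload, sizeof(m->payload) - 1);

    if (q->tail != NULL)
        q->tail->next = m;
    else
        q->head = m;
    q->tail = m;
    q->count++;
    return 0;
}

/*
 * Shutdown-time cleanup: free every queued message and reset the list
 * so a later incarnation starts with an empty queue.
 * Returns the number of messages that were dropped.
 */
int flush_queue(struct msg_queue *q)
{
    int dropped = 0;
    struct queued_msg *m = q->head;

    while (m != NULL) {
        struct queued_msg *next = m->next; /* save before freeing */
        free(m);
        m = next;
        dropped++;
    }
    q->head = NULL;
    q->tail = NULL;
    q->count = 0;
    return dropped;
}
```

In this sketch a leave/shutdown path would call flush_queue() before
tearing down the state the messages refer to, so a subsequent join never
sees entries queued by the previous incarnation.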

Comment 2 Corey Marthaler 2005-03-14 17:52:53 UTC
fix verified.

