Bug 128420 - nodes "not responding" during cluster formation
Status: CLOSED DUPLICATE of bug 139738
Alias: None
Product: Red Hat Cluster Suite
Classification: Retired
Component: cman   
Version: 4
Hardware: i686 Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Christine Caulfield
QA Contact: Cluster QE
 
Reported: 2004-07-22 18:39 UTC by Corey Marthaler
Modified: 2009-04-16 19:58 UTC (History)

Last Closed: 2006-02-21 19:04:38 UTC

Description Corey Marthaler 2004-07-22 18:39:42 UTC
Description of problem: 
This happened on morph-03 after cmand and fenced membership had been
obtained, while waiting for the clvmd service on all nodes. It appears
that morph-03 lost its connection to morph-02 and removed it from the
CMAN cluster:
 
Jul 22 13:11:58 morph-03 kernel: CMAN: got node morph-06.lab.msp.redhat.com
Jul 22 13:11:58 morph-03 kernel: CMAN: got node morph-05.lab.msp.redhat.com
Jul 22 13:12:03 morph-03 kernel: CMAN: got node morph-01.lab.msp.redhat.com
Jul 22 13:12:03 morph-03 kernel: CMAN: quorum regained, resuming activity
Jul 22 13:12:03 morph-03 kernel: CMAN: got node morph-04.lab.msp.redhat.com
Jul 22 13:12:03 morph-03 kernel: CMAN: got node morph-02.lab.msp.redhat.com
Jul 22 13:12:07 morph-03 kernel: CMAN: node morph-02.lab.msp.redhat.com is not responding - removing from the cluster
Jul 22 13:12:10 morph-03 sshd(pam_unix)[3752]: session opened for user root by (uid=0)
Jul 22 13:12:10 morph-03 sshd(pam_unix)[3752]: session closed for user root
Jul 22 13:12:11 morph-03 kernel: CMAN: node morph-02.lab.msp.redhat.com is not responding - removing from the cluster
Jul 22 13:12:15 morph-03 kernel: CMAN: node morph-02.lab.msp.redhat.com is not responding - removing from the cluster
Jul 22 13:12:15 morph-03 kernel: SM: process_reply invalid id=1 nodeid=4
Jul 22 13:12:15 morph-03 kernel: SM: process_reply invalid id=1 nodeid=4
Jul 22 13:12:15 morph-03 kernel: SM: process_reply invalid id=1 nodeid=1
Jul 22 13:12:15 morph-03 kernel: SM: process_reply invalid id=1 nodeid=1
Jul 22 13:12:15 morph-03 kernel: SM: process_reply invalid id=1 nodeid=4
Jul 22 13:12:15 morph-03 kernel: SM: process_reply invalid id=1 nodeid=4
Jul 22 13:12:15 morph-03 kernel: SM: process_reply invalid id=1 nodeid=1
Jul 22 13:12:15 morph-03 kernel: SM: process_reply invalid id=1 nodeid=1
Jul 22 13:12:15 morph-03 kernel: SM: process_reply invalid id=1 nodeid=4
Jul 22 13:12:15 morph-03 kernel: SM: process_reply invalid id=1 nodeid=1
Jul 22 13:12:15 morph-03 kernel: SM: process_reply invalid id=1 nodeid=1
Jul 22 13:12:16 morph-03 kernel: SM: process_reply invalid id=1 nodeid=4
Jul 22 13:12:16 morph-03 kernel: SM: process_reply invalid id=1 nodeid=1
Jul 22 13:12:18 morph-03 kernel: SM: process_reply invalid id=1 nodeid=4
Jul 22 13:12:18 morph-03 kernel: SM: process_reply invalid id=1 nodeid=1
Jul 22 13:12:18 morph-03 kernel: SM: process_reply invalid id=1 nodeid=4294967295
Jul 22 13:12:19 morph-03 kernel: SM: process_reply invalid id=1 nodeid=4
Jul 22 13:12:19 morph-03 kernel: SM: process_reply invalid id=1 nodeid=4294967295
Jul 22 13:12:19 morph-03 kernel: SM: process_reply invalid id=1 nodeid=1
Jul 22 13:12:20 morph-03 kernel: SM: process_reply invalid id=1 nodeid=4
Jul 22 13:12:20 morph-03 kernel: SM: process_reply invalid id=1 nodeid=1
Jul 22 13:12:20 morph-03 sshd(pam_unix)[3765]: session opened for user root by (uid=0)
Jul 22 13:12:20 morph-03 sshd(pam_unix)[3765]: session closed for user root
Jul 22 13:12:21 morph-03 kernel: SM: process_reply invalid id=1 nodeid=4
Jul 22 13:12:21 morph-03 kernel: SM: process_reply invalid id=1 nodeid=4294967295
Jul 22 13:12:21 morph-03 sshd(pam_unix)[3777]: session opened for user root by (uid=0)
Jul 22 13:12:21 morph-03 sshd(pam_unix)[3777]: session closed for user root
Jul 22 13:12:22 morph-03 kernel: SM: process_reply invalid id=1 nodeid=4
Jul 22 13:12:22 morph-03 kernel: SM: process_reply invalid id=1 nodeid=1
Jul 22 13:12:24 morph-03 kernel: SM: process_reply invalid id=1 nodeid=4
Jul 22 13:12:24 morph-03 kernel: SM: process_reply invalid id=1 nodeid=4294967295
Jul 22 13:12:24 morph-03 kernel: SM: process_reply invalid id=1 nodeid=1
Jul 22 13:12:26 morph-03 kernel: SM: process_reply invalid id=1 nodeid=4
Jul 22 13:12:26 morph-03 kernel: SM: process_reply invalid id=1 nodeid=4294967295
Jul 22 13:12:26 morph-03 kernel: SM: process_reply invalid id=1 nodeid=1
Jul 22 13:12:28 morph-03 kernel: SM: process_reply invalid id=1 nodeid=4
Jul 22 13:12:28 morph-03 kernel: SM: process_reply invalid id=1 nodeid=1
Jul 22 13:12:28 morph-03 kernel: SM: process_reply invalid id=1 nodeid=4294967295
Jul 22 13:12:30 morph-03 kernel: SM: process_reply invalid id=1 nodeid=4
Jul 22 13:12:30 morph-03 kernel: SM: process_reply invalid id=1 nodeid=4294967295
Jul 22 13:12:30 morph-03 kernel: SM: process_reply invalid id=1 nodeid=1
Jul 22 13:12:32 morph-03 kernel: SM: process_reply invalid id=1 nodeid=4
Jul 22 13:12:32 morph-03 kernel: SM: process_reply invalid id=1 nodeid=4294967295
Jul 22 13:12:32 morph-03 kernel: SM: process_reply invalid id=1 nodeid=1
Jul 22 13:12:33 morph-03 kernel: Got ENDTRANS from a node not the master: master: 1, sender: -1
Jul 22 13:12:34 morph-03 kernel:
Jul 22 13:12:34 morph-03 kernel: SM:  Assertion failed on line 51 of file /usr/src/cluster/cman-kernel/src/sm_misc.c
Jul 22 13:12:34 morph-03 kernel: SM:  assertion:  "!error"
Jul 22 13:12:34 morph-03 kernel: SM:  time = 208965
Jul 22 13:12:34 morph-03 kernel:
Jul 22 13:12:34 morph-03 kernel: Kernel panic: SM:  Record message above and reboot.
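
Triage note: the value 4294967295 in the "process_reply invalid" messages is 2^32 - 1, which is how -1 prints as an unsigned 32-bit integer. That would be consistent with the "sender: -1" in the ENDTRANS message, though whether both come from the same unset node ID is an assumption, not something the log confirms. A minimal Python sketch for tallying these messages per node ID follows; the function names and the SAMPLE excerpt (a few lines from the log above, with wrapped lines joined) are illustrative only and not part of any cluster tooling.

```python
import re
from collections import Counter

# A few lines taken from the log in this report, with wrapped lines joined.
SAMPLE = """\
Jul 22 13:12:15 morph-03 kernel: SM: process_reply invalid id=1 nodeid=4
Jul 22 13:12:15 morph-03 kernel: SM: process_reply invalid id=1 nodeid=1
Jul 22 13:12:18 morph-03 kernel: SM: process_reply invalid id=1 nodeid=4294967295
Jul 22 13:12:33 morph-03 kernel: Got ENDTRANS from a node not the master: master: 1, sender: -1
"""

# Matches only the SM invalid-reply messages; other log lines are skipped.
INVALID_REPLY = re.compile(r"SM: process_reply invalid id=(\d+) nodeid=(\d+)")


def as_signed32(value):
    """Reinterpret an unsigned 32-bit value as a signed 32-bit integer."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value >= (1 << 31) else value


def tally_invalid_replies(text):
    """Count 'process_reply invalid' messages per node ID (signed view)."""
    counts = Counter()
    for line in text.splitlines():
        match = INVALID_REPLY.search(line)
        if match:
            counts[as_signed32(int(match.group(2)))] += 1
    return counts


if __name__ == "__main__":
    for nodeid, count in sorted(tally_invalid_replies(SAMPLE).items()):
        print(f"nodeid={nodeid}: {count}")
```

Run against the full syslog (after joining wrapped lines), this would show how the invalid replies are distributed across node IDs 1, 4, and the -1 placeholder.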

Comment 1 David Teigland 2004-08-19 04:50:43 UTC
cnxman errors are what led to this, although sm should be able to
die cleanly (without a panic) when things break down.

Comment 2 Kiersten (Kerri) Anderson 2004-11-04 15:14:40 UTC
Updates with the proper version and component name.

Comment 3 David Teigland 2005-01-05 03:17:30 UTC
I think these problems have been fixed, at least for the most part.

Comment 4 Christine Caulfield 2005-01-05 14:34:00 UTC

*** This bug has been marked as a duplicate of 139738 ***

Comment 5 Red Hat Bugzilla 2006-02-21 19:04:38 UTC
Changed to 'CLOSED' state since 'RESOLVED' has been deprecated.

