Bug 128420 - nodes "not responding" during cluster formation
Status: CLOSED DUPLICATE of bug 139738
Product: Red Hat Cluster Suite
Classification: Red Hat
Component: cman
Version: 4
Hardware: i686 Linux
Priority: medium
Severity: medium
Assigned To: Christine Caulfield
QA Contact: Cluster QE
Reported: 2004-07-22 14:39 EDT by Corey Marthaler
Modified: 2009-04-16 15:58 EDT (History)
Doc Type: Bug Fix
Last Closed: 2006-02-21 14:04:38 EST
Description Corey Marthaler 2004-07-22 14:39:42 EDT
Description of problem: 
This happened on morph-03 after cman and fenced membership had been
obtained, while waiting for the clvmd service on all nodes. It appears
that morph-03 lost its connection to morph-02 and removed it from the
CMAN cluster:
 
Jul 22 13:11:58 morph-03 kernel: CMAN: got node morph-06.lab.msp.redhat.com
Jul 22 13:11:58 morph-03 kernel: CMAN: got node morph-05.lab.msp.redhat.com
Jul 22 13:12:03 morph-03 kernel: CMAN: got node morph-01.lab.msp.redhat.com
Jul 22 13:12:03 morph-03 kernel: CMAN: quorum regained, resuming activity
Jul 22 13:12:03 morph-03 kernel: CMAN: got node morph-04.lab.msp.redhat.com
Jul 22 13:12:03 morph-03 kernel: CMAN: got node morph-02.lab.msp.redhat.com
Jul 22 13:12:07 morph-03 kernel: CMAN: node morph-02.lab.msp.redhat.com is not responding - removing from the cluster
Jul 22 13:12:10 morph-03 sshd(pam_unix)[3752]: session opened for user root by (uid=0)
Jul 22 13:12:10 morph-03 sshd(pam_unix)[3752]: session closed for user root
Jul 22 13:12:11 morph-03 kernel: CMAN: node morph-02.lab.msp.redhat.com is not responding - removing from the cluster
Jul 22 13:12:15 morph-03 kernel: CMAN: node morph-02.lab.msp.redhat.com is not responding - removing from the cluster
Jul 22 13:12:15 morph-03 kernel: SM: process_reply invalid id=1 nodeid=4
Jul 22 13:12:15 morph-03 kernel: SM: process_reply invalid id=1 nodeid=4
Jul 22 13:12:15 morph-03 kernel: SM: process_reply invalid id=1 nodeid=1
Jul 22 13:12:15 morph-03 kernel: SM: process_reply invalid id=1 nodeid=1
Jul 22 13:12:15 morph-03 kernel: SM: process_reply invalid id=1 nodeid=4
Jul 22 13:12:15 morph-03 kernel: SM: process_reply invalid id=1 nodeid=4
Jul 22 13:12:15 morph-03 kernel: SM: process_reply invalid id=1 nodeid=1
Jul 22 13:12:15 morph-03 kernel: SM: process_reply invalid id=1 nodeid=1
Jul 22 13:12:15 morph-03 kernel: SM: process_reply invalid id=1 nodeid=4
Jul 22 13:12:15 morph-03 kernel: SM: process_reply invalid id=1 nodeid=1
Jul 22 13:12:15 morph-03 kernel: SM: process_reply invalid id=1 nodeid=1
Jul 22 13:12:16 morph-03 kernel: SM: process_reply invalid id=1 nodeid=4
Jul 22 13:12:16 morph-03 kernel: SM: process_reply invalid id=1 nodeid=1
Jul 22 13:12:18 morph-03 kernel: SM: process_reply invalid id=1 nodeid=4
Jul 22 13:12:18 morph-03 kernel: SM: process_reply invalid id=1 nodeid=1
Jul 22 13:12:18 morph-03 kernel: SM: process_reply invalid id=1 nodeid=4294967295
Jul 22 13:12:19 morph-03 kernel: SM: process_reply invalid id=1 nodeid=4
Jul 22 13:12:19 morph-03 kernel: SM: process_reply invalid id=1 nodeid=4294967295
Jul 22 13:12:19 morph-03 kernel: SM: process_reply invalid id=1 nodeid=1
Jul 22 13:12:20 morph-03 kernel: SM: process_reply invalid id=1 nodeid=4
Jul 22 13:12:20 morph-03 kernel: SM: process_reply invalid id=1 nodeid=1
Jul 22 13:12:20 morph-03 sshd(pam_unix)[3765]: session opened for user root by (uid=0)
Jul 22 13:12:20 morph-03 sshd(pam_unix)[3765]: session closed for user root
Jul 22 13:12:21 morph-03 kernel: SM: process_reply invalid id=1 nodeid=4
Jul 22 13:12:21 morph-03 kernel: SM: process_reply invalid id=1 nodeid=4294967295
Jul 22 13:12:21 morph-03 sshd(pam_unix)[3777]: session opened for user root by (uid=0)
Jul 22 13:12:21 morph-03 sshd(pam_unix)[3777]: session closed for user root
Jul 22 13:12:22 morph-03 kernel: SM: process_reply invalid id=1 nodeid=4
Jul 22 13:12:22 morph-03 kernel: SM: process_reply invalid id=1 nodeid=1
Jul 22 13:12:24 morph-03 kernel: SM: process_reply invalid id=1 nodeid=4
Jul 22 13:12:24 morph-03 kernel: SM: process_reply invalid id=1 nodeid=4294967295
Jul 22 13:12:24 morph-03 kernel: SM: process_reply invalid id=1 nodeid=1
Jul 22 13:12:26 morph-03 kernel: SM: process_reply invalid id=1 nodeid=4
Jul 22 13:12:26 morph-03 kernel: SM: process_reply invalid id=1 nodeid=4294967295
Jul 22 13:12:26 morph-03 kernel: SM: process_reply invalid id=1 nodeid=1
Jul 22 13:12:28 morph-03 kernel: SM: process_reply invalid id=1 nodeid=4
Jul 22 13:12:28 morph-03 kernel: SM: process_reply invalid id=1 nodeid=1
Jul 22 13:12:28 morph-03 kernel: SM: process_reply invalid id=1 nodeid=4294967295
Jul 22 13:12:30 morph-03 kernel: SM: process_reply invalid id=1 nodeid=4
Jul 22 13:12:30 morph-03 kernel: SM: process_reply invalid id=1 nodeid=4294967295
Jul 22 13:12:30 morph-03 kernel: SM: process_reply invalid id=1 nodeid=1
Jul 22 13:12:32 morph-03 kernel: SM: process_reply invalid id=1 nodeid=4
Jul 22 13:12:32 morph-03 kernel: SM: process_reply invalid id=1 nodeid=4294967295
Jul 22 13:12:32 morph-03 kernel: SM: process_reply invalid id=1 nodeid=1
Jul 22 13:12:33 morph-03 kernel: Got ENDTRANS from a node not the master: master: 1, sender: -1
Jul 22 13:12:34 morph-03 kernel:
Jul 22 13:12:34 morph-03 kernel: SM:  Assertion failed on line 51 of file /usr/src/cluster/cman-kernel/src/sm_misc.c
Jul 22 13:12:34 morph-03 kernel: SM:  assertion:  "!error"
Jul 22 13:12:34 morph-03 kernel: SM:  time = 208965
Jul 22 13:12:34 morph-03 kernel:
Jul 22 13:12:34 morph-03 kernel: Kernel panic: SM:  Record message above and reboot.
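
For anyone triaging a similar log, a small script can tally the "process_reply invalid" messages by node ID. The sketch below is not part of the original report. It assumes that nodeid=4294967295 is a node ID of -1 printed as an unsigned 32-bit value (4294967295 = 2^32 - 1), which would be consistent with the "sender: -1" in the ENDTRANS message, but the report does not confirm that. The helper names (as_signed32, tally_invalid_replies) are hypothetical.

```python
import re
from collections import Counter

# Sample lines taken from the log above, one entry per line.
SAMPLE = """\
Jul 22 13:12:15 morph-03 kernel: SM: process_reply invalid id=1 nodeid=4
Jul 22 13:12:15 morph-03 kernel: SM: process_reply invalid id=1 nodeid=1
Jul 22 13:12:18 morph-03 kernel: SM: process_reply invalid id=1 nodeid=4294967295
Jul 22 13:12:33 morph-03 kernel: Got ENDTRANS from a node not the master: master: 1, sender: -1
"""

# Matches only the SM "process_reply invalid" messages.
REPLY_RE = re.compile(r"SM: process_reply invalid id=(\d+) nodeid=(\d+)")


def as_signed32(value):
    """Reinterpret an unsigned 32-bit value as a signed 32-bit integer.

    Assumption: the logged node ID is a 32-bit quantity printed unsigned.
    """
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value >= (1 << 31) else value


def tally_invalid_replies(text):
    """Count 'process_reply invalid' messages per signed node ID."""
    counts = Counter()
    for line in text.splitlines():
        match = REPLY_RE.search(line)
        if match:
            counts[as_signed32(int(match.group(2)))] += 1
    return counts


if __name__ == "__main__":
    for nodeid, count in sorted(tally_invalid_replies(SAMPLE).items()):
        print(f"nodeid={nodeid}: {count}")
```

Run against the full syslog rather than the sample, this would show how many of the invalid replies carry a node ID of -1 as opposed to IDs 1 and 4.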
Comment 1 David Teigland 2004-08-19 00:50:43 EDT
cnxman errors are what led to this, although SM should be able to
die cleanly (without a panic) when things break down.
Comment 2 Kiersten (Kerri) Anderson 2004-11-04 10:14:40 EST
Updated with the proper version and component name.
Comment 3 David Teigland 2005-01-04 22:17:30 EST
I think these problems have been fixed, at least for the most part.
Comment 4 Christine Caulfield 2005-01-05 09:34:00 EST

*** This bug has been marked as a duplicate of 139738 ***
Comment 5 Red Hat Bugzilla 2006-02-21 14:04:38 EST
Changed to 'CLOSED' state since 'RESOLVED' has been deprecated.
