Description of problem:

This happened on morph-03 after cmand and fenced membership had been obtained and when waiting for clvmd service on all nodes. It appears as though morph-03 had lost connection with morph-02, and had removed it from the CMAN cluster:

Jul 22 13:11:58 morph-03 kernel: CMAN: got node morph-06.lab.msp.redhat.com
Jul 22 13:11:58 morph-03 kernel: CMAN: got node morph-05.lab.msp.redhat.com
Jul 22 13:12:03 morph-03 kernel: CMAN: got node morph-01.lab.msp.redhat.com
Jul 22 13:12:03 morph-03 kernel: CMAN: quorum regained, resuming activity
Jul 22 13:12:03 morph-03 kernel: CMAN: got node morph-04.lab.msp.redhat.com
Jul 22 13:12:03 morph-03 kernel: CMAN: got node morph-02.lab.msp.redhat.com
Jul 22 13:12:07 morph-03 kernel: CMAN: node morph-02.lab.msp.redhat.com is not responding - removing from the cluster
Jul 22 13:12:10 morph-03 sshd(pam_unix)[3752]: session opened for user root by (uid=0)
Jul 22 13:12:10 morph-03 sshd(pam_unix)[3752]: session closed for user root
Jul 22 13:12:11 morph-03 kernel: CMAN: node morph-02.lab.msp.redhat.com is not responding - removing from the cluster
Jul 22 13:12:15 morph-03 kernel: CMAN: node morph-02.lab.msp.redhat.com is not responding - removing from the cluster
Jul 22 13:12:15 morph-03 kernel: SM: process_reply invalid id=1 nodeid=4
Jul 22 13:12:15 morph-03 kernel: SM: process_reply invalid id=1 nodeid=4
Jul 22 13:12:15 morph-03 kernel: SM: process_reply invalid id=1 nodeid=1
Jul 22 13:12:15 morph-03 kernel: SM: process_reply invalid id=1 nodeid=1
Jul 22 13:12:15 morph-03 kernel: SM: process_reply invalid id=1 nodeid=4
Jul 22 13:12:15 morph-03 kernel: SM: process_reply invalid id=1 nodeid=4
Jul 22 13:12:15 morph-03 kernel: SM: process_reply invalid id=1 nodeid=1
Jul 22 13:12:15 morph-03 kernel: SM: process_reply invalid id=1 nodeid=1
Jul 22 13:12:15 morph-03 kernel: SM: process_reply invalid id=1 nodeid=4
Jul 22 13:12:15 morph-03 kernel: SM: process_reply invalid id=1 nodeid=1
Jul 22 13:12:15 morph-03 kernel: SM: process_reply invalid id=1 nodeid=1
Jul 22 13:12:16 morph-03 kernel: SM: process_reply invalid id=1 nodeid=4
Jul 22 13:12:16 morph-03 kernel: SM: process_reply invalid id=1 nodeid=1
Jul 22 13:12:18 morph-03 kernel: SM: process_reply invalid id=1 nodeid=4
Jul 22 13:12:18 morph-03 kernel: SM: process_reply invalid id=1 nodeid=1
Jul 22 13:12:18 morph-03 kernel: SM: process_reply invalid id=1 nodeid=4294967295
Jul 22 13:12:19 morph-03 kernel: SM: process_reply invalid id=1 nodeid=4
Jul 22 13:12:19 morph-03 kernel: SM: process_reply invalid id=1 nodeid=4294967295
Jul 22 13:12:19 morph-03 kernel: SM: process_reply invalid id=1 nodeid=1
Jul 22 13:12:20 morph-03 kernel: SM: process_reply invalid id=1 nodeid=4
Jul 22 13:12:20 morph-03 kernel: SM: process_reply invalid id=1 nodeid=1
Jul 22 13:12:20 morph-03 sshd(pam_unix)[3765]: session opened for user root by (uid=0)
Jul 22 13:12:20 morph-03 sshd(pam_unix)[3765]: session closed for user root
Jul 22 13:12:21 morph-03 kernel: SM: process_reply invalid id=1 nodeid=4
Jul 22 13:12:21 morph-03 kernel: SM: process_reply invalid id=1 nodeid=4294967295
Jul 22 13:12:21 morph-03 sshd(pam_unix)[3777]: session opened for user root by (uid=0)
Jul 22 13:12:21 morph-03 sshd(pam_unix)[3777]: session closed for user root
Jul 22 13:12:22 morph-03 kernel: SM: process_reply invalid id=1 nodeid=4
Jul 22 13:12:22 morph-03 kernel: SM: process_reply invalid id=1 nodeid=1
Jul 22 13:12:24 morph-03 kernel: SM: process_reply invalid id=1 nodeid=4
Jul 22 13:12:24 morph-03 kernel: SM: process_reply invalid id=1 nodeid=4294967295
Jul 22 13:12:24 morph-03 kernel: SM: process_reply invalid id=1 nodeid=1
Jul 22 13:12:26 morph-03 kernel: SM: process_reply invalid id=1 nodeid=4
Jul 22 13:12:26 morph-03 kernel: SM: process_reply invalid id=1 nodeid=4294967295
Jul 22 13:12:26 morph-03 kernel: SM: process_reply invalid id=1 nodeid=1
Jul 22 13:12:28 morph-03 kernel: SM: process_reply invalid id=1 nodeid=4
Jul 22 13:12:28 morph-03 kernel: SM: process_reply invalid id=1 nodeid=1
Jul 22 13:12:28 morph-03 kernel: SM: process_reply invalid id=1 nodeid=4294967295
Jul 22 13:12:30 morph-03 kernel: SM: process_reply invalid id=1 nodeid=4
Jul 22 13:12:30 morph-03 kernel: SM: process_reply invalid id=1 nodeid=4294967295
Jul 22 13:12:30 morph-03 kernel: SM: process_reply invalid id=1 nodeid=1
Jul 22 13:12:32 morph-03 kernel: SM: process_reply invalid id=1 nodeid=4
Jul 22 13:12:32 morph-03 kernel: SM: process_reply invalid id=1 nodeid=4294967295
Jul 22 13:12:32 morph-03 kernel: SM: process_reply invalid id=1 nodeid=1
Jul 22 13:12:33 morph-03 kernel: Got ENDTRANS from a node not the master: master: 1, sender: -1
Jul 22 13:12:34 morph-03 kernel:
Jul 22 13:12:34 morph-03 kernel: SM: Assertion failed on line 51 of file /usr/src/cluster/cman-kernel/src/sm_misc.c
Jul 22 13:12:34 morph-03 kernel: SM: assertion: "!error"
Jul 22 13:12:34 morph-03 kernel: SM: time = 208965
Jul 22 13:12:34 morph-03 kernel:
Jul 22 13:12:34 morph-03 kernel: Kernel panic: SM: Record message above and reboot.
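
A side note on reading the log: the nodeid=4294967295 entries look like the same value as the "sender: -1" in the ENDTRANS message, since 4294967295 is what -1 becomes when read as an unsigned 32-bit integer. This assumes the SM message prints the node ID with an unsigned format, which the report itself does not confirm. A minimal Python sketch of the conversion (the helper name as_signed32 is made up for illustration):

```python
import struct


def as_signed32(value: int) -> int:
    """Reinterpret an unsigned 32-bit value as a signed 32-bit integer."""
    # Pack as unsigned 32-bit, then unpack the same bytes as signed 32-bit.
    return struct.unpack("<i", struct.pack("<I", value))[0]


# Node IDs as they appear in the process_reply lines of the log.
for nodeid in (1, 4, 4294967295):
    print(nodeid, "->", as_signed32(nodeid))
```

If that reading is right, the replies with nodeid=4294967295 come from a sender that has no valid node ID, rather than from a third real node.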
The cnxman errors are what led to this, although SM should be able to die cleanly (without a panic) when things break down.
Updated with the proper version and component name.
I think these problems have been fixed, at least for the most part.
*** This bug has been marked as a duplicate of 139738 ***
Changed to 'CLOSED' state since 'RESOLVED' has been deprecated.