Bug 436542
Summary: Node fails to rejoin cluster after a power reset

| Field | Value |
|---|---|
| Product | Red Hat Enterprise Linux 5 |
| Component | cman |
| Status | CLOSED INSUFFICIENT_DATA |
| Severity | low |
| Priority | low |
| Version | 5.1 |
| Target Milestone | rc |
| Hardware | All |
| OS | Linux |
| Reporter | Afom T. Michael <tmichael> |
| Assignee | Christine Caulfield <ccaulfie> |
| QA Contact | GFS Bugs <gfs-bugs> |
| CC | cluster-maint, edamato |
| Doc Type | Bug Fix |
| Last Closed | 2009-02-16 14:12:40 UTC |

Comment 1
Christine Caulfield
2008-03-10 08:29:20 UTC
By 'failed' I mean that after a power cycle/reset, the node doesn't rejoin the cluster. After 'ipmitool power reset', sometimes the node just stays down and does not power up. In other cases, the cman service doesn't start. In the latter situation, here is what I see:

[root@ora3 ~]# service cman status
groupd is stopped
[root@ora3 ~]# service qdiskd status
qdiskd (pid 3842) is running...
[root@ora3 ~]# service openais status
aisexec is stopped
[root@ora3 ~]# service ipmi status
ipmi_msghandler module loaded.
ipmi_si module loaded.
ipmi_devintf module loaded.
/dev/ipmi0 exists.

And in the log, there are repeated messages of "ccsd[3817]: Unable to connect to cluster infrastructure after XXX seconds."

On the other nodes of the cluster:

[root@ora1 ~]# cman_tool nodes
Node  Sts   Inc   Joined               Name
   0   M      0   2008-03-11 12:32:36  /dev/sdn1
   1   M   4600   2008-03-11 12:30:28  ora1
   2   M   4648   2008-03-11 13:44:08  ora2
   3   X   4636                        ora3
   4   M   4628   2008-03-11 12:32:11  ora4
[root@ora1 ~]# cman_tool status
Version: 6.0.1
Config Version: 31
Cluster Name: ora64xzq
Cluster Id: 26725
Cluster Member: Yes
Cluster Generation: 4652
Membership state: Cluster-Member
Nodes: 3
Expected votes: 4
Total votes: 6
Quorum: 4
Active subsystems: 8
Flags:
Ports Bound: 0 11
Node name: ora1
Node ID: 1
Multicast addresses: 225.0.0.12
Node addresses: 192.168.33.87

Created attachment 297659: cluster.conf

(In reply to comment #2)
> By 'failed' I mean that after a power cycle/reset, the node doesn't rejoin the cluster.
> After 'ipmitool power reset', sometimes the node just stays down and does not power
> up. In other cases, the cman service doesn't start. In the latter situation, here is
> what I see:
> [root@ora3 ~]# service cman status
> groupd is stopped
> [root@ora3 ~]# service qdiskd status
> qdiskd (pid 3842) is running...
> [root@ora3 ~]# service openais status
> aisexec is stopped
> [root@ora3 ~]# service ipmi status
> ipmi_msghandler module loaded.
> ipmi_si module loaded.
> ipmi_devintf module loaded.
> /dev/ipmi0 exists.
> And in the log, there are repeated messages of "ccsd[3817]: Unable to connect to
> cluster infrastructure after XXX seconds."

If the node is rebooted at this point, it rejoins as expected.

>
> On the other nodes of the cluster:
> [root@ora1 ~]# cman_tool nodes
> Node  Sts   Inc   Joined               Name
>    0   M      0   2008-03-11 12:32:36  /dev/sdn1
>    1   M   4600   2008-03-11 12:30:28  ora1
>    2   M   4648   2008-03-11 13:44:08  ora2
>    3   X   4636                        ora3
>    4   M   4628   2008-03-11 12:32:11  ora4
> [root@ora1 ~]# cman_tool status
> Version: 6.0.1
> Config Version: 31
> Cluster Name: ora64xzq
> Cluster Id: 26725
> Cluster Member: Yes
> Cluster Generation: 4652
> Membership state: Cluster-Member
> Nodes: 3
> Expected votes: 4
> Total votes: 6
> Quorum: 4
> Active subsystems: 8
> Flags:
> Ports Bound: 0 11
> Node name: ora1
> Node ID: 1
> Multicast addresses: 225.0.0.12
> Node addresses: 192.168.33.87

Are there any relevant messages in syslog on the non-joining node? On the other nodes? Does it work if you start cman manually afterwards?
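
For anyone who wants to script the membership check shown in the `cman_tool nodes` output above, here is a minimal Python sketch. It is not part of the original report. It assumes that a status of `M` marks a current member and that any other value (such as the `X` shown for ora3) marks a node that has not joined. The function names `parse_nodes` and `non_members` are hypothetical, and the sample text is copied from the report rather than read from a live cluster.

```python
import re

# Sample `cman_tool nodes` output copied from the report above.
SAMPLE = """\
Node  Sts   Inc   Joined               Name
   0   M      0   2008-03-11 12:32:36  /dev/sdn1
   1   M   4600   2008-03-11 12:30:28  ora1
   2   M   4648   2008-03-11 13:44:08  ora2
   3   X   4636                        ora3
   4   M   4628   2008-03-11 12:32:11  ora4
"""

# Columns: node id, status, incarnation, optional join timestamp, name.
# The join timestamp is blank for a node that is not currently a member.
ROW = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+(\d+)\s+"
    r"(?:(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\s+)?(\S+)\s*$"
)


def parse_nodes(text):
    """Return one dict per node row; the header line does not match and is skipped."""
    nodes = []
    for line in text.splitlines():
        m = ROW.match(line)
        if m is None:
            continue
        node_id, status, inc, joined, name = m.groups()
        nodes.append(
            {
                "id": int(node_id),
                "status": status,
                "inc": int(inc),
                "joined": joined,  # None when the column is empty
                "name": name,
            }
        )
    return nodes


def non_members(text):
    """Names of nodes whose status is anything other than 'M' (assumed: member)."""
    return [n["name"] for n in parse_nodes(text) if n["status"] != "M"]


if __name__ == "__main__":
    print(non_members(SAMPLE))
```

On a live node the same functions could be fed the captured output of `cman_tool nodes` instead of `SAMPLE`; the column layout is assumed to match what is shown in this report.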