| Summary: | corosync cfg stops working after one membership change (master) | ||
|---|---|---|---|
| Product: | [Retired] Corosync Cluster Engine | Reporter: | Fabio Massimo Di Nitto <fdinitto> |
| Component: | unknown | Assignee: | Angus Salkeld <asalkeld> |
| Status: | CLOSED UPSTREAM | QA Contact: | |
| Severity: | high | Docs Contact: | |
| Priority: | high | ||
| Version: | 1.4 | CC: | asalkeld, jfriesse, sdake |
| Target Milestone: | --- | ||
| Target Release: | --- | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | Bug Fix | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2012-01-11 07:36:35 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
Fabio, this should be fixed now.
This is a rather easy one to reproduce: 2 nodes running pure corosync (no cman or anything else). corosync.conf has the usual interface settings and debugging turned on, plus:

    quorum {
        provider: corosync_quorum_ykd
        expected_votes: 2
        votes: 1
        quorumdev_poll: 0
        leaving_timeout: 2
        disallowed: 0
        quorate: 1
        two_node: 1
    }

Take corosync-quorumtool from topic-quorum-fabbione (commit eceaf9ac0695e72d3115e7f844aa59d33b3f9129) and run it against corosync flatiron-1.4, which shows the expected, working behaviour:

    [root@fedora-master-node1 tools]# ./corosync-quorumtool -m
    Version: 1.8.0pre.331-1304-dirty
    Nodes: 2
    Ring ID: 240
    Quorum type: corosync_quorum_ykd
    Quorate: Yes
    starting monitoring loop

    date: Mon Dec 12 12:15:25 2011
    Nodes: 2
    Ring ID: 240
    Quorate: Yes
    Nodeid Name
    3238176960 fedora-master-node1.int.fabbione.net
    3254954176 fedora-master-node2.int.fabbione.net

    date: Mon Dec 12 12:15:28 2011
    Nodes: 1
    Ring ID: 244
    Quorate: No
    Nodeid Name
    3238176960 fedora-master-node1.int.fabbione.net

    date: Mon Dec 12 12:15:28 2011
    Nodes: 1
    Ring ID: 244
    Quorate: Yes
    Nodeid Name
    3238176960 fedora-master-node1.int.fabbione.net

    date: Mon Dec 12 12:15:36 2011
    Nodes: 2
    Ring ID: 248
    Quorate: No
    Nodeid Name
    3238176960 fedora-master-node1.int.fabbione.net
    3254954176 fedora-master-node2.int.fabbione.net

The node names above are resolved by the node_name() function in corosync-quorumtool.c, which calls

    err = corosync_cfg_get_node_addrs(c_handle, nodeid, INTERFACE_MAX, &numaddrs, addrs);

for each listed node, i.e. basically on every membership change.
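The per-node lookup described above (resolve each nodeid to a name; if the lookup fails, report the error code and fall back to printing the numeric nodeid) can be sketched in C as follows. This is a hypothetical, self-contained sketch, not the actual corosync-quorumtool.c code: node_lookup_fn, node_name_sketch, lookup_ok and lookup_try_again are illustrative names, the stubs stand in for corosync_cfg_get_node_addrs() plus name resolution, and the SKETCH_* constants only mirror the numeric values of corosync's CS_OK (1) and CS_ERR_TRY_AGAIN (6).

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Illustrative constants; the values mirror corosync's CS_OK (1) and
 * CS_ERR_TRY_AGAIN (6). They are not taken from corosync headers. */
enum { SKETCH_OK = 1, SKETCH_ERR_TRY_AGAIN = 6 };

/* Hypothetical callback standing in for corosync_cfg_get_node_addrs()
 * plus reverse name resolution; NOT the real corosync API. */
typedef int (*node_lookup_fn)(uint32_t nodeid, char *name, size_t len);

/* Fills buf with the node's name, or with the decimal nodeid if the
 * lookup fails (after logging the error code). Returns the lookup result. */
static int node_name_sketch(node_lookup_fn lookup, uint32_t nodeid,
                            char *buf, size_t len)
{
    assert(buf != NULL && len > 0);
    int err = lookup(nodeid, buf, len);
    if (err != SKETCH_OK) {
        fprintf(stderr, "Unable to get node address for nodeid %u: %d\n",
                (unsigned int)nodeid, err);
        snprintf(buf, len, "%u", (unsigned int)nodeid);
    }
    return err;
}

/* Hypothetical stub modelled on the flatiron-1.4 run: every lookup succeeds. */
static int lookup_ok(uint32_t nodeid, char *name, size_t len)
{
    snprintf(name, len, "node-%u", (unsigned int)nodeid);
    return SKETCH_OK;
}

/* Hypothetical stub modelled on the master run after a membership change. */
static int lookup_try_again(uint32_t nodeid, char *name, size_t len)
{
    (void)nodeid;
    (void)name;
    (void)len;
    return SKETCH_ERR_TRY_AGAIN;
}
```

For example, calling node_name_sketch(lookup_try_again, 3238176960u, buf, sizeof buf) logs the error, leaves "3238176960" in buf and returns 6, which is the shape of the fallback lines printed when the lookup fails.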
Running the same tool against the master branch or topic-quorum-fabbione:

    [root@fedora-master-node1 tools]# ./corosync-quorumtool -m
    Version: 1.8.0pre.331-1304-dirty
    Nodes: 2
    Ring ID: 256
    Quorum type: corosync_quorum_ykd
    Quorate: Yes
    starting monitoring loop

    date: Mon Dec 12 12:17:40 2011
    Nodes: 2
    Ring ID: 256
    Quorate: Yes
    Nodeid Name
    3238176960 fedora-master-node1.int.fabbione.net
    3254954176 fedora-master-node2.int.fabbione.net

    date: Mon Dec 12 12:17:41 2011
    Nodes: 1
    Ring ID: 260
    Quorate: No
    Nodeid Name
    Unable to get node address for nodeid 3238176960: 6
    3238176960

    date: Mon Dec 12 12:17:41 2011
    Nodes: 1
    Ring ID: 260
    Quorate: Yes
    Nodeid Name
    Unable to get node address for nodeid 3238176960: 6
    3238176960

It appears that after the first membership change the same call returns 6 (CS_ERR_TRY_AGAIN), which doesn't look correct to me at all...
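A common client-side pattern with corosync APIs is to retry a call that returns CS_ERR_TRY_AGAIN a bounded number of times. The C sketch below is hypothetical throughout: call_with_retry, lookup_fn and countdown_lookup are illustrative names, the lookup is a stub rather than corosync_cfg_get_node_addrs(), and the 10 ms pause is an arbitrary choice. It shows how a transient TRY_AGAIN clears after a few attempts while one that never clears still comes back as 6. A retry like this would only mask the symptom here, since the point of the report is that the call should not return TRY_AGAIN at all.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Illustrative constants mirroring corosync's CS_OK (1) and CS_ERR_TRY_AGAIN (6). */
enum { SKETCH_OK = 1, SKETCH_ERR_TRY_AGAIN = 6 };

/* Hypothetical stand-in for a corosync call such as corosync_cfg_get_node_addrs(). */
typedef int (*lookup_fn)(uint32_t nodeid, void *ctx);

/* Calls fn up to max_attempts times, pausing 10 ms between attempts, and
 * stops as soon as the result is anything other than TRY_AGAIN. */
static int call_with_retry(lookup_fn fn, uint32_t nodeid, void *ctx,
                           int max_attempts)
{
    const struct timespec delay = { 0, 10L * 1000 * 1000 };
    int err = SKETCH_ERR_TRY_AGAIN;

    assert(max_attempts > 0);
    for (int attempt = 0; attempt < max_attempts; attempt++) {
        err = fn(nodeid, ctx);
        if (err != SKETCH_ERR_TRY_AGAIN)
            break;
        if (attempt + 1 < max_attempts) /* no pause after the last attempt */
            nanosleep(&delay, NULL);
    }
    return err;
}

/* Hypothetical stub: returns TRY_AGAIN while the countdown in *ctx is
 * positive (decrementing it each time), then returns OK. */
static int countdown_lookup(uint32_t nodeid, void *ctx)
{
    int *remaining = ctx;

    (void)nodeid;
    if (*remaining > 0) {
        (*remaining)--;
        return SKETCH_ERR_TRY_AGAIN;
    }
    return SKETCH_OK;
}
```

With a countdown of 2 and max_attempts of 5 the call succeeds on the third attempt; with a countdown larger than max_attempts every attempt fails and the caller still receives 6, which is what a client would keep seeing if the TRY_AGAIN never clears.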