Bug 620679
Summary: | qdiskd should stop voting if no <quorumd config is available
---|---
Product: | Red Hat Enterprise Linux 6
Component: | cluster
Version: | 6.0
Hardware: | All
OS: | Linux
Status: | CLOSED ERRATA
Severity: | medium
Priority: | low
Reporter: | Fabio Massimo Di Nitto <fdinitto>
Assignee: | Lon Hohberger <lhh>
QA Contact: | Cluster QE <mspqa-list>
CC: | bbrock, ccaulfie, cluster-maint, jkortus, lhh, rpeterso, ssaha, teigland
Target Milestone: | rc
Fixed In Version: | cluster-3.0.12-27.el6
Doc Type: | Bug Fix
Clone Of: | 615926
Bug Depends On: | 615926
Last Closed: | 2011-05-19 12:53:27 UTC
Comment 1
Fabio Massimo Di Nitto
2010-08-03 09:16:33 UTC
This issue has been proposed when we are only considering blocker issues in the current Red Hat Enterprise Linux release.

** If you would still like this issue considered for the current release, ask your support representative to file as a blocker on your behalf. Otherwise ask that it be considered for the next Red Hat Enterprise Linux release. **

This requires careful consideration. The reconfiguration order is important.

Consider the case of a 4 node cluster + qdiskd. Expected votes is 7.

If you remove qdiskd from cluster.conf, cman will recalc expected votes to 4. Then, before qdiskd processes the config change, it calls cman_poll_quorum_device(). This bumps expected votes back to 7. Then qdiskd exits.

For now, it's much safer to:
(a) ensure all nodes are in the cluster
(b) kill qdiskd with SIGTERM on all nodes
(c) remove qdiskd from cluster.conf

So I understand all the issues described above. In this specific case (a) and (c) are already true: qdiskd is already gone from cluster.conf and all nodes are in the cluster and active. We don't have a way to tell qdiskd to die.

Doesn't cman_poll_quorum_device() recalculate every time based on qdiskd votes? If so, votes from qdiskd would go down to 0 (no config, no votes ;)) and expected votes would be recalculated.

No, cman_poll_quorum_device() does not recalculate; you have to tell cman to drop the votes.

I was already working on a patch. What it does is:
- if previously configured and device & label are no longer present:
  - print a log message
  - reregister with 0 votes (causes recalculate_quorum())
  - clean shutdown, e.g.:
    - write logout message to quorum disk
    - cman_unregister_quorum_device()

Aug 3 12:36:30 crackle modcluster: Updating cluster.conf
Aug 3 12:36:32 crackle corosync[1262]: [QUORUM] Members[2]: 1 2
Aug 3 12:36:32 crackle corosync[1262]: [CMAN ] quorum device re-registered
Aug 3 12:36:32 crackle qdiskd[15384]: Quorum device removed from the configuration. Shutting down.
Aug 3 12:36:43 crackle corosync[1262]: [CMAN ] lost contact with quorum device
Aug 3 12:36:43 crackle corosync[1262]: [QUORUM] Members[2]: 1 2

Note, however, that because qdiskd was previously a member, it will still appear in both 'clustat' and 'cman_tool nodes' output.

[root@crackle ~]# cman_tool status
Version: 6.2.0
Config Version: 25
Cluster Name: cereal
Cluster Id: 27600
Cluster Member: Yes
Cluster Generation: 1248
Membership state: Cluster-Member
Nodes: 2
Expected votes: 2
Total votes: 2
Node votes: 1
Quorum: 2
Active subsystems: 7
Flags:
Ports Bound: 0
Node name: crackle
Node ID: 2
Multicast addresses: 239.192.107.60
Node addresses: 192.168.122.21

(I used a two node cluster to illustrate that the fix works - if it didn't, expected votes would still be 3.)

Created attachment 436324
Fix
Patch not applied to any branches at this point.
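
For readers following the vote arithmetic in the comments above, here is a minimal sketch of the ordering problem and of the patch's approach. It is a toy model only: the `ToyCman` class, its methods, and the two scenario functions are hypothetical names invented for illustration, and they do not reproduce cman's real bookkeeping or the libcman API. The only facts it encodes are the ones stated in this report: expected votes drop when the device leaves cluster.conf, a later poll from a still-registered device bumps them back up, and re-registering with 0 votes forces a recalculation.

```python
class ToyCman:
    """Toy stand-in for cman's vote bookkeeping (hypothetical, not the real code)."""

    def __init__(self, node_votes):
        self.node_votes = node_votes      # sum of votes held by cluster nodes
        self.qdev_votes = 0               # votes claimed by the quorum device
        self.expected_votes = node_votes

    def recalculate_quorum(self):
        self.expected_votes = self.node_votes + self.qdev_votes

    def register_quorum_device(self, votes):
        # Registering (or re-registering) triggers a recalculation.
        self.qdev_votes = votes
        self.recalculate_quorum()

    def reload_config_without_quorumd(self):
        # cluster.conf no longer lists the device, so expected votes fall
        # to the node total; the daemon's registration is left untouched.
        self.expected_votes = self.node_votes

    def poll_quorum_device(self):
        # A poll from a still-registered device counts its votes again.
        self.expected_votes = max(self.expected_votes,
                                  self.node_votes + self.qdev_votes)

    def unregister_quorum_device(self):
        self.qdev_votes = 0


def remove_without_fix(node_votes, qdisk_votes):
    """Config is removed, the daemon polls once more, then exits."""
    cman = ToyCman(node_votes)
    cman.register_quorum_device(qdisk_votes)
    cman.reload_config_without_quorumd()
    cman.poll_quorum_device()             # the stale poll described above
    return cman.expected_votes


def remove_with_fix(node_votes, qdisk_votes):
    """Daemon notices the missing config and drops its votes before exiting."""
    cman = ToyCman(node_votes)
    cman.register_quorum_device(qdisk_votes)
    cman.reload_config_without_quorumd()
    cman.register_quorum_device(0)        # re-register with 0 votes
    cman.unregister_quorum_device()       # clean shutdown
    return cman.expected_votes


if __name__ == "__main__":
    print("4 nodes + 3 qdisk votes, no fix:", remove_without_fix(4, 3))
    print("4 nodes + 3 qdisk votes, fix:   ", remove_with_fix(4, 3))
    print("2 nodes + 1 qdisk vote, fix:    ", remove_with_fix(2, 1))
```

In this model the 4 node case ends at 7 expected votes without the fix and 4 with it, and the two node case ends at 2 with the fix, which lines up with the `Expected votes: 2` shown in the `cman_tool status` output above.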
http://git.fedorahosted.org/git?p=cluster.git;a=commit;h=e118d34dce64325a93c92833b1e074fbabb1a516

Updated patch posted to upstream STABLE3 branch.

Logs from updated patch:

Aug 3 12:47:56 snap modcluster: Updating cluster.conf
Aug 3 12:47:57 snap corosync[3446]: [QUORUM] Members[2]: 1 2
Aug 3 12:47:57 snap corosync[3446]: [CMAN ] quorum device re-registered
Aug 3 12:47:57 snap corosync[3446]: [QUORUM] Members[2]: 1 2
Aug 3 12:47:57 snap qdiskd[5751]: Quorum device removed from the configuration. Shutting down.
Aug 3 12:47:57 snap qdiskd[5751]: Unregistering quorum device.
Aug 3 12:48:10 snap corosync[3446]: [CMAN ] lost contact with quorum device
Aug 3 12:48:10 snap corosync[3446]: [QUORUM] Members[2]: 1 2

devel_ack, we already have the fix.

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2011-0537.html