Bug 606989
| Summary: | cman expected vote does not drop when removing node from cluster. | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 6 | Reporter: | Dean Jansa <djansa> |
| Component: | cluster | Assignee: | Lon Hohberger <lhh> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Cluster QE <mspqa-list> |
| Severity: | high | Priority: | medium |
| Version: | 6.0 | CC: | ccaulfie, cluster-maint, fdinitto, lhh, rpeterso, teigland |
| Target Milestone: | rc | Hardware: | All |
| OS: | Linux | Fixed In Version: | cluster-3.0.12-15.el6 |
| Doc Type: | Bug Fix | Last Closed: | 2010-11-10 19:59:27 UTC |
| Bug Blocks: | 599016 | | |
If I restart the cluster, expected votes drops as expected:
[root@marathon-05 ~]# service cman stop
Stopping cluster:
Leaving fence domain... [ OK ]
Stopping gfs_controld... [ OK ]
Stopping dlm_controld... [ OK ]
Stopping fenced... [ OK ]
Stopping cman... [ OK ]
Waiting for corosync to shutdown: [ OK ]
Unloading kernel modules... [ OK ]
Unmounting configfs... [ OK ]
[root@marathon-05 ~]# service cman start
Starting cluster:
Checking Network Manager... [ OK ]
Global setup... [ OK ]
Loading kernel modules... [ OK ]
Mounting configfs... [ OK ]
Starting cman... [ OK ]
Waiting for quorum... [ OK ]
Starting fenced... [ OK ]
Starting dlm_controld... [ OK ]
Starting gfs_controld... [ OK ]
Unfencing self... [ OK ]
Joining fence domain... [ OK ]
[root@marathon-05 ~]# cman_tool status
Version: 6.2.0
Config Version: 1
Cluster Name: marathon
Cluster Id: 20778
Cluster Member: Yes
Cluster Generation: 312
Membership state: Cluster-Member
Nodes: 4
Expected votes: 4
Total votes: 4
Node votes: 1
Quorum: 3
Created attachment 426207 [details]
Patch to fix
You're right, there is some code missing from the reload routine. Here it is
This patch is now in STABLE3 git:
commit e95deaf87607f483f4066e2cbc105ffa725ddd05
Author: Christine Caulfield <ccaulfie>
Date: Wed Jun 23 10:28:33 2010 +0100
cman: Recalculate expected_votes on a config reload.
[root@marathon-01 ~]# cman_tool nodes
Node Sts Inc Joined Name
1 M 32 2010-07-13 10:00:45 marathon-01
2 M 32 2010-07-13 10:00:45 marathon-02
3 M 36 2010-07-13 10:00:45 marathon-03
5 M 40 2010-07-13 10:00:46 marathon-05
# Remove marathon-05 from cluster.conf
[root@marathon-01 ~]# vi /etc/cluster/cluster.conf
# Distribute cluster.conf
[root@marathon-01 ~]# for m in marathon-0{2,3}
> do
> qacp /etc/cluster/cluster.conf root@${m}:/etc/cluster/cluster.conf
> done
/etc/cluster/cluster.conf -> marathon-02:/etc/cluster/cluster.conf
/etc/cluster/cluster.conf -> marathon-03:/etc/cluster/cluster.conf
# Remove marathon-05 from cluster
[root@marathon-05 ~]# service cman stop
Stopping cluster:
Leaving fence domain... [ OK ]
Stopping gfs_controld... [ OK ]
Stopping dlm_controld... [ OK ]
Stopping fenced... [ OK ]
Stopping cman... [ OK ]
Waiting for corosync to shutdown: [ OK ]
Unloading kernel modules... [ OK ]
Unmounting configfs... [ OK ]
# Have cman re-read the config file
[root@marathon-01 ~]# cman_tool version -r0
******* I had to run cman_tool version -r0 on ALL nodes, otherwise nodes which didn't run the cman_tool version -r0 did not update. *******
# Verify only 3 nodes
[root@marathon-01 ~]# cman_tool nodes
Node Sts Inc Joined Name
1 M 32 2010-07-13 10:00:45 marathon-01
2 M 32 2010-07-13 10:00:45 marathon-02
3 M 36 2010-07-13 10:00:45 marathon-03
# Verify nodes, votes and quorum drop
[root@marathon-01 ~]# cman_tool status
Version: 6.2.0
Config Version: 1
Cluster Name: marathon
Cluster Id: 20778
Cluster Member: Yes
Cluster Generation: 44
Membership state: Cluster-Member
Nodes: 3
Expected votes: 3
Total votes: 3
Node votes: 1
Quorum: 3
FailsQA -- Quorum count is not dropped as shown in comment 9. Should be 2.
Created attachment 431755 [details]
Patch to recalculate quorum
This additional (untested) patch will tell cman to recalculate quorum when the configuration is reloaded.
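The quorum values quoted in this report (3 for 5 or 4 expected votes, and the "Should be 2" for 3 expected votes) are consistent with a simple majority rule. The sketch below only illustrates that arithmetic. It assumes quorum is `expected_votes / 2 + 1` with integer division and ignores two-node mode and quorum devices; `quorum_for` is a hypothetical helper, not a cman function.

```python
def quorum_for(expected_votes: int) -> int:
    """Return the majority threshold for a given number of expected votes.

    Assumption: simple majority, i.e. integer division by two plus one.
    """
    if expected_votes < 1:
        raise ValueError("expected_votes must be positive")
    return expected_votes // 2 + 1


# Expected-votes values that appear in this report.
for expected in (5, 4, 3):
    print(f"expected_votes={expected} -> quorum={quorum_for(expected)}")
```

Under that assumption, a status showing "Expected votes: 3" together with "Quorum: 3" means quorum was still derived from the old value of 4.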
The patch didn't work the way Dean expected, I think. With recalculate_quorum(1,0) (instead of 0,0 as in the patch), the patch seems to work fine. The question is whether this is something we -want-, or whether there is some way it might be dangerous for users (such that we want them to manually decrease expected votes).
Here was the result after removing 1 node from a 4 node cluster in the config with recalculate_quorum(1,0):
[root@marathon-01 ~]# cman_tool status
Version: 6.2.0
Config Version: 3
Cluster Name: marathon
Cluster Id: 20778
Cluster Member: Yes
Cluster Generation: 632
Membership state: Cluster-Member
Nodes: 3
Expected votes: 4
Total votes: 3
Node votes: 1
Quorum: 3
Active subsystems: 7
Flags:
Ports Bound: 0
Node name: marathon-01
Node ID: 1
Multicast addresses: 239.192.81.123
Node addresses: 10.15.89.71
[root@marathon-01 ~]# cman_tool version -r0
[root@marathon-01 ~]# cman_tool status
Version: 6.2.0
Config Version: 4
Cluster Name: marathon
Cluster Id: 20778
Cluster Member: Yes
Cluster Generation: 632
Membership state: Cluster-Member
Nodes: 3
Expected votes: 3
Total votes: 3
Node votes: 1
Quorum: 2
Active subsystems: 7
Flags:
Ports Bound: 0
Node name: marathon-01
Node ID: 1
Multicast addresses: 239.192.81.123
Node addresses: 10.15.89.71
*** Bug 616381 has been marked as a duplicate of this bug. ***
Created attachment 433409 [details]
Recalculate quorum on quorum device vote changes
This patch allows cman to recalculate quorum when the quorum device votes change if and only if the quorum device was currently a participating member.
The new patch seems to do the job for me.
*** Bug 616095 has been marked as a duplicate of this bug. ***
Verified RHEL6.0-20100728.2 tree. pruner passes, 5 node -> 3 node and back.
Red Hat Enterprise Linux 6.0 is now available and should resolve the problem described in this bug report. This report is therefore being closed with a resolution of CURRENTRELEASE. You may reopen this bug report if the solution does not work for you.
Description of problem:
Following the instructions to remove a node from a cluster without needing a cluster reboot, I noticed that the expected votes (and quorum) do not drop as you remove nodes.
Version-Release number of selected component (if applicable):
cman-3.0.12-6.el6.x86_64
RHEL6.0-20100615.0-Server
How reproducible:
Every time
# Starting with a 5 node cluster:
[root@marathon-05 ~]# cman_tool nodes
Node Sts Inc Joined Name
1 M 284 2010-06-22 15:52:43 marathon-01
2 M 284 2010-06-22 15:52:43 marathon-02
3 M 284 2010-06-22 15:52:43 marathon-03
4 M 284 2010-06-22 15:52:43 marathon-04
5 M 284 2010-06-22 15:52:43 marathon-05
[root@marathon-05 ~]# cman_tool status
Version: 6.2.0
Config Version: 1
Cluster Name: marathon
Cluster Id: 20778
Cluster Member: Yes
Cluster Generation: 284
Membership state: Cluster-Member
Nodes: 5
Expected votes: 5
Total votes: 5
Node votes: 1
Quorum: 3
Active subsystems: 7
Flags:
Ports Bound: 0
Node name: marathon-05
Node ID: 5
Multicast addresses: 239.192.81.123
Node addresses: 10.15.89.75
# Remove marathon-01 from cluster.conf
[root@marathon-05 ~]# vi /etc/cluster/cluster.conf
# Distribute cluster.conf to remaining nodes
[root@marathon-05 ~]# for m in marathon-0{1,2,3,4}
> do
> qacp /etc/cluster/cluster.conf root@${m}:/etc/cluster/cluster.conf
> done
/etc/cluster/cluster.conf -> marathon-01:/etc/cluster/cluster.conf
/etc/cluster/cluster.conf -> marathon-02:/etc/cluster/cluster.conf
/etc/cluster/cluster.conf -> marathon-03:/etc/cluster/cluster.conf
/etc/cluster/cluster.conf -> marathon-04:/etc/cluster/cluster.conf
# Remove marathon-01 from cluster
[root@marathon-01 ~]# service cman stop
Stopping cluster:
Leaving fence domain... [ OK ]
Stopping gfs_controld... [ OK ]
Stopping dlm_controld... [ OK ]
Stopping fenced... [ OK ]
Stopping cman... [ OK ]
Waiting for corosync to shutdown: [ OK ]
Unloading kernel modules... [ OK ]
Unmounting configfs... [ OK ]
# Have cman re-read the config file
[root@marathon-05 ~]# cman_tool version -r0
# All nodes now show:
[root@marathon-05 ~]# cman_tool nodes
Node Sts Inc Joined Name
2 M 284 2010-06-22 15:52:43 marathon-02
3 M 284 2010-06-22 15:52:43 marathon-03
4 M 284 2010-06-22 15:52:43 marathon-04
5 M 284 2010-06-22 15:52:43 marathon-05
But -- the expected votes has not dropped (nor quorum):
[root@marathon-05 ~]# cman_tool status
Version: 6.2.0
Config Version: 1
Cluster Name: marathon
Cluster Id: 20778
Cluster Member: Yes
Cluster Generation: 288
Membership state: Cluster-Member
Nodes: 4
Expected votes: 5
Total votes: 4
Node votes: 1
Quorum: 3
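The symptom in the status output above (4 nodes but 5 expected votes) can be checked mechanically. The following is a minimal sketch, not part of cman or its tooling: `parse_status` and `stale_vote_fields` are hypothetical helpers, and the check assumes that every configured node is currently a member, that all nodes carry the same "Node votes" value, that no quorum device is in use, and that quorum follows the simple majority rule `votes / 2 + 1`.

```python
def parse_status(text: str) -> dict[str, str]:
    """Turn 'Key: value' lines of `cman_tool status` output into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip lines without a colon
            fields[key.strip()] = value.strip()
    return fields


def stale_vote_fields(status: dict[str, str]) -> list[str]:
    """Name the fields that look stale for the current membership.

    Assumptions: all configured nodes are members, each has the same
    number of votes, and there is no quorum device.
    """
    configured_votes = int(status["Nodes"]) * int(status["Node votes"])
    stale = []
    if int(status["Expected votes"]) > configured_votes:
        stale.append("Expected votes")
    if int(status["Quorum"]) != configured_votes // 2 + 1:
        stale.append("Quorum")
    return stale


# Vote-related lines taken from the status output in the description.
SAMPLE = """\
Nodes: 4
Expected votes: 5
Total votes: 4
Node votes: 1
Quorum: 3
"""

print(stale_vote_fields(parse_status(SAMPLE)))
```

For the description's output this flags only "Expected votes", because a majority of 4 votes and a majority of 5 votes are both 3. A node that is configured but temporarily down would also trip the check, so it is only meaningful right after a deliberate node removal.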