Bug 838047

Summary: qdiskd master_wins needs harder config error checking or cman needs to improve expected_votes calculation
Product: Red Hat Enterprise Linux 6
Reporter: Fabio Massimo Di Nitto <fdinitto>
Component: cluster
Assignee: Fabio Massimo Di Nitto <fdinitto>
Status: CLOSED ERRATA
QA Contact: michal novacek <mnovacek>
Severity: high
Priority: high
Version: 6.4
CC: ccaulfie, cluster-maint, lhh, mjuricek, mnovacek, rpeterso, teigland
Target Milestone: rc   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: cluster-3.0.12.1-33.el6
Doc Type: Bug Fix
Last Closed: 2013-02-21 07:42:32 UTC
Type: Bug
Attachments:
proposed patch (flags: lhh: review+)

Description Fabio Massimo Di Nitto 2012-07-06 08:48:53 UTC
An incorrect master_wins config can effectively lead a cluster to have 2 quorate partitions, racing for fencing.

We shouldn't allow it by default.

This is how to reproduce:

<cluster name="fabbione" config_version="1" >
  <logging debug="on"/>
  <clusternodes>
    <clusternode name="fedora16-node1" votes="1" nodeid="1">
      <fence>
        <method name="single">
          <device name="xvm" domain="fedora16-node1"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="fedora16-node2" votes="1" nodeid="2">
      <fence>
        <method name="single">
          <device name="xvm" domain="fedora16-node2"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="fedora16-node3" votes="1" nodeid="3">
      <fence>
        <method name="single">
          <device name="xvm" domain="fedora16-node3"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="fedora16-node4" votes="1" nodeid="4">
      <fence>
        <method name="single">
          <device name="xvm" domain="fedora16-node4"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="fedora16-node5" votes="1" nodeid="5">
      <fence>
        <method name="single">
          <device name="xvm" domain="fedora16-node5"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="fedora16-node6" votes="1" nodeid="6">
      <fence>
        <method name="single">
          <device name="xvm" domain="fedora16-node6"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="fedora16-node7" votes="1" nodeid="7">
      <fence>
        <method name="single">
          <device name="xvm" domain="fedora16-node7"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="fedora16-node8" votes="1" nodeid="8">
      <fence>
        <method name="single">
          <device name="xvm" domain="fedora16-node8"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <fencedevices>
    <fencedevice name="xvm" agent="fence_xvm"/>
  </fencedevices>
  <quorumd label="qdisk" master_wins="1"/>
....

cman and qdisk will start happily.

node1 becomes qdiskd master

[root@fedora16-node1 ~]# cman_tool status
Version: 6.2.0
Config Version: 1
Cluster Name: fabbione
Cluster Id: 25573
Cluster Member: Yes
Cluster Generation: 72
Membership state: Cluster-Member
Nodes: 8
Expected votes: 8
Quorum device votes: 7
Total votes: 15
Node votes: 1
Quorum: 8  

every other node looks like:

[root@fedora16-node2 ~]# cman_tool status
Version: 6.2.0
Config Version: 1
Cluster Name: fabbione
Cluster Id: 25573
Cluster Member: Yes
Cluster Generation: 72
Membership state: Cluster-Member
Nodes: 8
Expected votes: 8
Quorum device votes: 0
Total votes: 8
Node votes: 1
Quorum: 5  

Note the quorum: 8 on node1 vs 5 on every other node.

In a partition event such as:

partition1: node1/2/3 (1 is still qdiskd master)
partition2: node4/5/6/7/8

node1 is quorate due to master_wins
partition2 is quorate (5 nodes)

At this point, node1 will race with partition2 to fence. The result is random: in some cases master_wins wins, in others partition2 wins.

One reason partition2 is quorate is that cman does not take qdiskd votes into account in expected_votes until qdiskd starts voting. In master_wins mode, qdiskd votes 0.

While the scenario is unlikely, it breaks quorum directive number 1: there is only one quorum at a time.
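
To make the arithmetic behind the split explicit, here is a minimal sketch in Python. It assumes the quorum threshold is total_votes // 2 + 1, which is inferred from the cman_tool output above (15 -> 8, 8 -> 5) rather than taken from the cman source. All names are illustrative.

```python
def quorum_threshold(total_votes):
    """Majority threshold. Assumption: inferred from the 'Quorum' values
    in the cman_tool status output, not taken from cman itself."""
    return total_votes // 2 + 1


def is_quorate(votes_present, total_votes):
    return votes_present >= quorum_threshold(total_votes)


NODE_VOTES = 1       # votes="1" on every clusternode
EXPECTED_VOTES = 8   # "Expected votes: 8"
QDISK_VOTES = 7      # "Quorum device votes: 7" on the qdiskd master

# node1 (qdiskd master) counts the quorum device: Total votes 15, Quorum 8.
master_total = EXPECTED_VOTES + QDISK_VOTES
# Every other node sees 0 quorum device votes: Total votes 8, Quorum 5.
other_total = EXPECTED_VOTES

# partition1 as seen by node1: node1/2/3 plus the quorum device votes.
node1_view_quorate = is_quorate(3 * NODE_VOTES + QDISK_VOTES, master_total)
# partition2: node4/5/6/7/8, no quorum device votes.
partition2_quorate = is_quorate(5 * NODE_VOTES, other_total)

print("node1 quorate:", node1_view_quorate)
print("partition2 quorate:", partition2_quorate)
```

Under these assumptions both checks come out true, which is the two-quorum situation described above.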

Comment 1 Fabio Massimo Di Nitto 2012-07-06 13:19:23 UTC
Agreed with Lon:

master_wins should only be used with 2-node clusters. Fix qdiskd to disable master_wins if node count > 2 or votes != 1.

Update the man page to reflect the requirement.

Comment 2 Fabio Massimo Di Nitto 2012-07-09 06:39:23 UTC
Created attachment 597007
proposed patch

Unit test results:

configured 8 nodes

  <quorumd label="qdisk">
   <heuristic program="ping daikengo.int.fabbione.net -c1 -t1" score="1" interval="2" tko="3"/>
   <heuristic program="ping vultus5.int.fabbione.net -c1 -t1" score="1" interval="2" tko="3"/>
  </quorumd>

Starts OK, master_wins not in use.

  <quorumd label="qdisk" master_wins="1">
   <heuristic program="ping daikengo.int.fabbione.net -c1 -t1" score="1" interval="2" tko="3"/>
   <heuristic program="ping vultus5.int.fabbione.net -c1 -t1" score="1" interval="2" tko="3"/>
  </quorumd>

[root@fedora16-node1 qdisk]# ./qdiskd -d -f
Loading logging configuration
Loading dynamic configuration
Setting votes to 7
Loading static configuration
Auto-configured TKO as 4 based on token=10000 interval=1
Timings: 4 tko, 1 interval
Timings: 2 tko_up, 3 master_wait, 2 upgrade_wait
Heuristic: 'ping daikengo.int.fabbione.net -c1 -t1' score=1 interval=2 tko=3
Heuristic: 'ping vultus5.int.fabbione.net -c1 -t1' score=1 interval=2 tko=3
2 heuristics loaded
Master-wins mode disabled (not compatible with heuristics)
Master-wins mode disabled (not compatible with more than 2 nodes)


  <quorumd label="qdisk" master_wins="1"/>

[root@fedora16-node1 qdisk]# ./qdiskd -d -f
Loading logging configuration
Loading dynamic configuration
Setting votes to 7
Loading static configuration
Auto-configured TKO as 4 based on token=10000 interval=1
Timings: 4 tko, 1 interval
Timings: 2 tko_up, 3 master_wait, 2 upgrade_wait
0 heuristics loaded
Master-wins mode disabled (not compatible with more than 2 nodes)

configured 2 nodes:

  <quorumd label="qdisk"/>

normal startup:

[root@fedora16-node1 ~]# cman_tool status
Quorum device votes: 1

[root@fedora16-node2 ~]# cman_tool status
Quorum device votes: 0

  <quorumd label="qdisk">
   <heuristic program="ping daikengo.int.fabbione.net -c1 -t1" score="1" interval="2" tko="3"/>
   <heuristic program="ping vultus5.int.fabbione.net -c1 -t1" score="1" interval="2" tko="3"/>
  </quorumd>

[root@fedora16-node1 ~]# cman_tool status
Quorum device votes: 1

[root@fedora16-node2 ~]# cman_tool status
Quorum device votes: 1

  <quorumd label="qdisk" master_wins="1">
   <heuristic program="ping daikengo.int.fabbione.net -c1 -t1" score="1" interval="2" tko="3"/>
   <heuristic program="ping vultus5.int.fabbione.net -c1 -t1" score="1" interval="2" tko="3"/>
  </quorumd>

Master-wins mode disabled (not compatible with heuristics)

  <quorumd label="qdisk" master_wins="1"/>

works as expected

  <quorumd label="qdisk" votes="1"/>

works as expected

upgrade (vote changing)
  <quorumd label="qdisk" votes="5"/>

Changing vote count from 1 to 5
Vote count changed! Disabling master-wins

[root@fedora16-node2 ~]# cman_tool status
Quorum device votes: 5

[root@fedora16-node1 ~]# cman_tool status
Quorum device votes: 5

Comment 5 michal novacek 2013-01-23 13:19:36 UTC

For each of the following cases I changed cluster.conf, restarted the cman
service on all cluster nodes, and checked that the nodes had the correct
number of votes.

It worked correctly for all these cases with both a two-node cluster and an
eight-node cluster.

cman version 3.0.12.1-49.el6.x86_64 was used.

qdisk uses master_wins only when all of the following are true:
    cluster has two nodes
    heuristics are not used         [1][2]
    number of votes is not defined

qdisk does NOT use master_wins when any of the following is true:
    cluster has more than two nodes
    a <heuristic/> is present       [3][4]
    nodes have votes assigned       [5][6]

---

[1]
<quorumd label="$label" master_wins="1" />
nodes have different number of votes

[2]
<quorumd label="$label" />
nodes have different number of votes

[3]
<quorumd label="$label">
    <heuristic interval="1" program="ping -c1 -w2 sts.lab.msp.redhat.com" score="1" tko="3"/>
</quorumd>
both nodes have equal number of votes

[4]
<quorumd label="$label" master_wins="1">
    <heuristic interval="1" program="ping -c1 -w2 sts.lab.msp.redhat.com" score="1" tko="3"/>
</quorumd>
both nodes have equal number of votes

[5]
<quorumd label="$label" votes="1"/>
both nodes have equal number of votes, one vote each

[6]
<quorumd label="$label" votes="5"/>
both nodes have equal number of votes, 5 votes each
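
The conditions verified above can be sketched as a pre-flight check on cluster.conf. This is a hypothetical helper in Python, not the qdiskd implementation: the function name, the blocker strings, and the generated test configs are all made up for illustration. It treats any explicit votes attribute on <quorumd> as disqualifying, following cases [5] and [6]; Comment 1 states the rule as votes != 1 instead.

```python
import xml.etree.ElementTree as ET


def master_wins_blockers(cluster_conf_xml):
    """Return the reasons master_wins would not be used for this config.

    Hypothetical sketch of the rules listed in this comment; an empty
    list means none of the listed conditions applies.
    """
    root = ET.fromstring(cluster_conf_xml)
    quorumd = root.find("quorumd")
    if quorumd is None:
        return ["no <quorumd> element"]

    blockers = []
    if len(root.findall("clusternodes/clusternode")) > 2:
        blockers.append("more than 2 nodes")
    if quorumd.find("heuristic") is not None:
        blockers.append("heuristics")
    # Assumption: any explicit votes attribute counts, per cases [5] and [6].
    if quorumd.get("votes") is not None:
        blockers.append("explicit votes")
    return blockers


def make_conf(node_count, quorumd_xml):
    """Build a minimal cluster.conf string for the examples below."""
    nodes = "".join(
        '<clusternode name="node%d" votes="1" nodeid="%d"/>' % (i, i)
        for i in range(1, node_count + 1)
    )
    return (
        '<cluster name="example" config_version="1">'
        "<clusternodes>%s</clusternodes>%s</cluster>" % (nodes, quorumd_xml)
    )


two_node = make_conf(2, '<quorumd label="qdisk" master_wins="1"/>')
eight_node = make_conf(
    8,
    '<quorumd label="qdisk" master_wins="1">'
    '<heuristic program="ping -c1 example.invalid" score="1"/>'
    "</quorumd>",
)

print(master_wins_blockers(two_node))
print(master_wins_blockers(eight_node))
```

A check like this could run before restarting cman, so that a master_wins="1" setting that qdiskd would silently disable is noticed at edit time.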

Comment 7 errata-xmlrpc 2013-02-21 07:42:32 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-0287.html