Bug 838047 - qdiskd master_wins needs harder config error checking or cman needs to improve expected_votes calculation
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: cluster
Version: 6.4
Hardware: Unspecified, OS: Unspecified
Priority: high, Severity: high
Target Milestone: rc
Assigned To: Fabio Massimo Di Nitto
QA Contact: michal novacek
Reported: 2012-07-06 04:48 EDT by Fabio Massimo Di Nitto
Modified: 2013-02-21 02:42 EST
Fixed In Version: cluster-3.0.12.1-33.el6
Doc Type: Bug Fix
Last Closed: 2013-02-21 02:42:32 EST
Type: Bug
Attachments
proposed patch (3.67 KB, patch)
2012-07-09 02:39 EDT, Fabio Massimo Di Nitto
lhh: review+
Description Fabio Massimo Di Nitto 2012-07-06 04:48:53 EDT
An incorrect master_wins config can effectively lead a cluster to have 2 quorate partitions, racing for fencing.

We shouldn't allow it by default.

This is how to reproduce:

<cluster name="fabbione" config_version="1" >
  <logging debug="on"/>
  <clusternodes>
    <clusternode name="fedora16-node1" votes="1" nodeid="1">
      <fence>
        <method name="single">
          <device name="xvm" domain="fedora16-node1"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="fedora16-node2" votes="1" nodeid="2">
      <fence>
        <method name="single">
          <device name="xvm" domain="fedora16-node2"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="fedora16-node3" votes="1" nodeid="3">
      <fence>
        <method name="single">
          <device name="xvm" domain="fedora16-node3"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="fedora16-node4" votes="1" nodeid="4">
      <fence>
        <method name="single">
          <device name="xvm" domain="fedora16-node4"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="fedora16-node5" votes="1" nodeid="5">
      <fence>
        <method name="single">
          <device name="xvm" domain="fedora16-node5"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="fedora16-node6" votes="1" nodeid="6">
      <fence>
        <method name="single">
          <device name="xvm" domain="fedora16-node6"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="fedora16-node7" votes="1" nodeid="7">
      <fence>
        <method name="single">
          <device name="xvm" domain="fedora16-node7"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="fedora16-node8" votes="1" nodeid="8">
      <fence>
        <method name="single">
          <device name="xvm" domain="fedora16-node8"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <fencedevices>
    <fencedevice name="xvm" agent="fence_xvm"/>
  </fencedevices>
  <quorumd label="qdisk" master_wins="1"/>
....

cman and qdiskd both start without complaint.

node1 becomes the qdiskd master:

[root@fedora16-node1 ~]# cman_tool status
Version: 6.2.0
Config Version: 1
Cluster Name: fabbione
Cluster Id: 25573
Cluster Member: Yes
Cluster Generation: 72
Membership state: Cluster-Member
Nodes: 8
Expected votes: 8
Quorum device votes: 7
Total votes: 15
Node votes: 1
Quorum: 8  

every other node looks like:

[root@fedora16-node2 ~]# cman_tool status
Version: 6.2.0
Config Version: 1
Cluster Name: fabbione
Cluster Id: 25573
Cluster Member: Yes
Cluster Generation: 72
Membership state: Cluster-Member
Nodes: 8
Expected votes: 8
Quorum device votes: 0
Total votes: 8
Node votes: 1
Quorum: 5  

Note the quorum values: 8 vs 5.

In a partition event such as:

partition1: node1/2/3 (1 is still qdiskd master)
partition2: node4/5/6/7/8

node1 is quorate due to master_wins
partition2 is quorate (5 nodes)

At this point, node1 races with partition2 to fence, and the result is random: in some cases the master_wins side wins, in others partition2 wins.

One reason partition2 is quorate is that cman does not count qdiskd votes in expected_votes until qdiskd starts voting, and in master_wins mode qdiskd votes 0.

Although this scenario is unlikely, it breaks quorum rule number 1: there is only one quorum at a time.
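
The two Quorum values reported by cman_tool are consistent with a simple majority rule, quorum = total_votes // 2 + 1. The following is a minimal Python sketch of that arithmetic for the partition described above. The formula is an assumption inferred from the status output, not taken from the cman source, and the function names are hypothetical.

```python
def quorum(total_votes):
    """Majority threshold. Assumed formula; it matches the cman_tool output above."""
    return total_votes // 2 + 1


def is_quorate(visible_votes, total_votes):
    """A partition is quorate when the votes it sees reach the threshold."""
    return visible_votes >= quorum(total_votes)


# View from node1 (qdiskd master): 8 node votes + 7 quorum device votes.
master_total = 8 + 7
# View from every other node: the quorum device contributes 0 votes.
other_total = 8 + 0

# partition1 = node1/2/3; node1 also counts the 7 quorum device votes.
partition1_votes = 3 + 7
# partition2 = node4/5/6/7/8; node votes only.
partition2_votes = 5

print(quorum(master_total), quorum(other_total))    # 8 5
print(is_quorate(partition1_votes, master_total))   # True
print(is_quorate(partition2_votes, other_total))    # True
```

Under these assumptions both partitions pass their own quorum test at the same time, which is the double-quorum condition this bug describes.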
Comment 1 Fabio Massimo Di Nitto 2012-07-06 09:19:23 EDT
Agreed with Lon:

master_wins should only be used with 2-node clusters. Fix qdiskd to disable master_wins if node count > 2 or votes != 1.

Update the man page to reflect the requirement.
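
The agreed rule can be sketched as a small check. This is a hypothetical Python illustration, not the qdiskd code (qdiskd itself is not written in Python). The function name, the parameter names, and the reason strings are assumptions made for the example, and "votes" is read here as the quorum device vote count.

```python
def master_wins_allowed(node_count, qdisk_votes):
    """Return (allowed, reasons) for the rule agreed in this comment.

    Hypothetical helper: master_wins is kept only for a 2-node cluster
    whose quorum device has exactly 1 vote.
    """
    reasons = []
    if node_count > 2:
        reasons.append("more than 2 nodes")
    if qdisk_votes != 1:
        reasons.append("quorum device votes != 1")
    # master_wins stays enabled only when no rule was violated.
    return (not reasons, reasons)


# The 8-node reproducer from the description (quorum device votes: 7).
print(master_wins_allowed(8, 7))
# A plain 2-node cluster with a 1-vote quorum device.
print(master_wins_allowed(2, 1))
```

Returning the list of reasons rather than a bare boolean makes it easy to log why the mode was disabled.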
Comment 2 Fabio Massimo Di Nitto 2012-07-09 02:39:23 EDT
Created attachment 597007 [details]
proposed patch

Unit test results:

configured 8 nodes

  <quorumd label="qdisk">
   <heuristic program="ping daikengo.int.fabbione.net -c1 -t1" score="1" interval="2" tko="3"/>
   <heuristic program="ping vultus5.int.fabbione.net -c1 -t1" score="1" interval="2" tko="3"/>
  </quorumd>

starts ok, no master win

  <quorumd label="qdisk" master_wins="1">
   <heuristic program="ping daikengo.int.fabbione.net -c1 -t1" score="1" interval="2" tko="3"/>
   <heuristic program="ping vultus5.int.fabbione.net -c1 -t1" score="1" interval="2" tko="3"/>
  </quorumd>

[root@fedora16-node1 qdisk]# ./qdiskd -d -f
Loading logging configuration
Loading dynamic configuration
Setting votes to 7
Loading static configuration
Auto-configured TKO as 4 based on token=10000 interval=1
Timings: 4 tko, 1 interval
Timings: 2 tko_up, 3 master_wait, 2 upgrade_wait
Heuristic: 'ping daikengo.int.fabbione.net -c1 -t1' score=1 interval=2 tko=3
Heuristic: 'ping vultus5.int.fabbione.net -c1 -t1' score=1 interval=2 tko=3
2 heuristics loaded
Master-wins mode disabled (not compatible with heuristics)
Master-wins mode disabled (not compatible with more than 2 nodes)


  <quorumd label="qdisk" master_wins="1"/>

[root@fedora16-node1 qdisk]# ./qdiskd -d -f
Loading logging configuration
Loading dynamic configuration
Setting votes to 7
Loading static configuration
Auto-configured TKO as 4 based on token=10000 interval=1
Timings: 4 tko, 1 interval
Timings: 2 tko_up, 3 master_wait, 2 upgrade_wait
0 heuristics loaded
Master-wins mode disabled (not compatible with more than 2 nodes)

configured 2 nodes:

  <quorumd label="qdisk"/>

normal startup:

[root@fedora16-node1 ~]# cman_tool status
Quorum device votes: 1

[root@fedora16-node2 ~]# cman_tool status
Quorum device votes: 0

  <quorumd label="qdisk">
   <heuristic program="ping daikengo.int.fabbione.net -c1 -t1" score="1" interval="2" tko="3"/>
   <heuristic program="ping vultus5.int.fabbione.net -c1 -t1" score="1" interval="2" tko="3"/>
  </quorumd>

[root@fedora16-node1 ~]# cman_tool status
Quorum device votes: 1

[root@fedora16-node2 ~]# cman_tool status
Quorum device votes: 1

  <quorumd label="qdisk" master_wins="1">
   <heuristic program="ping daikengo.int.fabbione.net -c1 -t1" score="1" interval="2" tko="3"/>
   <heuristic program="ping vultus5.int.fabbione.net -c1 -t1" score="1" interval="2" tko="3"/>
  </quorumd>

Master-wins mode disabled (not compatible with heuristics)

  <quorumd label="qdisk" master_wins="1"/>

works as expected

  <quorumd label="qdisk" votes="1"/>

works as expected

upgrade (vote changing)
  <quorumd label="qdisk" votes="5"/>

Changing vote count from 1 to 5
Vote count changed! Disabling master-wins

[root@fedora16-node2 ~]# cman_tool status
Quorum device votes: 5

[root@fedora16-node1 ~]# cman_tool status
Quorum device votes: 5
Comment 5 michal novacek 2013-01-23 08:19:36 EST

For each of the following cases I changed cluster.conf, restarted the cman
service on all cluster nodes, and checked that the nodes had the correct
number of votes.

All cases behaved correctly with both a two-node cluster and an eight-node
cluster.

cman version 3.0.12.1-49.el6.x86_64 has been used.

qdisk uses master_wins only when all of the following are true:
    the cluster has two nodes
    heuristics are not used         [1][2]
    the number of votes is not defined

qdisk does NOT use master_wins when any of the following is true:
    the cluster has more than two nodes
    a <heuristic> is present        [3][4]
    nodes have votes assigned       [5][6]

---

[1]
<quorumd label="$label" master_wins="1" />
nodes have a different number of votes

[2]
<quorumd label="$label" />
nodes have a different number of votes

[3]
<quorumd label="$label">
    <heuristic interval="1" program="ping -c1 -w2 sts.lab.msp.redhat.com" score="1" tko="3"/>
</quorumd>
both nodes have equal number of votes

[4]
<quorumd label="$label" master_wins="1">
    <heuristic interval="1" program="ping -c1 -w2 sts.lab.msp.redhat.com" score="1" tko="3"/>
</quorumd>
both nodes have equal number of votes

[5]
<quorumd label="$label" votes="1"/>
both nodes have equal number of votes, one vote each

[6]
<quorumd label="$label" votes="5"/>
both nodes have equal number of votes, 5 votes each
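
The summary at the top of this comment can be turned into a quick pre-check of a cluster.conf. The following is a minimal Python sketch under stated assumptions: it is not how qdiskd decides, the function name and the example configuration are hypothetical, and "number of votes is not defined" is interpreted as "no votes attribute on quorumd".

```python
import xml.etree.ElementTree as ET


def uses_master_wins(cluster_conf_xml):
    """Predict master_wins from the three conditions summarized above.

    Assumed rule, not qdiskd code: exactly two nodes, no <heuristic>
    children, and no votes attribute on <quorumd>.
    """
    root = ET.fromstring(cluster_conf_xml.strip())
    quorumd = root.find("quorumd")
    if quorumd is None:
        return False  # no quorum disk configured at all
    two_nodes = len(root.findall("./clusternodes/clusternode")) == 2
    no_heuristics = quorumd.find("heuristic") is None
    votes_undefined = "votes" not in quorumd.attrib
    return two_nodes and no_heuristics and votes_undefined


# Hypothetical two-node configuration, reduced to the relevant elements.
EXAMPLE = """
<cluster name="example" config_version="1">
  <clusternodes>
    <clusternode name="node1" votes="1" nodeid="1"/>
    <clusternode name="node2" votes="1" nodeid="2"/>
  </clusternodes>
  <quorumd label="qdisk"/>
</cluster>
"""

print(uses_master_wins(EXAMPLE))  # True
```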
Comment 7 errata-xmlrpc 2013-02-21 02:42:32 EST
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-0287.html
