Bug 831648 - rgmanager prefers 2 nodes in 3 nodes cluster
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: rgmanager
Version: 6.3
Hardware: Unspecified
OS: Unspecified
Priority: low
Severity: low
Target Milestone: rc
Target Release: ---
Assigned To: Ryan McCabe
QA Contact: Cluster QE
Depends On:
Blocks:
 
Reported: 2012-06-13 10:06 EDT by Martin Kudlej
Modified: 2017-07-24 12:57 EDT (History)
4 users

See Also:
Fixed In Version: rgmanager-3.0.12.1-15.el6
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2013-02-21 05:18:19 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments
Proposed patch (2.11 KB, patch)
2012-10-15 09:25 EDT, Ryan McCabe
no flags

Description Martin Kudlej 2012-06-13 10:06:00 EDT
Description of problem:
I have a 3-node cluster (configuration below). I periodically kill the Condor agents on all nodes. rgmanager prefers to start the Condor agents only on two of the nodes (node02 and node03); it starts a Condor agent on the first node (node01) in only about 1 out of 50 attempts.

Version-Release number of selected component (if applicable):
pacemaker-cluster-libs-1.1.7-6.el6.x86_64
modcluster-0.16.2-18.el6.x86_64
lvm2-cluster-2.02.95-10.el6.x86_64
fence-virt-0.2.3-9.el6.x86_64
clusterlib-3.0.12.1-32.el6.x86_64
cman-3.0.12.1-32.el6.x86_64
rgmanager-3.0.12.1-12.el6.x86_64
condor-cluster-resource-agent-7.6.5-0.15.el6.x86_64
cluster-glue-libs-1.0.5-6.el6.x86_64
fence-agents-3.1.5-17.el6.x86_64
cluster-glue-1.0.5-6.el6.x86_64

How reproducible:
100%

Steps to Reproduce:
1. install clustering + Condor
2. set up the cluster configuration below
3. periodically kill the Condor agents on all nodes

Expected results:
rgmanager should not prefer any particular node when starting a recovering agent.

Additional info:
$ cat /etc/cluster/cluster.conf

<?xml version="1.0"?>
<cluster config_version="86" name="HACondorCluster">
        <fence_daemon post_join_delay="30"/>
        <fence_xvmd debug="10" multicast_interface="eth1"/>
        <clusternodes>
                <clusternode name="xulqrxy-node01" nodeid="1">
                        <fence>
                                <method name="virt-fenc">
                                        <device domain="test" name="fenc"/>
                                </method>
                        </fence>
                </clusternode>
                <clusternode name="xulqrxy-node02" nodeid="2">
                        <fence>
                                <method name="virt-fenc">
                                        <device domain="test" name="fenc"/>
                                </method>
                        </fence>
                </clusternode>
                <clusternode name="xulqrxy-node03" nodeid="3">
                        <fence>
                                <method name="virt-fenc">
                                        <device domain="test" name="fenc"/>
                                </method>
                        </fence>
                </clusternode>
        </clusternodes>
        <cman expected_votes="1">
                <multicast addr="224.0.0.1"/>
        </cman>
        <fencedevices>
                <fencedevice agent="fence_xvm" name="fenc"/>
        </fencedevices>
        <rm>   
                <failoverdomains>
                        <failoverdomain name="Schedd hasched1 Failover Domain" nofailback="0" ordered="0" restricted="1">
                                <failoverdomainnode name="xulqrxy-node01"/>
                                <failoverdomainnode name="xulqrxy-node02"/>
                                <failoverdomainnode name="xulqrxy-node03"/>
                        </failoverdomain>
                        <failoverdomain name="Schedd hasched2 Failover Domain" nofailback="0" ordered="0" restricted="1">
                                <failoverdomainnode name="xulqrxy-node01"/>
                                <failoverdomainnode name="xulqrxy-node02"/>
                                <failoverdomainnode name="xulqrxy-node03"/>
                        </failoverdomain>
                        <failoverdomain name="Schedd hasched3 Failover Domain" nofailback="0" ordered="0" restricted="1">
                                <failoverdomainnode name="xulqrxy-node01"/>
                                <failoverdomainnode name="xulqrxy-node02"/>
                                <failoverdomainnode name="xulqrxy-node03"/>
                        </failoverdomain>
                        <failoverdomain name="Schedd hasched4 Failover Domain" nofailback="0" ordered="0" restricted="1">
                                <failoverdomainnode name="xulqrxy-node01"/>
                                <failoverdomainnode name="xulqrxy-node02"/>
                                <failoverdomainnode name="xulqrxy-node03"/>
                        </failoverdomain>
                </failoverdomains>
                <resources/>
                
                <service autostart="1" domain="Schedd hasched1 Failover Domain" name="HA Schedd hasched1" recovery="relocate">
                        <netfs export="/mnt/qa" force_unmount="on" host="nest.test.redhat.com" mountpoint="/mnt/qa/MRG/cluster_mkudlej1" name="Job Queue for hasched1" options="rw,soft">  
                                <condor __independent_subtree="1" __max_restarts="3" __restart_expire_time="300" name="hasched1" type="schedd"/>
                        </netfs>
                </service>
                <service autostart="1" domain="Schedd hasched2 Failover Domain" name="HA Schedd hasched2" recovery="relocate">
                        <netfs export="/mnt/qa" force_unmount="on" host="nest.test.redhat.com" mountpoint="/mnt/qa/MRG/cluster_mkudlej2" name="Job Queue for hasched2" options="rw,soft">
                                <condor __independent_subtree="1" __max_restarts="3" __restart_expire_time="300" name="hasched2" type="schedd"/>
                        </netfs>
                </service>
                <service autostart="1" domain="Schedd hasched3 Failover Domain" name="HA Schedd hasched3" recovery="relocate">
                        <netfs export="/mnt/qa" force_unmount="on" host="nest.test.redhat.com" mountpoint="/mnt/qa/MRG/cluster_mkudlej3" name="Job Queue for hasched3" options="rw,soft">
                                <condor __independent_subtree="1" __max_restarts="3" __restart_expire_time="300" name="hasched3" type="schedd"/>
                        </netfs>
                </service>
                <service autostart="1" domain="Schedd hasched4 Failover Domain" name="HA Schedd hasched4" recovery="relocate">
                        <netfs export="/mnt/qa" force_unmount="on" host="nest.test.redhat.com" mountpoint="/mnt/qa/MRG/cluster_mkudlej4" name="Job Queue for hasched4" options="rw,soft">
                                <condor __independent_subtree="1" __max_restarts="3" __restart_expire_time="300" name="hasched4" type="schedd"/>
                        </netfs>
                </service>
        </rm>
        <logging debug="on"/>
</cluster>
Comment 1 Jan Pokorný 2012-08-02 10:04:55 EDT
Admittedly, "Red Hat Cluster Suite" product in Bugzilla is tempting,
but no longer in use (no longer having a standalone position).

As per the packages, flipping to RHEL 6 -- rgmanager.
Comment 3 Lon Hohberger 2012-08-06 15:00:32 EDT
Yes, without ordering and priorities, rgmanager will not prefer any node.  Consider this configuration instead:

<?xml version="1.0"?>
<cluster config_version="86" name="HACondorCluster">
        <fence_daemon post_join_delay="30"/>
        <fence_xvmd debug="10" multicast_interface="eth1"/>
        <clusternodes>
                <clusternode name="xulqrxy-node01" nodeid="1">
                        <fence>
                                <method name="virt-fenc">
                                        <device domain="test" name="fenc"/>
                                </method>
                        </fence>
                </clusternode>
                <clusternode name="xulqrxy-node02" nodeid="2">
                        <fence>
                                <method name="virt-fenc">
                                        <device domain="test" name="fenc"/>
                                </method>
                        </fence>
                </clusternode>
                <clusternode name="xulqrxy-node03" nodeid="3">
                        <fence>
                                <method name="virt-fenc">
                                        <device domain="test" name="fenc"/>
                                </method>
                        </fence>
                </clusternode>
        </clusternodes>
        <cman expected_votes="1">
                <multicast addr="224.0.0.1"/>
        </cman>
        <fencedevices>
                <fencedevice agent="fence_xvm" name="fenc"/>
        </fencedevices>
        <rm>   
                <failoverdomains>
                        <failoverdomain name="Schedd hasched1 Failover Domain" nofailback="0" ordered="1" restricted="0">
                                <failoverdomainnode name="xulqrxy-node01" priority="1"/>
                                <failoverdomainnode name="xulqrxy-node02" priority="2"/>
                                <failoverdomainnode name="xulqrxy-node03" priority="3"/>
                        </failoverdomain>
                        <failoverdomain name="Schedd hasched2 Failover Domain" nofailback="0" ordered="1" restricted="0">
                                <failoverdomainnode name="xulqrxy-node01" priority="3"/>
                                <failoverdomainnode name="xulqrxy-node02" priority="1"/>
                                <failoverdomainnode name="xulqrxy-node03" priority="2"/>
                        </failoverdomain>
                        <failoverdomain name="Schedd hasched3 Failover Domain" nofailback="0" ordered="1" restricted="1">
                                <failoverdomainnode name="xulqrxy-node01" priority="2"/>
                                <failoverdomainnode name="xulqrxy-node02" priority="3"/>
                                <failoverdomainnode name="xulqrxy-node03" priority="1"/>
                        </failoverdomain>

                </failoverdomains>
                <resources/>
                
                <service autostart="1" domain="Schedd hasched1 Failover Domain" name="HA Schedd hasched1" recovery="relocate">
                        <netfs export="/mnt/qa" force_unmount="on" host="nest.test.redhat.com" mountpoint="/mnt/qa/MRG/cluster_mkudlej1" name="Job Queue for hasched1" options="rw,soft">  
                                <condor __independent_subtree="1" __max_restarts="3" __restart_expire_time="300" name="hasched1" type="schedd"/>
                        </netfs>
                </service>
                <service autostart="1" domain="Schedd hasched2 Failover Domain" name="HA Schedd hasched2" recovery="relocate">
                        <netfs export="/mnt/qa" force_unmount="on" host="nest.test.redhat.com" mountpoint="/mnt/qa/MRG/cluster_mkudlej2" name="Job Queue for hasched2" options="rw,soft">
                                <condor __independent_subtree="1" __max_restarts="3" __restart_expire_time="300" name="hasched2" type="schedd"/>
                        </netfs>
                </service>
                <service autostart="1" domain="Schedd hasched3 Failover Domain" name="HA Schedd hasched3" recovery="relocate">
                        <netfs export="/mnt/qa" force_unmount="on" host="nest.test.redhat.com" mountpoint="/mnt/qa/MRG/cluster_mkudlej3" name="Job Queue for hasched3" options="rw,soft">
                                <condor __independent_subtree="1" __max_restarts="3" __restart_expire_time="300" name="hasched3" type="schedd"/>
                        </netfs>
                </service>
                <service autostart="1" name="HA Schedd hasched4" recovery="relocate">
                        <netfs export="/mnt/qa" force_unmount="on" host="nest.test.redhat.com" mountpoint="/mnt/qa/MRG/cluster_mkudlej4" name="Job Queue for hasched4" options="rw,soft">
                                <condor __independent_subtree="1" __max_restarts="3" __restart_expire_time="300" name="hasched4" type="schedd"/>
                        </netfs>
                </service>
        </rm>
        <logging debug="on"/>
</cluster>


If required, you can also make rgmanager's node placement for "HA Schedd hasched4" more "random" by adding 'central_processing="1"' to the <rm> tag. [NOTE: To do this, stop rgmanager on all hosts, add the attribute, then restart rgmanager on all hosts.]
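
For illustration, only the opening tag of the existing <rm> section changes; a rough sketch (contents elided, shown only to indicate where the attribute goes):

        <rm central_processing="1">
                <failoverdomains>
                        ...
                </failoverdomains>
                ...
        </rm>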
Comment 4 Lon Hohberger 2012-08-06 15:01:11 EDT
Oops, that third domain doesn't need 'restricted="1"' (it doesn't matter that it's there, but it's not necessary).
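
For example, the same domain with the attribute dropped (or set to "0") would read:

                        <failoverdomain name="Schedd hasched3 Failover Domain" nofailback="0" ordered="1" restricted="0">
                                <failoverdomainnode name="xulqrxy-node01" priority="2"/>
                                <failoverdomainnode name="xulqrxy-node02" priority="3"/>
                                <failoverdomainnode name="xulqrxy-node03" priority="1"/>
                        </failoverdomain>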
Comment 6 Lon Hohberger 2012-08-06 16:51:19 EDT
Let me know if this improves things.
Comment 7 Lon Hohberger 2012-08-06 16:52:18 EDT
Oh, also, 'recovery' should probably be 'restart', not 'relocate'.
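
As a sketch, the first service definition from the configuration above with that change applied would look like:

                <service autostart="1" domain="Schedd hasched1 Failover Domain" name="HA Schedd hasched1" recovery="restart">
                        <netfs export="/mnt/qa" force_unmount="on" host="nest.test.redhat.com" mountpoint="/mnt/qa/MRG/cluster_mkudlej1" name="Job Queue for hasched1" options="rw,soft">
                                <condor __independent_subtree="1" __max_restarts="3" __restart_expire_time="300" name="hasched1" type="schedd"/>
                        </netfs>
                </service>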
Comment 8 Jaroslav Kortus 2012-08-13 06:32:54 EDT
I've tested this with a very simple configuration:
  <rm>
    <resources>
      <script name="servicescript" file="/bin/true"/>
    </resources>
    <service autostart="1" name="service1" recovery="relocate">
        <script ref="servicescript"/>
    </service>
  </rm>

And here are the results:
$ clustat
Cluster Status for STSRHTS16926 @ Mon Aug 13 05:18:39 2012
Member Status: Quorate

 Member Name                                                     ID   Status
 ------ ----                                                     ---- ------
 c2-node01                                                           1 Online, Local, rgmanager
 c2-node02                                                           2 Online, rgmanager
 c2-node03                                                           3 Online, rgmanager

(rgmanager running on all 3 nodes)


# for i in `seq 1 500`; do echo "Iteration $i"; clusvcadm -r service1 ;  clustat -x | gxpp '//group/@owner' >> owners.txt; done
[...]
# sort owners.txt | uniq -c
    250 c2-node02
    250 c2-node03

It's clear that node01 was ignored in all 500 attempts. The expectation is to get roughly 1/3 of the relocations on node01.
Comment 9 Jaroslav Kortus 2012-08-13 07:50:05 EDT
It seems the service always jumps between the two nodes with the highest IDs. If I set nodeid="6" for c2-node01, I get the same results (the two highest IDs win).

The same applies to a 5-node cluster (node04 and node05 hold the service all the time).

I missed that this is actually a RHEL 6 bug, so the tests were performed on RHEL 5 (rgmanager-2.0.52-34.el5), but I assume not much has changed in this regard :).
Comment 11 Ryan McCabe 2012-10-15 09:25:02 EDT
Created attachment 627416 [details]
Proposed patch
Comment 15 errata-xmlrpc 2013-02-21 05:18:19 EST
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-0409.html
