Bug 853927 - fence_virtd returns incorrect error code to fence_xvm/fence_virt when virt domain does not exist
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: fence-virt
Version: 6.3
Severity: high
Priority: high
Target Milestone: rc
Assigned To: Ryan McCabe
QA Contact: Chris Mackowski
Duplicates: 870549
Reported: 2012-09-03 06:40 EDT by Frantisek Reznicek
Modified: 2015-11-15 20:14 EST

Fixed In Version: fence-virt-0.2.3-13.el6
Doc Type: Bug Fix
Last Closed: 2013-02-21 05:24:57 EST
Type: Bug


Attachments
Fix (860 bytes, patch)
2012-09-10 15:10 EDT, Ryan McCabe


External Trackers: Red Hat Knowledge Base (Solution) 294813

Description Frantisek Reznicek 2012-09-03 06:40:45 EDT
Description of problem:

The fence_node tool does not always report that fencing failed (especially when fence_xvm + fence_virtd are used).

There are situations where fencing with fence_node <node-name> reports success even though fencing did not happen.

Situation:
- three-node VM cluster based on RHEL 6.3 (i686, x86_64, x86_64)
  - cluster.conf specifying three fence_xvm fence devices
- VM / cluster host on RHEL 6.3 x86_64
  - fence-virtd running to serve fence requests


[root@dhcp-x-y ~]# clustat
Cluster Status for mycluster_el6vm @ Mon Sep  3 12:08:48 2012
Member Status: Quorate

 Member Name                         ID   Status
 ------ ----                         ---- ------
 192.168.10.11                           1 Online
 192.168.10.12                           2 Online
 192.168.10.13                           3 Online, Local

Fencing the node appears to succeed, but the fencing action was not performed due to a libvirt error:

[root@dhcp-x-y ~]# fence_node 192.168.10.11
fence 192.168.10.11 success

But on the VM host, fence-virtd logs:

Request 2 seqno 289568 domain 192.168.10.11
Plain TCP request
ipv4_connect: Connecting to client
ipv4_connect: Success; fd = 8
Request 2 seqno 289568 src 192.168.10.11 target 192.168.10.11
libvirt_reboot 192.168.10.11
libvir: QEMU error : Domain not found: no domain with matching name '192.168.10.11'
[libvirt:REBOOT] Nothing to do - domain does not exist
Sending response to caller...

fence_node does not correctly report whether fencing completed without errors.
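
The log above shows the libvirt backend detecting the missing domain ("Nothing to do - domain does not exist") and still sending a response that the agent treats as success. As a minimal sketch (hypothetical helper name and return convention, not the actual fence-virt source), the backend's reboot path could instead return a non-zero code when the domain lookup fails, so fence_virtd can relay a failure to fence_xvm/fence_virt. Only public libvirt calls are used:

/* Sketch only: return non-zero when the domain does not exist so the
 * daemon can report failure instead of acknowledging success. */
#include <stdio.h>
#include <libvirt/libvirt.h>

static int
sketch_libvirt_reboot(virConnectPtr conn, const char *domain_name)
{
        virDomainPtr dom = virDomainLookupByName(conn, domain_name);

        if (dom == NULL) {
                /* The case logged above as "[libvirt:REBOOT] Nothing to do -
                 * domain does not exist"; treat it as a failure. */
                fprintf(stderr, "[libvirt:REBOOT] Domain %s does not exist\n",
                        domain_name);
                return 1;
        }

        /* Simplified off/on cycle for a persistent domain; the real backend
         * does more (e.g. preserving the domain definition). */
        if (virDomainDestroy(dom) < 0 || virDomainCreate(dom) < 0) {
                virDomainFree(dom);
                return 1;
        }

        virDomainFree(dom);
        return 0;
}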


Version-Release number of selected component (if applicable):
# guests
cman-3.0.12.1-32.el6_3.1.i686
corosync-1.4.1-7.el6.i686
clusterlib-3.0.12.1-32.el6_3.1.i686
ricci-0.16.2-55.el6.i686

# host
fence-virt-0.2.3-9.el6.x86_64
fence-virtd-0.2.3-9.el6.x86_64
fence-virtd-checkpoint-0.2.3-9.el6.x86_64
fence-virtd-libvirt-0.2.3-9.el6.x86_64
fence-virtd-multicast-0.2.3-9.el6.x86_64
fence-virtd-serial-0.2.3-9.el6.x86_64


How reproducible:
100%

Steps to Reproduce:
1. set-up VM fencing
2. perform 'fence_xvm -o list' to verify fence-virt functionality
3. fence_node <cluster-node>
  
Actual results:
fence_node claims success (exit code 0) even though fencing was not performed.

Expected results:
When fencing fails, fence_node should exit with a non-zero code.
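
Purely for illustration (hypothetical names, not the actual fence-virt wire format): on the agent side, the daemon's response code should be mapped onto the process exit status, so callers such as fence_node only see exit code 0 when fencing actually happened.

#include <stdlib.h>

/* Hypothetical mapping: a response of 0 from fence_virtd means the
 * operation was performed; anything else means fencing failed and the
 * agent should exit non-zero. */
static int
map_response_to_exit_status(int daemon_response)
{
        return (daemon_response == 0) ? EXIT_SUCCESS : EXIT_FAILURE;
}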

Additional info:

[root@dhcp-lab-x ~]# clustat
Cluster Status for mycluster_el6vm @ Mon Sep  3 12:08:52 2012
Member Status: Quorate

 Member Name                          ID   Status
 ------ ----                          ---- ------
 192.168.10.11                            1 Online
 192.168.10.12                            2 Online, Local
 192.168.10.13                            3 Online

[root@dhcp-x-y ~]# fence_node 192.168.10.11
fence 192.168.10.11 success
[root@dhcp-x-y ~]# echo $?
0

# after that 192.168.10.11 does not reboot

# host fence-virtd logs:
...
Request 2 seqno 289568 domain 192.168.10.11
Plain TCP request
ipv4_connect: Connecting to client
ipv4_connect: Success; fd = 8
Request 2 seqno 289568 src 192.168.10.11 target 192.168.10.11
libvirt_reboot 192.168.10.11
libvir: QEMU error : Domain not found: no domain with matching name '192.168.10.11'
[libvirt:REBOOT] Nothing to do - domain does not exist
Sending response to caller...
...

[root@dhcp-lab-x ~]# cat /etc/cluster/cluster.conf
<?xml version="1.0"?>
<cluster config_version="7" name="mycluster_el6vm">
        <fence_daemon clean_start="0" post_fail_delay="0" post_join_delay="30"/>
        <fence_xvmd debug="10" multicast_interface="eth3"/>
        <clusternodes>
                <clusternode name="192.168.10.11" nodeid="1" votes="1">
                        <fence>
                                <method name="1">
                                        <device domain="192.168.10.11" key_file="/etc/cluster/fence_xvm.key" name="fence_1"/>
                                </method>
                        </fence>
                </clusternode>
                <clusternode name="192.168.10.12" nodeid="2" votes="1">
                        <fence>
                                <method name="1">
                                        <device domain="192.168.10.12" key_file="/etc/cluster/fence_xvm.key" name="fence_2"/>
                                </method>
                        </fence>
                </clusternode>
                <clusternode name="192.168.10.13" nodeid="3" votes="1">
                        <fence>
                                <method name="1">
                                        <device domain="192.168.10.13" key_file="/etc/cluster/fence_xvm.key" name="fence_3"/>
                                </method>
                        </fence>
                </clusternode>
        </clusternodes>
        <cman port="1229">
                <multicast addr="225.0.0.12"/>
        </cman>
        <rm log_level="7">
                <failoverdomains>
                        <failoverdomain name="domain_qpidd_1" restricted="1">
                                <failoverdomainnode name="192.168.10.11" priority="1"/>
                        </failoverdomain>
                        <failoverdomain name="domain_qpidd_2" restricted="1">
                                <failoverdomainnode name="192.168.10.12" priority="1"/>
                        </failoverdomain>
                        <failoverdomain name="domain_qpidd_3" restricted="1">
                                <failoverdomainnode name="192.168.10.13" priority="1"/>
                        </failoverdomain>
                </failoverdomains>
                <resources>
                        <script file="/etc/init.d/qpidd" name="qpidd"/>
                </resources>
                <service domain="domain_qpidd_1" name="qpidd_1">
                        <script ref="qpidd"/>
                </service>
                <service domain="domain_qpidd_2" name="qpidd_2">
                        <script ref="qpidd"/>
                </service>
                <service domain="domain_qpidd_3" name="qpidd_3">
                        <script ref="qpidd"/>
                </service>
        </rm>
        <fencedevices>
                <fencedevice action="reboot" agent="fence_xvm" key_file="/etc/cluster/fence_xvm.key" name="fence_1"/>
                <fencedevice action="reboot" agent="fence_xvm" key_file="/etc/cluster/fence_xvm.key" name="fence_2"/>
                <fencedevice action="reboot" agent="fence_xvm" key_file="/etc/cluster/fence_xvm.key" name="fence_3"/>
        </fencedevices>
</cluster>


[root@HOST ~]# cat /etc/fence_virt.conf
fence_virtd {
        listener = "multicast";
        backend = "libvirt";
        module_path = "/usr/lib64/fence-virt";
}

listeners {
        multicast {
                key_file = "/etc/cluster/fence_xvm.key";
                address = "225.0.0.12";
                port = "1229";
                family = "ipv4";
                interface = "virbr4";
        }
}

backends {
        libvirt { 
                uri = "qemu:///system";
        }
}
Comment 1 Frantisek Reznicek 2012-09-03 06:46:52 EDT
The issue I was seeing above is due to a misconfiguration: the fencing domain was not set to the libvirt/QEMU VM domain name.

Regardless of that, fence_node should give the user consistent information about how the requested operation finished.


# fence_xvm -o list
cluster-rhel6i0      9747d8b2-9e04-6e84-b920-953651e32251 on
cluster-rhel6x0      62079d69-33c4-7133-65a6-7ae0db52131e on
cluster-rhel6x1      b8d18c15-bbed-7496-1af4-90afa0cdf95f on

# cat /etc/cluster/cluster.conf
<?xml version="1.0"?>
<cluster config_version="8" name="mycluster_el6vm">
        <fence_daemon clean_start="0" post_fail_delay="0" post_join_delay="30"/>
        <fence_xvmd debug="10" multicast_interface="eth3"/>
        <clusternodes>
                <clusternode name="192.168.10.11" nodeid="1" votes="1">
                        <fence>
                                <method name="1">
                                        <device domain="cluster-rhel6i0" key_file="/etc/cluster/fence_xvm.key" name="fence_1"/>
                                </method>
                        </fence>
                </clusternode>
                <clusternode name="192.168.10.12" nodeid="2" votes="1">
                        <fence>
                                <method name="1">
                                        <device domain="cluster-rhel6x0" key_file="/etc/cluster/fence_xvm.key" name="fence_2"/>
                                </method>
                        </fence>
                </clusternode>
                <clusternode name="192.168.10.13" nodeid="3" votes="1">
                        <fence>
                                <method name="1">
                                        <device domain="cluster-rhel6x1" key_file="/etc/cluster/fence_xvm.key" name="fence_3"/>
                                </method>
                        </fence>
                </clusternode>
        </clusternodes>
        <cman port="1229">
                <multicast addr="225.0.0.12"/>
        </cman>
        <rm log_level="7">
                <failoverdomains>
                        <failoverdomain name="domain_qpidd_1" restricted="1">
                                <failoverdomainnode name="192.168.10.11" priority="1"/>
                        </failoverdomain>
                        <failoverdomain name="domain_qpidd_2" restricted="1">
                                <failoverdomainnode name="192.168.10.12" priority="1"/>
                        </failoverdomain>
                        <failoverdomain name="domain_qpidd_3" restricted="1">
                                <failoverdomainnode name="192.168.10.13" priority="1"/>
                        </failoverdomain>
                </failoverdomains>
                <resources>
                        <script file="/etc/init.d/qpidd" name="qpidd"/>
                </resources>
                <service domain="domain_qpidd_1" name="qpidd_1">
                        <script ref="qpidd"/>
                </service>
                <service domain="domain_qpidd_2" name="qpidd_2">
                        <script ref="qpidd"/>
                </service>
                <service domain="domain_qpidd_3" name="qpidd_3">
                        <script ref="qpidd"/>
                </service>
        </rm>
        <fencedevices>
                <fencedevice action="reboot" agent="fence_xvm" key_file="/etc/cluster/fence_xvm.key" name="fence_1"/>
                <fencedevice action="reboot" agent="fence_xvm" key_file="/etc/cluster/fence_xvm.key" name="fence_2"/>
                <fencedevice action="reboot" agent="fence_xvm" key_file="/etc/cluster/fence_xvm.key" name="fence_3"/>
        </fencedevices>
</cluster>
Comment 3 Ryan McCabe 2012-09-10 15:10:40 EDT
Created attachment 611546 [details]
Fix
Comment 5 Ryan McCabe 2012-10-26 18:31:55 EDT
*** Bug 870549 has been marked as a duplicate of this bug. ***
Comment 6 Nate Straz 2013-01-21 16:31:48 EST
Marking Verified because I got through fencing and skeet testing, which was tripping over this earlier in the release cycle.
Comment 8 errata-xmlrpc 2013-02-21 05:24:57 EST
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-0419.html
