Description of problem:

The fence_node tool does not always detect that fencing failed (especially when fence_xvm + fence_virtd are used). There are situations where 'fence_node <node-name>' reports success even though fencing did not happen.

Situation:
- three-node VM cluster based on RHEL 6.3 (i686, x86_64, x86_64)
- cluster.conf specifying three fence_xvm fence devices
- VM / cluster host on RHEL 6.3 x86_64
- fence_virtd running on the host to serve fence requests

[root@dhcp-x-y ~]# clustat
Cluster Status for mycluster_el6vm @ Mon Sep 3 12:08:48 2012
Member Status: Quorate

 Member Name                   ID   Status
 ------ ----                   ---- ------
 192.168.10.11                    1 Online
 192.168.10.12                    2 Online
 192.168.10.13                    3 Online, Local

Fencing the node appears to succeed, but the fencing action was not actually performed, due to a libvirt error:

[root@dhcp-x-y ~]# fence_node 192.168.10.11
fence 192.168.10.11 success

But on the VM host fence_virtd logs:

Request 2 seqno 289568 domain 192.168.10.11
Plain TCP request
ipv4_connect: Connecting to client
ipv4_connect: Success; fd = 8
Request 2 seqno 289568 src 192.168.10.11 target 192.168.10.11
libvirt_reboot 192.168.10.11
libvir: QEMU error : Domain not found: no domain with matching name '192.168.10.11'
[libvirt:REBOOT] Nothing to do - domain does not exist
Sending response to caller...

fence_node does not correctly report whether fencing completed without errors.

Version-Release number of selected component (if applicable):

# guests
cman-3.0.12.1-32.el6_3.1.i686
corosync-1.4.1-7.el6.i686
clusterlib-3.0.12.1-32.el6_3.1.i686
ricci-0.16.2-55.el6.i686

# host
fence-virt-0.2.3-9.el6.x86_64
fence-virtd-0.2.3-9.el6.x86_64
fence-virtd-checkpoint-0.2.3-9.el6.x86_64
fence-virtd-libvirt-0.2.3-9.el6.x86_64
fence-virtd-multicast-0.2.3-9.el6.x86_64
fence-virtd-serial-0.2.3-9.el6.x86_64

How reproducible:
100%

Steps to Reproduce:
1. set up VM fencing
2. run 'fence_xvm -o list' to verify fence_virt functionality
3. run 'fence_node <cluster-node>'

Actual results:
fence_node reports success (exit code 0) even though fencing failed.

Expected results:
When fencing fails, fence_node should exit with a nonzero code.

Additional info:

[root@dhcp-lab-x ~]# clustat
Cluster Status for mycluster_el6vm @ Mon Sep 3 12:08:52 2012
Member Status: Quorate

 Member Name                   ID   Status
 ------ ----                   ---- ------
 192.168.10.11                    1 Online
 192.168.10.12                    2 Online, Local
 192.168.10.13                    3 Online

[root@dhcp-x-y ~]# fence_node 192.168.10.11
fence 192.168.10.11 success
[root@dhcp-x-y ~]# echo $?
0

# after that, 192.168.10.11 does not reboot

# host fence_virtd logs:
...
Request 2 seqno 289568 domain 192.168.10.11
Plain TCP request
ipv4_connect: Connecting to client
ipv4_connect: Success; fd = 8
Request 2 seqno 289568 src 192.168.10.11 target 192.168.10.11
libvirt_reboot 192.168.10.11
libvir: QEMU error : Domain not found: no domain with matching name '192.168.10.11'
[libvirt:REBOOT] Nothing to do - domain does not exist
Sending response to caller...
...
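For completeness: the libvirt error above can be confirmed independently on the VM host. This is only a minimal cross-check with stock virsh commands (the URI is the one from the fence_virt.conf shown below, and the domain name is the value taken from cluster.conf):

# on the VM host: domain names libvirt actually knows about
[root@HOST ~]# virsh --connect qemu:///system list --all --name

# ask libvirt directly about the name fence_virtd was handed; with the
# configuration used here this fails, because no domain is named after the IP
[root@HOST ~]# virsh --connect qemu:///system domstate 192.168.10.11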
[root@dhcp-lab-x ~]# cat /etc/cluster/cluster.conf
<?xml version="1.0"?>
<cluster config_version="7" name="mycluster_el6vm">
    <fence_daemon clean_start="0" post_fail_delay="0" post_join_delay="30"/>
    <fence_xvmd debug="10" multicast_interface="eth3"/>
    <clusternodes>
        <clusternode name="192.168.10.11" nodeid="1" votes="1">
            <fence>
                <method name="1">
                    <device domain="192.168.10.11" key_file="/etc/cluster/fence_xvm.key" name="fence_1"/>
                </method>
            </fence>
        </clusternode>
        <clusternode name="192.168.10.12" nodeid="2" votes="1">
            <fence>
                <method name="1">
                    <device domain="192.168.10.12" key_file="/etc/cluster/fence_xvm.key" name="fence_2"/>
                </method>
            </fence>
        </clusternode>
        <clusternode name="192.168.10.13" nodeid="3" votes="1">
            <fence>
                <method name="1">
                    <device domain="192.168.10.13" key_file="/etc/cluster/fence_xvm.key" name="fence_3"/>
                </method>
            </fence>
        </clusternode>
    </clusternodes>
    <cman port="1229">
        <multicast addr="225.0.0.12"/>
    </cman>
    <rm log_level="7">
        <failoverdomains>
            <failoverdomain name="domain_qpidd_1" restricted="1">
                <failoverdomainnode name="192.168.10.11" priority="1"/>
            </failoverdomain>
            <failoverdomain name="domain_qpidd_2" restricted="1">
                <failoverdomainnode name="192.168.10.12" priority="1"/>
            </failoverdomain>
            <failoverdomain name="domain_qpidd_3" restricted="1">
                <failoverdomainnode name="192.168.10.13" priority="1"/>
            </failoverdomain>
        </failoverdomains>
        <resources>
            <script file="/etc/init.d/qpidd" name="qpidd"/>
        </resources>
        <service domain="domain_qpidd_1" name="qpidd_1">
            <script ref="qpidd"/>
        </service>
        <service domain="domain_qpidd_2" name="qpidd_2">
            <script ref="qpidd"/>
        </service>
        <service domain="domain_qpidd_3" name="qpidd_3">
            <script ref="qpidd"/>
        </service>
    </rm>
    <fencedevices>
        <fencedevice action="reboot" agent="fence_xvm" key_file="/etc/cluster/fence_xvm.key" name="fence_1"/>
        <fencedevice action="reboot" agent="fence_xvm" key_file="/etc/cluster/fence_xvm.key" name="fence_2"/>
        <fencedevice action="reboot" agent="fence_xvm" key_file="/etc/cluster/fence_xvm.key" name="fence_3"/>
    </fencedevices>
</cluster>

[root@HOST ~]# cat /etc/fence_virt.conf
fence_virtd {
    listener = "multicast";
    backend = "libvirt";
    module_path = "/usr/lib64/fence-virt";
}

listeners {
    multicast {
        key_file = "/etc/cluster/fence_xvm.key";
        address = "225.0.0.12";
        port = "1229";
        family = "ipv4";
        interface = "virbr4";
    }
}

backends {
    libvirt {
        uri = "qemu:///system";
    }
}
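As a side note on the configuration above: the key file has to be identical on the host and on every guest, and the multicast listener must be reachable from the guests. A minimal sketch of setting that up and verifying it (the key size and the scp target are only examples; adjust to your environment):

# on the VM host: generate a shared key and copy it to each guest
[root@HOST ~]# dd if=/dev/urandom of=/etc/cluster/fence_xvm.key bs=4096 count=1
[root@HOST ~]# scp /etc/cluster/fence_xvm.key root@192.168.10.11:/etc/cluster/
[root@HOST ~]# service fence_virtd restart

# on a guest: confirm that the listener and backend answer over multicast
[root@dhcp-x-y ~]# fence_xvm -o list -k /etc/cluster/fence_xvm.key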
The issue I was seeing above was caused by a misconfiguration: the fence device's "domain" attribute was set to the cluster node name instead of the libvirt/QEMU domain name. Regardless of that, fence_node should give the user consistent information about how the requested operation finished.

# fence_xvm -o list
cluster-rhel6i0      9747d8b2-9e04-6e84-b920-953651e32251 on
cluster-rhel6x0      62079d69-33c4-7133-65a6-7ae0db52131e on
cluster-rhel6x1      b8d18c15-bbed-7496-1af4-90afa0cdf95f on

# cat /etc/cluster/cluster.conf
<?xml version="1.0"?>
<cluster config_version="8" name="mycluster_el6vm">
    <fence_daemon clean_start="0" post_fail_delay="0" post_join_delay="30"/>
    <fence_xvmd debug="10" multicast_interface="eth3"/>
    <clusternodes>
        <clusternode name="192.168.10.11" nodeid="1" votes="1">
            <fence>
                <method name="1">
                    <device domain="cluster-rhel6i0" key_file="/etc/cluster/fence_xvm.key" name="fence_1"/>
                </method>
            </fence>
        </clusternode>
        <clusternode name="192.168.10.12" nodeid="2" votes="1">
            <fence>
                <method name="1">
                    <device domain="cluster-rhel6x0" key_file="/etc/cluster/fence_xvm.key" name="fence_2"/>
                </method>
            </fence>
        </clusternode>
        <clusternode name="192.168.10.13" nodeid="3" votes="1">
            <fence>
                <method name="1">
                    <device domain="cluster-rhel6x1" key_file="/etc/cluster/fence_xvm.key" name="fence_3"/>
                </method>
            </fence>
        </clusternode>
    </clusternodes>
    <cman port="1229">
        <multicast addr="225.0.0.12"/>
    </cman>
    <rm log_level="7">
        <failoverdomains>
            <failoverdomain name="domain_qpidd_1" restricted="1">
                <failoverdomainnode name="192.168.10.11" priority="1"/>
            </failoverdomain>
            <failoverdomain name="domain_qpidd_2" restricted="1">
                <failoverdomainnode name="192.168.10.12" priority="1"/>
            </failoverdomain>
            <failoverdomain name="domain_qpidd_3" restricted="1">
                <failoverdomainnode name="192.168.10.13" priority="1"/>
            </failoverdomain>
        </failoverdomains>
        <resources>
            <script file="/etc/init.d/qpidd" name="qpidd"/>
        </resources>
        <service domain="domain_qpidd_1" name="qpidd_1">
            <script ref="qpidd"/>
        </service>
        <service domain="domain_qpidd_2" name="qpidd_2">
            <script ref="qpidd"/>
        </service>
        <service domain="domain_qpidd_3" name="qpidd_3">
            <script ref="qpidd"/>
        </service>
    </rm>
    <fencedevices>
        <fencedevice action="reboot" agent="fence_xvm" key_file="/etc/cluster/fence_xvm.key" name="fence_1"/>
        <fencedevice action="reboot" agent="fence_xvm" key_file="/etc/cluster/fence_xvm.key" name="fence_2"/>
        <fencedevice action="reboot" agent="fence_xvm" key_file="/etc/cluster/fence_xvm.key" name="fence_3"/>
    </fencedevices>
</cluster>
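In other words, the device's "domain" attribute has to match one of the names (or UUIDs) that the hypervisor reports. A quick way to cross-check the corrected configuration before relying on it (the names are the ones from the listing above):

# names as libvirt sees them on the host
[root@HOST ~]# virsh --connect qemu:///system list --all --name

# names as the guests see them through fence_virtd
[root@dhcp-x-y ~]# fence_xvm -o list

# verify a single node-to-domain mapping without actually fencing anything
[root@dhcp-x-y ~]# fence_xvm -H cluster-rhel6i0 -o status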
Created attachment 611546 [details] Fix
*** Bug 870549 has been marked as a duplicate of this bug. ***
Marking Verified because I got through fencing and skeet testing, which were tripping over this earlier in the release cycle.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHBA-2013-0419.html