Bug 1104740

Summary: fence-virt can't fence dead VMs
Product: Red Hat Enterprise Linux 6 Reporter: Fabio Massimo Di Nitto <fdinitto>
Component: fence-virt Assignee: Ryan McCabe <rmccabe>
Status: CLOSED ERRATA QA Contact: Cluster QE <mspqa-list>
Severity: urgent Docs Contact:
Priority: urgent    
Version: 6.5 CC: cluster-maint, djansa, jkortus, mgrac, rbalakri
Target Milestone: rc   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: fence-virt-0.2.3-17.el6 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 1111384 Environment:
Last Closed: 2014-10-14 08:19:53 UTC Type: Bug
Bug Depends On:    
Bug Blocks: 1111384    

Description Fabio Massimo Di Nitto 2014-06-04 15:03:53 UTC
There is an issue in the fence-virt libvirt backend in the way the list of domains is gathered from libvirt.

One simple test:

Start a bunch of VMs and list them:

[root@rhel6-ha-node1 ~]# fence_xvm -o list
Could not read /etc/cluster/fence_xvm.key; trying without authentication
rhel6-ha-node1       c632d8b3-875e-490a-85b0-94e233d04b68 on
rhel6-ha-node2       ba1274ec-1d43-4b39-a71f-e85b90fc7586 on
rhel6-ha-node3       c444b48e-6b50-466e-a685-fddd49ed171c on
rhel6-ha-node4       c5cbee4d-06ac-44bb-adf5-78aaf5a2a541 on

From the host, find the qemu process for one of the VMs (rhel6-ha-node4 here) and kill -9 it. The killed VM then disappears from the list:

[root@rhel6-ha-node1 ~]# fence_xvm -o list
Could not read /etc/cluster/fence_xvm.key; trying without authentication
rhel6-ha-node1       c632d8b3-875e-490a-85b0-94e233d04b68 on
rhel6-ha-node2       ba1274ec-1d43-4b39-a71f-e85b90fc7586 on
rhel6-ha-node3       c444b48e-6b50-466e-a685-fddd49ed171c on

At this point the cluster is stuck, because fence_virtd only looks for the domain among the running ones, cannot find it, and takes no action.

The issue in the code is located in server/virt*.c:

virt_list_t *vl_get(virConnectPtr vp, int my_id)

We use virConnectNumOfDomains, which only counts active VMs, and calling it separately from virConnectListDomains is racy.

We should change all that code to use virConnectListAllDomains, which can list both active and dead domains (with optional filters), so that we can find the VM we need to fence. It is also not racy.
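
A rough sketch of what such a lookup could look like is below. This is not the patch that shipped in fence-virt-0.2.3-17.el6. The struct dom_entry record, the find_domain and list_all_domains helpers, the 256-entry buffer and the qemu:///system URI are all assumptions made for illustration. Only the libvirt calls themselves (virConnectListAllDomains, virDomainGetName, virDomainGetUUIDString, virDomainIsActive, virDomainFree) are real API. The libvirt part is only compiled when USE_LIBVIRT is defined, so the plain lookup helper can be built on a machine without libvirt headers.

```c
/* Sketch only, not the fence-virt code. To build the libvirt part:
 *   cc -DUSE_LIBVIRT sketch.c -lvirt
 * Without -DUSE_LIBVIRT only the pure lookup helper is compiled. */
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical record, modelled on one line of "fence_xvm -o list". */
struct dom_entry {
    char name[64];
    char uuid[37];  /* 36 characters plus NUL (VIR_UUID_STRING_BUFLEN) */
    int  active;    /* 1 = running, 0 = defined but not running */
};

/* Find a domain by name or UUID string; NULL if it is not in the list. */
static const struct dom_entry *
find_domain(const struct dom_entry *list, int count, const char *key)
{
    assert(key != NULL);
    for (int i = 0; i < count; i++) {
        if (strcmp(list[i].name, key) == 0 || strcmp(list[i].uuid, key) == 0)
            return &list[i];
    }
    return NULL;
}

#ifdef USE_LIBVIRT
#include <stdlib.h>
#include <libvirt/libvirt.h>

/* Copy up to max domains into out using a single libvirt call.
 * flags == 0 applies no filter, so inactive domains are included.
 * Returns the number of entries stored, or -1 on error. */
static int
list_all_domains(virConnectPtr conn, struct dom_entry *out, int max)
{
    virDomainPtr *doms = NULL;
    int n = virConnectListAllDomains(conn, &doms, 0);
    int stored = 0;

    if (n < 0)
        return -1;
    for (int i = 0; i < n; i++) {
        if (stored < max) {
            const char *name = virDomainGetName(doms[i]);
            struct dom_entry *e = &out[stored];

            if (name && virDomainGetUUIDString(doms[i], e->uuid) == 0) {
                snprintf(e->name, sizeof(e->name), "%s", name);
                e->active = (virDomainIsActive(doms[i]) == 1);
                stored++;
            }
        }
        virDomainFree(doms[i]);  /* each element is owned by the caller */
    }
    free(doms);                  /* and so is the array itself */
    return stored;
}

int main(int argc, char **argv)
{
    struct dom_entry doms[256];  /* arbitrary limit for the sketch */
    const struct dom_entry *hit;
    virConnectPtr conn;
    int n;

    if (argc != 2) {
        fprintf(stderr, "usage: %s <name-or-uuid>\n", argv[0]);
        return 2;
    }
    conn = virConnectOpenReadOnly("qemu:///system");  /* assumed URI */
    if (!conn)
        return 1;
    n = list_all_domains(conn, doms, 256);
    virConnectClose(conn);
    if (n < 0)
        return 1;
    hit = find_domain(doms, n, argv[1]);
    if (!hit) {
        printf("%s: not found\n", argv[1]);
        return 1;
    }
    printf("%-20s %s %s\n", hit->name, hit->uuid, hit->active ? "on" : "off");
    return 0;
}
#endif
```

With a lookup like this, a domain whose qemu process was killed should still be found, as long as it is still defined in libvirt, and would be reported as off rather than missing. Because the whole list comes from one call, there is no window between counting and listing in which the set of domains can change.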

Comment 3 errata-xmlrpc 2014-10-14 08:19:53 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2014-1589.html