Bug 480178

Summary: fence_xvmd Fails to Reboot VM
Product: Red Hat Enterprise Linux 5
Reporter: Gavin Edwards <gaedward>
Component: cman
Assignee: Lon Hohberger <lhh>
Status: CLOSED ERRATA
QA Contact: Cluster QE <mspqa-list>
Severity: urgent
Priority: urgent
Version: 5.2
CC: cfeist, cluster-maint, djansa, edamato, gaedward, kmoriwak, rlerch
Target Milestone: rc
Target Release: 5.3
Hardware: All
OS: Linux
Fixed In Version: cman-2.0.100-1.el5
Doc Type: Bug Fix
Last Closed: 2009-09-02 11:06:33 UTC
Attachments:
Description     Flags
Fix             none
Logs            none
Fixed patch.    none

Description Gavin Edwards 2009-01-15 16:09:10 UTC
Description of problem:
When I issue a manual fence_xvm command to test fencing a Xen VM, the instance is shut down but not restarted.

If I run "fence_xvmd -fdddd" for debugging I see the following output:
Domain                   UUID                                 Owner State
------                   ----                                 ----- -----
Domain-0                 00000000-0000-0000-0000-000000000000 00001 00001
test                     ad8942f2-66a7-707c-765f-abe7ad5b06a9 00001 00002
Storing test
Request to fence: test
test is running locally
Plain TCP request
ipv4_connect: Connecting to client
ipv4_connect: Success; fd = 11
Rebooting domain test...
[[ XML Domain Info ]]
<domain type='xen' id='1'>
  <name>test</name>
  <uuid>ad8942f2-66a7-707c-765f-abe7ad5b06a9</uuid>
  <os>
    <type>linux</type>
    <kernel>/boot/vmlinuz-2.6.18-92.1.22.el5xen</kernel>
    <initrd>/boot/initrd-2.6.18-92.1.22.el5xen-no-scsi.img</initrd>
    <root>/dev/sda5</root>
  </os>
  <memory>524288</memory>
  <vcpu>1</vcpu>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>restart</on_crash>
  <devices>
    <interface type='bridge'>
      <source bridge='xenbr0'/>
      <target dev='vif1.0'/>
      <mac address='00:16:3E:24:D0:80'/>
      <script path='vif-bridge'/>
    </interface>
    <disk type='block' device='disk'>
      <driver name='phy'/>
      <source dev='sda5'/>
      <target dev='sda5'/>
    </disk>
    <disk type='block' device='disk'>
      <driver name='phy'/>
      <source dev='sda6'/>
      <target dev='sda6'/>
    </disk>
    <console tty='/dev/pts/2'/>
  </devices>
</domain>

[[ XML END ]]
Virtual machine is Linux
Unlinkiking os block
[[ XML Domain Info (modified) ]]
<?xml version="1.0"?>
<domain type="xen" id="1">
  <name>test</name>
  <uuid>ad8942f2-66a7-707c-765f-abe7ad5b06a9</uuid>
  <memory>524288</memory>
  <vcpu>1</vcpu>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>restart</on_crash>
  <devices>
    <interface type="bridge">
      <source bridge="xenbr0"/>
      <target dev="vif1.0"/>
      <mac address="00:16:3E:24:D0:80"/>
      <script path="vif-bridge"/>
    </interface>
    <disk type="block" device="disk">
      <driver name="phy"/>
      <source dev="sda5"/>
      <target dev="sda5"/>
    </disk>
    <disk type="block" device="disk">
      <driver name="phy"/>
      <source dev="sda6"/>
      <target dev="sda6"/>
    </disk>
    <console tty="/dev/pts/2"/>
  </devices>
</domain>

[[ XML END ]]
[REBOOT] Calling virDomainDestroy(0xdbd710)
Domain has been shut off
Calling virDomainCreateLinux()...
libvir: XML error : missing operating system information for test
libvir: Xen Daemon error : XML description for domain is not well formed or invalid

Version-Release number of selected component (if applicable):
cman-2.0.84-2.el5_2.3

How reproducible:
Every time

Steps to Reproduce:
1. Create a trivial 1-node Dom0 cluster with fence_xvmd set to run
2. Create a trivial 2-node DomU cluster
3. Create your fence_xvm keys across all nodes
4. Manually run fence_xvm -H <ArbitraryDomUhostname> on Dom0
  
Actual results:
The DomU is destroyed but not recreated, so it has to be restarted manually.

Expected results:
DomU should be destroyed and recreated automatically.

Additional info:

Comment 1 Lon Hohberger 2009-01-15 17:57:00 UTC
So, it looks like this was introduced with the rebase from libvirt 0.2.x to 0.3.x.  The solution is to try both ways:

 * First, try virDomainCreateLinux() on the assumption that the unmodified domain description will work.
 * If that fails, remove the <os/> block, as was previously required, and try again.

This is important but, as I have found, not deemed 'critical', since the most important function of fencing is 'off'. 'On' (i.e., the other half of a reboot) is not a critical action from a cluster perspective.
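The two-step strategy from Comment 1 can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the attached patch: the real fence_xvmd is written in C against libvirt, and the `create` callback here is a hypothetical stand-in for virDomainCreateLinux() that simply reports success or failure. The `<os>` element is removed with the standard library's ElementTree, mirroring the os-block unlinking step visible in the debug log.

```python
import xml.etree.ElementTree as ET
from typing import Callable

# Hypothetical stand-in for virDomainCreateLinux(): accepts a domain XML
# description and returns True if the domain was started.
CreateFn = Callable[[str], bool]


def strip_os_block(domain_xml: str) -> str:
    """Return domain_xml with its top-level <os> element removed."""
    root = ET.fromstring(domain_xml)
    os_elem = root.find("os")  # direct child of <domain> only
    if os_elem is not None:
        root.remove(os_elem)
    return ET.tostring(root, encoding="unicode")


def restart_domain(domain_xml: str, create: CreateFn) -> bool:
    """Start a domain, trying the unmodified XML before the <os>-less form."""
    # First attempt: the description exactly as libvirt returned it.
    if create(domain_xml):
        return True
    # Fallback: remove <os>, as older libvirt releases required, and retry.
    return create(strip_os_block(domain_xml))


# Usage with a trimmed version of the domain from the debug log above.
SAMPLE = """<domain type='xen' id='1'>
  <name>test</name>
  <os>
    <type>linux</type>
  </os>
  <memory>524288</memory>
</domain>"""


# Simulated library (hypothetical) that, like the failing log, rejects
# any description without an <os> block.
def requires_os(xml: str) -> bool:
    return "<os>" in xml


print(restart_domain(SAMPLE, requires_os))  # True on the first attempt
```

Trying the unmodified XML first means a library that needs `<os>` never sees the stripped description that failed with "missing operating system information", while a library that still requires the old behaviour gets a second attempt. Per Comment 3, virDomainCreateLinux() may report failure even when the domain has started; a sketch this simple would then issue a redundant second attempt.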

Comment 2 Lon Hohberger 2009-01-15 18:02:54 UTC
Created attachment 329116 [details]
Fix

Patch which implements a fix.

Comment 3 Lon Hohberger 2009-01-15 18:04:44 UTC
Created attachment 329117 [details]
Logs

Note that the fix works (the domain is still operational and was restarted). Furthermore, virDomainCreateLinux() works with the unaltered XML description. Unfortunately, it appears that virDomainCreateLinux() does not return a success code even though the domain starts.

Comment 4 Lon Hohberger 2009-01-15 18:17:40 UTC
Created attachment 329119 [details]
Fixed patch.

Corrected fix; the previous patch contained a logic error.

Comment 5 Lon Hohberger 2009-01-15 18:23:42 UTC
I have been unable to reproduce on libvirt versions going back to 0.1.8 from the RHEL5 channel.

Comment 11 Lon Hohberger 2009-07-22 19:12:54 UTC
Cause: Attempting to reboot a VM using fence_xvm

Consequence: The VM would remain shut off instead of restarting.

Fix: An issue that prevented the VM from being re-created correctly has been fixed.

Result: The VM is now correctly restarted when an administrator requests a reboot of the domain.

Comment 13 errata-xmlrpc 2009-09-02 11:06:33 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2009-1341.html