Bug 602869 - Live migration broken for HVM (Windows) instances with PV drivers
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: xen
Version: 5.2
Hardware: All
OS: Linux
Priority: low
Severity: medium
Target Milestone: rc
Target Release: ---
Assigned To: Michal Novotny
QA Contact: Virtualization Bugs
Blocks: 514499
Reported: 2010-06-10 18:15 EDT by Bill Braswell
Modified: 2016-04-26 09:25 EDT
CC: 8 users

Fixed In Version: xen-3.0.3-115.el5
Doc Type: Bug Fix
Last Closed: 2011-01-13 17:22:06 EST


Attachments
Patch to pass vif type to save handling (1.65 KB, patch)
2010-08-03 12:39 EDT, Michal Novotny


External Trackers
Tracker ID | Priority | Status | Summary | Last Updated
Red Hat Product Errata RHBA-2011:0031 | normal | SHIPPED_LIVE | xen bug fix and enhancement update | 2011-01-12 10:59:24 EST

Description Bill Braswell 2010-06-10 18:15:04 EDT
Xen live migration is broken for HVM (Windows) instances with PV drivers. The script that runs on the target side to set up networking assumes the guest has been using an emulated network device, even if the instance actually has a PV network device set up.

This has already been run past Paolo Bonzini, who asked for a BZ to be opened for it.

08:18:42 >> bonzini<< godfather: seems wrong indeed...
08:18:48 >> bonzini<< godfather: open bz for xen


Comments from the customer:
The migrated instance never comes up. The problem is that the
arguments with which xend launches 'qemu-dm' are incorrect. Our network
config is PV w/routing. There are NO bridges in our configuration.

Yet 'xend' insists on specifying network params to create the tap/bridge
config. The problem seems to be that the configuration transmitted
from the source host to the dest host does not contain information
that the vif type is 'front'. The 'xend' code then says, "Oh,
I must be ioemu!" and sets up incorrect args. The pertinent
code appears to be:
--
/usr/lib64/python2.4/site-packages/xen/xend/image.py
--
    if name == 'vif':
        type = sxp.child_value(info, 'type')
        if type is None:
            # no type in the (restored) config: assume an emulated NIC
            type = "ioemu"
        if type != 'ioemu':
            # PV (netfront) vif: qemu-dm gets no -net options for it
            continue
        nics += 1
        mac = sxp.child_value(info, 'mac')
        if mac == None:
            mac = randomMAC()
        bridge = sxp.child_value(info, 'bridge', 'xenbr0')
        model = sxp.child_value(info, 'model', 'rtl8139')
        ret.append("-net")
        ret.append("nic,vlan=%d,macaddr=%s,model=%s" %
                   (nics, mac, model))
        ret.append("-net")
        ret.append("tap,vlan=%d,bridge=%s" % (nics, bridge))
return ret
--
type is *definitely* 'None'.  I believe that the code on the source host does
not correctly specify the 'front' type when it is sending the configuration
across to the target host.
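To make the failure mode concrete, here is a minimal, self-contained Python sketch of the vif branch quoted above. It is not xend code: `child_value` is a simplified, hypothetical stand-in for `sxp.child_value` that works on nested lists, and `build_net_args` is a hypothetical name for the argument-building loop. It shows that a vif entry with no `type` key produces emulated `-net` options, while one marked `netfront` produces none.

```python
def child_value(sxpr, name, default=None):
    """Simplified stand-in for xen.xend.sxp.child_value (hypothetical).

    sxpr is a nested list such as ['vif', ['mac', '...'], ['bridge', '...']];
    returns the first value of the child tagged `name`, else `default`.
    """
    for child in sxpr[1:]:
        if isinstance(child, list) and len(child) > 1 and child[0] == name:
            return child[1]
    return default


def build_net_args(devices):
    """Build qemu-dm -net options the way the quoted image.py fragment does."""
    ret = []
    nics = 0
    for name, info in devices:
        if name != 'vif':
            continue
        vif_type = child_value(info, 'type')
        if vif_type is None:
            vif_type = 'ioemu'          # missing type is treated as emulated
        if vif_type != 'ioemu':
            continue                    # PV (netfront) vif: no emulated NIC
        nics += 1
        # The real code calls randomMAC() when no MAC is present; this sketch
        # assumes the MAC is always in the config.
        mac = child_value(info, 'mac')
        bridge = child_value(info, 'bridge', 'xenbr0')
        model = child_value(info, 'model', 'rtl8139')
        ret.append('-net')
        ret.append('nic,vlan=%d,macaddr=%s,model=%s' % (nics, mac, model))
        ret.append('-net')
        ret.append('tap,vlan=%d,bridge=%s' % (nics, bridge))
    return ret


MAC = '00:16:36:61:5d:bf'
# vif as it arrives on the target after migration: no 'type' key
restored = [('vif', ['vif', ['bridge', 'xenbr0'], ['mac', MAC]])]
# vif as originally configured on the source host
original = [('vif', ['vif', ['type', 'netfront'], ['bridge', 'xenbr0'],
                     ['mac', MAC]])]

print(build_net_args(original))   # []
print(build_net_args(restored))   # the -net nic,... / -net tap,... pair
```

The second call yields the same `-net nic,vlan=1,...,model=rtl8139` / `-net tap,vlan=1,bridge=xenbr0` pair that appears in the post-migration qemu-dm command line in Comment 12.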



A reproducer is available in the following lab setup:

Caveat 1: Please try not to reboot 10.65.208.84. It has a few other tests going on.

Caveat 2: The winxp guest has been verified to work from both Xen Dom0s. However, it is an FV guest.

1. NFS server at 10.65.210.68
 ssh: root/redhat
 exports /var/lib/xen/images
 which has a Windows XP vm called winxp
 
2. Xen Dom0 at 10.65.208.84
 ssh: root/redhat
 vnc: 10.65.208.84:2/redhat
 mounts /var/lib/xen/images from nfs server

3. Xen Dom0 at 10.65.210.208
 ssh: root/redhat
 vnc: 10.65.210.208:1/redhat
 mounts /var/lib/xen/images from nfs server
Comment 6 Paolo Bonzini 2010-07-28 08:05:41 EDT
I'm passing this to the Xen component, since it looks like qemu-dm is incorrectly invoked.
Comment 7 Michal Novotny 2010-07-29 07:14:52 EDT
Bill,
you're writing about the lab machines listed in the description (NFS server 10.65.210.68, Xen Dom0s 10.65.208.84 and 10.65.210.208).

Unfortunately, although I can log into machine 2 (.84), I can't access the winxp image. I can't even ping machines 1 (.68) and 3 (.208).

Can you give me access to an environment where I can reproduce it?

Thanks,
Michal
Comment 9 Michal Novotny 2010-08-03 11:29:32 EDT
Well, I tested this one and it doesn't seem to have any issues. Here are the qemu-dm arguments:

Guest started: /usr/lib64/xen/bin/qemu-dm -d 10 -m 1024 -boot c -serial pty -vcpus 1 -acpi -k en-us -domain-name WinXP-32fv -net nic,vlan=1,macaddr=00:16:36:61:5d:bf,model=rtl8139 -net tap,vlan=1,bridge=xenbr0 -vnc 127.0.0.1:10 -vncunused

After migration to host B (non-live):  /usr/lib64/xen/bin/qemu-dm -d 18 -m 1024 -boot c -serial pty -vcpus 1 -acpi -k en-us -domain-name WinXP-32fv -net nic,vlan=1,macaddr=00:16:36:61:5d:bf,model=rtl8139 -net tap,vlan=1,bridge=xenbr0 -vnc 0.0.0.0:18 -vncunused -loadvm /var/lib/xen/qemu-save-18.img

After migration to host A again (live): /usr/lib64/xen/bin/qemu-dm -d 19 -m 1024 -boot c -serial pty -vcpus 1 -acpi -k en-us -domain-name WinXP-32fv -net nic,vlan=1,macaddr=00:16:36:61:5d:bf,model=rtl8139 -net tap,vlan=1,bridge=xenbr0 -vnc 0.0.0.0:19 -vncunused -loadvm /var/lib/xen/qemu-save-19.img

I kept pinging the guest throughout and it responded fine. Pinging another machine from inside the guest also worked, so I can't reproduce the problem when the guest is using PV drivers.

Bill, could the customer please retest using the packages from:

http://people.redhat.com/mrezanin/xen/

Thanks,
Michal
Comment 11 Paolo Bonzini 2010-08-03 11:40:13 EDT
You probably cannot see the bug because you're using type=ioemu. With type=netfront (any value other than ioemu behaves the same) and bridges, you should already be able to see the wrong qemu-dm command line.

I suggest you try with type=netfront and with the bridge, since that's much easier to set up, but if the bug does not reproduce that way, please try without the bridge as well.
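For reference, a guest set up along the lines suggested here might carry a vif entry like the one below. This is an illustrative sketch, not the configuration actually used in this bug: xm domain configuration files are evaluated as Python, the name, memory size and MAC are borrowed from the qemu-dm command lines in this report, and `parse_vif` is a hypothetical helper added only to show how the comma-separated vif spec breaks down into keys.

```python
# Illustrative xm domain configuration excerpt (xm config files use Python
# syntax). Values are borrowed from the qemu-dm command lines in this bug;
# a real HVM config needs more settings than shown here.
name = "WinXP-32fv"
memory = 1024
# Any type other than "ioemu" means the NIC is served by the PV driver
# (netfront), so qemu-dm should not emulate an rtl8139 for it.
vif = ["type=netfront, mac=00:16:36:61:5d:bf, bridge=xenbr0"]


def parse_vif(spec):
    """Hypothetical helper: split a 'key=value, key=value' vif spec into a dict."""
    return dict(item.strip().split("=", 1) for item in spec.split(","))


print(parse_vif(vif[0]))
```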
Comment 12 Michal Novotny 2010-08-03 11:56:43 EDT
Well, I tried setting up type=netfront and these are the results:

Guest started with: /usr/lib64/xen/bin/qemu-dm -d 12 -m 1024 -boot c -serial pty -vcpus 1 -acpi -k en-us -domain-name WinXP-32fv -vnc 127.0.0.1:12 -vncunused

After migration: /usr/lib64/xen/bin/qemu-dm -d 20 -m 1024 -boot c -serial pty -vcpus 1 -acpi -k en-us -domain-name WinXP-32fv -net nic,vlan=1,macaddr=00:16:36:61:5d:bf,model=rtl8139 -net tap,vlan=1,bridge=xenbr0 -vnc 0.0.0.0:20 -vncunused -loadvm /var/lib/xen/qemu-save-20.img

This is not right. The problem seems to be in the save handling, since only:

(device (vif (backend 0) (script vif-bridge) (bridge xenbr0) (mac 00:16:36:61:5d:bf) (vifname vif4.0)))

is written to the save file and there's no trace of the type. I'm going to work on this one.
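The missing field is easy to confirm mechanically. The sketch below uses a minimal s-expression reader written only for this illustration (it is not xend's `sxp` module, and `parse_sxp` is a hypothetical name) to parse the saved vif record quoted above and list the keys it contains.

```python
import re


def parse_sxp(text):
    """Parse one well-formed s-expression into nested Python lists.

    Minimal reader for illustration: no quoting or escape handling, and it
    assumes balanced parentheses.
    """
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    stack = [[]]
    for tok in tokens:
        if tok == "(":
            stack.append([])            # start a new nested list
        elif tok == ")":
            finished = stack.pop()      # close it and attach to its parent
            stack[-1].append(finished)
        else:
            stack[-1].append(tok)       # plain atom
    return stack[0][0]


SAVED = ("(device (vif (backend 0) (script vif-bridge) (bridge xenbr0) "
         "(mac 00:16:36:61:5d:bf) (vifname vif4.0)))")

device = parse_sxp(SAVED)
vif = device[1]                          # ['vif', ['backend', '0'], ...]
keys = [child[0] for child in vif[1:]]
print(keys)            # ['backend', 'script', 'bridge', 'mac', 'vifname']
print('type' in keys)  # False: nothing tells the target it was netfront
```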

Michal
Comment 13 Michal Novotny 2010-08-03 12:39:18 EDT
Created attachment 436325 [details]
Patch to pass vif type to save handling

Well, the issue is that when the vif type is not explicitly set, it is saved as None, and on restore the None value is treated as ioemu, which is wrong. With this patch applied, migrations and restores work fine for images saved with the patch, including restores on older versions of Xen, and migrations from newer versions (with this patch applied) to older versions of Xen work as well.

The attached patch fixes this. It's a backport of upstream c/s 15972.
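Conceptually, the fix is to keep `type` in the vif record written at save time so that the restore side no longer has to guess. The sketch below illustrates only that idea; it is not the contents of attachment 436325 or of upstream c/s 15972, and `vif_to_sxp` and `restored_vif_type` are hypothetical names.

```python
def vif_to_sxp(cfg):
    """Hypothetical save-side serializer for a vif config dict.

    Emits the keys seen in the saved record quoted in Comment 12 and also
    keeps 'type' when it is set, instead of dropping it.
    """
    sxpr = ['vif']
    for key in ('backend', 'script', 'bridge', 'mac', 'vifname', 'type'):
        value = cfg.get(key)
        if value is not None:
            sxpr.append([key, value])
    return sxpr


def restored_vif_type(sxpr):
    """Restore-side lookup: a record without 'type' still defaults to ioemu."""
    for child in sxpr[1:]:
        if child[0] == 'type':
            return child[1]
    return 'ioemu'


pv_vif = {'backend': '0', 'script': 'vif-bridge', 'bridge': 'xenbr0',
          'mac': '00:16:36:61:5d:bf', 'vifname': 'vif4.0', 'type': 'netfront'}
untyped = dict(pv_vif)
del untyped['type']          # mimics a record saved without the type

print(restored_vif_type(vif_to_sxp(pv_vif)))    # netfront
print(restored_vif_type(vif_to_sxp(untyped)))   # ioemu
```

The second call shows why the type has to be written at save time: from the restored record alone, the target cannot tell a netfront vif from an emulated one.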

Michal
Comment 17 Lei Wang 2010-08-25 21:39:09 EDT
Test details:
vif type=netfront
WinXP guest with PV drivers

Reproduced this bug with xen-3.0.3-114.el5.

Verified the fix with xen-3.0.3-115.el5.
Details as follows:

Guest started in host A with: /usr/lib64/xen/bin/qemu-dm -d 11 -m 512 -boot c -vcpus 4 -acpi -domain-name vm1 -vnc 0.0.0.0:11 -vncunused


After migration (live) to host B: /usr/lib64/xen/bin/qemu-dm -d 4 -m 512 -boot c -vcpus 4 -acpi -domain-name vm1 -vnc 0.0.0.0:4 -vncunused -loadvm /var/lib/xen/qemu-save-4.img


After migration back (non-live) to host A: /usr/lib64/xen/bin/qemu-dm -d 12 -m 512 -boot c -vcpus 4 -acpi -domain-name vm1 -vnc 0.0.0.0:12 -vncunused -loadvm /var/lib/xen/qemu-save-12.img


Moving the bug to VERIFIED.
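The manual check used during verification (a netfront guest's qemu-dm command line should carry no `-net` options, before or after migration) can be scripted. This is a minimal sketch, assuming the command line is available as a string, for example copied from the logs as above. `has_emulated_nic` is a hypothetical helper name.

```python
import shlex


def has_emulated_nic(cmdline):
    """Return True if a qemu-dm command line configures any -net device."""
    return '-net' in shlex.split(cmdline)


# Post-migration command lines copied from Comments 12 and 17.
BROKEN = ("/usr/lib64/xen/bin/qemu-dm -d 20 -m 1024 -boot c -serial pty "
          "-vcpus 1 -acpi -k en-us -domain-name WinXP-32fv "
          "-net nic,vlan=1,macaddr=00:16:36:61:5d:bf,model=rtl8139 "
          "-net tap,vlan=1,bridge=xenbr0 -vnc 0.0.0.0:20 -vncunused "
          "-loadvm /var/lib/xen/qemu-save-20.img")
FIXED = ("/usr/lib64/xen/bin/qemu-dm -d 4 -m 512 -boot c -vcpus 4 -acpi "
         "-domain-name vm1 -vnc 0.0.0.0:4 -vncunused "
         "-loadvm /var/lib/xen/qemu-save-4.img")

print(has_emulated_nic(BROKEN))  # True  (before the fix, Comment 12)
print(has_emulated_nic(FIXED))   # False (after the fix, Comment 17)
```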
Comment 20 errata-xmlrpc 2011-01-13 17:22:06 EST
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2011-0031.html
