
Bug 602869

Summary: Live migration broken for HVM (Windows) instances with PV drivers
Product: Red Hat Enterprise Linux 5
Reporter: Bill Braswell <bbraswel>
Component: xen
Assignee: Michal Novotny <minovotn>
Status: CLOSED ERRATA
QA Contact: Virtualization Bugs <virt-bugs>
Severity: medium
Priority: low
Version: 5.2
CC: areis, james.brown, leiwang, minovotn, mjenner, mrezanin, pbonzini, xen-maint
Target Milestone: rc
Hardware: All
OS: Linux
Fixed In Version: xen-3.0.3-115.el5
Doc Type: Bug Fix
Last Closed: 2011-01-13 22:22:06 UTC
Bug Blocks: 514499
Attachments: Patch to pass vif type to save handling (flags: none)

Description Bill Braswell 2010-06-10 22:15:04 UTC
Xen live migration is broken for HVM (Windows) instances with PV drivers. The script that runs on the target side to set up networking assumes the guest has been using an emulated network device, even if the instance actually has a PV network device set up.

This has already been run past Paolo Bonzini, who said to open a BZ for it.

08:18:42 >> bonzini<< godfather: seems wrong indeed...
08:18:48 >> bonzini<< godfather: open bz for xen


Comments from the customer:
The migrated instance never comes up.  The problem is that the
arguments with which xend launches 'qemu-dm' are incorrect.  Our network
config is PV w/routing.  There are NO bridges in our configuration.

Yet 'xend' insists on specifying network params to create the tap/bridge
config.  The problem seems to be that the configuration transmitted
from the source host to the dest host does not contain information
that the vif type is 'front'.  The 'xend' code then says, "Oh,
I must be ioemu!" and sets up incorrect args.  The pertinent
code appears to be:
--
/usr/lib64/python2.4/site-packages/xen/xend/image.py
--
            if name == 'vif':
                type = sxp.child_value(info, 'type')
                if type is None:
                    type = "ioemu"
                if type != 'ioemu':
                    continue
                nics += 1
                mac = sxp.child_value(info, 'mac')
                if mac == None:
                    mac = randomMAC()
                bridge = sxp.child_value(info, 'bridge', 'xenbr0')
                model = sxp.child_value(info, 'model', 'rtl8139')
                ret.append("-net")
                ret.append("nic,vlan=%d,macaddr=%s,model=%s" %
                           (nics, mac, model))
                ret.append("-net")
                ret.append("tap,vlan=%d,bridge=%s" % (nics, bridge))
        return ret
--
type is *definitely* 'None'.  I believe that the code on the source host does
not correctly specify the 'front' type when it is sending the configuration
across to the target host.
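
The fallback described above can be illustrated with a minimal sketch. It is not the xend code: the vif is modelled as a plain dict instead of an SXP list, the function name nic_args is hypothetical, and a fixed placeholder MAC stands in for randomMAC(). It only shows that a vif whose type was lost is treated exactly like an ioemu vif.

```python
# Simplified stand-in for the vif handling quoted above (not the real xend code).
# Assumption: each vif is a plain dict rather than an SXP list.

def nic_args(vifs, default_bridge="xenbr0", default_model="rtl8139"):
    """Return the qemu-dm -net arguments for the emulated NICs in vifs."""
    args = []
    nics = 0
    for vif in vifs:
        vif_type = vif.get("type")
        if vif_type is None:
            # Same fallback as the excerpt: a missing type means ioemu.
            vif_type = "ioemu"
        if vif_type != "ioemu":
            continue  # PV interfaces get no emulated NIC
        nics += 1
        # Placeholder MAC for the sketch; the excerpt calls randomMAC() here.
        mac = vif.get("mac", "00:16:3e:00:00:01")
        bridge = vif.get("bridge", default_bridge)
        model = vif.get("model", default_model)
        args += ["-net", "nic,vlan=%d,macaddr=%s,model=%s" % (nics, mac, model)]
        args += ["-net", "tap,vlan=%d,bridge=%s" % (nics, bridge)]
    return args

pv_vif = {"type": "front", "mac": "00:16:36:61:5d:bf"}
stripped_vif = {"mac": "00:16:36:61:5d:bf"}  # same vif with the type lost in transit

print(nic_args([pv_vif]))        # no emulated NIC arguments
print(nic_args([stripped_vif]))  # tap/bridge arguments appear
```

With the type present the list is empty; once the type is missing, the tap/bridge arguments show up even though no bridge is configured.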



Reproducer available at

The following lab is set up

Caveat 1: Please try to not reboot 10.65.208.84. It has a few other tests going on.

Caveat 2: The winxp guest is verified working from both Xen Dom0 hosts. However, it's an FV guest.

1. NFS server at 10.65.210.68
 ssh: root/redhat
 exports /var/lib/xen/images
 which has a Windows XP vm called winxp
 
2. Xen Dom0 at 10.65.208.84
 ssh: root/redhat
 vnc: 10.65.208.84:2/redhat
 mounts /var/lib/xen/images from nfs server

3. Xen Dom0 at 10.65.210.208
 ssh: root/redhat
 vnc: 10.65.210.208:1/redhat
 mounts /var/lib/xen/images from nfs server

Comment 6 Paolo Bonzini 2010-07-28 12:05:41 UTC
I'm passing this to the Xen component, since it looks like qemu-dm is incorrectly invoked.

Comment 7 Michal Novotny 2010-07-29 11:14:52 UTC
Bill,
you're writing about lab machines at:

1. NFS server at 10.65.210.68
 ssh: root/redhat
 exports /var/lib/xen/images
 which has a Windows XP vm called winxp

2. Xen Dom0 at 10.65.208.84
 ssh: root/redhat
 vnc: 10.65.208.84:2/redhat
 mounts /var/lib/xen/images from nfs server

3. Xen Dom0 at 10.65.210.208
 ssh: root/redhat
 vnc: 10.65.210.208:1/redhat
 mounts /var/lib/xen/images from nfs server    

Unfortunately, although I can log into machine 2 (.84), I can't access the image for winxp. I can't even ping machines 1 (.68) and 3 (.208).

Can you give me access to an environment where I can reproduce it?

Thanks,
Michal

Comment 9 Michal Novotny 2010-08-03 15:29:32 UTC
Well, I tested this one and it doesn't really seem to have any issues. Here are the qemu-dm arguments.

Guest started: /usr/lib64/xen/bin/qemu-dm -d 10 -m 1024 -boot c -serial pty -vcpus 1 -acpi -k en-us -domain-name WinXP-32fv -net nic,vlan=1,macaddr=00:16:36:61:5d:bf,model=rtl8139 -net tap,vlan=1,bridge=xenbr0 -vnc 127.0.0.1:10 -vncunused

After migration to host B (non-live):  /usr/lib64/xen/bin/qemu-dm -d 18 -m 1024 -boot c -serial pty -vcpus 1 -acpi -k en-us -domain-name WinXP-32fv -net nic,vlan=1,macaddr=00:16:36:61:5d:bf,model=rtl8139 -net tap,vlan=1,bridge=xenbr0 -vnc 0.0.0.0:18 -vncunused -loadvm /var/lib/xen/qemu-save-18.img

After migration to host A again (live): /usr/lib64/xen/bin/qemu-dm -d 19 -m 1024 -boot c -serial pty -vcpus 1 -acpi -k en-us -domain-name WinXP-32fv -net nic,vlan=1,macaddr=00:16:36:61:5d:bf,model=rtl8139 -net tap,vlan=1,bridge=xenbr0 -vnc 0.0.0.0:19 -vncunused -loadvm /var/lib/xen/qemu-save-19.img

I kept pinging the guest the whole time and it worked fine. From the guest I pinged another machine and that worked fine as well, so I can't really reproduce it when the guest is using PV drivers.

Bill, could the customer please retest using the packages from:

http://people.redhat.com/mrezanin/xen/

Thanks,
Michal

Comment 11 Paolo Bonzini 2010-08-03 15:40:13 UTC
You probably cannot see the bug because you're using type=ioemu.  With type=netfront (anything but ioemu is the same) and bridges you should already be able to see the wrong qemu-dm command line.

I suggest you try with type=netfront and with the bridge, since that's much easier to set up, but if the bug does not reproduce that way please try without the bridge as well.

Comment 12 Michal Novotny 2010-08-03 15:56:43 UTC
Well, I tried setting type=netfront and these are the results.

Guest started with: /usr/lib64/xen/bin/qemu-dm -d 12 -m 1024 -boot c -serial pty -vcpus 1 -acpi -k en-us -domain-name WinXP-32fv -vnc 127.0.0.1:12 -vncunused

After migration: /usr/lib64/xen/bin/qemu-dm -d 20 -m 1024 -boot c -serial pty -vcpus 1 -acpi -k en-us -domain-name WinXP-32fv -net nic,vlan=1,macaddr=00:16:36:61:5d:bf,model=rtl8139 -net tap,vlan=1,bridge=xenbr0 -vnc 0.0.0.0:20 -vncunused -loadvm /var/lib/xen/qemu-save-20.img

So this is not right. The problem seems to be in the save handling, since only:

(device (vif (backend 0) (script vif-bridge) (bridge xenbr0) (mac 00:16:36:61:5d:bf) (vifname vif4.0)))

is being saved to the save file, with no trace of the type. I'm going to work on this one.

Michal
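
The missing type can be checked mechanically. The following is a minimal sketch, assuming a tiny hand-written S-expression parser rather than xend's own sxp module; parse_sxp and child_value are hypothetical helpers written for this illustration (child_value only mimics the call shape seen in image.py).

```python
import re

def parse_sxp(text):
    """Parse one parenthesised S-expression into nested Python lists."""
    tokens = re.findall(r"[()]|[^\s()]+", text)
    stack = [[]]
    for tok in tokens:
        if tok == "(":
            stack.append([])          # start a new nested list
        elif tok == ")":
            done = stack.pop()        # close it and attach to the parent
            stack[-1].append(done)
        else:
            stack[-1].append(tok)
    return stack[0][0]

def child_value(node, key, default=None):
    """Return the value of the first (key value) child of node, or default."""
    for child in node[1:]:
        if isinstance(child, list) and child and child[0] == key:
            return child[1] if len(child) > 1 else default
    return default

# The device entry found in the save file, copied from the comment above.
saved = ("(device (vif (backend 0) (script vif-bridge) (bridge xenbr0) "
         "(mac 00:16:36:61:5d:bf) (vifname vif4.0)))")

vif = parse_sxp(saved)[1]          # the nested (vif ...) list
print(child_value(vif, "type"))    # None: no type entry was saved
print(child_value(vif, "bridge"))  # xenbr0
```

Because the lookup for 'type' returns None on the restore side, the ioemu fallback in image.py applies.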

Comment 13 Michal Novotny 2010-08-03 16:39:18 UTC
Created attachment 436325 [details]
Patch to pass vif type to save handling

Well, the issue is that when the vif type is not set to something, it falls back to None on save, and on restore the None value is treated as ioemu, which is wrong. With this patch applied, migrations and restores keep working for images coming from older versions of Xen, for images saved with this patch applied, and when migrating from a newer version (with this patch applied) to an older version of Xen.

The attached patch fixes this. It's a backport of upstream c/s 15972.

Michal

Comment 17 Lei Wang 2010-08-26 01:39:09 UTC
Test details:
vif type=netfront
WinXP guest with PV drivers

Reproduced this bug with xen-3.0.3-114.el5.

Verified the fix with xen-3.0.3-115.el5.
Details as follows:

Guest started on host A with: /usr/lib64/xen/bin/qemu-dm -d 11 -m 512 -boot c -vcpus 4 -acpi -domain-name vm1 -vnc 0.0.0.0:11 -vncunused

After migration (live) to host B: /usr/lib64/xen/bin/qemu-dm -d 4 -m 512 -boot c -vcpus 4 -acpi -domain-name vm1 -vnc 0.0.0.0:4 -vncunused -loadvm /var/lib/xen/qemu-save-4.img

After migration back (non-live) to host A: /usr/lib64/xen/bin/qemu-dm -d 12 -m 512 -boot c -vcpus 4 -acpi -domain-name vm1 -vnc 0.0.0.0:12 -vncunused -loadvm /var/lib/xen/qemu-save-12.img


So move the bug to VERIFIED.
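
The same check can be done without reading the command lines by eye. This is a minimal sketch, assuming the qemu-dm command line has already been captured as a string; net_args is a hypothetical helper, and the two sample command lines are copied from comments 12 and 17 of this report.

```python
import shlex

def net_args(cmdline):
    """Return the values that follow each -net option in a qemu-dm command line."""
    argv = shlex.split(cmdline)
    return [argv[i + 1] for i, arg in enumerate(argv[:-1]) if arg == "-net"]

# Command line after migration without the fix (comment 12).
before_fix = ("/usr/lib64/xen/bin/qemu-dm -d 20 -m 1024 -boot c -serial pty "
              "-vcpus 1 -acpi -k en-us -domain-name WinXP-32fv "
              "-net nic,vlan=1,macaddr=00:16:36:61:5d:bf,model=rtl8139 "
              "-net tap,vlan=1,bridge=xenbr0 -vnc 0.0.0.0:20 -vncunused "
              "-loadvm /var/lib/xen/qemu-save-20.img")

# Command line after live migration with xen-3.0.3-115.el5 (comment 17).
after_fix = ("/usr/lib64/xen/bin/qemu-dm -d 4 -m 512 -boot c -vcpus 4 -acpi "
             "-domain-name vm1 -vnc 0.0.0.0:4 -vncunused "
             "-loadvm /var/lib/xen/qemu-save-4.img")

print(net_args(before_fix))  # two -net values: the emulated NIC and its tap/bridge
print(net_args(after_fix))   # empty list: no emulated NIC for the PV guest
```

For a guest with type=netfront, an empty list after migration is the expected result.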

Comment 20 errata-xmlrpc 2011-01-13 22:22:06 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2011-0031.html