Bug 750786

Summary: reboot failed after live migration on Windows 2008 R2 (SP1)

| Field | Value |
|---|---|
| Product | Red Hat Enterprise Linux 6 |
| Component | qemu-kvm |
| Version | 6.1 |
| Hardware | x86_64 |
| OS | Linux |
| Status | CLOSED NOTABUG |
| Severity | medium |
| Priority | unspecified |
| Target Milestone | rc |
| Target Release | --- |
| Reporter | Maurits van de Lande <m.vandelande> |
| Assignee | Amos Kong <akong> |
| QA Contact | Virtualization Bugs <virt-bugs> |
| CC | acathrow, ailan, bsarathy, juzhang, michen, mkenneth, m.vandelande, rhod, tburke, virt-maint |
| Doc Type | Bug Fix |
| Last Closed | 2012-01-11 22:54:22 UTC |
| Attachments | sample VM config file (attachment 531319) |
| Environment | Dual AMD Opteron 6128 server. Two network bonds in mode 4 (802.3ad): one bond is used for KVM networking, the other for host access, DRBD, and clustering. |
Does it happen consistently? What do you mean by "fail" — a crash? Any output? About the storage: if you use GFS2, why the need for DRBD? Can you detail it a bit more? What happens if NFS is used?

> Does it happen consistently?

Yes, it does.

> What do you mean by "fail"? Crash? Any output?

The VM does not reboot; it just stops running. The VM "shuts down" instead of restarting. It doesn't appear to crash, and when I start the VM again I get no warnings. Before migrating, the VM restarts as expected after a reboot.

I do get a crash when I try to change a virtio-win network adapter property such as "Offload Tx IP checksum" AFTER a live migration. Before a migration I can change this property without the VM crashing.

> If you use GFS2, why the need for DRBD?

I use DRBD to synchronize the block devices used for GFS2. On each server I have a partition /dev/sdb1; this partition is used to create a replicated block device between the two cluster nodes. On top of sdb1 a block device /dev/drbd0 is created. /dev/drbd0 is a PV for clustered LVM, and an LV on it holds the GFS2 filesystem. See: http://www.drbd.org/users-guide/ch-gfs.html

> What happens if NFS is used?

I don't know; I don't use NFS. I'll start a test without a virtio network adapter but with an e1000 adapter. What can I do to help?
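For reference, a minimal sketch of the storage stack described in the comment above. Only /dev/sdb1 and /dev/drbd0 come from the report; the DRBD resource name `r0`, volume group names, cluster name `mycluster`, and the exact initial-sync sequence are assumptions.

```sh
# Sketch only: replicated block device -> clustered LVM -> GFS2.
# Run on both nodes unless noted; resource and cluster names are placeholders.
drbdadm up r0                        # bring up the DRBD resource backed by /dev/sdb1
drbdadm primary r0                   # GFS2 needs dual-primary mode (allow-two-primaries)
pvcreate /dev/drbd0                  # the replicated device becomes an LVM PV
vgcreate -c y vg_cluster /dev/drbd0  # clustered VG (requires clvmd)
lvcreate -l 100%FREE -n lv_images vg_cluster
# Once, from one node: 2 journals for a two-node cluster, DLM locking
mkfs.gfs2 -p lock_dlm -t mycluster:images -j 2 /dev/vg_cluster/lv_images
mount -t gfs2 /dev/vg_cluster/lv_images /var/lib/libvirt/images
```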
> I'll start a test without a virtio network adapter but with an e1000 adapter.

When I use <model type='e1000'/> instead of <model type='virtio'/>, the VM doesn't crash after a migration when I change the "TCP checksum offload" property. A reboot also works as expected; the VM doesn't shut down. The problem appears to be virtio-related.
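For illustration, this is roughly where that model swap lives in the domain XML. A hedged sketch: the domain name `W2K8R2DC`, bridge name, and stanza layout are assumptions, not taken from the attached config.

```sh
virsh edit W2K8R2DC   # assumed domain name; opens the domain XML in $EDITOR
# Change the NIC model inside the <interface> stanza, e.g.:
#   <interface type='bridge'>
#     <source bridge='br0'/>      <!-- assumed bridge name -->
#     <model type='e1000'/>       <!-- was: <model type='virtio'/> -->
#   </interface>
```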
I have done some more testing, and the problem appears to be the vhost_net kernel module. I found the following in the libvirt documentation (http://www.redhat.com/archives/libvir-list/2011-March/msg00310.html), under the `name` attribute:

> The optional name attribute forces which type of backend driver to use. The value can be either 'qemu' (a user-space backend) or 'vhost' (a kernel backend, which requires the vhost module to be provided by the kernel); an attempt to require the vhost driver without kernel support will be rejected. If this attribute is not present, then the domain defaults to 'vhost' if present, but silently falls back to 'qemu' without error. Since 0.8.8 (QEMU and KVM only)

When I start the VM with the qemu userspace network driver instead of the vhost kernel driver, live migration works fine. So I added the following to the "interface" XML section in the VM configuration file:

<driver name='qemu'/>

modinfo vhost_net shows version 0.0.1. Is there a newer (fixed) version of vhost_net available?

Can you please try using NFS or iSCSI instead of GFS2/DRBD? Let's try to isolate it; there are potential issues that can come from the shared storage.

(In reply to comment #6)
> When I start the VM with the qemu userspace network driver instead of the
> vhost kernel driver, live migration works fine.
> [...]
> Is there a newer (fixed) version of vhost_net available?

[root@f16 ~]# uname -r
3.2.0-rc1+
[root@f16 ~]# modinfo vhost_net | grep version
version:        0.0.1

The vhost_net version string hasn't changed upstream, but there have been many vhost_net changes in both the upstream and RHEL kernels. Could you help test these two scenarios?

1. NFS & virtio_net & vhost_net off
2. NFS & virtio_net & vhost_net on

Could you also provide the qemu command line, qemu output, and any other error logs?

> Could you help to test those two scenarios?
> Could you also provide the qemu command line, qemu output, and any other error logs?

I'll have to set up an NFS server first. I'll try to perform those tests in week 51.
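A minimal sketch of the workaround described in comment #6, again assuming the same hypothetical domain name: `<driver name='qemu'/>` forces the userspace backend, so vhost_net is bypassed for this NIC.

```sh
virsh edit W2K8R2DC        # assumed domain name
# Add <driver name='qemu'/> inside the virtio <interface> stanza:
#   <interface type='bridge'>
#     <source bridge='br0'/>
#     <model type='virtio'/>
#     <driver name='qemu'/>  <!-- default is 'vhost' when the module is present -->
#   </interface>
lsmod | grep vhost_net     # the module may stay loaded; this domain just won't use it
```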
I used this guide to set up an NFS server: http://aaronwalrath.wordpress.com/2011/03/18/configure-nfs-server-v3-and-v4-on-scientific-linux-6-and-red-hat-enterprise-linux-rhel-6/

When I try to start a VM using NFS I get the following error:

[root@vmhost1a libvirt]# virsh create nfstest.xml
error: Failed to create domain from nfstest.xml
error: unable to set user and group to '107:107' on '/var/lib/libvirt/images/W2K8R2DC-disk0': Invalid argument

NFS is mounted on /var/lib/libvirt/images. It resembles: https://bugzilla.redhat.com/show_bug.cgi?id=709454

(In reply to comment #11)
> [root@vmhost1a libvirt]# virsh create nfstest.xml

Please attach your XML file.

> error: unable to set user and group to '107:107' on
> '/var/lib/libvirt/images/W2K8R2DC-disk0': Invalid argument

It's a libvirt bug. I am not clear about your test environment; could you help test with the qemu command line directly? Otherwise we will keep being blocked by other problems.
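Two workarounds commonly suggested for this class of ownership error on NFS — assumptions drawn from the similar bug linked above, not a confirmed fix for this report:

```sh
# Option 1: on the NFS server, export without root squashing (/etc/exports):
#   /exports/images  vmhost1a(rw,sync,no_root_squash)
exportfs -ra                 # re-read /etc/exports

# Option 2: on the KVM host, stop libvirt from chowning disk images
# (/etc/libvirt/qemu.conf):
#   dynamic_ownership = 0
service libvirtd restart     # pick up the qemu.conf change (RHEL 6 init script)
```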
> It's a libvirt bug. I am not clear about your test environment; could you
> help test with the qemu command line directly? Otherwise we will keep being
> blocked by other problems.

Okay, I have always used virsh; I'll try qemu directly instead. I'll also upgrade the system to EL6.2 soon, since it includes a newer libvirt release.
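A hedged sketch of what such a direct qemu-kvm test might look like, isolating just the vhost flag; memory size, disk path, MAC address, and VNC display are placeholders.

```sh
# Run once with vhost=on and once with vhost=off, migrating and rebooting each time.
qemu-kvm -m 2048 -smp 2 \
  -drive file=/var/lib/libvirt/images/W2K8R2DC-disk0,if=virtio \
  -netdev tap,id=net0,vhost=on \
  -device virtio-net-pci,netdev=net0,mac=52:54:00:12:34:56 \
  -vnc :1
# Control run: identical except  -netdev tap,id=net0,vhost=off
```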
I tested live migration with the vhost_net driver enabled on EL6.2 (CentOS 6.2). This time it all worked perfectly; the live migration was even noticeably faster than before. It looks like this bug is solved.
Created attachment 531319 [details]: sample VM config file

Description of problem:
After performing a live migration of a Windows Server 2008 R2 (SP1) VM, a reboot of that VM fails. With a Windows Server 2003 R2 (SP2) VM, this problem does not occur.

Version-Release number of selected component (if applicable):
qemu-kvm: version 0.12.1.2, release 2.160.el6_1.8
libvirt: version 0.8.7, release 18.el6_1.1
kernel: version 2.6.32, release 131.17.1.el6
virtio-win on Windows 2003 R2: version 51.62.102.200 (10-8-2011)
virtio-win on Windows 2008 R2: version 61.62.102.200 (10-8-2011) or version 6.0.209.605 (20-9-2010)

How reproducible:
Always

Steps to Reproduce:
1. Start the VM on host1.
2. Perform a live migration to host2.
3. Open virt-manager, log on to Windows on this VM, and reboot the VM from within Windows.

Actual results:
The VM shuts off.

Expected results:
The VM should reboot.

Additional info:
The cluster is a two-node cluster with a GFS2 filesystem and drbd83. The steps above map to roughly the virsh invocations sketched below.
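A reproduction sketch using virsh; the domain name `W2K8R2DC` and the destination host `vmhost1b` are assumptions (only `vmhost1a` appears in the report).

```sh
virsh start W2K8R2DC                                       # 1. start the VM on host1
virsh migrate --live W2K8R2DC qemu+ssh://vmhost1b/system   # 2. live-migrate to host2
# 3. log on via virt-manager and reboot Windows from inside the guest;
#    the domain then powers off instead of rebooting.
```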