(Filing this by proxy for mike.hinz)

+++ This bug was initially created as a clone of Bug #499704 +++

Description of problem:
Attempting to migrate a running or stopped VM fails in all cases.

Version-Release number of selected component (if applicable):

virsh # version
Compiled against library: libvir 0.6.2
Using library: libvir 0.6.2
Using API: QEMU 0.6.2
Running hypervisor: QEMU 0.10.1

[root@vmh2 Download]# uname -a
Linux vmh2 2.6.29.1-111.fc11.x86_64 #1 SMP Fri Apr 24 10:57:09 EDT 2009 x86_64 x86_64 x86_64 GNU/Linux

How reproducible:
Always

Steps to Reproduce:

1. Connect to the local machine's hypervisor as follows and list the local VMs:

virsh # connect qemu:///system
virsh # list --all
 Id Name                 State
----------------------------------
  3 vm1                  running
  - vm2                  shut off

2. Verify connectivity to the hypervisor of the remote target system as follows:

virsh # connect qemu+tcp://vmh3/system
virsh # uri
qemu+tcp://vmh3/system
virsh # list --all
 Id Name                 State
----------------------------------
  4 vm1-vmh3             running

3. Attempt the migration as follows:

virsh # connect qemu:///system
virsh # migrate vm2 qemu+tcp://vmh3/system
error: Unknown failure

The above shows a successful connection to the local hypervisor, followed by a failed migration to the remote hypervisor, even though step 2 clearly shows that we can connect to the remote hypervisor. The same failure occurs with the tcp, ssh, and tls transports.

Actual results:
The operation fails with the following error:

virsh # migrate --live vm1 qemu+tcp://vmh3/system
error: Unknown failure

Expected results:
The VM migration should start and succeed for either stopped or running VMs.

Additional info:
This is a lab environment with all firewalls and SELinux disabled on all physical machines. Connectivity always succeeds via the tcp, ssh, and tls methods. However, migration always fails regardless of the connectivity method attempted.
[root@vmh2 CA]# rpm -q kvm python-virtinst virt-viewer virt-manager
package kvm is not installed
python-virtinst-0.400.3-7.fc11.noarch
virt-viewer-0.0.3-4.fc11.x86_64
virt-manager-0.7.0-4.fc11.x86_64
[root@vmh2 CA]#
[root@vmh2 CA]# uname -a
Linux vmh2 2.6.29.1-111.fc11.x86_64 #1 SMP Fri Apr 24 10:57:09 EDT 2009 x86_64 x86_64 x86_64 GNU/Linux

--- Additional comment from mike.hinz on 2009-05-07 14:57:11 EDT ---

Created an attachment (id=342915)
cpu info from hardware

Added as per virtualization bug reporting wiki.

--- Additional comment from mike.hinz on 2009-05-07 14:58:24 EDT ---

Created an attachment (id=342918)
lspci info

Added as per request from virtualization bug reporting wiki

--- Additional comment from mike.hinz on 2009-05-07 15:00:10 EDT ---

Created an attachment (id=342919)
virsh capabilities output

Output of virsh capabilities as per virtualization bug reporting wiki.
Created attachment 342946 [details] client.log
Created attachment 342947 [details] daemon.log
Created attachment 342948 [details] daemon-strace.log
This bug appears to have been reported against 'rawhide' during the Fedora 11 development cycle. Changing version to '11'. More information and reason for this action is here: http://fedoraproject.org/wiki/BugZappers/HouseKeeping
The 'daemon.log' file from comment #2 shows that the VM was successfully started up on the destination. The next thing it shows is libvirt requesting an abort of the migration attempt:

16:52:41.768: debug : virDomainMigrateFinish2:3043 : dconn=0x7fcc24000a30, dname=vm2, cookie=(nil), cookielen=0, uri=tcp:vmh3:49152, flags=0, retcode=-1
16:52:41.768: debug : qemudShutdownVMDaemon:1518 : Shutting down VM 'vm2'

The 'client.log' file from comment #1 seems to show the libvirt client is operating normally.

So, the only answer left is that something must have gone wrong in the source host's libvirtd daemon during migration. Thus I think we need to get a libvirt debugging log session from the source libvirtd daemon, so we can see what's happening with the virDomainMigratePerform2 method.
To hopefully move this along further, I've done some testing. Based on this afternoon's chat on IRC, below are the command I ran and the contents of the log file /var/log/libvirt/qemu/vm2clone.log.

virsh # list --all
 Id Name                 State
----------------------------------
  - vm-full-gold         shut off
  - vm1clone             shut off
  - vm2clone             shut off

virsh # migrate vm2clone qemu+tcp://192.168.50.20/system
error: Unknown failure

Then the log file on the source physical host gives this:

LC_ALL=C PATH=/sbin:/usr/sbin:/bin:/usr/bin /usr/bin/qemu-kvm -S -M pc -m 2000 -smp 1 -name vm2clone -uuid 080c4e4a-572b-ff8a-93d8-1bed40c2d093 -monitor pty -pidfile /var/run/libvirt/qemu//vm2clone.pid -boot c -drive file=,if=ide,media=cdrom,index=2 -drive file=/mnt/nfs-store/vm2clone.img,if=virtio,index=0,boot=on -net nic,macaddr=54:52:00:09:1b:b8,vlan=0 -net tap,fd=19,vlan=0 -serial pty -parallel none -usb -usbdevice tablet -vnc 127.0.0.1:0
char device redirected to /dev/pts/2
char device redirected to /dev/pts/3

I'll continue with what the logs show on the destination physical host in the next comment.
To follow the above comment, please see the following from the logs on the physical destination machine during the migration attempt:

From /var/log/libvirt/qemu/vm2clone.log:

LC_ALL=C PATH=/sbin:/usr/sbin:/bin:/usr/bin /usr/bin/qemu-kvm -S -M pc -m 2000 -smp 1 -name vm2clone -uuid 080c4e4a-572b-ff8a-93d8-1bed40c2d093 -monitor pty -pidfile /var/run/libvirt/qemu//vm2clone.pid -boot c -drive file=,if=ide,media=cdrom,index=2 -drive file=/mnt/nfs-store/vm2clone.img,if=virtio,index=0,boot=on -net nic,macaddr=54:52:00:09:1b:b8,vlan=0 -net tap,fd=22,vlan=0 -serial pty -parallel none -usb -usbdevice tablet -vnc 127.0.0.1:0 -incoming tcp:0.0.0.0:49171
char device redirected to /dev/pts/2
char device redirected to /dev/pts/3

During the IRC session danpb had me create the log file /var/log/libvirt/daemon.log by editing libvirtd.conf on the target machine and setting:

log_filters="1:qemu"
log_outputs="1:file:/var/log/libvirt/daemon.log"

The output of that log file on the destination physical host is as follows:

15:31:31.938: info : Received unexpected signal 17
15:31:31.943: info : Received unexpected signal 17
15:31:32.994: debug : qemudDomainSetMemoryBalloon:2530 : vm2clone: balloon reply: balloon 2000
15:31:32.995: debug : qemudShutdownVMDaemon:1526 : Shutting down VM 'vm2clone'
15:31:32.996: error : invalid domain pointer in no domain with matching uuid
15:31:32.996: debug : qemudDispatchClientFailure:1407 : Deregistering to relay remote events

This is with Fedora 11, fully updated to the latest. I had an earlier error relating to the sound device in the VM, but I've removed that device. Also, this VM utilizes the default NAT'd networking, but I've also tried migration using a host with a bridge setup. That also fails.

Please let me know what additional info may be needed.

Regards.
Mike
I am experiencing the same issue with an up to date F11 system. Please let me know what information I can provide.
After the latest updates to Fedora 11, I am now able to successfully migrate a running VM using virt-manager; but I cannot migrate the same VM if it is turned off.
(In reply to comment #9)
> After the latest updates to Fedora 11, I am now able to successfully migrate a
> running VM using virt-manager; but I cannot migrate the same VM if it is turned
> off.

Right, this makes sense. In general, migration is supposed to be "live"; that is, clients of the VM don't notice that it's been moved from one physical place to another. Therefore, libvirt doesn't really have the concept of migrating a "turned off" VM. All this would really do would be to copy over the XML from one host to another, since there's no memory to copy, and the disk still has to be shared. If you are doing that, then it's probably a better idea to just write a simple script to connect to both the source and destination, do "dumpxml" on the source, and then do "define" on the destination. This might also be an interesting feature request for a new virsh command, but it won't require any new APIs.

Chris Lalancette
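For anyone who wants to try the dumpxml/define approach described above, here is a minimal sketch in Python that shells out to virsh. The function names are made up for illustration, and it assumes virsh is on the PATH, that both URIs are reachable from where the script runs, and that the disk image is already visible on both hosts. It does not remove the definition from the source.

```python
import subprocess
import tempfile


def dumpxml_cmd(uri, domain):
    """Build the virsh argv that prints a domain's XML from the host at `uri`."""
    return ["virsh", "-c", uri, "dumpxml", domain]


def define_cmd(uri, xml_path):
    """Build the virsh argv that defines a domain on the host at `uri`."""
    return ["virsh", "-c", uri, "define", xml_path]


def copy_definition(src_uri, dst_uri, domain):
    """Copy the XML definition of a shut-off domain from source to destination.

    Hypothetical helper: it only copies the definition, so the storage
    must already be shared between the two hosts.
    """
    # Fetch the XML from the source hypervisor.
    result = subprocess.run(dumpxml_cmd(src_uri, domain), check=True,
                            capture_output=True, text=True)
    # virsh define reads from a file, so stage the XML in a temporary one.
    with tempfile.NamedTemporaryFile("w", suffix=".xml") as xml_file:
        xml_file.write(result.stdout)
        xml_file.flush()
        subprocess.run(define_cmd(dst_uri, xml_file.name), check=True)
```

With the hosts from the original report, the call would be copy_definition("qemu:///system", "qemu+tcp://vmh3/system", "vm2").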
(In reply to comment #7)
> This is with Fedora 11, fully updated to the latest. I had an earlier error
> relating to the sound device in the VM, but I've removed that device. Also,
> this VM utilizes the default NAT'd networking, but I've also tried migration
> using a host with a bridge setup. That also fails.
>
> Please let me know what additional info may be needed.

Mike,
I've recently tracked down a similar problem in the Fedora 12 packages. In that case, it was due to the fact that "hostname" on the destination machine didn't return something reasonable. Now, we should definitely do better in libvirt than "unknown error"; I have a patch pending to fix that. However, can you try doing the following:

1) Open up port 49152 in the firewall on the destination (if not already done)
2) On the destination host, make sure that the "hostname" command returns something reasonable (like vmh3.example.org), and that "nslookup vmh3.example.org" also resolves properly.
3) On the source, run:

# virsh migrate --live vm1 qemu+tcp://vmh3/system tcp://vmh3:49152

And then let us know the results of all of this?

Thanks,
Chris Lalancette
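The hostname check in step 2 can also be scripted. The sketch below is only a rough approximation meant to be run on the destination host: what counts as a "reasonable" hostname is my assumption (not empty, not a localhost-style name, and resolving to at least one non-loopback address), and the function names are invented for this example.

```python
import ipaddress
import socket


def is_reasonable_hostname(name):
    """Heuristic (an assumption): reject empty and localhost-style names."""
    cleaned = name.strip().lower().rstrip(".")
    if not cleaned:
        return False
    return cleaned != "localhost" and not cleaned.startswith("localhost.")


def non_loopback(addresses):
    """Keep only addresses that are not loopback (127.0.0.0/8 or ::1)."""
    kept = []
    for address in addresses:
        # Drop an IPv6 scope suffix such as "%eth0" before parsing.
        bare = address.split("%", 1)[0]
        if not ipaddress.ip_address(bare).is_loopback:
            kept.append(address)
    return kept


def resolve(name):
    """Resolve a name through the system resolver; empty list on failure."""
    try:
        infos = socket.getaddrinfo(name, None)
    except socket.gaierror:
        return []
    return sorted({info[4][0] for info in infos})


def check_this_host():
    """Return (hostname, looks_reasonable, usable_addresses) for this host."""
    name = socket.gethostname()
    return name, is_reasonable_hostname(name), non_loopback(resolve(name))
```

If check_this_host() reports an unreasonable name or an empty address list on the destination, that would be consistent with the hostname problem described above, though it does not prove it.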
Mike: any chance you can try Chris's suggestions? Thanks
Chris: Is it absolutely necessary that 'nslookup' resolves the hostname? I'm running into what appears to be the same issue covered with this bug on Debian and libvirt-0.7.0. Each virt host uses /etc/hosts to resolve the other, thus nslookup won't actually resolve since it does not use /etc/hosts.
Same problem with:

Client 1 - Fedora 12
# rpm -q libvirt qemu-kvm glusterfs-client
libvirt-0.7.1-15.fc12.x86_64
qemu-kvm-0.11.0-13.fc12.x86_64
glusterfs-client-3.0.3-1.fc11.x86_64

Client 2 - Fedora 12
# rpm -q libvirt qemu-kvm glusterfs-client
libvirt-0.7.1-15.fc12.x86_64
qemu-kvm-0.11.0-13.fc12.x86_64
glusterfs-client-3.0.3-1.fc11.x86_64

Error on migrate:

# virsh migrate --live gfstest qemu+ssh://x6270-b5.sys.intra/system
error: Unknown failure

Error in messages is:

"libvirtd: 16:50:01.437: error : qemudDomainMigratePerform:7292 : operation failed: migrate failed: info migrate#012Migration status: failed#015#012"

A manual migration (xml copy and image already visible) works perfectly.
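A side note on reading that message: the "#012" and "#015" sequences look like the three-digit octal escapes that syslog daemons such as rsyslog substitute for control characters (012 is a line feed, 015 a carriage return). Assuming that is what happened here, a small sketch like the following recovers the original text; the function name is made up.

```python
import re

# Matches "#" followed by the octal code of an ASCII control character (000-037).
CONTROL_ESCAPE = re.compile(r"#(0[0-3][0-7])")


def decode_syslog_escapes(text):
    """Turn syslog-style octal escapes such as '#012' back into characters."""
    return CONTROL_ESCAPE.sub(lambda match: chr(int(match.group(1), 8)), text)


raw = ("operation failed: migrate failed: info migrate#012"
       "Migration status: failed#015#012")
decoded = decode_syslog_escapes(raw)
```

Decoded this way, the message is the monitor command "info migrate" followed by qemu's reply "Migration status: failed", which gives no more detail than the original error.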
OK, after noticing that the libvirt shipping in F12 is dated Sept 2009, I upgraded from the F13 branch to the latest 0.7.7. The improved log messages now give me:

Source:
"error: operation failed: Migration unexpectedly failed"

Destination:
Mar 30 12:02:07 x6270-b5 libvirtd: 12:02:07.062: info : qemudDispatchServer:1369 : Turn off polkit auth for privileged client 4716
Mar 30 12:02:07 x6270-b5 libvirtd: 12:02:07.108: info : qemuSecurityDACSetOwnership:40 : Setting DAC context on '/var/lib/libvirt/images/gfstest-disk0' to '0:0'
Mar 30 12:02:07 x6270-b5 libvirtd: 12:02:07.115: info : qemudDispatchSignalEvent:390 : Received unexpected signal 17
Mar 30 12:02:07 x6270-b5 kernel: device vnet0 entered promiscuous mode
Mar 30 12:02:07 x6270-b5 kernel: virbr0: port 2(vnet0) entering learning state
Mar 30 12:02:07 x6270-b5 libvirtd: 12:02:07.119: info : qemudDispatchSignalEvent:390 : Received unexpected signal 17
Mar 30 12:02:07 x6270-b5 libvirtd: 12:02:07.148: info : udevGetDeviceProperty:116 : udev reports device 'vnet0' does not have property 'DRIVER'
Mar 30 12:02:07 x6270-b5 libvirtd: 12:02:07.148: info : udevGetDeviceProperty:116 : udev reports device 'vnet0' does not have property 'PCI_CLASS'
Mar 30 12:02:07 x6270-b5 libvirtd: 12:02:07.148: info : udevSetParent:1222 : Could not find udev parent for device with sysfs path '/sys/devices/virtual/net/vnet0'
Mar 30 12:02:07 x6270-b5 libvirtd: 12:02:07.344: info : qemuSecurityDACRestoreSecurityFileLabel:87 : Restoring DAC context on '/var/lib/libvirt/images/gfstest-disk0'
Mar 30 12:02:07 x6270-b5 libvirtd: 12:02:07.345: info : qemuSecurityDACSetOwnership:40 : Setting DAC context on '/var/lib/libvirt/images/gfstest-disk0' to '0:0'
Mar 30 12:02:07 x6270-b5 kernel: virbr0: port 2(vnet0) entering disabled state
Mar 30 12:02:07 x6270-b5 kernel: device vnet0 left promiscuous mode
Mar 30 12:02:07 x6270-b5 kernel: virbr0: port 2(vnet0) entering disabled state
Mar 30 12:02:07 x6270-b5 libvirtd: 12:02:07.417: info : udevRemoveOneDevice:1202 : Failed to find device to remove that has udev name '/sys/devices/virtual/net/vnet0'

However, there are no 'errors' in there, just some 'info' messages.
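To check that claim on a longer log without reading it by eye, the libvirtd lines can be tallied by level. This is a rough sketch that assumes the "HH:MM:SS.mmm: level : message" layout seen in the excerpts in this report; the names are invented, and lines that do not match (for example the kernel lines) are simply skipped.

```python
import re
from collections import Counter

# timestamp, level, and the rest of the message, as seen in the logs above
LIBVIRT_LINE = re.compile(r"(\d{2}:\d{2}:\d{2}\.\d{3}): (\w+) : (.*)")


def parse_line(line):
    """Return (timestamp, level, message), or None for non-libvirtd lines."""
    match = LIBVIRT_LINE.search(line)
    return match.groups() if match else None


def count_levels(lines):
    """Count how many parsed lines were logged at each level."""
    parsed = (parse_line(line) for line in lines)
    return Counter(entry[1] for entry in parsed if entry)


sample = [
    "Mar 30 12:02:07 x6270-b5 libvirtd: 12:02:07.115: info : "
    "qemudDispatchSignalEvent:390 : Received unexpected signal 17",
    "Mar 30 12:02:07 x6270-b5 kernel: device vnet0 entered promiscuous mode",
    "15:31:32.996: error : invalid domain pointer in no domain with matching uuid",
]
levels = count_levels(sample)
```

Feeding it the lines of /var/log/messages or daemon.log and looking at the count for "error" would show quickly whether anything above 'info' was logged.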
This message is a reminder that Fedora 11 is nearing its end of life. Approximately 30 (thirty) days from now Fedora will stop maintaining and issuing updates for Fedora 11. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as WONTFIX if it remains open with a Fedora 'version' of '11'. Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version prior to Fedora 11's end of life. Bug Reporter: Thank you for reporting this issue and we are sorry that we may not be able to fix it before Fedora 11 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora please change the 'version' of this bug to the applicable version. If you are unable to change the version, please add a comment here and someone will do it for you. Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete. The process we are following is described here: http://fedoraproject.org/wiki/BugZappers/HouseKeeping
I'm seeing this in F12/F13 so perhaps a version change is needed.
(In reply to comment #17)
> I'm seeing this in F12/F13 so perhaps a version change is needed.

Hm, are you still seeing the "Unknown failure", even with F-13? Our error reporting should be much improved in F-13 libvirt, so I would expect a different error if you are still having problems.

Chris Lalancette
F-11 is pretty old at this point, and the migration error reporting fixes are non-trivial, so aren't safe to backport. Moving this bug to F12. I will be building a new F12 libvirt package in a few days which should improve error reporting here.
*** Bug 540715 has been marked as a duplicate of this bug. ***
FYI, I've filed a qemu bug about improved migration error reporting; even when the 'unknown error' issue is fixed, qemu doesn't give us much more info. https://bugzilla.redhat.com/show_bug.cgi?id=596506
*** Bug 562017 has been marked as a duplicate of this bug. ***
*** Bug 582111 has been marked as a duplicate of this bug. ***
libvirt-0.7.1-18.fc12 has been submitted as an update for Fedora 12. http://admin.fedoraproject.org/updates/libvirt-0.7.1-18.fc12
libvirt-0.7.1-18.fc12 has been pushed to the Fedora 12 testing repository. If problems still persist, please make note of it in this bug report. If you want to test the update, you can install it with su -c 'yum --enablerepo=updates-testing update libvirt'. You can provide feedback for this update here: http://admin.fedoraproject.org/updates/libvirt-0.7.1-18.fc12
I've also updated the libvirt FAQ with info about common migration errors such as this 'Unknown failure': http://wiki.libvirt.org/page/FAQ
libvirt-0.7.1-18.fc12 has been pushed to the Fedora 12 stable repository. If problems still persist, please make note of it in this bug report.
I'm having the exact same issues on RHEL 5.5. Looking into anything that would hurt DNS resolution on the server's hostname, I found that the hostname was typo'ed in /etc/hosts. Fixing /etc/hosts resolved this problem for me on RHEL 5.5.
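For others hitting this, a quick way to spot that kind of typo is to check whether the name reported by "hostname" actually appears in /etc/hosts. The sketch below parses text in /etc/hosts format; the function names and the sample data (including the deliberately misspelled entry) are made up for illustration.

```python
import socket


def hosts_entries(hosts_text):
    """Map each name found in /etc/hosts-style text to its addresses."""
    entries = {}
    for line in hosts_text.splitlines():
        # Strip comments and surrounding whitespace, skip blank lines.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        address, *names = line.split()
        for name in names:
            entries.setdefault(name.lower(), []).append(address)
    return entries


def hostname_in_hosts(hostname, hosts_text):
    """True if `hostname` is listed as a name or alias in the hosts text."""
    return hostname.lower() in hosts_entries(hosts_text)


def check_local_hosts_file(path="/etc/hosts"):
    """Check this machine's own hostname against its hosts file."""
    with open(path) as hosts_file:
        return hostname_in_hosts(socket.gethostname(), hosts_file.read())


# Hypothetical file contents with a misspelled fully qualified name.
sample = """
127.0.0.1      localhost
192.168.50.20  vmh3.exmaple.org vmh3   # note the typo
"""
```

A False result from check_local_hosts_file() is not necessarily a problem when DNS resolves the name, but on hosts that rely on /etc/hosts it points at the kind of mistake described above.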