Bug 540715

Summary: unable to migrate guest: Unknown failure
Product: [Fedora] Fedora
Reporter: Gary Scarborough <gscarborough>
Component: libvirt
Assignee: Daniel Veillard <veillard>
Status: CLOSED DUPLICATE
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: medium
Docs Contact:
Priority: low
Version: 12
CC: berrange, clalance, crobinso, drsmooth, hbrock, itamar, jforbes, veillard, virt-maint
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2010-05-26 20:35:07 UTC

Description Gary Scarborough 2009-11-24 00:13:19 UTC
Description of problem:

I get an error "Unable to migrate guest" when trying to migrate a Windows XP 32 bit virtual machine from one machine to another.  Guest then kernel panics instead of failing gracefully.


Version-Release number of selected component (if applicable):

virt-manager-0.8.0-8.fc12.noarch
python-virtinst-0.500.0-5.fc12.noarch
libvirt-0.7.4-1.fc12.x86_64
libvirt-python-0.7.4-1.fc12.x86_64
virt-viewer-0.2.0-1.fc12.x86_64

How reproducible:

Always.

Steps to Reproduce:
1.  Create Windows XP guest
2.  Connect virt-manager to a second host over ssh
3.  Migrate the virtual machine to that second connection.
  
Actual results:

Failure to migrate and crashed guest OS.

Expected results:

Successful migration.

Additional info:

Error message displayed by virt-manager:

Unable to migrate guest:
 Traceback (most recent call last):
  File "/usr/share/virt-manager/virtManager/engine.py", line 657, in _async_migrate
    vm.migrate(dstconn)
  File "/usr/share/virt-manager/virtManager/domain.py", line 1555, in migrate
    self.vm.migrate(destconn.vmm, flags, None, None, 0)
  File "/usr/lib64/python2.6/site-packages/libvirt.py", line 428, in migrate
    if ret is None:raise libvirtError('virDomainMigrate() failed', dom=self)
libvirtError: Unknown failure

Comment 1 Chris Lalancette 2009-11-24 07:44:11 UTC
(In reply to comment #0)
> Description of problem:
> 
> I get an error "Unable to migrate guest" when trying to migrate a Windows XP 32
> bit virtual machine from one machine to another.  Guest then kernel panics
> instead of failing gracefully.

Sigh.  Can you attach the output of "cat /etc/hosts" here?  There's a bug in the anaconda shipped with F-12 that has caused this to happen, so I want to check if you are running into the same problem.

Chris Lalancette

Comment 2 Gary Scarborough 2009-11-24 21:30:22 UTC
Here is the output of my /etc/hosts file:

127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6

Comment 3 Cole Robinson 2009-12-01 23:43:53 UTC
Reassigning to libvirt, since my guess is this isn't virt-manager specific. Gary, can you reproduce using virsh?

There are known error reporting issues WRT migration: it's possible that a variety of legitimate errors are being lost before being returned to the user. You can possibly see some interesting info by manually running 'libvirtd' on both the source and destination hosts, and watching for any error output on the console.

Comment 4 Gary Scarborough 2009-12-03 03:10:26 UTC
I started a virsh console on both machines, and started the guest Winxp machine on one.  Wasn't sure of the syntax of the migrate command so I may have done this wrong:

migrate --live wintest qemu+ssh://10.100.100.97/system

I received the RSA key prompt, entered the password, and got this back:

error: Unknown failure

After that, the Winxp guest blue screened.  If I have the command wrong, please post what I should have typed.

Comment 5 Christopher Hunt 2010-02-24 00:10:50 UTC
I may be running into the same issue.  I've attempted a live migration between two F12 hosts, once using virt-manager and once using virsh (virsh migrate --live example2 qemu+ssh://virt02/system).  As with Mr. Scarborough, the virsh command got as far as accepting my password, and the destination host does log the vnic coming up on br0, then going back down:

Feb 23 15:59:30 virt02 kernel: device vnic.example2 entered promiscuous mode
Feb 23 15:59:30 virt02 kernel: br0: port 7(vnic.example2) entering learning state
Feb 23 15:59:30 virt02 kernel: br0: port 7(vnic.example2) entering disabled state
Feb 23 15:59:30 virt02 kernel: device vnic.example2 left promiscuous mode
Feb 23 15:59:30 virt02 kernel: br0: port 7(vnic.example2) entering disabled state

I didn't get much info from the virsh command, but virt-manager gave me:

Unable to migrate guest:
Traceback (most recent call last):
 File "/usr/share/virt-manager/virtManager/migrate.py", line 453, in _async_migrate
   vm.migrate(dstconn, migrate_uri, rate, live, secure)
 File "/usr/share/virt-manager/virtManager/domain.py", line 230, in migrate
   self._backend.migrate(destconn.vmm, flags, newname, interface, rate)
 File "/usr/lib64/python2.6/site-packages/libvirt.py", line 384, in migrate
   if ret is None:raise libvirtError('virDomainMigrate() failed', dom=self)
libvirtError: Unknown failure

Here's what my hosts file looks like:
[root@virt05 ~]# cat /etc/hosts
# hostname virt05 added to /etc/hosts by anaconda
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4 virt05
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6 virt05

x.x.x.201 virt01 virt01.example.com
x.x.x.202 virt02 virt02.example.com
x.x.x.224 virt05 virt05.example.com

-Chris

Comment 6 Chris Lalancette 2010-02-24 14:01:49 UTC
(In reply to comment #5)
> Here's what my hosts file looks like:
> [root@virt05 ~]# cat /etc/hosts
> # hostname virt05 added to /etc/hosts by anaconda
> 127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
> virt05
> ::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
> virt05

Yeah, the above two lines in your /etc/hosts are what's causing the problem.  The good news is that Fedora 13 isn't silly, and doesn't resolve your hostname to localhost.  The other good news is that we now have a patch in libvirt 0.7.7 that gives you a bit more information than "Unknown failure".  The bad news is that the installer in Fedora 12 *does* put those entries in place, and will forevermore.  I guess the best solution at present is to just remove the "virt05" entry at the end of those two lines; that should fix up the issue for you.

Chris Lalancette
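
For anyone wanting to check their own hosts for the condition described in this comment, here is a minimal sketch (not part of libvirt or any tool mentioned in this bug). It scans /etc/hosts-style text and reports the loopback addresses that a given hostname is mapped to. The function name loopback_aliases and the SAMPLE text are made up for illustration; SAMPLE mirrors the file pasted in comment 5. It only covers the /etc/hosts cause and says nothing about other reasons a migration might fail.

```python
import ipaddress

def loopback_aliases(hosts_text, hostname):
    """Return the loopback addresses that hosts_text maps hostname to.

    hosts_text is the content of an /etc/hosts-style file; hostname is
    compared case-insensitively against every name on each line.
    """
    wanted = hostname.lower()
    hits = []
    for raw in hosts_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        fields = line.split()
        addr, names = fields[0], [n.lower() for n in fields[1:]]
        try:
            ip = ipaddress.ip_address(addr)
        except ValueError:
            continue  # not a parsable address (e.g. a masked "x.x.x.224")
        if ip.is_loopback and wanted in names:
            hits.append(addr)
    return hits

# Hypothetical sample, modelled on the hosts file shown in comment 5.
SAMPLE = """\
# hostname virt05 added to /etc/hosts by anaconda
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4 virt05
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6 virt05
x.x.x.224 virt05 virt05.example.com
"""

print(loopback_aliases(SAMPLE, "virt05"))  # prints ['127.0.0.1', '::1']
```

An empty list means the hostname is not aliased to a loopback address in that text. To check a live system, pass the contents of /etc/hosts and the output of socket.gethostname() instead of the sample.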

Comment 7 Christopher Hunt 2010-02-24 21:07:39 UTC
Mr. Lalancette,
Thanks for the pointer.  I did remove the 'virt05' bit from each line in the hosts file, but alas that had no effect. 

Interestingly, I also checked the hosts file of the target host, virt02, and it was missing the last bit. Perhaps it had already been modified by another sysadmin...



[root@virt05 ~]# uname -a
Linux virt05 2.6.31.9-174.fc12.x86_64 #1 SMP Mon Dec 21 05:33:33 UTC 2009 x86_64 x86_64 x86_64 GNU/Linux
[root@virt05 ~]# cat /etc/issue
Fedora release 12 (Constantine)
Kernel \r on an \m (\l)

[root@virt02 ~]# uname -a
Linux virt02.reachone.com 2.6.31.5-127.fc12.x86_64 #1 SMP Sat Nov 7 21:11:14 EST 2009 x86_64 x86_64 x86_64 GNU/Linux
[root@virt02 ~]# cat /etc/issue
Fedora release 12 (Constantine)
Kernel \r on an \m (\l) 

  Any other ideas?

Comment 8 Cole Robinson 2010-02-24 21:59:32 UTC
Does turning off the firewall on both machines make any difference?

Comment 9 Christopher Hunt 2010-02-24 22:05:12 UTC
Unfortunately, I'm not at liberty to do that :-)

Comment 10 Cole Robinson 2010-02-24 23:38:14 UTC
Not even for testing purposes? Have you opened the necessary ports in your firewall to allow for migration? With qemu libvirt, this is the port range 49152-49216.

Trying this with virt-manager might help: it provides a nicer interface to the multitude of options, so you can mix and match to see if something gets the job done.
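
As a rough way to test the firewall theory without disabling the firewall, the sketch below tries a TCP connection from the source host to a port in the range quoted above and classifies the result. It is an illustration only, not a libvirt feature; the name probe and the three labels are made up here. The interpretation rests on assumptions: a refused connection usually means the packet reached the host and nothing is listening (expected when no migration is in progress), while a timeout or an "unreachable" error suggests a firewall is dropping or rejecting the traffic. Firewalls configured to reject with a TCP reset would look "closed" instead, so treat the result as a hint.

```python
import errno
import socket

# Port range quoted in this thread for qemu migration under libvirt.
MIGRATION_PORTS = range(49152, 49216 + 1)

def probe(host, port, timeout=2.0):
    """Classify one TCP connection attempt to host:port.

    Returns "open" (something accepted the connection), "closed"
    (connection refused) or "filtered" (timed out or unreachable).
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        return "closed"
    except socket.timeout:
        return "filtered"
    except OSError as exc:
        # ICMP "unreachable/prohibited" replies typically surface here.
        if exc.errno in (errno.EHOSTUNREACH, errno.ENETUNREACH):
            return "filtered"
        raise  # anything else (e.g. name resolution) is left to the caller
```

For example, running probe("virt02", 49152) from the source host (virt02 being the destination named in comment 5) and getting "filtered" would point at the firewall, whereas "closed" would suggest the port is reachable.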

Comment 11 Christopher Hunt 2010-02-25 19:35:42 UTC
Cole, 
After thinking about it for 2 seconds I realize that of course I can do that.  It worked like a charm!  Thanks for the input.  The "live" migration did take about 2 mins until I could ssh to the guest from outside, but that might be an ARP issue; frankly, I'm not clear on that.  I'll do some more research.

So what's the fix here?  Update the default iptables rules?

Comment 12 Cole Robinson 2010-02-26 16:10:59 UTC
There's a few things we should fix:

1) Make 'Unknown failure' into an actually useful error (done upstream)
2) Somehow show an explicit error about 'Migration connection to port 1234 refused'. This may need to be done at the qemu level.
3) Document the qemu migration port range somewhere obvious: maybe a libvirt wiki page about migration and how to debug it if it goes wrong.

We shouldn't just open those ports by default, this should really be opt in, so I think just better error messages and documentation is the best way to go.

Comment 13 Chris Lalancette 2010-02-26 16:49:44 UTC
(In reply to comment #12)
> There's a few things we should fix:
> 
> 1) Make 'Unknown failure' into an actually useful error (done upstream)

Yes.

> 2) Somehow show an explicit error about 'Migration connection to port 1234
> refused'. This may need to be done at the qemu level.

Yes, for the "simple" migration case, this does need to be done at the qemu level, since qemu is the one making the connection.  For the tunnelled migration that libvirt can also do, it *should* report correct errors already.

> 3) Document the qemu migration port range somewhere obvious: maybe a libvirt
> wiki page about migration and how to debug it if it goes wrong.
> 
> We shouldn't just open those ports by default, this should really be opt in, so
> I think just better error messages and documentation is the best way to go.    

Yeah, agreed.

Chris Lalancette

Comment 14 Cole Robinson 2010-05-26 20:35:07 UTC
There are quite a few bugs tracking this problem, duping this one to 499750. I'll be pushing an f12 build which fixes the libvirt error reporting, as well as adding an entry to the libvirt FAQ to document common migration issues and error messages.

*** This bug has been marked as a duplicate of bug 499750 ***