Bug 615941

Summary: Migrate fail with error 'An undefined error has ocurred'.
Product: Red Hat Enterprise Linux 6
Reporter: wangyimiao <yimwang>
Component: libvirt
Assignee: Osier Yang <jyang>
Status: CLOSED DUPLICATE
QA Contact: Virtualization Bugs <virt-bugs>
Severity: high
Priority: high
Version: 6.0
CC: berrange, clalance, dallan, ddumas, dyuan, eblake, ivars.strazdins, jdenemar, llim, veillard, weizhan, xen-maint, yoyzhang, zhpeng
Target Milestone: rc
Keywords: TestOnly
Target Release: ---
Hardware: x86_64
OS: Linux
Doc Type: Bug Fix
: 647090 696948 698141 (view as bug list) Environment:
Last Closed: 2011-09-22 01:40:44 UTC
Bug Depends On: 584077, 670727, 698496    
Bug Blocks: 647090, 696948, 698141    

Description wangyimiao 2010-07-19 10:25:54 UTC
Description of problem:

Migration fails with the error 'An undefined error has ocurred'.

Version-Release number of selected component (if applicable):
-libvirt-0.8.1-15.el6.x86_64
-qemu-img-0.12.1.2-2.96.el6.x86_64
-qemu-kvm-0.12.1.2-2.96.el6.x86_64
-kernel-2.6.32-50.el6.x86_64

How reproducible:
5/5

Steps to Reproduce:
1. # iptables -F

2. # setenforce 0

3. # virsh migrate --live vm1 qemu+ssh://10.66.93.211/system
root@10.66.93.211's password: 
error: internal error unable to execute QEMU command 'migrate': An undefined error has ocurred

4. # virsh -c qemu+ssh://10.66.93.211/system
root@10.66.93.211's password: 
Welcome to virsh, the virtualization interactive terminal.

Type:  'help' for help with commands
       'quit' to quit

virsh # 

5. # tail -f /var/log/messages
Jul 19 17:16:11 dhcp-93-197 dnsmasq-dhcp[10307]: DHCPDISCOVER(virbr0) 10.66.70.93 52:54:00:c4:d7:00 
Jul 19 17:16:11 dhcp-93-197 dnsmasq-dhcp[10307]: DHCPOFFER(virbr0) 192.168.122.166 52:54:00:c4:d7:00 
Jul 19 17:16:11 dhcp-93-197 dnsmasq-dhcp[10307]: DHCPREQUEST(virbr0) 192.168.122.166 52:54:00:c4:d7:00 
Jul 19 17:16:11 dhcp-93-197 dnsmasq-dhcp[10307]: DHCPACK(virbr0) 192.168.122.166 52:54:00:c4:d7:00 
Jul 19 17:16:11 dhcp-93-197 libvirtd: 17:16:11.817: error : qemuMonitorJSONCheckError:316 : internal error unable to execute QEMU command 'migrate': An undefined error has ocurred
Jul 19 17:16:34 dhcp-93-197 avahi-daemon[9804]: Invalid legacy unicast query packet.
Jul 19 17:16:35 dhcp-93-197 avahi-daemon[9804]: Invalid legacy unicast query packet.
Jul 19 17:16:37 dhcp-93-197 avahi-daemon[9804]: Invalid legacy unicast query packet.
Jul 19 17:16:41 dhcp-93-197 avahi-daemon[9804]: Invalid legacy unicast query packet.
Jul 19 17:16:49 dhcp-93-197 avahi-daemon[9804]: Invalid legacy unicast query packet.
Jul 19 17:17:05 dhcp-93-197 avahi-daemon[9804]: Invalid legacy unicast query packet.
Jul 19 17:17:09 dhcp-93-197 libvirtd: 17:17:09.273: error : qemuMonitorJSONCheckError:316 : internal error unable to execute QEMU command 'migrate': An undefined error has ocurred

Actual results:
Migration fails with the error 'internal error unable to execute QEMU command 'migrate': An undefined error has ocurred'.


Expected results:
Migration should succeed.

Additional info:
I will keep the machine 10.66.93.211 running.

Using virt-manager to migrate:
 1. Right-click 'vm1' and select 'Migrate'.
 2. Click 'Advanced options' and check 'Address'.
 3. Enter the valid address "10.66.93.211".
Actual results:
 Migration succeeds.

Comment 2 RHEL Program Management 2010-07-19 10:57:40 UTC
This issue has been proposed when we are only considering blocker
issues in the current Red Hat Enterprise Linux release. It has
been denied for the current Red Hat Enterprise Linux release.

** If you would still like this issue considered for the current
release, ask your support representative to file as a blocker on
your behalf. Otherwise ask that it be considered for the next
Red Hat Enterprise Linux release. **

Comment 3 Daniel Veillard 2010-07-23 15:32:09 UTC
I logged on to 10.66.93.211 and could see the errors in /var/log/messages:

Jul 19 08:34:28 dhcp-93-211 libvirtd: 08:34:28.016: error : qemuMonitorJSONCheckError:316 : internal error unable to execute QEMU command 'migrate': An undefined error has ocurred

There was nothing suspicious in /var/log/libvirt/qemu/vm1.log, it's launched
with:

LC_ALL=C PATH=/sbin:/usr/sbin:/bin:/usr/bin QEMU_AUDIO_DRV=none /usr/libexec/qemu-kvm -S -M rhel6.0.0 -enable-kvm -m 1024 -smp 1,sockets=1,cores=1,threads=1 -name vm1 -uuid 99df8f8c-ee31-3848-d220-80a3147b444f -nodefconfig -nodefaults -chardev socket,id=monitor,path=/var/lib/libvirt/qemu/vm1.monitor,server,nowait -mon chardev=monitor,mode=control -rtc base=utc -boot c -drive file=/mnt/wang/yimwang/RHEL-Server-6-64-virtio.qcow2,if=none,id=drive-ide0-0-0,boot=on,format=qcow2 -device ide-drive,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0 -netdev tap,fd=24,id=hostnet0 -device rtl8139,netdev=hostnet0,id=net0,mac=52:54:00:c9:67:21,bus=pci.0,addr=0x3 -chardev pty,id=serial0 -device isa-serial,chardev=serial0 -usb -vnc 127.0.0.1:0 -k en-us -vga cirrus -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x4

that looks normal to me.

But based on the report, the source machine (where vm1 was running) is
10.66.93.197, and that machine is not available to check, so I could not try to
reproduce this.

Daniel

Comment 4 Daniel Veillard 2010-07-23 15:50:31 UTC
Do you still have the state of 10.66.93.197? Or can you reproduce this from
another source machine and give us access to that machine? The
information is likely to be saved in the source logs.

Daniel

Comment 5 wangyimiao 2010-07-26 02:36:15 UTC
Hi 'DV',

 I ran the same operations on 10.66.93.211 (defined a VM named 'vm1' and migrated it to 10.66.93.211), so I only kept machine 10.66.93.211 running.
 From now on I will keep both 10.66.93.197 and 10.66.93.211 available. 10.66.93.197 is not mine; if it becomes unavailable, please tell me via BZ or IRC.

Comment 6 wangyimiao 2010-07-26 02:42:52 UTC
Hi 'DV',

 I ran the same operations on 10.66.93.211 (defined a VM named 'vm1' and migrated it to 10.66.93.211), so I only kept machine 10.66.93.211 running.
 From now on I will keep both 10.66.93.197 and 10.66.93.211 available. 10.66.93.197 is not mine; if it becomes unavailable, please tell me via BZ or IRC.
  

     Thanks!

Comment 8 Justin Clift 2010-07-28 16:07:47 UTC
Tried to replicate the problem here using the same kernel, qemu, and libvirt versions, but no luck. :(

Yimiao, can you please rebuild these hosts (and make sure the problem still occurs), so I can access them remotely?

Comment 9 Justin Clift 2010-07-29 09:10:07 UTC
dyuan was kind enough to give access to the hosts, and the problem still remains.

I've not been able to get the migration to work successfully using virt-manager though.  What settings did you use for that?

Comment 10 dyuan 2010-07-30 02:51:36 UTC
As noted in the 'Additional info' of the original description, you should select 'New host' and also fill in the same IP address under 'Advanced options' -> 'Address'; the migration then works.
If 'Advanced options' -> 'Address' is not filled in, the error appears after clicking 'Migrate'.
I'm not sure what the advanced settings do additionally, because I can't find any difference in virt-manager.log or '/var/log/libvirt/qemu/guest.log'.

Comment 12 Justin Clift 2010-08-03 11:08:29 UTC
The cause of this bug is proving painful to track down.

I have been debugging this for several days, without yet establishing the clear reason for the failure.

Qemu is reporting an unknown failure when the migration is being performed with virsh.

However, the migration succeeds when virt-manager (on one host) is used following dyuan's steps. (tested a few minutes ago)

Comment 13 Justin Clift 2010-08-03 11:20:52 UTC
The bug itself can be reproduced when attempting to migrate from 10.66.92.154 (source host) to 10.66.93.205 (destination host), using the steps Yimiao has given in this BZ.

The migration DOES work when using virt-manager, following the steps Yimiao gave in the "Additional Info" part of this BZ (and repeated by Dyuan in response to my question).

Additionally, the migration DOES succeed when using virsh in the opposite direction.

I'm still investigating.  The cause is proving time consuming and tricky to find out though. :( (so far)

Comment 14 Justin Clift 2010-08-03 11:28:22 UTC
weizhang, that's a different bug.  Please file a BZ (if the bug isn't already reported). :)

Comment 15 Justin Clift 2010-08-03 18:42:35 UTC
The cause of this problem has turned out to be name resolution (thanks DV).

Adding an entry for the destination host to /etc/hosts, on the source server, allows the migration to work.

  10.66.93.205    dhcp-93-205.nay.redhat.com

I'll discuss with the rest of the libvirt team if this is something we should write a check for, so people aren't caught by this in future.
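
A quick way to confirm this kind of failure before migrating is to check, on the source host, whether the destination's hostname resolves at all. The following is only a minimal sketch: `check_resolves` is a hypothetical helper (not part of libvirt or virsh), and the `resolver` parameter exists purely so the lookup can be replaced in tests.

```python
import socket


def check_resolves(hostname, resolver=socket.getaddrinfo):
    """Return the sorted addresses that hostname resolves to, or [] on failure.

    Hypothetical helper for illustration; not a libvirt API.
    """
    try:
        results = resolver(hostname, None)
    except socket.gaierror:
        # Lookup failed: the name is in neither DNS nor /etc/hosts.
        return []
    # Each result is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the address string for both IPv4 and IPv6.
    return sorted({info[4][0] for info in results})
```

Run on the source host, `check_resolves("dhcp-93-205.nay.redhat.com")` would be expected to return an empty list before the /etc/hosts entry is added, and a list containing 10.66.93.205 afterwards.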

Comment 16 Chris Lalancette 2010-08-03 18:47:46 UTC
Huh, very odd.  I thought we should have caught that with our FQDN check in qemudDomainMigratePrepare2:

        /* Get hostname */
        if ((hostname = virGetHostname(NULL)) == NULL)
            goto cleanup;

        if (STRPREFIX(hostname, "localhost")) {
            qemuReportError(VIR_ERR_INTERNAL_ERROR, "%s",
                            _("hostname on destination resolved to localhost, but migration requires an FQDN"));
            goto cleanup;
        }

As you suggest, it's probably worth seeing what the output from virGetHostname() was, and to add a check to reject whatever bogus value it is reporting.

Chris Lalancette
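
For illustration, a stricter hostname check along the lines discussed above could look like the sketch below. This is an assumption about what such a check might reject, written in Python rather than libvirt's C, and it is not what the upstream patches actually do; `hostname_usable_for_migration` is a hypothetical name.

```python
def hostname_usable_for_migration(hostname):
    """Return True if hostname looks like an FQDN a remote host could resolve.

    Hypothetical check for illustration only; not libvirt code.
    """
    if not hostname:
        return False
    # Same rule as the existing check quoted above: reject "localhost*".
    if hostname.startswith("localhost"):
        return False
    # Assumption: a name without an inner dot is not fully qualified.
    return "." in hostname.strip(".")
```

Note that a syntactic check like this cannot tell whether the source host can actually resolve the name, which is the failure seen in this bug; it only filters out obviously unusable values.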

Comment 18 Dave Allan 2010-08-09 18:57:31 UTC
Since the problem here is really that it's difficult to troubleshoot migration when name resolution is broken, I'm going to move this BZ to 6.1.

Comment 19 Justin Clift 2010-08-10 10:49:18 UTC
Initial patches submitted upstream to solve this problem:

  http://www.redhat.com/archives/libvir-list/2010-August/msg00138.html
  http://www.redhat.com/archives/libvir-list/2010-August/msg00139.html

Comment 21 Osier Yang 2011-01-19 05:28:43 UTC
To get the destination QEMU URI, we execute hostname() on the destination host to get its hostname, and then use it to form the QEMU URI (e.g. tcp://10.66.70.83).

So, if the destination host is misconfigured, it will give us an invalid
address that the source host cannot actually reach, and the migration will fail.

But we can't do much more on the libvirt side (guessing the hostname on the destination side would make things more complex and confusing), because qemu discards the errors for migrate (actually all of the errors related to getaddrinfo(3)/connect(2) failing?). If qemu were able to report an error, we could then provide a better diagnostic
log.

So I created a dependent bug against qemu-kvm:

https://bugzilla.redhat.com/show_bug.cgi?id=670727
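
Since the failure comes from the hostname-derived URI, one way to sidestep it is to pass an explicit migration URI based on the destination's IP address, which is presumably what the 'Address' field in virt-manager does. The sketch below only builds the command line and does not run it. `build_migrate_command` is a hypothetical helper, and the `--migrateuri` spelling follows the current virsh man page; older virsh builds may expect the migrate URI as a positional argument after the destination URI.

```python
def build_migrate_command(domain, dest_addr):
    """Build a virsh argv that migrates domain with an explicit migrate URI.

    Hypothetical helper for illustration; it only constructs the arguments.
    """
    dest_uri = f"qemu+ssh://{dest_addr}/system"  # libvirt connection URI
    migrate_uri = f"tcp://{dest_addr}"           # address QEMU should connect to
    return ["virsh", "migrate", "--live", domain, dest_uri,
            "--migrateuri", migrate_uri]


if __name__ == "__main__":
    print(" ".join(build_migrate_command("vm1", "10.66.93.211")))
```

With an IP address in the migrate URI, the source host no longer depends on resolving whatever hostname the destination reports for itself.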

Comment 23 Osier Yang 2011-03-16 07:24:46 UTC
*** Bug 680162 has been marked as a duplicate of this bug. ***

Comment 28 Dave Allan 2011-06-10 02:28:06 UTC
*** Bug 681109 has been marked as a duplicate of this bug. ***

Comment 29 Dave Allan 2011-06-21 01:53:24 UTC
*** Bug 618562 has been marked as a duplicate of this bug. ***

Comment 31 weizhang 2011-07-21 03:44:02 UTC
Tested on:
kernel-2.6.32-166.el6.x86_64
qemu-kvm-0.12.1.2-2.169.el6.x86_64
libvirt-0.9.3-7.el6.x86_64

Without adding the hostname and IP to /etc/hosts, it
still reports:
error: internal error unable to execute QEMU command 'migrate': An undefined error has ocurred

Comment 34 zhpeng 2011-08-09 10:29:47 UTC
Tested on:
virt-manager-0.9.0-5.el6.x86_64
libvirt-0.9.4-1.el6.x86_64
qemu-kvm-0.12.1.2-2.175.el6.x86_64
kernel-2.6.32-175.el6.x86_64

Without adding the hostname and IP to /etc/hosts,

it reports:
Unable to migrate guest: out of memory

Comment 35 weizhang 2011-08-09 10:57:45 UTC
(In reply to comment #34)
> test on
> virt-manager-0.9.0-5.el6.x86_64
> libvirt-0.9.4-1.el6.x86_64
> qemu-kvm-0.12.1.2-2.175.el6.x86_64
> kernel-2.6.32-175.el6.x86_64
> 
> without adding hostname and ip on /etc/hosts
> 
> report:
> Unable to migrate guest: out of memory

The difference from this bug is that the hostname here has no DNS domain name. Do we need to file a new bug?

Comment 36 Osier Yang 2011-08-09 11:06:57 UTC
It's unlikely that the test box is really out of memory, but please confirm that first. If it really is out of memory, that's not a bug; otherwise please file a new
bug with the debug log. Thanks.

Comment 37 Jiri Denemark 2011-08-10 07:56:44 UTC
(In reply to comment #35)
> (In reply to comment #34)
> > test on
> > virt-manager-0.9.0-5.el6.x86_64
> > libvirt-0.9.4-1.el6.x86_64
> > qemu-kvm-0.12.1.2-2.175.el6.x86_64
> > kernel-2.6.32-175.el6.x86_64
> > 
> > without adding hostname and ip on /etc/hosts
> > 
> > report:
> > Unable to migrate guest: out of memory
> 
> The difference from this bug is that the hostname here without dnsdomainname.
> Do we need to file a new bug?

You are most likely hitting a bug fixed by upstream commit 63e4af45f274adf1821498970dfa3902caf1bc8c. But we don't (AFAIK) have a BZ for that yet. Please, file a new BZ with the steps needed for reproducing the bug.

Comment 38 Osier Yang 2011-09-22 01:40:44 UTC
This was fixed when we changed to using the qemu fd: protocol for migration; see BZ https://bugzilla.redhat.com/show_bug.cgi?id=720269. I am therefore closing this as a DUPLICATE
of 720269.
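
The reason the fd: approach helps with diagnostics can be sketched as follows: when the management layer opens the connection itself and hands the descriptor to QEMU, any resolution or connection failure happens where it can be reported precisely. This is a simplified Python illustration under that assumption, not libvirt's actual implementation; `open_migration_connection` is a hypothetical name, and the step of passing the descriptor to QEMU is omitted.

```python
import socket


def open_migration_connection(host, port, timeout=5.0):
    """Connect to the destination and return the socket, with specific errors.

    Hypothetical sketch; the returned socket's descriptor would then be
    handed to QEMU via its fd: migration protocol (not shown here).
    """
    try:
        return socket.create_connection((host, port), timeout=timeout)
    except socket.gaierror as exc:
        # Name resolution failed (the root cause in this bug).
        raise RuntimeError(f"cannot resolve migration host '{host}': {exc}") from exc
    except OSError as exc:
        # Resolution worked, but the TCP connection could not be established.
        raise RuntimeError(f"cannot connect to {host}:{port}: {exc}") from exc
```

Compared with 'An undefined error has ocurred', an error that names the unresolvable host points directly at the name-resolution problem found in comment 15.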

*** This bug has been marked as a duplicate of bug 720269 ***