Bug 844378
Field | Value
---|---
Summary | error messages may not be propagated properly during failed migration
Product | Red Hat Enterprise Linux 6
Reporter | Haim <hateya>
Component | libvirt
Assignee | Jiri Denemark <jdenemar>
Status | CLOSED ERRATA
QA Contact | Virtualization Bugs <virt-bugs>
Severity | high
Docs Contact |
Priority | unspecified
Version | 6.3
CC | cwei, dyuan, iheim, jdenemar, jkt, mzhan, rbalakri, weizhan, yeylon, zpeng
Target Milestone | rc
Keywords | Upstream
Target Release | ---
Hardware | x86_64
OS | Linux
Whiteboard |
Fixed In Version | libvirt-0.10.2-32.el6
Doc Type | Bug Fix
Doc Text |
Story Points | ---
Clone Of |
Environment |
Last Closed | 2014-10-14 04:14:03 UTC
Type | Bug
Regression | ---
Mount Type | ---
Documentation | ---
CRM |
Verified Versions |
Category | ---
oVirt Team | ---
RHEL 7.3 requirements from Atomic Host |
Cloudforms Team | ---
Target Upstream Version |
Embargoed |
Attachments | see the "Created attachment" entries in the description below
Description
Haim
2012-07-30 12:51:17 UTC
Created attachment 601259 [details]
libvirt logs
I've already been looking at this... is it only for unplug, or also for a VM started without disks to begin with?

(In reply to comment #5)
> is it only for unplug, or also for a VM started without disks to begin with?

It appears to be related to hot-plug: I started a VM without disks, booted it from the network, and managed to migrate it.

Several migration failures can be found in the logs. Most of them were caused by bug 807023, i.e., the disk was in fact not unplugged but libvirt didn't know about it. Thus libvirt removed the disk from the domain XML and started qemu on the destination without that disk, while the source qemu still saw the disk attached and sent its status to the destination. Once qemu on the destination host saw that, it immediately aborted with the error message "Unknown savevm section or instance '0000:00:05.0/virtio-blk' 0". Libvirtd on the source detected this during the Finish phase and reported "Domain not found: no domain with matching name 'vm'".

In the last failure, however, qemu didn't even have a chance to fail because someone explicitly killed the domain using virDomainDestroy. Thus the Prepare phase failed, although the error was not properly propagated: "An error occurred, but the cause is unknown". Since the root cause is already tracked by the other bug, we can use this bug to track the issue with error message propagation. (A C-level sketch of the hot-unplug-and-migrate scenario follows the attachment list below.)

Created attachment 601889 [details]
libvirt client log from source host
Created attachment 601890 [details]
libvirtd log from source host
Created attachment 601891 [details]
libvirtd log from destination host
Created attachment 601892 [details]
/var/log/libvirt/qemu/vm.log from destination host
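As referenced above, here is a minimal C sketch of the hot-unplug-and-migrate sequence that triggers the badly propagated error, driven through libvirt's public API instead of virsh. It is an illustration only: the guest name ("vm"), the destination URI, and the disk XML are assumptions, and real code would need the same device XML that was used when the disk was hot plugged.

    /*
     * Minimal sketch of the reproduction scenario (illustration only;
     * guest name, destination URI and disk XML are assumptions).
     *
     * Build (with libvirt-devel installed):
     *   gcc repro.c -o repro $(pkg-config --cflags --libs libvirt)
     */
    #include <stdio.h>
    #include <stdlib.h>
    #include <libvirt/libvirt.h>
    #include <libvirt/virterror.h>

    static void
    print_last_error(const char *prefix)
    {
        /* virGetLastError() returns the thread-local error; this is where
         * the useless "An error occurred, but the cause is unknown"
         * message showed up before the fix. */
        virErrorPtr err = virGetLastError();
        fprintf(stderr, "%s: %s\n", prefix,
                (err && err->message) ? err->message : "unknown");
    }

    int
    main(void)
    {
        const char *domname = "vm";                    /* assumed guest name */
        const char *dst_uri = "qemu+ssh://dst/system"; /* assumed destination */
        /* Assumed XML of the previously hot-plugged virtio disk. */
        const char *disk_xml =
            "<disk type='file' device='disk'>"
            "  <driver name='qemu' type='raw'/>"
            "  <source file='/var/lib/libvirt/images/data.img'/>"
            "  <target dev='vdb' bus='virtio'/>"
            "</disk>";
        int ret = EXIT_FAILURE;

        virConnectPtr conn = virConnectOpen("qemu:///system");
        if (!conn) {
            fprintf(stderr, "failed to connect to source libvirtd\n");
            return EXIT_FAILURE;
        }

        virDomainPtr dom = virDomainLookupByName(conn, domname);
        if (!dom) {
            print_last_error("lookup failed");
            goto cleanup;
        }

        /* Step 3 of the reproduction steps below: hot unplug the virtio
         * disk.  With bug 807023 the guest may silently keep it attached. */
        if (virDomainDetachDeviceFlags(dom, disk_xml,
                                       VIR_DOMAIN_AFFECT_LIVE) < 0) {
            print_last_error("detach failed");
            goto cleanup;
        }

        /* Step 4: live migrate; with the stale disk this is expected to
         * fail, and before the fix the reported cause could be lost. */
        if (virDomainMigrateToURI(dom, dst_uri,
                                  VIR_MIGRATE_LIVE | VIR_MIGRATE_PEER2PEER,
                                  NULL, 0) < 0) {
            print_last_error("migration failed");
            goto cleanup;
        }

        ret = EXIT_SUCCESS;

     cleanup:
        if (dom)
            virDomainFree(dom);
        virConnectClose(conn);
        return ret;
    }

On an affected build the message printed for the failed migration can be the generic "An error occurred, but the cause is unknown" described in this bug, even though libvirtd already knows the real cause.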
Time on destination is approximately 50 seconds ahead of source.

I can reproduce it with libvirt-0.9.10-21.el6.x86_64 and libvirt-0.10.0-0rc0.el6.

Steps:
1. Change the log level to 1 on the source and destination hosts and restart libvirtd.
2. Prepare a guest with an allocated virtio disk.
3. Hot unplug the disk.
4. Migrate the guest; the migration fails.
5. Check the error info and the logs.

This is fixed upstream by v1.2.2-291-gcfa7cea:

    commit cfa7ceab7735410c0427136236bf8bad10670816
    Author: Jiri Denemark <jdenemar>
    Date:   Mon Mar 17 11:04:07 2014 +0100

        qemu: Return meaningful error when qemu dies early

        https://bugzilla.redhat.com/show_bug.cgi?id=844378

        When qemu dies early after connecting to its monitor but before we
        actually try to read something from the monitor, we would just fail
        domain start with useless message:

            "An error occurred, but the cause is unknown"

        This is because the real error gets reported in a monitor EOF handler
        executing within libvirt's event loop. The fix is to take any error
        set in qemuMonitor structure and propagate it into the thread-local
        error when qemuMonitorClose is called and no thread-local error is
        set.

        Signed-off-by: Jiri Denemark <jdenemar>

(A short sketch of the change described in this commit message appears at the end of this report.)

Notes: To reproduce the issue, just apply the following patch and kill the qemu process while libvirtd is sleeping:

    diff --git i/src/qemu/qemu_process.c w/src/qemu/qemu_process.c
    index 400625a..3b77ecc 100644
    --- i/src/qemu/qemu_process.c
    +++ w/src/qemu/qemu_process.c
    @@ -1415,6 +1415,9 @@ qemuConnectMonitor(virQEMUDriverPtr driver, virDomainObjPtr vm, int logfd)
         if (mon)
             ignore_value(qemuMonitorSetDomainLog(mon, logfd));
     
    +    VIR_DEBUG("Sleeping");
    +    sleep(5);
    +
         virObjectLock(vm);
         virObjectUnref(vm);
         priv->monStart = 0;

Can reproduce this with build libvirt-0.10.2-29.el6.x86_64:

    error: Failed to start domain rhel6.5
    error: An error occurred, but the cause is unknown

Verified with build libvirt-0.10.2-32.el6.x86_64.

Steps:
1. Apply Jiri's reproducer patch to libvirt.
2. Rebuild the libvirt package and install it.
3. Start the domain and kill qemu while libvirtd is sleeping:

    # virsh start rhel6.5 & sleep 3; killall -9 qemu-kvm
    error: internal error End of file from monitor

This is the expected error message, so moving the bug to VERIFIED.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2014-1374.html
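For reference, the change described in the commit message above can be sketched as follows. This is a minimal sketch of the described approach, not the verbatim upstream patch: it assumes, based on the commit description and libvirt's internal conventions, that the monitor object stores the EOF handler's error in mon->lastError and that qemuMonitorClose() copies it into the thread-local error with virSetError() when virGetLastError() reports none.

    /* Sketch (not the verbatim upstream patch) of the propagation step
     * described above, placed inside qemuMonitorClose():
     *
     * The monitor EOF/error callback runs in the event loop and records
     * the real cause of the failure on the monitor object.  If the thread
     * closing the monitor has no error of its own, copy that stored error
     * into the thread-local error, so callers report it instead of the
     * generic "An error occurred, but the cause is unknown". */
    if (mon->lastError.code != VIR_ERR_OK && !virGetLastError())
        virSetError(&mon->lastError);

With this in place, a failure such as the destroy-during-Prepare case surfaces the monitor's recorded error (for example "internal error End of file from monitor") rather than the unknown-cause message, which matches the verification output above.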