Bug 844378
| Summary: | error messages may not be propagated properly during failed migration | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 6 | Reporter: | Haim <hateya> |
| Component: | libvirt | Assignee: | Jiri Denemark <jdenemar> |
| Status: | CLOSED ERRATA | QA Contact: | Virtualization Bugs <virt-bugs> |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 6.3 | CC: | cwei, dyuan, iheim, jdenemar, jkt, mzhan, rbalakri, weizhan, yeylon, zpeng |
| Target Milestone: | rc | Keywords: | Upstream |
| Target Release: | --- | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | libvirt-0.10.2-32.el6 | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2014-10-14 04:14:03 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Attachments: | libvirt logs; libvirt client log from source host; libvirtd log from source host; libvirtd log from destination host; /var/log/libvirt/qemu/vm.log from destination host | | |
Created attachment 601259 [details]
libvirt logs
I've already been looking at this... is it only for unplug, or also for a VM started without disks to begin with?

(In reply to comment #5)
> is it only for unplug, or also for VM started without disks to begin with?

It appears to be related to hot plug: I started a VM without disks, booted from the network, and managed to migrate it.

Several migration failures can be found in the logs. Most of them were caused by bug 807023, i.e., the disk was in fact not unplugged but libvirt didn't know about it. Thus libvirt removed the disk from the domain XML and started qemu on the destination without that disk, but the source qemu still saw the disk attached and sent its status to the destination. Once qemu on the destination host saw that, it immediately aborted with the "Unknown savevm section or instance '0000:00:05.0/virtio-blk' 0" error message. Libvirtd on the source detected it during the Finish phase and reported "Domain not found: no domain with matching name 'vm'".

In the last failure, however, qemu didn't even have a chance to fail because someone explicitly killed the domain using virDomainDestroy. Thus the Prepare phase failed, although the error was not properly propagated: "An error occurred, but the cause is unknown". Since the root cause is already tracked by the other bug, we can use this bug to track the issue with error message propagation.

Created attachment 601889 [details]
libvirt client log from source host
Created attachment 601890 [details]
libvirtd log from source host
Created attachment 601891 [details]
libvirtd log from destination host
Created attachment 601892 [details]
/var/log/libvirt/qemu/vm.log from destination host
Time on destination is approximately 50 seconds ahead of source.

I can reproduce it with libvirt-0.9.10-21.el6.x86_64 and libvirt-0.10.0-0rc0.el6.

Steps:
1. change log level to 1 on src and dst host, restart libvirtd
2. prepare a guest with an allocated virtio disk
3. hot unplug the disk
4. migrate and fail
5. check the error info and log

This is fixed upstream by v1.2.2-291-gcfa7cea:
commit cfa7ceab7735410c0427136236bf8bad10670816
Author: Jiri Denemark <jdenemar>
Date: Mon Mar 17 11:04:07 2014 +0100

qemu: Return meaningful error when qemu dies early

https://bugzilla.redhat.com/show_bug.cgi?id=844378

When qemu dies early after connecting to its monitor but before we
actually try to read something from the monitor, we would just fail
the domain start with a useless message:

"An error occurred, but the cause is unknown"

This is because the real error gets reported in a monitor EOF handler
executing within libvirt's event loop.

The fix is to take any error set in the qemuMonitor structure and
propagate it into the thread-local error when qemuMonitorClose is
called and no thread-local error is set.

Signed-off-by: Jiri Denemark <jdenemar>

Notes:
To reproduce the issue, just apply the following patch and kill the
qemu process while libvirtd is sleeping:
```diff
diff --git i/src/qemu/qemu_process.c w/src/qemu/qemu_process.c
index 400625a..3b77ecc 100644
--- i/src/qemu/qemu_process.c
+++ w/src/qemu/qemu_process.c
@@ -1415,6 +1415,9 @@ qemuConnectMonitor(virQEMUDriverPtr driver, virDomainObjPtr vm, int logfd)
     if (mon)
         ignore_value(qemuMonitorSetDomainLog(mon, logfd));
 
+    VIR_DEBUG("Sleeping");
+    sleep(5);
+
     virObjectLock(vm);
     virObjectUnref(vm);
     priv->monStart = 0;
```
Can reproduce this with build libvirt-0.10.2-29.el6.x86_64:

```
error: Failed to start domain rhel6.5
error: An error occurred, but the cause is unknown
```

Verified with build libvirt-0.10.2-32.el6.x86_64.

Steps:
1. apply Jiri's reproducer patch to libvirt
2. rebuild the libvirt package and install it
3. run:

```
# virsh start rhel6.5 & sleep 3; killall -9 qemu-kvm
error: internal error End of file from monitor
```

This is the expected error message, so moving to VERIFIED.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2014-1374.html
Description of problem:
- start vm with 1 virtIO disk pre-allocated
- unplug disk
- migrate vm from source to destination

error (vdsm log):

```
Thread-25957::ERROR::2012-07-30 19:00:46,217::vm::176::vm.Vm::(_recover) vmId=`3f6cb042-2be6-4c81-8429-56e6d803f438`::Domain not found: no domain with matching name 'vm2'
Thread-25957::ERROR::2012-07-30 19:00:46,395::vm::240::vm.Vm::(run) vmId=`3f6cb042-2be6-4c81-8429-56e6d803f438`::Failed to migrate
Traceback (most recent call last):
  File "/usr/share/vdsm/vm.py", line 223, in run
    self._startUnderlyingMigration()
  File "/usr/share/vdsm/libvirtvm.py", line 451, in _startUnderlyingMigration
    None, maxBandwidth)
  File "/usr/share/vdsm/libvirtvm.py", line 491, in f
    ret = attr(*args, **kwargs)
  File "/usr/lib64/python2.6/site-packages/vdsm/libvirtconnection.py", line 82, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib64/python2.6/site-packages/libvirt.py", line 1104, in migrateToURI2
    if ret == -1: raise libvirtError ('virDomainMigrateToURI2() failed', dom=self)
libvirtError: Domain not found: no domain with matching name 'vm2'
```

libvirt error:

```
2012-07-30 16:07:48.611+0000: 2731: debug : qemuProcessAutoDestroyAdd:4187 : vm=vm2, conn=0x7fef000023f0
2012-07-30 16:07:48.611+0000: 2731: debug : qemuDriverCloseCallbackSet:533 : vm=vm2, uuid=3f6cb042-2be6-4c81-8429-56e6d803f438, conn=0x7fef000023f0, cb=0x48e6f0
2012-07-30 16:07:48.611+0000: 2731: debug : qemuProcessStart:3662 : Writing domain status to disk
2012-07-30 16:07:48.650+0000: 2731: debug : qemuMigrationPrepareAny:1340 : Received no lockstate
2012-07-30 16:07:48.651+0000: 2731: debug : qemuMigrationBakeCookie:729 : cookielen=394 cookie=<qemu-migration>
  <name>vm2</name>
  <uuid>3f6cb042-2be6-4c81-8429-56e6d803f438</uuid>
  <hostname>nott-vds2.qa.lab.tlv.redhat.com</hostname>
  <hostuuid>35303738-3635-4742-3830-333334344130</hostuuid>
  <graphics type='spice' port='5900' listen='0' tlsPort='5901'>
    <cert info='subject' value='O=qa.lab.tlv.redhat.com,CN=nott-vds2.qa.lab.tlv.redhat.com'/>
  </graphics>
</qemu-migration>
2012-07-30 16:07:48.651+0000: 2731: debug : qemuDomainCleanupAdd:1873 : vm=vm2, cb=0x494500
2012-07-30 16:07:48.651+0000: 2724: debug : virDomainFree:2313 : dom=0x1398020, (VM: name=vm2, uuid=3f6cb042-2be6-4c81-8429-56e6d803f438)
2012-07-30 16:07:49.349+0000: 2724: debug : qemuMonitorRef:201 : QEMU_MONITOR_REF: mon=0x7feefc0cc5e0 refs=3
2012-07-30 16:07:49.349+0000: 2724: error : qemuMonitorIO:574 : internal error End of file from monitor
2012-07-30 16:07:49.349+0000: 2724: debug : qemuMonitorIO:607 : Error on monitor internal error End of file from monitor
2012-07-30 16:07:49.349+0000: 2724: debug : qemuMonitorUnref:210 : QEMU_MONITOR_UNREF: mon=0x7feefc0cc5e0 refs=2
2012-07-30 16:07:49.349+0000: 2724: debug : qemuMonitorIO:630 : Triggering EOF callback
2012-07-30 16:07:49.349+0000: 2724: debug : qemuProcessHandleMonitorEOF:285 : Received EOF on 0x7feefc13bd00 'vm2'
2012-07-30 16:07:49.349+0000: 2724: debug : qemuProcessHandleMonitorEOF:304 : Monitor connection to 'vm2' closed without SHUTDOWN event; assuming the domain crashed
```

vm xml:

```xml
<domain type='kvm' id='4'>
  <name>vm2</name>
  <uuid>3f6cb042-2be6-4c81-8429-56e6d803f438</uuid>
  <memory unit='KiB'>524288</memory>
  <currentMemory unit='KiB'>524288</currentMemory>
  <vcpu placement='static'>1</vcpu>
  <cputune>
    <shares>1020</shares>
  </cputune>
  <sysinfo type='smbios'>
    <system>
      <entry name='manufacturer'>Red Hat</entry>
      <entry name='product'>RHEV Hypervisor</entry>
      <entry name='version'>6Server-6.3.0.3.el6</entry>
      <entry name='serial'>38373035-3536-4247-3830-333334343930_78:E7:D1:E4:8C:70</entry>
      <entry name='uuid'>3f6cb042-2be6-4c81-8429-56e6d803f438</entry>
    </system>
  </sysinfo>
  <os>
    <type arch='x86_64' machine='rhel6.3.0'>hvm</type>
    <smbios mode='sysinfo'/>
  </os>
  <features>
    <acpi/>
  </features>
  <cpu mode='custom' match='exact'>
    <model fallback='allow'>Conroe</model>
    <topology sockets='1' cores='1' threads='1'/>
  </cpu>
  <clock offset='variable' adjustment='0'>
    <timer name='rtc' tickpolicy='catchup'/>
  </clock>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>destroy</on_crash>
  <devices>
    <emulator>/usr/libexec/qemu-kvm</emulator>
    <disk type='file' device='cdrom'>
      <driver name='qemu' type='raw'/>
      <source startupPolicy='optional'/>
      <target dev='hdc' bus='ide'/>
      <readonly/>
      <serial></serial>
      <alias name='ide0-1-0'/>
      <address type='drive' controller='0' bus='1' target='0' unit='0'/>
    </disk>
    <controller type='ide' index='0'>
      <alias name='ide0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x1'/>
    </controller>
    <controller type='virtio-serial' index='0'>
      <alias name='virtio-serial0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
    </controller>
    <controller type='usb' index='0'>
      <alias name='usb0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x2'/>
    </controller>
    <interface type='bridge'>
      <mac address='00:1a:4a:23:61:66'/>
      <source bridge='rhevm'/>
      <target dev='vnet1'/>
      <model type='virtio'/>
      <alias name='net0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </interface>
    <console type='pty' tty='/dev/pts/2'>
      <source path='/dev/pts/2'/>
      <target type='virtio' port='0'/>
      <alias name='console0'/>
    </console>
    <channel type='unix'>
      <source mode='bind' path='/var/lib/libvirt/qemu/channels/vm2.com.redhat.rhevm.vdsm'/>
      <target type='virtio' name='com.redhat.rhevm.vdsm'/>
      <alias name='channel0'/>
      <address type='virtio-serial' controller='0' bus='0' port='1'/>
    </channel>
    <channel type='spicevmc'>
      <target type='virtio' name='com.redhat.spice.0'/>
      <alias name='channel1'/>
      <address type='virtio-serial' controller='0' bus='0' port='2'/>
    </channel>
    <input type='mouse' bus='ps2'/>
    <graphics type='spice' port='5902' tlsPort='5903' autoport='yes' listen='0' keymap='en-us' passwdValidTo='1970-01-01T00:00:01'>
      <listen type='address' address='0'/>
      <channel name='main' mode='secure'/>
      <channel name='display' mode='secure'/>
      <channel name='inputs' mode='secure'/>
      <channel name='cursor' mode='secure'/>
      <channel name='playback' mode='secure'/>
      <channel name='record' mode='secure'/>
    </graphics>
    <video>
      <model type='qxl' vram='65536' heads='1'/>
      <alias name='video0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
    </video>
    <memballoon model='virtio'>
      <alias name='balloon0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/>
    </memballoon>
  </devices>
  <seclabel type='dynamic' model='selinux' relabel='yes'>
    <label>system_u:system_r:svirt_t:s0:c279,c690</label>
    <imagelabel>system_u:object_r:svirt_image_t:s0:c279,c690</imagelabel>
  </seclabel>
</domain>
```