Bug 844378

Summary: error messages may not be propagated properly during failed migration
Product: Red Hat Enterprise Linux 6
Component: libvirt
Version: 6.3
Hardware: x86_64
OS: Linux
Status: CLOSED ERRATA
Severity: high
Priority: unspecified
Reporter: Haim <hateya>
Assignee: Jiri Denemark <jdenemar>
QA Contact: Virtualization Bugs <virt-bugs>
CC: cwei, dyuan, iheim, jdenemar, jkt, mzhan, rbalakri, weizhan, yeylon, zpeng
Target Milestone: rc
Target Release: ---
Keywords: Upstream
Fixed In Version: libvirt-0.10.2-32.el6
Doc Type: Bug Fix
Type: Bug
Last Closed: 2014-10-14 04:14:03 UTC
Attachments:
- libvirt logs
- libvirt client log from source host
- libvirtd log from source host
- libvirtd log from destination host
- /var/log/libvirt/qemu/vm.log from destination host

Description Haim 2012-07-30 12:51:17 UTC
Description of problem:

- start vm with 1 virtIO disk pre-allocated
- unplug disk
- migrate vm from source to destination

error (vdsm log):

Thread-25957::ERROR::2012-07-30 19:00:46,217::vm::176::vm.Vm::(_recover) vmId=`3f6cb042-2be6-4c81-8429-56e6d803f438`::Domain not found: no domain with matching name 'vm2'
Thread-25957::ERROR::2012-07-30 19:00:46,395::vm::240::vm.Vm::(run) vmId=`3f6cb042-2be6-4c81-8429-56e6d803f438`::Failed to migrate
Traceback (most recent call last):
  File "/usr/share/vdsm/vm.py", line 223, in run
    self._startUnderlyingMigration()
  File "/usr/share/vdsm/libvirtvm.py", line 451, in _startUnderlyingMigration
    None, maxBandwidth)
  File "/usr/share/vdsm/libvirtvm.py", line 491, in f
    ret = attr(*args, **kwargs)
  File "/usr/lib64/python2.6/site-packages/vdsm/libvirtconnection.py", line 82, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib64/python2.6/site-packages/libvirt.py", line 1104, in migrateToURI2
    if ret == -1: raise libvirtError ('virDomainMigrateToURI2() failed', dom=self)
libvirtError: Domain not found: no domain with matching name 'vm2'

libvirt error: 

2012-07-30 16:07:48.611+0000: 2731: debug : qemuProcessAutoDestroyAdd:4187 : vm=vm2, conn=0x7fef000023f0
2012-07-30 16:07:48.611+0000: 2731: debug : qemuDriverCloseCallbackSet:533 : vm=vm2, uuid=3f6cb042-2be6-4c81-8429-56e6d803f438, conn=0x7fef000023f0, cb=0x48e6f0
2012-07-30 16:07:48.611+0000: 2731: debug : qemuProcessStart:3662 : Writing domain status to disk
2012-07-30 16:07:48.650+0000: 2731: debug : qemuMigrationPrepareAny:1340 : Received no lockstate
2012-07-30 16:07:48.651+0000: 2731: debug : qemuMigrationBakeCookie:729 : cookielen=394 cookie=<qemu-migration>
  <name>vm2</name>
  <uuid>3f6cb042-2be6-4c81-8429-56e6d803f438</uuid>
  <hostname>nott-vds2.qa.lab.tlv.redhat.com</hostname>
  <hostuuid>35303738-3635-4742-3830-333334344130</hostuuid>
  <graphics type='spice' port='5900' listen='0' tlsPort='5901'>
    <cert info='subject' value='O=qa.lab.tlv.redhat.com,CN=nott-vds2.qa.lab.tlv.redhat.com'/>
  </graphics>
</qemu-migration>

2012-07-30 16:07:48.651+0000: 2731: debug : qemuDomainCleanupAdd:1873 : vm=vm2, cb=0x494500
2012-07-30 16:07:48.651+0000: 2724: debug : virDomainFree:2313 : dom=0x1398020, (VM: name=vm2, uuid=3f6cb042-2be6-4c81-8429-56e6d803f438)
2012-07-30 16:07:49.349+0000: 2724: debug : qemuMonitorRef:201 : QEMU_MONITOR_REF: mon=0x7feefc0cc5e0 refs=3
2012-07-30 16:07:49.349+0000: 2724: error : qemuMonitorIO:574 : internal error End of file from monitor
2012-07-30 16:07:49.349+0000: 2724: debug : qemuMonitorIO:607 : Error on monitor internal error End of file from monitor
2012-07-30 16:07:49.349+0000: 2724: debug : qemuMonitorUnref:210 : QEMU_MONITOR_UNREF: mon=0x7feefc0cc5e0 refs=2
2012-07-30 16:07:49.349+0000: 2724: debug : qemuMonitorIO:630 : Triggering EOF callback
2012-07-30 16:07:49.349+0000: 2724: debug : qemuProcessHandleMonitorEOF:285 : Received EOF on 0x7feefc13bd00 'vm2'
2012-07-30 16:07:49.349+0000: 2724: debug : qemuProcessHandleMonitorEOF:304 : Monitor connection to 'vm2' closed without SHUTDOWN event; assuming the domain crashed

vm xml:


<domain type='kvm' id='4'>
  <name>vm2</name>
  <uuid>3f6cb042-2be6-4c81-8429-56e6d803f438</uuid>
  <memory unit='KiB'>524288</memory>
  <currentMemory unit='KiB'>524288</currentMemory>
  <vcpu placement='static'>1</vcpu>
  <cputune>
    <shares>1020</shares>
  </cputune>
  <sysinfo type='smbios'>
    <system>
      <entry name='manufacturer'>Red Hat</entry>
      <entry name='product'>RHEV Hypervisor</entry>
      <entry name='version'>6Server-6.3.0.3.el6</entry>
      <entry name='serial'>38373035-3536-4247-3830-333334343930_78:E7:D1:E4:8C:70</entry>
      <entry name='uuid'>3f6cb042-2be6-4c81-8429-56e6d803f438</entry>
    </system>
  </sysinfo>
  <os>
    <type arch='x86_64' machine='rhel6.3.0'>hvm</type>
    <smbios mode='sysinfo'/>
  </os>
  <features>
    <acpi/>
  </features>
  <cpu mode='custom' match='exact'>
    <model fallback='allow'>Conroe</model>
    <topology sockets='1' cores='1' threads='1'/>
  </cpu>
  <clock offset='variable' adjustment='0'>
    <timer name='rtc' tickpolicy='catchup'/>
  </clock>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>destroy</on_crash>
  <devices>
    <emulator>/usr/libexec/qemu-kvm</emulator>
    <disk type='file' device='cdrom'>
      <driver name='qemu' type='raw'/>
      <source startupPolicy='optional'/>
      <target dev='hdc' bus='ide'/>
      <readonly/>
      <serial></serial>
      <alias name='ide0-1-0'/>
      <address type='drive' controller='0' bus='1' target='0' unit='0'/>
    </disk>
    <controller type='ide' index='0'>
      <alias name='ide0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x1'/>
    </controller>
    <controller type='virtio-serial' index='0'>
      <alias name='virtio-serial0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
    </controller>
    <controller type='usb' index='0'>
      <alias name='usb0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x2'/>
    </controller>
    <interface type='bridge'>
      <mac address='00:1a:4a:23:61:66'/>
      <source bridge='rhevm'/>
      <target dev='vnet1'/>
      <model type='virtio'/>
      <alias name='net0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </interface>
    <console type='pty' tty='/dev/pts/2'>
      <source path='/dev/pts/2'/>
      <target type='virtio' port='0'/>
      <alias name='console0'/>
    </console>
    <channel type='unix'>
      <source mode='bind' path='/var/lib/libvirt/qemu/channels/vm2.com.redhat.rhevm.vdsm'/>
      <target type='virtio' name='com.redhat.rhevm.vdsm'/>
      <alias name='channel0'/>
      <address type='virtio-serial' controller='0' bus='0' port='1'/>
    </channel>
    <channel type='spicevmc'>
      <target type='virtio' name='com.redhat.spice.0'/>
      <alias name='channel1'/>
      <address type='virtio-serial' controller='0' bus='0' port='2'/>
    </channel>
    <input type='mouse' bus='ps2'/>
    <graphics type='spice' port='5902' tlsPort='5903' autoport='yes' listen='0' keymap='en-us' passwdValidTo='1970-01-01T00:00:01'>
      <listen type='address' address='0'/>
      <channel name='main' mode='secure'/>
      <channel name='display' mode='secure'/>
      <channel name='inputs' mode='secure'/>
      <channel name='cursor' mode='secure'/>
      <channel name='playback' mode='secure'/>
      <channel name='record' mode='secure'/>
    </graphics>
    <video>
      <model type='qxl' vram='65536' heads='1'/>
      <alias name='video0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
    </video>
    <memballoon model='virtio'>
      <alias name='balloon0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/>
    </memballoon>
  </devices>
  <seclabel type='dynamic' model='selinux' relabel='yes'>
    <label>system_u:system_r:svirt_t:s0:c279,c690</label>
    <imagelabel>system_u:object_r:svirt_image_t:s0:c279,c690</imagelabel>
  </seclabel>
</domain>

Comment 2 Haim 2012-07-30 12:53:15 UTC
Created attachment 601259 [details]
libvirt logs

Comment 3 Jiri Denemark 2012-07-30 14:06:03 UTC
I've already been looking at this...

Comment 5 Itamar Heim 2012-07-31 11:21:46 UTC
is it only for unplug, or also for VM started without disks to begin with?

Comment 6 Haim 2012-07-31 11:28:27 UTC
(In reply to comment #5)
> is it only for unplug, or also for VM started without disks to begin with?

It appears to be related to hot-plug: I started a VM without disks, booted it from the network, and managed to migrate it.

Comment 7 Jiri Denemark 2012-08-01 14:07:18 UTC
Several migration failures can be found in the logs. Most of them were caused
by bug 807023: the disk was in fact not unplugged, but libvirt didn't know
about it. Thus libvirt removed the disk from the domain XML and started qemu
on the destination without that disk, while the source qemu still saw the disk
attached and sent its state to the destination. As soon as qemu on the
destination host saw that state, it immediately aborted with the

    "Unknown savevm section or instance '0000:00:05.0/virtio-blk' 0"

error message. Libvirtd on the source detected this during the Finish phase
and reported

    "Domain not found: no domain with matching name 'vm'".


In the last failure, however, qemu didn't even have a chance to fail because
someone explicitly killed the domain using virDomainDestroy. Thus the Prepare
phase failed, although the error was not properly propagated:

    "An error occurred, but the cause is unknown".

Since the root cause is already tracked by the other bug, we can use this bug
to track the issue with error message propagation.

Comment 8 Jiri Denemark 2012-08-02 07:58:41 UTC
Created attachment 601889 [details]
libvirt client log from source host

Comment 9 Jiri Denemark 2012-08-02 07:59:09 UTC
Created attachment 601890 [details]
libvirtd log from source host

Comment 10 Jiri Denemark 2012-08-02 07:59:38 UTC
Created attachment 601891 [details]
libvirtd log from destination host

Comment 11 Jiri Denemark 2012-08-02 08:00:09 UTC
Created attachment 601892 [details]
/var/log/libvirt/qemu/vm.log from destination host

Comment 12 Jiri Denemark 2012-08-02 08:02:34 UTC
Time on destination is approximately 50 seconds ahead of source.

Comment 13 zhpeng 2012-08-06 03:25:52 UTC
I can reproduce this with libvirt-0.9.10-21.el6.x86_64 and libvirt-0.10.0-0rc0.el6.
Steps:
1. change the log level to 1 on the source and destination hosts, then restart libvirtd
2. prepare a guest with a pre-allocated virtio disk
3. hot-unplug the disk
4. migrate the guest; the migration fails
5. check the error message and the logs

Comment 20 Jiri Denemark 2014-03-20 22:17:09 UTC
This is fixed upstream by v1.2.2-291-gcfa7cea:

commit cfa7ceab7735410c0427136236bf8bad10670816
Author: Jiri Denemark <jdenemar>
Date:   Mon Mar 17 11:04:07 2014 +0100

    qemu: Return meaningful error when qemu dies early
    
    https://bugzilla.redhat.com/show_bug.cgi?id=844378
    
    When qemu dies early after connecting to its monitor but before we
    actually try to read something from the monitor, we would just fail
    domain start with useless message:
    
        "An error occurred, but the cause is unknown"
    
    This is because the real error gets reported in a monitor EOF handler
    executing within libvirt's event loop.
    
    The fix is to take any error set in qemuMonitor structure and propagate
    it into the thread-local error when qemuMonitorClose is called and no
    thread-local error is set.
    
    Signed-off-by: Jiri Denemark <jdenemar>

Notes:
    To reproduce the issue, just apply the following patch and kill
    qemu process while libvirtd is sleeping:
    
    diff --git i/src/qemu/qemu_process.c w/src/qemu/qemu_process.c
    index 400625a..3b77ecc 100644
    --- i/src/qemu/qemu_process.c
    +++ w/src/qemu/qemu_process.c
    @@ -1415,6 +1415,9 @@ qemuConnectMonitor(virQEMUDriverPtr driver, virDomainObjPtr vm, int logfd)
         if (mon)
             ignore_value(qemuMonitorSetDomainLog(mon, logfd));
    
    +    VIR_DEBUG("Sleeping");
    +    sleep(5);
    +
         virObjectLock(vm);
         virObjectUnref(vm);
         priv->monStart = 0;
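The mechanism of the fix can be illustrated with a minimal Python sketch (all names here are illustrative models, not the actual libvirt C code): the event-loop thread records the real error on the monitor object, and closing the monitor copies that saved error into the calling thread's error slot, but only if the caller has no error set already.

```python
import threading

# Per-thread "last error" slot, modeling libvirt's thread-local virLastError.
_thread_err = threading.local()


class Monitor:
    """Models a qemu monitor whose EOF handler runs in an event-loop thread."""

    def __init__(self):
        self.last_error = None  # set by the EOF handler, not by the API caller

    def handle_eof(self, msg):
        # Event-loop thread: the real cause of the failure lands here,
        # invisible to the thread that issued the API call.
        self.last_error = msg

    def close(self):
        # The fix: when the monitor is closed, propagate its saved error into
        # the calling thread's error slot if no error is set there yet.
        if self.last_error and getattr(_thread_err, "msg", None) is None:
            _thread_err.msg = self.last_error


def start_domain(mon):
    """Models the API call path that previously lost the monitor's error."""
    _thread_err.msg = None
    mon.close()
    if _thread_err.msg:
        return _thread_err.msg  # meaningful error from the EOF handler
    return "An error occurred, but the cause is unknown"
```

With the propagation in place, a caller sees "internal error End of file from monitor" instead of the generic "cause is unknown" message; without a saved error, the generic fallback still applies.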

Comment 24 zhe peng 2014-04-15 09:02:08 UTC
Can reproduce this with build:libvirt-0.10.2-29.el6.x86_64
# error: Failed to start domain rhel6.5
error: An error occurred, but the cause is unknown
verify with build:libvirt-0.10.2-32.el6.x86_64

step:
1:apply Jiri's reproducer on libvirt
2:rebuild libvirt pkg and install
3:# virsh start rhel6.5 & sleep 3;killall -9 qemu-kvm
error: internal error End of file from monitor

expect error msg.move to verified.

Comment 26 errata-xmlrpc 2014-10-14 04:14:03 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2014-1374.html