Description of problem:
A host with the HE VM is stuck in "Preparing for Maintenance" status; the VM doesn't actually migrate away. In /var/log/messages:

Jan 14 13:49:30 ocelot02 libvirtd: 2019-01-14 11:49:30.608+0000: 101924: error : qemuMigrationCheckJobStatus:1457 : operation failed: migration job: unexpectedly failed

The full scenario is an upgrade of RHEV from 4.1 to 4.2. After upgrading the engine, the upgrade of host_mixed_1 (ocelot01) from the web UI completed successfully. I moved host_mixed_3 into maintenance to try to reproduce bug 1629454. The HE VM and SPM are on host_mixed_2 (ocelot02). I then tried to upgrade it from the web UI the same way as host_mixed_1.

Version-Release number of selected component (if applicable):
ovirt-engine-4.2.8.2-0.1.el7ev.noarch

Source host (ocelot02):
vdsm-4.19.51-1.el7ev.x86_64
libvirt-libs-3.9.0-14.el7_5.8.x86_64
libvirt-daemon-driver-network-3.9.0-14.el7_5.8.x86_64
libvirt-daemon-driver-storage-gluster-3.9.0-14.el7_5.8.x86_64
libvirt-daemon-driver-secret-3.9.0-14.el7_5.8.x86_64
libvirt-daemon-config-nwfilter-3.9.0-14.el7_5.8.x86_64
libvirt-daemon-driver-storage-logical-3.9.0-14.el7_5.8.x86_64
libvirt-python-3.9.0-1.el7.x86_64
libvirt-daemon-driver-nwfilter-3.9.0-14.el7_5.8.x86_64
libvirt-daemon-driver-storage-iscsi-3.9.0-14.el7_5.8.x86_64
libvirt-client-3.9.0-14.el7_5.8.x86_64
libvirt-daemon-driver-qemu-3.9.0-14.el7_5.8.x86_64
libvirt-daemon-driver-storage-rbd-3.9.0-14.el7_5.8.x86_64
libvirt-daemon-driver-storage-disk-3.9.0-14.el7_5.8.x86_64
libvirt-daemon-driver-interface-3.9.0-14.el7_5.8.x86_64
libvirt-daemon-driver-storage-3.9.0-14.el7_5.8.x86_64
libvirt-daemon-driver-storage-core-3.9.0-14.el7_5.8.x86_64
libvirt-daemon-driver-storage-mpath-3.9.0-14.el7_5.8.x86_64
libvirt-lock-sanlock-3.9.0-14.el7_5.8.x86_64
libvirt-daemon-3.9.0-14.el7_5.8.x86_64
libvirt-daemon-driver-storage-scsi-3.9.0-14.el7_5.8.x86_64
libvirt-daemon-driver-nodedev-3.9.0-14.el7_5.8.x86_64
libvirt-daemon-kvm-3.9.0-14.el7_5.8.x86_64
qemu-kvm-tools-rhev-2.10.0-21.el7_5.7.x86_64
qemu-kvm-common-rhev-2.10.0-21.el7_5.7.x86_64
ipxe-roms-qemu-20170123-1.git4e85b27.el7_4.1.noarch
qemu-img-rhev-2.10.0-21.el7_5.7.x86_64
libvirt-daemon-driver-qemu-3.9.0-14.el7_5.8.x86_64
qemu-kvm-rhev-2.10.0-21.el7_5.7.x86_64

Destination host (ocelot01):
vdsm-4.20.46-1.el7ev.x86_64
libvirt-daemon-config-nwfilter-4.5.0-10.el7_6.3.x86_64
libvirt-daemon-driver-storage-iscsi-4.5.0-10.el7_6.3.x86_64
libvirt-daemon-driver-interface-4.5.0-10.el7_6.3.x86_64
libvirt-daemon-driver-lxc-4.5.0-10.el7_6.3.x86_64
libvirt-daemon-driver-storage-disk-4.5.0-10.el7_6.3.x86_64
libvirt-lock-sanlock-4.5.0-10.el7_6.3.x86_64
libvirt-daemon-driver-storage-core-4.5.0-10.el7_6.3.x86_64
libvirt-daemon-config-network-4.5.0-10.el7_6.3.x86_64
libvirt-python-4.5.0-1.el7.x86_64
libvirt-daemon-driver-nodedev-4.5.0-10.el7_6.3.x86_64
libvirt-daemon-driver-storage-gluster-4.5.0-10.el7_6.3.x86_64
libvirt-daemon-driver-storage-mpath-4.5.0-10.el7_6.3.x86_64
libvirt-daemon-driver-storage-4.5.0-10.el7_6.3.x86_64
libvirt-bash-completion-4.5.0-10.el7_6.3.x86_64
libvirt-libs-4.5.0-10.el7_6.3.x86_64
libvirt-daemon-4.5.0-10.el7_6.3.x86_64
libvirt-daemon-driver-nwfilter-4.5.0-10.el7_6.3.x86_64
libvirt-daemon-driver-storage-scsi-4.5.0-10.el7_6.3.x86_64
libvirt-daemon-kvm-4.5.0-10.el7_6.3.x86_64
libvirt-client-4.5.0-10.el7_6.3.x86_64
libvirt-daemon-driver-qemu-4.5.0-10.el7_6.3.x86_64
libvirt-daemon-driver-secret-4.5.0-10.el7_6.3.x86_64
libvirt-daemon-driver-storage-rbd-4.5.0-10.el7_6.3.x86_64
libvirt-4.5.0-10.el7_6.3.x86_64
libvirt-daemon-driver-network-4.5.0-10.el7_6.3.x86_64
libvirt-daemon-driver-storage-logical-4.5.0-10.el7_6.3.x86_64
qemu-kvm-rhev-2.12.0-18.el7_6.1.x86_64
ipxe-roms-qemu-20170123-1.git4e85b27.el7_4.1.noarch
qemu-kvm-tools-rhev-2.12.0-18.el7_6.1.x86_64
qemu-kvm-common-rhev-2.12.0-18.el7_6.1.x86_64
libvirt-daemon-driver-qemu-4.5.0-10.el7_6.3.x86_64
qemu-img-rhev-2.12.0-18.el7_6.1.x86_64

How reproducible:
I tried the upgrade procedure just once, but every retry on this environment (activate the host and attempt the upgrade again) gives the same result.

Steps to Reproduce:
1. Deploy an SHE environment with RHEV 4.1 (2 hosts).
2. Upgrade the engine to 4.2.
3. Upgrade host1 from the web UI.
4. Upgrade host2 from the web UI.

Actual results:
The upgrade of host2 fails; the host is stuck in "Preparing for Maintenance" due to an unsuccessful HE VM migration.

Expected results:
The HE VM should migrate to host1, host2 should move into maintenance mode, and the upgrade should succeed.

Additional info:
As a simple workaround, re-activate the host and migrate the HE VM manually (invoking the HE VM migration directly, rather than through the host state change, succeeds).
Created attachment 1520555 [details] libvirt logs
Unfortunately, the important libvirt log files do not seem to have been produced at the time you saw the error. There's no "unexpectedly failed" message in them, and while the message you quoted from /var/log/messages was reported at 2019-01-14 11:49:30.608+0000, the log files start at 2019-01-14 13:11:??.???+0000, which is almost 1.5 hours later.

Anyway, this kind of error is usually reported if the destination QEMU dies during migration, so /var/log/libvirt/qemu/$VM.log from the destination host may contain some hints. One of the hosts gives us:

2019-01-14 11:24:07.405+0000: starting up libvirt version: 3.9.0, package: 14.el7_5.8 (Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla>, 2018-09-04-07:27:24, x86-019.build.eng.bos.redhat.com), qemu version: 2.10.0(qemu-kvm-rhev-2.10.0-21.el7_5.7), hostname: ocelot02.qa.lab.tlv.redhat.com
LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin QEMU_AUDIO_DRV=none /usr/libexec/qemu-kvm -name guest=HostedEngine,debug-threads=on -S -object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-1-HostedEngine/master-key.aes -machine pc-i440fx-rhel7.3.0,accel=kvm,usb=off,dump-guest-core=off -cpu Haswell-IBRS,vme=on,f16c=on,rdrand=on,hypervisor=on,arat=on,xsaveopt=on,abm=on -m 16384 -realtime mlock=off -smp 4,maxcpus=16,sockets=16,cores=1,threads=1 -uuid 3074151e-b38f-461f-bf17-90fb36dd41cb -smbios 'type=1,manufacturer=Red Hat,product=RHEV Hypervisor,version=7.5-8.el7,serial=00000000-0000-0000-0000-AC1F6B57AE82,uuid=3074151e-b38f-461f-bf17-90fb36dd41cb' -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-1-HostedEngine/monitor.sock,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=2019-01-14T11:24:07,driftfix=slew -global kvm-pit.lost_tick_policy=delay -no-hpet -no-reboot -boot strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x4 -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x5 -drive if=none,id=drive-ide0-1-0,readonly=on -device ide-cd,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 -drive file=/var/run/vdsm/storage/7d394a51-c3a3-47e1-ac31-697710ef0f70/c8f4766d-5e44-4798-b0e9-000cf97faebe/0ddd524f-4616-4877-88bb-c5a65cadfb4f,format=raw,if=none,id=drive-virtio-disk0,serial=c8f4766d-5e44-4798-b0e9-000cf97faebe,cache=none,werror=stop,rerror=stop,aio=threads -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x6,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -netdev tap,fd=31,id=hostnet0,vhost=on,vhostfd=33 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=00:1a:4a:16:10:64,bus=pci.0,addr=0x3 -chardev socket,id=charchannel0,path=/var/lib/libvirt/qemu/channels/3074151e-b38f-461f-bf17-90fb36dd41cb.com.redhat.rhevm.vdsm,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=com.redhat.rhevm.vdsm -chardev socket,id=charchannel1,path=/var/lib/libvirt/qemu/channels/3074151e-b38f-461f-bf17-90fb36dd41cb.org.qemu.guest_agent.0,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=2,chardev=charchannel1,id=channel1,name=org.qemu.guest_agent.0 -chardev socket,id=charchannel2,path=/var/lib/libvirt/qemu/channels/3074151e-b38f-461f-bf17-90fb36dd41cb.org.ovirt.hosted-engine-setup.0,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=3,chardev=charchannel2,id=channel2,name=org.ovirt.hosted-engine-setup.0 -chardev pty,id=charconsole0 -device virtconsole,chardev=charconsole0,id=console0 -vnc 0:0,password -device VGA,id=video0,vgamem_mb=32,bus=pci.0,addr=0x2 -incoming defer -object rng-random,id=objrng0,filename=/dev/random -device virtio-rng-pci,rng=objrng0,id=rng0,bus=pci.0,addr=0x7 -msg timestamp=on
2019-01-14T11:24:07.545834Z qemu-kvm: -chardev pty,id=charconsole0: char device redirected to /dev/pts/1 (label charconsole0)
2019-01-14T11:24:07.548146Z qemu-kvm: -drive file=/var/run/vdsm/storage/7d394a51-c3a3-47e1-ac31-697710ef0f70/c8f4766d-5e44-4798-b0e9-000cf97faebe/0ddd524f-4616-4877-88bb-c5a65cadfb4f,format=raw,if=none,id=drive-virtio-disk0,serial=c8f4766d-5e44-4798-b0e9-000cf97faebe,cache=none,werror=stop,rerror=stop,aio=threads: 'serial' is deprecated, please use the corresponding option of '-device' instead
2019-01-14 11:49:26.525+0000: initiating migration

We can see the QEMU process was started at 11:24:07.405 and outgoing migration was initiated at 11:49:26.525+0000, so this was the source host. Looking at /var/log/messages on the source (ocelot02), I can see the error from this bug's description:

Jan 14 13:49:30 ocelot02 libvirtd: 2019-01-14 11:49:30.608+0000: 101924: error : qemuMigrationCheckJobStatus:1457 : operation failed: migration job: unexpectedly failed

The vdsm log file contains:

2019-01-14 13:49:30,621+0200 ERROR (migsrc/3074151e) [virt.vm] (vmId='3074151e-b38f-461f-bf17-90fb36dd41cb') Domain not found: no domain with matching name 'HostedEngine' (migration:287)
2019-01-14 13:49:30,627+0200 ERROR (migsrc/3074151e) [virt.vm] (vmId='3074151e-b38f-461f-bf17-90fb36dd41cb') Failed to migrate (migration:429)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/virt/migration.py", line 411, in run
    self._startUnderlyingMigration(time.time())
  File "/usr/lib/python2.7/site-packages/vdsm/virt/migration.py", line 489, in _startUnderlyingMigration
    self._perform_with_downtime_thread(duri, muri)
  File "/usr/lib/python2.7/site-packages/vdsm/virt/migration.py", line 556, in _perform_with_downtime_thread
    self._perform_migration(duri, muri)
  File "/usr/lib/python2.7/site-packages/vdsm/virt/migration.py", line 529, in _perform_migration
    self._vm._dom.migrateToURI3(duri, params, flags)
  File "/usr/lib/python2.7/site-packages/vdsm/virt/virdomain.py", line 69, in f
    ret = attr(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/libvirtconnection.py", line 123, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/utils.py", line 1006, in wrapper
    return func(inst, *args, **kwargs)
  File "/usr/lib64/python2.7/site-packages/libvirt.py", line 1746, in migrateToURI3
    if ret == -1: raise libvirtError ('virDomainMigrateToURI3() failed', dom=self)
libvirtError: Domain not found: no domain with matching name 'HostedEngine'

On the destination host the qemu log contains:

2019-01-14 11:49:26.077+0000: starting up libvirt version: 4.5.0, package: 10.el7_6.3 (Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla>, 2018-11-08-09:31:37, x86-020.build.eng.bos.redhat.com), qemu version: 2.12.0qemu-kvm-rhev-2.12.0-18.el7_6.1, kernel: 3.10.0-957.5.1.el7.x86_64, hostname: ocelot01.qa.lab.tlv.redhat.com
LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin QEMU_AUDIO_DRV=none /usr/libexec/qemu-kvm -name guest=HostedEngine,debug-threads=on -S -object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-1-HostedEngine/master-key.aes -machine pc-i440fx-rhel7.3.0,accel=kvm,usb=off,dump-guest-core=off -cpu Haswell-IBRS,vme=on,f16c=on,rdrand=on,hypervisor=on,arat=on,xsaveopt=on,abm=on -m 16384 -realtime mlock=off -smp 4,maxcpus=16,sockets=16,cores=1,threads=1 -uuid 3074151e-b38f-461f-bf17-90fb36dd41cb -smbios 'type=1,manufacturer=Red Hat,product=RHEV Hypervisor,version=7.5-8.el7,serial=00000000-0000-0000-0000-AC1F6B57AE82,uuid=3074151e-b38f-461f-bf17-90fb36dd41cb' -no-user-config -nodefaults -chardev socket,id=charmonitor,fd=31,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=2019-01-14T11:49:26,driftfix=slew -global kvm-pit.lost_tick_policy=delay -no-hpet -no-reboot -boot strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x4 -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x5 -drive if=none,id=drive-ide0-1-0,readonly=on -device ide-cd,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 -drive file=/var/run/vdsm/storage/7d394a51-c3a3-47e1-ac31-697710ef0f70/c8f4766d-5e44-4798-b0e9-000cf97faebe/0ddd524f-4616-4877-88bb-c5a65cadfb4f,format=raw,if=none,id=drive-virtio-disk0,serial=c8f4766d-5e44-4798-b0e9-000cf97faebe,werror=stop,rerror=stop,cache=none,aio=threads -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x6,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1,write-cache=on -netdev tap,fd=33,id=hostnet0,vhost=on,vhostfd=34 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=00:1a:4a:16:10:64,bus=pci.0,addr=0x3 -chardev socket,id=charchannel0,fd=35,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=com.redhat.rhevm.vdsm -chardev socket,id=charchannel1,fd=36,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=2,chardev=charchannel1,id=channel1,name=org.qemu.guest_agent.0 -chardev socket,id=charchannel2,fd=37,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=3,chardev=charchannel2,id=channel2,name=org.ovirt.hosted-engine-setup.0 -chardev pty,id=charconsole0 -device virtconsole,chardev=charconsole0,id=console0 -vnc 0:0,password -device VGA,id=video0,vgamem_mb=32,bus=pci.0,addr=0x2 -incoming defer -object rng-random,id=objrng0,filename=/dev/random -device virtio-rng-pci,rng=objrng0,id=rng0,bus=pci.0,addr=0x7 -sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny -msg timestamp=on
2019-01-14T11:49:26.205346Z qemu-kvm: -chardev pty,id=charconsole0: char device redirected to /dev/pts/0 (label charconsole0)
2019-01-14T11:49:26.207845Z qemu-kvm: -drive file=/var/run/vdsm/storage/7d394a51-c3a3-47e1-ac31-697710ef0f70/c8f4766d-5e44-4798-b0e9-000cf97faebe/0ddd524f-4616-4877-88bb-c5a65cadfb4f,format=raw,if=none,id=drive-virtio-disk0,serial=c8f4766d-5e44-4798-b0e9-000cf97faebe,werror=stop,rerror=stop,cache=none,aio=threads: 'serial' is deprecated, please use the corresponding option of '-device' instead
2019-01-14T11:49:30.085678Z qemu-kvm: terminating on signal 15 from pid 16219 (<unknown process>)
2019-01-14 11:49:30.285+0000: shutting down, reason=destroyed

/var/log/messages does not provide a lot of details either:

Jan 14 13:49:26 ocelot01 systemd-machined: New machine qemu-1-HostedEngine.
Jan 14 13:49:26 ocelot01 systemd: Started Virtual Machine qemu-1-HostedEngine.
Jan 14 13:49:26 ocelot01 kvm: 1 guest now active
Jan 14 13:49:30 ocelot01 kvm: 0 guests now active
Jan 14 13:49:30 ocelot01 systemd-machined: Machine qemu-1-HostedEngine terminated.

No libvirt errors shown here. But vdsm.log is interesting:

2019-01-14 13:49:30,076+0200 INFO (jsonrpc/5) [api.virt] START getStats() from=::1,59606, vmId=3074151e-b38f-461f-bf17-90fb36dd41cb (api:46)
2019-01-14 13:49:30,077+0200 INFO (jsonrpc/5) [api.virt] FINISH getStats return={'status': {'message': 'Done', 'code': 0}, 'statsList': [{'status': 'Migration Destination', 'displayInfo': [{'tlsPort': '-1', 'ipAddress': '0', 'type': u'vnc', 'port': '-1'}], 'hash': '4043262262160744328', 'acpiEnable': 'true', 'guestFQDN': '', 'vmId': '3074151e-b38f-461f-bf17-90fb36dd41cb', 'displayIp': '0', 'cpuUsage': '0.00', 'displayPort': '-1', 'session': 'Unknown', 'displaySecurePort': '-1', 'timeOffset': '0', 'memUsage': '0', 'clientIp': u'', 'elapsedTime': '11893', 'kvmEnable': u'true', 'vmType': 'kvm', 'cpuUser': '0.00', 'monitorResponse': '0', 'username': 'Unknown', 'displayType': 'vnc', 'cpuSys': '0.00', 'guestCPUCount': -1, 'appsList': (), 'guestIPs': '', 'statusTime': '4294846400', 'vmName': 'HostedEngine'}]} from=::1,59606, vmId=3074151e-b38f-461f-bf17-90fb36dd41cb (api:52)
2019-01-14 13:49:30,077+0200 INFO (jsonrpc/5) [jsonrpc.JsonRpcServer] RPC call VM.getStats succeeded in 0.00 seconds (__init__:573)
2019-01-14 13:49:30,079+0200 INFO (jsonrpc/1) [jsonrpc.JsonRpcServer] RPC call Host.ping2 succeeded in 0.00 seconds (__init__:573)
2019-01-14 13:49:30,081+0200 INFO (jsonrpc/7) [api.virt] START destroy(gracefulAttempts=1) from=::1,59606, vmId=3074151e-b38f-461f-bf17-90fb36dd41cb (api:46)
2019-01-14 13:49:30,081+0200 INFO (jsonrpc/7) [virt.vm] (vmId='3074151e-b38f-461f-bf17-90fb36dd41cb') Release VM resources (vm:5279)
2019-01-14 13:49:30,081+0200 INFO (jsonrpc/7) [virt.vm] (vmId='3074151e-b38f-461f-bf17-90fb36dd41cb') Stopping connection (guestagent:442)
2019-01-14 13:49:30,081+0200 INFO (jsonrpc/7) [vdsm.api] START teardownImage(sdUUID=u'7d394a51-c3a3-47e1-ac31-697710ef0f70', spUUID=u'00000000-0000-0000-0000-000000000000', imgUUID=u'c8f4766d-5e44-4798-b0e9-000cf97faebe', volUUID=None) from=::1,59606, task_id=80ca6bd0-7f10-4c61-a3b4-665348319091 (api:46)
2019-01-14 13:49:30,081+0200 INFO (jsonrpc/7) [storage.StorageDomain] Removing image rundir link u'/var/run/vdsm/storage/7d394a51-c3a3-47e1-ac31-697710ef0f70/c8f4766d-5e44-4798-b0e9-000cf97faebe' (fileSD:600)
2019-01-14 13:49:30,081+0200 INFO (jsonrpc/7) [vdsm.api] FINISH teardownImage return=None from=::1,59606, task_id=80ca6bd0-7f10-4c61-a3b4-665348319091 (api:52)
2019-01-14 13:49:30,082+0200 INFO (jsonrpc/7) [virt.vm] (vmId='3074151e-b38f-461f-bf17-90fb36dd41cb') Stopping connection (guestagent:442)
2019-01-14 13:49:30,082+0200 INFO (jsonrpc/7) [vdsm.api] START inappropriateDevices(thiefId='3074151e-b38f-461f-bf17-90fb36dd41cb') from=::1,59606, task_id=0a8354b5-d6b4-430c-8a7e-0fac6eb9c713 (api:46)
2019-01-14 13:49:30,084+0200 INFO (jsonrpc/7) [vdsm.api] FINISH inappropriateDevices return=None from=::1,59606, task_id=0a8354b5-d6b4-430c-8a7e-0fac6eb9c713 (api:52)
2019-01-14 13:49:30,225+0200 INFO (jsonrpc/7) [root] /usr/libexec/vdsm/hooks/after_vm_destroy/50_vhostmd: rc=0 err= (hooks:110)
2019-01-14 13:49:30,307+0200 INFO (vm/3074151e) [virt.vm] (vmId='3074151e-b38f-461f-bf17-90fb36dd41cb') Changed state to Down: VM destroyed during the startup (code=10) (vm:1689)
2019-01-14 13:49:30,308+0200 INFO (vm/3074151e) [virt.vm] (vmId='3074151e-b38f-461f-bf17-90fb36dd41cb') Stopping connection (guestagent:442)
2019-01-14 13:49:30,309+0200 INFO (libvirt/events) [virt.vm] (vmId='3074151e-b38f-461f-bf17-90fb36dd41cb') underlying process disconnected (vm:1063)
2019-01-14 13:49:30,318+0200 INFO (jsonrpc/7) [root] /usr/libexec/vdsm/hooks/after_vm_destroy/delete_vhostuserclient_hook: rc=0 err= (hooks:110)
2019-01-14 13:49:30,319+0200 INFO (jsonrpc/7) [virt.vm] (vmId='3074151e-b38f-461f-bf17-90fb36dd41cb') Can't undefine disconnected VM '3074151e-b38f-461f-bf17-90fb36dd41cb' (vm:2388)
2019-01-14 13:49:30,319+0200 INFO (jsonrpc/7) [api.virt] FINISH destroy return={'status': {'message': 'Machine destroyed', 'code': 0}} from=::1,59606, vmId=3074151e-b38f-461f-bf17-90fb36dd41cb (api:52)
2019-01-14 13:49:30,319+0200 INFO (jsonrpc/7) [jsonrpc.JsonRpcServer] RPC call VM.destroy succeeded in 0.23 seconds (__init__:573)

Which looks like someone or something asked vdsm to destroy the domain while it was being migrated to this host (13:49:30,081+0200 ... START destroy), and vdsm did what it was asked to do, which caused the failed migration on the source host. While I can't confirm this from libvirtd.log, there seems to be no libvirt bug involved here: the migration was aborted by a request to destroy the destination domain.
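To make the failure mode concrete, here is a minimal source-side sketch (assuming libvirt-python; the destination URI is a placeholder, and this illustrates the race rather than vdsm's actual code):

import libvirt

# Run on the source host; "HostedEngine" matches the domain name in the logs.
conn = libvirt.open("qemu:///system")
dom = conn.lookupByName("HostedEngine")

flags = libvirt.VIR_MIGRATE_LIVE | libvirt.VIR_MIGRATE_PEER2PEER

try:
    # The same API vdsm calls in _perform_migration() (see the traceback
    # above). The call blocks while libvirt monitors the migration job.
    dom.migrateToURI3("qemu+tls://ocelot01.example.com/system", {}, flags)
except libvirt.libvirtError as err:
    # Destroying the half-created domain on the destination while the job
    # is in flight aborts it, and the failure surfaces here on the source
    # as "operation failed: migration job: unexpectedly failed".
    print("migration failed:", err)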
I'm moving this bug to vdsm for further investigation.
Likely ovirt-hosted-engine-ha, judging from the ::1 address.
broker:

Thread-13::INFO::2019-01-14 13:49:18,495::engine_health::94::engine_health.EngineHealth::(action) VM not on this host
Thread-10::INFO::2019-01-14 13:49:19,075::mgmt_bridge::62::mgmt_bridge.MgmtBridge::(action) Found bridge ovirtmgmt with ports
Thread-9::INFO::2019-01-14 13:49:20,159::ping::60::ping.Ping::(action) Successfully pinged 10.35.30.254
Thread-11::INFO::2019-01-14 13:49:27,769::mem_free::51::mem_free.MemFree::(action) memFree: 119479
Thread-13::INFO::2019-01-14 13:49:28,513::engine_health::203::engine_health.EngineHealth::(_result_from_stats) VM not running on this host, status Migration Destination
Thread-10::INFO::2019-01-14 13:49:29,004::mgmt_bridge::62::mgmt_bridge.MgmtBridge::(action) Found bridge ovirtmgmt with ports
Thread-9::INFO::2019-01-14 13:49:30,318::ping::60::ping.Ping::(action) Successfully pinged 10.35.30.254
Thread-14::INFO::2019-01-14 13:49:30,345::engine_health::203::engine_health.EngineHealth::(_result_from_stats) VM not running on this host, status Down

agent:

MainThread::INFO::2019-01-14 13:49:19,254::states::497::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(consume) The engine is not running, but we do not have enough data to decide which hosts are alive
MainThread::ERROR::2019-01-14 13:49:19,460::config_ovf::84::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config.vm::(_get_vm_conf_content_from_ovf_store) Unable to identify the OVF_STORE volume, falling back to initial vm.conf. Please ensure you already added your first data domain for regular VMs
MainThread::INFO::2019-01-14 13:49:19,529::hosted_engine::491::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_monitoring_loop) Current state EngineDown (score: 3400)

[above lines repeated many times in a 10s interval]

MainThread::INFO::2019-01-14 13:49:29,547::states::510::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(consume) Engine down and local host has best score (3400), attempting to start engine VM
MainThread::INFO::2019-01-14 13:49:29,576::brokerlink::68::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify) Success, was notification of state_transition (EngineDown-EngineStart) sent? sent
MainThread::ERROR::2019-01-14 13:49:29,823::config_ovf::84::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config.vm::(_get_vm_conf_content_from_ovf_store) Unable to identify the OVF_STORE volume, falling back to initial vm.conf. Please ensure you already added your first data domain for regular VMs
MainThread::INFO::2019-01-14 13:49:29,892::hosted_engine::491::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_monitoring_loop) Current state EngineStart (score: 3400)
MainThread::ERROR::2019-01-14 13:49:30,059::config_ovf::84::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config.vm::(_get_vm_conf_content_from_ovf_store) Unable to identify the OVF_STORE volume, falling back to initial vm.conf. Please ensure you already added your first data domain for regular VMs
MainThread::INFO::2019-01-14 13:49:30,072::hosted_engine::967::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_clean_vdsm_state) Ensuring VDSM state is clear for engine VM
MainThread::INFO::2019-01-14 13:49:30,078::hosted_engine::987::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_clean_vdsm_state) Cleaning state for non-running VM
MainThread::INFO::2019-01-14 13:49:31,352::hosted_engine::979::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_clean_vdsm_state) Vdsm state for VM clean

So the agent disregarded the Migration Destination case and just decided to start a new HostedEngine, which requires killing any lingering VM first. Martin, is that expected?
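For illustration, a hypothetical simplification of that decision (this is not the actual ovirt-hosted-engine-ha state-machine code; the function and parameter names are made up):

def decide_next_state(local_vm_status, engine_up_elsewhere, has_best_score):
    # The agent only distinguishes "engine running here" from "not running
    # here". An incoming migration reports the status "Migration
    # Destination", which falls into the second bucket.
    if engine_up_elsewhere:
        return "EngineDown"
    if has_best_score:
        # So the agent transitions to EngineStart, whose _clean_vdsm_state()
        # step destroys any lingering local VM, including the one that is
        # currently migrating in.
        return "EngineStart"
    return "EngineDown"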
Well, if there was an incoming migration, then either the source host was not reporting the status properly, or the local agent was not yet sure the source host was actually up (it takes at least two updates after an agent restart, roughly 20 seconds). This might be a corner case in the state machine.
Would it make sense to defer the decision until there is no longer a Migration Destination state? The migration works pretty reliably on its own, so I guess the agent could wait a bit longer.
That is one option, yes.
Simone, I believe the right place to fix this is for the agent to consider the Migration Destination state as well.
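Roughly along these lines (a sketch reusing the illustrative names from the earlier comment, not the actual patch):

def decide_next_state(local_vm_status, engine_up_elsewhere, has_best_score):
    if local_vm_status == "Migration Destination":
        # Defer: the engine VM is arriving via live migration, and
        # destroying it now is exactly what aborted the migration on the
        # source in this bug. Stay in EngineDown and re-evaluate on the
        # next monitoring cycle.
        return "EngineDown"
    if engine_up_elsewhere:
        return "EngineDown"
    if has_best_score:
        return "EngineStart"
    return "EngineDown"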
Re-targeting to 4.3.1 since this BZ has not been proposed as a blocker for 4.3.0. If you think this bug should block 4.3.0, please re-target and set the blocker flag.
Moving to 4.3.2, as this was not identified as a blocker for 4.3.1.
*** Bug 1629454 has been marked as a duplicate of this bug. ***
*** Bug 1598509 has been marked as a duplicate of this bug. ***
Works just fine on these components:

ovirt-engine-setup-4.3.5.4-0.1.el7.noarch
ovirt-hosted-engine-ha-2.3.3-1.el7ev.noarch
ovirt-hosted-engine-setup-2.3.11-1.el7ev.noarch
Linux 3.10.0-1060.el7.x86_64 #1 SMP Mon Jul 1 18:28:13 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux Server release 7.7 Beta (Maipo)

Moving to verified. If this is still reproducible, please reopen.
This bugzilla is included in the oVirt 4.3.5 release, published on July 30th 2019. Since the problem described in this bug report should be resolved in the oVirt 4.3.5 release, it has been closed with a resolution of CURRENT RELEASE. If the solution does not work for you, please open a new bug report.