Bug 1665934
| Summary: | Host stuck in preparing to maintenance - operation failed: migration job: unexpectedly failed | ||||||
|---|---|---|---|---|---|---|---|
| Product: | [oVirt] ovirt-hosted-engine-ha | Reporter: | Liran Rotenberg <lrotenbe> | ||||
| Component: | Agent | Assignee: | Simone Tiraboschi <stirabos> | ||||
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Nikolai Sednev <nsednev> | ||||
| Severity: | medium | Docs Contact: | |||||
| Priority: | high | ||||||
| Version: | --- | CC: | adevolder, bugs, ebenahar, jdenemar, lsurette, michal.skrivanek, msivak, rbarry, srevivo, ycui | ||||
| Target Milestone: | ovirt-4.3.5 | Keywords: | Triaged | ||||
| Target Release: | --- | Flags: | sbonazzo:
ovirt-4.3?
sbonazzo: planning_ack? sbonazzo: devel_ack+ sbonazzo: testing_ack? |
||||
| Hardware: | Unspecified | ||||||
| OS: | Unspecified | ||||||
| Whiteboard: | |||||||
| Fixed In Version: | ovirt-hosted-engine-setup-2.3.11 | Doc Type: | Bug Fix | ||||
| Doc Text: |
In the past when the host running the engine VM was set into maintenance mode from the engine, the engine VM was going to be migrated by ovirt-ha-agent as an indirect action caused by the maintenance mode.
This process wasn't fully controlled by the engine and it could potentially lead to side effects on specific cases.
|
Story Points: | --- | ||||
| Clone Of: | Environment: | ||||||
| Last Closed: | 2019-07-30 14:08:16 UTC | Type: | Bug | ||||
| Regression: | --- | Mount Type: | --- | ||||
| Documentation: | --- | CRM: | |||||
| Verified Versions: | Category: | --- | |||||
| oVirt Team: | Integration | RHEL 7.3 requirements from Atomic Host: | |||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||
| Embargoed: | |||||||
| Attachments: |
|
||||||
|
Description
Liran Rotenberg
2019-01-14 13:49:08 UTC
Created attachment 1520555 [details]
libvirt logs
Unfortunately, the important libvirt logs files do not seem to be produced at
the time you saw the error. There's no "unexpectedly failed" message in them
and while the message you quoted from /var/log/messages was reported at
2019-01-14 11:49:30.608+0000, the log files start at 2019-01-14
13:11:??.???+0000, which is almost 1.5 hours later.
Anyway, this kind of error is usually reported if the destination QEMU dies
during migration, so /var/log/libvirt/qemu/$VM.log from the destination host
may contain some hints.
One of the hosts gives us
2019-01-14 11:24:07.405+0000: starting up libvirt version: 3.9.0, package: 14.el7_5.8 (Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla>, 2018-09-04-07:27:24, x86-019.build.eng.bos.redhat.com), qemu version: 2.10.0(qemu-kvm-rhev-2.10.0-21.el7_5.7), hostname: ocelot02.qa.lab.tlv.redhat.com
LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin QEMU_AUDIO_DRV=none /usr/libexec/qemu-kvm -name guest=HostedEngine,debug-threads=on -S -object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-1-HostedEngine/master-key.aes -machine pc-i440fx-rhel7.3.0,accel=kvm,usb=off,dump-guest-core=off -cpu Haswell-IBRS,vme=on,f16c=on,rdrand=on,hypervisor=on,arat=on,xsaveopt=on,abm=on -m 16384 -realtime mlock=off -smp 4,maxcpus=16,sockets=16,cores=1,threads=1 -uuid 3074151e-b38f-461f-bf17-90fb36dd41cb -smbios 'type=1,manufacturer=Red Hat,product=RHEV Hypervisor,version=7.5-8.el7,serial=00000000-0000-0000-0000-AC1F6B57AE82,uuid=3074151e-b38f-461f-bf17-90fb36dd41cb' -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-1-HostedEngine/monitor.sock,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=2019-01-14T11:24:07,driftfix=slew -global kvm-pit.lost_tick_policy=delay -no-hpet -no-reboot -boot strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x4 -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x5 -drive if=none,id=drive-ide0-1-0,readonly=on -device ide-cd,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 -drive file=/var/run/vdsm/storage/7d394a51-c3a3-47e1-ac31-697710ef0f70/c8f4766d-5e44-4798-b0e9-000cf97faebe/0ddd524f-4616-4877-88bb-c5a65cadfb4f,format=raw,if=none,id=drive-virtio-disk0,serial=c8f4766d-5e44-4798-b0e9-000cf97faebe,cache=none,werror=stop,rerror=stop,aio=threads -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x6,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -netdev tap,fd=31,id=hostnet0,vhost=on,vhostfd=33 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=00:1a:4a:16:10:64,bus=pci.0,addr=0x3 -chardev socket,id=charchannel0,path=/var/lib/libvirt/qemu/channels/3074151e-b38f-461f-bf17-90fb36dd41cb.com.redhat.rhevm.vdsm,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=com.redhat.rhevm.vdsm -chardev socket,id=charchannel1,path=/var/lib/libvirt/qemu/channels/3074151e-b38f-461f-bf17-90fb36dd41cb.org.qemu.guest_agent.0,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=2,chardev=charchannel1,id=channel1,name=org.qemu.guest_agent.0 -chardev socket,id=charchannel2,path=/var/lib/libvirt/qemu/channels/3074151e-b38f-461f-bf17-90fb36dd41cb.org.ovirt.hosted-engine-setup.0,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=3,chardev=charchannel2,id=channel2,name=org.ovirt.hosted-engine-setup.0 -chardev pty,id=charconsole0 -device virtconsole,chardev=charconsole0,id=console0 -vnc 0:0,password -device VGA,id=video0,vgamem_mb=32,bus=pci.0,addr=0x2 -incoming defer -object rng-random,id=objrng0,filename=/dev/random -device virtio-rng-pci,rng=objrng0,id=rng0,bus=pci.0,addr=0x7 -msg timestamp=on
2019-01-14T11:24:07.545834Z qemu-kvm: -chardev pty,id=charconsole0: char device redirected to /dev/pts/1 (label charconsole0)
2019-01-14T11:24:07.548146Z qemu-kvm: -drive file=/var/run/vdsm/storage/7d394a51-c3a3-47e1-ac31-697710ef0f70/c8f4766d-5e44-4798-b0e9-000cf97faebe/0ddd524f-4616-4877-88bb-c5a65cadfb4f,format=raw,if=none,id=drive-virtio-disk0,serial=c8f4766d-5e44-4798-b0e9-000cf97faebe,cache=none,werror=stop,rerror=stop,aio=threads: 'serial' is deprecated, please use the corresponding option of '-device' instead
2019-01-14 11:49:26.525+0000: initiating migration
We can see the QEMU process was started at 11:24:07.405 and outgoing migration
started at 11:49:26.525+0000. So this was a source host.
Looking at /var/log/messages on the destination, I can see the error from this
bug description
Jan 14 13:49:30 ocelot02 libvirtd: 2019-01-14 11:49:30.608+0000: 101924: error : qemuMigrationCheckJobStatus:1457 : operation failed: migration job: unexpectedly failed
vdsm log file contains
2019-01-14 13:49:30,621+0200 ERROR (migsrc/3074151e) [virt.vm] (vmId='3074151e-b38f-461f-bf17-90fb36dd41cb') Domain not found: no domain with matching name 'HostedEngine' (migration:287)
2019-01-14 13:49:30,627+0200 ERROR (migsrc/3074151e) [virt.vm] (vmId='3074151e-b38f-461f-bf17-90fb36dd41cb') Failed to migrate (migration:429)
Traceback (most recent call last):
File "/usr/lib/python2.7/site-packages/vdsm/virt/migration.py", line 411, in run
self._startUnderlyingMigration(time.time())
File "/usr/lib/python2.7/site-packages/vdsm/virt/migration.py", line 489, in _startUnderlyingMigration
self._perform_with_downtime_thread(duri, muri)
File "/usr/lib/python2.7/site-packages/vdsm/virt/migration.py", line 556, in _perform_with_downtime_thread
self._perform_migration(duri, muri)
File "/usr/lib/python2.7/site-packages/vdsm/virt/migration.py", line 529, in _perform_migration
self._vm._dom.migrateToURI3(duri, params, flags)
File "/usr/lib/python2.7/site-packages/vdsm/virt/virdomain.py", line 69, in f
ret = attr(*args, **kwargs)
File "/usr/lib/python2.7/site-packages/vdsm/libvirtconnection.py", line 123, in wrapper
ret = f(*args, **kwargs)
File "/usr/lib/python2.7/site-packages/vdsm/utils.py", line 1006, in wrapper
return func(inst, *args, **kwargs)
File "/usr/lib64/python2.7/site-packages/libvirt.py", line 1746, in migrateToURI3
if ret == -1: raise libvirtError ('virDomainMigrateToURI3() failed', dom=self)
libvirtError: Domain not found: no domain with matching name 'HostedEngine'
On the destination host the qemu log contains:
2019-01-14 11:49:26.077+0000: starting up libvirt version: 4.5.0, package: 10.el7_6.3 (Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla>, 2018-11-08-09:31:37, x86-020.build.eng.bos.redhat.com), qemu version: 2.12.0qemu-kvm-rhev-2.12.0-18.el7_6.1, kernel: 3.10.0-957.5.1.el7.x86_64, hostname: ocelot01.qa.lab.tlv.redhat.com
LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin QEMU_AUDIO_DRV=none /usr/libexec/qemu-kvm -name guest=HostedEngine,debug-threads=on -S -object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-1-HostedEngine/master-key.aes -machine pc-i440fx-rhel7.3.0,accel=kvm,usb=off,dump-guest-core=off -cpu Haswell-IBRS,vme=on,f16c=on,rdrand=on,hypervisor=on,arat=on,xsaveopt=on,abm=on -m 16384 -realtime mlock=off -smp 4,maxcpus=16,sockets=16,cores=1,threads=1 -uuid 3074151e-b38f-461f-bf17-90fb36dd41cb -smbios 'type=1,manufacturer=Red Hat,product=RHEV Hypervisor,version=7.5-8.el7,serial=00000000-0000-0000-0000-AC1F6B57AE82,uuid=3074151e-b38f-461f-bf17-90fb36dd41cb' -no-user-config -nodefaults -chardev socket,id=charmonitor,fd=31,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=2019-01-14T11:49:26,driftfix=slew -global kvm-pit.lost_tick_policy=delay -no-hpet -no-reboot -boot strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x4 -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x5 -drive if=none,id=drive-ide0-1-0,readonly=on -device ide-cd,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 -drive file=/var/run/vdsm/storage/7d394a51-c3a3-47e1-ac31-697710ef0f70/c8f4766d-5e44-4798-b0e9-000cf97faebe/0ddd524f-4616-4877-88bb-c5a65cadfb4f,format=raw,if=none,id=drive-virtio-disk0,serial=c8f4766d-5e44-4798-b0e9-000cf97faebe,werror=stop,rerror=stop,cache=none,aio=threads -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x6,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1,write-cache=on -netdev tap,fd=33,id=hostnet0,vhost=on,vhostfd=34 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=00:1a:4a:16:10:64,bus=pci.0,addr=0x3 -chardev socket,id=charchannel0,fd=35,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=com.redhat.rhevm.vdsm -chardev socket,id=charchannel1,fd=36,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=2,chardev=charchannel1,id=channel1,name=org.qemu.guest_agent.0 -chardev socket,id=charchannel2,fd=37,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=3,chardev=charchannel2,id=channel2,name=org.ovirt.hosted-engine-setup.0 -chardev pty,id=charconsole0 -device virtconsole,chardev=charconsole0,id=console0 -vnc 0:0,password -device VGA,id=video0,vgamem_mb=32,bus=pci.0,addr=0x2 -incoming defer -object rng-random,id=objrng0,filename=/dev/random -device virtio-rng-pci,rng=objrng0,id=rng0,bus=pci.0,addr=0x7 -sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny -msg timestamp=on
2019-01-14T11:49:26.205346Z qemu-kvm: -chardev pty,id=charconsole0: char device redirected to /dev/pts/0 (label charconsole0)
2019-01-14T11:49:26.207845Z qemu-kvm: -drive file=/var/run/vdsm/storage/7d394a51-c3a3-47e1-ac31-697710ef0f70/c8f4766d-5e44-4798-b0e9-000cf97faebe/0ddd524f-4616-4877-88bb-c5a65cadfb4f,format=raw,if=none,id=drive-virtio-disk0,serial=c8f4766d-5e44-4798-b0e9-000cf97faebe,werror=stop,rerror=stop,cache=none,aio=threads: 'serial' is deprecated, please use the corresponding option of '-device' instead
2019-01-14T11:49:30.085678Z qemu-kvm: terminating on signal 15 from pid 16219 (<unknown process>)
2019-01-14 11:49:30.285+0000: shutting down, reason=destroyed
/var/message/log does not provide a lot of details either:
Jan 14 13:49:26 ocelot01 systemd-machined: New machine qemu-1-HostedEngine.
Jan 14 13:49:26 ocelot01 systemd: Started Virtual Machine qemu-1-HostedEngine.
Jan 14 13:49:26 ocelot01 kvm: 1 guest now active
Jan 14 13:49:30 ocelot01 kvm: 0 guests now active
Jan 14 13:49:30 ocelot01 systemd-machined: Machine qemu-1-HostedEngine terminated.
No libvirt errors shown here. But vsdm.log is interesting:
2019-01-14 13:49:30,076+0200 INFO (jsonrpc/5) [api.virt] START getStats() from=::1,59606, vmId=3074151e-b38f-461f-bf17-90fb36dd41cb (api:46)
2019-01-14 13:49:30,077+0200 INFO (jsonrpc/5) [api.virt] FINISH getStats return={'status': {'message': 'Done', 'code': 0}, 'statsList': [{'status': 'Migration Destination', 'displayInfo': [{'tlsPort': '-1', 'ipAddress': '0', 'type': u'vnc', 'port': '-1'}], 'hash': '4043262262160744328', 'acpiEnable': 'true', 'guestFQDN': '', 'vmId': '3074151e-b38f-461f-bf17-90fb36dd41cb', 'displayIp': '0', 'cpuUsage': '0.00', 'displayPort': '-1', 'session': 'Unknown', 'displaySecurePort': '-1', 'timeOffset': '0', 'memUsage': '0', 'clientIp': u'', 'elapsedTime': '11893', 'kvmEnable': u'true', 'vmType': 'kvm', 'cpuUser': '0.00', 'monitorResponse': '0', 'username': 'Unknown', 'displayType': 'vnc', 'cpuSys': '0.00', 'guestCPUCount': -1, 'appsList': (), 'guestIPs': '', 'statusTime': '4294846400', 'vmName': 'HostedEngine'}]} from=::1,59606, vmId=3074151e-b38f-461f-bf17-90fb36dd41cb (api:52)
2019-01-14 13:49:30,077+0200 INFO (jsonrpc/5) [jsonrpc.JsonRpcServer] RPC call VM.getStats succeeded in 0.00 seconds (__init__:573)
2019-01-14 13:49:30,079+0200 INFO (jsonrpc/1) [jsonrpc.JsonRpcServer] RPC call Host.ping2 succeeded in 0.00 seconds (__init__:573)
2019-01-14 13:49:30,081+0200 INFO (jsonrpc/7) [api.virt] START destroy(gracefulAttempts=1) from=::1,59606, vmId=3074151e-b38f-461f-bf17-90fb36dd41cb (api:46)
2019-01-14 13:49:30,081+0200 INFO (jsonrpc/7) [virt.vm] (vmId='3074151e-b38f-461f-bf17-90fb36dd41cb') Release VM resources (vm:5279)
2019-01-14 13:49:30,081+0200 INFO (jsonrpc/7) [virt.vm] (vmId='3074151e-b38f-461f-bf17-90fb36dd41cb') Stopping connection (guestagent:442)
2019-01-14 13:49:30,081+0200 INFO (jsonrpc/7) [vdsm.api] START teardownImage(sdUUID=u'7d394a51-c3a3-47e1-ac31-697710ef0f70', spUUID=u'00000000-0000-0000-0000-000000000000', imgUUID=u'c8f4766d-5e44-4798-b0e9-000cf97faebe', volUUID=None) from=::1,59606, task_id=80ca6bd0-7f10-4c61-a3b4-665348319091 (api:46)
2019-01-14 13:49:30,081+0200 INFO (jsonrpc/7) [storage.StorageDomain] Removing image rundir link u'/var/run/vdsm/storage/7d394a51-c3a3-47e1-ac31-697710ef0f70/c8f4766d-5e44-4798-b0e9-000cf97faebe' (fileSD:600)
2019-01-14 13:49:30,081+0200 INFO (jsonrpc/7) [vdsm.api] FINISH teardownImage return=None from=::1,59606, task_id=80ca6bd0-7f10-4c61-a3b4-665348319091 (api:52)
2019-01-14 13:49:30,082+0200 INFO (jsonrpc/7) [virt.vm] (vmId='3074151e-b38f-461f-bf17-90fb36dd41cb') Stopping connection (guestagent:442)
2019-01-14 13:49:30,082+0200 INFO (jsonrpc/7) [vdsm.api] START inappropriateDevices(thiefId='3074151e-b38f-461f-bf17-90fb36dd41cb') from=::1,59606, task_id=0a8354b5-d6b4-430c-8a7e-0fac6eb9c713 (api:46)
2019-01-14 13:49:30,084+0200 INFO (jsonrpc/7) [vdsm.api] FINISH inappropriateDevices return=None from=::1,59606, task_id=0a8354b5-d6b4-430c-8a7e-0fac6eb9c713 (api:52)
2019-01-14 13:49:30,225+0200 INFO (jsonrpc/7) [root] /usr/libexec/vdsm/hooks/after_vm_destroy/50_vhostmd: rc=0 err= (hooks:110)
2019-01-14 13:49:30,307+0200 INFO (vm/3074151e) [virt.vm] (vmId='3074151e-b38f-461f-bf17-90fb36dd41cb') Changed state to Down: VM destroyed during the startup (code=10) (vm:1689)
2019-01-14 13:49:30,308+0200 INFO (vm/3074151e) [virt.vm] (vmId='3074151e-b38f-461f-bf17-90fb36dd41cb') Stopping connection (guestagent:442)
2019-01-14 13:49:30,309+0200 INFO (libvirt/events) [virt.vm] (vmId='3074151e-b38f-461f-bf17-90fb36dd41cb') underlying process disconnected (vm:1063)
2019-01-14 13:49:30,318+0200 INFO (jsonrpc/7) [root] /usr/libexec/vdsm/hooks/after_vm_destroy/delete_vhostuserclient_hook: rc=0 err= (hooks:110)
2019-01-14 13:49:30,319+0200 INFO (jsonrpc/7) [virt.vm] (vmId='3074151e-b38f-461f-bf17-90fb36dd41cb') Can't undefine disconnected VM '3074151e-b38f-461f-bf17-90fb36dd41cb' (vm:2388)
2019-01-14 13:49:30,319+0200 INFO (jsonrpc/7) [api.virt] FINISH destroy return={'status': {'message': 'Machine destroyed', 'code': 0}} from=::1,59606, vmId=3074151e-b38f-461f-bf17-90fb36dd41cb (api:52)
2019-01-14 13:49:30,319+0200 INFO (jsonrpc/7) [jsonrpc.JsonRpcServer] RPC call VM.destroy succeeded in 0.23 seconds (__init__:573)
Which looks like someone or something asked vdsm to destroy the domain while
it was being migrated to this host (13:49:30,081+0200 ... START destroy) and
vdsm did what it was asked to do, which caused a failed migration on the
source host.
While I can't confirm this from libvirtd.log, there seems to be no libvirt bug
involved here. The migration was aborted by a request to destroy the
destination domain.
I'm moving this bug to vdsm for further investigation. likely hosted-engine-broker-ha judging from the ::1 address. broker: Thread-13::INFO::2019-01-14 13:49:18,495::engine_health::94::engine_health.EngineHealth::(action) VM not on this host Thread-10::INFO::2019-01-14 13:49:19,075::mgmt_bridge::62::mgmt_bridge.MgmtBridge::(action) Found bridge ovirtmgmt with ports Thread-9::INFO::2019-01-14 13:49:20,159::ping::60::ping.Ping::(action) Successfully pinged 10.35.30.254 Thread-11::INFO::2019-01-14 13:49:27,769::mem_free::51::mem_free.MemFree::(action) memFree: 119479 Thread-13::INFO::2019-01-14 13:49:28,513::engine_health::203::engine_health.EngineHealth::(_result_from_stats) VM not running on this host, status Migration Destination Thread-10::INFO::2019-01-14 13:49:29,004::mgmt_bridge::62::mgmt_bridge.MgmtBridge::(action) Found bridge ovirtmgmt with ports Thread-9::INFO::2019-01-14 13:49:30,318::ping::60::ping.Ping::(action) Successfully pinged 10.35.30.254 Thread-14::INFO::2019-01-14 13:49:30,345::engine_health::203::engine_health.EngineHealth::(_result_from_stats) VM not running on this host, status Down agent: MainThread::INFO::2019-01-14 13:49:19,254::states::497::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(consume) The engine is not running, but we do not have enough data to decide which hosts are alive MainThread::ERROR::2019-01-14 13:49:19,460::config_ovf::84::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config.vm::(_get_vm_conf_content_from_ovf_store) Unable to identify the OVF_STORE volume, falling back to initial vm.conf. Please ensure you already added your first data domain for regular VMs MainThread::INFO::2019-01-14 13:49:19,529::hosted_engine::491::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_monitoring_loop) Current state EngineDown (score: 3400) [above lines repeated many times in a 10s interval] MainThread::INFO::2019-01-14 13:49:29,547::states::510::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(consume) Engine down and local host has best score (3400), attempting to start engine VM MainThread::INFO::2019-01-14 13:49:29,576::brokerlink::68::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify) Success, was notification of state_transition (EngineDown-EngineStart) sent? sent MainThread::ERROR::2019-01-14 13:49:29,823::config_ovf::84::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config.vm::(_get_vm_conf_content_from_ovf_store) Unable to identify the OVF_STORE volume, falling back to initial vm.conf. Please ensure you already added your first data domain for regular VMs MainThread::INFO::2019-01-14 13:49:29,892::hosted_engine::491::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_monitoring_loop) Current state EngineStart (score: 3400) MainThread::ERROR::2019-01-14 13:49:30,059::config_ovf::84::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config.vm::(_get_vm_conf_content_from_ovf_store) Unable to identify the OVF_STORE volume, falling back to initial vm.conf. Please ensure you already added your first data domain for regular VMs MainThread::INFO::2019-01-14 13:49:30,072::hosted_engine::967::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_clean_vdsm_state) Ensuring VDSM state is clear for engine VM MainThread::INFO::2019-01-14 13:49:30,078::hosted_engine::987::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_clean_vdsm_state) Cleaning state for non-running VM MainThread::INFO::2019-01-14 13:49:31,352::hosted_engine::979::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_clean_vdsm_state) Vdsm state for VM clean so the agent disregarded the MigrationDestination case and just decided to start a new HostedEngine - that requires to kill any lingering VM, Martine, is that expected? Well, if there was an incoming migration then the source host was not reporting the status properly or the local agent was not sure the source host was actually up (it takes at least two updates after agent restart - ~20 seconds). This might be a corner case in the state machine. would it make sense to defer the decision until there is no Migration Destination state? The migration works pretty reliably on its own so I guess agent could wait a bit longer. That is one option yes. Simone, I believe the right place to fix this is fir the agent to consider Migration Destination state as well re-targeting to 4.3.1 since this BZ has not been proposed as blocker for 4.3.0. If you think this bug should block 4.3.0 please re-target and set blocker flag. Moving to 4.3.2 not being identified as blocker for 4.3.1. *** Bug 1629454 has been marked as a duplicate of this bug. *** *** Bug 1598509 has been marked as a duplicate of this bug. *** Works just fine on these components: ovirt-engine-setup-4.3.5.4-0.1.el7.noarch ovirt-hosted-engine-ha-2.3.3-1.el7ev.noarch ovirt-hosted-engine-setup-2.3.11-1.el7ev.noarch Linux 3.10.0-1060.el7.x86_64 #1 SMP Mon Jul 1 18:28:13 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux Red Hat Enterprise Linux Server release 7.7 Beta (Maipo) Moving to verified, in case of still being reproduced, please reopen. This bugzilla is included in oVirt 4.3.5 release, published on July 30th 2019. Since the problem described in this bug report should be resolved in oVirt 4.3.5 release, it has been closed with a resolution of CURRENT RELEASE. If the solution does not work for you, please open a new bug report. |