Created attachment 1788306 [details]
logs

Description of problem:
We saw this failure several times, but it is not consistently reproducible.

Additional info:
The migration failure happened during an automation run. After this failure, in the following tests, the VM did not get an IP address on start. When we reconfigured libvirt to DEBUG mode and restarted the libvirtd service, the failure was not reproduced again. I am attaching the full logs from the source and destination hosts that we have from the automation run.

Version-Release number of selected component (if applicable):
ovirt-engine-4.4.6.8-0.1.el8ev.noarch

How reproducible:
1. The migration was triggered through the REST API:
url: /ovirt-engine/api/vms/28a96c1c-a1af-4167-b4ca-eca872fd7cad/migrate
body:
<action>
    <async>false</async>
    <grace_period>
        <expiry>10</expiry>
    </grace_period>
</action>

Actual results:
In engine.log, the relevant entries are:

2021-05-20 00:14:20,747+03 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-5) [vms_syncAction_d26e977d-ecd8-44c1] EVENT_ID: VM_MIGRATION_START(62), Migration started (VM: golden_env_mixed_virtio_1_0, Source: host_mixed_1, Destination: host_mixed_2, User: admin@internal-authz).
. .
2021-05-20 00:15:36,770+03 ERROR [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (ForkJoinPool-1-worker-17) [4da5b903] Migration of VM 'golden_env_mixed_virtio_1_0' to host 'host_mixed_2' failed: VM destroyed during the startup.

In vdsm.log:

2021-05-20 00:15:36,694+0300 ERROR (migsrc/28a96c1c) [virt.vm] (vmId='28a96c1c-a1af-4167-b4ca-eca872fd7cad') Failed to migrate (migration:467)
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/vdsm/virt/virdomain.py", line 101, in f
    ret = attr(*args, **kwargs)
  File "/usr/lib/python3.6/site-packages/vdsm/common/libvirtconnection.py", line 131, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib/python3.6/site-packages/vdsm/common/function.py", line 94, in wrapper
    return func(inst, *args, **kwargs)
  File "/usr/lib64/python3.6/site-packages/libvirt.py", line 2119, in migrateToURI3
    raise libvirtError('virDomainMigrateToURI3() failed')
libvirt.libvirtError: Timed out during operation: cannot acquire state change lock (held by monitor=remoteDispatchDomainMigratePerform3Params)

Expected results:
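
For anyone trying to trigger the same migration outside the automation, here is a minimal sketch of how the request from the report could be built. The API path and XML body come from the description above. The engine host name, the helper names and the Content-Type header are assumptions for illustration, and authentication is left out entirely.

```python
import urllib.request
import xml.etree.ElementTree as ET

# Path as given in the report; {vm_id} is filled in per VM.
API_PATH = "/ovirt-engine/api/vms/{vm_id}/migrate"


def build_migrate_body(expiry=10, run_async=False):
    """Build the <action> body used in the report (hypothetical helper)."""
    action = ET.Element("action")
    ET.SubElement(action, "async").text = "true" if run_async else "false"
    grace_period = ET.SubElement(action, "grace_period")
    ET.SubElement(grace_period, "expiry").text = str(expiry)
    # encoding="unicode" returns a str without an XML declaration.
    return ET.tostring(action, encoding="unicode")


def build_migrate_request(engine_url, vm_id, expiry=10):
    """Prepare (but do not send) the POST request (hypothetical helper)."""
    url = engine_url.rstrip("/") + API_PATH.format(vm_id=vm_id)
    return urllib.request.Request(
        url,
        data=build_migrate_body(expiry).encode("utf-8"),
        headers={"Content-Type": "application/xml"},  # assumed header
        method="POST",
    )


if __name__ == "__main__":
    # engine.example.com is a placeholder, not a host from the report.
    req = build_migrate_request(
        "https://engine.example.com",
        "28a96c1c-a1af-4167-b4ca-eca872fd7cad",
    )
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```

The request is only constructed here, not sent. Sending it would additionally need credentials and TLS settings matching the environment.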
I experienced a similar problem on my x86_64 laptop.
Reported a libvirt bug: BZ 1967715. It describes a different, reproducible situation where this problem occurs. I'm not sure it's the same bug, but the symptoms look the same, and it makes it possible to start with something reproducible.
Created attachment 1792488 [details] logs for 4.4.7.4
The libvirt fix is available in libvirt 7.0.0-14.3, while Vdsm depends on 7.0.0-14. Shouldn't we adjust the Vdsm dependency?
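
To illustrate the gap between the two builds, here is a rough sketch that compares the version-release strings quoted above. It is a simplification and not the real RPM comparison algorithm: it assumes both version and release are purely numeric and dot-separated, and the function names are made up for this example.

```python
def _to_ints(part):
    """Turn '7.0.0' into (7, 0, 0); an empty string becomes ()."""
    return tuple(int(piece) for piece in part.split(".")) if part else ()


def parse_version_release(text):
    """Split 'version-release' into two tuples of ints (numeric parts only)."""
    version, _, release = text.partition("-")
    return _to_ints(version), _to_ints(release)


def satisfies(installed, required):
    """True if the installed build is at least the required build."""
    return parse_version_release(installed) >= parse_version_release(required)


if __name__ == "__main__":
    # Build currently required by Vdsm vs. the build carrying the fix.
    print(satisfies("7.0.0-14", "7.0.0-14.3"))
    print(satisfies("7.0.0-14.3", "7.0.0-14.3"))
```

Under this comparison, a host that only meets the current 7.0.0-14 requirement would not be guaranteed to have the 7.0.0-14.3 fix.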
Verified on ovirt-engine-4.4.8.4-0.7.el8ev.noarch: no remoteDispatchDomainMigratePerform3Params migration failures in PPC automation runs.
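
A check like this can be scripted against the collected vdsm.log files. The sketch below is only an illustration: the error text is the one from the traceback in the description, while the function name and the log path in the usage part are assumptions.

```python
import re

# Matches the libvirt error from the original report and captures the
# name of the job that was holding the state change lock.
LOCK_TIMEOUT_RE = re.compile(
    r"cannot acquire state change lock \(held by monitor=(\w+)\)"
)


def find_lock_holders(lines):
    """Return the lock holder named in each matching log line."""
    holders = []
    for line in lines:
        match = LOCK_TIMEOUT_RE.search(line)
        if match:
            holders.append(match.group(1))
    return holders


if __name__ == "__main__":
    # The path is a guess at a typical location; adjust to where the
    # automation logs were actually collected.
    with open("/var/log/vdsm/vdsm.log", errors="replace") as log:
        holders = find_lock_holders(log)
    print(holders.count("remoteDispatchDomainMigratePerform3Params"))
```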