
Bug 1966121

Summary: libvirtError while migration: cannot acquire state change lock (held by monitor=remoteDispatchDomainMigratePerform3Params)
Product: [oVirt] vdsm
Reporter: Polina <pagranat>
Component: General
Assignee: Milan Zamazal <mzamazal>
Status: CLOSED CURRENTRELEASE
QA Contact: Polina <pagranat>
Severity: high
Docs Contact:
Priority: unspecified
Version: 4.40.60.7
CC: ahadas, bugs, mburman, mzamazal
Target Milestone: ovirt-4.4.8
Keywords: Automation
Target Release: 4.40.80.2
Flags: pm-rhel: ovirt-4.4+
Hardware: ppc64le
OS: Linux
Whiteboard:
Fixed In Version: vdsm-4.40.80.2
Doc Type: Bug Fix
Doc Text:
If a VM was destroyed during migration, libvirt could report errors about acquiring the state change lock and prevent the VM from starting on the same host again. This has been fixed, and VMs powered down during migration should no longer cause this problem.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2021-09-03 10:08:02 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Virt
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1967715, 1983694
Bug Blocks: 1959436
Attachments:
Description          Flags
logs                 none
logs for 4.4.7.4     none

Description Polina 2021-05-31 12:25:42 UTC
Created attachment 1788306 [details]
logs

Description of problem: We saw the failure several times, but it is not consistently reproducible.
Additional info: The migration failure happened during an automation run. After this failure, in the subsequent tests, the VM did not get an IP address on start.
After we reconfigured libvirt to DEBUG log level and restarted the libvirtd service, the failure did not reproduce again.
I am attaching the full logs from the source and destination hosts, collected after the automation run.


Version-Release number of selected component (if applicable):
ovirt-engine-4.4.6.8-0.1.el8ev.noarch

How reproducible:

1. The migration was triggered through the REST API:
url: /ovirt-engine/api/vms/28a96c1c-a1af-4167-b4ca-eca872fd7cad/migrate
body:
<action>
    <async>false</async>
    <grace_period>
        <expiry>10</expiry>
    </grace_period>
</action>
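
For reference, the request above can be sketched in Python. This is a minimal sketch, not the automation code that was actually used: the engine address `engine.example.com` and the helper name `build_migrate_request` are hypothetical, authentication is left out, and the request is only built, not sent. It assumes the migrate action is invoked with POST and an XML body, as is usual for oVirt REST API actions.

```python
import urllib.request
import xml.etree.ElementTree as ET

# Hypothetical engine address (assumption); replace with the real engine FQDN.
ENGINE_URL = "https://engine.example.com"
# VM id taken from the URL in this report.
VM_ID = "28a96c1c-a1af-4167-b4ca-eca872fd7cad"


def build_migrate_request(engine_url, vm_id, expiry=10):
    """Build (but do not send) the POST request for the VM migrate action."""
    # Recreate the <action> body shown in the report.
    action = ET.Element("action")
    ET.SubElement(action, "async").text = "false"
    grace_period = ET.SubElement(action, "grace_period")
    ET.SubElement(grace_period, "expiry").text = str(expiry)
    body = ET.tostring(action)  # bytes, no XML declaration

    return urllib.request.Request(
        url=f"{engine_url}/ovirt-engine/api/vms/{vm_id}/migrate",
        data=body,
        method="POST",
        headers={"Content-Type": "application/xml", "Accept": "application/xml"},
    )


request = build_migrate_request(ENGINE_URL, VM_ID)
print(request.get_method(), request.full_url)
print(request.data.decode())
```

Sending the request would additionally need credentials and TLS settings appropriate for the engine, which are not part of this report.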


Actual results:

In engine.log, the relevant timestamps are:
2021-05-20 00:14:20,747+03 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-5) [vms_syncAction_d26e977d-ecd8-44c1] EVENT_ID: VM_MIGRATION_START(62), Migration started (VM: golden_env_mixed_virtio_1_0, Source: host_mixed_1, Destination: host_mixed_2, User: admin@internal-authz).
.
.
2021-05-20 00:15:36,770+03 ERROR [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (ForkJoinPool-1-worker-17) [4da5b903] Migration of VM 'golden_env_mixed_virtio_1_0' to host 'host_mixed_2' failed: VM destroyed during the startup.

In vdsm.log:
2021-05-20 00:15:36,694+0300 ERROR (migsrc/28a96c1c) [virt.vm] (vmId='28a96c1c-a1af-4167-b4ca-eca872fd7cad') Failed to migrate (migration:467)
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/vdsm/virt/virdomain.py", line 101, in f
    ret = attr(*args, **kwargs)
  File "/usr/lib/python3.6/site-packages/vdsm/common/libvirtconnection.py", line 131, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib/python3.6/site-packages/vdsm/common/function.py", line 94, in wrapper
    return func(inst, *args, **kwargs)
  File "/usr/lib64/python3.6/site-packages/libvirt.py", line 2119, in migrateToURI3
    raise libvirtError('virDomainMigrateToURI3() failed')
libvirt.libvirtError: Timed out during operation: cannot acquire state change lock (held by monitor=remoteDispatchDomainMigratePerform3Params)
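
When scanning vdsm.log files from several automation runs, occurrences of this error can be located with a small script. The following is a sketch only: the pattern is derived from the single message quoted above, the helper name `find_lock_holders` is hypothetical, and the exact wording may differ between libvirt versions.

```python
import re

# Pattern based on the error text in the traceback above (assumption: other
# libvirt versions use the same wording).
LOCK_ERROR = re.compile(
    r"cannot acquire state change lock \(held by (?P<holder>[^)]+)\)"
)


def find_lock_holders(log_lines):
    """Return the job holder named in each state change lock error."""
    holders = []
    for line in log_lines:
        match = LOCK_ERROR.search(line)
        if match:
            holders.append(match.group("holder"))
    return holders


# The last traceback line from this report, used as sample input.
sample = [
    "libvirt.libvirtError: Timed out during operation: cannot acquire state "
    "change lock (held by monitor=remoteDispatchDomainMigratePerform3Params)",
]
print(find_lock_holders(sample))
```

To scan a real log, pass an open file object instead of `sample`, since iterating over a file yields its lines.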

Expected results:

Comment 1 Milan Zamazal 2021-05-31 13:07:11 UTC
I experienced a similar problem on my x86_64 laptop.

Comment 2 Milan Zamazal 2021-06-03 17:06:13 UTC
Reported a libvirt bug: BZ 1967715. It describes a different, reproducible situation where this problem occurs. I'm not sure it's the same bug, but the symptoms look the same, and it makes it possible to start with something reproducible.

Comment 3 Polina 2021-06-20 10:06:31 UTC
Created attachment 1792488 [details]
logs for 4.4.7.4

Comment 10 Milan Zamazal 2021-07-26 09:06:08 UTC
The libvirt fix is available in libvirt 7.0.0-14.3, while Vdsm depends on 7.0.0-14.  Shouldn't we adjust the Vdsm dependency?
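
To illustrate the gap between the two versions, here is a simplified comparison sketch. It is not RPM's real version comparison algorithm: it assumes purely numeric, dot-separated version and release parts, and it ignores epochs and dist tags such as `.el8`. The helper name `parse_version_release` is hypothetical.

```python
import re


def _to_ints(text):
    """Turn a dotted numeric string such as '14.3' into a tuple of ints."""
    return tuple(int(number) for number in re.findall(r"\d+", text))


def parse_version_release(version_release):
    """Split 'version-release' into two integer tuples for comparison."""
    version, _, release = version_release.partition("-")
    return _to_ints(version), _to_ints(release)


REQUIRED_BY_VDSM = "7.0.0-14"   # dependency mentioned in this comment
FIXED_IN_LIBVIRT = "7.0.0-14.3"  # build containing the libvirt fix

# True when the current requirement also admits builds without the fix.
needs_bump = (
    parse_version_release(REQUIRED_BY_VDSM)
    < parse_version_release(FIXED_IN_LIBVIRT)
)
print(needs_bump)
```

Under these assumptions the requirement on 7.0.0-14 compares lower than 7.0.0-14.3, so a host could satisfy the dependency while still running an unfixed libvirt build.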

Comment 11 Polina 2021-08-22 17:16:38 UTC
Verified on ovirt-engine-4.4.8.4-0.7.el8ev.noarch.

No remoteDispatchDomainMigratePerform3Params migration failures occurred in the PPC automation runs.