Bug 1966121 - libvirtError while migration: cannot acquire state change lock (held by monitor=remoteDispatchDomainMigratePerform3Params)
Summary: libvirtError while migration: cannot acquire state change lock (held by monit...
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: vdsm
Classification: oVirt
Component: General
Version: 4.40.60.7
Hardware: ppc64le
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ovirt-4.4.8
Target Release: 4.40.80.2
Assignee: Milan Zamazal
QA Contact: Polina
URL:
Whiteboard:
Depends On: 1967715 1983694
Blocks: 1959436
Reported: 2021-05-31 12:25 UTC by Polina
Modified: 2021-09-03 10:08 UTC

Fixed In Version: vdsm-4.40.80.2
Doc Type: Bug Fix
Doc Text:
If a VM was destroyed during migration, libvirt could report errors about acquiring the state change lock and prevent the VM from starting on the same host again. This has been fixed, and powering down a VM during migration should no longer cause these problems.
Clone Of:
Environment:
Last Closed: 2021-09-03 10:08:02 UTC
oVirt Team: Virt
Embargoed:
pm-rhel: ovirt-4.4+


Attachments
logs (11.94 MB, application/gzip)
2021-05-31 12:25 UTC, Polina
logs for 4.4.7.4 (943.18 KB, application/gzip)
2021-06-20 10:06 UTC, Polina

Description Polina 2021-05-31 12:25:42 UTC
Created attachment 1788306
logs

Description of problem: We saw the failure several times, but it is not consistently reproducible.
Additional info: The migration failure happened during an automation run. After this failure, in the tests that followed, the VM didn't get an IP address on start.
After we reconfigured libvirt to log at DEBUG level and restarted the libvirtd service, the failure did not reproduce again.
I attach the full logs from the source and destination hosts, collected after the automation run.


Version-Release number of selected component (if applicable):
ovirt-engine-4.4.6.8-0.1.el8ev.noarch

How reproducible:

1. The migration was triggered through the REST API:
url: /ovirt-engine/api/vms/28a96c1c-a1af-4167-b4ca-eca872fd7cad/migrate
body:
<action>
    <async>false</async>
    <grace_period>
        <expiry>10</expiry>
    </grace_period>
</action>
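
For reference, the request above can be assembled as in the following minimal sketch. It only builds the path and the XML body and sends nothing; the helper name build_migrate_request and its parameters are hypothetical and are not part of any oVirt SDK.

```python
import xml.etree.ElementTree as ET


def build_migrate_request(vm_id, expiry=10, run_async=False):
    """Return (path, xml_body) for the migrate action quoted above.

    Hypothetical helper for illustration only; it performs no HTTP call.
    """
    action = ET.Element("action")
    ET.SubElement(action, "async").text = "true" if run_async else "false"
    grace_period = ET.SubElement(action, "grace_period")
    ET.SubElement(grace_period, "expiry").text = str(expiry)
    path = "/ovirt-engine/api/vms/{}/migrate".format(vm_id)
    return path, ET.tostring(action, encoding="unicode")


path, body = build_migrate_request("28a96c1c-a1af-4167-b4ca-eca872fd7cad")
print(path)
print(body)
```

The body is serialized without indentation; the element names and values match the request in this report.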


Actual results:

In engine.log, the relevant entries are:
2021-05-20 00:14:20,747+03 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-5) [vms_syncAction_d26e977d-ecd8-44c1] EVENT_ID: VM_MIGRATION_START(62), Migration started (VM: golden_env_mixed_virtio_1_0, Source: host_mixed_1, Destination: host_mixed_2, User: admin@internal-authz).
.
.
2021-05-20 00:15:36,770+03 ERROR [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (ForkJoinPool-1-worker-17) [4da5b903] Migration of VM 'golden_env_mixed_virtio_1_0' to host 'host_mixed_2' failed: VM destroyed during the startup.

In vdsm.log:
2021-05-20 00:15:36,694+0300 ERROR (migsrc/28a96c1c) [virt.vm] (vmId='28a96c1c-a1af-4167-b4ca-eca872fd7cad') Failed to migrate (migration:467)
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/vdsm/virt/virdomain.py", line 101, in f
    ret = attr(*args, **kwargs)
  File "/usr/lib/python3.6/site-packages/vdsm/common/libvirtconnection.py", line 131, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib/python3.6/site-packages/vdsm/common/function.py", line 94, in wrapper
    return func(inst, *args, **kwargs)
  File "/usr/lib64/python3.6/site-packages/libvirt.py", line 2119, in migrateToURI3
    raise libvirtError('virDomainMigrateToURI3() failed')
libvirt.libvirtError: Timed out during operation: cannot acquire state change lock (held by monitor=remoteDispatchDomainMigratePerform3Params)
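
When scanning logs for this failure, the job holding the lock can be extracted from the error text. The sketch below assumes the message format shown in the traceback above; the function name parse_lock_holder is hypothetical, and the code works on plain strings only, without importing libvirt.

```python
import re

# Pattern for the message seen in this report; other libvirt versions
# may word the error differently (assumption).
LOCK_ERROR = re.compile(
    r"cannot acquire state change lock "
    r"\(held by (?P<kind>\w+)=(?P<holder>\w+)\)"
)


def parse_lock_holder(message):
    """Return (kind, holder) if the message is a state change lock error, else None."""
    match = LOCK_ERROR.search(message)
    if match is None:
        return None
    return match.group("kind"), match.group("holder")


msg = (
    "Timed out during operation: cannot acquire state change lock "
    "(held by monitor=remoteDispatchDomainMigratePerform3Params)"
)
print(parse_lock_holder(msg))
```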
Expected results:

Comment 1 Milan Zamazal 2021-05-31 13:07:11 UTC
I experienced a similar problem on my x86_64 laptop.

Comment 2 Milan Zamazal 2021-06-03 17:06:13 UTC
Reported a libvirt bug: BZ 1967715. It describes a different, reproducible situation where this problem occurs. I'm not sure it's the same bug, but the symptoms look the same and it gives us something reproducible to start with.

Comment 3 Polina 2021-06-20 10:06:31 UTC
Created attachment 1792488
logs for 4.4.7.4

Comment 10 Milan Zamazal 2021-07-26 09:06:08 UTC
The libvirt fix is available in libvirt 7.0.0-14.3, while Vdsm depends on 7.0.0-14. Shouldn't we adjust the Vdsm dependency?
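
To illustrate the gap between the two versions, here is a rough sketch of the comparison. It is a simplified numeric comparison of version-release strings, not RPM's actual version comparison algorithm, and the function names are hypothetical.

```python
def version_key(version_release):
    """Split 'X.Y.Z-R[.S]' into (version_tuple, release_tuple) of ints.

    Simplification: assumes purely numeric, dot-separated components.
    """
    def to_ints(text):
        return tuple(int(part) for part in text.split(".")) if text else ()

    version, _, release = version_release.partition("-")
    return to_ints(version), to_ints(release)


def dependency_too_old(required, fixed_in):
    """True if the required minimum version is older than the fixed build."""
    return version_key(required) < version_key(fixed_in)


print(dependency_too_old("7.0.0-14", "7.0.0-14.3"))
```

Under these assumptions, a minimum of 7.0.0-14 sorts before 7.0.0-14.3, so a host could satisfy the dependency while still running a libvirt build without the fix.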

Comment 11 Polina 2021-08-22 17:16:38 UTC
Verified on ovirt-engine-4.4.8.4-0.7.el8ev.noarch.

No remoteDispatchDomainMigratePerform3Params migration failures occurred in the PPC automation runs.

