Bug 1598084
| Summary: | cannot acquire state change lock when create snapshot with --quiesce | ||||||
|---|---|---|---|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Han Han <hhan> | ||||
| Component: | libvirt | Assignee: | Michal Privoznik <mprivozn> | ||||
| Status: | CLOSED ERRATA | QA Contact: | Yanqiu Zhang <yanqzhan> | ||||
| Severity: | high | Docs Contact: | |||||
| Priority: | unspecified | ||||||
| Version: | 7.6 | CC: | dyuan, fjin, hhan, jdenemar, junli, juzhang, lmen, xuzhang, yafu, yanqzhan | ||||
| Target Milestone: | rc | Keywords: | Automation, Regression, TestBlocker, Upstream | ||||
| Target Release: | --- | ||||||
| Hardware: | Unspecified | ||||||
| OS: | Unspecified | ||||||
| Whiteboard: | |||||||
| Fixed In Version: | libvirt-4.5.0-2.el7 | Doc Type: | If docs needed, set a value | ||||
| Doc Text: | Story Points: | --- | |||||
| Clone Of: | Environment: | ||||||
| Last Closed: | 2018-10-30 09:57:31 UTC | Type: | Bug | ||||
| Regression: | --- | Mount Type: | --- | ||||
| Documentation: | --- | CRM: | |||||
| Verified Versions: | Category: | --- | |||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||
| Embargoed: | |||||||
| Attachments: |
|
||||||
Please, provide a full log from libvirtd covering both steps. I think I know what the problem is: Thread 2 (Thread 0x7fee847ba700 (LWP 2206)): #0 0x00007fee8b309e56 in futex_abstimed_wait_cancelable (private=<optimized out>, abstime=0x7fee847b96e0, expected=0, futex_word=0x7fee4c0da8a0) at ../sysdeps/unix/sysv/linux/futex-internal.h:205 #1 __pthread_cond_wait_common (abstime=0x7fee847b96e0, mutex=0x7fee4c0da740, cond=0x7fee4c0da878) at pthread_cond_wait.c:539 #2 __pthread_cond_timedwait (cond=0x7fee4c0da878, mutex=0x7fee4c0da740, abstime=0x7fee847b96e0) at pthread_cond_wait.c:667 #3 0x00007fee8e27cd17 in virCondWaitUntil (c=0x7fee4c0da878, m=0x7fee4c0da740, whenms=1530701415591) at util/virthread.c:169 #4 0x00007fee7849cc10 in qemuDomainObjBeginJobInternal (driver=0x7fee4c0718d0, obj=0x7fee4c0da730, job=QEMU_JOB_NONE, agentJob=QEMU_AGENT_JOB_MODIFY, asyncJob=QEMU_ASYNC_JOB_NONE, nowait=false) at qemu/qemu_domain.c:6505 #5 0x00007fee7849d785 in qemuDomainObjBeginAgentJob (driver=0x7fee4c0718d0, obj=0x7fee4c0da730, agentJob=QEMU_AGENT_JOB_MODIFY) at qemu/qemu_domain.c:6698 #6 0x00007fee785478c7 in qemuDomainSnapshotCreateActiveExternal (driver=0x7fee4c0718d0, vm=0x7fee4c0da730, snap=0x7fee7c001f20, flags=84) at qemu/qemu_driver.c:15159 #7 0x00007fee7854885e in qemuDomainSnapshotCreateXML (domain=0x7fee7c02e4d0, xmlDesc=0x7fee7c002d90 "<domainsnapshot>\n <name>fedora_snap</name>\n <description>fedora_snap_desc</description>\n</domainsnapshot>\n", flags=84) at qemu/qemu_driver.c:15547 #8 0x00007fee8e4b419e in virDomainSnapshotCreateXML (domain=0x7fee7c02e4d0, xmlDesc=0x7fee7c002d90 "<domainsnapshot>\n <name>fedora_snap</name>\n <description>fedora_snap_desc</description>\n</domainsnapshot>\n", flags=84) at libvirt-domain-snapshot.c:231 qemuDomainSnapshotCreateActiveExternal() is trying to start AGENT_JOB_MODIFY, therefore BeginJobInternal(job=QEMU_JOB_NONE, agentJob=QEMU_AGENT_JOB_MODIFY) is called. And since there is an async job already running @nested is false and thus qemuDomainNestedJobAllowed(priv, job=QEMU_JOB_NONE) is called which returns 0 making the thread wait for async job to finish. This deadlocks obviously. Patch proposed upstream: https://www.redhat.com/archives/libvir-list/2018-July/msg00258.html Reproduced on rhel7.6 with: libvirt-4.5.0-1.el7.x86_64 qemu-kvm-rhev-2.12.0-7.el7.x86_64 qemu-guest-agent-2.12.0-1.el7.x86_64 # virsh snapshot-create-as V-7.6 snap_3 --quiesce --disk-only error: Timed out during operation: cannot acquire state change lock # cat libvirtd.log|grep qemuDomainObj debug : qemuDomainObjBeginJobInternal:6450 : Starting job: job=none agentJob=none asyncJob=snapshot (vm=0x7f56401bc6e0 name=V-7.6, current job=none agentJob=none async=none) debug : qemuDomainObjBeginJobInternal:6507 : Started async job: snapshot (vm=0x7f56401bc6e0 name=V-7.6) debug : qemuDomainObjBeginJobInternal:6450 : Starting job: job=none agentJob=modify asyncJob=none (vm=0x7f56401bc6e0 name=V-7.6, current job=none agentJob=none async=snapshot) debug : qemuDomainObjBeginJobInternal:6471 : Waiting for async job (vm=0x7f56401bc6e0 name=V-7.6) warning : qemuDomainObjBeginJobInternal:6564 : Cannot start job (none, modify, none) for domain V-7.6; current job is (none, none, snapshot) owned by (0 <null>, 0 <null>, 5140 remoteDispatchDomainSnapshotCreateXML (flags=0x50)) for (0s, 0s, 30s) error : qemuDomainObjBeginJobInternal:6594 : Timed out during operation: cannot acquire state change lock debug : qemuDomainObjEndAsyncJob:6838 : Stopping async job: snapshot (vm=0x7f56401bc6e0 name=V-7.6) Verified on rhel7.6 with: libvirt-4.5.0-3.el7.x86_64 qemu-kvm-rhev-2.12.0-7.el7.x86_64 qemu-guest-agent-2.12.0-1.el7.x86_64 # virsh snapshot-create-as V-7.6 snap_1 --quiesce --disk-only Domain snapshot snap_1 created And no any error in libvirtd.log; # cat libvirtd.log|grep qemuDomainObj debug : qemuDomainObjBeginJobInternal:6458 : Starting job: job=none agentJob=none asyncJob=snapshot (vm=0x7fb1bc1bf790 name=V-7.6, current job=none agentJob=none async=none) debug : qemuDomainObjBeginJobInternal:6515 : Started async job: snapshot (vm=0x7fb1bc1bf790 name=V-7.6) 2484: debug : qemuDomainObjBeginJobInternal:6458 : Starting job: job=none agentJob=modify asyncJob=none (vm=0x7fb1bc1bf790 name=V-7.6, current job=none agentJob=none async=snapshot) debug : qemuDomainObjBeginJobInternal:6535 : Started agent job: modify (vm=0x7fb1bc1bf790 name=V-7.6 job=none async=snapshot) debug : qemuDomainObjEnterAgent:7002 : Entering agent (agent=0x7fb1bc1d1e40 vm=0x7fb1bc1bf790 name=V-7.6) debug : qemuDomainObjExitAgent:7024 : Exited agent (agent=0x7fb1bc1d1e40 vm=0x7fb1bc1bf790 name=V-7.6) debug : qemuDomainObjEndAgentJob:6804 : Stopping agent job: modify (async=snapshot vm=0x7fb1bc1bf790 name=V-7.6) debug : qemuDomainObjBeginJobInternal:6458 : Starting job: job=async nested agentJob=none asyncJob=none (vm=0x7fb1bc1bf790 name=V-7.6, current job=none agentJob=none async=snapshot) debug : qemuDomainObjBeginJobInternal:6507 : Started job: async nested (async=snapshot vm=0x7fb1bc1bf790 name=V-7.6) debug : qemuDomainObjEnterMonitorInternal:6899 : Entering monitor (mon=0x7fb210014b90 vm=0x7fb1bc1bf790 name=V-7.6) debug : qemuDomainObjExitMonitorInternal:6922 : Exited monitor (mon=0x7fb210014b90 vm=0x7fb1bc1bf790 name=V-7.6) debug : qemuDomainObjEndJob:6783 : Stopping job: async nested (async=snapshot vm=0x7fb1bc1bf790 name=V-7.6) debug : qemuDomainObjBeginJobInternal:6458 : Starting job: job=none agentJob=modify asyncJob=none (vm=0x7fb1bc1bf790 name=V-7.6, current job=none agentJob=none async=snapshot) debug : qemuDomainObjBeginJobInternal:6535 : Started agent job: modify (vm=0x7fb1bc1bf790 name=V-7.6 job=none async=snapshot) debug : qemuDomainObjEnterAgent:7002 : Entering agent (agent=0x7fb1bc1d1e40 vm=0x7fb1bc1bf790 name=V-7.6) debug : qemuDomainObjExitAgent:7024 : Exited agent (agent=0x7fb1bc1d1e40 vm=0x7fb1bc1bf790 name=V-7.6) debug : qemuDomainObjEndAgentJob:6804 : Stopping agent job: modify (async=snapshot vm=0x7fb1bc1bf790 name=V-7.6) debug : qemuDomainObjEndAsyncJob:6846 : Stopping async job: snapshot (vm=0x7fb1bc1bf790 name=V-7.6) Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2018:3113 |
Created attachment 1456464 [details] Libvirtd log of step2 Description of problem: As subject Version-Release number of selected component (if applicable): libvirt-4.5.0-1.el7.x86_64 qemu-kvm-rhev-2.12.0-6.el7.x86_64 guest: qemu-guest-agent-2.12.0-11.el8+619+ba8f0855.1.x86_64 How reproducible: 100% Steps to Reproduce: 1. Prepare a running VM with qemu-guest-agent active 2. Create snapshot with --quiesce # virsh snapshot-create-as usb snap_2 snap2-desc --quiesce --disk-only error: Timed out during operation: cannot acquire state change lock Actual results: Expected results: Additional info: