Bug 1665256 - Live storage migration fails with: TimeoutError: Timed out during operation: cannot acquire state change lock (held by monitor=remoteDispatchConnectGetAllDomainStats) and the VM becomes 'Not Responding'
Summary: Live storage migration fails with: TimeoutError: Timed out during operation: ...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: qemu-kvm-rhev
Version: 7.6
Hardware: x86_64
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: rc
Target Release: 7.7
Assignee: Sergio Lopez
QA Contact: aihua liang
URL:
Whiteboard:
Depends On:
Blocks: 1755340
 
Reported: 2019-01-10 19:50 UTC by Elad
Modified: 2022-02-18 20:36 UTC
CC List: 19 users

Fixed In Version: qemu-kvm-rhev-2.12.0-36.el7
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 1755340 (view as bug list)
Environment:
Last Closed: 2020-03-31 14:34:48 UTC
Target Upstream Version:
Embargoed:


Attachments
logs (2.24 MB, application/gzip), 2019-01-10 20:04 UTC, Elad
logs2 (1.20 MB, application/gzip), 2019-01-10 21:36 UTC, Elad
Logs (2.11 MB, application/zip), 2019-01-26 22:25 UTC, Shir Fishbain
logs(libvirt in debug) (7.36 MB, application/gzip), 2019-03-21 16:52 UTC, Elad


Links
Red Hat Product Errata RHSA-2020:1216, last updated 2020-03-31 14:37:02 UTC

Description Elad 2019-01-10 19:50:49 UTC
Description of problem:


Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 1 Elad 2019-01-10 20:04:09 UTC
Description of problem:
LSM's diskReplicateFinish failed with TimeoutError: Timed out during operation: cannot acquire state change lock (held by monitor=remoteDispatchConnectGetAllDomainStats)
VM becomes Not Responding

A few bugs were reported with this same error, but none of them is still open:
bug 1419856, bug 1481022, bug 1426477, bug 1530346

Version-Release number of selected component (if applicable):
ovirt-engine-4.3.0-0.6.alpha2.el7.noarch
vdsm-4.30.4-1.el7ev.x86_64
libvirt-4.5.0-10.el7_6.3.x86_64
qemu-kvm-rhev-2.12.0-18.el7_6.3.x86_64
sanlock-3.6.0-1.el7.x86_64
selinux-policy-3.13.1-229.el7_6.6.noarch
RHEL7.6 kernel 3.10.0-957.3.1.el7.x86_64


How reproducible:
Happened once

Steps to Reproduce:
1. Cloned a VM from a template as a deep clone, with its disks on an iSCSI domain
2. Started the VM
3. Migrated VM's disk

Actual results:


2019-01-10 21:07:18,342+0200 ERROR (jsonrpc/6) [api] FINISH diskReplicateFinish error=Timed out during operation: cannot acquire state change lock (held by monitor=remoteDispatchConnectGetAllDomainStats) (api:134)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/common/api.py", line 124, in method
    ret = func(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/API.py", line 573, in diskReplicateFinish
    return self.vm.diskReplicateFinish(srcDisk, dstDisk)
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 4597, in diskReplicateFinish
    blkJobInfo = self._dom.blockJobInfo(drive.name, 0)
  File "/usr/lib/python2.7/site-packages/vdsm/virt/virdomain.py", line 108, in f
    raise toe
TimeoutError: Timed out during operation: cannot acquire state change lock (held by monitor=remoteDispatchConnectGetAllDomainStats)
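
(For illustration only, not taken from the actual vdsm source: the TimeoutError above comes from vdsm's generic libvirt-call wrapper, the function "f" at virdomain.py:108 in the traceback, which proxies calls to the libvirt domain and re-raises libvirt's "operation timed out" error as vdsm's TimeoutError. A minimal Python sketch of that kind of wrapper, assuming the libvirt-python bindings:)

import libvirt

class TimeoutError(libvirt.libvirtError):
    pass

def wrap_domain_call(method):
    # Rough sketch, not the actual vdsm code: proxy a libvirt domain call and
    # translate a libvirt operation-timeout error into vdsm's TimeoutError.
    def f(*args, **kwargs):
        try:
            return method(*args, **kwargs)
        except libvirt.libvirtError as e:
            if e.get_error_code() == libvirt.VIR_ERR_OPERATION_TIMEOUT:
                toe = TimeoutError(e.get_error_message())
                raise toe   # this is the "raise toe" seen in the tracebacks
            raise
    return f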



libvirtd.log:

2019-01-10 19:07:18.340+0000: 4046: warning : qemuDomainObjBeginJobInternal:6724 : Cannot start job (query, none, none) for domain vm_random_test_1021012797; current job is (query, none, none) owned by (4043 remoteDispatchConnectGetAllDomainStats, 0 <null>, 0 <null> (flags=0x0)) for (63s, 0s, 0s)
2019-01-10 19:07:18.340+0000: 4046: error : qemuDomainObjBeginJobInternal:6746 : Timed out during operation: cannot acquire state change lock (held by monitor=remoteDispatchConnectGetAllDomainStats)


2019-01-10 19:54:51.169+0000: 4047: error : qemuDomainAgentAvailable:9133 : Guest agent is not responding: QEMU guest agent is not connected


The VM entered 'Not Responding' state

Expected results:
Live storage migration to succeed

Additional info:
engine.log:

2019-01-10 21:07:18,403+02 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.VmReplicateDiskFinishVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-18) [disks_syncAction_db9617cd-2afd-4a86] Failed in 'VmReplicateDiskFinishVDS' method


2019-01-10 21:07:18,483+02 ERROR [org.ovirt.engine.core.bll.SerialChildCommandsExecutionCallback] (EE-ManagedThreadFactory-engineScheduled-Thread-18) [disks_syncAction_db9617cd-2afd-4a86] EngineException: General Exception: ('Timed out during operation: cannot acquire state change lock (held by monitor=remoteDispatchConnectGetAllDomainStats)',) (Failed with error GeneralException and code 100): org.ovirt.engine.core.common.errors.EngineException: EngineException: General Exception: ('Timed out during operation: cannot acquire state change lock (held by monitor=remoteDispatchConnectGetAllDomainStats)',) (Failed with error GeneralException and code 100)
        at org.ovirt.engine.core.bll.storage.lsm.LiveMigrateDiskCommand.replicateDiskFinish(LiveMigrateDiskCommand.java:402) [bll.jar:]
        at org.ovirt.engine.core.bll.storage.lsm.LiveMigrateDiskCommand.completeLiveMigration(LiveMigrateDiskCommand.java:374) [bll.jar:]
        at org.ovirt.engine.core.bll.storage.lsm.LiveMigrateDiskCommand.performNextOperation(LiveMigrateDiskCommand.java:242) [bll.jar:]
        at org.ovirt.engine.core.bll.SerialChildCommandsExecutionCallback.childCommandsExecutionEnded(SerialChildCommandsExecutionCallback.java:32) [bll.jar:]
        at org.ovirt.engine.core.bll.ChildCommandsCallbackBase.doPolling(ChildCommandsCallbackBase.java:71) [bll.jar:]
        at org.ovirt.engine.core.bll.tasks.CommandCallbacksPoller.invokeCallbackMethodsImpl(CommandCallbacksPoller.java:146) [bll.jar:]
        at org.ovirt.engine.core.bll.tasks.CommandCallbacksPoller.invokeCallbackMethods(CommandCallbacksPoller.java:107) [bll.jar:]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [rt.jar:1.8.0_191]
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) [rt.jar:1.8.0_191]
        at org.glassfish.enterprise.concurrent.internal.ManagedScheduledThreadPoolExecutor$ManagedScheduledFutureTask.access$201(ManagedScheduledThreadPoolExecutor.java:383) [javax.enterprise.concurrent.jar:1.0.0.redhat-1]
        at org.glassfish.enterprise.concurrent.internal.ManagedScheduledThreadPoolExecutor$ManagedScheduledFutureTask.run(ManagedScheduledThreadPoolExecutor.java:534) [javax.enterprise.concurrent.jar:1.0.0.redhat-1]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [rt.jar:1.8.0_191]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [rt.jar:1.8.0_191]

Comment 2 Elad 2019-01-10 20:04:50 UTC
Created attachment 1519916 [details]
logs

Comment 3 Elad 2019-01-10 21:36:58 UTC
Created attachment 1519943 [details]
logs2

Another occurrence:

2019-01-10 23:18:32,783+0200 DEBUG (mailbox-spm) [storage.Misc.excCmd] SUCCESS: <err> = '1+0 records in\n1+0 records out\n1024000 bytes (1.0 MB) copied, 0.0152982 s, 66.9 MB/s\n'; <rc> = 0 (commands:219)
2019-01-10 23:18:34,088+0200 ERROR (jsonrpc/0) [api] FINISH diskReplicateFinish error=Timed out during operation: cannot acquire state change lock (held by monitor=remoteDispatchDomainGetBlockJobInfo) (api:134)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/common/api.py", line 124, in method
    ret = func(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/API.py", line 573, in diskReplicateFinish
    return self.vm.diskReplicateFinish(srcDisk, dstDisk)
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 4597, in diskReplicateFinish
    blkJobInfo = self._dom.blockJobInfo(drive.name, 0)
  File "/usr/lib/python2.7/site-packages/vdsm/virt/virdomain.py", line 108, in f
    raise toe
TimeoutError: Timed out during operation: cannot acquire state change lock (held by monitor=remoteDispatchDomainGetBlockJobInfo)
2019-01-10 23:18:34,090+0200 INFO  (jsonrpc/0) [api.virt] FINISH diskReplicateFinish return={'status': {'message': "General Exception: ('Timed out during operation: cannot acquire state change lock (held by monitor=remoteDispatchDomainGetBlockJobInfo)',)", 'code': 100}} from=::ffff:10.35.161.127,42186, flow_id=disks_syncAction_a8b7a35a-df1b-4d31, vmId=18064b74-cd7c-4e98-92b7-5d46a1d903c1 (api:54)

Comment 4 Raz Tamir 2019-01-11 11:12:36 UTC
Adding a regression flag based on https://bugzilla.redhat.com/show_bug.cgi?id=1530346

Comment 5 Martin Tessun 2019-01-14 09:24:30 UTC
Hi Jarda,

it looks like libvirt gets stuck from time to time, causing some issues in RHV, typically during LSM (Live Storage Migration) flows.

Please see comment #1 and comment #4 as well (Bug 1530346 showing a hung libvirt at least).

Comment 6 Fred Rolland 2019-01-21 09:07:13 UTC
Jarda, can you check Martin's comment?

Comment 7 Shir Fishbain 2019-01-26 22:23:24 UTC
The error also appears when running the storage migration automation tests for version 4.3.
The full log of the run is here: https://rhv-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/job/rhv-4.3-ge-runner-storage/245/consoleFull
Logs from my local runs on my environment are attached.

From vdsm log:
2019-01-24 15:12:00,956+0200 ERROR (jsonrpc/1) [api] FINISH diskReplicateFinish error=Timed out during operation: cannot acquire state change lock (held by monitor=remoteDispatchConnectGetAllDomainStats) (api:134)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/common/api.py", line 124, in method
    ret = func(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/API.py", line 574, in diskReplicateFinish
    return self.vm.diskReplicateFinish(srcDisk, dstDisk)
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 4613, in diskReplicateFinish
    blkJobInfo = self._dom.blockJobInfo(drive.name, 0)
  File "/usr/lib/python2.7/site-packages/vdsm/virt/virdomain.py", line 108, in f
    raise toe
TimeoutError: Timed out during operation: cannot acquire state change lock (held by monitor=remoteDispatchConnectGetAllDomainStats)

From engine log:
2019-01-24 15:11:29,686+02 ERROR [org.ovirt.engine.core.bll.storage.lsm.LiveMigrateDiskCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-14) [disks_syncAction_47a726d1-b76b-449c] Failed VmReplicateDiskFinish (Disk '84fc01cb-3331-4b16-ad1f-904bac29edfb', VM '3ea8cbdb-3ce6-475e-9f22-85aeb363ec8a')
2019-01-24 15:11:29,686+02 ERROR [org.ovirt.engine.core.bll.SerialChildCommandsExecutionCallback] (EE-ManagedThreadFactory-engineScheduled-Thread-14) [disks_syncAction_47a726d1-b76b-449c] Command 'LiveMigrateDisk' id: '9dcb7465-aa1d-4e2b-977d-efa188c1ddc6' with children [df05d4dc-282e-472f-a88f-6e218c8e4bf5, 42da3eeb-43e9-4126-81de-bef5ac88a62f, 93d635be-50a6-4422-913d-1fe8859c73d6] failed when attempting to perform the next operation, marking as 'ACTIVE'
2019-01-24 15:11:29,686+02 ERROR [org.ovirt.engine.core.bll.SerialChildCommandsExecutionCallback] (EE-ManagedThreadFactory-engineScheduled-Thread-14) [disks_syncAction_47a726d1-b76b-449c] EngineException: General Exception: ('Timed out during operation: cannot acquire state change lock (held by monitor=remoteDispatchConnectGetAllDomainStats)',) (Failed with error GeneralException and code 100): org.ovirt.engine.core.common.errors.EngineException: EngineException: General Exception: ('Timed out during operation: cannot acquire state change lock (held by monitor=remoteDispatchConnectGetAllDomainStats)',) (Failed with error GeneralException and code 100)
        at org.ovirt.engine.core.bll.storage.lsm.LiveMigrateDiskCommand.replicateDiskFinish(LiveMigrateDiskCommand.java:402) [bll.jar:]
        at org.ovirt.engine.core.bll.storage.lsm.LiveMigrateDiskCommand.completeLiveMigration(LiveMigrateDiskCommand.java:374) [bll.jar:]
        at org.ovirt.engine.core.bll.storage.lsm.LiveMigrateDiskCommand.performNextOperation(LiveMigrateDiskCommand.java:242) [bll.jar:]
        at org.ovirt.engine.core.bll.SerialChildCommandsExecutionCallback.childCommandsExecutionEnded(SerialChildCommandsExecutionCallback.java:32) [bll.jar:]
        at org.ovirt.engine.core.bll.ChildCommandsCallbackBase.doPolling(ChildCommandsCallbackBase.java:77) [bll.jar:]
        at org.ovirt.engine.core.bll.tasks.CommandCallbacksPoller.invokeCallbackMethodsImpl(CommandCallbacksPoller.java:175) [bll.jar:]
        at org.ovirt.engine.core.bll.tasks.CommandCallbacksPoller.invokeCallbackMethods(CommandCallbacksPoller.java:109) [bll.jar:]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [rt.jar:1.8.0_191]
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) [rt.jar:1.8.0_191]
        at org.glassfish.enterprise.concurrent.internal.ManagedScheduledThreadPoolExecutor$ManagedScheduledFutureTask.access$201(ManagedScheduledThreadPoolExecutor.java:383) [javax.enterprise.concurrent-1.0.jar:]
        at org.glassfish.enterprise.concurrent.internal.ManagedScheduledThreadPoolExecutor$ManagedScheduledFutureTask.run(ManagedScheduledThreadPoolExecutor.java:534) [javax.enterprise.concurrent-1.0.jar:]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [rt.jar:1.8.0_191]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [rt.jar:1.8.0_191]
        at java.lang.Thread.run(Thread.java:748) [rt.jar:1.8.0_191]
        at org.glassfish.enterprise.concurrent.ManagedThreadFactoryImpl$ManagedThread.run(ManagedThreadFactoryImpl.java:250) [javax.enterprise.concurrent-1.0.jar:]
        at org.jboss.as.ee.concurrent.service.ElytronManagedThreadFactory$ElytronManagedThread.run(ElytronManagedThreadFactory.java:78)


Comment 8 Shir Fishbain 2019-01-26 22:25:00 UTC
Created attachment 1523816 [details]
Logs

Comment 9 Jaroslav Suchanek 2019-01-28 13:19:41 UTC
(In reply to Martin Tessun from comment #5)
> Hi Jarda,
> 
> looks like it happens from time to time that libvirt gets stuck, thus having
> some issues in RHV, typically during the LSM (Live Storage Migration flows).
> 
> Please see comment #1 and comment #4 as well (Bug 1530346 showing a hung
> libvirt at least).

Hmm, the commit 72adaf2f10509c3682f2c65ffad4176e00e5a2fb from https://bugzilla.redhat.com/show_bug.cgi?id=1536461#c1 should be part of libvirt-4.5.0. Michal, any idea?

Either way, feel free to clone this to rhel-7.7.

Comment 10 Michal Privoznik 2019-01-28 14:36:03 UTC
(In reply to Jaroslav Suchanek from comment #9)
> (In reply to Martin Tessun from comment #5)
> > Hi Jarda,
> > 
> > looks like it happens from time to time that libvirt gets stuck, thus having
> > some issues in RHV, typically during the LSM (Live Storage Migration flows).
> > 
> > Please see comment #1 and comment #4 as well (Bug 1530346 showing a hung
> > libvirt at least).
> 
> Hmm, the commit 72adaf2f10509c3682f2c65ffad4176e00e5a2fb from
> https://bugzilla.redhat.com/show_bug.cgi?id=1536461#c1 should be part of
> libvirt-4.5.0. Michal, any idea?
> 

Yes, the commit is in 4.1.0 so it is in 4.5.0 too. I wonder if this is a true case of two threads competing or if some API forgot to clean the job.

  TimeoutError: Timed out during operation: cannot acquire state change lock (held by monitor=remoteDispatchConnectGetAllDomainStats)

This error message suggests that it is the former case, which brings up the question of why GetAllDomainStats is stuck. For instance, is there a stuck qemu process? Elad (or anybody else who is able to reproduce), can you please attach gdb to libvirtd as soon as you see the error and get me the output of 't a a bt'? Thanks.

BTW, while it is a libvirt error message that we see here, it is still not confirmed that this is a libvirt bug, therefore I am not changing the component just yet.

Comment 11 Elad 2019-02-04 09:12:23 UTC
Hi Benny, maybe you have some way to reproduce this issue so we'll be able to provide the info requested in comment #10.

Comment 12 Benny Zlotnik 2019-03-05 10:43:02 UTC
I could not reproduce it manually

Comment 13 Michal Privoznik 2019-03-18 13:39:12 UTC
(In reply to Benny Zlotnik from comment #12)
> I could not reproduce it manually

That's unfortunate. At any rate, if you could attach gdb to libvirtd once you see this error again and run 't a a bt' that'd be great.

1) sudo gdb -p $(pgrep libvirtd)
2) t a a bt
3) share the output

Thanks!
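
(The same capture can be scripted so it does not require an interactive session; a small, hypothetical Python 3 helper, assuming gdb and the libvirt debuginfo packages are installed on the host, might look like this:)

import subprocess

def dump_libvirtd_backtraces(outfile="libvirtd-backtraces.txt"):
    # Hypothetical helper, not part of vdsm or libvirt.
    pid = subprocess.check_output(["pgrep", "-o", "libvirtd"]).decode().strip()
    # -batch exits when done; pagination off avoids the interactive
    # "---Type <return> to continue---" prompts in the captured output.
    out = subprocess.check_output(
        ["gdb", "-p", pid, "-batch",
         "-ex", "set pagination off",
         "-ex", "thread apply all bt"]).decode()
    with open(outfile, "w") as f:
        f.write(out)
    return outfile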

Comment 14 Elad 2019-03-21 16:52:11 UTC
Created attachment 1546624 [details]
logs(libvirt in debug)

Managed to reproduce:
libvirt-4.5.0-10.el7_6.4.x86_64
qemu-kvm-ev-2.12.0-18.el7_6.3.1.x86_64
vdsm-4.40.0-59.git5533158.el7.x86_64
selinux-policy-3.13.1-229.el7_6.9.noarch
kernel - 3.10.0-957.5.1.el7.x86_64


2019-03-21 15:43:59.863+0000: 24555: warning : qemuDomainObjBeginJobInternal:6724 : Cannot start job (query, none, none) for domain vm_random_test_2117362559; current job is (query, none, none) owned by (24557 remoteDispatchConnectGetAllDomainStats, 0 <null>, 0 <null> (flags=0x0)) for (76s, 0s, 0s)
2019-03-21 15:43:59.864+0000: 24555: error : qemuDomainObjBeginJobInternal:6746 : Timed out during operation: cannot acquire state change lock (held by monitor=remoteDispatchConnectGetAllDomainStats)



(gdb) t a a bt

Thread 18 (Thread 0x7eff37b1d700 (LWP 24555)):
#0  0x00007eff44b7e965 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007eff4778da96 in virCondWait () from /lib64/libvirt.so.0
#2  0x00007eff4778e553 in virThreadPoolWorker () from /lib64/libvirt.so.0
#3  0x00007eff4778d828 in virThreadHelper () from /lib64/libvirt.so.0
#4  0x00007eff44b7add5 in start_thread () from /lib64/libpthread.so.0
#5  0x00007eff448a3ead in clone () from /lib64/libc.so.6

Thread 17 (Thread 0x7eff3731c700 (LWP 24556)):
#0  0x00007eff44b7e965 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007eff4778da96 in virCondWait () from /lib64/libvirt.so.0
#2  0x00007eff4778e553 in virThreadPoolWorker () from /lib64/libvirt.so.0
#3  0x00007eff4778d828 in virThreadHelper () from /lib64/libvirt.so.0
#4  0x00007eff44b7add5 in start_thread () from /lib64/libpthread.so.0
#5  0x00007eff448a3ead in clone () from /lib64/libc.so.6

Thread 16 (Thread 0x7eff36b1b700 (LWP 24557)):
#0  0x00007eff44b7e965 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007eff4778da96 in virCondWait () from /lib64/libvirt.so.0
#2  0x00007eff2220738b in qemuMonitorSend () from /usr/lib64/libvirt/connection-driver/libvirt_driver_qemu.so
#3  0x00007eff22218e50 in qemuMonitorJSONCommandWithFd () from /usr/lib64/libvirt/connection-driver/libvirt_driver_qemu.so
#4  0x00007eff2221b200 in qemuMonitorJSONGetBalloonInfo () from /usr/lib64/libvirt/connection-driver/libvirt_driver_qemu.so
#5  0x00007eff2221b375 in qemuMonitorJSONGetMemoryStats () from /usr/lib64/libvirt/connection-driver/libvirt_driver_qemu.so
#6  0x00007eff2222cdff in qemuDomainMemoryStatsInternal () from /usr/lib64/libvirt/connection-driver/libvirt_driver_qemu.so
#7  0x00007eff2222cf70 in qemuDomainGetStatsBalloon () from /usr/lib64/libvirt/connection-driver/libvirt_driver_qemu.so
#8  0x00007eff2222d5a9 in qemuConnectGetAllDomainStats () from /usr/lib64/libvirt/connection-driver/libvirt_driver_qemu.so
#9  0x00007eff4792dd96 in virConnectGetAllDomainStats () from /lib64/libvirt.so.0
#10 0x000055b5255a7d60 in remoteDispatchConnectGetAllDomainStatsHelper ()
#11 0x00007eff47850425 in virNetServerProgramDispatch () from /lib64/libvirt.so.0
#12 0x00007eff47856bfd in virNetServerHandleJob () from /lib64/libvirt.so.0
#13 0x00007eff4778e4a1 in virThreadPoolWorker () from /lib64/libvirt.so.0
#14 0x00007eff4778d828 in virThreadHelper () from /lib64/libvirt.so.0
#15 0x00007eff44b7add5 in start_thread () from /lib64/libpthread.so.0
#16 0x00007eff448a3ead in clone () from /lib64/libc.so.6

Thread 15 (Thread 0x7eff3631a700 (LWP 24558)):
#0  0x00007eff44b7e965 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007eff4778da96 in virCondWait () from /lib64/libvirt.so.0
#2  0x00007eff4778e553 in virThreadPoolWorker () from /lib64/libvirt.so.0
#3  0x00007eff4778d828 in virThreadHelper () from /lib64/libvirt.so.0
#4  0x00007eff44b7add5 in start_thread () from /lib64/libpthread.so.0
#5  0x00007eff448a3ead in clone () from /lib64/libc.so.6

Thread 14 (Thread 0x7eff35b19700 (LWP 24559)):
#0  0x00007eff44b7e965 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007eff4778da96 in virCondWait () from /lib64/libvirt.so.0
#2  0x00007eff4778e553 in virThreadPoolWorker () from /lib64/libvirt.so.0
#3  0x00007eff4778d828 in virThreadHelper () from /lib64/libvirt.so.0
#4  0x00007eff44b7add5 in start_thread () from /lib64/libpthread.so.0
#5  0x00007eff448a3ead in clone () from /lib64/libc.so.6

Thread 13 (Thread 0x7eff35318700 (LWP 24560)):
#0  0x00007eff44b7e965 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007eff4778da96 in virCondWait () from /lib64/libvirt.so.0
#2  0x00007eff4778e4eb in virThreadPoolWorker () from /lib64/libvirt.so.0
#3  0x00007eff4778d828 in virThreadHelper () from /lib64/libvirt.so.0
#4  0x00007eff44b7add5 in start_thread () from /lib64/libpthread.so.0
#5  0x00007eff448a3ead in clone () from /lib64/libc.so.6

Thread 12 (Thread 0x7eff34b17700 (LWP 24561)):
#0  0x00007eff44b7e965 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007eff4778da96 in virCondWait () from /lib64/libvirt.so.0
#2  0x00007eff4778e4eb in virThreadPoolWorker () from /lib64/libvirt.so.0
#3  0x00007eff4778d828 in virThreadHelper () from /lib64/libvirt.so.0
#4  0x00007eff44b7add5 in start_thread () from /lib64/libpthread.so.0
#5  0x00007eff448a3ead in clone () from /lib64/libc.so.6

Thread 11 (Thread 0x7eff34316700 (LWP 24562)):
#0  0x00007eff44b7e965 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007eff4778da96 in virCondWait () from /lib64/libvirt.so.0
#2  0x00007eff4778e4eb in virThreadPoolWorker () from /lib64/libvirt.so.0
#3  0x00007eff4778d828 in virThreadHelper () from /lib64/libvirt.so.0
#4  0x00007eff44b7add5 in start_thread () from /lib64/libpthread.so.0
#5  0x00007eff448a3ead in clone () from /lib64/libc.so.6

Thread 10 (Thread 0x7eff33b15700 (LWP 24563)):
#0  0x00007eff44b7e965 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007eff4778da96 in virCondWait () from /lib64/libvirt.so.0
#2  0x00007eff4778e4eb in virThreadPoolWorker () from /lib64/libvirt.so.0
#3  0x00007eff4778d828 in virThreadHelper () from /lib64/libvirt.so.0
#4  0x00007eff44b7add5 in start_thread () from /lib64/libpthread.so.0
#5  0x00007eff448a3ead in clone () from /lib64/libc.so.6

Thread 9 (Thread 0x7eff33314700 (LWP 24564)):
#0  0x00007eff44b7e965 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007eff4778da96 in virCondWait () from /lib64/libvirt.so.0
#2  0x00007eff4778e4eb in virThreadPoolWorker () from /lib64/libvirt.so.0
#3  0x00007eff4778d828 in virThreadHelper () from /lib64/libvirt.so.0
#4  0x00007eff44b7add5 in start_thread () from /lib64/libpthread.so.0
#5  0x00007eff448a3ead in clone () from /lib64/libc.so.6

Thread 8 (Thread 0x7eff21cbf700 (LWP 24565)):
#0  0x00007eff44b7e965 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007eff4778da96 in virCondWait () from /lib64/libvirt.so.0
#2  0x00007eff4778e553 in virThreadPoolWorker () from /lib64/libvirt.so.0
#3  0x00007eff4778d828 in virThreadHelper () from /lib64/libvirt.so.0
#4  0x00007eff44b7add5 in start_thread () from /lib64/libpthread.so.0
#5  0x00007eff448a3ead in clone () from /lib64/libc.so.6

Thread 7 (Thread 0x7eff214be700 (LWP 24566)):
#0  0x00007eff44b7e965 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007eff4778da96 in virCondWait () from /lib64/libvirt.so.0
#2  0x00007eff4778e553 in virThreadPoolWorker () from /lib64/libvirt.so.0
#3  0x00007eff4778d828 in virThreadHelper () from /lib64/libvirt.so.0
#4  0x00007eff44b7add5 in start_thread () from /lib64/libpthread.so.0
#5  0x00007eff448a3ead in clone () from /lib64/libc.so.6

Thread 6 (Thread 0x7eff20cbd700 (LWP 24567)):
#0  0x00007eff44b7e965 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007eff4778da96 in virCondWait () from /lib64/libvirt.so.0
#2  0x00007eff4778e553 in virThreadPoolWorker () from /lib64/libvirt.so.0
#3  0x00007eff4778d828 in virThreadHelper () from /lib64/libvirt.so.0
#4  0x00007eff44b7add5 in start_thread () from /lib64/libpthread.so.0
#5  0x00007eff448a3ead in clone () from /lib64/libc.so.6

Thread 5 (Thread 0x7efef0f6b700 (LWP 24568)):
#0  0x00007eff44b7e965 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007eff4778da96 in virCondWait () from /lib64/libvirt.so.0
#2  0x00007eff4778e553 in virThreadPoolWorker () from /lib64/libvirt.so.0
#3  0x00007eff4778d828 in virThreadHelper () from /lib64/libvirt.so.0
#4  0x00007eff44b7add5 in start_thread () from /lib64/libpthread.so.0
#5  0x00007eff448a3ead in clone () from /lib64/libc.so.6

Thread 4 (Thread 0x7efeebfff700 (LWP 24569)):
#0  0x00007eff44b7e965 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007eff4778da96 in virCondWait () from /lib64/libvirt.so.0
#2  0x00007eff4778e553 in virThreadPoolWorker () from /lib64/libvirt.so.0
#3  0x00007eff4778d828 in virThreadHelper () from /lib64/libvirt.so.0
#4  0x00007eff44b7add5 in start_thread () from /lib64/libpthread.so.0
#5  0x00007eff448a3ead in clone () from /lib64/libc.so.6

Thread 3 (Thread 0x7efeea7e0700 (LWP 24633)):
#0  0x00007eff44b7e965 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007eff4778da96 in virCondWait () from /lib64/libvirt.so.0
#2  0x00007eff22d4b168 in udevEventHandleThread () from /usr/lib64/libvirt/connection-driver/libvirt_driver_nodedev.so
#3  0x00007eff4778d852 in virThreadHelper () from /lib64/libvirt.so.0
#4  0x00007eff44b7add5 in start_thread () from /lib64/libpthread.so.0
#5  0x00007eff448a3ead in clone () from /lib64/libc.so.6

Thread 2 (Thread 0x7efeeafe1700 (LWP 26685)):
#0  0x00007eff44b7e965 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007eff4778da96 in virCondWait () from /lib64/libvirt.so.0
#2  0x00007eff4778e553 in virThreadPoolWorker () from /lib64/libvirt.so.0
#3  0x00007eff4778d828 in virThreadHelper () from /lib64/libvirt.so.0
#4  0x00007eff44b7add5 in start_thread () from /lib64/libpthread.so.0
#5  0x00007eff448a3ead in clone () from /lib64/libc.so.6

Thread 1 (Thread 0x7eff482578c0 (LWP 24554)):
#0  0x00007eff4489920d in poll () from /lib64/libc.so.6
#1  0x00007eff4772e2d6 in virEventPollRunOnce () from /lib64/libvirt.so.0
#2  0x00007eff4772cdb2 in virEventRunDefaultImpl () from /lib64/libvirt.so.0
#3  0x00007eff4785647d in virNetDaemonRun () from /lib64/libvirt.so.0
#4  0x000055b52559b8a9 in main ()

Comment 15 Michal Privoznik 2019-03-22 14:48:30 UTC
(In reply to Elad from comment #14)
> Created attachment 1546624 [details]
> logs(libvirt in debug)
> 
> Managed to reproduce:
> libvirt-4.5.0-10.el7_6.4.x86_64
> qemu-kvm-ev-2.12.0-18.el7_6.3.1.x86_64
> vdsm-4.40.0-59.git5533158.el7.x86_64
> selinux-policy-3.13.1-229.el7_6.9.noarch
> kernel - 3.10.0-957.5.1.el7.x86_64
> 
> 
> 2019-03-21 15:43:59.863+0000: 24555: warning :
> qemuDomainObjBeginJobInternal:6724 : Cannot start job (query, none, none)
> for domain vm_random_test_2117362559; current job is (query, none, none)
> owned by (24557 r
> emoteDispatchConnectGetAllDomainStats, 0 <null>, 0 <null> (flags=0x0)) for
> (76s, 0s, 0s)
> 2019-03-21 15:43:59.864+0000: 24555: error :
> qemuDomainObjBeginJobInternal:6746 : Timed out during operation: cannot
> acquire state change lock (held by
> monitor=remoteDispatchConnectGetAllDomainStats)
> 
> 
> 
> (gdb) t a a bt
> 

> Thread 16 (Thread 0x7eff36b1b700 (LWP 24557)):
> #0  0x00007eff44b7e965 in pthread_cond_wait@@GLIBC_2.3.2 () from
> /lib64/libpthread.so.0
> #1  0x00007eff4778da96 in virCondWait () from /lib64/libvirt.so.0
> #2  0x00007eff2220738b in qemuMonitorSend () from
> /usr/lib64/libvirt/connection-driver/libvirt_driver_qemu.so
> #3  0x00007eff22218e50 in qemuMonitorJSONCommandWithFd () from
> /usr/lib64/libvirt/connection-driver/libvirt_driver_qemu.so
> #4  0x00007eff2221b200 in qemuMonitorJSONGetBalloonInfo () from
> /usr/lib64/libvirt/connection-driver/libvirt_driver_qemu.so
> #5  0x00007eff2221b375 in qemuMonitorJSONGetMemoryStats () from
> /usr/lib64/libvirt/connection-driver/libvirt_driver_qemu.so
> #6  0x00007eff2222cdff in qemuDomainMemoryStatsInternal () from
> /usr/lib64/libvirt/connection-driver/libvirt_driver_qemu.so
> #7  0x00007eff2222cf70 in qemuDomainGetStatsBalloon () from
> /usr/lib64/libvirt/connection-driver/libvirt_driver_qemu.so
> #8  0x00007eff2222d5a9 in qemuConnectGetAllDomainStats () from
> /usr/lib64/libvirt/connection-driver/libvirt_driver_qemu.so
> #9  0x00007eff4792dd96 in virConnectGetAllDomainStats () from
> /lib64/libvirt.so.0
> #10 0x000055b5255a7d60 in remoteDispatchConnectGetAllDomainStatsHelper ()
> #11 0x00007eff47850425 in virNetServerProgramDispatch () from
> /lib64/libvirt.so.0
> #12 0x00007eff47856bfd in virNetServerHandleJob () from /lib64/libvirt.so.0
> #13 0x00007eff4778e4a1 in virThreadPoolWorker () from /lib64/libvirt.so.0
> #14 0x00007eff4778d828 in virThreadHelper () from /lib64/libvirt.so.0
> #15 0x00007eff44b7add5 in start_thread () from /lib64/libpthread.so.0
> #16 0x00007eff448a3ead in clone () from /lib64/libc.so.6

Indeed, we are still waiting for qemu to reply. What state is qemu in? Isn't it perhaps in a D state? Otherwise I don't see what libvirt can do differently here. I mean, if it is asked to fetch data from the monitor and the domain doesn't reply, what should the outcome be if it is asked to do the same from another thread?
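
(Illustration only, not libvirt's actual code: the pattern described above is one thread holding the per-domain job while blocked on a monitor reply, and a second thread giving up after libvirt's roughly 30-second job-acquire wait, which is exactly the error in the logs. A minimal Python 3 sketch of that contention:)

import threading

JOB_WAIT_SECONDS = 30                 # libvirt waits about 30s before giving up
domain_job = threading.Lock()         # stands in for the per-domain state change lock
monitor_reply = threading.Event()     # never set: simulates a qemu monitor that never answers

def get_all_domain_stats():           # e.g. remoteDispatchConnectGetAllDomainStats
    with domain_job:
        monitor_reply.wait()          # blocked in the monitor call forever

def block_job_info():                 # e.g. the blockJobInfo call coming from vdsm
    if not domain_job.acquire(timeout=JOB_WAIT_SECONDS):
        raise TimeoutError("Timed out during operation: cannot acquire state change lock "
                           "(held by monitor=remoteDispatchConnectGetAllDomainStats)")
    domain_job.release()

threading.Thread(target=get_all_domain_stats, daemon=True).start()
block_job_info()                      # raises TimeoutError after ~30 seconds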

Comment 16 Elad 2019-03-24 13:51:24 UTC
qemu     30305 97.9  3.6 1906368 293948 ?      Sl   Mar21 4122:48 /usr/libexec/qemu-kvm -name guest=vm_random_test_2117362559,debug-threads=on -S -object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-8-vm_random_test_21173/master-key.aes -machine pc-i440fx-rhel7.6.0,accel=kvm,usb=off,dump-guest-core=off -cpu Nehalem -m size=1048576k,slots=16,maxmem=4194304k -realtime mlock=off -smp 1,maxcpus=16,sockets=16,cores=1,threads=1 -object iothread,id=iothread1 -numa node,nodeid=0,cpus=0,mem=1024 -uuid 732b839f-dc3e-4970-98ef-5e0605c42e94 -smbios type=1,manufacturer=oVirt,product=oVirt Node,version=7.6-4.el7,serial=3d08d4e3-5874-4fab-9484-e34ce26dda01,uuid=732b839f-dc3e-4970-98ef-5e0605c42e94 -no-user-config -nodefaults -chardev socket,id=charmonitor,fd=33,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=2019-03-21T15:40:54,driftfix=slew -global kvm-pit.lost_tick_policy=delay -no-hpet -no-shutdown -global PIIX4_PM.disable_s3=1 -global PIIX4_PM.disable_s4=1 -boot strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device virtio-scsi-pci,iothread=iothread1,id=ua-ed30d678-f2c5-42a1-8db9-b019e465026f,bus=pci.0,addr=0x6 -device virtio-serial-pci,id=ua-2739e377-ac98-4d0b-aff2-b41b81c38d0a,max_ports=16,bus=pci.0,addr=0x5 -drive if=none,id=drive-ua-074a393a-0eb9-454b-a8ab-979326b456a1,werror=report,rerror=report,readonly=on -device ide-cd,bus=ide.1,unit=0,drive=drive-ua-074a393a-0eb9-454b-a8ab-979326b456a1,id=ua-074a393a-0eb9-454b-a8ab-979326b456a1 -drive file=/rhev/data-center/mnt/yellow-vdsb.qa.lab.tlv.redhat.com:_Storage__NFS_storage__local__ge2__nfs__0/71d16523-1ae9-493a-a84c-d5aa80beb74c/images/b46180fa-0ebd-45bd-89a8-a898317cd88e/e0a7f865-426b-4fbe-a756-f3c1dd22c3d1,format=qcow2,if=none,id=drive-ua-b46180fa-0ebd-45bd-89a8-a898317cd88e,serial=b46180fa-0ebd-45bd-89a8-a898317cd88e,werror=stop,rerror=stop,cache=none,aio=threads -device virtio-blk-pci,iothread=iothread1,scsi=off,bus=pci.0,addr=0x7,drive=drive-ua-b46180fa-0ebd-45bd-89a8-a898317cd88e,id=ua-b46180fa-0ebd-45bd-89a8-a898317cd88e,bootindex=1,write-cache=on -netdev tap,fd=37,id=hostua-677f9798-b874-47e4-a60e-fb9ea7c9e83c,vhost=on,vhostfd=38 -device virtio-net-pci,host_mtu=1500,netdev=hostua-677f9798-b874-47e4-a60e-fb9ea7c9e83c,id=ua-677f9798-b874-47e4-a60e-fb9ea7c9e83c,mac=00:1a:4a:16:25:8f,bus=pci.0,addr=0x3 -chardev socket,id=charchannel0,fd=39,server,nowait -device virtserialport,bus=ua-2739e377-ac98-4d0b-aff2-b41b81c38d0a.0,nr=1,chardev=charchannel0,id=channel0,name=ovirt-guest-agent.0 -chardev socket,id=charchannel1,fd=40,server,nowait -device virtserialport,bus=ua-2739e377-ac98-4d0b-aff2-b41b81c38d0a.0,nr=2,chardev=charchannel1,id=channel1,name=org.qemu.guest_agent.0 -chardev spicevmc,id=charchannel2,name=vdagent -device virtserialport,bus=ua-2739e377-ac98-4d0b-aff2-b41b81c38d0a.0,nr=3,chardev=charchannel2,id=channel2,name=com.redhat.spice.0 -spice port=5900,tls-port=5901,addr=10.35.82.57,x509-dir=/etc/pki/vdsm/libvirt-spice,tls-channel=main,tls-channel=display,tls-channel=inputs,tls-channel=cursor,tls-channel=playback,tls-channel=record,tls-channel=smartcard,tls-channel=usbredir,seamless-migration=on -device qxl-vga,id=ua-eb3f1412-56e4-497a-8025-5b11f0dbb2aa,ram_size=67108864,vram_size=33554432,vram64_size_mb=0,vgamem_mb=16,max_outputs=1,bus=pci.0,addr=0x2 -device intel-hda,id=ua-304c1ae9-adab-4b78-8d51-188f74fa9f17,bus=pci.0,addr=0x4 -device hda-duplex,id=ua-304c1ae9-adab-4b78-8d51-188f74fa9f17-codec0,bus=ua-304c1ae9-adab-4b78-8d51-188f74fa9f17.0,cad=0 -device 
virtio-balloon-pci,id=ua-0519504b-79d8-446b-9a9c-b8e984407445,bus=pci.0,addr=0x8 -object rng-random,id=objua-9e4c601a-5cd3-4f12-8be7-71f69879a18f,filename=/dev/urandom -device virtio-rng-pci,rng=objua-9e4c601a-5cd3-4f12-8be7-71f69879a18f,id=ua-9e4c601a-5cd3-4f12-8be7-71f69879a18f,bus=pci.0,addr=0x9 -sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny -msg timestamp=on

Comment 17 Tal Nisan 2019-03-25 09:11:28 UTC
Michal, can you share the info from the qemu guys here? Better to have it visible here instead of in a private email thread.

Comment 18 Michal Privoznik 2019-03-25 10:40:40 UTC
(In reply to Tal Nisan from comment #17)
> Michal, can you share the info from qemu guys here? Better to have it
> visible here instead in a private email thread

There is no private e-mail conversation (or any other type of conversation) that I'm part of about this bug.

I suggest moving this bug to qemu to get their opinion. I mean, qemu-2.12.0 seems old enough to me that it could be lacking a possible fix.

Comment 21 John Ferlan 2019-04-02 23:24:48 UTC
From comment 16, I see :

-drive if=none,id=drive-ua-074a393a-0eb9-454b-a8ab-979326b456a1,werror=report,rerror=report,readonly=on -device ide-cd,bus=ide.1,unit=0,drive=drive-ua-074a393a-0eb9-454b-a8ab-979326b456a1,id=ua-074a393a-0eb9-454b-a8ab-979326b456a1 

-drive file=/rhev/data-center/mnt/yellow-vdsb.qa.lab.tlv.redhat.com:_Storage__NFS_storage__local__ge2__nfs__0/71d16523-1ae9-493a-a84c-d5aa80beb74c/images/b46180fa-0ebd-45bd-89a8-a898317cd88e/e0a7f865-426b-4fbe-a756-f3c1dd22c3d1,format=qcow2,if=none,id=drive-ua-b46180fa-0ebd-45bd-89a8-a898317cd88e,serial=b46180fa-0ebd-45bd-89a8-a898317cd88e,werror=stop,rerror=stop,cache=none,aio=threads 

Of importance here is the second drive with the "_Storage__NFS_storage" in the path. I have a vague recollection of an issue related to NFS and stats where locking wasn't coordinated, but I don't have the exact bz to reference on it, nor can I come up with the correct magic words for a bz search.

Comment 22 Michal Privoznik 2019-04-05 08:12:45 UTC
(In reply to John Ferlan from comment #21)
> From comment 16, I see :
> 

> 
> Of importance here is the second drive with the "_Storage__NFS_storage" in
> the path. I have a vague recollection of an issue related to NFS and stats
> where locking wasn't coordinated, but I don't have the exact bz to reference
> on it, nor can I come up with the correct magic words for a bz search.

I had the same feeling; that's why I asked what state the qemu process is in (comment 15). If it were stale NFS, the process would be in D state, which would explain why qemu did not respond. But comment 16 shows it's in S state, which contradicts the stale NFS assumption. Hence my suggestion to investigate why qemu did not reply.
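
(For reference, the process state being discussed here can be read straight from procfs; a tiny, hypothetical Python helper, assuming a Linux host. "D" means uninterruptible sleep, typically stuck I/O such as a stale NFS mount, while "S" is an ordinary interruptible sleep, which is what the ps output in comment 16 shows for the qemu process.)

def proc_state(pid):
    # Hypothetical helper: the comm field in /proc/<pid>/stat is in parentheses
    # and may contain spaces, so split after the closing parenthesis; the next
    # field is the single-letter process state.
    with open("/proc/%d/stat" % pid) as f:
        return f.read().rsplit(")", 1)[1].split()[0]

print(proc_state(30305))    # pid of the qemu-kvm process from comment 16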

Comment 23 John Ferlan 2019-04-09 16:35:09 UTC
OK so digging deeper into the attached libvirtd.log from 21-Mar:

2019-03-21 15:40:53.417+0000: 24560: debug : virDomainDefineXML:6150 : conn=0x7eff28000ef0, xml=<?xml version='1.0' encoding='utf-8'?>
<domain xmlns:ns0="http://ovirt.org/vm/tune/1.0" xmlns:ovirt-vm="http://ovirt.org/vm/1.0" type="kvm">
    <name>vm_random_test_2117362559</name>
    ...
        <controller index="0" model="virtio-scsi" type="scsi">
            <driver iothread="1" />
            <alias name="ua-ed30d678-f2c5-42a1-8db9-b019e465026f" />
        </controller>
    ...
        <disk device="cdrom" snapshot="no" type="file">
            <driver error_policy="report" name="qemu" type="raw" />
            <source file="" startupPolicy="optional">
                <seclabel model="dac" relabel="no" type="none" />
            </source>
            <target bus="ide" dev="hdc" />
            <readonly />
            <alias name="ua-074a393a-0eb9-454b-a8ab-979326b456a1" />
        </disk>
        <disk device="disk" snapshot="no" type="file">
            <target bus="virtio" dev="vda" />
            <source file="/rhev/data-center/mnt/yellow-vdsb.qa.lab.tlv.redhat.com:_Storage__NFS_storage__local__ge2__nfs__0/71d16523-1ae9-493a-a84c-d5aa80beb74c/images/b46180fa-0ebd-45bd-89a8-a898317cd88e/e0a7f865-426b-4fbe-a756-f3c1dd22c3d1">
                <seclabel model="dac" relabel="no" type="none" />
            </source>
            <driver cache="none" error_policy="stop" io="threads" iothread="1" name="qemu" type="qcow2" />
            <alias name="ua-b46180fa-0ebd-45bd-89a8-a898317cd88e" />
            <boot order="1" />
            <serial>b46180fa-0ebd-45bd-89a8-a898317cd88e</serial>
        </disk>
...

2019-03-21 15:40:53.548+0000: 24556: debug : virDomainCreateWithFlags:6583 : dom=0x7eff240124b0, (VM: name=vm_random_test_2117362559, uuid=732b839f-dc3e-4970-98ef-5e0605c42e94), flags=0x0

...

2019-03-21 15:40:56.797+0000: 24556: debug : virCommandRunAsync:2476 : About to run LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin QEMU_AUDIO_DRV=spice /usr/libexec/qemu-kvm -name guest=vm_random_test_2117362559,debug-threads=on -S -object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-8-vm_random_test_21173/master-key.aes -machine pc-i440fx-rhel7.6.0,accel=kvm,usb=off,dump-guest-core=off -cpu Nehalem -m size=1048576k,slots=16,maxmem=4194304k -realtime mlock=off -smp 1,maxcpus=16,sockets=16,cores=1,threads=1 -object iothread,id=iothread1 -numa node,nodeid=0,cpus=0,mem=1024 -uuid 732b839f-dc3e-4970-98ef-5e0605c42e94 -smbios 'type=1,manufacturer=oVirt,product=oVirt Node,version=7.6-4.el7,serial=3d08d4e3-5874-4fab-9484-e34ce26dda01,uuid=732b839f-dc3e-4970-98ef-5e0605c42e94' -no-user-config -nodefaults -chardev socket,id=charmonitor,fd=33,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=2019-03-21T15:40:54,driftfix=slew -global kvm-pit.lost_tick_policy=delay -no-hpet -no-shutdown -global PIIX4_PM.disable_s3=1 -global PIIX4_PM.disable_s4=1 -boot strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device virtio-scsi-pci,iothread=iothread1,id=ua-ed30d678-f2c5-42a1-8db9-b019e465026f,bus=pci.0,addr=0x6 -device virtio-serial-pci,id=ua-2739e377-ac98-4d0b-aff2-b41b81c38d0a,max_ports=16,bus=pci.0,addr=0x5 -drive if=none,id=drive-ua-074a393a-0eb9-454b-a8ab-979326b456a1,werror=report,rerror=report,readonly=on -device ide-cd,bus=ide.1,unit=0,drive=drive-ua-074a393a-0eb9-454b-a8ab-979326b456a1,id=ua-074a393a-0eb9-454b-a8ab-979326b456a1 -drive file=/rhev/data-center/mnt/yellow-vdsb.qa.lab.tlv.redhat.com:_Storage__NFS_storage__local__ge2__nfs__0/71d16523-1ae9-493a-a84c-d5aa80beb74c/images/b46180fa-0ebd-45bd-89a8-a898317cd88e/e0a7f865-426b-4fbe-a756-f3c1dd22c3d1,format=qcow2,if=none,id=drive-ua-b46180fa-0ebd-45bd-89a8-a898317cd88e,serial=b46180fa-0ebd-45bd-89a8-a898317cd88e,werror=stop,rerror=stop,cache=none,aio=threads -device virtio-blk-pci,iothread=iothread1,scsi=off,bus=pci.0,addr=0x7,drive=drive-ua-b46180fa-0ebd-45bd-89a8-a898317cd88e,id=ua-b46180fa-0ebd-45bd-89a8-a898317cd88e,bootindex=1,write-cache=on -netdev tap,fd=37,id=hostua-677f9798-b874-47e4-a60e-fb9ea7c9e83c,vhost=on,vhostfd=38 -device virtio-net-pci,host_mtu=1500,netdev=hostua-677f9798-b874-47e4-a60e-fb9ea7c9e83c,id=ua-677f9798-b874-47e4-a60e-fb9ea7c9e83c,mac=00:1a:4a:16:25:8f,bus=pci.0,addr=0x3 -chardev socket,id=charchannel0,fd=39,server,nowait -device virtserialport,bus=ua-2739e377-ac98-4d0b-aff2-b41b81c38d0a.0,nr=1,chardev=charchannel0,id=channel0,name=ovirt-guest-agent.0 -chardev socket,id=charchannel1,fd=40,server,nowait -device virtserialport,bus=ua-2739e377-ac98-4d0b-aff2-b41b81c38d0a.0,nr=2,chardev=charchannel1,id=channel1,name=org.qemu.guest_agent.0 -c
hardev spicevmc,id=charchannel2,name=vdagent -device virtserialport,bus=ua-2739e377-ac98-4d0b-aff2-b41b81c38d0a.0,nr=3,chardev=charchannel2,id=channel2,name=com.redhat.spice.0 -spice port=5900,tls-port=5901,addr=10.35.82.57,x509-dir=/etc/pki/vdsm/libvirt-spice,tls-channel=main,tls-channel=display,tls-channel=inputs,tls-channel=cursor,tls-channel=playback,tls-channel=record,tls-channel=smartcard,tls-channel=usbredir,seamless-migration=on -device qxl-vga,id=ua-eb3f1412-56e4-497a-8025-5b11f0dbb2aa,ram_size=67108864,vram_size=33554432,vram64_size_mb=0,vgamem_mb=16,max_outputs=1,bus=pci.0,addr=0x2 -device intel-hda,id=ua-304c1ae9-adab-4b78-8d51-188f74fa9f17,bus=pci.0,addr=0x4 -device hda-duplex,id=ua-304c1ae9-adab-4b78-8d51-188f74fa9f17-codec0,bus=ua-304c1ae9-adab-4b78-8d51-188f74fa9f17.0,cad=0 -device virtio-balloon-pci,id=ua-0519504b-79d8-446b-9a9c-b8e984407445,bus=pci.0,addr=0x8 -object rng-random,id=objua-9e4c601a-5cd3-4f12-8be7-71f69879a18f,filename=/dev/urandom -device virtio-rng-pci,rng=objua-9e4c601a-5cd3-4f12-8be7-71f69879a18f,id=ua-9e4c601a-5cd3-4f12-8be7-71f69879a18f,bus=pci.0,addr=0x9 -sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny -msg timestamp=on
...
2019-03-21 15:40:56.830+0000: 24556: debug : qemuProcessLaunch:6397 : QEMU vm=0x7eff1001ec80 name=vm_random_test_2117362559 running with pid=30305
...

... What ends up being "cared about" eventually - the virtio-scsi device w/ IOThreads and the NFS volume:

-object iothread,id=iothread1
-device virtio-scsi-pci,iothread=iothread1,id=ua-ed30d678-f2c5-42a1-8db9-b019e465026f,bus=pci.0,addr=0x6
-device virtio-blk-pci,iothread=iothread1,scsi=off,bus=pci.0,addr=0x7,drive=drive-ua-b46180fa-0ebd-45bd-89a8-a898317cd88e,id=ua-b46180fa-0ebd-45bd-89a8-a898317cd88e,bootindex=1,write-cache=on
-drive file=/rhev/data-center/mnt/yellow-vdsb.qa.lab.tlv.redhat.com:_Storage__NFS_storage__local__ge2__nfs__0/71d16523-1ae9-493a-a84c-d5aa80beb74c/images/b46180fa-0ebd-45bd-89a8-a898317cd88e/e0a7f865-426b-4fbe-a756-f3c1dd22c3d1,format=qcow2,if=none,id=drive-ua-b46180fa-0ebd-45bd-89a8-a898317cd88e,serial=b46180fa-0ebd-45bd-89a8-a898317cd88e,werror=stop,rerror=stop,cache=none,aio=threads

...

...
2019-03-21 15:40:56.922+0000: 24556: debug : qemuProcessWaitForMonitor:2235 : Connect monitor to vm=0x7eff1001ec80 name='vm_random_test_2117362559' retry=0
...
2019-03-21 15:40:56.926+0000: 24556: debug : qemuDomainObjEnterMonitorInternal:7051 : Entering monitor (mon=0x7eff24035950 vm=0x7eff1001ec80 name=vm_random_test_2117362559)
...

... NB: The keys now being vm=0x7eff1001ec80 and mon=0x7eff24035950

... What follows is *all* the commands sent to the monitor:

2019-03-21 15:40:57.410+0000: 24554: info : qemuMonitorIOWrite:551 : QEMU_MONITOR_IO_WRITE: mon=0x7eff24035950 buf={"execute":"qmp_capabilities","id":"libvirt-1"}^M
2019-03-21 15:40:57.414+0000: 24554: info : qemuMonitorIOWrite:551 : QEMU_MONITOR_IO_WRITE: mon=0x7eff24035950 buf={"execute":"query-migrate-capabilities","id":"libvirt-2"}^M
2019-03-21 15:40:57.419+0000: 24554: info : qemuMonitorIOWrite:551 : QEMU_MONITOR_IO_WRITE: mon=0x7eff24035950 buf={"execute":"migrate-set-capabilities","arguments":{"capabilities":[{"capability":"events","state":true}]},"id":"libvirt-3"}^M
2019-03-21 15:40:57.426+0000: 24554: info : qemuMonitorIOWrite:551 : QEMU_MONITOR_IO_WRITE: mon=0x7eff24035950 buf={"execute":"query-chardev","id":"libvirt-4"}^M
2019-03-21 15:40:57.431+0000: 24554: info : qemuMonitorIOWrite:551 : QEMU_MONITOR_IO_WRITE: mon=0x7eff24035950 buf={"execute":"qom-list","arguments":{"path":"/machine/unattached/device[0]"},"id":"libvirt-5"}^M
2019-03-21 15:40:57.442+0000: 24554: info : qemuMonitorIOWrite:551 : QEMU_MONITOR_IO_WRITE: mon=0x7eff24035950 buf={"execute":"qom-get","arguments":{"path":"/machine/unattached/device[0]","property":"feature-words"},"id":"libvirt-6"}^M
2019-03-21 15:40:57.449+0000: 24554: info : qemuMonitorIOWrite:551 : QEMU_MONITOR_IO_WRITE: mon=0x7eff24035950 buf={"execute":"qom-get","arguments":{"path":"/machine/unattached/device[0]","property":"filtered-features"},"id":"libvirt-7"}^M
2019-03-21 15:40:57.457+0000: 24554: info : qemuMonitorIOWrite:551 : QEMU_MONITOR_IO_WRITE: mon=0x7eff24035950 buf={"execute":"query-hotpluggable-cpus","id":"libvirt-8"}^M
2019-03-21 15:40:57.462+0000: 24554: info : qemuMonitorIOWrite:551 : QEMU_MONITOR_IO_WRITE: mon=0x7eff24035950 buf={"execute":"query-cpus-fast","id":"libvirt-9"}^M
2019-03-21 15:40:57.465+0000: 24554: info : qemuMonitorIOWrite:551 : QEMU_MONITOR_IO_WRITE: mon=0x7eff24035950 buf={"execute":"query-iothreads","id":"libvirt-10"}^M
2019-03-21 15:40:57.472+0000: 24554: info : qemuMonitorIOWrite:551 : QEMU_MONITOR_IO_WRITE: mon=0x7eff24035950 buf={"execute":"set_password","arguments":{"protocol":"spice","password":"*****","connected":"keep"},"id":"libvirt-11"}^M
2019-03-21 15:40:57.479+0000: 24554: info : qemuMonitorIOWrite:551 : QEMU_MONITOR_IO_WRITE: mon=0x7eff24035950 buf={"execute":"expire_password","arguments":{"protocol":"spice","time":"now"},"id":"libvirt-12"}^M
2019-03-21 15:40:57.485+0000: 24554: info : qemuMonitorIOWrite:551 : QEMU_MONITOR_IO_WRITE: mon=0x7eff24035950 buf={"execute":"qom-list","arguments":{"path":"/machine/peripheral"},"id":"libvirt-13"}^M
2019-03-21 15:40:57.491+0000: 24554: info : qemuMonitorIOWrite:551 : QEMU_MONITOR_IO_WRITE: mon=0x7eff24035950 buf={"execute":"qom-list","arguments":{"path":"/machine/peripheral/ua-0519504b-79d8-446b-9a9c-b8e984407445"},"id":"libvirt-14"}^M
2019-03-21 15:40:57.498+0000: 24554: info : qemuMonitorIOWrite:551 : QEMU_MONITOR_IO_WRITE: mon=0x7eff24035950 buf={"execute":"qom-set","arguments":{"path":"/machine/peripheral/ua-0519504b-79d8-446b-9a9c-b8e984407445","property":"guest-stats-polling-interval","value":5},"id":"libvirt-15"}^M
2019-03-21 15:40:57.508+0000: 24554: info : qemuMonitorIOWrite:551 : QEMU_MONITOR_IO_WRITE: mon=0x7eff24035950 buf={"execute":"balloon","arguments":{"value":1073741824},"id":"libvirt-16"}^M
2019-03-21 15:40:57.514+0000: 24554: info : qemuMonitorIOWrite:551 : QEMU_MONITOR_IO_WRITE: mon=0x7eff24035950 buf={"execute":"query-balloon","id":"libvirt-17"}^M
2019-03-21 15:40:57.520+0000: 24554: info : qemuMonitorIOWrite:551 : QEMU_MONITOR_IO_WRITE: mon=0x7eff24035950 buf={"execute":"cont","id":"libvirt-18"}^M
2019-03-21 15:40:58.422+0000: 24554: info : qemuMonitorIOWrite:551 : QEMU_MONITOR_IO_WRITE: mon=0x7eff24035950 buf={"execute":"qom-list","arguments":{"path":"/machine/peripheral"},"id":"libvirt-19"}^M
2019-03-21 15:40:58.437+0000: 24554: info : qemuMonitorIOWrite:551 : QEMU_MONITOR_IO_WRITE: mon=0x7eff24035950 buf={"execute":"qom-list","arguments":{"path":"/machine/peripheral"},"id":"libvirt-20"}^M
2019-03-21 15:40:58.460+0000: 24554: info : qemuMonitorIOWrite:551 : QEMU_MONITOR_IO_WRITE: mon=0x7eff24035950 buf={"execute":"qom-get","arguments":{"path":"/machine/peripheral/ua-eb3f1412-56e4-497a-8025-5b11f0dbb2aa","property":"vram_size"},"id":"libvirt-21"}^M
2019-03-21 15:40:58.469+0000: 24554: info : qemuMonitorIOWrite:551 : QEMU_MONITOR_IO_WRITE: mon=0x7eff24035950 buf={"execute":"qom-get","arguments":{"path":"/machine/peripheral/ua-eb3f1412-56e4-497a-8025-5b11f0dbb2aa","property":"ram_size"},"id":"libvirt-22"}^M
2019-03-21 15:40:58.483+0000: 24554: info : qemuMonitorIOWrite:551 : QEMU_MONITOR_IO_WRITE: mon=0x7eff24035950 buf={"execute":"qom-get","arguments":{"path":"/machine/peripheral/ua-eb3f1412-56e4-497a-8025-5b11f0dbb2aa","property":"vgamem_mb"},"id":"libvirt-23"}^M
2019-03-21 15:40:58.494+0000: 24554: info : qemuMonitorIOWrite:551 : QEMU_MONITOR_IO_WRITE: mon=0x7eff24035950 buf={"execute":"qom-list","arguments":{"path":"/machine/peripheral"},"id":"libvirt-24"}^M
2019-03-21 15:40:58.503+0000: 24554: info : qemuMonitorIOWrite:551 : QEMU_MONITOR_IO_WRITE: mon=0x7eff24035950 buf={"execute":"query-block","id":"libvirt-25"}^M
2019-03-21 15:40:58.512+0000: 24554: info : qemuMonitorIOWrite:551 : QEMU_MONITOR_IO_WRITE: mon=0x7eff24035950 buf={"execute":"query-balloon","id":"libvirt-26"}^M
2019-03-21 15:40:58.524+0000: 24554: info : qemuMonitorIOWrite:551 : QEMU_MONITOR_IO_WRITE: mon=0x7eff24035950 buf={"execute":"qom-get","arguments":{"path":"/machine/peripheral/ua-0519504b-79d8-446b-9a9c-b8e984407445","property":"guest-stats"},"id":"libvirt-27"}^M
2019-03-21 15:40:58.559+0000: 24554: info : qemuMonitorIOWrite:551 : QEMU_MONITOR_IO_WRITE: mon=0x7eff24035950 buf={"execute":"query-blockstats","id":"libvirt-28"}^M
2019-03-21 15:40:58.623+0000: 24554: info : qemuMonitorIOWrite:551 : QEMU_MONITOR_IO_WRITE: mon=0x7eff24035950 buf={"execute":"query-block","id":"libvirt-29"}^M
2019-03-21 15:40:58.692+0000: 24554: info : qemuMonitorIOWrite:551 : QEMU_MONITOR_IO_WRITE: mon=0x7eff24035950 buf={"execute":"query-named-block-nodes","id":"libvirt-30"}^M
2019-03-21 15:41:13.059+0000: 24554: info : qemuMonitorIOWrite:551 : QEMU_MONITOR_IO_WRITE: mon=0x7eff24035950 buf={"execute":"query-balloon","id":"libvirt-31"}^M
2019-03-21 15:41:13.183+0000: 24554: info : qemuMonitorIOWrite:551 : QEMU_MONITOR_IO_WRITE: mon=0x7eff24035950 buf={"execute":"qom-get","arguments":{"path":"/machine/peripheral/ua-0519504b-79d8-446b-9a9c-b8e984407445","property":"guest-stats"},"id":"libvirt-32"}^M
2019-03-21 15:41:13.248+0000: 24554: info : qemuMonitorIOWrite:551 : QEMU_MONITOR_IO_WRITE: mon=0x7eff24035950 buf={"execute":"query-blockstats","id":"libvirt-33"}^M
2019-03-21 15:41:13.257+0000: 24554: info : qemuMonitorIOWrite:551 : QEMU_MONITOR_IO_WRITE: mon=0x7eff24035950 buf={"execute":"query-block","id":"libvirt-34"}^M
2019-03-21 15:41:13.270+0000: 24554: info : qemuMonitorIOWrite:551 : QEMU_MONITOR_IO_WRITE: mon=0x7eff24035950 buf={"execute":"query-named-block-nodes","id":"libvirt-35"}^M
2019-03-21 15:41:28.058+0000: 24554: info : qemuMonitorIOWrite:551 : QEMU_MONITOR_IO_WRITE: mon=0x7eff24035950 buf={"execute":"query-balloon","id":"libvirt-36"}^M
2019-03-21 15:41:28.067+0000: 24554: info : qemuMonitorIOWrite:551 : QEMU_MONITOR_IO_WRITE: mon=0x7eff24035950 buf={"execute":"qom-get","arguments":{"path":"/machine/peripheral/ua-0519504b-79d8-446b-9a9c-b8e984407445","property":"guest-stats"},"id":"libvirt-37"}^M
2019-03-21 15:41:28.089+0000: 24554: info : qemuMonitorIOWrite:551 : QEMU_MONITOR_IO_WRITE: mon=0x7eff24035950 buf={"execute":"query-blockstats","id":"libvirt-38"}^M
2019-03-21 15:41:28.093+0000: 24554: info : qemuMonitorIOWrite:551 : QEMU_MONITOR_IO_WRITE: mon=0x7eff24035950 buf={"execute":"query-block","id":"libvirt-39"}^M
2019-03-21 15:41:28.101+0000: 24554: info : qemuMonitorIOWrite:551 : QEMU_MONITOR_IO_WRITE: mon=0x7eff24035950 buf={"execute":"query-named-block-nodes","id":"libvirt-40"}^M
2019-03-21 15:41:29.319+0000: 24554: info : qemuMonitorIOWrite:551 : QEMU_MONITOR_IO_WRITE: mon=0x7eff24035950 buf={"execute":"query-block","id":"libvirt-41"}^M
2019-03-21 15:41:43.056+0000: 24554: info : qemuMonitorIOWrite:551 : QEMU_MONITOR_IO_WRITE: mon=0x7eff24035950 buf={"execute":"query-balloon","id":"libvirt-42"}^M
2019-03-21 15:41:43.068+0000: 24554: info : qemuMonitorIOWrite:551 : QEMU_MONITOR_IO_WRITE: mon=0x7eff24035950 buf={"execute":"qom-get","arguments":{"path":"/machine/peripheral/ua-0519504b-79d8-446b-9a9c-b8e984407445","property":"guest-stats"},"id":"libvirt-43"}^M
2019-03-21 15:41:43.104+0000: 24554: info : qemuMonitorIOWrite:551 : QEMU_MONITOR_IO_WRITE: mon=0x7eff24035950 buf={"execute":"query-blockstats","id":"libvirt-44"}^M
2019-03-21 15:41:43.109+0000: 24554: info : qemuMonitorIOWrite:551 : QEMU_MONITOR_IO_WRITE: mon=0x7eff24035950 buf={"execute":"query-block","id":"libvirt-45"}^M
2019-03-21 15:41:43.113+0000: 24554: info : qemuMonitorIOWrite:551 : QEMU_MONITOR_IO_WRITE: mon=0x7eff24035950 buf={"execute":"query-named-block-nodes","id":"libvirt-46"}^M
2019-03-21 15:41:58.053+0000: 24554: info : qemuMonitorIOWrite:551 : QEMU_MONITOR_IO_WRITE: mon=0x7eff24035950 buf={"execute":"query-balloon","id":"libvirt-47"}^M
2019-03-21 15:41:58.058+0000: 24554: info : qemuMonitorIOWrite:551 : QEMU_MONITOR_IO_WRITE: mon=0x7eff24035950 buf={"execute":"qom-get","arguments":{"path":"/machine/peripheral/ua-0519504b-79d8-446b-9a9c-b8e984407445","property":"guest-stats"},"id":"libvirt-48"}^M
2019-03-21 15:41:58.077+0000: 24554: info : qemuMonitorIOWrite:551 : QEMU_MONITOR_IO_WRITE: mon=0x7eff24035950 buf={"execute":"query-blockstats","id":"libvirt-49"}^M
2019-03-21 15:41:58.081+0000: 24554: info : qemuMonitorIOWrite:551 : QEMU_MONITOR_IO_WRITE: mon=0x7eff24035950 buf={"execute":"query-block","id":"libvirt-50"}^M
2019-03-21 15:41:58.086+0000: 24554: info : qemuMonitorIOWrite:551 : QEMU_MONITOR_IO_WRITE: mon=0x7eff24035950 buf={"execute":"query-named-block-nodes","id":"libvirt-51"}^M
2019-03-21 15:42:13.060+0000: 24554: info : qemuMonitorIOWrite:551 : QEMU_MONITOR_IO_WRITE: mon=0x7eff24035950 buf={"execute":"query-balloon","id":"libvirt-52"}^M
2019-03-21 15:42:13.065+0000: 24554: info : qemuMonitorIOWrite:551 : QEMU_MONITOR_IO_WRITE: mon=0x7eff24035950 buf={"execute":"qom-get","arguments":{"path":"/machine/peripheral/ua-0519504b-79d8-446b-9a9c-b8e984407445","property":"guest-stats"},"id":"libvirt-53"}^M
2019-03-21 15:42:13.100+0000: 24554: info : qemuMonitorIOWrite:551 : QEMU_MONITOR_IO_WRITE: mon=0x7eff24035950 buf={"execute":"query-blockstats","id":"libvirt-54"}^M
2019-03-21 15:42:13.104+0000: 24554: info : qemuMonitorIOWrite:551 : QEMU_MONITOR_IO_WRITE: mon=0x7eff24035950 buf={"execute":"query-block","id":"libvirt-55"}^M
2019-03-21 15:42:13.108+0000: 24554: info : qemuMonitorIOWrite:551 : QEMU_MONITOR_IO_WRITE: mon=0x7eff24035950 buf={"execute":"query-named-block-nodes","id":"libvirt-56"}^M
2019-03-21 15:42:17.924+0000: 24554: info : qemuMonitorIOWrite:551 : QEMU_MONITOR_IO_WRITE: mon=0x7eff24035950 buf={"execute":"transaction","arguments":{"actions":[{"type":"blockdev-snapshot-sync","data":{"device":"drive-ua-b46180fa-0ebd-45bd-89a8-a898317cd88e","snapshot-file":"/rhev/data-center/mnt/yellow-vdsb.qa.lab.tlv.redhat.com:_Storage__NFS_storage__local__ge2__nfs__0/71d16523-1ae9-493a-a84c-d5aa80beb74c/images/b46180fa-0ebd-45bd-89a8-a898317cd88e/4b3a55c4-f18c-40f3-b023-8df86e71d1e5","format":"qcow2","mode":"existing"}}]},"id":"libvirt-57"}^M
2019-03-21 15:42:18.085+0000: 24554: info : qemuMonitorIOWrite:551 : QEMU_MONITOR_IO_WRITE: mon=0x7eff24035950 buf={"execute":"query-named-block-nodes","id":"libvirt-58"}^M
2019-03-21 15:42:18.091+0000: 24554: info : qemuMonitorIOWrite:551 : QEMU_MONITOR_IO_WRITE: mon=0x7eff24035950 buf={"execute":"query-blockstats","id":"libvirt-59"}^M
2019-03-21 15:42:18.113+0000: 24554: info : qemuMonitorIOWrite:551 : QEMU_MONITOR_IO_WRITE: mon=0x7eff24035950 buf={"execute":"block-set-write-threshold","arguments":{"node-name":"#block072","write-threshold":0},"id":"libvirt-60"}^M
2019-03-21 15:42:28.058+0000: 24554: info : qemuMonitorIOWrite:551 : QEMU_MONITOR_IO_WRITE: mon=0x7eff24035950 buf={"execute":"query-balloon","id":"libvirt-61"}^M
2019-03-21 15:42:28.064+0000: 24554: info : qemuMonitorIOWrite:551 : QEMU_MONITOR_IO_WRITE: mon=0x7eff24035950 buf={"execute":"qom-get","arguments":{"path":"/machine/peripheral/ua-0519504b-79d8-446b-9a9c-b8e984407445","property":"guest-stats"},"id":"libvirt-62"}^M
2019-03-21 15:42:28.075+0000: 24554: info : qemuMonitorIOWrite:551 : QEMU_MONITOR_IO_WRITE: mon=0x7eff24035950 buf={"execute":"query-blockstats","id":"libvirt-63"}^M
2019-03-21 15:42:28.079+0000: 24554: info : qemuMonitorIOWrite:551 : QEMU_MONITOR_IO_WRITE: mon=0x7eff24035950 buf={"execute":"query-block","id":"libvirt-64"}^M
2019-03-21 15:42:28.085+0000: 24554: info : qemuMonitorIOWrite:551 : QEMU_MONITOR_IO_WRITE: mon=0x7eff24035950 buf={"execute":"query-named-block-nodes","id":"libvirt-65"}^M
2019-03-21 15:42:28.960+0000: 24554: info : qemuMonitorIOWrite:551 : QEMU_MONITOR_IO_WRITE: mon=0x7eff24035950 buf={"execute":"drive-mirror","arguments":{"device":"drive-ua-b46180fa-0ebd-45bd-89a8-a898317cd88e","target":"/rhev/data-center/mnt/yellow-vdsb.qa.lab.tlv.redhat.com:_Storage__NFS_storage__local__ge2__nfs__2/d093be4a-c17d-47bd-a7b3-2c61bb55cbd8/images/b46180fa-0ebd-45bd-89a8-a898317cd88e/4b3a55c4-f18c-40f3-b023-8df86e71d1e5","sync":"top","mode":"existing","format":"qcow2"},"id":"libvirt-66"}^M
2019-03-21 15:42:43.060+0000: 24554: info : qemuMonitorIOWrite:551 : QEMU_MONITOR_IO_WRITE: mon=0x7eff24035950 buf={"execute":"query-balloon","id":"libvirt-67"}^M
...


Every 15 seconds or so there's a sequence of query-balloon, qom-get, query-blockstats, query-block, and query-named-block-nodes (the standard/normal stats query cycle, I believe)

At 15:42:28.058 we get one, followed by the "drive-mirror" at 15:42:28.960. Then at 15:42:43.060 the next query-balloon starts
but never returns. There's also a "blockdev-snapshot-sync" at 15:42:17.924 which looks interesting in the grand scheme of things.
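
(FWIW, a quick way to eyeball that cycle - assuming the debug output is going to the usual /var/log/libvirt/libvirtd.log, and filtering on the monitor pointer seen above - is something along the lines of:

  grep 'QEMU_MONITOR_IO_WRITE' /var/log/libvirt/libvirtd.log | grep 'mon=0x7eff24035950' | sed 's/.*buf=//'

which prints just the QMP commands in order, so the gap after "libvirt-67" at 15:42:43 stands out immediately.)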

... Prior to the drive-mirror there's:

2019-03-21 15:42:28.809+0000: 24557: debug : virDomainBlockCopy:10274 : dom=0x7eff04029850, (VM: name=vm_random_test_2117362559, uuid=732b839f-dc3e-4970-98ef-5e0605c42e94), disk=vda, destxml=<?xml version='1.0' encoding='utf-8'?>
<disk device="disk" snapshot="no" type="file"><source file="/rhev/data-center/mnt/yellow-vdsb.qa.lab.tlv.redhat.com:_Storage__NFS_storage__local__ge2__nfs__2/d093be4a-c17d-47bd-a7b3-2c61bb55cbd8/images/b46180fa-0ebd-45bd-89a8-a898317cd88e/4b3a55c4-f18c-40f3-b023-8df86e71d1e5"><seclabel model="dac" relabel="no" type="none" /></source><driver cache="none" error_policy="stop" io="threads" name="qemu" type="qcow2" /></disk>, params=0x7eff0401b450, nparams=0, flags=0x7
2019-03-21 15:42:28.810+0000: 24557: debug : virStorageFileBackendFileInit:75 : initializing FS storage file 0x7eff0401b600 (file:/rhev/data-center/mnt/yellow-vdsb.qa.lab.tlv.redhat.com:_Storage__NFS_storage__local__ge2__nfs__2/d093be4a-c17d-47bd-a7b3-2c61bb55cbd8/images/b46180fa-0ebd-45bd-89a8-a898317cd88e/4b3a55c4-f18c-40f3-b023-8df86e71d1e5)[107:107]
2019-03-21 15:42:28.938+0000: 24557: debug : virDomainLockManagerAddImage:90 : Add disk /rhev/data-center/mnt/yellow-vdsb.qa.lab.tlv.redhat.com:_Storage__NFS_storage__local__ge2__nfs__2/d093be4a-c17d-47bd-a7b3-2c61bb55cbd8/images/b46180fa-0ebd-45bd-89a8-a898317cd88e/4b3a55c4-f18c-40f3-b023-8df86e71d1e5
2019-03-21 15:42:28.938+0000: 24557: debug : virLockManagerAddResource:332 : lock=0x7eff040192c0 type=0 name=/rhev/data-center/mnt/yellow-vdsb.qa.lab.tlv.redhat.com:_Storage__NFS_storage__local__ge2__nfs__2/d093be4a-c17d-47bd-a7b3-2c61bb55cbd8/images/b46180fa-0ebd-45bd-89a8-a898317cd88e/4b3a55c4-f18c-40f3-b023-8df86e71d1e5 nparams=0 params=(nil) flags=0x0
2019-03-21 15:42:28.938+0000: 24557: debug : virLockManagerAcquire:350 : lock=0x7eff040192c0 state='<null>' flags=0x0 action=0 fd=(nil)
2019-03-21 15:42:28.938+0000: 24557: debug : virLockManagerSanlockAcquire:1033 : Acquiring object 0
2019-03-21 15:42:28.938+0000: 24557: debug : virLockManagerSanlockAcquire:1079 : Acquire completed fd=-1

... After the drive-mirror there's:

2019-03-21 15:42:29.015+0000: 24557: debug : virStorageFileBackendFileDeinit:58 : deinitializing FS storage file 0x7eff0401b600 (file:/rhev/data-center/mnt/yellow-vdsb.qa.lab.tlv.redhat.com:_Storage__NFS_storage__local__ge2__nfs__2/d093be4a-c17d-47bd-a7b3-2c61bb55cbd8/images/b46180fa-0ebd-45bd-89a8-a898317cd88e/4b3a55c4-f18c-40f3-b023-8df86e71d1e5)
2019-03-21 15:42:29.015+0000: 24557: debug : virFileMakePathHelper:3093 : path=/var/run/libvirt/qemu mode=0777
2019-03-21 15:42:29.016+0000: 24557: debug : virFileClose:111 : Closed fd 29
2019-03-21 15:42:29.016+0000: 24557: debug : qemuDomainObjEndJob:6935 : Stopping job: modify (async=none vm=0x7eff1001ec80 name=vm_random_test_2117362559)

... And what appear to be events from the mirror:

2019-03-21 15:42:29.024+0000: 24554: debug : qemuProcessHandleEvent:530 : vm=0x7eff1001ec80
2019-03-21 15:42:29.024+0000: 24554: debug : qemuProcessHandleBlockJob:912 : Block job for device drive-ua-b46180fa-0ebd-45bd-89a8-a898317cd88e (domain: 0x7eff1001ec80,vm_random_test_2117362559) type 2 status 3

...

2019-03-21 15:42:29.024+0000: 26685: debug : qemuProcessEventHandler:4797 : vm=0x7eff1001ec80, event=5
2019-03-21 15:42:29.024+0000: 26685: debug : qemuDomainObjBeginJobInternal:6610 : Starting job: job=modify agentJob=none asyncJob=none (vm=0x7eff1001ec80 name=vm_random_test_2117362559, current job=none agentJob=none async=none)
2019-03-21 15:42:29.024+0000: 26685: debug : qemuDomainObjBeginJobInternal:6659 : Started job: modify (async=none vm=0x7eff1001ec80 name=vm_random_test_2117362559)
2019-03-21 15:42:29.024+0000: 26685: debug : qemuBlockJobEventProcess:115 : disk=vda, mirrorState=none, type=2, status=3
2019-03-21 15:42:29.025+0000: 26685: debug : qemuDomainObjEndJob:6935 : Stopping job: modify (async=none vm=0x7eff1001ec80 name=vm_random_test_2117362559)

FWIW: type=2 is VIR_DOMAIN_BLOCK_JOB_TYPE_COPY and status=3 is VIR_DOMAIN_BLOCK_JOB_READY, so that seems normal

...

2019-03-21 15:42:29.026+0000: 24554: debug : remoteRelayDomainEventBlockJob:652 : Relaying domain block job event vm_random_test_2117362559 8 /rhev/data-center/mnt/yellow-vdsb.qa.lab.tlv.redhat.com:_Storage__NFS_storage__local__ge2__nfs__0/71d16523-1ae9-493a-a84c-d5aa80beb74c/images/b46180fa-0ebd-45bd-89a8-a898317cd88e/4b3a55c4-f18c-40f3-b023-8df86e71d1e5 2, 3, callback 15

... When the next query-balloon starts in the next 15 second cycle from the last one ...

2019-03-21 15:42:43.060+0000: 24557: debug : qemuMonitorGetMemoryStats:2132 : stats=0x7eff36b1a830 nstats=10
2019-03-21 15:42:43.060+0000: 24557: debug : qemuMonitorGetMemoryStats:2134 : mon:0x7eff24035950 vm:0x7eff1001ec80 json:1 fd:33
2019-03-21 15:42:43.060+0000: 24554: info : qemuMonitorIOWrite:551 : QEMU_MONITOR_IO_WRITE: mon=0x7eff24035950 buf={"execute":"query-balloon","id":"libvirt-67"}^M

... nothing happens until a future query notes the failure:

2019-03-21 15:42:58.590+0000: 24559: debug : qemuDomainObjBeginJobInternal:6610 : Starting job: job=query agentJob=none asyncJob=none (vm=0x7eff1001ec80 name=vm_random_test_2117362559, current job=query agentJob=none async=none)
2019-03-21 15:42:58.590+0000: 24559: debug : qemuDomainObjBeginJobInternal:6640 : Waiting for job (vm=0x7eff1001ec80 name=vm_random_test_2117362559)
...
2019-03-21 15:43:28.590+0000: 24559: warning : qemuDomainObjBeginJobInternal:6724 : Cannot start job (query, none, none) for domain vm_random_test_2117362559; current job is (query, none, none) owned by (24557 remoteDispatchConnectGetAllDomainStats, 0 <null>, 0 <null> (flags=0x0)) for (45s, 0s, 0s)

... and even later...

2019-03-21 15:43:59.863+0000: 24555: warning : qemuDomainObjBeginJobInternal:6724 : Cannot start job (query, none, none) for domain vm_random_test_2117362559; current job is (query, none, none) owned by (24557 remoteDispatchConnectGetAllDomainStats, 0 <null>, 0 <null> (flags=0x0)) for (76s, 0s, 0s)

... Maybe this analysis triggers a memory for someone who has seen this type of behavior before, at least from the viewpoint of what normal/expected behavior looks like.

It seems strange that the next cycle of query-balloon, qom-get, query-blockstats, query-block, query-named-block-nodes would fail/hang right after the "drive-mirror" issued via BlockCopy - nothing about the copy itself looks odd, but hopefully Peter can provide feedback on what the expected processing during/after BlockCopy is.

I agree with Michal's POV that, from a libvirt perspective of simply processing the next request, things seem fine; what's not clear is whether all the QEMU events expected from the BlockCopy processing have occurred.

Peter - do the events seen after the BlockCopy seem normal? Were more expected? IOW: could the guest be blocked because of the in-progress copy? The 97.9% CPU usage seems a bit high if nothing were going on... Maybe what's here triggers some other thoughts toward a known problem.

Comment 25 John Ferlan 2019-04-15 17:01:12 UTC
Similar to the "t a a bt" (thread apply all backtrace) Michal suggested for libvirtd in comment 13, can you provide the same for the qemu guest process as it seems libvirtd would be waiting for something to happen. In the output above that would be for pid 24556...  If it's not reproducible any more, could be difficult to diagnose any further.

Comment 26 Elad 2019-04-16 08:41:46 UTC
[root@storage-ge2-vdsm1 ~]# sudo gdb -p $(pgrep qemu)
(gdb) t a a bt
Thread 1 (Thread 0x7f4b11859740 (LWP 3133)):
#0  0x00007f4b102111f0 in __poll_nocancel () from /lib64/libc.so.6
#1  0x00007f4b1117a32c in g_main_context_iterate.isra.19 () from /lib64/libglib-2.0.so.0
#2  0x00007f4b1117a67a in g_main_loop_run () from /lib64/libglib-2.0.so.0
#3  0x00005587cd427ab1 in main ()


[root@storage-ge2-vdsm1 ~]# ps -ef |grep qemu
root      3133     1  0 Apr15 ?        00:00:10 /usr/bin/qemu-ga --method=virtio-serial --path=/dev/virtio-ports/org.qemu.guest_agent.0 --blacklist=guest-file-open,guest-file-close,guest-file-read,guest-file-write,guest-file-seek,guest-file-flush,guest-exec,guest-exec-status -F/etc/qemu-ga/fsfreeze-hook
root      4219 32021  0 11:41 pts/0    00:00:00 grep --color=auto qemu
qemu     28602     1 13 11:02 ?        00:05:18 /usr/libexec/qemu-kvm -name guest=golden_env_mixed_virtio_2_0,debug-threads=on -S -object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-2-golden_env_mixed_vir/master-key.aes -machine pc-i440fx-rhel7.6.0,accel=kvm,usb=off,dump-guest-core=off -cpu Nehalem -m size=1048576k,slots=16,maxmem=4194304k -realtime mlock=off -smp 1,maxcpus=16,sockets=16,cores=1,threads=1 -object iothread,id=iothread1 -numa node,nodeid=0,cpus=0,mem=1024 -uuid 2eda0fd5-a28c-4987-a054-9b72a10c9b21 -smbios type=1,manufacturer=Red Hat,product=RHEV Hypervisor,version=7.6-4.el7,serial=3406735e-57a3-4e51-87e5-7ebb275505ee,uuid=2eda0fd5-a28c-4987-a054-9b72a10c9b21 -no-user-config -nodefaults -chardev socket,id=charmonitor,fd=32,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=2019-04-16T08:02:54,driftfix=slew -global kvm-pit.lost_tick_policy=delay -no-hpet -no-shutdown -global PIIX4_PM.disable_s3=1 -global PIIX4_PM.disable_s4=1 -boot strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device virtio-scsi-pci,iothread=iothread1,id=ua-8b998f9e-1097-4170-b314-4b91e6de35f8,bus=pci.0,addr=0x6 -device virtio-serial-pci,id=ua-4a554519-5da3-4cd7-a258-d377645299d2,max_ports=16,bus=pci.0,addr=0x5 -drive if=none,id=drive-ua-27bbad81-ce14-4284-8617-f0c4151c5150,werror=report,rerror=report,readonly=on -device ide-cd,bus=ide.1,unit=0,drive=drive-ua-27bbad81-ce14-4284-8617-f0c4151c5150,id=ua-27bbad81-ce14-4284-8617-f0c4151c5150 -drive file=/rhev/data-center/mnt/blockSD/329acbbc-e565-4dd7-af64-c4350943979a/images/1ef51db0-c01e-4f64-81ab-105e6d8225d0/2289c9ae-c50b-4417-a5e4-d7b1e3d2d2b7,format=qcow2,if=none,id=drive-ua-1ef51db0-c01e-4f64-81ab-105e6d8225d0,serial=1ef51db0-c01e-4f64-81ab-105e6d8225d0,werror=stop,rerror=stop,cache=none,aio=native -device virtio-blk-pci,iothread=iothread1,scsi=off,bus=pci.0,addr=0x7,drive=drive-ua-1ef51db0-c01e-4f64-81ab-105e6d8225d0,id=ua-1ef51db0-c01e-4f64-81ab-105e6d8225d0,bootindex=1,write-cache=on -netdev tap,fd=34,id=hostua-b3f490cc-1631-4e3f-96e2-293adb2521db,vhost=on,vhostfd=35 -device virtio-net-pci,host_mtu=1500,netdev=hostua-b3f490cc-1631-4e3f-96e2-293adb2521db,id=ua-b3f490cc-1631-4e3f-96e2-293adb2521db,mac=00:1a:4a:16:25:85,bus=pci.0,addr=0x3 -chardev socket,id=charchannel0,fd=36,server,nowait -device virtserialport,bus=ua-4a554519-5da3-4cd7-a258-d377645299d2.0,nr=1,chardev=charchannel0,id=channel0,name=ovirt-guest-agent.0 -chardev socket,id=charchannel1,fd=37,server,nowait -device virtserialport,bus=ua-4a554519-5da3-4cd7-a258-d377645299d2.0,nr=2,chardev=charchannel1,id=channel1,name=org.qemu.guest_agent.0 -chardev spicevmc,id=charchannel2,name=vdagent -device virtserialport,bus=ua-4a554519-5da3-4cd7-a258-d377645299d2.0,nr=3,chardev=charchannel2,id=channel2,name=com.redhat.spice.0 -vnc 10.35.82.55:0,password,tls,x509=/etc/pki/vdsm/libvirt-vnc -k en-us -spice port=5901,tls-port=5902,addr=10.35.82.55,x509-dir=/etc/pki/vdsm/libvirt-spice,tls-channel=main,tls-channel=display,tls-channel=inputs,tls-channel=cursor,tls-channel=playback,tls-channel=record,tls-channel=smartcard,tls-channel=usbredir,seamless-migration=on -device qxl-vga,id=ua-4907105f-f406-4fcd-b451-3c3f387824cc,ram_size=67108864,vram_size=8388608,vram64_size_mb=0,vgamem_mb=16,max_outputs=1,bus=pci.0,addr=0x2 -device intel-hda,id=ua-1a46e4dd-4caf-460e-98f0-54fc389711eb,bus=pci.0,addr=0x4 -device hda-duplex,id=ua-1a46e4dd-4caf-460e-98f0-54fc389711eb-codec0,bus=ua-1a46e4dd-4caf-460e-98f0-54fc389711eb.0,cad=0 -device 
virtio-balloon-pci,id=ua-9706d573-6348-4aa0-820c-f7fc5e340387,bus=pci.0,addr=0x8 -object rng-random,id=objua-daba834c-1b0b-48f7-9cf7-7e6a4140d346,filename=/dev/urandom -device virtio-rng-pci,rng=objua-daba834c-1b0b-48f7-9cf7-7e6a4140d346,id=ua-daba834c-1b0b-48f7-9cf7-7e6a4140d346,bus=pci.0,addr=0x9 -sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny -msg timestamp=on
qemu     29868     1 15 11:07 ?        00:05:21 /usr/libexec/qemu-kvm -name guest=golden_env_mixed_virtio_2_1,debug-threads=on -S -object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-3-golden_env_mixed_vir/master-key.aes -machine pc-i440fx-rhel7.6.0,accel=kvm,usb=off,dump-guest-core=off -cpu Nehalem -m size=1048576k,slots=16,maxmem=4194304k -realtime mlock=off -smp 1,maxcpus=16,sockets=16,cores=1,threads=1 -object iothread,id=iothread1 -numa node,nodeid=0,cpus=0,mem=1024 -uuid c5b441bf-2b18-4c7c-8207-be2ce798f4c3 -smbios type=1,manufacturer=Red Hat,product=RHEV Hypervisor,version=7.6-4.el7,serial=3406735e-57a3-4e51-87e5-7ebb275505ee,uuid=c5b441bf-2b18-4c7c-8207-be2ce798f4c3 -no-user-config -nodefaults -chardev socket,id=charmonitor,fd=34,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=2019-04-16T08:07:01,driftfix=slew -global kvm-pit.lost_tick_policy=delay -no-hpet -no-shutdown -global PIIX4_PM.disable_s3=1 -global PIIX4_PM.disable_s4=1 -boot strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device virtio-scsi-pci,iothread=iothread1,id=ua-9ed4a259-8936-4a22-a284-ee4a314c22a3,bus=pci.0,addr=0x5 -device virtio-serial-pci,id=ua-bd4d32ad-8aab-4c34-af1c-5e7875ce09c1,max_ports=16,bus=pci.0,addr=0x6 -drive if=none,id=drive-ua-c724bdcb-8acb-49b1-a795-7d72315355a4,werror=report,rerror=report,readonly=on -device ide-cd,bus=ide.1,unit=0,drive=drive-ua-c724bdcb-8acb-49b1-a795-7d72315355a4,id=ua-c724bdcb-8acb-49b1-a795-7d72315355a4 -drive file=/rhev/data-center/mnt/blockSD/329acbbc-e565-4dd7-af64-c4350943979a/images/d9fd4252-bc05-4af1-b8a3-22411924b883/c7d50324-8765-4567-883d-ee105655c3d0,format=qcow2,if=none,id=drive-ua-d9fd4252-bc05-4af1-b8a3-22411924b883,serial=d9fd4252-bc05-4af1-b8a3-22411924b883,werror=stop,rerror=stop,cache=none,aio=native -device virtio-blk-pci,iothread=iothread1,scsi=off,bus=pci.0,addr=0x7,drive=drive-ua-d9fd4252-bc05-4af1-b8a3-22411924b883,id=ua-d9fd4252-bc05-4af1-b8a3-22411924b883,bootindex=1,write-cache=on -netdev tap,fd=36,id=hostua-8888bc0c-233c-4944-8cb7-fd4a4590115d,vhost=on,vhostfd=37 -device virtio-net-pci,host_mtu=1500,netdev=hostua-8888bc0c-233c-4944-8cb7-fd4a4590115d,id=ua-8888bc0c-233c-4944-8cb7-fd4a4590115d,mac=00:1a:4a:16:25:86,bus=pci.0,addr=0x3 -chardev socket,id=charchannel0,fd=38,server,nowait -device virtserialport,bus=ua-bd4d32ad-8aab-4c34-af1c-5e7875ce09c1.0,nr=1,chardev=charchannel0,id=channel0,name=ovirt-guest-agent.0 -chardev socket,id=charchannel1,fd=39,server,nowait -device virtserialport,bus=ua-bd4d32ad-8aab-4c34-af1c-5e7875ce09c1.0,nr=2,chardev=charchannel1,id=channel1,name=org.qemu.guest_agent.0 -chardev spicevmc,id=charchannel2,name=vdagent -device virtserialport,bus=ua-bd4d32ad-8aab-4c34-af1c-5e7875ce09c1.0,nr=3,chardev=charchannel2,id=channel2,name=com.redhat.spice.0 -vnc 10.35.82.55:3,password,tls,x509=/etc/pki/vdsm/libvirt-vnc -k en-us -spice port=5904,tls-port=5905,addr=10.35.82.55,x509-dir=/etc/pki/vdsm/libvirt-spice,tls-channel=main,tls-channel=display,tls-channel=inputs,tls-channel=cursor,tls-channel=playback,tls-channel=record,tls-channel=smartcard,tls-channel=usbredir,seamless-migration=on -device qxl-vga,id=ua-63183cf0-9a95-42cc-af48-aaa09d21d10b,ram_size=67108864,vram_size=8388608,vram64_size_mb=0,vgamem_mb=16,max_outputs=1,bus=pci.0,addr=0x2 -device intel-hda,id=ua-7db0bb8b-ca10-4e09-ad30-3dabe384ee3f,bus=pci.0,addr=0x4 -device hda-duplex,id=ua-7db0bb8b-ca10-4e09-ad30-3dabe384ee3f-codec0,bus=ua-7db0bb8b-ca10-4e09-ad30-3dabe384ee3f.0,cad=0 -device 
virtio-balloon-pci,id=ua-24747aae-e0b0-4bd5-bcc1-1329f6b00063,bus=pci.0,addr=0x8 -object rng-random,id=objua-daba834c-1b0b-48f7-9cf7-7e6a4140d346,filename=/dev/urandom -device virtio-rng-pci,rng=objua-daba834c-1b0b-48f7-9cf7-7e6a4140d346,id=ua-daba834c-1b0b-48f7-9cf7-7e6a4140d346,bus=pci.0,addr=0x9 -sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny -msg timestamp=on
qemu     30472     1 96 11:09 ?        00:31:10 /usr/libexec/qemu-kvm -name guest=golden_env_mixed_virtio_3_0,debug-threads=on -S -object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-4-golden_env_mixed_vir/master-key.aes -machine pc-i440fx-rhel7.6.0,accel=kvm,usb=off,dump-guest-core=off -cpu Nehalem -m size=1048576k,slots=16,maxmem=4194304k -realtime mlock=off -smp 1,maxcpus=16,sockets=16,cores=1,threads=1 -object iothread,id=iothread1 -numa node,nodeid=0,cpus=0,mem=1024 -uuid 226e0c41-c7da-4c90-8502-e21d61e832f9 -smbios type=1,manufacturer=Red Hat,product=RHEV Hypervisor,version=7.6-4.el7,serial=3406735e-57a3-4e51-87e5-7ebb275505ee,uuid=226e0c41-c7da-4c90-8502-e21d61e832f9 -no-user-config -nodefaults -chardev socket,id=charmonitor,fd=35,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=2019-04-16T08:09:03,driftfix=slew -global kvm-pit.lost_tick_policy=delay -no-hpet -no-shutdown -global PIIX4_PM.disable_s3=1 -global PIIX4_PM.disable_s4=1 -boot strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device virtio-scsi-pci,iothread=iothread1,id=ua-c2137297-ae8a-4800-9179-778d9ab87168,bus=pci.0,addr=0x6 -device virtio-serial-pci,id=ua-77607cb6-e17f-434a-95ee-1ffbae896a6f,max_ports=16,bus=pci.0,addr=0x5 -drive if=none,id=drive-ua-e7d1a2e7-0c71-4aa8-a81b-ea48c4816e86,werror=report,rerror=report,readonly=on -device ide-cd,bus=ide.1,unit=0,drive=drive-ua-e7d1a2e7-0c71-4aa8-a81b-ea48c4816e86,id=ua-e7d1a2e7-0c71-4aa8-a81b-ea48c4816e86 -drive file=/rhev/data-center/mnt/glusterSD/gluster01.scl.lab.tlv.redhat.com:_storage__local__ge2__volume__0/1c45bf41-5d54-4074-aaf6-58cffb795e12/images/a5a67085-637c-4213-a52d-b7e28ecd6031/230db827-5260-4830-af63-dacddf9de888,format=qcow2,if=none,id=drive-ua-a5a67085-637c-4213-a52d-b7e28ecd6031,serial=a5a67085-637c-4213-a52d-b7e28ecd6031,werror=stop,rerror=stop,cache=none,aio=native -device virtio-blk-pci,iothread=iothread1,scsi=off,bus=pci.0,addr=0x7,drive=drive-ua-a5a67085-637c-4213-a52d-b7e28ecd6031,id=ua-a5a67085-637c-4213-a52d-b7e28ecd6031,bootindex=1,write-cache=on -netdev tap,fd=37,id=hostua-b707b63c-8e8a-42e7-9729-60fe58bdc2ef,vhost=on,vhostfd=39 -device virtio-net-pci,host_mtu=1500,netdev=hostua-b707b63c-8e8a-42e7-9729-60fe58bdc2ef,id=ua-b707b63c-8e8a-42e7-9729-60fe58bdc2ef,mac=00:1a:4a:16:25:87,bus=pci.0,addr=0x3 -chardev socket,id=charchannel0,fd=40,server,nowait -device virtserialport,bus=ua-77607cb6-e17f-434a-95ee-1ffbae896a6f.0,nr=1,chardev=charchannel0,id=channel0,name=ovirt-guest-agent.0 -chardev socket,id=charchannel1,fd=41,server,nowait -device virtserialport,bus=ua-77607cb6-e17f-434a-95ee-1ffbae896a6f.0,nr=2,chardev=charchannel1,id=channel1,name=org.qemu.guest_agent.0 -chardev spicevmc,id=charchannel2,name=vdagent -device virtserialport,bus=ua-77607cb6-e17f-434a-95ee-1ffbae896a6f.0,nr=3,chardev=charchannel2,id=channel2,name=com.redhat.spice.0 -vnc 10.35.82.55:6,password,tls,x509=/etc/pki/vdsm/libvirt-vnc -k en-us -spice port=5907,tls-port=5908,addr=10.35.82.55,x509-dir=/etc/pki/vdsm/libvirt-spice,tls-channel=main,tls-channel=display,tls-channel=inputs,tls-channel=cursor,tls-channel=playback,tls-channel=record,tls-channel=smartcard,tls-channel=usbredir,seamless-migration=on -device qxl-vga,id=ua-cd2b012e-71bf-48d9-885e-8abf6fcbfc42,ram_size=67108864,vram_size=8388608,vram64_size_mb=0,vgamem_mb=16,max_outputs=1,bus=pci.0,addr=0x2 -device intel-hda,id=ua-6cca19c8-e671-4b44-b327-f5c50285cc8c,bus=pci.0,addr=0x4 -device 
hda-duplex,id=ua-6cca19c8-e671-4b44-b327-f5c50285cc8c-codec0,bus=ua-6cca19c8-e671-4b44-b327-f5c50285cc8c.0,cad=0 -device virtio-balloon-pci,id=ua-465fb62a-3dba-494a-80d6-785756daf16b,bus=pci.0,addr=0x8 -object rng-random,id=objua-daba834c-1b0b-48f7-9cf7-7e6a4140d346,filename=/dev/urandom -device virtio-rng-pci,rng=objua-daba834c-1b0b-48f7-9cf7-7e6a4140d346,id=ua-daba834c-1b0b-48f7-9cf7-7e6a4140d346,bus=pci.0,addr=0x9 -sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny -msg timestamp=on

Comment 27 John Ferlan 2019-04-16 11:24:55 UTC
Restoring needinfo? on Peter for comment 23 - gotta be careful of that box that says "clear all"

Comment 28 John Ferlan 2019-04-16 11:30:17 UTC
And restoring the needinfo? so we can get the right process, since your:

sudo gdb -p $(pgrep qemu) 

picked "Thread 1 (Thread 0x7f4b11859740 (LWP 3133)):"

which is not the guest process with the issue:

[root@storage-ge2-vdsm1 ~]# ps -ef |grep qemu
root      3133     1  0 Apr15 ?        00:00:10 /usr/bin/qemu-ga --method=virtio-serial --path=/dev/virtio-ports/org.qemu.guest_agent.0 --blacklist=guest-file-open,guest-file-close,guest-file-read,guest-file-write,guest-file-seek,guest-file-flush,guest-exec,guest-exec-status -F/etc/qemu-ga/fsfreeze-hook
root      4219 32021  0 11:41 pts/0    00:00:00 grep --color=auto qemu
qemu     28602     1 13 11:02 ?        00:05:18 /usr/libexec/qemu-kvm -name guest=golden_env_mixed_virtio_2_0,debug-threads=on


PID 3133 is just qemu-ga; someone will need to look at the qemu-kvm process - 28602 (I assume that's the one doing something that caused libvirt to wait).
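
(As a purely illustrative sketch for picking out the right PID - the pattern assumes the guest name from the ps output above:

  pgrep -f 'qemu-kvm .*guest=golden_env_mixed_virtio_2_0'

A plain 'pgrep -f qemu-kvm' would also skip qemu-ga, but it lists every running guest, so filtering on the guest name - or just reading the PID off the ps output, 28602 here - is less error-prone.)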

Please be careful not to click the "I am providing..." box, as it will "clear all needinfo requests"...  Instead, scroll up and remove the needinfo? next to your name in the flags section (choose the empty element from the dropdown menu).

Comment 30 Peter Krempa 2019-06-05 06:10:21 UTC
So according to the log, indeed the last command libvirt issued was:

2019-03-21 15:42:43.060+0000: 24554: info : qemuMonitorIOWrite:551 : QEMU_MONITOR_IO_WRITE: mon=0x7eff24035950 buf={"execute":"query-balloon","id":"libvirt-67"}
and never got a response.

Since we must treat the monitor as synchronous, the libvirt API can't return in this case until qemu responds.

This usually happens if qemu gets stuck in uninterruptible sleep due to broken storage, but unfortunately the 'ps' output does not include the STAT field by default, so I can't tell. It could also be caused by a deadlock in qemu, but for that we'd need the backtrace of qemu.

In general, the proximity of the block copy (drive-mirror) does not mean it directly caused this in libvirt, but obviously the newly accessed storage might have run into a problem.

In this case I'd look for storage issues or perhaps a problem in querying the balloon driver.
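
(As a sketch only - substituting the actual qemu-kvm PID - something like:

  ps -o pid,stat,wchan:30,cmd -p <qemu-kvm PID>

would show the STAT column; a 'D' there points to uninterruptible sleep on I/O, while anything else makes the 'thread apply all backtrace' of the qemu process the more interesting data point.)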

Comment 32 aihua liang 2019-07-03 09:03:40 UTC
Can reproduce this issue on qemu-kvm-rhev-2.12.0-18.el7_6.3.x86_64; the key point is: virtio_blk + data_plane (enabled).

Test Env:
  qemu-kvm-rhev version: qemu-kvm-rhev-2.12.0-18.el7_6.3.x86_64
  kernel version: 3.10.0-957.12.1.el7.x86_64

Test steps:
  1.Create an empty image.
    #qemu-img create -f qcow2 mirror.qcow2 20G
  
  2.Start dst vm with mirror.qcow2 (virtio-blk-pci, data_plane enabled)
     /usr/libexec/qemu-kvm \
    -name 'avocado-vt-vm1' \
    -machine pc  \
    -nodefaults \
    -device VGA,bus=pci.0,addr=0x2  \
    -chardev socket,id=qmp_id_qmpmonitor1,path=/var/tmp/monitor-qmpmonitor1-20190624-022456-Uc0cnuxt,server,nowait \
    -mon chardev=qmp_id_qmpmonitor1,mode=control  \
    -chardev socket,id=qmp_id_catch_monitor,path=/var/tmp/monitor-catch_monitor-20190624-022456-Uc0cnuxs,server,nowait \
    -mon chardev=qmp_id_catch_monitor,mode=control \
    -device pvpanic,ioport=0x505,id=idOSU1la  \
    -chardev socket,id=serial_id_serial0,path=/var/tmp/serial-serial0-20190624-022456-Uc0cnuxs,server,nowait \
    -device isa-serial,chardev=serial_id_serial0  \
    -chardev socket,id=seabioslog_id_20190624-022456-Uc0cnuxs,path=/var/tmp/seabios-20190624-022456-Uc0cnuxs,server,nowait \
    -device isa-debugcon,chardev=seabioslog_id_20190624-022456-Uc0cnuxs,iobase=0x402 \
    -device nec-usb-xhci,id=usb1,bus=pci.0,addr=0x3 \
    -object iothread,id=iothread0 \
    -drive id=drive_image1,if=none,snapshot=off,aio=threads,cache=none,file=/home/mirror.qcow2 \
    -device virtio-blk-pci,id=image1,drive=drive_image1,bootindex=1,bus=pci.0,iothread=iothread0 \
    -device virtio-net-pci,mac=9a:01:02:03:04:05,id=idk78PtS,vectors=4,netdev=idd7g5ik,bus=pci.0,addr=0x5  \
    -netdev tap,id=idd7g5ik,vhost=on \
    -m 7168  \
    -smp 4,maxcpus=4,cores=2,threads=1,sockets=2 \
    -cpu 'Westmere',+kvm_pv_unhalt \
    -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1  \
    -vnc :1  \
    -rtc base=utc,clock=host,driftfix=slew  \
    -boot menu=off,strict=off,order=cdn,once=c \
    -enable-kvm \
    -monitor stdio \
    -incoming tcp:0:5000 \

  3.Start src guest with a virtio_blk device, data_plane enabled.
     /usr/libexec/qemu-kvm \
    -name 'avocado-vt-vm1' \
    -machine pc  \
    -nodefaults \
    -device VGA,bus=pci.0,addr=0x2  \
    -chardev socket,id=qmp_id_qmpmonitor1,path=/var/tmp/monitor-qmpmonitor1-20190624-022456-Uc0cnuxs,server,nowait \
    -mon chardev=qmp_id_qmpmonitor1,mode=control  \
    -chardev socket,id=qmp_id_catch_monitor,path=/var/tmp/monitor-catch_monitor-20190624-022456-Uc0cnuxs,server,nowait \
    -mon chardev=qmp_id_catch_monitor,mode=control \
    -device pvpanic,ioport=0x505,id=idOSU1la  \
    -chardev socket,id=serial_id_serial0,path=/var/tmp/serial-serial0-20190624-022456-Uc0cnuxs,server,nowait \
    -device isa-serial,chardev=serial_id_serial0  \
    -chardev socket,id=seabioslog_id_20190624-022456-Uc0cnuxs,path=/var/tmp/seabios-20190624-022456-Uc0cnuxs,server,nowait \
    -device isa-debugcon,chardev=seabioslog_id_20190624-022456-Uc0cnuxs,iobase=0x402 \
    -device nec-usb-xhci,id=usb1,bus=pci.0,addr=0x3 \
    -object iothread,id=iothread0 \
    -drive id=drive_image1,if=none,snapshot=off,aio=threads,cache=none,file=/home/kvm_autotest_root/images/rhel76-64-virtio-scsi.qcow2 \
    -device virtio-blk-pci,id=image1,drive=drive_image1,bootindex=1,bus=pci.0,iothread=iothread0 \
    -device virtio-net-pci,mac=9a:01:02:03:04:05,id=idk78PtS,vectors=4,netdev=idd7g5ik,bus=pci.0,addr=0x5  \
    -netdev tap,id=idd7g5ik,vhost=on \
    -m 7168  \
    -smp 4,maxcpus=4,cores=2,threads=1,sockets=2 \
    -cpu 'Westmere',+kvm_pv_unhalt \
    -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1  \
    -vnc :0  \
    -rtc base=utc,clock=host,driftfix=slew  \
    -boot menu=off,strict=off,order=cdn,once=c \
    -enable-kvm \
    -monitor stdio \

  4.Do mirror from src to dst.
     { "execute": "drive-mirror", "arguments": { "device": "drive_image1","target":"/home/mirror.qcow2","sync":"full","mode":"existing"}}

  5.Set migration capabilities.
     {"execute":"migrate-set-capabilities","arguments":{"capabilities":[{"capability":"pause-before-switchover","state":true}]}}

  6.Migrate from src to dst.
     { "execute": "migrate", "arguments": { "uri": "tcp:localhost:5000"}}
{"return": {}}

  7.Wait a few minutes, then check the migration status
     {"execute":"query-migrate"}

Actual Result:
  After step 7, there is no response from the src qemu.

Pstack info as below:
  #pstack 68165
Thread 10 (Thread 0x7f2705804700 (LWP 68166)):
#0  0x00007f270cfcf1c9 in syscall () at /lib64/libc.so.6
#1  0x0000558f64835410 in qemu_event_wait (val=<optimized out>, f=<optimized out>) at /usr/src/debug/qemu-2.12.0/include/qemu/futex.h:29
#2  0x0000558f64835410 in qemu_event_wait (ev=ev@entry=0x558f654bcbe8 <rcu_call_ready_event>) at util/qemu-thread-posix.c:445
#3  0x0000558f6484593e in call_rcu_thread (opaque=<optimized out>) at util/rcu.c:261
#4  0x00007f270d2abdd5 in start_thread () at /lib64/libpthread.so.0
#5  0x00007f270cfd4ead in clone () at /lib64/libc.so.6
Thread 9 (Thread 0x7f2705003700 (LWP 68167)):
#0  0x00007f270d2b24ed in __lll_lock_wait () at /lib64/libpthread.so.0
#1  0x00007f270d2adde6 in _L_lock_941 () at /lib64/libpthread.so.0
#2  0x00007f270d2adcdf in pthread_mutex_lock () at /lib64/libpthread.so.0
#3  0x0000558f64834d39 in qemu_mutex_lock_impl (mutex=mutex@entry=0x558f67533960, file=file@entry=0x558f649c87df "util/async.c", line=line@entry=502) at util/qemu-thread-posix.c:67
#4  0x0000558f6482fdfc in co_schedule_bh_cb (ctx=0x558f67533900) at util/async.c:502
#5  0x0000558f6482fdfc in co_schedule_bh_cb (opaque=0x558f67533900) at util/async.c:390
#6  0x0000558f6482ff71 in aio_bh_poll (bh=0x558f675a1ec0) at util/async.c:90
#7  0x0000558f6482ff71 in aio_bh_poll (ctx=ctx@entry=0x558f67533900) at util/async.c:118
#8  0x0000558f648331fb in aio_poll (ctx=0x558f67533900, blocking=blocking@entry=true) at util/aio-posix.c:704
#9  0x0000558f64602d5e in iothread_run (opaque=0x558f67551ce0) at iothread.c:64
#10 0x00007f270d2abdd5 in start_thread () at /lib64/libpthread.so.0
#11 0x00007f270cfd4ead in clone () at /lib64/libc.so.6
Thread 8 (Thread 0x7f2703f00700 (LWP 68185)):
#0  0x00007f270cfca2cf in ppoll () at /lib64/libc.so.6
#1  0x0000558f6483140b in qemu_poll_ns (__ss=0x0, __timeout=0x0, __nfds=<optimized out>, __fds=<optimized out>) at /usr/include/bits/poll2.h:77
#2  0x0000558f6483140b in qemu_poll_ns (fds=<optimized out>, nfds=<optimized out>, timeout=<optimized out>) at util/qemu-timer.c:322
#3  0x0000558f64833187 in aio_poll (ctx=0x558f67533cc0, blocking=blocking@entry=true) at util/aio-posix.c:645
#4  0x0000558f64602d5e in iothread_run (opaque=0x558f67551ea0) at iothread.c:64
#5  0x00007f270d2abdd5 in start_thread () at /lib64/libpthread.so.0
#6  0x00007f270cfd4ead in clone () at /lib64/libc.so.6
Thread 7 (Thread 0x7f27036ff700 (LWP 68186)):
#0  0x00007f270d2b24ed in __lll_lock_wait () at /lib64/libpthread.so.0
#1  0x00007f270d2b4cdc in _L_cond_lock_847 () at /lib64/libpthread.so.0
#2  0x00007f270d2b4b71 in __pthread_mutex_cond_lock () at /lib64/libpthread.so.0
#3  0x00007f270d2af9f5 in pthread_cond_wait@@GLIBC_2.3.2 () at /lib64/libpthread.so.0
#4  0x0000558f64834fe9 in qemu_cond_wait_impl (cond=<optimized out>, mutex=mutex@entry=0x558f650830e0 <qemu_global_mutex>, file=file@entry=0x558f648c9308 "/builddir/build/BUILD/qemu-2.12.0/cpus.c", line=line@entry=1176) at util/qemu-thread-posix.c:164
#5  0x0000558f64516d1f in qemu_wait_io_event (cpu=cpu@entry=0x558f67786000) at /usr/src/debug/qemu-2.12.0/cpus.c:1176
#6  0x0000558f64518420 in qemu_kvm_cpu_thread_fn (arg=0x558f67786000) at /usr/src/debug/qemu-2.12.0/cpus.c:1220
#7  0x00007f270d2abdd5 in start_thread () at /lib64/libpthread.so.0
#8  0x00007f270cfd4ead in clone () at /lib64/libc.so.6
Thread 6 (Thread 0x7f2702efe700 (LWP 68193)):
#0  0x00007f270d2b24ed in __lll_lock_wait () at /lib64/libpthread.so.0
#1  0x00007f270d2b4cdc in _L_cond_lock_847 () at /lib64/libpthread.so.0
#2  0x00007f270d2b4b71 in __pthread_mutex_cond_lock () at /lib64/libpthread.so.0
#3  0x00007f270d2af9f5 in pthread_cond_wait@@GLIBC_2.3.2 () at /lib64/libpthread.so.0
#4  0x0000558f64834fe9 in qemu_cond_wait_impl (cond=<optimized out>, mutex=mutex@entry=0x558f650830e0 <qemu_global_mutex>, file=file@entry=0x558f648c9308 "/builddir/build/BUILD/qemu-2.12.0/cpus.c", line=line@entry=1176) at util/qemu-thread-posix.c:164
#5  0x0000558f64516d1f in qemu_wait_io_event (cpu=cpu@entry=0x558f677e4000) at /usr/src/debug/qemu-2.12.0/cpus.c:1176
#6  0x0000558f64518420 in qemu_kvm_cpu_thread_fn (arg=0x558f677e4000) at /usr/src/debug/qemu-2.12.0/cpus.c:1220
#7  0x00007f270d2abdd5 in start_thread () at /lib64/libpthread.so.0
#8  0x00007f270cfd4ead in clone () at /lib64/libc.so.6
Thread 5 (Thread 0x7f27026fd700 (LWP 68196)):
#0  0x00007f270d2b24ed in __lll_lock_wait () at /lib64/libpthread.so.0
#1  0x00007f270d2b4cdc in _L_cond_lock_847 () at /lib64/libpthread.so.0
#2  0x00007f270d2b4b71 in __pthread_mutex_cond_lock () at /lib64/libpthread.so.0
#3  0x00007f270d2af9f5 in pthread_cond_wait@@GLIBC_2.3.2 () at /lib64/libpthread.so.0
#4  0x0000558f64834fe9 in qemu_cond_wait_impl (cond=<optimized out>, mutex=mutex@entry=0x558f650830e0 <qemu_global_mutex>, file=file@entry=0x558f648c9308 "/builddir/build/BUILD/qemu-2.12.0/cpus.c", line=line@entry=1176) at util/qemu-thread-posix.c:164
#5  0x0000558f64516d1f in qemu_wait_io_event (cpu=cpu@entry=0x558f67804000) at /usr/src/debug/qemu-2.12.0/cpus.c:1176
#6  0x0000558f64518420 in qemu_kvm_cpu_thread_fn (arg=0x558f67804000) at /usr/src/debug/qemu-2.12.0/cpus.c:1220
#7  0x00007f270d2abdd5 in start_thread () at /lib64/libpthread.so.0
#8  0x00007f270cfd4ead in clone () at /lib64/libc.so.6
Thread 4 (Thread 0x7f2701efc700 (LWP 68197)):
#0  0x00007f270d2b24ed in __lll_lock_wait () at /lib64/libpthread.so.0
#1  0x00007f270d2b4cdc in _L_cond_lock_847 () at /lib64/libpthread.so.0
#2  0x00007f270d2b4b71 in __pthread_mutex_cond_lock () at /lib64/libpthread.so.0
#3  0x00007f270d2af9f5 in pthread_cond_wait@@GLIBC_2.3.2 () at /lib64/libpthread.so.0
#4  0x0000558f64834fe9 in qemu_cond_wait_impl (cond=<optimized out>, mutex=mutex@entry=0x558f650830e0 <qemu_global_mutex>, file=file@entry=0x558f648c9308 "/builddir/build/BUILD/qemu-2.12.0/cpus.c", line=line@entry=1176) at util/qemu-thread-posix.c:164
#5  0x0000558f64516d1f in qemu_wait_io_event (cpu=cpu@entry=0x558f67824000) at /usr/src/debug/qemu-2.12.0/cpus.c:1176
#6  0x0000558f64518420 in qemu_kvm_cpu_thread_fn (arg=0x558f67824000) at /usr/src/debug/qemu-2.12.0/cpus.c:1220
#7  0x00007f270d2abdd5 in start_thread () at /lib64/libpthread.so.0
#8  0x00007f270cfd4ead in clone () at /lib64/libc.so.6
Thread 3 (Thread 0x7f253fbff700 (LWP 68233)):
#0  0x00007f270d2af965 in pthread_cond_wait@@GLIBC_2.3.2 () at /lib64/libpthread.so.0
#1  0x0000558f64834fe9 in qemu_cond_wait_impl (cond=cond@entry=0x558f674fcab0, mutex=mutex@entry=0x558f674fcae8, file=file@entry=0x558f6499ed07 "ui/vnc-jobs.c", line=line@entry=212) at util/qemu-thread-posix.c:164
#2  0x0000558f6474fb1f in vnc_worker_thread_loop (queue=queue@entry=0x558f674fcab0) at ui/vnc-jobs.c:212
#3  0x0000558f647500e8 in vnc_worker_thread (arg=0x558f674fcab0) at ui/vnc-jobs.c:319
#4  0x00007f270d2abdd5 in start_thread () at /lib64/libpthread.so.0
#5  0x00007f270cfd4ead in clone () at /lib64/libc.so.6
Thread 2 (Thread 0x7f2521177700 (LWP 68484)):
#0  0x0000558f64846ec0 in qemu_in_coroutine () at util/coroutine-ucontext.c:240
#1  0x0000558f647ab071 in bdrv_do_drained_begin_quiesce (bs=0x558f67949400, parent=0x0, ignore_bds_parents=<optimized out>) at block/io.c:352
#2  0x0000558f647ab11b in bdrv_do_drained_begin (bs=0x558f67949400, recursive=<optimized out>, parent=0x0, ignore_bds_parents=<optimized out>, poll=<optimized out>) at block/io.c:375
#3  0x0000558f6479dd78 in blk_drain (blk=0x558f67607600) at block/block-backend.c:1607
#4  0x0000558f64760029 in block_job_drain (job=0x558f676078c0) at blockjob.c:123
#5  0x0000558f6475ffa8 in block_job_detach_aio_context (opaque=0x558f676078c0) at blockjob.c:139
#6  0x0000558f6475c341 in bdrv_detach_aio_context (bs=bs@entry=0x558f67949400) at block.c:4932
#7  0x0000558f6475c4e6 in bdrv_set_aio_context (bs=0x558f67949400, new_context=0x558f675337c0) at block.c:4993
#8  0x0000558f6479e2b8 in blk_set_aio_context (blk=<optimized out>, new_context=<optimized out>) at block/block-backend.c:1933
#9  0x0000558f6454ddd8 in virtio_blk_data_plane_stop (vdev=<optimized out>) at /usr/src/debug/qemu-2.12.0/hw/block/dataplane/virtio-blk.c:286
#10 0x0000558f646f39d5 in virtio_bus_stop_ioeventfd (bus=0x558f692240f8) at hw/virtio/virtio-bus.c:246
#11 0x0000558f64578497 in virtio_vmstate_change (opaque=0x558f69224170, running=<optimized out>, state=<optimized out>) at /usr/src/debug/qemu-2.12.0/hw/virtio/virtio.c:2237
#12 0x0000558f64608c92 in vm_state_notify (running=running@entry=0, state=state@entry=RUN_STATE_FINISH_MIGRATE) at vl.c:1642
#13 0x0000558f64518e5a in do_vm_stop (state=RUN_STATE_FINISH_MIGRATE, send_stop=send_stop@entry=true) at /usr/src/debug/qemu-2.12.0/cpus.c:1012
#14 0x0000558f645192ed in vm_stop (state=<optimized out>) at /usr/src/debug/qemu-2.12.0/cpus.c:2035
#15 0x0000558f64519406 in vm_stop_force_state (state=state@entry=RUN_STATE_FINISH_MIGRATE) at /usr/src/debug/qemu-2.12.0/cpus.c:2085
#16 0x0000558f646f8c97 in migration_thread (s=0x558f67552280) at migration/migration.c:2137
#17 0x0000558f646f8c97 in migration_thread (s=0x558f67552280) at migration/migration.c:2312
#18 0x0000558f646f8c97 in migration_thread (opaque=0x558f67552280) at migration/migration.c:2416
#19 0x00007f270d2abdd5 in start_thread () at /lib64/libpthread.so.0
#20 0x00007f270cfd4ead in clone () at /lib64/libc.so.6
Thread 1 (Thread 0x7f27266bcdc0 (LWP 68165)):
#0  0x00007f270d2b24ed in __lll_lock_wait () at /lib64/libpthread.so.0
#1  0x00007f270d2addcb in _L_lock_883 () at /lib64/libpthread.so.0
#2  0x00007f270d2adc98 in pthread_mutex_lock () at /lib64/libpthread.so.0
#3  0x0000558f64834d39 in qemu_mutex_lock_impl (mutex=mutex@entry=0x558f650830e0 <qemu_global_mutex>, file=file@entry=0x558f648c9308 "/builddir/build/BUILD/qemu-2.12.0/cpus.c", line=line@entry=1765) at util/qemu-thread-posix.c:67
#4  0x0000558f64518298 in qemu_mutex_lock_iothread () at /usr/src/debug/qemu-2.12.0/cpus.c:1765
#5  0x0000558f648322da in main_loop_wait (timeout=<optimized out>) at util/main-loop.c:236
#6  0x0000558f648322da in main_loop_wait (nonblocking=nonblocking@entry=0) at util/main-loop.c:497
#7  0x0000558f644d4a87 in main () at vl.c:1963
#8  0x0000558f644d4a87 in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at vl.c:4768

Comment 33 aihua liang 2019-07-04 02:49:47 UTC
Tested on qemu-kvm-rhev-2.12.0-33.el7.x86_64; also hit this issue there.

Comment 41 aihua liang 2019-09-25 05:46:56 UTC
Tested on qemu-kvm-rhev-2.12.0-36.el7; the issue is no longer hit. Will set the status to "Verified" once "ON_QA" is set.

 Test Steps:
   1.Create an empty image.
    #qemu-img create -f qcow2 mirror.qcow2 20G
  
  2.Start dst vm with mirror.qcow2 (virtio-blk-pci, data_plane enabled)
     /usr/libexec/qemu-kvm \
    -name 'avocado-vt-vm1' \
    -machine pc  \
    -nodefaults \
    -device VGA,bus=pci.0,addr=0x2  \
    -chardev socket,id=qmp_id_qmpmonitor1,path=/var/tmp/monitor-qmpmonitor1-20190624-022456-Uc0cnuxt,server,nowait \
    -mon chardev=qmp_id_qmpmonitor1,mode=control  \
    -chardev socket,id=qmp_id_catch_monitor,path=/var/tmp/monitor-catch_monitor-20190624-022456-Uc0cnuxs,server,nowait \
    -mon chardev=qmp_id_catch_monitor,mode=control \
    -device pvpanic,ioport=0x505,id=idOSU1la  \
    -chardev socket,id=serial_id_serial0,path=/var/tmp/serial-serial0-20190624-022456-Uc0cnuxs,server,nowait \
    -device isa-serial,chardev=serial_id_serial0  \
    -chardev socket,id=seabioslog_id_20190624-022456-Uc0cnuxs,path=/var/tmp/seabios-20190624-022456-Uc0cnuxs,server,nowait \
    -device isa-debugcon,chardev=seabioslog_id_20190624-022456-Uc0cnuxs,iobase=0x402 \
    -device nec-usb-xhci,id=usb1,bus=pci.0,addr=0x3 \
    -object iothread,id=iothread0 \
    -drive id=drive_image1,if=none,snapshot=off,aio=threads,cache=none,file=/home/mirror.qcow2 \
    -device virtio-blk-pci,id=image1,drive=drive_image1,bootindex=1,bus=pci.0,iothread=iothread0 \
    -device virtio-net-pci,mac=9a:01:02:03:04:05,id=idk78PtS,vectors=4,netdev=idd7g5ik,bus=pci.0,addr=0x5  \
    -netdev tap,id=idd7g5ik,vhost=on \
    -m 7168  \
    -smp 4,maxcpus=4,cores=2,threads=1,sockets=2 \
    -cpu 'Westmere',+kvm_pv_unhalt \
    -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1  \
    -vnc :1  \
    -rtc base=utc,clock=host,driftfix=slew  \
    -boot menu=off,strict=off,order=cdn,once=c \
    -enable-kvm \
    -monitor stdio \
    -incoming tcp:0:5000 \

  3.Start src guest with a virtio_blk device, data_plane enabled.
     /usr/libexec/qemu-kvm \
    -name 'avocado-vt-vm1' \
    -machine pc  \
    -nodefaults \
    -device VGA,bus=pci.0,addr=0x2  \
    -chardev socket,id=qmp_id_qmpmonitor1,path=/var/tmp/monitor-qmpmonitor1-20190624-022456-Uc0cnuxs,server,nowait \
    -mon chardev=qmp_id_qmpmonitor1,mode=control  \
    -chardev socket,id=qmp_id_catch_monitor,path=/var/tmp/monitor-catch_monitor-20190624-022456-Uc0cnuxs,server,nowait \
    -mon chardev=qmp_id_catch_monitor,mode=control \
    -device pvpanic,ioport=0x505,id=idOSU1la  \
    -chardev socket,id=serial_id_serial0,path=/var/tmp/serial-serial0-20190624-022456-Uc0cnuxs,server,nowait \
    -device isa-serial,chardev=serial_id_serial0  \
    -chardev socket,id=seabioslog_id_20190624-022456-Uc0cnuxs,path=/var/tmp/seabios-20190624-022456-Uc0cnuxs,server,nowait \
    -device isa-debugcon,chardev=seabioslog_id_20190624-022456-Uc0cnuxs,iobase=0x402 \
    -device nec-usb-xhci,id=usb1,bus=pci.0,addr=0x3 \
    -object iothread,id=iothread0 \
    -drive id=drive_image1,if=none,snapshot=off,aio=threads,cache=none,file=/home/kvm_autotest_root/images/rhel77-64-virtio-scsi.qcow2 \
    -device virtio-blk-pci,id=image1,drive=drive_image1,bootindex=1,bus=pci.0,iothread=iothread0 \
    -device virtio-net-pci,mac=9a:01:02:03:04:05,id=idk78PtS,vectors=4,netdev=idd7g5ik,bus=pci.0,addr=0x5  \
    -netdev tap,id=idd7g5ik,vhost=on \
    -m 7168  \
    -smp 4,maxcpus=4,cores=2,threads=1,sockets=2 \
    -cpu 'Westmere',+kvm_pv_unhalt \
    -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1  \
    -vnc :0  \
    -rtc base=utc,clock=host,driftfix=slew  \
    -boot menu=off,strict=off,order=cdn,once=c \
    -enable-kvm \
    -monitor stdio \

  4.Do mirror from src to dst.
     { "execute": "drive-mirror", "arguments": { "device": "drive_image1","target":"/home/mirror.qcow2","sync":"full","mode":"existing"}}

  5.Set migration capabilities.
     {"execute":"migrate-set-capabilities","arguments":{"capabilities":[{"capability":"pause-before-switchover","state":true}]}}

  6.Migrate from src to dst.
     { "execute": "migrate", "arguments": { "uri": "tcp:localhost:5000"}}
{"return": {}}

  7.Wait a few minutes, then check the migration status
     {"execute":"query-migrate"}
     {"return": {"expected-downtime": 300, "status": "active", "setup-time": 328, "total-time": 20992, "ram": {"total": 7533830144, "postcopy-requests": 0, "dirty-sync-count": 1, "page-size": 4096, "remaining": 1263505408, "mbps": 268.5672, "transferred": 689773714, "duplicate": 1365763, "dirty-pages-rate": 0, "skipped": 0, "normal-bytes": 676159488, "normal": 165078}}}
     {"timestamp": {"seconds": 1569389952, "microseconds": 920203}, "event": "JOB_STATUS_CHANGE", "data": {"status": "standby", "id": "drive_image1"}}
{"timestamp": {"seconds": 1569389952, "microseconds": 920339}, "event": "JOB_STATUS_CHANGE", "data": {"status": "ready", "id": "drive_image1"}}
{"timestamp": {"seconds": 1569389952, "microseconds": 920563}, "event": "STOP"}

 8. Continue migration
     {"execute":"migrate-continue","arguments":{"state":"pre-switchover"}}

 9. Check src vm status
    (qemu) info status 
VM status: paused (postmigrate)

 10. Check dst vm status
     (qemu) info status 
VM status: running

Comment 44 aihua liang 2019-09-30 02:11:13 UTC
As per comment 41, setting the bug's status to "Verified".

Comment 46 errata-xmlrpc 2020-03-31 14:34:48 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:1216

