Bug 1569861

Summary: Timed out during operation: cannot acquire state change lock (held by remoteDispatchDomainMemoryStats)
Product: Red Hat Enterprise Linux 7
Reporter: Tomas Kopecek <tkopecek>
Component: libvirt
Assignee: Michal Privoznik <mprivozn>
Status: CLOSED ERRATA
QA Contact: Han Han <hhan>
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: 7.7
CC: bugzilla49, dyuan, jss, juzhou, libvirt-maint, lmen, pkrempa, pmarciniak, redhat, syzop, tkopecek, xuzhang, yafu
Target Milestone: rc
Keywords: Upstream
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version: libvirt-4.4.0-1.el7
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 1530346
Environment:
Last Closed: 2018-10-30 09:55:31 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1530346
Bug Blocks:
Attachments: traceback (flags: none)

Description Tomas Kopecek 2018-04-20 07:13:22 UTC
Created attachment 1424409 [details]
traceback

+++ This bug was initially created as a clone of Bug #1530346 +++

Description of problem:

After requesting an operation on a VM, virt-manager hangs and has to be killed. When re-launched, virt-manager cannot connect to QEMU, and the libvirtd service status shows a timeout while trying to acquire the state change lock.

After recent updates to Fedora 27, I'm now hitting this bug on a pretty regular basis. Sometimes all that's required to trigger it is asking a machine to shut down via virt-manager. Other times, it seems to be triggered by doing several operations in quick succession.

Version-Release number of selected component (if applicable):
libvirt-3.7.0-3.fc27.x86_64
qemu-2.10.1-2.fc27.x86_64

How reproducible:
Intermittent, but pretty easy to trigger *accidentally* after the recent upgrade to F27.

Steps to Reproduce:
1. Ask a machine to shut down using virt-manager (sometimes this is sufficient).
2. Ask another machine to start or shut down.
3. It will probably be hard to trigger deliberately, but it seems to be happening quite frequently now when I'm *not* trying to trigger it...

Actual results:
virt-manager hangs and has to be TERMed or KILLed. When re-launched, it cannot connect to QEMU/KVM. virsh list reports the machine stuck "in shutdown" even though the qemu process has exited (in some fashion), and the libvirtd service reports errors:

Jan 03 02:40:55 Il-Duce libvirtd[1240]: 2018-01-02 16:10:55.366+0000: 1350: warning : qemuGetProcessInfo:1434 : cannot parse process status data
Jan 03 02:41:25 Il-Duce libvirtd[1240]: 2018-01-02 16:11:25.370+0000: 1349: warning : qemuDomainObjBeginJobInternal:4115 : Cannot start job (modify, none) for domain jss_Voltaire_c74; current job is (query, none) owned by (1352 remoteDispatchDomainMemoryStats, 0 <null>) for (277s, 0s)
Jan 03 02:41:25 Il-Duce libvirtd[1240]: 2018-01-02 16:11:25.370+0000: 1349: error : qemuDomainObjBeginJobInternal:4127 : Timed out during operation: cannot acquire state change lock (held by remoteDispatchDomainMemoryStats)
Jan 03 02:41:50 Il-Duce libvirtd[1240]: 2018-01-02 16:11:50.028+0000: 1349: warning : qemuGetProcessInfo:1434 : cannot parse process status data
Jan 03 02:41:55 Il-Duce libvirtd[1240]: 2018-01-02 16:11:55.371+0000: 1353: warning : qemuDomainObjBeginJobInternal:4115 : Cannot start job (query, none) for domain jss_Voltaire_c74; current job is (query, none) owned by (1352 remoteDispatchDomainMemoryStats, 0 <null>) for (307s, 0s)
Jan 03 02:41:55 Il-Duce libvirtd[1240]: 2018-01-02 16:11:55.371+0000: 1353: error : qemuDomainObjBeginJobInternal:4127 : Timed out during operation: cannot acquire state change lock (held by remoteDispatchDomainMemoryStats)
Jan 03 02:42:13 Il-Duce libvirtd[1240]: 2018-01-02 16:12:13.729+0000: 1240: error : virNetSocketReadWire:1808 : End of file while reading data: Input/output error
...
Jan 03 03:00:50 Il-Duce libvirtd[1240]: 2018-01-02 16:30:50.029+0000: 1349: warning : qemuGetProcessInfo:1434 : cannot parse process status data
Jan 03 03:01:50 Il-Duce libvirtd[1240]: 2018-01-02 16:31:50.028+0000: 1350: warning : qemuGetProcessInfo:1434 : cannot parse process status data
Jan 03 03:02:50 Il-Duce libvirtd[1240]: 2018-01-02 16:32:50.028+0000: 1351: warning : qemuGetProcessInfo:1434 : cannot parse process status data

Stopping libvirtd fails and systemd is forced to kill the unit.

Restarting libvirtd tends to succeed, although service status shows this:
Jan 03 03:19:20 Il-Duce libvirtd[21222]: 2018-01-02 16:49:20.500+0000: 21473: error : qemuMonitorOpenUnix:376 : failed to connect to monitor socket: No such process

The VM that was shut down can be restarted.

I don't think it is always so easy to recover; I'm pretty sure I had to reboot to bring libvirt back a week or so ago.

Expected results:
libvirt & virt-manager behave properly. 

Additional info:

--- Additional comment from Peter Krempa on 2018-01-04 03:38:55 EST ---

Could you please post the stack trace of the libvirtd process when that happens (e.g. using gstack or gdb)?
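For example (a sketch, assuming gstack/gdb are installed on the host and libvirtd is running as a single process):

# Dump backtraces of all libvirtd threads to a file
gstack $(pidof libvirtd) > libvirtd-stack.txt

# Or the same thing with gdb directly, in batch mode
gdb -p $(pidof libvirtd) -batch -ex 'thread apply all bt' > libvirtd-stack.txt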

--- Additional comment from Carlos Guidugli on 2018-03-13 23:23:19 EDT ---

+1. Having this problem occasionally too. Trying just to power off the machine does not work; I need to restart libvirtd.


 Id    Name                           State
----------------------------------------------------
 1     PURGATORY                      running
 3     BATMAN                         in shutdown


$ sudo virsh destroy BATMAN
error: Failed to destroy domain BATMAN
error: Timed out during operation: cannot acquire state change lock (held by remoteDispatchDomainMemoryStats)
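
The only workaround at that point, as noted above, is to restart the daemon (a sketch, assuming a systemd-based host):

$ sudo systemctl restart libvirtd   # recover by restarting libvirtd, per the comments above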

--- Additional comment from John on 2018-03-19 08:33:49 EDT ---

Sorry for the delay, I have been relocating to Sydney and starting a new job. I don't have much time for this.

But it has happened again, as soon as I have to use libvirt:

Here is the stack trace:

[root@Il-Duce 03-19 22:56:29 ~]# ps -ef | grep libvirtd
root        1163       1  0 21:02 ?        00:00:32 /usr/sbin/libvirtd
root       46035   31093  0 22:56 pts/0    00:00:00 grep --color=auto libvirtd
[root@Il-Duce 03-19 22:56:35 ~]# gstack 1163
Thread 17 (Thread 0x7f02bcc44700 (LWP 31346)):
#0  0x00007f030ae6ecbb in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f030ec92a76 in virCondWait () from /lib64/libvirt.so.0
#2  0x00007f030ec935b3 in virThreadPoolWorker () from /lib64/libvirt.so.0
#3  0x00007f030ec92818 in virThreadHelper () from /lib64/libvirt.so.0
#4  0x00007f030ae6861b in start_thread () from /lib64/libpthread.so.0
#5  0x00007f030ab95c2f in clone () from /lib64/libc.so.6
Thread 16 (Thread 0x7f02bd445700 (LWP 1634)):
#0  0x00007f030ae6ecbb in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f030ec92a76 in virCondWait () from /lib64/libvirt.so.0
#2  0x00007f030ec935b3 in virThreadPoolWorker () from /lib64/libvirt.so.0
#3  0x00007f030ec92818 in virThreadHelper () from /lib64/libvirt.so.0
#4  0x00007f030ae6861b in start_thread () from /lib64/libpthread.so.0
#5  0x00007f030ab95c2f in clone () from /lib64/libc.so.6
Thread 15 (Thread 0x7f02bdc46700 (LWP 1633)):
#0  0x00007f030ae6ecbb in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f030ec92a76 in virCondWait () from /lib64/libvirt.so.0
#2  0x00007f030ec935b3 in virThreadPoolWorker () from /lib64/libvirt.so.0
#3  0x00007f030ec92818 in virThreadHelper () from /lib64/libvirt.so.0
#4  0x00007f030ae6861b in start_thread () from /lib64/libpthread.so.0
#5  0x00007f030ab95c2f in clone () from /lib64/libc.so.6
Thread 14 (Thread 0x7f02be447700 (LWP 1632)):
#0  0x00007f030ae6ecbb in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f030ec92a76 in virCondWait () from /lib64/libvirt.so.0
#2  0x00007f030ec935b3 in virThreadPoolWorker () from /lib64/libvirt.so.0
#3  0x00007f030ec92818 in virThreadHelper () from /lib64/libvirt.so.0
#4  0x00007f030ae6861b in start_thread () from /lib64/libpthread.so.0
#5  0x00007f030ab95c2f in clone () from /lib64/libc.so.6
Thread 13 (Thread 0x7f02bec48700 (LWP 1631)):
#0  0x00007f030ae6ecbb in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f030ec92a76 in virCondWait () from /lib64/libvirt.so.0
#2  0x00007f030ec935b3 in virThreadPoolWorker () from /lib64/libvirt.so.0
#3  0x00007f030ec92818 in virThreadHelper () from /lib64/libvirt.so.0
#4  0x00007f030ae6861b in start_thread () from /lib64/libpthread.so.0
#5  0x00007f030ab95c2f in clone () from /lib64/libc.so.6
Thread 12 (Thread 0x7f02bf449700 (LWP 1630)):
#0  0x00007f030ae6ecbb in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f030ec92a76 in virCondWait () from /lib64/libvirt.so.0
#2  0x00007f030ec935b3 in virThreadPoolWorker () from /lib64/libvirt.so.0
#3  0x00007f030ec92818 in virThreadHelper () from /lib64/libvirt.so.0
#4  0x00007f030ae6861b in start_thread () from /lib64/libpthread.so.0
#5  0x00007f030ab95c2f in clone () from /lib64/libc.so.6
Thread 11 (Thread 0x7f02f3fff700 (LWP 1308)):
#0  0x00007f030ae6ecbb in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f030ec92a76 in virCondWait () from /lib64/libvirt.so.0
#2  0x00007f030ec934e8 in virThreadPoolWorker () from /lib64/libvirt.so.0
#3  0x00007f030ec92818 in virThreadHelper () from /lib64/libvirt.so.0
#4  0x00007f030ae6861b in start_thread () from /lib64/libpthread.so.0
#5  0x00007f030ab95c2f in clone () from /lib64/libc.so.6
Thread 10 (Thread 0x7f02f8916700 (LWP 1307)):
#0  0x00007f030ae6ecbb in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f030ec92a76 in virCondWait () from /lib64/libvirt.so.0
#2  0x00007f030ec934e8 in virThreadPoolWorker () from /lib64/libvirt.so.0
#3  0x00007f030ec92818 in virThreadHelper () from /lib64/libvirt.so.0
#4  0x00007f030ae6861b in start_thread () from /lib64/libpthread.so.0
#5  0x00007f030ab95c2f in clone () from /lib64/libc.so.6
Thread 9 (Thread 0x7f02f9117700 (LWP 1306)):
#0  0x00007f030ae6ecbb in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f030ec92a76 in virCondWait () from /lib64/libvirt.so.0
#2  0x00007f030ec934e8 in virThreadPoolWorker () from /lib64/libvirt.so.0
#3  0x00007f030ec92818 in virThreadHelper () from /lib64/libvirt.so.0
#4  0x00007f030ae6861b in start_thread () from /lib64/libpthread.so.0
#5  0x00007f030ab95c2f in clone () from /lib64/libc.so.6
Thread 8 (Thread 0x7f02f9918700 (LWP 1305)):
#0  0x00007f030ae6ecbb in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f030ec92a76 in virCondWait () from /lib64/libvirt.so.0
#2  0x00007f030ec934e8 in virThreadPoolWorker () from /lib64/libvirt.so.0
#3  0x00007f030ec92818 in virThreadHelper () from /lib64/libvirt.so.0
#4  0x00007f030ae6861b in start_thread () from /lib64/libpthread.so.0
#5  0x00007f030ab95c2f in clone () from /lib64/libc.so.6
Thread 7 (Thread 0x7f02fa119700 (LWP 1304)):
#0  0x00007f030ae6ecbb in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f030ec92a76 in virCondWait () from /lib64/libvirt.so.0
#2  0x00007f030ec934e8 in virThreadPoolWorker () from /lib64/libvirt.so.0
#3  0x00007f030ec92818 in virThreadHelper () from /lib64/libvirt.so.0
#4  0x00007f030ae6861b in start_thread () from /lib64/libpthread.so.0
#5  0x00007f030ab95c2f in clone () from /lib64/libc.so.6
Thread 6 (Thread 0x7f02fa91a700 (LWP 1303)):
#0  0x00007f030ae6ecbb in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f030ec92a76 in virCondWait () from /lib64/libvirt.so.0
#2  0x00007f030ec935b3 in virThreadPoolWorker () from /lib64/libvirt.so.0
#3  0x00007f030ec92818 in virThreadHelper () from /lib64/libvirt.so.0
#4  0x00007f030ae6861b in start_thread () from /lib64/libpthread.so.0
#5  0x00007f030ab95c2f in clone () from /lib64/libc.so.6
Thread 5 (Thread 0x7f02fb11b700 (LWP 1302)):
#0  0x00007f030ae6ecbb in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f030ec92a76 in virCondWait () from /lib64/libvirt.so.0
#2  0x00007f02c080665c in qemuMonitorSend () from /usr/lib64/libvirt/connection-driver/libvirt_driver_qemu.so
#3  0x00007f02c081a219 in qemuMonitorJSONCommandWithFd () from /usr/lib64/libvirt/connection-driver/libvirt_driver_qemu.so
#4  0x00007f02c081c560 in qemuMonitorJSONGetBalloonInfo () from /usr/lib64/libvirt/connection-driver/libvirt_driver_qemu.so
#5  0x00007f02c081c6d5 in qemuMonitorJSONGetMemoryStats () from /usr/lib64/libvirt/connection-driver/libvirt_driver_qemu.so
#6  0x00007f02c082d5d3 in qemuDomainMemoryStatsInternal () from /usr/lib64/libvirt/connection-driver/libvirt_driver_qemu.so
#7  0x00007f02c083ac59 in qemuDomainMemoryStats () from /usr/lib64/libvirt/connection-driver/libvirt_driver_qemu.so
#8  0x00007f030ed4e668 in virDomainMemoryStats () from /lib64/libvirt.so.0
#9  0x000056067677cbbc in remoteDispatchDomainMemoryStatsHelper ()
#10 0x00007f030edb80bc in virNetServerProgramDispatch () from /lib64/libvirt.so.0
#11 0x000056067679ac58 in virNetServerHandleJob ()
#12 0x00007f030ec934a1 in virThreadPoolWorker () from /lib64/libvirt.so.0
#13 0x00007f030ec92818 in virThreadHelper () from /lib64/libvirt.so.0
#14 0x00007f030ae6861b in start_thread () from /lib64/libpthread.so.0
#15 0x00007f030ab95c2f in clone () from /lib64/libc.so.6
Thread 4 (Thread 0x7f02f311b700 (LWP 1301)):
#0  0x00007f030ae6ecbb in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f030ec92a76 in virCondWait () from /lib64/libvirt.so.0
#2  0x00007f030ec935b3 in virThreadPoolWorker () from /lib64/libvirt.so.0
#3  0x00007f030ec92818 in virThreadHelper () from /lib64/libvirt.so.0
#4  0x00007f030ae6861b in start_thread () from /lib64/libpthread.so.0
#5  0x00007f030ab95c2f in clone () from /lib64/libc.so.6
Thread 3 (Thread 0x7f02fb91c700 (LWP 1300)):
#0  0x00007f030ae6ecbb in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f030ec92a76 in virCondWait () from /lib64/libvirt.so.0
#2  0x00007f030ec935b3 in virThreadPoolWorker () from /lib64/libvirt.so.0
#3  0x00007f030ec92818 in virThreadHelper () from /lib64/libvirt.so.0
#4  0x00007f030ae6861b in start_thread () from /lib64/libpthread.so.0
#5  0x00007f030ab95c2f in clone () from /lib64/libc.so.6
Thread 2 (Thread 0x7f02fc11d700 (LWP 1299)):
#0  0x00007f030ae6ecbb in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f030ec92a76 in virCondWait () from /lib64/libvirt.so.0
#2  0x00007f030ec935b3 in virThreadPoolWorker () from /lib64/libvirt.so.0
#3  0x00007f030ec92818 in virThreadHelper () from /lib64/libvirt.so.0
#4  0x00007f030ae6861b in start_thread () from /lib64/libpthread.so.0
#5  0x00007f030ab95c2f in clone () from /lib64/libc.so.6
Thread 1 (Thread 0x7f030f991980 (LWP 1163)):
#0  0x00007f030ab8967b in poll () from /lib64/libc.so.6
#1  0x00007f030ec3c5d1 in virEventPollRunOnce () from /lib64/libvirt.so.0
#2  0x00007f030ec3b151 in virEventRunDefaultImpl () from /lib64/libvirt.so.0
#3  0x00007f030edb23c5 in virNetDaemonRun () from /lib64/libvirt.so.0
#4  0x0000560676761058 in main ()
[root@Il-Duce 03-19 22:56:41 ~]#


[root@Il-Duce 03-19 21:50:27 ~]# systemctl status libvirtd
● libvirtd.service - Virtualization daemon
   Loaded: loaded (/usr/lib/systemd/system/libvirtd.service; enabled; vendor preset: enabled)
   Active: active (running) since Mon 2018-03-19 21:01:44 ACDT; 1h 53min ago
     Docs: man:libvirtd(8)
           http://libvirt.org
 Main PID: 1163 (libvirtd)
    Tasks: 24 (limit: 32768)
   Memory: 69.6M
      CPU: 34.716s
   CGroup: /system.slice/libvirtd.service
           ├─1163 /usr/sbin/libvirtd
           ├─2505 /usr/sbin/dnsmasq --conf-file=/var/lib/libvirt/dnsmasq/soe.vorpal.conf --leasefile-ro --dhcp-script=/usr/libexec/libvirt_leaseshelper
           ├─2506 /usr/sbin/dnsmasq --conf-file=/var/lib/libvirt/dnsmasq/soe.vorpal.conf --leasefile-ro --dhcp-script=/usr/libexec/libvirt_leaseshelper
           ├─2676 /usr/sbin/dnsmasq --conf-file=/var/lib/libvirt/dnsmasq/dns.vorpal.conf --leasefile-ro --dhcp-script=/usr/libexec/libvirt_leaseshelper
           ├─2868 /usr/sbin/dnsmasq --conf-file=/var/lib/libvirt/dnsmasq/default.conf --leasefile-ro --dhcp-script=/usr/libexec/libvirt_leaseshelper
           ├─2869 /usr/sbin/dnsmasq --conf-file=/var/lib/libvirt/dnsmasq/default.conf --leasefile-ro --dhcp-script=/usr/libexec/libvirt_leaseshelper
           ├─3034 /usr/sbin/dnsmasq --conf-file=/var/lib/libvirt/dnsmasq/ovirt.vorpal.conf --leasefile-ro --dhcp-script=/usr/libexec/libvirt_leaseshelper
           └─3035 /usr/sbin/dnsmasq --conf-file=/var/lib/libvirt/dnsmasq/ovirt.vorpal.conf --leasefile-ro --dhcp-script=/usr/libexec/libvirt_leaseshelper

Mar 19 22:47:38 Il-Duce libvirtd[1163]: 2018-03-19 12:17:38.155+0000: 1299: error : qemuDomainObjBeginJobInternal:4127 : Timed out during operation: cannot acquire state change lock (held by remoteDispatchDomainMemoryStats)
Mar 19 22:47:38 Il-Duce libvirtd[1163]: 2018-03-19 12:17:38.161+0000: 1163: error : virNetSocketReadWire:1808 : End of file while reading data: Input/output error
Mar 19 22:47:38 Il-Duce libvirtd[1163]: 2018-03-19 12:17:38.395+0000: 1299: warning : qemuGetProcessInfo:1434 : cannot parse process status data

And yes, it is the same error that Carlos is seeing. Extremely frustrating. As soon as I have to use libvirt again, it hits me.

--- Additional comment from John on 2018-03-19 08:35:22 EDT ---

This time, all I did was use virt-manager to tell a VM to shut down. And... game over.

--- Additional comment from John on 2018-03-19 08:40:17 EDT ---

I updated my system a week ago, so I am now seeing this with:

Linux Il-Duce 4.15.6-300.fc27.x86_64 #1 SMP Mon Feb 26 18:43:03 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

libvirt-3.7.0-4.fc27.x86_64
qemu-2.10.1-2.fc27.x86_64

--- Additional comment from Michael Barker on 2018-04-02 23:17:03 EDT ---

I am seeing the same error on an Ubuntu 17.10 host as well, so it's definitely not specific to the Red Hat/Fedora builds.

Linux S1 4.13.0-37-generic #42-Ubuntu SMP Wed Mar 7 14:13:23 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

$ virsh --version
3.6.0
$ libvirtd --version
libvirtd (libvirt) 3.6.0
$ /usr/bin/qemu-system-x86_64 --version
QEMU emulator version 2.10.1(Debian 1:2.10+dfsg-0ubuntu3.5)
Copyright (c) 2003-2017 Fabrice Bellard and the QEMU Project developers

All installed from repository.

I should also note that I only encounter this issue with VMs that have PCIe devices passed through (one VM with a GPU and USB controller, another with an Ethernet controller and a SAS controller). Other VMs with no PCIe passthrough have never hit this issue. It also occurs with guests of any OS (I have encountered it with FreeBSD 11, Ubuntu 17.04 and 17.10, and Windows 7 and 10).

I've found that the best way to reproduce this error is either to force a PCIe device to crash within the guest (causing the guest OS to kernel panic/BSOD), or to force-shutdown the guest from virt-manager while PCIe devices are under significant load (I have yet to quantify "significant load"). I have been unable to reproduce it by changing states from the command line via virsh.
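
If I understand virt-manager correctly, its "Force Off" action maps to the same libvirt call as virsh destroy, so a command-line form of the second reproducer would roughly be (a sketch; <domain> is a placeholder name):

# While the passed-through PCIe devices are under load inside the guest:
virsh destroy <domain>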

--- Additional comment from Bram Matthys on 2018-04-13 08:45:51 EDT ---

Same here: the VM crashes within minutes of starting to run Windows Update (Windows 10, 64-bit). Then virt-manager freezes up, and often virsh becomes unresponsive as well; even 'virsh destroy' no longer works.
This is:
* Ubuntu 17.10
* Linux 4.13.0-38-generic #43-Ubuntu SMP Wed Mar 14 15:20:44 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
* QEMU emulator version 2.10.1(Debian 1:2.10+dfsg-0ubuntu3.5)

Although I think the previous comment from Michael Barker is likely more helpful than this "me too!" post.

--- Additional comment from John on 2018-04-13 09:55:01 EDT ---

I don't think I've got PCIe passthrough enabled on any of my VMs, unless that is the default now with the latest kernel/libvirt/qemu.

--- Additional comment from John on 2018-04-13 10:00:48 EDT ---

The stack trace I provided shows the problem pretty clearly, but if there is any further info I can collect to narrow it down, let me know. I might have some time this weekend to do some further investigation.

BTW, even if I don't have PCIe passthrough, I am using host CPU passthrough on many or most of my VMs, as I'm doing some nested virtualisation. Perhaps that is a factor.
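
A quick way to check both, for what it's worth (a sketch; <domain> is a placeholder for a VM name):

# PCI/PCIe passthrough appears as <hostdev mode='subsystem' type='pci'> entries
virsh dumpxml <domain> | grep -A2 "<hostdev"

# Host CPU passthrough appears as <cpu mode='host-passthrough'>
virsh dumpxml <domain> | grep "<cpu"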

Comment 2 Tomas Kopecek 2018-04-20 07:18:08 UTC
We've hit (probably) the same bug in our build system. I'm providing some info in an attachment. It seems to be happening at least on the ppc and s390x arches.

RHEL 7
libvirt-3.2.0-14.el7_4.3
kernel-3.10.0-693.5.2.el7

Comment 4 Han Han 2018-05-09 03:50:10 UTC
Bug not reproduced on:
  libvirt-3.9.0-14.el7_5.4.x86_64
  virt-manager-1.4.3-3.el7.noarch
  qemu-kvm-rhev-2.10.0-21.el7_5.1.x86_64
Did shutdown/start on multiple VMs in parallel; no deadlock.
It may be related to this bug: https://bugs.launchpad.net/ubuntu/+source/libvirt/+bug/1688508 . That one was fixed according to the libvirt v4.1 release notes: "qemu: Fix shutting down domains in parallel -- If multiple domains were being shut down in parallel, libvirtd might have deadlocked."
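
A test of that kind can be sketched as follows (placeholder domain names vm1..vm5; not the exact procedure used):

# Shut down and restart several running domains concurrently
for d in vm1 vm2 vm3 vm4 vm5; do
    ( virsh shutdown "$d"; sleep 30; virsh start "$d" ) &
done
wait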

Comment 5 John 2018-05-09 09:32:50 UTC
(In reply to Han Han from comment #4)
> Bug not reproduced on:
>   libvirt-3.9.0-14.el7_5.4.x86_64
>   virt-manager-1.4.3-3.el7.noarch
>   qemu-kvm-rhev-2.10.0-21.el7_5.1.x86_64
> Did shutdown/start on multiple VMs in parallel; no deadlock.
> It may be related to this bug:
> https://bugs.launchpad.net/ubuntu/+source/libvirt/+bug/1688508 . That one was
> fixed according to the libvirt v4.1 release notes: "qemu: Fix shutting down
> domains in parallel -- If multiple domains were being shut down in parallel,
> libvirtd might have deadlocked."

It's not related to that.

Comment 6 yafu 2018-05-10 07:09:28 UTC
Also cannot reproduce with libvirt-3.2.0-14.el7_4.9.x86_64.

Comment 7 Michal Privoznik 2018-05-18 14:39:40 UTC
Tomas, can you please test with libvirt-3.9.0-9.el7 or newer? My analysis shows that this bug might be a dup of 1536461. I'd like to close it as such.

Comment 8 Tomas Kopecek 2018-05-29 12:23:42 UTC
We've deployed libvirt-daemon-3.9.0-14.el7_5.2 and it seems to be fixed. Thanks!

Comment 9 Michal Privoznik 2018-05-29 14:10:41 UTC
Very well, I'm going to put this into POST then so that we can test it.

Comment 11 Han Han 2018-07-10 08:39:02 UTC
Verified on libvirt-4.5.0-2.virtcov.el7.x86_64 qemu-kvm-rhev-2.12.0-7.el7.x86_64:

1. Prepare a running VM with qemu-guest-agent active

(in vm)# systemctl status qemu-guest-agent
● qemu-guest-agent.service - QEMU Guest Agent
   Loaded: loaded (/usr/lib/systemd/system/qemu-guest-agent.service; disabled; >
   Active: active (running) since Tue 2018-07-10 16:30:30 CST; 4min 47s ago
 Main PID: 723 (qemu-ga)
    Tasks: 1 (limit: 1149)
   Memory: 1.8M
   CGroup: /system.slice/qemu-guest-agent.service
           └─723 /usr/bin/qemu-ga --method=virtio-serial --path=/dev/virtio-por>
 
2. Create several snapshots with --disk-only --quiesce
# virsh snapshot-create-as usb S1 --disk-only --quiesce 
Domain snapshot S1 created

# virsh snapshot-create-as usb S2 --disk-only --quiesce 
Domain snapshot S2 created

# virsh snapshot-create-as usb S3 --disk-only --quiesce 
Domain snapshot S3 created

Checked that the VM is still running well. Verified.

Comment 12 Han Han 2018-07-10 08:51:49 UTC
I mistakenly posted the verification for https://bugzilla.redhat.com/show_bug.cgi?id=1598084 as comment 11. Please ignore it.

Comment 13 Han Han 2018-08-22 08:20:10 UTC
On libvirt-4.5.0-6.el7.x86_64 and qemu-kvm-rhev-2.12.0-10.el7.x86_64, verified following https://bugzilla.redhat.com/show_bug.cgi?id=1536461#c4. Passed.

Comment 15 errata-xmlrpc 2018-10-30 09:55:31 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:3113