Bug 859712 - [libvirt] Deadlock in libvirt after storage is blocked
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: libvirt
Hardware: x86_64
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: rc
Assigned To: Eric Blake
Virtualization Bugs
Duplicates: 869222
Depends On:
Reported: 2012-09-23 09:03 EDT by Gadi Ickowicz
Modified: 2014-08-21 21:41 EDT (History)
14 users

See Also:
Fixed In Version: libvirt-0.10.2-3.el6
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Last Closed: 2013-02-21 02:24:12 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---

Attachments (Terms of Use)
libvirt + vm logs, gdb trace (527.24 KB, application/x-gzip)
2012-09-23 09:03 EDT, Gadi Ickowicz

Description Gadi Ickowicz 2012-09-23 09:03:50 EDT
Created attachment 616104
libvirt + vm logs, gdb trace

Description of problem:
A deadlock occurred in libvirt after the connection to the (single) NFS storage domain was blocked. A VM was running at the time with a disk on the storage domain.

logs attached:
 * gdb trace
 * libvirt log
 * vm log

Version-Release number of selected component (if applicable):

How reproducible:
Comment 2 Eric Blake 2012-09-26 16:14:08 EDT
From the gdb log, I see:

Thread 1 (Thread 0x7f9563ae6860 (LWP 2190)):
#0  0x00000032c780e054 in __lll_lock_wait () from /lib64/libpthread.so.0
No symbol table info available.
#1  0x00000032c7809388 in _L_lock_854 () from /lib64/libpthread.so.0
No symbol table info available.
#2  0x00000032c7809257 in pthread_mutex_lock () from /lib64/libpthread.so.0
No symbol table info available.
#3  0x000000000048b0ec in qemuProcessHandleAgentDestroy (agent=0x7f95500bdcd0, vm=0x7f955000c9c0) at qemu/qemu_process.c:169
        priv = <value optimized out>
#4  0x0000000000465943 in qemuAgentFree (mon=0x7f95500bdcd0) at qemu/qemu_agent.c:148
No locals.
#5  qemuAgentUnref (mon=0x7f95500bdcd0) at qemu/qemu_agent.c:167
No locals.

And from upstream, I see Dan's patches to fix a deadlock in this area of code:

I'm not sure if Dan's fix will cover this issue, but it is certainly a possibility.
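The backtrace above shows the classic pattern behind this kind of deadlock: the agent's destroy callback (qemuProcessHandleAgentDestroy) tries to take the VM's mutex while the caller that triggered the unref already holds it. A minimal sketch of that pattern, using toy Python locks rather than libvirt code (the timeout stands in for the real hang, which would block forever):

```python
import threading

vm_lock = threading.Lock()  # stand-in for the per-domain (vm) mutex

def handle_agent_destroy():
    # analogous to qemuProcessHandleAgentDestroy(): wants the VM lock
    acquired = vm_lock.acquire(timeout=1)  # a real mutex would block forever
    if acquired:
        vm_lock.release()
    return acquired

def agent_unref_with_vm_locked():
    # analogous to dropping the last agent reference while the VM lock
    # is still held: the destroy callback then self-deadlocks
    with vm_lock:
        return handle_agent_destroy()

print(agent_unref_with_vm_locked())  # False: the callback could not get the lock
```

With a non-recursive mutex (as libvirt's domain locks are), the inner acquire can never succeed while the outer one is held by the same call chain, which matches thread 1 parked in __lll_lock_wait in the trace.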
Comment 6 Dave Allan 2012-10-08 12:18:08 EDT
Eric, when there's a build available with Dan's fixes, can you post the link here for Gadi to test?  Thanks.
Comment 8 Eric Blake 2012-10-10 23:42:49 EDT
Dan already backported his deadlock fixes for lxc under bug 864336, but included the qemu deadlock bugs in the process.  I'm working on a scratch build.
Comment 10 Eric Blake 2012-10-11 00:05:13 EDT
Moving to POST as a result of this thread:
Comment 15 Haim 2012-10-23 06:36:34 EDT
*** Bug 869222 has been marked as a duplicate of this bug. ***
Comment 21 zhe peng 2012-12-14 03:30:15 EST
As no one can reproduce this now, I will do a code inspection and run some sanity tests.

Checking the source code: all 6 patches from the "Fix misc deadlocks in LXC and QEMU" series are already in the build.
Patch59: libvirt-Fix-potential-deadlock-when-agent-is-closed.patch
Patch60: libvirt-Fix-rare-deadlock-in-QEMU-monitor-callbacks.patch
Patch61: libvirt-Convert-virLXCMonitor-to-use-virObject.patch
Patch62: libvirt-Remove-pointless-virLXCProcessMonitorDestroy-method.patch
Patch63: libvirt-Simplify-some-redundant-locking-while-unref-ing-objects.patch
Patch64: libvirt-Fix-deadlock-in-handling-EOF-in-LXC-monitor.patch

And per the upstream discussion at https://www.redhat.com/archives/libvir-list/2012-September/msg01962.html, the deadlocks no longer occur.
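The direction of the upstream fix can be sketched as follows: drop the lock the destroy callback will need before invoking it. This is an illustrative Python sketch of that general technique, not the actual libvirt patch code:

```python
import threading

vm_lock = threading.Lock()  # stand-in for the per-domain (vm) mutex

def handle_agent_destroy():
    # destroy callback: it may now take the VM lock safely
    with vm_lock:
        return True

def agent_unref():
    with vm_lock:
        pass  # work that genuinely needs the VM lock happens here
    # the callback fires only after the VM lock has been released,
    # so it can re-acquire the lock without deadlocking
    return handle_agent_destroy()

print(agent_unref())  # True: no deadlock
```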

Sanity test (per Eric's comments): creating and shutting off guests while another thread (such as virt-manager) queries domain status. I ran 300 cycles of that and no deadlocks occurred.
Moving to VERIFIED.
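The sanity test described above can be mirrored as a small stress harness. This sketch uses a toy domain object with its own mutex instead of real libvirt guests (a real run would go through the libvirt API against live VMs):

```python
import threading

class Domain:
    """Toy stand-in for a libvirt domain with its per-object mutex."""
    def __init__(self):
        self.lock = threading.Lock()
        self.running = False

    def create(self):
        with self.lock:
            self.running = True

    def destroy(self):
        with self.lock:
            self.running = False

    def info(self):
        with self.lock:
            return self.running

def stress(cycles=300):
    dom = Domain()
    stop = threading.Event()

    def query():
        # concurrent status polling, like virt-manager's refresh loop
        while not stop.is_set():
            dom.info()

    poller = threading.Thread(target=query)
    poller.start()
    for _ in range(cycles):  # repeated create/shutdown cycles
        dom.create()
        dom.destroy()
    stop.set()
    poller.join(timeout=5)
    return not poller.is_alive()  # True if nothing hung

print(stress())
```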
Comment 22 errata-xmlrpc 2013-02-21 02:24:12 EST
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

