Bug 859712

Summary: [libvirt] Deadlock in libvirt after storage is blocked
Product: Red Hat Enterprise Linux 6
Reporter: Gadi Ickowicz <gickowic>
Component: libvirt
Assignee: Eric Blake <eblake>
Status: CLOSED ERRATA
QA Contact: Virtualization Bugs <virt-bugs>
Severity: urgent
Docs Contact:
Priority: unspecified
Version: 6.3
CC: acathrow, berrange, dallan, dbotzer, dyasny, dyuan, eblake, hateya, mzhan, nlevinki, rvaknin, rwu, whuang, zpeng
Target Milestone: rc
Target Release: ---
Hardware: x86_64
OS: Unspecified
Whiteboard:
Fixed In Version: libvirt-0.10.2-3.el6
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2013-02-21 07:24:12 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
libvirt + vm logs, gdb trace (flags: none)

Description Gadi Ickowicz 2012-09-23 13:03:50 UTC
Created attachment 616104
libvirt + vm logs, gdb trace

Description of problem:
A deadlock occurred in libvirt after the connection to the (single) NFS storage domain was blocked. A VM was running at the time with a disk on the storage domain.

logs attached:
 * gdb trace
 * libvirt log
 * vm log

Version-Release number of selected component (if applicable):
libvirt-0.9.10-21.el6_3.4.x86_64

How reproducible:
Unknown

Comment 2 Eric Blake 2012-09-26 20:14:08 UTC
From the gdb log, I see:

Thread 1 (Thread 0x7f9563ae6860 (LWP 2190)):
#0  0x00000032c780e054 in __lll_lock_wait () from /lib64/libpthread.so.0
No symbol table info available.
#1  0x00000032c7809388 in _L_lock_854 () from /lib64/libpthread.so.0
No symbol table info available.
#2  0x00000032c7809257 in pthread_mutex_lock () from /lib64/libpthread.so.0
No symbol table info available.
#3  0x000000000048b0ec in qemuProcessHandleAgentDestroy (agent=0x7f95500bdcd0, vm=0x7f955000c9c0) at qemu/qemu_process.c:169
        priv = <value optimized out>
#4  0x0000000000465943 in qemuAgentFree (mon=0x7f95500bdcd0) at qemu/qemu_agent.c:148
No locals.
#5  qemuAgentUnref (mon=0x7f95500bdcd0) at qemu/qemu_agent.c:167
No locals.

And from upstream, I see Dan's patches to fix a deadlock in this area of code:
https://www.redhat.com/archives/libvir-list/2012-September/msg01806.html
https://www.redhat.com/archives/libvir-list/2012-September/msg01812.html

I'm not sure if Dan's fix will cover this issue, but it is certainly a possibility.
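
For illustration only, here is a minimal Python sketch of one way a hang like the one in Thread 1 can arise. The trace shows only that the thread blocks in pthread_mutex_lock inside qemuProcessHandleAgentDestroy, reached from qemuAgentUnref via qemuAgentFree; it does not show who owns the mutex. The sketch assumes the caller dropping the last reference already holds the same non-recursive VM lock. The names Agent, drop_last_ref and vm_lock are hypothetical and are not libvirt code. A timeout stands in for the indefinite block so the example terminates.

```python
import threading


class Agent:
    """Hypothetical reference-counted object with a destroy callback."""

    def __init__(self, on_destroy):
        self.refs = 1
        self.on_destroy = on_destroy

    def unref(self):
        # Dropping the last reference runs the destroy callback inline,
        # in whatever locking context the caller happens to be in.
        self.refs -= 1
        if self.refs == 0:
            return self.on_destroy()
        return None


def drop_last_ref(hold_vm_lock, timeout=0.2):
    """Return True if the destroy callback obtained the VM lock.

    False means the callback timed out; with a plain blocking lock
    (as in the gdb trace) that would be an indefinite hang.
    """
    vm_lock = threading.Lock()  # non-recursive, like a default pthread mutex

    def on_destroy():
        got = vm_lock.acquire(timeout=timeout)
        if got:
            vm_lock.release()
        return got

    agent = Agent(on_destroy)
    if hold_vm_lock:
        with vm_lock:              # caller already owns the VM lock...
            return agent.unref()   # ...and the callback wants it again
    return agent.unref()           # lock not held: callback proceeds


if __name__ == "__main__":
    print("unref while holding lock ->", drop_last_ref(True))
    print("unref after unlocking    ->", drop_last_ref(False))
```

Under this assumption, the remedy is to make sure the final unref never runs with the VM lock held, or that the callback does not need that lock. Whether that matches Dan's patches is not established here.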

Comment 6 Dave Allan 2012-10-08 16:18:08 UTC
Eric, when there's a build available with Dan's fixes, can you post the link here for Gadi to test?  Thanks.

Comment 8 Eric Blake 2012-10-11 03:42:49 UTC
Dan already backported his deadlock fixes for lxc under bug 864336, and included the qemu deadlock fixes in the process.  I'm working on a scratch build.

Comment 10 Eric Blake 2012-10-11 04:05:13 UTC
Moving to POST as a result of this thread:
http://post-office.corp.redhat.com/archives/rhvirt-patches/2012-October/msg00483.html

Comment 15 Haim 2012-10-23 10:36:34 UTC
*** Bug 869222 has been marked as a duplicate of this bug. ***

Comment 21 zhe peng 2012-12-14 08:30:15 UTC
As no one can reproduce this now, I will inspect the code and run some sanity tests.
Build:
libvirt-0.10.2-12.el6.x86_64

Source code check:
"Fix misc deadlocks in LXC and QEMU": all 6 patches are already in the build.
Patch59: libvirt-Fix-potential-deadlock-when-agent-is-closed.patch
Patch60: libvirt-Fix-rare-deadlock-in-QEMU-monitor-callbacks.patch
Patch61: libvirt-Convert-virLXCMonitor-to-use-virObject.patch
Patch62: libvirt-Remove-pointless-virLXCProcessMonitorDestroy-method.patch
Patch63: libvirt-Simplify-some-redundant-locking-while-unref-ing-objects.patch
Patch64: libvirt-Fix-deadlock-in-handling-EOF-in-LXC-monitor.patch


Also, per https://www.redhat.com/archives/libvir-list/2012-September/msg01962.html,
the deadlocks do not occur.

Sanity test:

Following Eric's comments, I created and shut off guests while another thread (such as virt-manager) queried domain status. I ran 300 cycles of that and no deadlocks occurred.
Moving to verified.
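
A sanity test of this shape could be scripted roughly as below. This is a sketch, not the procedure actually used for verification: the comment does not say which tool drove the guests. The function stress_lifecycle and its parameters are hypothetical. The start, stop and query callables are supplied by the caller; with libvirt-python they might wrap dom.create(), dom.destroy() and dom.state() on a defined domain. A watchdog reports a stall instead of hanging forever.

```python
import threading
import time


def stress_lifecycle(start, stop, query, cycles=300, stall_timeout=30.0):
    """Run start/stop cycles while a second thread polls query().

    Returns the number of completed cycles. Raises TimeoutError if either
    thread makes no progress for stall_timeout seconds.
    """
    done = threading.Event()
    now = time.monotonic()
    beats = {"lifecycle": now, "poller": now}  # last progress timestamps
    result = {"cycles": 0, "error": None}

    def lifecycle():
        try:
            for _ in range(cycles):
                start()
                stop()
                result["cycles"] += 1
                beats["lifecycle"] = time.monotonic()
        except Exception as exc:  # report failures to the caller
            result["error"] = exc
        finally:
            done.set()

    def poller():
        while not done.is_set():
            try:
                query()
            except Exception:
                pass  # the guest may be mid-transition; only hangs matter
            beats["poller"] = time.monotonic()
            time.sleep(0.001)

    for target in (lifecycle, poller):
        threading.Thread(target=target, daemon=True).start()

    # Watchdog: wake up periodically and check both heartbeats.
    while not done.wait(timeout=0.05):
        current = time.monotonic()
        stalled = [n for n, t in beats.items() if current - t > stall_timeout]
        if stalled:
            raise TimeoutError("no progress from: " + ", ".join(stalled))

    if result["error"] is not None:
        raise result["error"]
    return result["cycles"]
```

A clean run returns the cycle count (300 by default). A TimeoutError is the cue to attach gdb to libvirtd and capture a backtrace of all threads.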

Comment 22 errata-xmlrpc 2013-02-21 07:24:12 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2013-0276.html