Bug 794776 - libvirt deadlock after destroy & undefine domain
Summary: libvirt deadlock after destroy & undefine domain
Keywords:
Status: CLOSED NEXTRELEASE
Alias: None
Product: Virtualization Tools
Classification: Community
Component: libvirt
Version: unspecified
Hardware: Unspecified
OS: Unspecified
unspecified
unspecified
Target Milestone: ---
Assignee: Michal Privoznik
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2012-02-17 15:42 UTC by Marc-Andre Lureau
Modified: 2012-05-25 07:35 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2012-05-25 07:35:30 UTC


Description Marc-Andre Lureau 2012-02-17 15:42:24 UTC
Description of problem:

libvirt deadlocked after I destroyed and undefined a domain in qemu:///session.

Version-Release number of selected component (if applicable):

0.9.9

Not sure if the bug is reproducible.

Steps:
1. (optional) create a "live" f16 domain with the Boxes wizard
2. virsh -c qemu:///session destroy "Fedora 16"
3. virsh -c qemu:///session undefine "Fedora 16"
  
Actual results:

(gdb) thread apply all bt

Thread 11 (Thread 0x7f5257e1a700 (LWP 27214)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:165
#1  0x00007f525f5103b9 in virCondWait (c=0x24cd8e0, m=0x24cd8b8) at util/threads-pthread.c:117
#2  0x00007f525f5109c6 in virThreadPoolWorker (opaque=0x24c5970) at util/threadpool.c:103
#3  0x00007f525f510593 in virThreadHelper (data=0x24c5a90) at util/threads-pthread.c:161
#4  0x0000003a17a07d90 in start_thread (arg=0x7f5257e1a700) at pthread_create.c:309
#5  0x0000003a176ef48d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115

Thread 10 (Thread 0x7f5257619700 (LWP 27215)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:165
#1  0x00007f525f5103b9 in virCondWait (c=0x24cd8e0, m=0x24cd8b8) at util/threads-pthread.c:117
#2  0x00007f525f5109c6 in virThreadPoolWorker (opaque=0x24c5b00) at util/threadpool.c:103
#3  0x00007f525f510593 in virThreadHelper (data=0x24c5b90) at util/threads-pthread.c:161
#4  0x0000003a17a07d90 in start_thread (arg=0x7f5257619700) at pthread_create.c:309
#5  0x0000003a176ef48d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115

Thread 9 (Thread 0x7f5256e18700 (LWP 27216)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:165
#1  0x00007f525f5103b9 in virCondWait (c=0x24cd8e0, m=0x24cd8b8) at util/threads-pthread.c:117
#2  0x00007f525f5109c6 in virThreadPoolWorker (opaque=0x24c5970) at util/threadpool.c:103
#3  0x00007f525f510593 in virThreadHelper (data=0x24c5a90) at util/threads-pthread.c:161
#4  0x0000003a17a07d90 in start_thread (arg=0x7f5256e18700) at pthread_create.c:309
#5  0x0000003a176ef48d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115

Thread 8 (Thread 0x7f5256617700 (LWP 27217)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:165
#1  0x00007f525f5103b9 in virCondWait (c=0x24cd8e0, m=0x24cd8b8) at util/threads-pthread.c:117
#2  0x00007f525f5109c6 in virThreadPoolWorker (opaque=0x24c5b00) at util/threadpool.c:103
#3  0x00007f525f510593 in virThreadHelper (data=0x24c5b90) at util/threads-pthread.c:161
#4  0x0000003a17a07d90 in start_thread (arg=0x7f5256617700) at pthread_create.c:309
#5  0x0000003a176ef48d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115

Thread 7 (Thread 0x7f5255e16700 (LWP 27218)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:165
#1  0x00007f525f5103b9 in virCondWait (c=0x24cd8e0, m=0x24cd8b8) at util/threads-pthread.c:117
#2  0x00007f525f5109c6 in virThreadPoolWorker (opaque=0x24c5970) at util/threadpool.c:103
#3  0x00007f525f510593 in virThreadHelper (data=0x24c5a90) at util/threads-pthread.c:161
#4  0x0000003a17a07d90 in start_thread (arg=0x7f5255e16700) at pthread_create.c:309
#5  0x0000003a176ef48d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115

Thread 6 (Thread 0x7f5255615700 (LWP 27219)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:165
#1  0x00007f525f5103b9 in virCondWait (c=0x24cd970, m=0x24cd8b8) at util/threads-pthread.c:117
#2  0x00007f525f5109c6 in virThreadPoolWorker (opaque=0x24c5b00) at util/threadpool.c:103
#3  0x00007f525f510593 in virThreadHelper (data=0x24c5b90) at util/threads-pthread.c:161
#4  0x0000003a17a07d90 in start_thread (arg=0x7f5255615700) at pthread_create.c:309
#5  0x0000003a176ef48d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115

Thread 5 (Thread 0x7f5254e14700 (LWP 27220)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:165
#1  0x00007f525f5103b9 in virCondWait (c=0x24cd970, m=0x24cd8b8) at util/threads-pthread.c:117
#2  0x00007f525f5109c6 in virThreadPoolWorker (opaque=0x24c5970) at util/threadpool.c:103
#3  0x00007f525f510593 in virThreadHelper (data=0x24c5a90) at util/threads-pthread.c:161
#4  0x0000003a17a07d90 in start_thread (arg=0x7f5254e14700) at pthread_create.c:309
#5  0x0000003a176ef48d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115

Thread 4 (Thread 0x7f5254613700 (LWP 27221)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:165
#1  0x00007f525f5103b9 in virCondWait (c=0x24cd970, m=0x24cd8b8) at util/threads-pthread.c:117
#2  0x00007f525f5109c6 in virThreadPoolWorker (opaque=0x24c5b00) at util/threadpool.c:103
#3  0x00007f525f510593 in virThreadHelper (data=0x24c5b90) at util/threads-pthread.c:161
#4  0x0000003a17a07d90 in start_thread (arg=0x7f5254613700) at pthread_create.c:309
#5  0x0000003a176ef48d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115

Thread 3 (Thread 0x7f5253e12700 (LWP 27222)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:165
#1  0x00007f525f5103b9 in virCondWait (c=0x24cd970, m=0x24cd8b8) at util/threads-pthread.c:117
#2  0x00007f525f5109c6 in virThreadPoolWorker (opaque=0x24c5c20) at util/threadpool.c:103
#3  0x00007f525f510593 in virThreadHelper (data=0x24c5a90) at util/threads-pthread.c:161
#4  0x0000003a17a07d90 in start_thread (arg=0x7f5253e12700) at pthread_create.c:309
#5  0x0000003a176ef48d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115

Thread 2 (Thread 0x7f5253611700 (LWP 27223)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:165
#1  0x00007f525f5103b9 in virCondWait (c=0x24cd970, m=0x24cd8b8) at util/threads-pthread.c:117
#2  0x00007f525f5109c6 in virThreadPoolWorker (opaque=0x24c5b00) at util/threadpool.c:103
#3  0x00007f525f510593 in virThreadHelper (data=0x24c5b90) at util/threads-pthread.c:161
#4  0x0000003a17a07d90 in start_thread (arg=0x7f5253611700) at pthread_create.c:309
#5  0x0000003a176ef48d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115

Thread 1 (Thread 0x7f525e688840 (LWP 27212)):
#0  __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:136
#1  0x0000003a17a09f97 in _L_lock_863 () from /lib64/libpthread.so.0
#2  0x0000003a17a09deb in __pthread_mutex_lock (mutex=0x7f524c08ab48) at pthread_mutex_lock.c:65
#3  0x00007f525f510277 in virMutexLock (m=0x7f524c08ab48) at util/threads-pthread.c:85
#4  0x00007f525f55afea in virDomainEventStateLock (state=0x7f524c08ab30) at conf/domain_event.c:591
#5  0x00007f525f55cebf in virDomainEventStateDeregisterConn (conn=0x7f5238053ba0, state=0x7f524c08ab30)
    at conf/domain_event.c:1510
#6  0x000000000045d13d in qemudClose (conn=0x7f5238053ba0) at qemu/qemu_driver.c:914
#7  0x00007f525f59ae53 in virReleaseConnect (conn=0x7f5238053ba0) at datatypes.c:114
#8  0x00007f525f59afc9 in virUnrefConnect (conn=0x7f5238053ba0) at datatypes.c:149
#9  0x00007f525f55a696 in virDomainEventCallbackListPurgeMarked (cbList=0x7f524c08d1a0) at conf/domain_event.c:347
#10 0x00007f525f55c9fb in virDomainEventStateFlush (state=0x7f524c08ab30) at conf/domain_event.c:1307
#11 0x00007f525f55b105 in virDomainEventTimer (timer=65, opaque=0x7f524c08ab30) at conf/domain_event.c:630
#12 0x00007f525f4f7821 in virEventPollDispatchTimeouts () at util/event_poll.c:440
#13 0x00007f525f4f83be in virEventPollRunOnce () at util/event_poll.c:633
#14 0x00007f525f4f64eb in virEventRunDefaultImpl () at util/event.c:247
#15 0x00007f525f60e021 in virNetServerRun (srv=0x24cd790) at rpc/virnetserver.c:736
#16 0x0000000000424234 in main (argc=1, argv=0x7fff11c45848) at libvirtd.c:1602
(gdb)

Comment 1 Dave Allan 2012-02-17 16:48:58 UTC
Michal, does the backtrace shed any light on what's going on here?

Comment 2 Daniel Berrangé 2012-02-17 16:54:12 UTC
The problem is the virDomainEventStateFlush() method + free callbacks.

When dispatching callbacks it is careful to release the driver lock.

When purging deleted callbacks, though, we don't release the lock. So if the 'free callback' uses the virDomainEventState again, it will deadlock.
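
The lock pattern described in this comment can be sketched in isolation. This is only an illustrative model, not libvirt code: `struct event_state`, `struct conn`, and every function name below are hypothetical stand-ins. It also assumes an error-checking mutex (`PTHREAD_MUTEX_ERRORCHECK`), so that the second lock attempt by the same thread returns `EDEADLK` instead of hanging the way the default mutex in the backtrace does.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-ins for the event state and the connection. */
struct event_state {
    pthread_mutex_t lock;
};

struct conn {
    int refs;                  /* reference count */
    struct event_state *state; /* event state this connection uses */
    int close_lock_rc;         /* result of the lock attempt made during close */
};

/* Use an error-checking mutex so a relock by the owner reports EDEADLK. */
static int event_state_init(struct event_state *s)
{
    pthread_mutexattr_t attr;
    int rc = pthread_mutexattr_init(&attr);
    if (rc != 0)
        return rc;
    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    rc = pthread_mutex_init(&s->lock, &attr);
    pthread_mutexattr_destroy(&attr);
    return rc;
}

/* Models the close path: it needs the event state lock to deregister. */
static void conn_close(struct conn *c)
{
    c->close_lock_rc = pthread_mutex_lock(&c->state->lock);
    if (c->close_lock_rc == 0)
        pthread_mutex_unlock(&c->state->lock);
}

/* Dropping the last reference triggers the close path. */
static void conn_unref(struct conn *c)
{
    if (--c->refs == 0)
        conn_close(c);
}

/* Problem pattern: the reference is dropped while the lock is still held. */
static int purge_holding_lock(struct event_state *s, struct conn *c)
{
    pthread_mutex_lock(&s->lock);
    conn_unref(c);             /* may re-enter the lock via conn_close() */
    pthread_mutex_unlock(&s->lock);
    return c->close_lock_rc;
}

/* Alternative: finish the list work under the lock, then drop the reference. */
static int purge_after_unlock(struct event_state *s, struct conn *c)
{
    pthread_mutex_lock(&s->lock);
    /* ... detach the callback from the list here ... */
    pthread_mutex_unlock(&s->lock);
    conn_unref(c);
    return c->close_lock_rc;
}
```

With a connection whose only remaining reference belongs to the callback being purged, `purge_holding_lock()` reports `EDEADLK` from the nested lock attempt, while `purge_after_unlock()` reports 0. The second function is just one possible way to avoid the re-entry; it is not necessarily the change that was made upstream.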

Comment 3 Dave Allan 2012-02-17 19:54:42 UTC
(In reply to comment #2)
> The problem is the virDomainEventStateFlush() method + free callbacks.
> 
> When dispatching callbacks it is careful to release the driver lock.
> 
> When purging deleted callbacks, though, we don't release the lock. So if the
> 'free callback' uses the virDomainEventState again, it will deadlock.

Will you submit a patch?

Comment 4 Michal Privoznik 2012-05-21 11:54:51 UTC
The patch has been proposed upstream:

https://www.redhat.com/archives/libvir-list/2012-May/msg00995.html

Comment 5 Michal Privoznik 2012-05-23 08:17:13 UTC
Moving to POST:

commit 2cb0899eec72376629a0583647dcad39b00c5715
Author:     Daniel P. Berrange <berrange>
AuthorDate: Mon May 21 12:10:53 2012 +0100
Commit:     Daniel P. Berrange <berrange>
CommitDate: Mon May 21 18:50:47 2012 +0100

    Fix potential events deadlock when unref'ing virConnectPtr
    
    When the last reference to a virConnectPtr is released by
    libvirtd, it was possible for a deadlock to occur in the
    virDomainEventState functions. The virDomainEventStatePtr
    holds a reference on virConnectPtr for each registered
    callback. When removing a callback, the virUnrefConnect
    function is run. If this causes the last reference on the
    virConnectPtr to be released, then virReleaseConnect can
    be run, which in turns calls qemudClose. This function has
    a call to virDomainEventStateDeregisterConn which is intended
    to remove all callbacks associated with the virConnectPtr
    instance. This will try to grab a lock on virDomainEventState
    but this lock is already held. Deadlock ensues
    
    Thread 1 (Thread 0x7fcbb526a840 (LWP 23185)):
    
    Since each callback associated with a virConnectPtr holds a
    reference on virConnectPtr, it is impossible for the qemudClose
    method to be invoked while any callbacks are still registered.
    Thus the call to virDomainEventStateDeregisterConn must in fact
    be a no-op. Thus it is possible to just remove all trace of
    virDomainEventStateDeregisterConn and avoid the deadlock.
    
    * src/conf/domain_event.c, src/conf/domain_event.h,
      src/libvirt_private.syms: Delete virDomainEventStateDeregisterConn
    * src/libxl/libxl_driver.c, src/lxc/lxc_driver.c,
      src/qemu/qemu_driver.c, src/uml/uml_driver.c: Remove
      calls to virDomainEventStateDeregisterConn
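
The reference-counting argument in the commit message can be checked on a toy model. The sketch below is not libvirt code: `struct conn`, `struct callback_list`, and all function names are hypothetical, and it assumes that a callback's slot is cleared before its reference on the connection is dropped. Under those assumptions, the "close" step (modelled here as the moment the last reference goes away) can never observe a callback still registered for that connection, so a deregister-everything call at that point would have nothing to do.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CALLBACKS 8

struct callback_list;

/* Hypothetical connection object with a plain reference count. */
struct conn {
    int refs;
    struct callback_list *list;
    int seen_at_close; /* callbacks still registered when refs hit 0; -1 before */
};

/* Hypothetical callback table: each used slot points at its owning conn. */
struct callback_list {
    struct conn *owner[MAX_CALLBACKS]; /* NULL means the slot is free */
};

static int count_callbacks(const struct callback_list *l, const struct conn *c)
{
    int n = 0;
    for (size_t i = 0; i < MAX_CALLBACKS; i++)
        if (l->owner[i] == c)
            n++;
    return n;
}

/* When the last reference goes away, record what "close" would find. */
static void conn_unref(struct conn *c)
{
    if (--c->refs == 0)
        c->seen_at_close = count_callbacks(c->list, c);
}

/* Registering a callback takes a reference on the connection. */
static int callback_register(struct callback_list *l, struct conn *c)
{
    for (int i = 0; i < MAX_CALLBACKS; i++) {
        if (l->owner[i] == NULL) {
            l->owner[i] = c;
            c->refs++;
            return i;
        }
    }
    return -1; /* table full */
}

/* Removing a callback clears its slot first, then drops its reference. */
static void callback_remove(struct callback_list *l, int id)
{
    struct conn *c = l->owner[id];
    l->owner[id] = NULL;
    conn_unref(c);
}
```

For example, a connection with one client reference and two registered callbacks has a count of 3. Dropping the client reference and removing one callback leaves it alive; only removing the second callback brings the count to 0, and at that point `seen_at_close` is 0.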

Comment 6 Michal Privoznik 2012-05-25 07:35:30 UTC
Oh, since this is reported against upstream, proper BZ hygiene is to move it to CLOSED NEXTRELEASE rather than POST. The referenced patch will be picked up by the next release (0.9.13).

