Description of problem:
libvirt deadlock'ed after I destroyed and undefined a domain, in qemu:///session.

Version-Release number of selected component (if applicable):
0.9.9

Not sure if the bug is reproducible.

Steps:
1. (optional) create a "live" f16 domain with the Boxes wizard
2. virsh -c qemu:///session destroy "Fedora 16"
3. virsh -c qemu:///session undefine "Fedora 16"

Actual results:
(gdb) thread apply all bt

Thread 11 (Thread 0x7f5257e1a700 (LWP 27214)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:165
#1  0x00007f525f5103b9 in virCondWait (c=0x24cd8e0, m=0x24cd8b8) at util/threads-pthread.c:117
#2  0x00007f525f5109c6 in virThreadPoolWorker (opaque=0x24c5970) at util/threadpool.c:103
#3  0x00007f525f510593 in virThreadHelper (data=0x24c5a90) at util/threads-pthread.c:161
#4  0x0000003a17a07d90 in start_thread (arg=0x7f5257e1a700) at pthread_create.c:309
#5  0x0000003a176ef48d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115

Thread 10 (Thread 0x7f5257619700 (LWP 27215)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:165
#1  0x00007f525f5103b9 in virCondWait (c=0x24cd8e0, m=0x24cd8b8) at util/threads-pthread.c:117
#2  0x00007f525f5109c6 in virThreadPoolWorker (opaque=0x24c5b00) at util/threadpool.c:103
#3  0x00007f525f510593 in virThreadHelper (data=0x24c5b90) at util/threads-pthread.c:161
#4  0x0000003a17a07d90 in start_thread (arg=0x7f5257619700) at pthread_create.c:309
#5  0x0000003a176ef48d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115

Thread 9 (Thread 0x7f5256e18700 (LWP 27216)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:165
#1  0x00007f525f5103b9 in virCondWait (c=0x24cd8e0, m=0x24cd8b8) at util/threads-pthread.c:117
#2  0x00007f525f5109c6 in virThreadPoolWorker (opaque=0x24c5970) at util/threadpool.c:103
#3  0x00007f525f510593 in virThreadHelper (data=0x24c5a90) at util/threads-pthread.c:161
#4  0x0000003a17a07d90 in start_thread (arg=0x7f5256e18700) at pthread_create.c:309
#5  0x0000003a176ef48d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115

Thread 8 (Thread 0x7f5256617700 (LWP 27217)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:165
#1  0x00007f525f5103b9 in virCondWait (c=0x24cd8e0, m=0x24cd8b8) at util/threads-pthread.c:117
#2  0x00007f525f5109c6 in virThreadPoolWorker (opaque=0x24c5b00) at util/threadpool.c:103
#3  0x00007f525f510593 in virThreadHelper (data=0x24c5b90) at util/threads-pthread.c:161
#4  0x0000003a17a07d90 in start_thread (arg=0x7f5256617700) at pthread_create.c:309
#5  0x0000003a176ef48d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115

Thread 7 (Thread 0x7f5255e16700 (LWP 27218)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:165
#1  0x00007f525f5103b9 in virCondWait (c=0x24cd8e0, m=0x24cd8b8) at util/threads-pthread.c:117
#2  0x00007f525f5109c6 in virThreadPoolWorker (opaque=0x24c5970) at util/threadpool.c:103
#3  0x00007f525f510593 in virThreadHelper (data=0x24c5a90) at util/threads-pthread.c:161
#4  0x0000003a17a07d90 in start_thread (arg=0x7f5255e16700) at pthread_create.c:309
#5  0x0000003a176ef48d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115

Thread 6 (Thread 0x7f5255615700 (LWP 27219)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:165
#1  0x00007f525f5103b9 in virCondWait (c=0x24cd970, m=0x24cd8b8) at util/threads-pthread.c:117
#2  0x00007f525f5109c6 in virThreadPoolWorker (opaque=0x24c5b00) at util/threadpool.c:103
#3  0x00007f525f510593 in virThreadHelper (data=0x24c5b90) at util/threads-pthread.c:161
#4  0x0000003a17a07d90 in start_thread (arg=0x7f5255615700) at pthread_create.c:309
#5  0x0000003a176ef48d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115

Thread 5 (Thread 0x7f5254e14700 (LWP 27220)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:165
#1  0x00007f525f5103b9 in virCondWait (c=0x24cd970, m=0x24cd8b8) at util/threads-pthread.c:117
#2  0x00007f525f5109c6 in virThreadPoolWorker (opaque=0x24c5970) at util/threadpool.c:103
#3  0x00007f525f510593 in virThreadHelper (data=0x24c5a90) at util/threads-pthread.c:161
#4  0x0000003a17a07d90 in start_thread (arg=0x7f5254e14700) at pthread_create.c:309
#5  0x0000003a176ef48d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115

Thread 4 (Thread 0x7f5254613700 (LWP 27221)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:165
#1  0x00007f525f5103b9 in virCondWait (c=0x24cd970, m=0x24cd8b8) at util/threads-pthread.c:117
#2  0x00007f525f5109c6 in virThreadPoolWorker (opaque=0x24c5b00) at util/threadpool.c:103
#3  0x00007f525f510593 in virThreadHelper (data=0x24c5b90) at util/threads-pthread.c:161
#4  0x0000003a17a07d90 in start_thread (arg=0x7f5254613700) at pthread_create.c:309
#5  0x0000003a176ef48d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115

Thread 3 (Thread 0x7f5253e12700 (LWP 27222)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:165
#1  0x00007f525f5103b9 in virCondWait (c=0x24cd970, m=0x24cd8b8) at util/threads-pthread.c:117
#2  0x00007f525f5109c6 in virThreadPoolWorker (opaque=0x24c5c20) at util/threadpool.c:103
#3  0x00007f525f510593 in virThreadHelper (data=0x24c5a90) at util/threads-pthread.c:161
#4  0x0000003a17a07d90 in start_thread (arg=0x7f5253e12700) at pthread_create.c:309
#5  0x0000003a176ef48d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115

Thread 2 (Thread 0x7f5253611700 (LWP 27223)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:165
#1  0x00007f525f5103b9 in virCondWait (c=0x24cd970, m=0x24cd8b8) at util/threads-pthread.c:117
#2  0x00007f525f5109c6 in virThreadPoolWorker (opaque=0x24c5b00) at util/threadpool.c:103
#3  0x00007f525f510593 in virThreadHelper (data=0x24c5b90) at util/threads-pthread.c:161
#4  0x0000003a17a07d90 in start_thread (arg=0x7f5253611700) at pthread_create.c:309
#5  0x0000003a176ef48d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115

Thread 1 (Thread 0x7f525e688840 (LWP 27212)):
#0  __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:136
#1  0x0000003a17a09f97 in _L_lock_863 () from /lib64/libpthread.so.0
#2  0x0000003a17a09deb in __pthread_mutex_lock (mutex=0x7f524c08ab48) at pthread_mutex_lock.c:65
#3  0x00007f525f510277 in virMutexLock (m=0x7f524c08ab48) at util/threads-pthread.c:85
#4  0x00007f525f55afea in virDomainEventStateLock (state=0x7f524c08ab30) at conf/domain_event.c:591
#5  0x00007f525f55cebf in virDomainEventStateDeregisterConn (conn=0x7f5238053ba0, state=0x7f524c08ab30) at conf/domain_event.c:1510
#6  0x000000000045d13d in qemudClose (conn=0x7f5238053ba0) at qemu/qemu_driver.c:914
#7  0x00007f525f59ae53 in virReleaseConnect (conn=0x7f5238053ba0) at datatypes.c:114
#8  0x00007f525f59afc9 in virUnrefConnect (conn=0x7f5238053ba0) at datatypes.c:149
#9  0x00007f525f55a696 in virDomainEventCallbackListPurgeMarked (cbList=0x7f524c08d1a0) at conf/domain_event.c:347
#10 0x00007f525f55c9fb in virDomainEventStateFlush (state=0x7f524c08ab30) at conf/domain_event.c:1307
#11 0x00007f525f55b105 in virDomainEventTimer (timer=65, opaque=0x7f524c08ab30) at conf/domain_event.c:630
#12 0x00007f525f4f7821 in virEventPollDispatchTimeouts () at util/event_poll.c:440
#13 0x00007f525f4f83be in virEventPollRunOnce () at util/event_poll.c:633
#14 0x00007f525f4f64eb in virEventRunDefaultImpl () at util/event.c:247
#15 0x00007f525f60e021 in virNetServerRun (srv=0x24cd790) at rpc/virnetserver.c:736
#16 0x0000000000424234 in main (argc=1, argv=0x7fff11c45848) at libvirtd.c:1602
(gdb)
Michal, does the backtrace shed any light on what's going on here?
The problem is the virDomainEventStateFlush() method + free callbacks.

When dispatching callbacks, it is careful to release the driver lock first.

When purging deleted callbacks, though, we don't release the lock. So if the 'free callback' uses the virDomainEventState again, it will deadlock.
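For anyone following along, here is a minimal standalone sketch of that pattern. It is not libvirt code: event_state, reentrant_free_cb, purge_holding_lock and purge_after_unlock are hypothetical names made up for illustration. It assumes an error-checking pthread mutex, so the re-entrant lock attempt returns EDEADLK instead of hanging the way the default mutex in the backtrace does.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700   /* for PTHREAD_MUTEX_ERRORCHECK */
#endif
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-in for virDomainEventState; not libvirt's real type. */
typedef struct {
    pthread_mutex_t lock;
    int last_lock_result;   /* what the free callback got when it locked */
} event_state;

typedef void (*free_cb)(event_state *st);

/* Use an error-checking mutex so a self-deadlock is reported, not hung on. */
static int event_state_init(event_state *st)
{
    pthread_mutexattr_t attr;
    int rc = pthread_mutexattr_init(&attr);
    if (rc != 0)
        return rc;
    rc = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    if (rc == 0)
        rc = pthread_mutex_init(&st->lock, &attr);
    pthread_mutexattr_destroy(&attr);
    st->last_lock_result = -1;
    return rc;
}

/* A free callback that re-enters the event state, as happens when the
 * last connection reference is dropped during the purge. */
static void reentrant_free_cb(event_state *st)
{
    int rc = pthread_mutex_lock(&st->lock);
    st->last_lock_result = rc;
    if (rc == 0)
        pthread_mutex_unlock(&st->lock);
}

/* Problem pattern: the free callback runs while the lock is still held. */
static void purge_holding_lock(event_state *st, free_cb cb)
{
    int rc = pthread_mutex_lock(&st->lock);
    assert(rc == 0);
    (void)rc;
    cb(st);                          /* callback sees EDEADLK here */
    pthread_mutex_unlock(&st->lock);
}

/* One possible alternative: unlink entries under the lock, then run the
 * free callback only after the lock has been released. */
static void purge_after_unlock(event_state *st, free_cb cb)
{
    int rc = pthread_mutex_lock(&st->lock);
    assert(rc == 0);
    (void)rc;
    /* ...unlink the marked entries from the list here... */
    pthread_mutex_unlock(&st->lock);
    cb(st);                          /* callback can take the lock */
}
```

With a default (non-error-checking) mutex, the first variant would simply block forever in the callback, which matches frames #0 to #5 of Thread 1.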
(In reply to comment #2)
> The problem is the virDomainEventStateFlush() method + free callbacks.
>
> When dispatching callbacks it is careful to release the driver lock.
>
> When purging deleted callbacks though, we don't release the lock. So if the
> 'free callback' again uses the virDomainEventState it will deadlock.

Will you submit a patch?
The patch has been proposed upstream: https://www.redhat.com/archives/libvir-list/2012-May/msg00995.html
Moving to POST:

commit 2cb0899eec72376629a0583647dcad39b00c5715
Author:     Daniel P. Berrange <berrange>
AuthorDate: Mon May 21 12:10:53 2012 +0100
Commit:     Daniel P. Berrange <berrange>
CommitDate: Mon May 21 18:50:47 2012 +0100

    Fix potential events deadlock when unref'ing virConnectPtr

    When the last reference to a virConnectPtr is released by
    libvirtd, it was possible for a deadlock to occur in the
    virDomainEventState functions. The virDomainEventStatePtr
    holds a reference on virConnectPtr for each registered
    callback. When removing a callback, the virUnrefConnect
    function is run. If this causes the last reference on the
    virConnectPtr to be released, then virReleaseConnect can
    be run, which in turns calls qemudClose. This function has
    a call to virDomainEventStateDeregisterConn which is intended
    to remove all callbacks associated with the virConnectPtr
    instance. This will try to grab a lock on virDomainEventState
    but this lock is already held. Deadlock ensues

    Thread 1 (Thread 0x7fcbb526a840 (LWP 23185)):

    Since each callback associated with a virConnectPtr holds a
    reference on virConnectPtr, it is impossible for the qemudClose
    method to be invoked while any callbacks are still registered.
    Thus the call to virDomainEventStateDeregisterConn must in fact
    be a no-op. Thus it is possible to just remove all trace of
    virDomainEventStateDeregisterConn and avoid the deadlock.

    * src/conf/domain_event.c, src/conf/domain_event.h,
      src/libvirt_private.syms: Delete virDomainEventStateDeregisterConn
    * src/libxl/libxl_driver.c, src/lxc/lxc_driver.c,
      src/qemu/qemu_driver.c, src/uml/uml_driver.c: Remove
      calls to virDomainEventStateDeregisterConn
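The reference-counting argument in the commit message can be checked with a toy model. This is only a sketch under simplifying assumptions, not the libvirt implementation: model_conn and the model_* functions are hypothetical, there is no locking, and a callback is unlinked before its reference is dropped. It shows that if every registered callback holds one reference, the close hook can only run once no callbacks remain, so a "deregister everything" call inside close would have nothing to do.

```c
#include <assert.h>

/* Hypothetical model of a connection; not libvirt's virConnect. */
typedef struct {
    int refs;                     /* all references held on the connection */
    int registered_callbacks;     /* each registered callback holds one ref */
    int closed;                   /* set once the close hook has run */
    int callbacks_seen_at_close;  /* callbacks still registered at close */
} model_conn;

static void model_conn_init(model_conn *c)
{
    c->refs = 1;                  /* the client's own reference */
    c->registered_callbacks = 0;
    c->closed = 0;
    c->callbacks_seen_at_close = -1;
}

/* Stand-in for the driver close hook (qemudClose in the backtrace). */
static void model_conn_close(model_conn *c)
{
    c->callbacks_seen_at_close = c->registered_callbacks;
    c->closed = 1;
}

/* Drop one reference; the close hook runs only when the last one goes. */
static void model_conn_unref(model_conn *c)
{
    assert(c->refs > 0);
    if (--c->refs == 0)
        model_conn_close(c);
}

/* Registering a callback takes a reference on the connection. */
static void model_register_callback(model_conn *c)
{
    c->refs++;
    c->registered_callbacks++;
}

/* Removing a callback unlinks it, then drops the reference it held. */
static void model_remove_callback(model_conn *c)
{
    assert(c->registered_callbacks > 0);
    c->registered_callbacks--;
    model_conn_unref(c);
}
```

In this model, refs is always at least registered_callbacks, so refs can reach zero only when registered_callbacks is already zero.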
Oh, since this is reported against upstream, proper BZ hygiene is to move it to CLOSED NEXTRELEASE rather than POST. The referenced patch will be picked up by the next release (0.9.13).