Description of problem:
libvirtd deadlocks when it is restarted.

Version-Release number of selected component (if applicable):
libvirtd 0.9.10, qemu-kvm 1.0

How reproducible:
Often. I run a shell loop that repeatedly executes "virsh domstate" against about 50 domains; restarting libvirtd while the loop runs usually triggers the deadlock.

Steps to Reproduce:
1. Start about 50 domains.
2. Run a shell loop that repeatedly queries "virsh domstate".
3. Restart the libvirtd service.

Actual results:
libvirtd deadlocks after the restart.

Expected results:
No deadlock.

Additional info:

Thread 1 stack:
#0  0x00007ff13603d89c in __lll_lock_wait () from /lib/x86_64-linux-gnu/libpthread.so.0
#1  0x00007ff136039065 in _L_lock_858 () from /lib/x86_64-linux-gnu/libpthread.so.0
#2  0x00007ff136038eba in pthread_mutex_lock () from /lib/x86_64-linux-gnu/libpthread.so.0
#3  0x00007ff136efc702 in virMutexLock (m=0x7ff1280e04a0, func=0x5278b0 "qemuDriverLock", line=61) at util/threads-pthread.c:87
#4  0x00000000004aa6a6 in qemuDriverLock (driver=0x7ff1280e04a0, func=0x51ea7f "qemudClose", line=921) at qemu/qemu_conf.c:61
#5  0x0000000000458e0a in qemudClose (conn=0x7ff04c0230f0) at qemu/qemu_driver.c:921
#6  0x00007ff136f7227f in virReleaseConnect (conn=0x7ff04c0230f0) at datatypes.c:114
#7  0x00007ff136f7240d in virUnrefConnect (conn=0x7ff04c0230f0) at datatypes.c:149
#8  0x00007ff136f7bcbf in virConnectClose (conn=0x7ff04c0230f0) at libvirt.c:1471
#9  0x000000000043fdc2 in remoteClientFreeFunc (data=0x1a4b4a0) at remote.c:547
#10 0x00007ff136fd6635 in virNetServerClientFree (client=0x1877df0) at rpc/virnetserverclient.c:601
#11 0x00007ff136fd5691 in virNetServerClientEventFree (opaque=0x1877df0) at rpc/virnetserverclient.c:175
#12 0x00007ff136fe00c7 in virNetSocketEventFree (opaque=0x1a46e50) at rpc/virnetsocket.c:1329
#13 0x00007ff136ee4884 in virEventPollCleanupHandles () at util/event_poll.c:572
#14 0x00007ff136ee4a5a in virEventPollRunOnce () at util/event_poll.c:608
#15 0x00007ff136ee2def in virEventRunDefaultImpl () at util/event.c:247
#16 0x00007ff136fd4d17 in virNetServerRun (srv=0x186dcf0) at rpc/virnetserver.c:736
#17 0x0000000000420763 in main (argc=2, argv=0x7fffa1284e18) at libvirtd.c:1602

Thread 2 stack:
#0  0x00007f7ed7c20d84 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/x86_64-linux-gnu/libpthread.so.0
#1  0x00007f7ed8ae2885 in virCondWait (c=0x7f7eb00022b0, m=0x7f7eb0002250) at util/threads-pthread.c:121
#2  0x00000000004c2bbf in qemuMonitorSend (mon=0x7f7eb0002250, msg=0x7f7e6e7fba70) at qemu/qemu_monitor.c:794
#3  0x00000000004d36b2 in qemuMonitorJSONCommandWithFd (mon=0x7f7eb0002250, cmd=0x7f7eb00036d0, scm_fd=-1, reply=0x7f7e6e7fbb50) at qemu/qemu_monitor_json.c:230
#4  0x00000000004d37e2 in qemuMonitorJSONCommand (mon=0x7f7eb0002250, cmd=0x7f7eb00036d0, reply=0x7f7e6e7fbb50) at qemu/qemu_monitor_json.c:259
#5  0x00000000004d69be in qemuMonitorJSONGetBlockInfo (mon=0x7f7eb0002250, table=0x7f7eb00036f0) at qemu/qemu_monitor_json.c:1373
#6  0x00000000004c45ce in qemuMonitorGetBlockInfo (mon=0x7f7eb0002250) at qemu/qemu_monitor.c:1256
#7  0x00000000004a29de in qemuDomainCheckEjectableMedia (driver=0x7f7ec80102b0, vm=0x7f7ec802ab30) at qemu/qemu_hotplug.c:164
#8  0x00000000004b44d3 in qemuProcessReconnect (opaque=0x7f7ec815fc60) at qemu/qemu_process.c:2932
#9  0x00007f7ed8ae2a5f in virThreadHelper (data=0x7f7ec8103a10) at util/threads-pthread.c:165
#10 0x00007f7ed7c1ce9a in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#11 0x00007f7ed794a4bd in clone () from /lib/x86_64-linux-gnu/libc.so.6

I reviewed the code:

Thread 1:
    virEventPollRunOnce {
        virEventPollCleanupHandles
        ...
        poll()
        ...
    }

Thread 2:
    qemuMonitorSend {
        qemuMonitorUpdateWatch
        ...
        while (!mon->msg->finished) {
            if (virCondWait(&mon->notify, &mon->lock) < 0) {
                ...
            }
            ...
        }
    }

Thread 2 has called qemuDriverLock, sent a command to qemu, and is now waiting for thread 1 to poll. Thread 1 calls virEventPollCleanupHandles, which eventually calls qemudClose, which calls qemuDriverLock.
Thread 2 waits for thread 1 to poll, while thread 1 waits for thread 2 to call qemuDriverUnlock — so the deadlock happens. I would be glad to receive any answers, thanks.
Is this still reproducible with the current git head?
Thank you. I tested with 0.10.2 R2, and the issue can no longer be reproduced. The code has been corrected:

int
qemuDomainCheckEjectableMedia(struct qemud_driver *driver,
                              virDomainObjPtr vm,
                              enum qemuDomainAsyncJob asyncJob)
{
    ...
    if (qemuDomainObjEnterMonitorAsync(driver, vm, asyncJob) == 0) {
        table = qemuMonitorGetBlockInfo(priv->mon);
        qemuDomainObjExitMonitorWithDriver(driver, vm);
    }
    ...
}

Using qemuDomainObjEnterMonitorAsync instead of qemuDomainObjEnterMonitor releases the driver lock before waiting on the monitor, which avoids the deadlock with virEventPollCleanupHandles. That appears to be the fix. Thanks a lot.
(In reply to comment #2)
> I tested with 0.10.2 R2, and the issue can no longer be reproduced.
>
> The code has been corrected.

Ok, I will close the BZ; thanks for reporting it.

Dave