Red Hat Bugzilla – Bug 856950
Deadlock in libvirt when playing with hotplug and add/remove VM
Last modified: 2013-02-21 02:23:43 EST
Created attachment 612372 [details]
gdb log

Description of problem:
After playing with hotplug/hotunplug and add/remove VM I got a deadlock in libvirt.
Log attached.
Created attachment 612373 [details]
libvirt log
This is the problematic thread:

Thread 1 (Thread 0x7f215745a860 (LWP 2594)):
#0  0x0000003e3500e054 in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x0000003e35009388 in _L_lock_854 () from /lib64/libpthread.so.0
#2  0x0000003e35009257 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x000000000048b0ec in qemuProcessHandleAgentDestroy (agent=0x7f213000f5d0, vm=0x7f213000c360) at qemu/qemu_process.c:169
#4  0x0000000000465943 in qemuAgentFree (mon=0x7f213000f5d0) at qemu/qemu_agent.c:148
#5  qemuAgentUnref (mon=0x7f213000f5d0) at qemu/qemu_agent.c:167
#6  0x00000000004659f6 in qemuAgentClose (mon=0x7f213000f5d0) at qemu/qemu_agent.c:828
#7  0x000000000048cf8e in qemuProcessHandleAgentEOF (agent=0x7f213000f5d0, vm=0x7f213000c360) at qemu/qemu_process.c:130
#8  0x0000000000465bf3 in qemuAgentIO (watch=<value optimized out>, fd=<value optimized out>, events=<value optimized out>, opaque=0x7f213000f5d0) at qemu/qemu_agent.c:715
#9  0x0000003a564486df in virEventPollDispatchHandles () at util/event_poll.c:490
#10 virEventPollRunOnce () at util/event_poll.c:637
#11 0x0000003a56447487 in virEventRunDefaultImpl () at util/event.c:247
#12 0x0000003a56515aed in virNetServerRun (srv=0x11fce20) at rpc/virnetserver.c:736
#13 0x0000000000422421 in main (argc=<value optimized out>, argv=<value optimized out>) at libvirtd.c:1615

There is a recursive callback invocation here:

1. On EOF from the agent, the qemuProcessHandleAgentEOF() callback is run, which locks the virDomainObjPtr.
2. It then frees the agent, which triggers qemuProcessHandleAgentDestroy(), which tries to lock the virDomainObjPtr again. Hence the deadlock.

This could be solved by re-arranging the code in HandleAgentEOF() like this:

    priv = vm->privateData;
    priv->agent = NULL;

    virDomainObjUnlock(vm);
    qemuDriverUnlock(driver);

    qemuAgentClose(agent);

i.e. only hold the lock while you blank out the 'priv->agent' field.
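Put together, the whole handler would then look roughly like this. This is an untested sketch only: the function signature is taken from frame #7 of the backtrace, the VIR_DEBUG line and local declarations are my assumption of the surrounding code, and the qemuDriverUnlock() step from the snippet above is omitted because the driver pointer is not visible in this callback's signature.

static void
qemuProcessHandleAgentEOF(qemuAgentPtr agent,
                          virDomainObjPtr vm)
{
    qemuDomainObjPrivatePtr priv;

    /* Assumed debug message; the real handler may log differently */
    VIR_DEBUG("Received EOF from agent on %p '%s'", vm, vm->def->name);

    virDomainObjLock(vm);
    priv = vm->privateData;

    /* Detach the agent from the domain while the lock is held */
    priv->agent = NULL;

    /* Drop the lock *before* closing the agent: qemuAgentClose()
     * ends up in qemuProcessHandleAgentDestroy(), which locks the
     * domain object again (frame #3 of the backtrace above) */
    virDomainObjUnlock(vm);

    qemuAgentClose(agent);
}

The key point is the ordering: the domain object lock is no longer held when the destroy callback fires, so the pthread_mutex_lock() in frame #2 can succeed instead of deadlocking.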
(In reply to comment #3)
> This could be solved by re-arranging the code in HandleAgentEOF() like this

Dan, are you planning to submit a patch?
Not right now. It would be better if someone who can actually reproduce the problem tests the idea I suggested above and then submits the patch if it is confirmed to work.
Can you test a scratch build when we have one?
(In reply to comment #6)
> Can you test a scratch build when we have one?

Yes, I can try, although I don't have a specific scenario that can reproduce the problem.
So I've spun a scratch build. Can you please give it a try?

http://brewweb.devel.redhat.com/brew/taskinfo?taskID=4867654
(In reply to comment #8)
> So I've spun a scratch build. Can you please give it a try?
>
> http://brewweb.devel.redhat.com/brew/taskinfo?taskID=4867654

I installed the attached build and was not able to reproduce the problem.
Great news. Patch proposed upstream then:

https://www.redhat.com/archives/libvir-list/2012-September/msg01165.html
Patch pushed upstream, moving to POST:

commit 1020a5041b0eb575f65b53cb1ca9cee2447a50cd
Author:     Michal Privoznik <mprivozn@redhat.com>
AuthorDate: Fri Sep 14 10:53:00 2012 +0200
Commit:     Michal Privoznik <mprivozn@redhat.com>
CommitDate: Tue Sep 18 09:24:06 2012 +0200

    qemu: Avoid deadlock on HandleAgentEOF

    On agent EOF the qemuProcessHandleAgentEOF() callback is called
    which locks virDomainObjPtr. Then qemuAgentClose() is called
    (with domain object locked) which eventually calls
    qemuAgentDispose() and qemuProcessHandleAgentDestroy().
    This tries to lock the domain object again. Hence the deadlock.

v0.10.1-190-g1020a50
I can reproduce the bug on libvirt-0.10.1-2.el6.x86_64:

1. Open the first terminal and run the following command line (substitute your domain name and image name):

# for i in `seq 100`; do virsh attach-disk <domain> /var/lib/libvirt/images/<image>.img vda; virsh detach-disk <domain> vda; sleep 2; done

2. Open the second terminal and run the following command lines. You need to prepare guest XML files bar-1.xml to bar-10.xml, with guest names bar-1 .. bar-10 (one way to generate them is sketched below):

# for i in `seq 10`; do virsh create bar-$i.xml; done
# for i in `seq 10`; do virsh destroy bar-$i; done

You will then see the following errors:

error: Failed to reconnect to the hypervisor
error: no valid connection
error: Cannot recv data: Connection reset by peer
error: Failed to reconnect to the hypervisor
error: no valid connection
error: internal error client socket is closed

This is okay on libvirt-0.10.2-0rc1.el6; I haven't hit the error there, so I'm moving the bug to VERIFIED status.
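For reference, the bar-N.xml files can be cloned from an existing guest definition. This is a rough sketch, assuming a template guest named 'bar'; the <uuid> and <mac address> lines are dropped so libvirt generates fresh ones, and note the sed rename will also touch any disk path containing 'bar', so adjust for your setup:

# for i in `seq 10`; do virsh dumpxml bar | sed -e "s/bar/bar-$i/g" -e '/<uuid>/d' -e '/<mac address/d' > bar-$i.xml; done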
*** Bug 875710 has been marked as a duplicate of this bug. ***
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2013-0276.html