Bug 856950
| Summary: | Deadlock on libvirt when playing with hotplug and add/remove vm | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 6 | Reporter: | GenadiC <gcheresh> |
| Component: | libvirt | Assignee: | Michal Privoznik <mprivozn> |
| Status: | CLOSED ERRATA | QA Contact: | GenadiC <gcheresh> |
| Severity: | urgent | Docs Contact: | |
| Priority: | urgent | | |
| Version: | 6.3 | CC: | acathrow, ajia, berrange, cpelland, dallan, dyasny, dyuan, mavital, mprivozn, mzhan, ohochman, rwu, weizhan, ydu, ykaul |
| Target Milestone: | rc | Keywords: | ZStream |
| Target Release: | --- | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | libvirt-0.10.2-0rc1.el6 | Doc Type: | Bug Fix |
| Doc Text: | When a qemu process is being destroyed by libvirt, a clean-up operation frees some internal structures and locks. However, since users can destroy qemu processes at the same time, libvirt holds the qemu driver lock to protect, among other things, the list of domains and their states. Previously, a function tried to acquire the qemu driver lock while it was already held, creating a deadlock. The code has been modified to check whether the lock is already held before attempting to acquire it, thus fixing this bug. | | |
| Story Points: | --- | | |
| Clone Of: | | | |
| : | 875710 (view as bug list) | Environment: | |
| Last Closed: | 2013-02-21 07:23:43 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 875710, 875788, 876102 | | |
| Attachments: | libvirt log (attachment 612373), gdb log (attachment 612372) | | |
Created attachment 612373 [details]
libvirt log
This is the problematic thread:
Thread 1 (Thread 0x7f215745a860 (LWP 2594)):
#0 0x0000003e3500e054 in __lll_lock_wait () from /lib64/libpthread.so.0
#1 0x0000003e35009388 in _L_lock_854 () from /lib64/libpthread.so.0
#2 0x0000003e35009257 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3 0x000000000048b0ec in qemuProcessHandleAgentDestroy (agent=0x7f213000f5d0,
vm=0x7f213000c360) at qemu/qemu_process.c:169
#4 0x0000000000465943 in qemuAgentFree (mon=0x7f213000f5d0)
at qemu/qemu_agent.c:148
#5 qemuAgentUnref (mon=0x7f213000f5d0) at qemu/qemu_agent.c:167
#6 0x00000000004659f6 in qemuAgentClose (mon=0x7f213000f5d0)
at qemu/qemu_agent.c:828
#7 0x000000000048cf8e in qemuProcessHandleAgentEOF (agent=0x7f213000f5d0,
vm=0x7f213000c360) at qemu/qemu_process.c:130
#8 0x0000000000465bf3 in qemuAgentIO (watch=<value optimized out>,
fd=<value optimized out>, events=<value optimized out>,
opaque=0x7f213000f5d0) at qemu/qemu_agent.c:715
#9 0x0000003a564486df in virEventPollDispatchHandles ()
at util/event_poll.c:490
#10 virEventPollRunOnce () at util/event_poll.c:637
#11 0x0000003a56447487 in virEventRunDefaultImpl () at util/event.c:247
#12 0x0000003a56515aed in virNetServerRun (srv=0x11fce20)
at rpc/virnetserver.c:736
#13 0x0000000000422421 in main (argc=<value optimized out>,
argv=<value optimized out>) at libvirtd.c:1615
There is a recursive callback invocation here:
1. On EOF from the agent, the qemuProcessHandleAgentEOF() callback is run, which locks the virDomainObjPtr.
2. It then frees the agent, which triggers qemuProcessHandleAgentDestroy(); that tries to lock the virDomainObjPtr again. Hence the deadlock.
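To make the failure mode concrete, here is a minimal, self-contained sketch of the same pattern: a handler that already holds a non-recursive mutex indirectly triggers a second callback that tries to take the same mutex on the same thread, so the second lock call never returns (the __lll_lock_wait in the backtrace above). The names (handle_agent_eof, handle_agent_destroy, vm_lock) are illustrative stand-ins, not the actual libvirt symbols.

```c
/* Minimal illustration of the deadlock pattern; names are stand-ins,
 * not libvirt symbols. Build with: gcc -pthread deadlock.c */
#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t vm_lock = PTHREAD_MUTEX_INITIALIZER; /* default: non-recursive */

/* Analogue of qemuProcessHandleAgentDestroy(): invoked while the agent
 * is being freed, and it also wants the per-domain lock. */
static void handle_agent_destroy(void)
{
    pthread_mutex_lock(&vm_lock);   /* same thread already holds it: hangs here */
    /* ... clear the agent pointer in the domain's private data ... */
    pthread_mutex_unlock(&vm_lock);
}

/* Analogue of qemuProcessHandleAgentEOF(): locks the domain, then tears
 * down the agent, which re-enters the destroy handler above. */
static void handle_agent_eof(void)
{
    pthread_mutex_lock(&vm_lock);
    handle_agent_destroy();         /* nested lock attempt on the same thread */
    pthread_mutex_unlock(&vm_lock); /* never reached */
}

int main(void)
{
    printf("triggering the nested lock...\n");
    handle_agent_eof();             /* never returns */
    printf("not reached\n");
    return 0;
}
```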
This could be solved by re-arranging the code in HandleAgentEOF() like this:
priv = vm->privateData;
priv->agent = NULL;
virDomainObjUnlock(vm);
qemuDriverUnlock(driver);
qemuAgentClose(agent);
i.e. only hold the lock while you blank out the 'priv->agent' field.
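A slightly fuller sketch of that rearrangement follows. It assumes the lock helpers visible in the fragment and backtrace (qemuDriverLock/Unlock, virDomainObjLock/Unlock, qemuAgentClose), a simplified handler signature, and a qemu_driver global; it illustrates the suggested ordering only and is not the patch that was eventually committed.

```c
/* Sketch of the suggested re-ordering in qemuProcessHandleAgentEOF();
 * assumes libvirt's existing locking helpers and simplified types.
 * Not the committed patch. */
static void
qemuProcessHandleAgentEOF(qemuAgentPtr agent,
                          virDomainObjPtr vm)
{
    struct qemud_driver *driver = qemu_driver;
    qemuDomainObjPrivatePtr priv;

    qemuDriverLock(driver);
    virDomainObjLock(vm);

    priv = vm->privateData;
    priv->agent = NULL;           /* detach the agent while the locks are held */

    virDomainObjUnlock(vm);       /* release both locks before closing the agent */
    qemuDriverUnlock(driver);

    qemuAgentClose(agent);        /* may call qemuProcessHandleAgentDestroy(),
                                     which can now take the domain lock safely */
}
```

The key point is that qemuAgentClose() runs outside the domain object lock, so when the destroy callback fires it acquires the lock normally instead of waiting on the thread that already owns it.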
(In reply to comment #3)
> This could be solved by re-arranging the code in HandleAgentEOF() like this

Dan, are you planning to submit a patch?

Not right now. It would be better if someone who can actually reproduce the problem tests the idea I suggest above, and then submits the patch if it is confirmed to work.

Can you test a scratch build when we have one?

(In reply to comment #6)
> Can you test a scratch build when we have one?

Yes, I can try, although I don't have a specific scenario that reproduces the problem.

So I've spun a scratch build. Can you please give it a try?

http://brewweb.devel.redhat.com/brew/taskinfo?taskID=4867654

(In reply to comment #8)
> So I've spun a scratch build. Can you please give it a try?
>
> http://brewweb.devel.redhat.com/brew/taskinfo?taskID=4867654

I installed the attached build and was not able to reproduce the problem.

That's great news. Patch proposed upstream:

https://www.redhat.com/archives/libvir-list/2012-September/msg01165.html

Patch pushed upstream, moving to POST:
commit 1020a5041b0eb575f65b53cb1ca9cee2447a50cd
Author: Michal Privoznik <mprivozn>
AuthorDate: Fri Sep 14 10:53:00 2012 +0200
Commit: Michal Privoznik <mprivozn>
CommitDate: Tue Sep 18 09:24:06 2012 +0200
qemu: Avoid deadlock on HandleAgentEOF
On agent EOF the qemuProcessHandleAgentEOF() callback is called
which locks virDomainObjPtr. Then qemuAgentClose() is called
(with domain object locked) which eventually calls qemuAgentDispose()
and qemuProcessHandleAgentDestroy(). This tries to lock the
domain object again. Hence the deadlock.
v0.10.1-190-g1020a50
I can reproduce the bug on libvirt-0.10.1-2.el6.x86_64:

1. Open the first terminal and run the following command line (substitute your domain name and image name):

# for i in `seq 100`; do virsh attach-disk <domain> /var/lib/libvirt/images/<image>.img vda; virsh detach-disk <domain> vda; sleep 2; done

2. Open the second terminal and run the following command lines. Prepare guest XML files bar-1.xml to bar-10.xml, with guest names bar-1 to bar-10:

# for i in `seq 10`; do virsh create bar-$i.xml; done
# for i in `seq 10`; do virsh destroy bar-$i; done

You will then see the following errors:

error: Failed to reconnect to the hypervisor
error: no valid connection
error: Cannot recv data: Connection reset by peer
error: Failed to reconnect to the hypervisor
error: no valid connection
error: internal error client socket is closed

It's okay on libvirt-0.10.2-0rc1.el6; I have not hit this error, so I am moving the bug to VERIFIED status.

*** Bug 875710 has been marked as a duplicate of this bug. ***

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2013-0276.html
Created attachment 612372 [details]
gdb log

Description of problem:
After playing with hotplug/hotunplug and add/remove VM, I got a deadlock on libvirt. Log attached.