Bug 856950 - Deadlock on libvirt when playing with hotplug and add/remove vm
Summary: Deadlock on libvirt when playing with hotplug and add/remove vm
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: libvirt
Version: 6.3
Hardware: x86_64
OS: Linux
Priority: urgent
Severity: urgent
Target Milestone: rc
Target Release: ---
Assignee: Michal Privoznik
QA Contact: GenadiC
URL:
Whiteboard:
: 875710 (view as bug list)
Depends On:
Blocks: 875710 875788 876102
 
Reported: 2012-09-13 08:30 UTC by GenadiC
Modified: 2013-02-21 07:23 UTC
CC List: 15 users

Fixed In Version: libvirt-0.10.2-0rc1.el6
Doc Type: Bug Fix
Doc Text:
When a qemu process is being destroyed by libvirt, a clean-up operation frees some internal structures and locks. However, since users can destroy qemu processes at the same time, libvirt holds the qemu driver lock to protect the list of domains and their states, among other things. Previously, a function tried to acquire the qemu driver lock while it was already held, creating a deadlock. The code has been modified to always check whether the lock is already held before attempting to acquire it, thus fixing this bug.
Clone Of:
: 875710 (view as bug list)
Environment:
Last Closed: 2013-02-21 07:23:43 UTC
Target Upstream Version:
Embargoed:


Attachments
gdb log (13.46 KB, text/plain), 2012-09-13 08:30 UTC, GenadiC, no flags
libvirt log (663.59 KB, application/x-xz), 2012-09-13 08:33 UTC, GenadiC, no flags


Links
System: Red Hat Product Errata
ID: RHSA-2013:0276
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: Moderate: libvirt security, bug fix, and enhancement update
Last Updated: 2013-02-20 21:18:26 UTC

Description GenadiC 2012-09-13 08:30:56 UTC
Created attachment 612372
gdb log

Description of problem:

After exercising hotplug/hotunplug and adding/removing VMs, I got a deadlock in libvirt.
The gdb log is attached.

Comment 1 GenadiC 2012-09-13 08:33:15 UTC
Created attachment 612373
libvirt log

Comment 3 Daniel Berrangé 2012-09-13 11:53:08 UTC
This is the problematic thread:

Thread 1 (Thread 0x7f215745a860 (LWP 2594)):
#0  0x0000003e3500e054 in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x0000003e35009388 in _L_lock_854 () from /lib64/libpthread.so.0
#2  0x0000003e35009257 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x000000000048b0ec in qemuProcessHandleAgentDestroy (agent=0x7f213000f5d0, 
    vm=0x7f213000c360) at qemu/qemu_process.c:169
#4  0x0000000000465943 in qemuAgentFree (mon=0x7f213000f5d0)
    at qemu/qemu_agent.c:148
#5  qemuAgentUnref (mon=0x7f213000f5d0) at qemu/qemu_agent.c:167
#6  0x00000000004659f6 in qemuAgentClose (mon=0x7f213000f5d0)
    at qemu/qemu_agent.c:828
#7  0x000000000048cf8e in qemuProcessHandleAgentEOF (agent=0x7f213000f5d0, 
    vm=0x7f213000c360) at qemu/qemu_process.c:130
#8  0x0000000000465bf3 in qemuAgentIO (watch=<value optimized out>, 
    fd=<value optimized out>, events=<value optimized out>, 
    opaque=0x7f213000f5d0) at qemu/qemu_agent.c:715
#9  0x0000003a564486df in virEventPollDispatchHandles ()
    at util/event_poll.c:490
#10 virEventPollRunOnce () at util/event_poll.c:637
#11 0x0000003a56447487 in virEventRunDefaultImpl () at util/event.c:247
#12 0x0000003a56515aed in virNetServerRun (srv=0x11fce20)
    at rpc/virnetserver.c:736
#13 0x0000000000422421 in main (argc=<value optimized out>, 
    argv=<value optimized out>) at libvirtd.c:1615

There is a recursive callback invocation here (a minimal sketch of the pattern follows the two steps below):

 1. On EOF from the agent, the qemuProcessHandleAgentEOF() callback is run, which locks virDomainObjPtr.
 2. It then frees the agent, which triggers qemuProcessHandleAgentDestroy(), and that tries to lock virDomainObjPtr again. Hence the deadlock.
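
A minimal, self-contained C sketch of this pattern (the names vm_t, agent_close() and on_agent_eof() are hypothetical, not libvirt code; a plain non-recursive pthread mutex stands in for the virDomainObjPtr lock, and the program blocks forever in the second pthread_mutex_lock(), mirroring the backtrace above):

/* Deadlock sketch: hypothetical names, not libvirt code. */
#include <pthread.h>
#include <stdio.h>

typedef struct {
    pthread_mutex_t lock;            /* non-recursive, like the domain object lock */
    void *agent;
} vm_t;

/* Runs from inside agent_close() while the caller still holds vm->lock. */
static void on_agent_destroy(vm_t *vm)
{
    pthread_mutex_lock(&vm->lock);   /* second lock on the same thread: blocks forever */
    vm->agent = NULL;
    pthread_mutex_unlock(&vm->lock);
}

/* The destroy callback fires synchronously, as qemuAgentClose() eventually does. */
static void agent_close(vm_t *vm)
{
    on_agent_destroy(vm);
}

/* Analogous to qemuProcessHandleAgentEOF(): lock the domain, then close the agent. */
static void on_agent_eof(vm_t *vm)
{
    pthread_mutex_lock(&vm->lock);
    agent_close(vm);                 /* deadlocks: vm->lock is already held */
    pthread_mutex_unlock(&vm->lock); /* never reached */
}

int main(void)
{
    vm_t vm = { PTHREAD_MUTEX_INITIALIZER, (void *)1 };
    on_agent_eof(&vm);               /* hangs here */
    printf("not reached\n");
    return 0;
}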

This could be solved by re-arranging the code in HandleAgentEOF() like this:


    priv = vm->privateData;
    priv->agent = NULL;

    virDomainObjUnlock(vm);
    qemuDriverUnlock(driver);

    qemuAgentClose(agent);

i.e. only hold the locks while you blank out the 'priv->agent' field.
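
A self-contained sketch of that rearrangement (again with hypothetical names; it illustrates the suggested lock ordering, not the actual libvirt patch):

/* Fixed ordering sketch: blank the agent field under the lock, drop the lock,
 * and only then close the agent, so its destroy callback can lock safely. */
#include <pthread.h>
#include <stdio.h>

typedef struct {
    pthread_mutex_t lock;
    void *agent;
} vm_t;

/* Destroy callback: may take vm->lock because the EOF handler no longer holds it. */
static void on_agent_destroy(vm_t *vm)
{
    pthread_mutex_lock(&vm->lock);
    /* nothing to clear; the EOF handler already blanked vm->agent */
    pthread_mutex_unlock(&vm->lock);
}

static void agent_close(vm_t *vm, void *agent)
{
    (void)agent;                     /* the real code would free the agent here */
    on_agent_destroy(vm);
}

static void on_agent_eof(vm_t *vm)
{
    void *agent;

    pthread_mutex_lock(&vm->lock);
    agent = vm->agent;               /* remember the agent ...                 */
    vm->agent = NULL;                /* ... and blank the field under the lock */
    pthread_mutex_unlock(&vm->lock); /* drop the lock before closing           */

    if (agent)
        agent_close(vm, agent);      /* callback can now lock without deadlock */
}

int main(void)
{
    vm_t vm = { PTHREAD_MUTEX_INITIALIZER, (void *)1 };
    on_agent_eof(&vm);
    printf("EOF handled without deadlock\n");
    return 0;
}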

Comment 4 Dave Allan 2012-09-13 14:58:29 UTC
(In reply to comment #3)
> This could be solved by re-arranging code in HandleAgentEOF() like this

Dan, are you planning to submit a patch?

Comment 5 Daniel Berrangé 2012-09-13 15:00:24 UTC
Not right now. It would be better if someone who can actually reproduce the problem tests the idea I suggested above and then submits the patch if it is confirmed to work.

Comment 6 Dave Allan 2012-09-13 15:15:06 UTC
Can you test a scratch build when we have one?

Comment 7 GenadiC 2012-09-13 15:27:23 UTC
(In reply to comment #6)
> Can you test a scratch build when we have one?

Yes, I can try, although I don't have a specific scenario that reliably reproduces the problem.

Comment 8 Michal Privoznik 2012-09-14 11:47:17 UTC
So I've spun a scratch build. Can you please give it a try?

http://brewweb.devel.redhat.com/brew/taskinfo?taskID=4867654

Comment 9 GenadiC 2012-09-16 09:09:03 UTC
(In reply to comment #8)
> So I've spun a scratch build. Can you please give it a try?
> 
> http://brewweb.devel.redhat.com/brew/taskinfo?taskID=4867654

I installed the attached build and was not able to reproduce the problem.

Comment 10 Michal Privoznik 2012-09-17 12:05:12 UTC
That's great news. Patch proposed upstream:

https://www.redhat.com/archives/libvir-list/2012-September/msg01165.html

Comment 11 Michal Privoznik 2012-09-18 07:33:29 UTC
Patch pushed upstream, moving to POST:

commit 1020a5041b0eb575f65b53cb1ca9cee2447a50cd
Author:     Michal Privoznik <mprivozn>
AuthorDate: Fri Sep 14 10:53:00 2012 +0200
Commit:     Michal Privoznik <mprivozn>
CommitDate: Tue Sep 18 09:24:06 2012 +0200

    qemu: Avoid deadlock on HandleAgentEOF
    
    On agent EOF the qemuProcessHandleAgentEOF() callback is called
    which locks virDomainObjPtr. Then qemuAgentClose() is called
    (with domain object locked) which eventually calls qemuAgentDispose()
    and qemuProcessHandleAgentDestroy(). This tries to lock the
    domain object again. Hence the deadlock.


v0.10.1-190-g1020a50

Comment 13 Alex Jia 2012-09-19 11:13:09 UTC
I can reproduce the bug on libvirt-0.10.1-2.el6.x86_64:

1. Open the first terminal and run the following command line (substitute your domain name and image name):

# for i in `seq 100`; do virsh attach-disk <domain> /var/lib/libvirt/images/<image>.img vda; virsh detach-disk <domain> vda; sleep 2; done

2. Open the second terminal and run the following command lines:

You need to prepare guest XML files from bar-1.xml to bar-10.xml beforehand, with guest names bar-1 to bar-10.

# for i in `seq 10`;do virsh create bar-$i.xml;done
# for i in `seq 10`;do virsh destroy bar-$i;done

And then you will see the following errors:

error: Failed to reconnect to the hypervisor
error: no valid connection
error: Cannot recv data: Connection reset by peer

error: Failed to reconnect to the hypervisor
error: no valid connection
error: internal error client socket is closed


It's okay on libvirt-0.10.2-0rc1.el6; I haven't found this error, so I am moving the bug to VERIFIED status.

Comment 14 Michal Privoznik 2012-11-12 13:20:20 UTC
*** Bug 875710 has been marked as a duplicate of this bug. ***

Comment 17 errata-xmlrpc 2013-02-21 07:23:43 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2013-0276.html

