Bug 858611 - Libvirt deadlock when restarting libvirtd.
Summary: Libvirt deadlock when restarting libvirtd.
Keywords:
Status: CLOSED UPSTREAM
Alias: None
Product: Virtualization Tools
Classification: Community
Component: libvirt
Version: unspecified
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: low
Target Milestone: ---
Assignee: Libvirt Maintainers
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2012-09-19 08:30 UTC by guozhonghua
Modified: 2012-09-25 13:48 UTC
CC List: 2 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2012-09-25 13:48:40 UTC
Embargoed:



Description guozhonghua 2012-09-19 08:30:23 UTC
Description of problem:

Libvirt deadlocks when libvirtd is restarted.

Version-Release number of selected component (if applicable):

libvirtd version 0.9.10, qemu-kvm version 1.0.

How reproducible:
I run a shell script that loops running virsh domstate against about 50 domains.
When I restart libvirtd, it often deadlocks.

Steps to Reproduce:
1. Run 50 domains;
2. Run the shell script that loops querying virsh domstate (see the sketch below);
3. Restart the libvirtd service
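
For reference, a rough C equivalent of that shell loop against the public libvirt API is sketched below (the qemu:///system URI and the 64-domain cap are assumptions, not part of the original setup):

/* domstate-loop.c: rough C equivalent of the "virsh domstate" polling loop.
 * The qemu:///system URI and the 64-domain cap are assumptions. */
#include <stdio.h>
#include <libvirt/libvirt.h>

int main(void)
{
    int ids[64];

    for (;;) {
        virConnectPtr conn = virConnectOpenReadOnly("qemu:///system");
        if (!conn)
            continue;                   /* libvirtd may be restarting; retry */

        int n = virConnectListDomains(conn, ids, 64);
        for (int i = 0; i < n; i++) {
            virDomainPtr dom = virDomainLookupByID(conn, ids[i]);
            int state, reason;

            if (dom && virDomainGetState(dom, &state, &reason, 0) == 0)
                printf("domain %d: state %d\n", ids[i], state);
            if (dom)
                virDomainFree(dom);
        }
        virConnectClose(conn);
    }
    return 0;
}

Build with "gcc domstate-loop.c -lvirt"; running a few copies of this while restarting the libvirtd service plays the same role as the shell loop.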
  
Actual results:

A deadlock appears after libvirtd is restarted.

Expected results:
No deadlock. 

Additional info:

Thread 1 stack:

#0  0x00007ff13603d89c in __lll_lock_wait () from /lib/x86_64-linux-gnu/libpthread.so.0
#1  0x00007ff136039065 in _L_lock_858 () from /lib/x86_64-linux-gnu/libpthread.so.0
#2  0x00007ff136038eba in pthread_mutex_lock () from /lib/x86_64-linux-gnu/libpthread.so.0
#3  0x00007ff136efc702 in virMutexLock (m=0x7ff1280e04a0, func=0x5278b0 "qemuDriverLock", line=61) at util/threads-pthread.c:87
#4  0x00000000004aa6a6 in qemuDriverLock (driver=0x7ff1280e04a0, func=0x51ea7f "qemudClose", line=921) at qemu/qemu_conf.c:61
#5  0x0000000000458e0a in qemudClose (conn=0x7ff04c0230f0) at qemu/qemu_driver.c:921
#6  0x00007ff136f7227f in virReleaseConnect (conn=0x7ff04c0230f0) at datatypes.c:114
#7  0x00007ff136f7240d in virUnrefConnect (conn=0x7ff04c0230f0) at datatypes.c:149
#8  0x00007ff136f7bcbf in virConnectClose (conn=0x7ff04c0230f0) at libvirt.c:1471
#9  0x000000000043fdc2 in remoteClientFreeFunc (data=0x1a4b4a0) at remote.c:547
#10 0x00007ff136fd6635 in virNetServerClientFree (client=0x1877df0) at rpc/virnetserverclient.c:601
#11 0x00007ff136fd5691 in virNetServerClientEventFree (opaque=0x1877df0) at rpc/virnetserverclient.c:175
#12 0x00007ff136fe00c7 in virNetSocketEventFree (opaque=0x1a46e50) at rpc/virnetsocket.c:1329
#13 0x00007ff136ee4884 in virEventPollCleanupHandles () at util/event_poll.c:572
#14 0x00007ff136ee4a5a in virEventPollRunOnce () at util/event_poll.c:608
#15 0x00007ff136ee2def in virEventRunDefaultImpl () at util/event.c:247
#16 0x00007ff136fd4d17 in virNetServerRun (srv=0x186dcf0) at rpc/virnetserver.c:736
#17 0x0000000000420763 in main (argc=2, argv=0x7fffa1284e18) at libvirtd.c:1602 

Thread 2 stack:

#0  0x00007f7ed7c20d84 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/x86_64-linux-gnu/libpthread.so.0
#1  0x00007f7ed8ae2885 in virCondWait (c=0x7f7eb00022b0, m=0x7f7eb0002250) at util/threads-pthread.c:121
#2  0x00000000004c2bbf in qemuMonitorSend (mon=0x7f7eb0002250, msg=0x7f7e6e7fba70) at qemu/qemu_monitor.c:794
#3  0x00000000004d36b2 in qemuMonitorJSONCommandWithFd (mon=0x7f7eb0002250, cmd=0x7f7eb00036d0, scm_fd=-1, reply=0x7f7e6e7fbb50) at qemu/qemu_monitor_json.c:230
#4  0x00000000004d37e2 in qemuMonitorJSONCommand (mon=0x7f7eb0002250, cmd=0x7f7eb00036d0, reply=0x7f7e6e7fbb50) at qemu/qemu_monitor_json.c:259
#5  0x00000000004d69be in qemuMonitorJSONGetBlockInfo (mon=0x7f7eb0002250, table=0x7f7eb00036f0) at qemu/qemu_monitor_json.c:1373
#6  0x00000000004c45ce in qemuMonitorGetBlockInfo (mon=0x7f7eb0002250) at qemu/qemu_monitor.c:1256
#7  0x00000000004a29de in qemuDomainCheckEjectableMedia (driver=0x7f7ec80102b0, vm=0x7f7ec802ab30) at qemu/qemu_hotplug.c:164
#8  0x00000000004b44d3 in qemuProcessReconnect (opaque=0x7f7ec815fc60) at qemu/qemu_process.c:2932
#9  0x00007f7ed8ae2a5f in virThreadHelper (data=0x7f7ec8103a10) at util/threads-pthread.c:165
#10 0x00007f7ed7c1ce9a in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#11 0x00007f7ed794a4bd in clone () from /lib/x86_64-linux-gnu/libc.so.6

I reviewed the code:

Thread 1:
virEventPollRunOnce
{
    virEventPollCleanupHandles
    ...
    poll()
    ...
}

Thread 2:
qemuMonitorSend
{
    qemuMonitorUpdateWatch
    ...
    while (!mon->msg->finished) {
        if (virCondWait(&mon->notify, &mon->lock) < 0) {
            ...
        }
        ...
    }
}

Thread 2 has taken the driver lock via qemuDriverLock, sent a command to qemu, and is waiting for thread 1's event loop to poll and deliver the reply.
Thread 1 calls virEventPollCleanupHandles, which eventually calls qemudClose, which in turn calls qemuDriverLock.
So thread 2 waits for thread 1 to poll, while thread 1 waits for thread 2 to call qemuDriverUnlock.
That is the deadlock.
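
To make the cycle concrete, here is a minimal pthread sketch of the same pattern; the names (driver_lock, mon_lock, mon_reply) are invented for the sketch and are not the real libvirt symbols. When run it simply hangs, which is the point.

/* Minimal illustration of the cycle described above; the names are
 * invented for this sketch and do not match the libvirt sources. */
#include <pthread.h>
#include <stdbool.h>
#include <unistd.h>

static pthread_mutex_t driver_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t mon_lock    = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  mon_reply   = PTHREAD_COND_INITIALIZER;
static bool reply_arrived;

/* "Thread 2" (qemuProcessReconnect path): takes the driver lock, sends a
 * monitor command, then waits for the event loop to deliver the reply. */
static void *reconnect_thread(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&driver_lock);

    pthread_mutex_lock(&mon_lock);
    while (!reply_arrived)                  /* never signalled: see below */
        pthread_cond_wait(&mon_reply, &mon_lock);
    pthread_mutex_unlock(&mon_lock);

    pthread_mutex_unlock(&driver_lock);
    return NULL;
}

/* "Thread 1" (event loop): before it can poll() and deliver the reply, its
 * cleanup phase frees a closing client connection, and that close path
 * needs driver_lock, which thread 2 still holds. */
static void event_loop_iteration(void)
{
    /* virEventPollCleanupHandles -> qemudClose -> qemuDriverLock */
    pthread_mutex_lock(&driver_lock);       /* blocks forever */
    /* ... drop the client connection ... */
    pthread_mutex_unlock(&driver_lock);

    /* poll() would run here and eventually set reply_arrived and signal
     * mon_reply, but execution never gets this far. */
}

int main(void)
{
    pthread_t t2;

    pthread_create(&t2, NULL, reconnect_thread, NULL);
    sleep(1);                               /* let thread 2 take driver_lock */
    event_loop_iteration();                 /* main thread hangs: deadlock */
    pthread_join(t2, NULL);
    return 0;
}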

I would be glad to receive any answers. Thanks.

Comment 1 Dave Allan 2012-09-21 13:52:24 UTC
Is this still reproducible with the current git head?

Comment 2 guozhonghua 2012-09-25 08:36:22 UTC
Thank you.

I tested with 0.10.2 R2, and the issue cannot be reproduced.

And the code has been corrected.

int qemuDomainCheckEjectableMedia(struct qemud_driver *driver,
                                  virDomainObjPtr vm,
                                  enum qemuDomainAsyncJob asyncJob)
{
    ...
    if (qemuDomainObjEnterMonitorAsync(driver, vm, asyncJob) == 0) {
        table = qemuMonitorGetBlockInfo(priv->mon);
        qemuDomainObjExitMonitorWithDriver(driver, vm);
    }
    ...
}

Using qemuDomainObjEnterMonitorAsync in place of qemuDomainObjEnterMonitor releases the driver lock while waiting on the monitor, which avoids the deadlock with virEventPollCleanupHandles.
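
In terms of the pthread sketch from the description above, the shape of the fix looks like this; it reuses the same invented names (driver_lock, mon_lock, mon_reply) and is not the real qemuDomainObjEnterMonitorAsync implementation:

/* Continuation of the earlier sketch: drop the driver lock before the
 * condition wait so the event loop can finish its cleanup handlers,
 * poll(), and deliver the reply. */
static void *reconnect_thread_fixed(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&driver_lock);
    /* ... prepare the monitor command under the driver lock ... */
    pthread_mutex_unlock(&driver_lock);     /* EnterMonitorAsync-style */

    pthread_mutex_lock(&mon_lock);
    while (!reply_arrived)
        pthread_cond_wait(&mon_reply, &mon_lock);
    pthread_mutex_unlock(&mon_lock);

    pthread_mutex_lock(&driver_lock);       /* ExitMonitorWithDriver-style */
    /* ... act on the reply ... */
    pthread_mutex_unlock(&driver_lock);
    return NULL;
}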

That is probably what fixed it.

Thanks a lot.

Comment 3 Dave Allan 2012-09-25 13:48:40 UTC
(In reply to comment #2)
> I tested with 0.10.2 R2, and the issue cannot be reproduced.
> 
> And the code has been corrected.

Ok, I will close the BZ; thanks for reporting it.

Dave

