Description of problem: After I cherry-picked these commits, my libvirtd can be started, but the connection can't be established, and libvirtd failed to stop after 'service libvirt-bin stop', as well as 'ctrl+c' when I run 'libvirtd -v' on the foreground: commits: 9a713d64ff0d7e7ca69486faeb45b2e2894f728b 2014-02-04 Daniel P. Berrange Add a read/write lock implementation 51c40f62d48276f2772891f071c44311037cc001 2014-02-04 Daniel P. Berrange Push nwfilter update locking up to top level 6b2e42fced83c2f2be51e4a08196a4029b9465d2 13 days ago Eric Blake event: move event filtering to daemon (regression fix) Version-Release number of selected component (if applicable): 1.1.4-manit How reproducible: Steps to Reproduce: 1.create many VMs(40 in my env) on the host 2.kill -9 `pid of libvirtd` 3.start libvirtd by using service libvirt-bin start Actual results: virsh version/list command is hung there without any responses Expected results: virsh version/list return the correct things Additional info: if there are few VMs(1 vm in my env), the virsh version command is OK
my libvirt version is 1.1.4-manit without the newest patches after this one: 13 days ago Eric Blake event: move event filtering to daemon (regression fix)
the logs while I running libvirtd -v with debug message on the foreground, I send `ctrl+c` to it, but it doesn't exit: 2014-02-19 07:16:21.762+0000: 22383: debug : virEventPollMakePollFDs:394 : Prepare n=87 w=117, f=99 e=1 d=0 2014-02-19 07:16:21.762+0000: 22383: debug : virEventPollCalculateTimeout:332 : Calculate expiry of 10 timers 2014-02-19 07:16:21.762+0000: 22383: debug : virEventPollCalculateTimeout:340 : Got a timeout scheduled for 1392794186760 2014-02-19 07:16:21.762+0000: 22383: debug : virEventPollCalculateTimeout:340 : Got a timeout scheduled for 1392794185395 2014-02-19 07:16:21.762+0000: 22383: debug : virEventPollCalculateTimeout:340 : Got a timeout scheduled for 1392794186007 2014-02-19 07:16:21.762+0000: 22383: debug : virEventPollCalculateTimeout:353 : Schedule timeout then=1392794185395 now=1392794181762 2014-02-19 07:16:21.762+0000: 22383: debug : virEventPollCalculateTimeout:362 : Timeout at 1392794185395 due in 3633 ms 2014-02-19 07:16:21.762+0000: 22383: debug : virEventPollRunOnce:630 : EVENT_POLL_RUN: nhandles=88 timeout=3633 Ctrl^C here: 2014-02-19 07:16:24.546+0000: 22383: debug : virEventPollRunOnce:633 : Poll got error event 4 2014-02-19 07:16:24.546+0000: 22383: debug : virEventPollRunOnce:630 : EVENT_POLL_RUN: nhandles=88 timeout=3633 2014-02-19 07:16:24.546+0000: 22383: debug : virEventPollRunOnce:641 : Poll got 1 event(s) 2014-02-19 07:16:24.546+0000: 22383: debug : virEventPollDispatchTimeouts:426 : Dispatch 10 2014-02-19 07:16:24.546+0000: 22383: debug : virEventPollDispatchHandles:471 : Dispatch 88 2014-02-19 07:16:24.546+0000: 22383: debug : virEventPollDispatchHandles:485 : i=0 w=1 2014-02-19 07:16:24.546+0000: 22383: debug : virEventPollDispatchHandles:485 : i=1 w=2 2014-02-19 07:16:24.546+0000: 22383: debug : virEventPollDispatchHandles:499 : EVENT_POLL_DISPATCH_HANDLE: watch=2 events=1 2014-02-19 07:16:24.546+0000: 22383: debug : virNetServerQuit:1163 : Quit requested 0x7fc13b87d9b0 2014-02-19 07:16:24.546+0000: 22383: debug : virEventPollDispatchHandles:485 : i=2 w=3 2014-02-19 07:16:24.546+0000: 22383: debug : virEventPollDispatchHandles:485 : i=3 w=4 2014-02-19 07:16:24.546+0000: 22383: debug : virEventPollDispatchHandles:485 : i=4 w=5 2014-02-19 07:16:24.546+0000: 22383: debug : virEventPollDispatchHandles:485 : i=5 w=6 2014-02-19 07:16:24.546+0000: 22383: debug : virEventPollDispatchHandles:485 : i=6 w=7 2014-02-19 07:16:24.546+0000: 22383: debug : virEventPollDispatchHandles:485 : i=7 w=33 2014-02-19 07:16:24.546+0000: 22383: debug : virEventPollDispatchHandles:485 : i=8 w=34 2014-02-19 07:16:24.546+0000: 22383: debug : virEventPollDispatchHandles:485 : i=9 w=35 2014-02-19 07:16:24.546+0000: 22383: debug : virEventPollDispatchHandles:485 : i=10 w=36 2014-02-19 07:16:24.546+0000: 22383: debug : virEventPollDispatchHandles:485 : i=11 w=37 2014-02-19 07:16:24.546+0000: 22383: debug : virEventPollDispatchHandles:485 : i=12 w=38 2014-02-19 07:16:24.546+0000: 22383: debug : virEventPollDispatchHandles:485 : i=13 w=39 2014-02-19 07:16:24.546+0000: 22383: debug : virEventPollDispatchHandles:485 : i=14 w=40 2014-02-19 07:16:24.546+0000: 22383: debug : virEventPollDispatchHandles:485 : i=15 w=41 2014-02-19 07:16:24.546+0000: 22383: debug : virEventPollDispatchHandles:485 : i=16 w=42 2014-02-19 07:16:24.546+0000: 22383: debug : virEventPollDispatchHandles:485 : i=17 w=43 2014-02-19 07:16:24.546+0000: 22383: debug : virEventPollDispatchHandles:485 : i=18 w=44 2014-02-19 07:16:24.546+0000: 22383: debug : virEventPollDispatchHandles:485 : i=19 w=45 2014-02-19 07:16:24.546+0000: 22383: debug : virEventPollDispatchHandles:485 : i=20 w=46 2014-02-19 07:16:24.546+0000: 22383: debug : virEventPollDispatchHandles:485 : i=21 w=47 2014-02-19 07:16:24.546+0000: 22383: debug : virEventPollDispatchHandles:485 : i=22 w=48 2014-02-19 07:16:24.546+0000: 22383: debug : virEventPollDispatchHandles:485 : i=23 w=49 2014-02-19 07:16:24.546+0000: 22383: debug : virEventPollDispatchHandles:485 : i=24 w=50 2014-02-19 07:16:24.547+0000: 22383: debug : virEventPollDispatchHandles:485 : i=25 w=51 2014-02-19 07:16:24.547+0000: 22383: debug : virEventPollDispatchHandles:485 : i=26 w=52 2014-02-19 07:16:24.547+0000: 22383: debug : virEventPollDispatchHandles:485 : i=27 w=53 2014-02-19 07:16:24.547+0000: 22383: debug : virEventPollDispatchHandles:485 : i=28 w=54 2014-02-19 07:16:24.547+0000: 22383: debug : virEventPollDispatchHandles:485 : i=29 w=55 2014-02-19 07:16:24.547+0000: 22383: debug : virEventPollDispatchHandles:485 : i=30 w=56 2014-02-19 07:16:24.547+0000: 22383: debug : virEventPollDispatchHandles:485 : i=31 w=57 2014-02-19 07:16:24.547+0000: 22383: debug : virEventPollDispatchHandles:485 : i=32 w=58 2014-02-19 07:16:24.547+0000: 22383: debug : virEventPollDispatchHandles:485 : i=33 w=59 2014-02-19 07:16:24.547+0000: 22383: debug : virEventPollDispatchHandles:485 : i=34 w=60 2014-02-19 07:16:24.547+0000: 22383: debug : virEventPollDispatchHandles:485 : i=35 w=61 2014-02-19 07:16:24.547+0000: 22383: debug : virEventPollDispatchHandles:485 : i=36 w=62 2014-02-19 07:16:24.547+0000: 22383: debug : virEventPollDispatchHandles:485 : i=37 w=63 2014-02-19 07:16:24.547+0000: 22383: debug : virEventPollDispatchHandles:485 : i=38 w=64 2014-02-19 07:16:24.547+0000: 22383: debug : virEventPollDispatchHandles:485 : i=39 w=65 2014-02-19 07:16:24.547+0000: 22383: debug : virEventPollDispatchHandles:485 : i=40 w=66 2014-02-19 07:16:24.547+0000: 22383: debug : virEventPollDispatchHandles:485 : i=41 w=67 2014-02-19 07:16:24.547+0000: 22383: debug : virEventPollDispatchHandles:485 : i=42 w=68 2014-02-19 07:16:24.547+0000: 22383: debug : virEventPollDispatchHandles:485 : i=43 w=69 2014-02-19 07:16:24.547+0000: 22383: debug : virEventPollDispatchHandles:485 : i=44 w=70 2014-02-19 07:16:24.547+0000: 22383: debug : virEventPollDispatchHandles:485 : i=45 w=71 2014-02-19 07:16:24.547+0000: 22383: debug : virEventPollDispatchHandles:485 : i=46 w=72 2014-02-19 07:16:24.547+0000: 22383: debug : virEventPollDispatchHandles:485 : i=47 w=73 2014-02-19 07:16:24.547+0000: 22383: debug : virEventPollDispatchHandles:485 : i=48 w=75 2014-02-19 07:16:24.547+0000: 22383: debug : virEventPollDispatchHandles:485 : i=49 w=76 2014-02-19 07:16:24.547+0000: 22383: debug : virEventPollDispatchHandles:485 : i=50 w=77 2014-02-19 07:16:24.547+0000: 22383: debug : virEventPollDispatchHandles:485 : i=51 w=78 2014-02-19 07:16:24.547+0000: 22383: debug : virEventPollDispatchHandles:485 : i=52 w=79 2014-02-19 07:16:24.547+0000: 22383: debug : virEventPollDispatchHandles:485 : i=53 w=80 2014-02-19 07:16:24.547+0000: 22383: debug : virEventPollDispatchHandles:485 : i=54 w=81 2014-02-19 07:16:24.547+0000: 22383: debug : virEventPollDispatchHandles:485 : i=55 w=82 2014-02-19 07:16:24.547+0000: 22383: debug : virEventPollDispatchHandles:485 : i=56 w=83 2014-02-19 07:16:24.547+0000: 22383: debug : virEventPollDispatchHandles:485 : i=57 w=84 2014-02-19 07:16:24.547+0000: 22383: debug : virEventPollDispatchHandles:485 : i=58 w=85 2014-02-19 07:16:24.547+0000: 22383: debug : virEventPollDispatchHandles:485 : i=59 w=86 2014-02-19 07:16:24.547+0000: 22383: debug : virEventPollDispatchHandles:485 : i=60 w=87 2014-02-19 07:16:24.547+0000: 22383: debug : virEventPollDispatchHandles:485 : i=61 w=88 2014-02-19 07:16:24.547+0000: 22383: debug : virEventPollDispatchHandles:485 : i=62 w=89 2014-02-19 07:16:24.547+0000: 22383: debug : virEventPollDispatchHandles:485 : i=63 w=90 2014-02-19 07:16:24.547+0000: 22383: debug : virEventPollDispatchHandles:485 : i=64 w=91 2014-02-19 07:16:24.547+0000: 22383: debug : virEventPollDispatchHandles:485 : i=65 w=92 2014-02-19 07:16:24.547+0000: 22383: debug : virEventPollDispatchHandles:485 : i=66 w=93 2014-02-19 07:16:24.547+0000: 22383: debug : virEventPollDispatchHandles:485 : i=67 w=94 2014-02-19 07:16:24.547+0000: 22383: debug : virEventPollDispatchHandles:485 : i=68 w=95 2014-02-19 07:16:24.547+0000: 22383: debug : virEventPollDispatchHandles:485 : i=69 w=96 2014-02-19 07:16:24.547+0000: 22383: debug : virEventPollDispatchHandles:485 : i=70 w=97 2014-02-19 07:16:24.547+0000: 22383: debug : virEventPollDispatchHandles:485 : i=71 w=98 2014-02-19 07:16:24.547+0000: 22383: debug : virEventPollDispatchHandles:485 : i=72 w=99 2014-02-19 07:16:24.547+0000: 22383: debug : virEventPollDispatchHandles:485 : i=73 w=100 2014-02-19 07:16:24.547+0000: 22383: debug : virEventPollDispatchHandles:485 : i=74 w=101 2014-02-19 07:16:24.547+0000: 22383: debug : virEventPollDispatchHandles:485 : i=75 w=102 2014-02-19 07:16:24.547+0000: 22383: debug : virEventPollDispatchHandles:485 : i=76 w=103 2014-02-19 07:16:24.547+0000: 22383: debug : virEventPollDispatchHandles:485 : i=77 w=104 2014-02-19 07:16:24.547+0000: 22383: debug : virEventPollDispatchHandles:485 : i=78 w=105 2014-02-19 07:16:24.547+0000: 22383: debug : virEventPollDispatchHandles:485 : i=79 w=106 2014-02-19 07:16:24.547+0000: 22383: debug : virEventPollDispatchHandles:485 : i=80 w=107 2014-02-19 07:16:24.547+0000: 22383: debug : virEventPollDispatchHandles:485 : i=81 w=108 2014-02-19 07:16:24.547+0000: 22383: debug : virEventPollDispatchHandles:485 : i=82 w=109 2014-02-19 07:16:24.547+0000: 22383: debug : virEventPollDispatchHandles:485 : i=83 w=110 2014-02-19 07:16:24.547+0000: 22383: debug : virEventPollDispatchHandles:485 : i=84 w=111 2014-02-19 07:16:24.547+0000: 22383: debug : virEventPollDispatchHandles:485 : i=85 w=112 2014-02-19 07:16:24.547+0000: 22383: debug : virEventPollDispatchHandles:485 : i=86 w=116 2014-02-19 07:16:24.547+0000: 22383: debug : virEventPollDispatchHandles:485 : i=87 w=117 2014-02-19 07:16:24.547+0000: 22383: debug : virEventPollCleanupTimeouts:517 : Cleanup 10 2014-02-19 07:16:24.547+0000: 22383: debug : virEventPollCleanupHandles:565 : Cleanup 88 2014-02-19 07:16:24.547+0000: 22383: debug : virHookCheck:119 : No hook script /etc/libvirt/hooks/daemon 2014-02-19 07:16:24.547+0000: 22383: debug : virHookCheck:119 : No hook script /etc/libvirt/hooks/qemu 2014-02-19 07:16:24.547+0000: 22383: debug : virHookCheck:119 : No hook script /etc/libvirt/hooks/lxc 2014-02-19 07:16:24.547+0000: 22383: info : virNetlinkEventServiceStopAll:421 : stopping all netlink event services 2014-02-19 07:16:24.547+0000: 22383: debug : virEventPollRemoveHandle:180 : EVENT_POLL_REMOVE_HANDLE: watch=5 2014-02-19 07:16:24.547+0000: 22383: debug : virEventPollRemoveHandle:193 : mark delete 4 12 2014-02-19 07:16:24.547+0000: 22383: debug : virEventPollInterruptLocked:713 : Skip interrupt, 0 140467892094976 2014-02-19 07:16:24.547+0000: 22383: debug : virEventPollRemoveHandle:180 : EVENT_POLL_REMOVE_HANDLE: watch=6 2014-02-19 07:16:24.547+0000: 22383: debug : virEventPollRemoveHandle:193 : mark delete 5 13 2014-02-19 07:16:24.547+0000: 22383: debug : virEventPollInterruptLocked:713 : Skip interrupt, 0 140467892094976 2014-02-19 07:16:24.548+0000: 22383: debug : virObjectUnref:256 : OBJECT_UNREF: obj=0x7fc13b887980 2014-02-19 07:16:24.548+0000: 22383: debug : virObjectUnref:256 : OBJECT_UNREF: obj=0x7fc13b8879b0 2014-02-19 07:16:24.548+0000: 22383: debug : virObjectUnref:256 : OBJECT_UNREF: obj=0x7fc13b887c20 2014-02-19 07:16:24.548+0000: 22383: debug : virFileClose:90 : Closed fd 10 2014-02-19 07:16:24.548+0000: 22383: debug : virFileClose:90 : Closed fd 11 2014-02-19 07:16:24.548+0000: 22383: debug : virObjectUnref:256 : OBJECT_UNREF: obj=0x7fc13b87d9b0 2014-02-19 07:16:24.548+0000: 22383: debug : virObjectUnref:258 : OBJECT_DISPOSE: obj=0x7fc13b87d9b0 2014-02-19 07:16:24.548+0000: 22383: debug : virEventPollUpdateHandle:147 : EVENT_POLL_UPDATE_HANDLE: watch=3 events=0 2014-02-19 07:16:24.548+0000: 22383: debug : virEventPollInterruptLocked:713 : Skip interrupt, 0 140467892094976 2014-02-19 07:16:24.548+0000: 22383: debug : virEventPollUpdateHandle:147 : EVENT_POLL_UPDATE_HANDLE: watch=4 events=0 2014-02-19 07:16:24.548+0000: 22383: debug : virEventPollInterruptLocked:713 : Skip interrupt, 0 140467892094976 ^C^C^C^C^C^C^C^C^C^C^C
Please use the current upstream if you want some particular patches included. Cherry-picking some patches may break it as they may not work on them own. Generally this is a upstream bug tracker and we don't support custom downstream versions here.
I reopen this bug, because I can reproduce it at libvirt-1.1.3.4 release branch without any other patches. I got the tarball by using `wget http://libvirt.org/sources/stable_updates/libvirt-1.1.3.4.tar.gz`, then I configure and make/ make install it, this bug also can be reproduced after I following the reproduce steps in comment #1.
This bug also exists in 1.2.2 and master branch.
I can reproduce it at v1.2.1-maint but cann't reproduce it at v1.2.1,so I believe this bug is imported by the maintains patches after v1.2.1 release.
(In reply to Wangpan from comment #6) > I can reproduce it at v1.2.1-maint but cann't reproduce it at v1.2.1,so I > believe this bug is imported by the maintains patches after v1.2.1 release. Can you run 'git bisect' to determine exactly which commit introduces the problem?
hzwangpan@10-120-120-22:~/libvirt$ git bisect good c5d10b7a3d685712e17166a9b181e4d02dd7e6c9 is the first bad commit commit c5d10b7a3d685712e17166a9b181e4d02dd7e6c9 Author: Daniel P. Berrange <berrange> Date: Wed Jan 22 17:28:29 2014 +0000 Push nwfilter update locking up to top level The NWFilter code has as a deadlock race condition between the virNWFilter{Define,Undefine} APIs and starting of guest VMs due to mis-matched lock ordering. In the virNWFilter{Define,Undefine} codepaths the lock ordering is 1. nwfilter driver lock 2. virt driver lock 3. nwfilter update lock 4. domain object lock In the VM guest startup paths the lock ordering is 1. virt driver lock 2. domain object lock 3. nwfilter update lock As can be seen the domain object and nwfilter update locks are not acquired in a consistent order. The fix used is to push the nwfilter update lock upto the top level resulting in a lock ordering for virNWFilter{Define,Undefine} of 1. nwfilter driver lock 2. nwfilter update lock 3. virt driver lock 4. domain object lock and VM start using 1. nwfilter update lock 2. virt driver lock 3. domain object lock This has the effect of serializing VM startup once again, even if no nwfilters are applied to the guest. There is also the possibility of deadlock due to a call graph loop via virNWFilterInstantiate and virNWFilterInstantiateFilterLate. These two problems mean the lock must be turned into a read/write lock instead of a plain mutex at the same time. The lock is used to serialize changes to the "driver->nwfilters" hash, so the write lock only needs to be held by the define/undefine methods. All other methods can rely on a read lock which allows good concurrency. Signed-off-by: Daniel P. Berrange <berrange> (cherry picked from commit 6e5c79a1b5a8b3a23e7df7ffe58fb272aa17fbfb) :040000 040000 db2b2e3e98781b3e17deb2b0e129a63eb5f44f3f 70472daff126b86aab41544df7488d09001cc10b M src
If you are getting a hang, then please provide a stack trace of libvirtd using 'thread apply all bt' and attach to this bug
Created attachment 871847 [details] The full stack trace by (thread apply all bt) Please see the attachment file 'Threads_bt.txt'
Ah ha, so we have one thread doing autostart of VMs and a bunch of other threads which are reconnecting to existing running VMs. These are deadlocking with each other for some reason. This looks like a genuine bug.
Proposed fixed upstream https://www.redhat.com/archives/libvir-list/2014-March/msg00501.html
I have tested this patch on master branch, and it works OK now, thanks Daniel!
Fixed by: commit 925de19ed7f13e0d12d0b993496d314bab886589 Author: Daniel P. Berrange <berrange> CommitDate: 2014-03-10 11:56:45 +0000 Add a mutex to serialize updates to firewall git describe: v1.2.2-58-g925de19 contains: v1.2.3-rc1~329