Bug 1066801
| Summary: | libvirtd 1.1.3.4 can't be connected | ||||||
|---|---|---|---|---|---|---|---|
| Product: | [Community] Virtualization Tools | Reporter: | Wangpan <hzwangpan> | ||||
| Component: | libvirt | Assignee: | Libvirt Maintainers <libvirt-maint> | ||||
| Status: | CLOSED CURRENTRELEASE | QA Contact: | |||||
| Severity: | unspecified | Docs Contact: | |||||
| Priority: | unspecified | ||||||
| Version: | unspecified | CC: | acathrow, berrange, eblake, hzwangpan, jtomko, pkrempa | ||||
| Target Milestone: | --- | Keywords: | Reopened | ||||
| Target Release: | --- | ||||||
| Hardware: | x86_64 | ||||||
| OS: | Linux | ||||||
| Whiteboard: | |||||||
| Fixed In Version: | Doc Type: | Bug Fix | |||||
| Doc Text: | Story Points: | --- | |||||
| Clone Of: | Environment: | ||||||
| Last Closed: | 2014-04-10 07:02:07 UTC | Type: | Bug | ||||
| Regression: | --- | Mount Type: | --- | ||||
| Documentation: | --- | CRM: | |||||
| Verified Versions: | Category: | --- | |||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||
| Embargoed: | |||||||
| Attachments: |
|
||||||
|
Description
Wangpan
2014-02-19 07:02:13 UTC
my libvirt version is 1.1.4-manit without the newest patches after this one: 13 days ago Eric Blake event: move event filtering to daemon (regression fix) the logs while I running libvirtd -v with debug message on the foreground, I send `ctrl+c` to it, but it doesn't exit: 2014-02-19 07:16:21.762+0000: 22383: debug : virEventPollMakePollFDs:394 : Prepare n=87 w=117, f=99 e=1 d=0 2014-02-19 07:16:21.762+0000: 22383: debug : virEventPollCalculateTimeout:332 : Calculate expiry of 10 timers 2014-02-19 07:16:21.762+0000: 22383: debug : virEventPollCalculateTimeout:340 : Got a timeout scheduled for 1392794186760 2014-02-19 07:16:21.762+0000: 22383: debug : virEventPollCalculateTimeout:340 : Got a timeout scheduled for 1392794185395 2014-02-19 07:16:21.762+0000: 22383: debug : virEventPollCalculateTimeout:340 : Got a timeout scheduled for 1392794186007 2014-02-19 07:16:21.762+0000: 22383: debug : virEventPollCalculateTimeout:353 : Schedule timeout then=1392794185395 now=1392794181762 2014-02-19 07:16:21.762+0000: 22383: debug : virEventPollCalculateTimeout:362 : Timeout at 1392794185395 due in 3633 ms 2014-02-19 07:16:21.762+0000: 22383: debug : virEventPollRunOnce:630 : EVENT_POLL_RUN: nhandles=88 timeout=3633 Ctrl^C here: 2014-02-19 07:16:24.546+0000: 22383: debug : virEventPollRunOnce:633 : Poll got error event 4 2014-02-19 07:16:24.546+0000: 22383: debug : virEventPollRunOnce:630 : EVENT_POLL_RUN: nhandles=88 timeout=3633 2014-02-19 07:16:24.546+0000: 22383: debug : virEventPollRunOnce:641 : Poll got 1 event(s) 2014-02-19 07:16:24.546+0000: 22383: debug : virEventPollDispatchTimeouts:426 : Dispatch 10 2014-02-19 07:16:24.546+0000: 22383: debug : virEventPollDispatchHandles:471 : Dispatch 88 2014-02-19 07:16:24.546+0000: 22383: debug : virEventPollDispatchHandles:485 : i=0 w=1 2014-02-19 07:16:24.546+0000: 22383: debug : virEventPollDispatchHandles:485 : i=1 w=2 2014-02-19 07:16:24.546+0000: 22383: debug : virEventPollDispatchHandles:499 : EVENT_POLL_DISPATCH_HANDLE: watch=2 events=1 2014-02-19 07:16:24.546+0000: 22383: debug : virNetServerQuit:1163 : Quit requested 0x7fc13b87d9b0 2014-02-19 07:16:24.546+0000: 22383: debug : virEventPollDispatchHandles:485 : i=2 w=3 2014-02-19 07:16:24.546+0000: 22383: debug : virEventPollDispatchHandles:485 : i=3 w=4 2014-02-19 07:16:24.546+0000: 22383: debug : virEventPollDispatchHandles:485 : i=4 w=5 2014-02-19 07:16:24.546+0000: 22383: debug : virEventPollDispatchHandles:485 : i=5 w=6 2014-02-19 07:16:24.546+0000: 22383: debug : virEventPollDispatchHandles:485 : i=6 w=7 2014-02-19 07:16:24.546+0000: 22383: debug : virEventPollDispatchHandles:485 : i=7 w=33 2014-02-19 07:16:24.546+0000: 22383: debug : virEventPollDispatchHandles:485 : i=8 w=34 2014-02-19 07:16:24.546+0000: 22383: debug : virEventPollDispatchHandles:485 : i=9 w=35 2014-02-19 07:16:24.546+0000: 22383: debug : virEventPollDispatchHandles:485 : i=10 w=36 2014-02-19 07:16:24.546+0000: 22383: debug : virEventPollDispatchHandles:485 : i=11 w=37 2014-02-19 07:16:24.546+0000: 22383: debug : virEventPollDispatchHandles:485 : i=12 w=38 2014-02-19 07:16:24.546+0000: 22383: debug : virEventPollDispatchHandles:485 : i=13 w=39 2014-02-19 07:16:24.546+0000: 22383: debug : virEventPollDispatchHandles:485 : i=14 w=40 2014-02-19 07:16:24.546+0000: 22383: debug : virEventPollDispatchHandles:485 : i=15 w=41 2014-02-19 07:16:24.546+0000: 22383: debug : virEventPollDispatchHandles:485 : i=16 w=42 2014-02-19 07:16:24.546+0000: 22383: debug : virEventPollDispatchHandles:485 : i=17 w=43 2014-02-19 07:16:24.546+0000: 22383: debug : virEventPollDispatchHandles:485 : i=18 w=44 2014-02-19 07:16:24.546+0000: 22383: debug : virEventPollDispatchHandles:485 : i=19 w=45 2014-02-19 07:16:24.546+0000: 22383: debug : virEventPollDispatchHandles:485 : i=20 w=46 2014-02-19 07:16:24.546+0000: 22383: debug : virEventPollDispatchHandles:485 : i=21 w=47 2014-02-19 07:16:24.546+0000: 22383: debug : virEventPollDispatchHandles:485 : i=22 w=48 2014-02-19 07:16:24.546+0000: 22383: debug : virEventPollDispatchHandles:485 : i=23 w=49 2014-02-19 07:16:24.546+0000: 22383: debug : virEventPollDispatchHandles:485 : i=24 w=50 2014-02-19 07:16:24.547+0000: 22383: debug : virEventPollDispatchHandles:485 : i=25 w=51 2014-02-19 07:16:24.547+0000: 22383: debug : virEventPollDispatchHandles:485 : i=26 w=52 2014-02-19 07:16:24.547+0000: 22383: debug : virEventPollDispatchHandles:485 : i=27 w=53 2014-02-19 07:16:24.547+0000: 22383: debug : virEventPollDispatchHandles:485 : i=28 w=54 2014-02-19 07:16:24.547+0000: 22383: debug : virEventPollDispatchHandles:485 : i=29 w=55 2014-02-19 07:16:24.547+0000: 22383: debug : virEventPollDispatchHandles:485 : i=30 w=56 2014-02-19 07:16:24.547+0000: 22383: debug : virEventPollDispatchHandles:485 : i=31 w=57 2014-02-19 07:16:24.547+0000: 22383: debug : virEventPollDispatchHandles:485 : i=32 w=58 2014-02-19 07:16:24.547+0000: 22383: debug : virEventPollDispatchHandles:485 : i=33 w=59 2014-02-19 07:16:24.547+0000: 22383: debug : virEventPollDispatchHandles:485 : i=34 w=60 2014-02-19 07:16:24.547+0000: 22383: debug : virEventPollDispatchHandles:485 : i=35 w=61 2014-02-19 07:16:24.547+0000: 22383: debug : virEventPollDispatchHandles:485 : i=36 w=62 2014-02-19 07:16:24.547+0000: 22383: debug : virEventPollDispatchHandles:485 : i=37 w=63 2014-02-19 07:16:24.547+0000: 22383: debug : virEventPollDispatchHandles:485 : i=38 w=64 2014-02-19 07:16:24.547+0000: 22383: debug : virEventPollDispatchHandles:485 : i=39 w=65 2014-02-19 07:16:24.547+0000: 22383: debug : virEventPollDispatchHandles:485 : i=40 w=66 2014-02-19 07:16:24.547+0000: 22383: debug : virEventPollDispatchHandles:485 : i=41 w=67 2014-02-19 07:16:24.547+0000: 22383: debug : virEventPollDispatchHandles:485 : i=42 w=68 2014-02-19 07:16:24.547+0000: 22383: debug : virEventPollDispatchHandles:485 : i=43 w=69 2014-02-19 07:16:24.547+0000: 22383: debug : virEventPollDispatchHandles:485 : i=44 w=70 2014-02-19 07:16:24.547+0000: 22383: debug : virEventPollDispatchHandles:485 : i=45 w=71 2014-02-19 07:16:24.547+0000: 22383: debug : virEventPollDispatchHandles:485 : i=46 w=72 2014-02-19 07:16:24.547+0000: 22383: debug : virEventPollDispatchHandles:485 : i=47 w=73 2014-02-19 07:16:24.547+0000: 22383: debug : virEventPollDispatchHandles:485 : i=48 w=75 2014-02-19 07:16:24.547+0000: 22383: debug : virEventPollDispatchHandles:485 : i=49 w=76 2014-02-19 07:16:24.547+0000: 22383: debug : virEventPollDispatchHandles:485 : i=50 w=77 2014-02-19 07:16:24.547+0000: 22383: debug : virEventPollDispatchHandles:485 : i=51 w=78 2014-02-19 07:16:24.547+0000: 22383: debug : virEventPollDispatchHandles:485 : i=52 w=79 2014-02-19 07:16:24.547+0000: 22383: debug : virEventPollDispatchHandles:485 : i=53 w=80 2014-02-19 07:16:24.547+0000: 22383: debug : virEventPollDispatchHandles:485 : i=54 w=81 2014-02-19 07:16:24.547+0000: 22383: debug : virEventPollDispatchHandles:485 : i=55 w=82 2014-02-19 07:16:24.547+0000: 22383: debug : virEventPollDispatchHandles:485 : i=56 w=83 2014-02-19 07:16:24.547+0000: 22383: debug : virEventPollDispatchHandles:485 : i=57 w=84 2014-02-19 07:16:24.547+0000: 22383: debug : virEventPollDispatchHandles:485 : i=58 w=85 2014-02-19 07:16:24.547+0000: 22383: debug : virEventPollDispatchHandles:485 : i=59 w=86 2014-02-19 07:16:24.547+0000: 22383: debug : virEventPollDispatchHandles:485 : i=60 w=87 2014-02-19 07:16:24.547+0000: 22383: debug : virEventPollDispatchHandles:485 : i=61 w=88 2014-02-19 07:16:24.547+0000: 22383: debug : virEventPollDispatchHandles:485 : i=62 w=89 2014-02-19 07:16:24.547+0000: 22383: debug : virEventPollDispatchHandles:485 : i=63 w=90 2014-02-19 07:16:24.547+0000: 22383: debug : virEventPollDispatchHandles:485 : i=64 w=91 2014-02-19 07:16:24.547+0000: 22383: debug : virEventPollDispatchHandles:485 : i=65 w=92 2014-02-19 07:16:24.547+0000: 22383: debug : virEventPollDispatchHandles:485 : i=66 w=93 2014-02-19 07:16:24.547+0000: 22383: debug : virEventPollDispatchHandles:485 : i=67 w=94 2014-02-19 07:16:24.547+0000: 22383: debug : virEventPollDispatchHandles:485 : i=68 w=95 2014-02-19 07:16:24.547+0000: 22383: debug : virEventPollDispatchHandles:485 : i=69 w=96 2014-02-19 07:16:24.547+0000: 22383: debug : virEventPollDispatchHandles:485 : i=70 w=97 2014-02-19 07:16:24.547+0000: 22383: debug : virEventPollDispatchHandles:485 : i=71 w=98 2014-02-19 07:16:24.547+0000: 22383: debug : virEventPollDispatchHandles:485 : i=72 w=99 2014-02-19 07:16:24.547+0000: 22383: debug : virEventPollDispatchHandles:485 : i=73 w=100 2014-02-19 07:16:24.547+0000: 22383: debug : virEventPollDispatchHandles:485 : i=74 w=101 2014-02-19 07:16:24.547+0000: 22383: debug : virEventPollDispatchHandles:485 : i=75 w=102 2014-02-19 07:16:24.547+0000: 22383: debug : virEventPollDispatchHandles:485 : i=76 w=103 2014-02-19 07:16:24.547+0000: 22383: debug : virEventPollDispatchHandles:485 : i=77 w=104 2014-02-19 07:16:24.547+0000: 22383: debug : virEventPollDispatchHandles:485 : i=78 w=105 2014-02-19 07:16:24.547+0000: 22383: debug : virEventPollDispatchHandles:485 : i=79 w=106 2014-02-19 07:16:24.547+0000: 22383: debug : virEventPollDispatchHandles:485 : i=80 w=107 2014-02-19 07:16:24.547+0000: 22383: debug : virEventPollDispatchHandles:485 : i=81 w=108 2014-02-19 07:16:24.547+0000: 22383: debug : virEventPollDispatchHandles:485 : i=82 w=109 2014-02-19 07:16:24.547+0000: 22383: debug : virEventPollDispatchHandles:485 : i=83 w=110 2014-02-19 07:16:24.547+0000: 22383: debug : virEventPollDispatchHandles:485 : i=84 w=111 2014-02-19 07:16:24.547+0000: 22383: debug : virEventPollDispatchHandles:485 : i=85 w=112 2014-02-19 07:16:24.547+0000: 22383: debug : virEventPollDispatchHandles:485 : i=86 w=116 2014-02-19 07:16:24.547+0000: 22383: debug : virEventPollDispatchHandles:485 : i=87 w=117 2014-02-19 07:16:24.547+0000: 22383: debug : virEventPollCleanupTimeouts:517 : Cleanup 10 2014-02-19 07:16:24.547+0000: 22383: debug : virEventPollCleanupHandles:565 : Cleanup 88 2014-02-19 07:16:24.547+0000: 22383: debug : virHookCheck:119 : No hook script /etc/libvirt/hooks/daemon 2014-02-19 07:16:24.547+0000: 22383: debug : virHookCheck:119 : No hook script /etc/libvirt/hooks/qemu 2014-02-19 07:16:24.547+0000: 22383: debug : virHookCheck:119 : No hook script /etc/libvirt/hooks/lxc 2014-02-19 07:16:24.547+0000: 22383: info : virNetlinkEventServiceStopAll:421 : stopping all netlink event services 2014-02-19 07:16:24.547+0000: 22383: debug : virEventPollRemoveHandle:180 : EVENT_POLL_REMOVE_HANDLE: watch=5 2014-02-19 07:16:24.547+0000: 22383: debug : virEventPollRemoveHandle:193 : mark delete 4 12 2014-02-19 07:16:24.547+0000: 22383: debug : virEventPollInterruptLocked:713 : Skip interrupt, 0 140467892094976 2014-02-19 07:16:24.547+0000: 22383: debug : virEventPollRemoveHandle:180 : EVENT_POLL_REMOVE_HANDLE: watch=6 2014-02-19 07:16:24.547+0000: 22383: debug : virEventPollRemoveHandle:193 : mark delete 5 13 2014-02-19 07:16:24.547+0000: 22383: debug : virEventPollInterruptLocked:713 : Skip interrupt, 0 140467892094976 2014-02-19 07:16:24.548+0000: 22383: debug : virObjectUnref:256 : OBJECT_UNREF: obj=0x7fc13b887980 2014-02-19 07:16:24.548+0000: 22383: debug : virObjectUnref:256 : OBJECT_UNREF: obj=0x7fc13b8879b0 2014-02-19 07:16:24.548+0000: 22383: debug : virObjectUnref:256 : OBJECT_UNREF: obj=0x7fc13b887c20 2014-02-19 07:16:24.548+0000: 22383: debug : virFileClose:90 : Closed fd 10 2014-02-19 07:16:24.548+0000: 22383: debug : virFileClose:90 : Closed fd 11 2014-02-19 07:16:24.548+0000: 22383: debug : virObjectUnref:256 : OBJECT_UNREF: obj=0x7fc13b87d9b0 2014-02-19 07:16:24.548+0000: 22383: debug : virObjectUnref:258 : OBJECT_DISPOSE: obj=0x7fc13b87d9b0 2014-02-19 07:16:24.548+0000: 22383: debug : virEventPollUpdateHandle:147 : EVENT_POLL_UPDATE_HANDLE: watch=3 events=0 2014-02-19 07:16:24.548+0000: 22383: debug : virEventPollInterruptLocked:713 : Skip interrupt, 0 140467892094976 2014-02-19 07:16:24.548+0000: 22383: debug : virEventPollUpdateHandle:147 : EVENT_POLL_UPDATE_HANDLE: watch=4 events=0 2014-02-19 07:16:24.548+0000: 22383: debug : virEventPollInterruptLocked:713 : Skip interrupt, 0 140467892094976 ^C^C^C^C^C^C^C^C^C^C^C Please use the current upstream if you want some particular patches included. Cherry-picking some patches may break it as they may not work on them own. Generally this is a upstream bug tracker and we don't support custom downstream versions here. I reopen this bug, because I can reproduce it at libvirt-1.1.3.4 release branch without any other patches. I got the tarball by using `wget http://libvirt.org/sources/stable_updates/libvirt-1.1.3.4.tar.gz`, then I configure and make/ make install it, this bug also can be reproduced after I following the reproduce steps in comment #1. This bug also exists in 1.2.2 and master branch. I can reproduce it at v1.2.1-maint but cann't reproduce it at v1.2.1,so I believe this bug is imported by the maintains patches after v1.2.1 release. (In reply to Wangpan from comment #6) > I can reproduce it at v1.2.1-maint but cann't reproduce it at v1.2.1,so I > believe this bug is imported by the maintains patches after v1.2.1 release. Can you run 'git bisect' to determine exactly which commit introduces the problem? hzwangpan@10-120-120-22:~/libvirt$ git bisect good
c5d10b7a3d685712e17166a9b181e4d02dd7e6c9 is the first bad commit
commit c5d10b7a3d685712e17166a9b181e4d02dd7e6c9
Author: Daniel P. Berrange <berrange>
Date: Wed Jan 22 17:28:29 2014 +0000
Push nwfilter update locking up to top level
The NWFilter code has as a deadlock race condition between
the virNWFilter{Define,Undefine} APIs and starting of guest
VMs due to mis-matched lock ordering.
In the virNWFilter{Define,Undefine} codepaths the lock ordering
is
1. nwfilter driver lock
2. virt driver lock
3. nwfilter update lock
4. domain object lock
In the VM guest startup paths the lock ordering is
1. virt driver lock
2. domain object lock
3. nwfilter update lock
As can be seen the domain object and nwfilter update locks are
not acquired in a consistent order.
The fix used is to push the nwfilter update lock upto the top
level resulting in a lock ordering for virNWFilter{Define,Undefine}
of
1. nwfilter driver lock
2. nwfilter update lock
3. virt driver lock
4. domain object lock
and VM start using
1. nwfilter update lock
2. virt driver lock
3. domain object lock
This has the effect of serializing VM startup once again, even if
no nwfilters are applied to the guest. There is also the possibility
of deadlock due to a call graph loop via virNWFilterInstantiate
and virNWFilterInstantiateFilterLate.
These two problems mean the lock must be turned into a read/write
lock instead of a plain mutex at the same time. The lock is used to
serialize changes to the "driver->nwfilters" hash, so the write lock
only needs to be held by the define/undefine methods. All other
methods can rely on a read lock which allows good concurrency.
Signed-off-by: Daniel P. Berrange <berrange>
(cherry picked from commit 6e5c79a1b5a8b3a23e7df7ffe58fb272aa17fbfb)
:040000 040000 db2b2e3e98781b3e17deb2b0e129a63eb5f44f3f 70472daff126b86aab41544df7488d09001cc10b M src
hzwangpan@10-120-120-22:~/libvirt$ git bisect good
c5d10b7a3d685712e17166a9b181e4d02dd7e6c9 is the first bad commit
commit c5d10b7a3d685712e17166a9b181e4d02dd7e6c9
Author: Daniel P. Berrange <berrange>
Date: Wed Jan 22 17:28:29 2014 +0000
Push nwfilter update locking up to top level
The NWFilter code has as a deadlock race condition between
the virNWFilter{Define,Undefine} APIs and starting of guest
VMs due to mis-matched lock ordering.
In the virNWFilter{Define,Undefine} codepaths the lock ordering
is
1. nwfilter driver lock
2. virt driver lock
3. nwfilter update lock
4. domain object lock
In the VM guest startup paths the lock ordering is
1. virt driver lock
2. domain object lock
3. nwfilter update lock
As can be seen the domain object and nwfilter update locks are
not acquired in a consistent order.
The fix used is to push the nwfilter update lock upto the top
level resulting in a lock ordering for virNWFilter{Define,Undefine}
of
1. nwfilter driver lock
2. nwfilter update lock
3. virt driver lock
4. domain object lock
and VM start using
1. nwfilter update lock
2. virt driver lock
3. domain object lock
This has the effect of serializing VM startup once again, even if
no nwfilters are applied to the guest. There is also the possibility
of deadlock due to a call graph loop via virNWFilterInstantiate
and virNWFilterInstantiateFilterLate.
These two problems mean the lock must be turned into a read/write
lock instead of a plain mutex at the same time. The lock is used to
serialize changes to the "driver->nwfilters" hash, so the write lock
only needs to be held by the define/undefine methods. All other
methods can rely on a read lock which allows good concurrency.
Signed-off-by: Daniel P. Berrange <berrange>
(cherry picked from commit 6e5c79a1b5a8b3a23e7df7ffe58fb272aa17fbfb)
:040000 040000 db2b2e3e98781b3e17deb2b0e129a63eb5f44f3f 70472daff126b86aab41544df7488d09001cc10b M src
If you are getting a hang, then please provide a stack trace of libvirtd using 'thread apply all bt' and attach to this bug Created attachment 871847 [details]
The full stack trace by (thread apply all bt)
Please see the attachment file 'Threads_bt.txt'
Ah ha, so we have one thread doing autostart of VMs and a bunch of other threads which are reconnecting to existing running VMs. These are deadlocking with each other for some reason. This looks like a genuine bug. Proposed fixed upstream https://www.redhat.com/archives/libvir-list/2014-March/msg00501.html I have tested this patch on master branch, and it works OK now, thanks Daniel! Fixed by:
commit 925de19ed7f13e0d12d0b993496d314bab886589
Author: Daniel P. Berrange <berrange>
CommitDate: 2014-03-10 11:56:45 +0000
Add a mutex to serialize updates to firewall
git describe: v1.2.2-58-g925de19 contains: v1.2.3-rc1~329
|