Hide Forgot
+++ This bug was initially created as a clone of Bug #929412 +++ We're seeing deadlocks under 1.0.3. I'll attach a traceback, but it looks like virNWFilterDomainFWUpdateCB is trying to take a lock on an object while holding updateMutex (and blocking), and virNWFilterInstantiateFilter is trying to take updateMutex. We didn't see this in 1.0.2. 37abd471656957c76eac687ce2ef94d79c8e2731 seems like a plausible candidate? --- Additional comment from Daniel Veillard on 2013-03-30 12:46:08 EDT --- Hum, I didn't see an obvious patch for such an issue in the git commits since v1.0.3, but if you have time giving a try to 1.0.4-rc2 it is available at ftp://libvirt.org/libvirt/ Thanks for the backtrace, I see a thread in qemuNetworkIfaceConnect too do you have a specific scenario to reproduce this ? That libvirtd is quite busy ! --- Additional comment from Matthew Garrett on 2013-03-30 14:07:29 EDT --- I'll see if I can get a full description of the reproduction case set up and give 1.0.4 a go - it'll be some time next week. --- Additional comment from Matthew Garrett on 2013-11-25 13:40:46 EST --- Still seeing this with 1.1.4, in exactly the same circumstances. This is while we're doing load testing, so there's a large number of instances being created and destroyed at around the same time. I don't have a trivial reproduction case. --- Additional comment from Dave Allan on 2013-11-25 13:56:14 EST --- Roughly how often are you seeing this and are you willing to install test builds to try to identify the source? --- Additional comment from Matthew Garrett on 2013-11-25 14:00:18 EST --- 2 or 3 days under heavy load is enough to trigger it. This is a test environment, so I can test patches. The cause seems to be that the virDomainCreateWithFlags()→_virNWFilterInstantiateFilter() path calls virObjectLock() and then virNWFilterLockFilterUpdates(), while the remoteDispatchNWFilterUndefine()→virNWFilterDomainFWUpdateCB() path calls virNWFilterLockFilterUpdates() and then virObjectLock(). --- Additional comment from Daniel Berrange on 2013-11-26 05:56:07 EST --- Confirmed from inspection that the lock ordering is fubar here. In addition to the nwfilterUndefine method, the nwfilterDefineXML will suffer the same flaw. The code naively assumed that making the nwfilter mutex recursive would avoid the issuing, ignoring the fact that the domain lock filter is not recursive. The code should have been written to avoid the recursively locking completely.
Created attachment 834797 [details] backtrace from deadlocked libvirtd
Hi Laine, Could you show the reproduce steps of this deadlock?
This bug was originally filed against upstream (Bug 929412) by Matthew Garrett. He may be able to offer more details about his test setup, but it looks like you should be able to trigger it by rapidly doing a lot of start/destroy of domains which have <filterref> elements, while simultaneously defining/undefining nwfilter rules - the deadlock occurs if a domain operation (start/destroy, for example) happens to be holding the nwfilter lock at the time a nwfilter operation (nwfilter-define/nwfilter-undefine) tries to scan all domains to update their filters after adding/removing a filter from the list.
(In reply to Laine Stump from comment #3) > This bug was originally filed against upstream (Bug 929412) by Matthew > Garrett. He may be able to offer more details about his test setup, but it > looks like you should be able to trigger it by rapidly doing a lot of > start/destroy of domains which have <filterref> elements, while > simultaneously defining/undefining nwfilter rules - the deadlock occurs if a > domain operation (start/destroy, for example) happens to be holding the > nwfilter lock at the time a nwfilter operation > (nwfilter-define/nwfilter-undefine) tries to scan all domains to update > their filters after adding/removing a filter from the list. Thanks Laine, I can reproduce it with your suggestion.
Upstream fixed in commits commit 6e5c79a1b5a8b3a23e7df7ffe58fb272aa17fbfb Author: Daniel P. Berrange <berrange> Date: Wed Jan 22 17:28:29 2014 +0000 Push nwfilter update locking up to top level commit c065984b58000a44c90588198d222a314ac532fd Author: Daniel P. Berrange <berrange> Date: Wed Jan 22 15:26:21 2014 +0000 Add a read/write lock implementation
backported patches have been tested on RHEL7 - deadlock easily occurred without the patches, and was eliminated with them. Posted to rhvirt-patches: http://post-office.corp.redhat.com/archives/rhvirt-patches/2014-February/msg00079.html
Verify the bug as follows. The result is expected. Move its status to VERIFIED. Versions libvirt-1.1.1-22.el7.x86_64 Steps 1. prepare a nwfilter xml and define it # cat disallow-arp.xml <filter name='disallow-arp' chain='arp'> <rule action='drop' direction='inout' priority='500'/> </filter> # virsh nwfilter-define disallow-arp.xml Network filter disallow-arp defined from disallow-arp.xml 2. modify guest xml # virsh dumpxml r6.4 <domain type='kvm'> ...... <interface type='network'> <mac address='52:54:00:e7:33:6c'/> <source network='default1'/> <model type='rtl8139'/> <filterref filter='disallow-arp'/> <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/> </interface> ...... </domain> 3. execute start and destroy guest operation # while true ; do virsh start r6.4 ; sleep 1 ; virsh destroy r6.4 ; done 4. In another terminal, execute define and undefine nwfilter operation # while true; do virsh nwfilter-undefine disallow-arp; sleep 1; virsh nwfilter-define disallow-arp.xml; done Result No deadlock occurs, libvirt works fine.
This request was resolved in Red Hat Enterprise Linux 7.0. Contact your manager or support representative in case you have further questions about the request.