Bug 2044379

Summary: virtnwfilterd modular daemon occasionally hangs on concurrent access
Product: Red Hat Enterprise Linux 9 Reporter: Erik Skultety <eskultet>
Component: libvirtAssignee: Daniel Berrangé <berrange>
libvirt sub component: General QA Contact: yafu <yafu>
Status: CLOSED ERRATA Docs Contact:
Severity: high    
Priority: high CC: berrange, jdenemar, jsuchane, kchamart, mprivozn, pvlasin, smitterl, virt-maint, xuzhang, yalzhang
Version: 9.1Keywords: AutomationTriaged, Triaged
Target Milestone: rc   
Target Release: ---   
Hardware: x86_64   
OS: Unspecified   
Whiteboard:
Fixed In Version: libvirt-8.0.0-7.el9_0 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 2065399 (view as bug list) Environment:
Last Closed: 2022-05-17 12:46:17 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 2065399    
Attachments:
Description Flags
Test data/scripts + libvirt debug logs none

Description Erik Skultety 2022-01-24 13:38:22 UTC
Created attachment 1853069 [details]
Test data/scripts + libvirt debug logs

Description of problem:

Modular daemons occasionally hang on concurrent access, more specifically, this problem occurred when accessing virtqemud and virtnwfilterd daemons concurrently.
At the same time, the issue never occurred with the "legacy" monolithic setup out of 50 test runs.

Version-Release number of selected component (if applicable):

libvirt-8.1.0 (tested), possibly even with <8.0.0


Reproducibility rate: <20%


Steps to Reproduce:
1. Unpack the 'reproducer' directory in the attachment tarball
2. $ cd reproducer
3. $ for i in `seq 1 50`; do
      echo "RUN: $i/50"
      echo '############'
      ./nwfilter_concurrent.sh
     done
4. At some point the execution will hang which will trigger the timeout in the script
5. '$ virsh list' hangs
   '$ virsh nwfilter-list' hangs
6. '$ systemctl restart virtqemud virtnwfilterd' takes time (likely triggering a timeout in systemd which then restarts the services forcefully)

Actual results:
The test execution hangs triggering a timeout in the test

Expected results:
All 50 test runs should pass

Additional info:
- debug logs are available in the attachment
- the test script in the attachment is a simplified version of what currently exists in the official libvirt TCK test suite
- FWIW for analysis the reproducibility rate on Fedora 35 is much higher

Comment 1 Jaroslav Suchanek 2022-01-27 15:51:32 UTC
Daniel, would you please look into this? Thanks.

Comment 2 Michal Privoznik 2022-02-01 10:38:11 UTC
I was able to reproduce and get stack trace (leaving out workers that do nothing):

Thread 9 (Thread 0x7f6f4dffb640 (LWP 6107) "prio-rpc-virtnw"):
#0  0x00007f6f58ee9639 in __futex_abstimed_wait_common64 () at /lib64/libpthread.so.0
#1  0x00007f6f58ee1d2f in pthread_rwlock_wrlock () at /lib64/libpthread.so.0
#2  0x00007f6f59ceb755 in virRWLockWrite (m=0x7f6f0801bca8) at ../src/util/virthread.c:124
#3  0x00007f6f59cbda06 in virObjectRWLockWrite (anyobj=0x7f6f0801bc90) at ../src/util/virobject.c:507
#4  0x00007f6f59de38c9 in virNWFilterBindingObjListRemove (bindings=0x7f6f0801bc90, binding=0x7f6f2c00b870) at ../src/conf/virnwfilterbindingobjlist.c:249
#5  0x00007f6f59b49067 in nwfilterBindingDelete (binding=0x7f6f34009d50) at ../src/nwfilter/nwfilter_driver.c:807
#6  0x00007f6f59ffd6cf in virNWFilterBindingDelete (binding=0x7f6f34009d50) at ../src/libvirt-nwfilter.c:794
#7  0x00005575c68114d8 in remoteDispatchNWFilterBindingDelete (server=0x5575c7e14880, client=0x5575c7e220e0, msg=0x5575c7e36e60, rerr=0x7f6f4dffa940, args=0x7f6f340199d0) at src/remote/remote_daemon_dispatch_stubs.h:16762
#8  0x00005575c67ebfa6 in remoteDispatchNWFilterBindingDeleteHelper (server=0x5575c7e14880, client=0x5575c7e220e0, msg=0x5575c7e36e60, rerr=0x7f6f4dffa940, args=0x7f6f340199d0, ret=0x0) at src/remote/remote_daemon_dispatch_stubs.h:16741
#9  0x00007f6f59e1c574 in virNetServerProgramDispatchCall (prog=0x5575c7e16c10, server=0x5575c7e14880, client=0x5575c7e220e0, msg=0x5575c7e36e60) at ../src/rpc/virnetserverprogram.c:428
#10 0x00007f6f59e1c065 in virNetServerProgramDispatch (prog=0x5575c7e16c10, server=0x5575c7e14880, client=0x5575c7e220e0, msg=0x5575c7e36e60) at ../src/rpc/virnetserverprogram.c:302
#11 0x00007f6f59e278b1 in virNetServerProcessMsg (srv=0x5575c7e14880, client=0x5575c7e220e0, prog=0x5575c7e16c10, msg=0x5575c7e36e60) at ../src/rpc/virnetserver.c:140
#12 0x00007f6f59e250c6 in virNetServerHandleJob (jobOpaque=0x5575c7e328b0, opaque=0x5575c7e14880) at ../src/rpc/virnetserver.c:160
#13 0x00007f6f59ced5a6 in virThreadPoolWorker (opaque=0x5575c7e0e380) at ../src/util/virthreadpool.c:164
#14 0x00007f6f59cebc00 in virThreadHelper (data=0x5575c7e0e3a0) at ../src/util/virthread.c:241
#15 0x00007f6f58edce1e in start_thread () at /lib64/libpthread.so.0
#16 0x00007f6f596c3a0f in clone () at /lib64/libc.so.6


Thread 6 (Thread 0x7f6f4f7fe640 (LWP 6104) "rpc-virtnwfilte"):
#0  0x00007f6f58ee62ab in __lll_lock_wait () at /lib64/libpthread.so.0
#1  0x00007f6f58edf0b3 in pthread_mutex_lock () at /lib64/libpthread.so.0
#2  0x00007f6f59ceb675 in virMutexLock (m=0x7f6f08021c50) at ../src/util/virthread.c:91
#3  0x00007f6f59b49280 in nwfilterDriverLock () at ../src/nwfilter/nwfilter_driver.c:62
#4  0x00007f6f59b4872a in nwfilterDefineXMLFlags (conn=0x7f6f440038d0, xml=0x7f6f3c01a690 "<filter name='tck-vm2-filter' chain='root'>\n  <uuid>364d2617-43d0-7694-b479-320b72dac187</uuid>\n  <filterref filter='clean-traffic'/>\n  <rule action='accept' direction='in' priority='500'>\n    <all co"..., flags=0) at ../src/nwfilter/nwfilter_driver.c:548
#5  0x00007f6f59b485df in nwfilterDefineXML (conn=0x7f6f440038d0, xml=0x7f6f3c01a690 "<filter name='tck-vm2-filter' chain='root'>\n  <uuid>364d2617-43d0-7694-b479-320b72dac187</uuid>\n  <filterref filter='clean-traffic'/>\n  <rule action='accept' direction='in' priority='500'>\n    <all co"...) at ../src/nwfilter/nwfilter_driver.c:584
#6  0x00007f6f59ffc498 in virNWFilterDefineXML (conn=0x7f6f440038d0, xmlDesc=0x7f6f3c01a690 "<filter name='tck-vm2-filter' chain='root'>\n  <uuid>364d2617-43d0-7694-b479-320b72dac187</uuid>\n  <filterref filter='clean-traffic'/>\n  <rule action='accept' direction='in' priority='500'>\n    <all co"...) at ../src/libvirt-nwfilter.c:398
#7  0x00005575c67ff0fc in remoteDispatchNWFilterDefineXML (server=0x5575c7e14880, client=0x5575c7e221f0, msg=0x5575c7e24da0, rerr=0x7f6f4f7fd940, args=0x7f6f3c029410, ret=0x7f6f3c0229d0) at src/remote/remote_daemon_dispatch_stubs.h:16925
#8  0x00005575c67e24ea in remoteDispatchNWFilterDefineXMLHelper (server=0x5575c7e14880, client=0x5575c7e221f0, msg=0x5575c7e24da0, rerr=0x7f6f4f7fd940, args=0x7f6f3c029410, ret=0x7f6f3c0229d0) at src/remote/remote_daemon_dispatch_stubs.h:16906
#9  0x00007f6f59e1c574 in virNetServerProgramDispatchCall (prog=0x5575c7e16c10, server=0x5575c7e14880, client=0x5575c7e221f0, msg=0x5575c7e24da0) at ../src/rpc/virnetserverprogram.c:428
#10 0x00007f6f59e1c065 in virNetServerProgramDispatch (prog=0x5575c7e16c10, server=0x5575c7e14880, client=0x5575c7e221f0, msg=0x5575c7e24da0) at ../src/rpc/virnetserverprogram.c:302
#11 0x00007f6f59e278b1 in virNetServerProcessMsg (srv=0x5575c7e14880, client=0x5575c7e221f0, prog=0x5575c7e16c10, msg=0x5575c7e24da0) at ../src/rpc/virnetserver.c:140
#12 0x00007f6f59e250c6 in virNetServerHandleJob (jobOpaque=0x5575c7e332f0, opaque=0x5575c7e14880) at ../src/rpc/virnetserver.c:160
#13 0x00007f6f59ced5a6 in virThreadPoolWorker (opaque=0x5575c7e11db0) at ../src/util/virthreadpool.c:164
#14 0x00007f6f59cebc00 in virThreadHelper (data=0x5575c7e11dd0) at ../src/util/virthread.c:241
#15 0x00007f6f58edce1e in start_thread () at /lib64/libpthread.so.0
#16 0x00007f6f596c3a0f in clone () at /lib64/libc.so.6


Thread 3 (Thread 0x7f6f55674640 (LWP 6101) "rpc-virtnwfilte"):
#0  0x00007f6f58ee62ab in __lll_lock_wait () at /lib64/libpthread.so.0
#1  0x00007f6f58edf130 in pthread_mutex_lock () at /lib64/libpthread.so.0
#2  0x00007f6f59ceb675 in virMutexLock (m=0x7f6f3c01bbe0) at ../src/util/virthread.c:91
#3  0x00007f6f59de4655 in virNWFilterObjLock (obj=0x7f6f3c01bbe0) at ../src/conf/virnwfilterobj.c:545
#4  0x00007f6f59de4766 in virNWFilterObjListFindByName (nwfilters=0x7f6f0801fe20, name=0x7f6f50002980 "tck-vm1-filter") at ../src/conf/virnwfilterobj.c:177
#5  0x00007f6f59de47ed in virNWFilterObjListFindInstantiateFilter (nwfilters=0x7f6f0801fe20, filtername=0x7f6f50002980 "tck-vm1-filter") at ../src/conf/virnwfilterobj.c:194
#6  0x00007f6f59b49fba in virNWFilterInstantiateFilterUpdate (driver=0x7f6f08021c50, teardownOld=true, binding=0x7f6f5000c5c0, ifindex=1237, useNewFilter=INSTANTIATE_ALWAYS, forceWithPendingReq=false, foundNewFilter=0x7f6f55673617) at ../src/nwfilter/nwfilter_gentech_driver.c:667
#7  0x00007f6f59b4a353 in virNWFilterInstantiateFilterInternal (driver=0x7f6f08021c50, binding=0x7f6f5000c5c0, teardownOld=true, useNewFilter=INSTANTIATE_ALWAYS, foundNewFilter=0x7f6f55673617) at ../src/nwfilter/nwfilter_gentech_driver.c:736
#8  0x00007f6f59b4a289 in virNWFilterInstantiateFilter (driver=0x7f6f08021c50, binding=0x7f6f5000c5c0) at ../src/nwfilter/nwfilter_gentech_driver.c:787
#9  0x00007f6f59b48ec9 in nwfilterBindingCreateXML (conn=0x7f6f08014590, xml=0x7f6f5000e490 "<filterbinding>\n  <owner>\n    <name>tck-vm1</name>\n    <uuid>24f6d329-50b1-4feb-a234-4d6193939498</uuid>\n  </owner>\n  <portdev name='tck-vm1-if0'/>\n  <mac address='52:54:00:a8:05:b9'/>\n  <filterref fi"..., flags=0) at ../src/nwfilter/nwfilter_driver.c:763
#10 0x00007f6f59ffd4e1 in virNWFilterBindingCreateXML (conn=0x7f6f08014590, xml=0x7f6f5000e490 "<filterbinding>\n  <owner>\n    <name>tck-vm1</name>\n    <uuid>24f6d329-50b1-4feb-a234-4d6193939498</uuid>\n  </owner>\n  <portdev name='tck-vm1-if0'/>\n  <mac address='52:54:00:a8:05:b9'/>\n  <filterref fi"..., flags=0) at ../src/libvirt-nwfilter.c:751
#11 0x00005575c6811413 in remoteDispatchNWFilterBindingCreateXML (server=0x5575c7e14880, client=0x5575c7e22520, msg=0x5575c7def440, rerr=0x7f6f55673940, args=0x7f6f50003420, ret=0x7f6f500027b0) at src/remote/remote_daemon_dispatch_stubs.h:16707
#12 0x00005575c67ebeca in remoteDispatchNWFilterBindingCreateXMLHelper (server=0x5575c7e14880, client=0x5575c7e22520, msg=0x5575c7def440, rerr=0x7f6f55673940, args=0x7f6f50003420, ret=0x7f6f500027b0) at src/remote/remote_daemon_dispatch_stubs.h:16688
#13 0x00007f6f59e1c574 in virNetServerProgramDispatchCall (prog=0x5575c7e16c10, server=0x5575c7e14880, client=0x5575c7e22520, msg=0x5575c7def440) at ../src/rpc/virnetserverprogram.c:428
#14 0x00007f6f59e1c065 in virNetServerProgramDispatch (prog=0x5575c7e16c10, server=0x5575c7e14880, client=0x5575c7e22520, msg=0x5575c7def440) at ../src/rpc/virnetserverprogram.c:302
#15 0x00007f6f59e278b1 in virNetServerProcessMsg (srv=0x5575c7e14880, client=0x5575c7e22520, prog=0x5575c7e16c10, msg=0x5575c7def440) at ../src/rpc/virnetserver.c:140
#16 0x00007f6f59e250c6 in virNetServerHandleJob (jobOpaque=0x5575c7e400a0, opaque=0x5575c7e14880) at ../src/rpc/virnetserver.c:160
#17 0x00007f6f59ced5a6 in virThreadPoolWorker (opaque=0x5575c7e125c0) at ../src/util/virthreadpool.c:164
#18 0x00007f6f59cebc00 in virThreadHelper (data=0x5575c7e125e0) at ../src/util/virthread.c:241
#19 0x00007f6f58edce1e in start_thread () at /lib64/libpthread.so.0
#20 0x00007f6f596c3a0f in clone () at /lib64/libc.so.6


Thread 2 (Thread 0x7f6f55e75640 (LWP 6100) "rpc-virtnwfilte"):
#0  0x00007f6f58ee62ab in __lll_lock_wait () at /lib64/libpthread.so.0
#1  0x00007f6f58edf130 in pthread_mutex_lock () at /lib64/libpthread.so.0
#2  0x00007f6f59ceb675 in virMutexLock (m=0x7f6f59b65a60 <updateMutex>) at ../src/util/virthread.c:91
#3  0x00007f6f59b4a2e7 in virNWFilterInstantiateFilterInternal (driver=0x7f6f08021c50, binding=0x7f6f3c008d10, teardownOld=false, useNewFilter=INSTANTIATE_FOLLOW_NEWFILTER, foundNewFilter=0x7f6f55e74357) at ../src/nwfilter/nwfilter_gentech_driver.c:722
#4  0x00007f6f59b4a3cd in virNWFilterUpdateInstantiateFilter (driver=0x7f6f08021c50, binding=0x7f6f3c008d10, skipIface=0x7f6f55e743c7) at ../src/nwfilter/nwfilter_gentech_driver.c:801
#5  0x00007f6f59b4bd79 in virNWFilterBuildOne (driver=0x7f6f08021c50, binding=0x7f6f3c008d10, skipInterfaces=Python Exception <class 'gdb.error'>: There is no member named keys.
0x7f6f3400fcc0, step=0) at ../src/nwfilter/nwfilter_gentech_driver.c:923
#6  0x00007f6f59b4a694 in virNWFilterBuildIter (binding=0x7f6f2c00b870, opaque=0x7f6f55e74540) at ../src/nwfilter/nwfilter_gentech_driver.c:966
#7  0x00007f6f59de3e81 in virNWFilterBindingObjListHelper (payload=0x7f6f2c00b870, name=0x7f6f3c002e50 "tck-vm2-if0", opaque=0x7f6f55e744d0) at ../src/conf/virnwfilterbindingobjlist.c:346
#8  0x00007f6f59c799a2 in virHashForEachSafe (table=Python Exception <class 'gdb.error'>: There is no member named keys.
0x7f6f08017b60, iter=0x7f6f59de3e50 <virNWFilterBindingObjListHelper>, opaque=0x7f6f55e744d0) at ../src/util/virhash.c:392
#9  0x00007f6f59de3e19 in virNWFilterBindingObjListForEach (bindings=0x7f6f0801bc90, callback=0x7f6f59b4a650 <virNWFilterBuildIter>, opaque=0x7f6f55e74540) at ../src/conf/virnwfilterbindingobjlist.c:361
#10 0x00007f6f59b4a50d in virNWFilterBuildAll (driver=0x7f6f08021c50, newFilters=true) at ../src/nwfilter/nwfilter_gentech_driver.c:986
#11 0x00007f6f59b49a52 in virNWFilterTriggerRebuildImpl (opaque=0x7f6f08021c50) at ../src/nwfilter/nwfilter_driver.c:147
#12 0x00007f6f59ddabf6 in virNWFilterTriggerRebuild () at ../src/conf/nwfilter_conf.c:3104
#13 0x00007f6f59de4b70 in virNWFilterObjListAssignDef (nwfilters=0x7f6f0801fe20, def=0x7f6f480078e0) at ../src/conf/virnwfilterobj.c:356
#14 0x00007f6f59b4877d in nwfilterDefineXMLFlags (conn=0x7f6f44003ed0, xml=0x7f6f48024f30 "<filter name='tck-vm1-filter' chain='root'>\n  <uuid>464d2617-43d0-7694-b479-320b72dac187</uuid>\n  <filterref filter='clean-traffic'/>\n  <rule action='accept' direction='in' priority='500'>\n    <all co"..., flags=0) at ../src/nwfilter/nwfilter_driver.c:557
#15 0x00007f6f59b485df in nwfilterDefineXML (conn=0x7f6f44003ed0, xml=0x7f6f48024f30 "<filter name='tck-vm1-filter' chain='root'>\n  <uuid>464d2617-43d0-7694-b479-320b72dac187</uuid>\n  <filterref filter='clean-traffic'/>\n  <rule action='accept' direction='in' priority='500'>\n    <all co"...) at ../src/nwfilter/nwfilter_driver.c:584
#16 0x00007f6f59ffc498 in virNWFilterDefineXML (conn=0x7f6f44003ed0, xmlDesc=0x7f6f48024f30 "<filter name='tck-vm1-filter' chain='root'>\n  <uuid>464d2617-43d0-7694-b479-320b72dac187</uuid>\n  <filterref filter='clean-traffic'/>\n  <rule action='accept' direction='in' priority='500'>\n    <all co"...) at ../src/libvirt-nwfilter.c:398
#17 0x00005575c67ff0fc in remoteDispatchNWFilterDefineXML (server=0x5575c7e14880, client=0x5575c7e22300, msg=0x5575c7e216d0, rerr=0x7f6f55e74940, args=0x7f6f48022080, ret=0x7f6f4801d3f0) at src/remote/remote_daemon_dispatch_stubs.h:16925
#18 0x00005575c67e24ea in remoteDispatchNWFilterDefineXMLHelper (server=0x5575c7e14880, client=0x5575c7e22300, msg=0x5575c7e216d0, rerr=0x7f6f55e74940, args=0x7f6f48022080, ret=0x7f6f4801d3f0) at src/remote/remote_daemon_dispatch_stubs.h:16906
#19 0x00007f6f59e1c574 in virNetServerProgramDispatchCall (prog=0x5575c7e16c10, server=0x5575c7e14880, client=0x5575c7e22300, msg=0x5575c7e216d0) at ../src/rpc/virnetserverprogram.c:428
#20 0x00007f6f59e1c065 in virNetServerProgramDispatch (prog=0x5575c7e16c10, server=0x5575c7e14880, client=0x5575c7e22300, msg=0x5575c7e216d0) at ../src/rpc/virnetserverprogram.c:302
#21 0x00007f6f59e278b1 in virNetServerProcessMsg (srv=0x5575c7e14880, client=0x5575c7e22300, prog=0x5575c7e16c10, msg=0x5575c7e216d0) at ../src/rpc/virnetserver.c:140
#22 0x00007f6f59e250c6 in virNetServerHandleJob (jobOpaque=0x5575c7e376d0, opaque=0x5575c7e14880) at ../src/rpc/virnetserver.c:160
#23 0x00007f6f59ced5a6 in virThreadPoolWorker (opaque=0x5575c7e0f270) at ../src/util/virthreadpool.c:164
#24 0x00007f6f59cebc00 in virThreadHelper (data=0x5575c7e12b80) at ../src/util/virthread.c:241
#25 0x00007f6f58edce1e in start_thread () at /lib64/libpthread.so.0
#26 0x00007f6f596c3a0f in clone () at /lib64/libc.so.6


Here, Thread 6 is trying to acquire driver lock which is held by Thread 2, which happens before any other lock is acquired. Thus Thread 6 can be left out.
Thread 9 doesn't hold any locks, but is trying to lock driver->bindings for write so that it can remove virNWFilterBindingObj(filter="tck-vm2-filter"). This object is unrelated.
That leaves us with the following:

Thread 3 holds nwfilter_gentech_driver.c:updateMutex and wants to lock virNWFilterObj(name="tck-vm1-filter") but this object is locked by Thread 2, which is in turn trying to lock the updateMutex (in virNWFilterInstantiateFilterInternal()) because it just defined an NWFilter and wants to instantiate it. Thus, we need to make sure virNWFilterDefineXML (Thread 2) and virNWFilterBindingCreateXML (Thread 3) don't run at the same time. Or, when they do the lock ordering must be preserved (but what's the correct order and what locks to consider is unclear at the moment).

Comment 3 Daniel Berrangé 2022-02-01 12:13:25 UTC
This all looks like a pre-existing problem self-contained within the nwfilter driver, and should exhibit deadlock regardless of whether run in the modular daemon or monolithic  daemon.

Comment 4 Michal Privoznik 2022-02-01 12:36:31 UTC
Agreed, but maybe it's easier to trigger in split daemon because all those virNWFilterReadLockFilterUpdates() called from qemu driver apply to nwfilter driver too (in monolithic scenario), but not in split daemon? I mean, in the reproducer  there's one process doing virsh create && virsh destroy in a loop. So maybe that monolithic daemon doesn't suffer from this problem (by pure luck).

Comment 5 Daniel Berrangé 2022-02-01 12:43:55 UTC
(In reply to Michal Privoznik from comment #4)
> Agreed, but maybe it's easier to trigger in split daemon because all those
> virNWFilterReadLockFilterUpdates() called from qemu driver apply to nwfilter
> driver too (in monolithic scenario), but not in split daemon? I mean, in the
> reproducer  there's one process doing virsh create && virsh destroy in a
> loop. So maybe that monolithic daemon doesn't suffer from this problem (by
> pure luck).

Oh I totally missed that, and its indeed probably the core of the problem. In the monolithic daemon we've got the QEMU driver getting blocked from triggering more work in the nwfilter driver via virNWFilterReadLockFilterUpdates, but that can't happen in the modular daemon case, because each daemon has its own private copy of this mutex.

For modular daemons to work there cannot be any back flow where a secondary driver can affect the primary virt drivers in this way, as it essentially creates a mutual dependancy.

So we need to figure out a way to eliminate the virNWFilterReadLockFilterUpdates method entirely and have nwfilter driver "do the right thing" in a self contained manner. From experiance fighting the nwfilter driver locking, this is going to be painful to work out a solution for :-(

Comment 7 Daniel Berrangé 2022-02-25 13:45:47 UTC
Created attachment 1863314 [details]
Go demo to reproduce hang

Comment 8 Daniel Berrangé 2022-02-25 13:54:32 UTC
This attached race.go  demo program is more reliable at reproducing than the earlier shell demo, as it has more concurrency present and directly uses the virNWFilterBinding APIs instead of creating VMs.

As preparation you need to create 10 tap devices

for i in `seq 0 9` ; do ip tuntap add dev race$i mode tap ; done
for i in `seq 0 9` ; do ip link set dev race$i address  fe:54:00:e8:92:0$i ; done
for i in `seq 0 9` ; do brctl addif demo race$i ; done


Then to build the demo

$ cat > go.mod <<EOF 
module libvirt.org/nwfilterrace

go 1.16

require (
	github.com/google/uuid v1.3.0 // indirect
	libvirt.org/go/libvirt v1.8000.0 // indirect
	libvirt.org/go/libvirtxml v1.8000.0 // indirect
)
EOF
$ go build

Start either libvirtd or virtnwfilterd as root, and run  ./nwfilterrace

It'll print characters as it progresses, and it'll quickly stop indicating the deadlock

The deadlock actually occurs in both libvirtd and virtnwfilterd, if you are calling virNWFilterBindingCreateXML directly as this demo does.

If you calling virDomainCreateXML instead which indirectly calls virNWFilterBindingCreateXML then libvirtd is safe, due to cross-driver locking, but virtnwfilterd will still hang.


I've got a patch to fix the deadlock, but this then reveals a second long standing problem - we have writer starvation in the RWLock used to serialize nwfilter access - a frequent create/delete of nwfilter bindings can indefinitely block nwfilter define/undefine.

Comment 9 Daniel Berrangé 2022-02-28 11:07:53 UTC
Patches to fix the deadlock are at:

  https://listman.redhat.com/archives/libvir-list/2022-February/msg00888.html

NB, this doesn't fix the fairness problem leading to starvation, but that aspect isn't a regression.

Comment 10 Daniel Berrangé 2022-03-08 17:54:20 UTC
Merged upstream as

commit 65dc79f50b96b34b2253601b8972d5ca90658f33
Author: Daniel P. Berrangé <berrange>
Date:   Thu Feb 24 18:41:29 2022 +0000

    nwfilter: hold filter update lock when creating/deleting bindings
    
    The nwfilter update lock is historically acquired by the virt
    drivers in order to achieve serialization between nwfilter
    define/undefine, and instantiation/teardown of filters.
    
    When running in the modular daemons, however, the mutex that
    the virt drivers are locking is in a completely different
    process from the mutex that the nwfilter driver is locking.
    
    Serialization is lost and thus call from the virt driver to
    virNWFilterBindingCreateXML can deadlock with a concurrent
    call to the virNWFilterDefineXML method.
    
    The solution is surprisingly easy, the update lock simply
    needs acquiring in the virNWFilterBindingCreateXML method
    and virNWFilterBindingUndefine method instead of in the
    virt drivers.
    
    The only semantic difference here is that when a virtual
    machine has multiple NICs, the instantiation and teardown
    of filters is no longer serialized for the whole VM, but
    rather for each NIC. This should not be a problem since
    the virt drivers already need to cope with tearing down
    a partially created VM where only some of the NICs are
    setup.
    
    Reviewed-by: Laine Stump <laine>
    Signed-off-by: Daniel P. Berrangé <berrange>


commit 5f8b090f421cd6a6c46f44905431491e2d3cf8f5
Author: Daniel P. Berrangé <berrange>
Date:   Thu Feb 24 19:02:32 2022 +0000

    qemu,lxc: remove use to nwfilter update lock
    
    Now that the virNWFilterBinding APIs are using the nwfilter
    update lock directly, there is no need for the virt drivers
    to do it themselves.
    
    Reviewed-by: Laine Stump <laine>
    Signed-off-by: Daniel P. Berrangé <berrange>


These patches are self contained so should be easy to cherry-pick into RHEL 9.0 tree.

Comment 20 yafu 2022-03-24 03:35:37 UTC
Reproduced with libvirt-8.0.0-5.el9.x86_64.

Using the scripts provided by Daniel in comment #8:
1.Download race.go in the attachment;

2.Create 10 tap devices:
# brctl addbr demo
# for i in `seq 0 9` ; do ip tuntap add dev race$i mode tap ; done
# for i in `seq 0 9` ; do ip link set dev race$i address  fe:54:00:e8:92:0$i ; done
# for i in `seq 0 9` ; do brctl addif demo race$i ; done

3.Build the demo:
#cat > go.mod <<EOF 
module libvirt.org/nwfilterrace

go 1.16

require (
	github.com/google/uuid v1.3.0 // indirect
	libvirt.org/go/libvirt v1.8000.0 // indirect
	libvirt.org/go/libvirtxml v1.8000.0 // indirect
)
EOF
# go build

4.Execute the script:
#./nwfilterrace

5.Open another terminal and execute virsh nwfilter-list
#virsh nwfilter-list
hang

6.Check the backtrace:
(gdb) t a a bt

Thread 19 (Thread 0x7f0f457fa640 (LWP 2319) "rpc-virtnwfilte"):
#0  0x00007f0f803fd2ea in __futex_abstimed_wait_common () from /lib64/libc.so.6
#1  0x00007f0f803ffa60 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libc.so.6
#2  0x00007f0f80a5250b in virCondWait (c=c@entry=0x563826429f90, m=m@entry=0x563826429f68) at ../src/util/virthread.c:156
#3  0x00007f0f80a5b493 in virThreadPoolWorker (opaque=<optimized out>) at ../src/util/virthreadpool.c:121
#4  0x00007f0f80a59f29 in virThreadHelper (data=<optimized out>) at ../src/util/virthread.c:241
#5  0x00007f0f804006ca in start_thread () from /lib64/libc.so.6
#6  0x00007f0f803a03f0 in clone3 () from /lib64/libc.so.6

Thread 18 (Thread 0x7f0f23fff640 (LWP 2040) "gdbus"):
#0  0x00007f0f804a3e9f in poll () from /lib64/libc.so.6
#1  0x00007f0f8088d59c in g_main_context_poll (priority=<optimized out>, n_fds=2, fds=0x7f0f24021b80, timeout=<optimized out>, context=0x7f0f2401e9b0) at ../glib/gmain.c:4434
#2  g_main_context_iterate.constprop.0 (context=0x7f0f2401e9b0, block=block@entry=1, dispatch=dispatch@entry=1, self=<optimized out>) at ../glib/gmain.c:4126
#3  0x00007f0f80838463 in g_main_loop_run (loop=0x7f0f2401eaa0) at ../glib/gmain.c:4329
#4  0x00007f0f806c45ca in gdbus_shared_thread_func (user_data=0x7f0f2401e980) at ../gio/gdbusprivate.c:280
#5  0x00007f0f80863d72 in g_thread_proxy (data=0x7f0f24019800) at ../glib/gthread.c:826
#6  0x00007f0f804006ca in start_thread () from /lib64/libc.so.6
--Type <RET> for more, q to quit, c to continue without paging--
#7  0x00007f0f803a03f0 in clone3 () from /lib64/libc.so.6

Thread 17 (Thread 0x7f0f44ff9640 (LWP 2039) "gmain"):
#0  0x00007f0f804a3e9f in poll () from /lib64/libc.so.6
#1  0x00007f0f8088d59c in g_main_context_poll (priority=<optimized out>, n_fds=1, fds=0x7f0f24009e10, timeout=<optimized out>, context=0x7f0f2400d360) at ../glib/gmain.c:4434
#2  g_main_context_iterate.constprop.0 (context=context@entry=0x7f0f2400d360, block=block@entry=1, dispatch=dispatch@entry=1, self=<optimized out>) at ../glib/gmain.c:4126
#3  0x00007f0f808365d3 in g_main_context_iteration (context=0x7f0f2400d360, may_block=may_block@entry=1) at ../glib/gmain.c:4196
#4  0x00007f0f80836621 in glib_worker_main (data=<optimized out>) at ../glib/gmain.c:6089
#5  0x00007f0f80863d72 in g_thread_proxy (data=0x7f0f24009400) at ../glib/gthread.c:826
#6  0x00007f0f804006ca in start_thread () from /lib64/libc.so.6
#7  0x00007f0f803a03f0 in clone3 () from /lib64/libc.so.6

Thread 16 (Thread 0x7f0f45ffb640 (LWP 2037) "rpc-admin"):
#0  0x00007f0f803fd2ea in __futex_abstimed_wait_common () from /lib64/libc.so.6
#1  0x00007f0f803ffa60 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libc.so.6
#2  0x00007f0f80a5250b in virCondWait (c=c@entry=0x56382642ffa0, m=m@entry=0x56382642ff78) at ../src/util/virthread.c:156
#3  0x00007f0f80a5b493 in virThreadPoolWorker (opaque=<optimized out>) at ../src/util/virthreadpool.c:121
--Type <RET> for more, q to quit, c to continue without paging--
#4  0x00007f0f80a59f29 in virThreadHelper (data=<optimized out>) at ../src/util/virthread.c:241
#5  0x00007f0f804006ca in start_thread () from /lib64/libc.so.6
#6  0x00007f0f803a03f0 in clone3 () from /lib64/libc.so.6

Thread 15 (Thread 0x7f0f467fc640 (LWP 2036) "rpc-admin"):
#0  0x00007f0f803fd2ea in __futex_abstimed_wait_common () from /lib64/libc.so.6
#1  0x00007f0f803ffa60 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libc.so.6
#2  0x00007f0f80a5250b in virCondWait (c=c@entry=0x56382642ffa0, m=m@entry=0x56382642ff78) at ../src/util/virthread.c:156
#3  0x00007f0f80a5b493 in virThreadPoolWorker (opaque=<optimized out>) at ../src/util/virthreadpool.c:121
#4  0x00007f0f80a59f29 in virThreadHelper (data=<optimized out>) at ../src/util/virthread.c:241
#5  0x00007f0f804006ca in start_thread () from /lib64/libc.so.6
#6  0x00007f0f803a03f0 in clone3 () from /lib64/libc.so.6

Thread 14 (Thread 0x7f0f46ffd640 (LWP 2035) "rpc-admin"):
#0  0x00007f0f803fd2ea in __futex_abstimed_wait_common () from /lib64/libc.so.6
#1  0x00007f0f803ffa60 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libc.so.6
#2  0x00007f0f80a5250b in virCondWait (c=c@entry=0x56382642ffa0, m=m@entry=0x56382642ff78) at ../src/util/virthread.c:156
#3  0x00007f0f80a5b493 in virThreadPoolWorker (opaque=<optimized out>) at ../src/util/virthreadpool.c:121
#4  0x00007f0f80a59f29 in virThreadHelper (data=<optimized out>) at ../src/util/virthread.c:241
--Type <RET> for more, q to quit, c to continue without paging--
#5  0x00007f0f804006ca in start_thread () from /lib64/libc.so.6
#6  0x00007f0f803a03f0 in clone3 () from /lib64/libc.so.6

Thread 13 (Thread 0x7f0f477fe640 (LWP 2034) "rpc-admin"):
#0  0x00007f0f803fd2ea in __futex_abstimed_wait_common () from /lib64/libc.so.6
#1  0x00007f0f803ffa60 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libc.so.6
#2  0x00007f0f80a5250b in virCondWait (c=c@entry=0x56382642ffa0, m=m@entry=0x56382642ff78) at ../src/util/virthread.c:156
#3  0x00007f0f80a5b493 in virThreadPoolWorker (opaque=<optimized out>) at ../src/util/virthreadpool.c:121
#4  0x00007f0f80a59f29 in virThreadHelper (data=<optimized out>) at ../src/util/virthread.c:241
#5  0x00007f0f804006ca in start_thread () from /lib64/libc.so.6
#6  0x00007f0f803a03f0 in clone3 () from /lib64/libc.so.6

Thread 12 (Thread 0x7f0f47fff640 (LWP 2033) "rpc-admin"):
#0  0x00007f0f803fd2ea in __futex_abstimed_wait_common () from /lib64/libc.so.6
#1  0x00007f0f803ffa60 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libc.so.6
#2  0x00007f0f80a5250b in virCondWait (c=c@entry=0x56382642ffa0, m=m@entry=0x56382642ff78) at ../src/util/virthread.c:156
#3  0x00007f0f80a5b493 in virThreadPoolWorker (opaque=<optimized out>) at ../src/util/virthreadpool.c:121
#4  0x00007f0f80a59f29 in virThreadHelper (data=<optimized out>) at ../src/util/virthread.c:241
#5  0x00007f0f804006ca in start_thread () from /lib64/libc.so.6
--Type <RET> for more, q to quit, c to continue without paging--
#6  0x00007f0f803a03f0 in clone3 () from /lib64/libc.so.6

Thread 11 (Thread 0x7f0f68ff9640 (LWP 2032) "prio-rpc-virtnw"):
#0  0x00007f0f803fd2ea in __futex_abstimed_wait_common () from /lib64/libc.so.6
#1  0x00007f0f803ffa60 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libc.so.6
#2  0x00007f0f80a5250b in virCondWait (c=c@entry=0x56382642a030, m=m@entry=0x563826429f68) at ../src/util/virthread.c:156
#3  0x00007f0f80a5b460 in virThreadPoolWorker (opaque=<optimized out>) at ../src/util/virthreadpool.c:121
#4  0x00007f0f80a59f29 in virThreadHelper (data=<optimized out>) at ../src/util/virthread.c:241
#5  0x00007f0f804006ca in start_thread () from /lib64/libc.so.6
#6  0x00007f0f803a03f0 in clone3 () from /lib64/libc.so.6

Thread 10 (Thread 0x7f0f697fa640 (LWP 2031) "prio-rpc-virtnw"):
#0  0x00007f0f803fd2ea in __futex_abstimed_wait_common () from /lib64/libc.so.6
#1  0x00007f0f803ffa60 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libc.so.6
#2  0x00007f0f80a5250b in virCondWait (c=c@entry=0x56382642a030, m=m@entry=0x563826429f68) at ../src/util/virthread.c:156
#3  0x00007f0f80a5b460 in virThreadPoolWorker (opaque=<optimized out>) at ../src/util/virthreadpool.c:121
#4  0x00007f0f80a59f29 in virThreadHelper (data=<optimized out>) at ../src/util/virthread.c:241
#5  0x00007f0f804006ca in start_thread () from /lib64/libc.so.6
#6  0x00007f0f803a03f0 in clone3 () from /lib64/libc.so.6


--Type <RET> for more, q to quit, c to continue without paging--

Thread 9 (Thread 0x7f0f69ffb640 (LWP 2030) "prio-rpc-virtnw"):
#0  0x00007f0f803fd450 in __lll_lock_wait () from /lib64/libc.so.6
#1  0x00007f0f80403b12 in pthread_mutex_lock@@GLIBC_2.2.5 () from /lib64/libc.so.6
#2  0x00007f0f80a52419 in virMutexLock (m=<optimized out>) at ../src/util/virthread.c:91
#3  0x00007f0f702768da in nwfilterDriverLock () at ../src/nwfilter/nwfilter_driver.c:62
#4  nwfilterConnectListAllNWFilters (conn=0x7f0f24016690, nwfilters=0x7f0f69ffa8d0, flags=0) at ../src/nwfilter/nwfilter_driver.c:520
#5  0x00007f0f80c3260b in virConnectListAllNWFilters (conn=0x7f0f24016690, filters=0x7f0f69ffa8d0, flags=0) at ../src/libvirt-nwfilter.c:98
#6  0x00005638253fc6e7 in remoteDispatchConnectListAllNWFilters (server=<optimized out>, msg=<optimized out>, ret=0x7f0f48004650, args=0x7f0f48000ba0, rerr=0x7f0f69ffa9a0, client=0x56382643a1f0) at src/remote/remote_daemon_dispatch_stubs.h:1775
#7  remoteDispatchConnectListAllNWFiltersHelper (server=<optimized out>, client=0x56382643a1f0, msg=<optimized out>, rerr=0x7f0f69ffa9a0, args=0x7f0f48000ba0, ret=0x7f0f48004650) at src/remote/remote_daemon_dispatch_stubs.h:1753
#8  0x00007f0f80b191d6 in virNetServerProgramDispatchCall (msg=0x56382643c6d0, client=0x56382643a1f0, server=0x56382642e080, prog=0x56382642f810) at ../src/rpc/virnetserverprogram.c:428
#9  virNetServerProgramDispatch (prog=0x56382642f810, server=0x56382642e080, client=0x56382643a1f0, msg=0x56382643c6d0) at ../src/rpc/virnetserverprogram.c:302
#10 0x00007f0f80b1fab8 in virNetServerProcessMsg (msg=<optimized out>, prog=<optimized out>, client=<optimiz--Type <RET> for more, q to quit, c to continue without paging--
ed out>, srv=0x56382642e080) at ../src/rpc/virnetserver.c:140
#11 virNetServerHandleJob (jobOpaque=0x56382643be00, opaque=0x56382642e080) at ../src/rpc/virnetserver.c:160
#12 0x00007f0f80a5b3d3 in virThreadPoolWorker (opaque=<optimized out>) at ../src/util/virthreadpool.c:164
#13 0x00007f0f80a59f29 in virThreadHelper (data=<optimized out>) at ../src/util/virthread.c:241
#14 0x00007f0f804006ca in start_thread () from /lib64/libc.so.6
#15 0x00007f0f803a03f0 in clone3 () from /lib64/libc.so.6

Thread 8 (Thread 0x7f0f6a7fc640 (LWP 2029) "prio-rpc-virtnw"):
#0  0x00007f0f803fd2ea in __futex_abstimed_wait_common () from /lib64/libc.so.6
#1  0x00007f0f803ffa60 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libc.so.6
#2  0x00007f0f80a5250b in virCondWait (c=c@entry=0x56382642a030, m=m@entry=0x563826429f68) at ../src/util/virthread.c:156
#3  0x00007f0f80a5b460 in virThreadPoolWorker (opaque=<optimized out>) at ../src/util/virthreadpool.c:121
#4  0x00007f0f80a59f29 in virThreadHelper (data=<optimized out>) at ../src/util/virthread.c:241
#5  0x00007f0f804006ca in start_thread () from /lib64/libc.so.6
#6  0x00007f0f803a03f0 in clone3 () from /lib64/libc.so.6

Thread 7 (Thread 0x7f0f6affd640 (LWP 2028) "prio-rpc-virtnw"):
#0  0x00007f0f803fd2ea in __futex_abstimed_wait_common () from /lib64/libc.so.6
#1  0x00007f0f803ffa60 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libc.so.6
--Type <RET> for more, q to quit, c to continue without paging--
#2  0x00007f0f80a5250b in virCondWait (c=c@entry=0x56382642a030, m=m@entry=0x563826429f68) at ../src/util/virthread.c:156
#3  0x00007f0f80a5b460 in virThreadPoolWorker (opaque=<optimized out>) at ../src/util/virthreadpool.c:121
#4  0x00007f0f80a59f29 in virThreadHelper (data=<optimized out>) at ../src/util/virthread.c:241
#5  0x00007f0f804006ca in start_thread () from /lib64/libc.so.6
#6  0x00007f0f803a03f0 in clone3 () from /lib64/libc.so.6

Thread 6 (Thread 0x7f0f6b7fe640 (LWP 2027) "rpc-virtnwfilte"):
#0  0x00007f0f803fd450 in __lll_lock_wait () from /lib64/libc.so.6
#1  0x00007f0f80403b6d in pthread_mutex_lock@@GLIBC_2.2.5 () from /lib64/libc.so.6
#2  0x00007f0f80a52419 in virMutexLock (m=<optimized out>) at ../src/util/virthread.c:91
#3  0x00007f0f7028520f in virNWFilterInstantiateFilterInternal.constprop.0 (driver=driver@entry=0x7f0f24021fa0, binding=binding@entry=0x7f0f640077e0, teardownOld=teardownOld@entry=false, useNewFilter=useNewFilter@entry=INSTANTIATE_FOLLOW_NEWFILTER, foundNewFilter=foundNewFilter@entry=0x7f0f6b7fd6f7) at ../src/nwfilter/nwfilter_gentech_driver.c:722
#4  0x00007f0f7027afc2 in virNWFilterUpdateInstantiateFilter (driver=driver@entry=0x7f0f24021fa0, binding=binding@entry=0x7f0f640077e0, skipIface=skipIface@entry=0x7f0f6b7fd714) at ../src/nwfilter/nwfilter_gentech_driver.c:801
#5  0x00007f0f70280330 in virNWFilterBuildOne (step=0, skipInterfaces=0x7f0f5c002520, binding=0x7f0f640077e0, driver=0x7f0f24021fa0) at ../src/nwfilter/nwfilter_gentech_driver.c:923
#6  virNWFilterBuildIter (binding=<optimized out>, opaque=<optimized out>) at ../src/nwfilter/nwfilter_gente--Type <RET> for more, q to quit, c to continue without paging--
ch_driver.c:966
#7  0x00007f0f80af331e in virNWFilterBindingObjListHelper () from /lib64/libvirt.so.0
#8  0x00007f0f80a14d5d in virHashForEachSafe (table=<optimized out>, iter=0x7f0f80af3310 <virNWFilterBindingObjListHelper>, opaque=0x7f0f6b7fd790) at ../src/util/virhash.c:392
#9  0x00007f0f80af9ab5 in virNWFilterBindingObjListForEach (bindings=0x7f0f2401dc90, callback=callback@entry=0x7f0f70280220 <virNWFilterBuildIter>, opaque=opaque@entry=0x7f0f6b7fd7d0) at ../src/conf/virnwfilterbindingobjlist.c:361
#10 0x00007f0f7027b090 in virNWFilterBuildAll (driver=0x7f0f24021fa0, newFilters=<optimized out>) at ../src/nwfilter/nwfilter_gentech_driver.c:986
#11 0x00007f0f80af9fce in virNWFilterObjListAssignDef (nwfilters=<optimized out>, def=def@entry=0x7f0f5c015220) at ../src/conf/virnwfilterobj.c:356
#12 0x00007f0f70276a06 in nwfilterDefineXMLFlags (conn=0x7f0f24016190, xml=0x7f0f5c00c670 "<filter name=\"filterrace0\" chain=\"root\">\n  <uuid>254b92a4-4f6f-4469-95a2-a8028d961826</uuid>\n  <filterref filter=\"clean-traffic\"></filterref>\n  <rule action=\"accept\" direction=\"in\" priority=\"500\">\n   "..., flags=<optimized out>) at ../src/nwfilter/nwfilter_driver.c:557
#13 0x00007f0f80c3632c in virNWFilterDefineXML (conn=0x7f0f24016190, xmlDesc=0x7f0f5c00c670 "<filter name=\"filterrace0\" chain=\"root\">\n  <uuid>254b92a4-4f6f-4469-95a2-a8028d961826</uuid>\n  <filterref filter=\"clean-traffic\"></filterref>\n  <rule action=\"accept\" direction=\"in\" priority=\"500\">\n   "...) at ../src/libvirt-nwfilter.c:398
#14 0x000056382541d296 in remoteDispatchNWFilterDefineXML (server=0x56382642e080, msg=0x56382643b300, ret=0x7f0f5c0017b0, args=0x7f0f5c001790, rerr=0x7f0f6b7fd9a0, client=<optimized out>) at src/remote/remote_daemon_--Type <RET> for more, q to quit, c to continue without paging--
dispatch_stubs.h:16925
#15 remoteDispatchNWFilterDefineXMLHelper (server=0x56382642e080, client=<optimized out>, msg=0x56382643b300, rerr=0x7f0f6b7fd9a0, args=0x7f0f5c001790, ret=0x7f0f5c0017b0) at src/remote/remote_daemon_dispatch_stubs.h:16906
#16 0x00007f0f80b191d6 in virNetServerProgramDispatchCall (msg=0x56382643b300, client=0x56382643a0e0, server=0x56382642e080, prog=0x56382642f810) at ../src/rpc/virnetserverprogram.c:428
#17 virNetServerProgramDispatch (prog=0x56382642f810, server=0x56382642e080, client=0x56382643a0e0, msg=0x56382643b300) at ../src/rpc/virnetserverprogram.c:302
#18 0x00007f0f80b1fab8 in virNetServerProcessMsg (msg=<optimized out>, prog=<optimized out>, client=<optimized out>, srv=0x56382642e080) at ../src/rpc/virnetserver.c:140
#19 virNetServerHandleJob (jobOpaque=0x56382643cbe0, opaque=0x56382642e080) at ../src/rpc/virnetserver.c:160
#20 0x00007f0f80a5b3d3 in virThreadPoolWorker (opaque=<optimized out>) at ../src/util/virthreadpool.c:164
#21 0x00007f0f80a59f29 in virThreadHelper (data=<optimized out>) at ../src/util/virthread.c:241
#22 0x00007f0f804006ca in start_thread () from /lib64/libc.so.6
#23 0x00007f0f803a03f0 in clone3 () from /lib64/libc.so.6




Thread 5 (Thread 0x7f0f6bfff640 (LWP 2026) "rpc-virtnwfilte"):
#0  0x00007f0f803fd450 in __lll_lock_wait () from /lib64/libc.so.6
#1  0x00007f0f80403b12 in pthread_mutex_lock@@GLIBC_2.2.5 () from /lib64/libc.so.6
#2  0x00007f0f80a52419 in virMutexLock (m=<optimized out>) at ../src/util/virthread.c:91
--Type <RET> for more, q to quit, c to continue without paging--
#3  0x00007f0f702769c8 in nwfilterDriverLock () at ../src/nwfilter/nwfilter_driver.c:62
#4  nwfilterDefineXMLFlags (conn=0x7f0f24016190, xml=0x7f0f58005820 "<filter name=\"filterrace6\" chain=\"root\">\n  <uuid>d37fa0c3-f1fd-469e-bb9f-479e1911d0bf</uuid>\n  <filterref filter=\"clean-traffic\"></filterref>\n  <rule action=\"accept\" direction=\"in\" priority=\"500\">\n   "..., flags=0) at ../src/nwfilter/nwfilter_driver.c:548
#5  0x00007f0f80c3632c in virNWFilterDefineXML (conn=0x7f0f24016190, xmlDesc=0x7f0f58005820 "<filter name=\"filterrace6\" chain=\"root\">\n  <uuid>d37fa0c3-f1fd-469e-bb9f-479e1911d0bf</uuid>\n  <filterref filter=\"clean-traffic\"></filterref>\n  <rule action=\"accept\" direction=\"in\" priority=\"500\">\n   "...) at ../src/libvirt-nwfilter.c:398
#6  0x000056382541d296 in remoteDispatchNWFilterDefineXML (server=0x56382642e080, msg=0x56382643da60, ret=0x7f0f58001df0, args=0x7f0f58001dd0, rerr=0x7f0f6bffe9a0, client=<optimized out>) at src/remote/remote_daemon_dispatch_stubs.h:16925
#7  remoteDispatchNWFilterDefineXMLHelper (server=0x56382642e080, client=<optimized out>, msg=0x56382643da60, rerr=0x7f0f6bffe9a0, args=0x7f0f58001dd0, ret=0x7f0f58001df0) at src/remote/remote_daemon_dispatch_stubs.h:16906
#8  0x00007f0f80b191d6 in virNetServerProgramDispatchCall (msg=0x56382643da60, client=0x56382643a0e0, server=0x56382642e080, prog=0x56382642f810) at ../src/rpc/virnetserverprogram.c:428
#9  virNetServerProgramDispatch (prog=0x56382642f810, server=0x56382642e080, client=0x56382643a0e0, msg=0x56382643da60) at ../src/rpc/virnetserverprogram.c:302
#10 0x00007f0f80b1fab8 in virNetServerProcessMsg (msg=<optimized out>, prog=<optimized out>, client=<optimized out>, srv=0x56382642e080) at ../src/rpc/virnetserver.c:140
--Type <RET> for more, q to quit, c to continue without paging--
#11 virNetServerHandleJob (jobOpaque=0x56382643b920, opaque=0x56382642e080) at ../src/rpc/virnetserver.c:160
#12 0x00007f0f80a5b3d3 in virThreadPoolWorker (opaque=<optimized out>) at ../src/util/virthreadpool.c:164
#13 0x00007f0f80a59f29 in virThreadHelper (data=<optimized out>) at ../src/util/virthread.c:241
#14 0x00007f0f804006ca in start_thread () from /lib64/libc.so.6
#15 0x00007f0f803a03f0 in clone3 () from /lib64/libc.so.6



Thread 4 (Thread 0x7f0f70a8f640 (LWP 2025) "rpc-virtnwfilte"):
#0  0x00007f0f803fd450 in __lll_lock_wait () from /lib64/libc.so.6
#1  0x00007f0f80403b12 in pthread_mutex_lock@@GLIBC_2.2.5 () from /lib64/libc.so.6
#2  0x00007f0f80a52419 in virMutexLock (m=<optimized out>) at ../src/util/virthread.c:91
#3  0x00007f0f702769c8 in nwfilterDriverLock () at ../src/nwfilter/nwfilter_driver.c:62
#4  nwfilterDefineXMLFlags (conn=0x7f0f24016190, xml=0x7f0f6000d5c0 "<filter name=\"filterrace8\" chain=\"root\">\n  <uuid>c1605c89-adb6-4429-b49c-9a3afa8ba760</uuid>\n  <filterref filter=\"clean-traffic\"></filterref>\n  <rule action=\"accept\" direction=\"in\" priority=\"500\">\n   "..., flags=0) at ../src/nwfilter/nwfilter_driver.c:548
#5  0x00007f0f80c3632c in virNWFilterDefineXML (conn=0x7f0f24016190, xmlDesc=0x7f0f6000d5c0 "<filter name=\"filterrace8\" chain=\"root\">\n  <uuid>c1605c89-adb6-4429-b49c-9a3afa8ba760</uuid>\n  <filterref filter=\"clean-traffic\"></filterref>\n  <rule action=\"accept\" direction=\"in\" priority=\"500\">\n   "...) at ../src/libvirt-nwfilter.c:398
#6  0x000056382541d296 in remoteDispatchNWFilterDefineXML (server=0x56382642e080, msg=0x56382643d690, ret=0x--Type <RET> for more, q to quit, c to continue without paging--
7f0f600027a0, args=0x7f0f60002780, rerr=0x7f0f70a8e9a0, client=<optimized out>) at src/remote/remote_daemon_dispatch_stubs.h:16925
#7  remoteDispatchNWFilterDefineXMLHelper (server=0x56382642e080, client=<optimized out>, msg=0x56382643d690, rerr=0x7f0f70a8e9a0, args=0x7f0f60002780, ret=0x7f0f600027a0) at src/remote/remote_daemon_dispatch_stubs.h:16906
#8  0x00007f0f80b191d6 in virNetServerProgramDispatchCall (msg=0x56382643d690, client=0x56382643a0e0, server=0x56382642e080, prog=0x56382642f810) at ../src/rpc/virnetserverprogram.c:428
#9  virNetServerProgramDispatch (prog=0x56382642f810, server=0x56382642e080, client=0x56382643a0e0, msg=0x56382643d690) at ../src/rpc/virnetserverprogram.c:302
#10 0x00007f0f80b1fab8 in virNetServerProcessMsg (msg=<optimized out>, prog=<optimized out>, client=<optimized out>, srv=0x56382642e080) at ../src/rpc/virnetserver.c:140
#11 virNetServerHandleJob (jobOpaque=0x563826425310, opaque=0x56382642e080) at ../src/rpc/virnetserver.c:160
#12 0x00007f0f80a5b3d3 in virThreadPoolWorker (opaque=<optimized out>) at ../src/util/virthreadpool.c:164
#13 0x00007f0f80a59f29 in virThreadHelper (data=<optimized out>) at ../src/util/virthread.c:241
#14 0x00007f0f804006ca in start_thread () from /lib64/libc.so.6
#15 0x00007f0f803a03f0 in clone3 () from /lib64/libc.so.6

Thread 3 (Thread 0x7f0f71290640 (LWP 2024) "rpc-virtnwfilte"):
#0  0x00007f0f803fd269 in __futex_abstimed_wait_common () from /lib64/libc.so.6
#1  0x00007f0f80406d35 in pthread_rwlock_wrlock.5 () from /lib64/libc.so.6
--Type <RET> for more, q to quit, c to continue without paging--
#2  0x00007f0f80af9638 in virNWFilterBindingObjListAdd (bindings=0x7f0f2401dc90, def=def@entry=0x7f0f64019110) at ../src/conf/virnwfilterbindingobjlist.c:207
#3  0x00007f0f7027ae1c in nwfilterBindingCreateXML (conn=0x7f0f24016190, xml=<optimized out>, flags=<optimized out>) at ../src/nwfilter/nwfilter_driver.c:755
#4  0x00007f0f80c36988 in virNWFilterBindingCreateXML (conn=0x7f0f24016190, xml=0x7f0f6401ea50 "<filterbinding>\n  <owner>\n    <uuid>4b8d433a-2299-4f2a-8daf-71184af314f0</uuid>\n    <name>race7</name>\n  </owner>\n  <portdev name=\"race7\"></portdev>\n  <mac address=\"fe:54:00:e8:92:07\"></mac>\n  <filter"..., flags=0) at ../src/libvirt-nwfilter.c:751
#5  0x000056382541ce2a in remoteDispatchNWFilterBindingCreateXML (server=0x56382642e080, msg=0x56382643b440, ret=0x7f0f64012450, args=0x7f0f6400b710, rerr=0x7f0f7128f9a0, client=<optimized out>) at src/remote/remote_daemon_dispatch_stubs.h:16707
#6  remoteDispatchNWFilterBindingCreateXMLHelper (server=0x56382642e080, client=<optimized out>, msg=0x56382643b440, rerr=0x7f0f7128f9a0, args=0x7f0f6400b710, ret=0x7f0f64012450) at src/remote/remote_daemon_dispatch_stubs.h:16688
#7  0x00007f0f80b191d6 in virNetServerProgramDispatchCall (msg=0x56382643b440, client=0x56382643a0e0, server=0x56382642e080, prog=0x56382642f810) at ../src/rpc/virnetserverprogram.c:428
#8  virNetServerProgramDispatch (prog=0x56382642f810, server=0x56382642e080, client=0x56382643a0e0, msg=0x56382643b440) at ../src/rpc/virnetserverprogram.c:302
#9  0x00007f0f80b1fab8 in virNetServerProcessMsg (msg=<optimized out>, prog=<optimized out>, client=<optimized out>, srv=0x56382642e080) at ../src/rpc/virnetserver.c:140
#10 virNetServerHandleJob (jobOpaque=0x563826427890, opaque=0x56382642e080) at ../src/rpc/virnetserver.c:160--Type <RET> for more, q to quit, c to continue without paging--

#11 0x00007f0f80a5b3d3 in virThreadPoolWorker (opaque=<optimized out>) at ../src/util/virthreadpool.c:164
#12 0x00007f0f80a59f29 in virThreadHelper (data=<optimized out>) at ../src/util/virthread.c:241
#13 0x00007f0f804006ca in start_thread () from /lib64/libc.so.6
#14 0x00007f0f803a03f0 in clone3 () from /lib64/libc.so.6


Thread 2 (Thread 0x7f0f71a91640 (LWP 2023) "rpc-virtnwfilte"):
#0  0x00007f0f803fd450 in __lll_lock_wait () from /lib64/libc.so.6
#1  0x00007f0f80403b6d in pthread_mutex_lock@@GLIBC_2.2.5 () from /lib64/libc.so.6
#2  0x00007f0f80af4d33 in virNWFilterObjListFindByName (nwfilters=0x7f0f24021e90, name=name@entry=0x7f0f6c0093e0 "filterrace5") at ../src/conf/virnwfilterobj.c:177
#3  0x00007f0f80af4d83 in virNWFilterObjListFindInstantiateFilter (nwfilters=<optimized out>, filtername=0x7f0f6c0093e0 "filterrace5") at ../src/conf/virnwfilterobj.c:194
#4  0x00007f0f70280767 in virNWFilterInstantiateFilterUpdate (driver=0x7f0f24021fa0, teardownOld=<optimized out>, binding=0x7f0f6c008300, ifindex=12, useNewFilter=INSTANTIATE_ALWAYS, forceWithPendingReq=<optimized out>, foundNewFilter=0x7f0f71a90867) at ../src/nwfilter/nwfilter_gentech_driver.c:667
#5  0x00007f0f7028524f in virNWFilterInstantiateFilterInternal.constprop.0 (driver=0x7f0f24021fa0, binding=binding@entry=0x7f0f6c008300, teardownOld=teardownOld@entry=true, useNewFilter=useNewFilter@entry=INSTANTIATE_ALWAYS, foundNewFilter=foundNewFilter@entry=0x7f0f71a90867) at ../src/nwfilter/nwfilter_gentech_driver.c:736
#6  0x00007f0f7027ad7e in virNWFilterInstantiateFilter (driver=<optimized out>, binding=binding@entry=0x7f0f--Type <RET> for more, q to quit, c to continue without paging--
6c008300) at ../src/nwfilter/nwfilter_gentech_driver.c:787
#7  0x00007f0f7027ae53 in nwfilterBindingCreateXML (conn=0x7f0f24016190, xml=<optimized out>, flags=<optimized out>) at ../src/nwfilter/nwfilter_driver.c:763
#8  0x00007f0f80c36988 in virNWFilterBindingCreateXML (conn=0x7f0f24016190, xml=0x7f0f6c006c20 "<filterbinding>\n  <owner>\n    <uuid>c14bbaa8-c50a-416c-a258-32ea9b2849f4</uuid>\n    <name>race5</name>\n  </owner>\n  <portdev name=\"race5\"></portdev>\n  <mac address=\"fe:54:00:e8:92:05\"></mac>\n  <filter"..., flags=0) at ../src/libvirt-nwfilter.c:751
#9  0x000056382541ce2a in remoteDispatchNWFilterBindingCreateXML (server=0x56382642e080, msg=0x56382643d2c0, ret=0x7f0f6c002b80, args=0x7f0f6c000b80, rerr=0x7f0f71a909a0, client=<optimized out>) at src/remote/remote_daemon_dispatch_stubs.h:16707
#10 remoteDispatchNWFilterBindingCreateXMLHelper (server=0x56382642e080, client=<optimized out>, msg=0x56382643d2c0, rerr=0x7f0f71a909a0, args=0x7f0f6c000b80, ret=0x7f0f6c002b80) at src/remote/remote_daemon_dispatch_stubs.h:16688
#11 0x00007f0f80b191d6 in virNetServerProgramDispatchCall (msg=0x56382643d2c0, client=0x56382643a0e0, server=0x56382642e080, prog=0x56382642f810) at ../src/rpc/virnetserverprogram.c:428
#12 virNetServerProgramDispatch (prog=0x56382642f810, server=0x56382642e080, client=0x56382643a0e0, msg=0x56382643d2c0) at ../src/rpc/virnetserverprogram.c:302
#13 0x00007f0f80b1fab8 in virNetServerProcessMsg (msg=<optimized out>, prog=<optimized out>, client=<optimized out>, srv=0x56382642e080) at ../src/rpc/virnetserver.c:140
#14 virNetServerHandleJob (jobOpaque=0x56382642db00, opaque=0x56382642e080) at ../src/rpc/virnetserver.c:160
--Type <RET> for more, q to quit, c to continue without paging--
#15 0x00007f0f80a5b3d3 in virThreadPoolWorker (opaque=<optimized out>) at ../src/util/virthreadpool.c:164
#16 0x00007f0f80a59f29 in virThreadHelper (data=<optimized out>) at ../src/util/virthread.c:241
#17 0x00007f0f804006ca in start_thread () from /lib64/libc.so.6
#18 0x00007f0f803a03f0 in clone3 () from /lib64/libc.so.6

Thread 1 (Thread 0x7f0f7efc2ac0 (LWP 2022) "virtnwfilterd"):
#0  0x00007f0f804a3e9f in poll () from /lib64/libc.so.6
#1  0x00007f0f8088d59c in g_main_context_poll (priority=<optimized out>, n_fds=8, fds=0x563826432600, timeout=<optimized out>, context=0x563826432680) at ../glib/gmain.c:4434
#2  g_main_context_iterate.constprop.0 (context=context@entry=0x563826432680, block=block@entry=1, dispatch=dispatch@entry=1, self=<optimized out>) at ../glib/gmain.c:4126
#3  0x00007f0f808365d3 in g_main_context_iteration (context=0x563826432680, context@entry=0x0, may_block=may_block@entry=1) at ../glib/gmain.c:4196
#4  0x00007f0f809fe7d4 in virEventGLibRunOnce () at ../src/util/vireventglib.c:516
#5  0x00007f0f80b1db0d in virNetDaemonRun (dmn=0x56382642d020) at ../src/rpc/virnetdaemon.c:850
#6  0x00005638253f639c in main (argc=<optimized out>, argv=<optimized out>) at ../src/remote/remote_daemon.c:1222

Comment 21 yafu 2022-03-24 03:37:40 UTC
Verified with libvirt-8.0.0-7.el9_0.x86_64.

Test steps are the same with comment 20 and virtnwfilterd did not hang in step 5.

Comment 24 errata-xmlrpc 2022-05-17 12:46:17 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (new packages: libvirt), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:2390