Bug 2075837
| Summary: | virtnwfilterd crashed when start->reload->restart virtnwfilterd with running guest having filter setting | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 9 | Reporter: | yafu <yafu> |
| Component: | libvirt | Assignee: | Michal Privoznik <mprivozn> |
| libvirt sub component: | General | QA Contact: | yalzhang <yalzhang> |
| Status: | CLOSED ERRATA | Docs Contact: | |
| Severity: | medium | ||
| Priority: | medium | CC: | jdenemar, jsuchane, lmen, mprivozn, virt-maint, xuzhang |
| Version: | 9.1 | Keywords: | Triaged, Upstream |
| Target Milestone: | rc | Flags: | pm-rhel:
mirror+
|
| Target Release: | --- | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | libvirt-8.4.0-1.el9 | Doc Type: | If docs needed, set a value |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2022-11-15 10:04:06 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | 8.4.0 |
| Embargoed: | |||
Daniel, would you please look into this one? Thanks. Current master has slightly different line numbers, but the offending code is this:
static int
nwfilterStateReload(void)
{
if (!driver)
return -1;
if (!driver->privileged)
return 0;
virNWFilterDHCPSnoopEnd(NULL);
/* shut down all threads -- they will be restarted if necessary */
virNWFilterLearnThreadsTerminate(true);
VIR_WITH_MUTEX_LOCK_GUARD(&driverMutex) {
VIR_WITH_MUTEX_LOCK_GUARD(&driver->updateLock) { /* THIS is the line that causes SIGSEGV */
virNWFilterObjListLoadAllConfigs(driver->nwfilters, driver->configDir);
}
virNWFilterBuildAll(driver, false);
}
return 0;
}
And I think I know where the problem is. The problematic line is the one that locks driver->updateLock because from the stack trace, driver is NULL at that point (indeed, updateLock member is at offset 30). And the problem is that after driverMutex lock was acquired we have to check again whether driver == NULL. It's not sufficient that this condition was checked at the beginning of the function, because that was done outside of the lock that guards the driver variable.
Patch posted onto the list: https://listman.redhat.com/archives/libvir-list/2022-April/230337.html v3 (although patch is v2 because it's implementing different approach): https://listman.redhat.com/archives/libvir-list/2022-May/231663.html Merged upstream as:
commit a54391fbc268e19252f5a6b2d91a6f8d84990243
Author: Michal Prívozník <mprivozn>
AuthorDate: Mon Apr 25 10:31:56 2022 +0200
Commit: Michal Prívozník <mprivozn>
CommitDate: Mon May 23 14:44:50 2022 +0200
remote_daemon: Don't run virStateCleanup() if virStateReload() is still running
When a SIGHUP is received a thread is spawned that runs
virStateReload(). However, if SIGINT is received while the former
thread is still running then we may get into problematic
situation: the cleanup code in main() sees drivers initialized
and thus calls virStateCleanup(). So now we have two threads, one
running virStateReload() the other virStateCleanup(). In this
situation it's very likely that a race condition occurs and
either of threads causes SIGSEGV.
To fix this, unmark drivers as initialized in the
virStateReload() thread for the time the function runs.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=2075837
Signed-off-by: Michal Privoznik <mprivozn>
Reviewed-by: Martin Kletzander <mkletzan>
v8.3.0-210-ga54391fbc2
Reproduce the bug on libvirt-8.3.0-1.el9.x86_64 and pre-verify it on v8.4.0-rc2-13-g8b8fd1bc67 with the steps in comment 0. Test on libvirt-8.4.0-1.el9.x86_64 with the steps in comment 0, the issue is fixed. Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Low: libvirt security, bug fix, and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:8003 |
Description of problem: virtnwfilterd crashed when start->reload->restart virtnwfilterd with running guest having filter setting Version-Release number of selected component (if applicable): libvirt-8.2.0-1.el9.x86_64 How reproducible: 50% Steps to Reproduce: 1.Start a guest with filter setting: #virsh start r9 #virsh dumpxml r9 ... <interface type='bridge'> <mac address='52:54:00:2f:32:97'/> <source bridge='virbr0'/> <target dev='if-0'/> <model type='rtl8139'/> <filterref filter='disallow-arp'/> <alias name='net2'/> <address type='pci' domain='0x0000' bus='0x04' slot='0x03' function='0x0'/> </interface> ... 2.Do start->reload->restart virtnwfilterd: # for i in {1..10}; do systemctl stop virtnwfilterd;sleep 5; systemctl start virtnwfilterd; systemctl reload virtnwfilterd; systemctl restart virtnwfilterd; done 3.Check the coredump: # coredumpctl list TIME PID UID GID SIG COREFILE EXE SIZE Fri 2022-04-15 11:25:37 EDT 2445 0 0 SIGSEGV present /usr/sbin/virtnwfilterd 607.6K Fri 2022-04-15 11:25:55 EDT 3396 0 0 SIGSEGV present /usr/sbin/virtnwfilterd 608.2K Fri 2022-04-15 11:26:02 EDT 3732 0 0 SIGSEGV present /usr/sbin/virtnwfilterd 607.1K Fri 2022-04-15 11:26:15 EDT 4367 0 0 SIGSEGV present /usr/sbin/virtnwfilterd 608.3K Fri 2022-04-15 11:26:21 EDT 4701 0 0 SIGSEGV present /usr/sbin/virtnwfilterd 606.8K 4.Check the backtrace: Core was generated by `/usr/sbin/virtnwfilterd --timeout 120'. Program terminated with signal SIGSEGV, Segmentation fault. #0 0x00007fd2f5619a04 in pthread_mutex_lock@@GLIBC_2.2.5 () from /lib64/libc.so.6 [Current thread is 1 (Thread 0x7fd2b67fc640 (LWP 4725))] Missing separate debuginfos, use: dnf debuginfo-install glibc-2.34-27.el9_0.x86_64 krb5-libs-1.19.1-15.el9_0.x86_64 openldap-2.4.59-3.el9.x86_64 openssl-libs-3.0.1-18.el9.x86_64 zlib-1.2.11-31.el9.x86_64 (gdb) (gdb) t a a bt Thread 4 (Thread 0x7fd2f41d1f80 (LWP 4701)): #0 0x00007fd2f43c30c4 in __do_global_dtors_aux () from /lib64/libcrypt.so.2 #1 0x00007fd2f5fc201b in _dl_fini () at dl-fini.c:138 #2 0x00007fd2f55ce475 in __run_exit_handlers () from /lib64/libc.so.6 #3 0x00007fd2f55ce5f0 in exit () from /lib64/libc.so.6 #4 0x00007fd2f55b6e57 in __libc_start_call_main () from /lib64/libc.so.6 #5 0x00007fd2f55b6efc in __libc_start_main_impl () from /lib64/libc.so.6 #6 0x00005581ed357f35 in _start () Thread 3 (Thread 0x7fd2b5ffb640 (LWP 4718)): #0 0x00007fd2f56b9edf in poll () from /lib64/libc.so.6 #1 0x00007fd2f5aa359c in g_main_context_poll (priority=<optimized out>, n_fds=1, fds=0x7fd2a0009e10, timeout=<optimized out>, context=0x7fd2a000d360) at ../glib/gmain.c:4434 #2 g_main_context_iterate.constprop.0 (context=context@entry=0x7fd2a000d360, block=block@entry=1, dispatch=dispatch@entry=1, self=<optimized out>) at ../glib/gmain.c:4126 #3 0x00007fd2f5a4c5d3 in g_main_context_iteration (context=0x7fd2a000d360, may_block=may_block@entry=1) at ../glib/gmain.c:4196 #4 0x00007fd2f5a4c621 in glib_worker_main (data=<optimized out>) at ../glib/gmain.c:6089 #5 0x00007fd2f5a79d72 in g_thread_proxy (data=0x7fd2a0009400) at ../glib/gthread.c:826 #6 0x00007fd2f56166ca in start_thread () from /lib64/libc.so.6 #7 0x00007fd2f55b63f0 in clone3 () from /lib64/libc.so.6 Thread 2 (Thread 0x7fd2b57fa640 (LWP 4719)): #0 0x00007fd2f56b9edf in poll () from /lib64/libc.so.6 #1 0x00007fd2f5aa359c in g_main_context_poll (priority=<optimized out>, n_fds=2, fds=0x7fd2a0021b80, timeout=<optimized out>, context=0x7fd2a001e9b0) at ../glib/gmain.c:4434 #2 g_main_context_iterate.constprop.0 (context=0x7fd2a001e9b0, block=block@entry=1, dispatch=dispatch@entry=1, self=<optimized out>) at ../glib/gmain.c:4126 #3 0x00007fd2f5a4e463 in g_main_loop_run (loop=0x7fd2a001eaa0) at ../glib/gmain.c:4329 #4 0x00007fd2f58da5ca in gdbus_shared_thread_func (user_data=0x7fd2a001e980) at ../gio/gdbusprivate.c:280 #5 0x00007fd2f5a79d72 in g_thread_proxy (data=0x7fd2a0019800) at ../glib/gthread.c:826 #6 0x00007fd2f56166ca in start_thread () from /lib64/libc.so.6 #7 0x00007fd2f55b63f0 in clone3 () from /lib64/libc.so.6 Thread 1 (Thread 0x7fd2b67fc640 (LWP 4725)): --Type <RET> for more, q to quit, c to continue without paging-- #0 0x00007fd2f5619a04 in pthread_mutex_lock@@GLIBC_2.2.5 () from /lib64/libc.so.6 #1 0x00007fd2f5c69089 in virMutexLock (m=<optimized out>) at ../src/util/virthread.c:91 #2 0x00007fd2f5c690ae in virLockGuardLock (m=0x30) at ../src/util/virthread.c:102 #3 0x00007fd2e4490c69 in nwfilterStateReload () at ../src/nwfilter/nwfilter_driver.c:330 #4 0x00007fd2f5e0f040 in virStateReload () at ../src/libvirt.c:759 #5 0x00005581ed3580e3 in daemonReloadHandlerThread (opaque=<optimized out>) at ../src/remote/remote_daemon.c:456 #6 0x00007fd2f5c71bb9 in virThreadHelper (data=<optimized out>) at ../src/util/virthread.c:256 #7 0x00007fd2f56166ca in start_thread () from /lib64/libc.so.6 #8 0x00007fd2f55b63f0 in clone3 () from /lib64/libc.so.6 Actual results: Expected results: Additional info: