Bug 2075837 - virtnwfilterd crashed when start->reload->restart virtnwfilterd with running guest having filter setting
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 9
Classification: Red Hat
Component: libvirt
Version: 9.1
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Michal Privoznik
QA Contact: yalzhang@redhat.com
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2022-04-15 15:30 UTC by yafu
Modified: 2022-11-15 10:39 UTC (History)
6 users (show)

Fixed In Version: libvirt-8.4.0-1.el9
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-11-15 10:04:06 UTC
Type: Bug
Target Upstream Version: 8.4.0
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker RHELPLAN-119067 0 None None None 2022-04-15 15:31:14 UTC
Red Hat Product Errata RHSA-2022:8003 0 None None None 2022-11-15 10:04:26 UTC

Description yafu 2022-04-15 15:30:53 UTC
Description of problem:
virtnwfilterd crashed when start->reload->restart virtnwfilterd with running guest having filter setting

Version-Release number of selected component (if applicable):
libvirt-8.2.0-1.el9.x86_64

How reproducible:
50%

Steps to Reproduce:
1. Start a guest with filter setting:
# virsh start r9

# virsh dumpxml r9
...
 <interface type='bridge'>
      <mac address='52:54:00:2f:32:97'/>
      <source bridge='virbr0'/>
      <target dev='if-0'/>
      <model type='rtl8139'/>
      <filterref filter='disallow-arp'/>
      <alias name='net2'/>
      <address type='pci' domain='0x0000' bus='0x04' slot='0x03' function='0x0'/>
    </interface>
...


2. Do start->reload->restart virtnwfilterd:
# for i in {1..10}; do systemctl stop virtnwfilterd;sleep 5; systemctl start virtnwfilterd; systemctl reload virtnwfilterd; systemctl restart virtnwfilterd; done

3. Check the coredump:
# coredumpctl list
TIME                         PID UID GID SIG     COREFILE EXE                       SIZE
Fri 2022-04-15 11:25:37 EDT 2445   0   0 SIGSEGV present  /usr/sbin/virtnwfilterd 607.6K
Fri 2022-04-15 11:25:55 EDT 3396   0   0 SIGSEGV present  /usr/sbin/virtnwfilterd 608.2K
Fri 2022-04-15 11:26:02 EDT 3732   0   0 SIGSEGV present  /usr/sbin/virtnwfilterd 607.1K
Fri 2022-04-15 11:26:15 EDT 4367   0   0 SIGSEGV present  /usr/sbin/virtnwfilterd 608.3K
Fri 2022-04-15 11:26:21 EDT 4701   0   0 SIGSEGV present  /usr/sbin/virtnwfilterd 606.8K

4. Check the backtrace:
Core was generated by `/usr/sbin/virtnwfilterd --timeout 120'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007fd2f5619a04 in pthread_mutex_lock@@GLIBC_2.2.5 () from /lib64/libc.so.6
[Current thread is 1 (Thread 0x7fd2b67fc640 (LWP 4725))]
Missing separate debuginfos, use: dnf debuginfo-install glibc-2.34-27.el9_0.x86_64 krb5-libs-1.19.1-15.el9_0.x86_64 openldap-2.4.59-3.el9.x86_64 openssl-libs-3.0.1-18.el9.x86_64 zlib-1.2.11-31.el9.x86_64
(gdb) t a a bt

Thread 4 (Thread 0x7fd2f41d1f80 (LWP 4701)):
#0  0x00007fd2f43c30c4 in __do_global_dtors_aux () from /lib64/libcrypt.so.2
#1  0x00007fd2f5fc201b in _dl_fini () at dl-fini.c:138
#2  0x00007fd2f55ce475 in __run_exit_handlers () from /lib64/libc.so.6
#3  0x00007fd2f55ce5f0 in exit () from /lib64/libc.so.6
#4  0x00007fd2f55b6e57 in __libc_start_call_main () from /lib64/libc.so.6
#5  0x00007fd2f55b6efc in __libc_start_main_impl () from /lib64/libc.so.6
#6  0x00005581ed357f35 in _start ()

Thread 3 (Thread 0x7fd2b5ffb640 (LWP 4718)):
#0  0x00007fd2f56b9edf in poll () from /lib64/libc.so.6
#1  0x00007fd2f5aa359c in g_main_context_poll (priority=<optimized out>, n_fds=1, fds=0x7fd2a0009e10, timeout=<optimized out>, context=0x7fd2a000d360) at ../glib/gmain.c:4434
#2  g_main_context_iterate.constprop.0 (context=context@entry=0x7fd2a000d360, block=block@entry=1, dispatch=dispatch@entry=1, self=<optimized out>) at ../glib/gmain.c:4126
#3  0x00007fd2f5a4c5d3 in g_main_context_iteration (context=0x7fd2a000d360, may_block=may_block@entry=1) at ../glib/gmain.c:4196
#4  0x00007fd2f5a4c621 in glib_worker_main (data=<optimized out>) at ../glib/gmain.c:6089
#5  0x00007fd2f5a79d72 in g_thread_proxy (data=0x7fd2a0009400) at ../glib/gthread.c:826
#6  0x00007fd2f56166ca in start_thread () from /lib64/libc.so.6
#7  0x00007fd2f55b63f0 in clone3 () from /lib64/libc.so.6

Thread 2 (Thread 0x7fd2b57fa640 (LWP 4719)):
#0  0x00007fd2f56b9edf in poll () from /lib64/libc.so.6
#1  0x00007fd2f5aa359c in g_main_context_poll (priority=<optimized out>, n_fds=2, fds=0x7fd2a0021b80, timeout=<optimized out>, context=0x7fd2a001e9b0) at ../glib/gmain.c:4434
#2  g_main_context_iterate.constprop.0 (context=0x7fd2a001e9b0, block=block@entry=1, dispatch=dispatch@entry=1, self=<optimized out>) at ../glib/gmain.c:4126
#3  0x00007fd2f5a4e463 in g_main_loop_run (loop=0x7fd2a001eaa0) at ../glib/gmain.c:4329
#4  0x00007fd2f58da5ca in gdbus_shared_thread_func (user_data=0x7fd2a001e980) at ../gio/gdbusprivate.c:280
#5  0x00007fd2f5a79d72 in g_thread_proxy (data=0x7fd2a0019800) at ../glib/gthread.c:826
#6  0x00007fd2f56166ca in start_thread () from /lib64/libc.so.6
#7  0x00007fd2f55b63f0 in clone3 () from /lib64/libc.so.6

Thread 1 (Thread 0x7fd2b67fc640 (LWP 4725)):
#0  0x00007fd2f5619a04 in pthread_mutex_lock@@GLIBC_2.2.5 () from /lib64/libc.so.6
#1  0x00007fd2f5c69089 in virMutexLock (m=<optimized out>) at ../src/util/virthread.c:91
#2  0x00007fd2f5c690ae in virLockGuardLock (m=0x30) at ../src/util/virthread.c:102
#3  0x00007fd2e4490c69 in nwfilterStateReload () at ../src/nwfilter/nwfilter_driver.c:330
#4  0x00007fd2f5e0f040 in virStateReload () at ../src/libvirt.c:759
#5  0x00005581ed3580e3 in daemonReloadHandlerThread (opaque=<optimized out>) at ../src/remote/remote_daemon.c:456
#6  0x00007fd2f5c71bb9 in virThreadHelper (data=<optimized out>) at ../src/util/virthread.c:256
#7  0x00007fd2f56166ca in start_thread () from /lib64/libc.so.6
#8  0x00007fd2f55b63f0 in clone3 () from /lib64/libc.so.6


Actual results:


Expected results:


Additional info:

Comment 1 Jaroslav Suchanek 2022-04-21 13:14:59 UTC
Daniel, would you please look into this one? Thanks.

Comment 2 Michal Privoznik 2022-04-22 14:28:24 UTC
Current master has slightly different line numbers, but the offending code is this:

static int
nwfilterStateReload(void)
{
    if (!driver)
        return -1;

    if (!driver->privileged)
        return 0;

    virNWFilterDHCPSnoopEnd(NULL);
    /* shut down all threads -- they will be restarted if necessary */
    virNWFilterLearnThreadsTerminate(true);

    VIR_WITH_MUTEX_LOCK_GUARD(&driverMutex) {
        VIR_WITH_MUTEX_LOCK_GUARD(&driver->updateLock) {                    /* THIS is the line that causes SIGSEGV */
            virNWFilterObjListLoadAllConfigs(driver->nwfilters, driver->configDir);
        }

        virNWFilterBuildAll(driver, false);
    }

    return 0;
}



And I think I know where the problem is. The problematic line is the one that locks driver->updateLock, because the stack trace shows that driver is NULL at that point (frame #2 locks m=0x30, i.e. the updateLock member at offset 0x30 from a NULL driver pointer). The fix is to check driver == NULL again after driverMutex has been acquired. The check at the beginning of the function is not sufficient, because it is done outside of the lock that guards the driver variable.

Comment 3 Michal Privoznik 2022-04-22 14:46:45 UTC
Patch posted onto the list:

https://listman.redhat.com/archives/libvir-list/2022-April/230337.html

Comment 5 Michal Privoznik 2022-05-19 14:50:26 UTC
v3 (though the patch itself is labeled v2, because it implements a different approach):

https://listman.redhat.com/archives/libvir-list/2022-May/231663.html

Comment 6 Michal Privoznik 2022-05-23 12:47:34 UTC
Merged upstream as:

commit a54391fbc268e19252f5a6b2d91a6f8d84990243
Author:     Michal Prívozník <mprivozn>
AuthorDate: Mon Apr 25 10:31:56 2022 +0200
Commit:     Michal Prívozník <mprivozn>
CommitDate: Mon May 23 14:44:50 2022 +0200

    remote_daemon: Don't run virStateCleanup() if virStateReload() is still running
    
    When a SIGHUP is received a thread is spawned that runs
    virStateReload(). However, if SIGINT is received while the former
    thread is still running then we may get into problematic
    situation: the cleanup code in main() sees drivers initialized
    and thus calls virStateCleanup(). So now we have two threads, one
    running virStateReload() the other virStateCleanup(). In this
    situation it's very likely that a race condition occurs and
    either of threads causes SIGSEGV.
    
    To fix this, unmark drivers as initialized in the
    virStateReload() thread for the time the function runs.
    
    Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=2075837
    Signed-off-by: Michal Privoznik <mprivozn>
    Reviewed-by: Martin Kletzander <mkletzan>

v8.3.0-210-ga54391fbc2

Comment 7 yalzhang@redhat.com 2022-06-01 03:34:23 UTC
Reproduced the bug on libvirt-8.3.0-1.el9.x86_64 and pre-verified the fix on v8.4.0-rc2-13-g8b8fd1bc67 with the steps in comment 0.

Comment 10 yalzhang@redhat.com 2022-06-13 05:48:16 UTC
Tested on libvirt-8.4.0-1.el9.x86_64 with the steps in comment 0; the issue is fixed.

Comment 12 errata-xmlrpc 2022-11-15 10:04:06 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Low: libvirt security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:8003

