Bug 738778

Summary: libvirtd crash during restart if running guest has <filterref>
Product: Red Hat Enterprise Linux 6
Component: libvirt
Version: 6.1
Hardware: Unspecified
OS: Unspecified
Status: CLOSED ERRATA
Severity: unspecified
Priority: unspecified
Reporter: Laine Stump <laine>
Assignee: Laine Stump <laine>
QA Contact: Virtualization Bugs <virt-bugs>
CC: acathrow, dallan, dyuan, mzhan, rwu, stefanb, whuang, xhu
Keywords: Regression
Target Milestone: rc
Target Release: ---
Fixed In Version: libvirt-0.9.4-12.el6
Doc Type: Bug Fix
Last Closed: 2011-12-06 11:31:35 UTC
Bug Blocks: 743047
Attachments:
domain xml of the domain containing the filter reference that induces the crash. (Flags: none)

Description Laine Stump 2011-09-15 18:48:07 UTC
libvirtd-0.9.4-11

If libvirtd is restarted while a running guest has a <filterref> in its <interface> definition, the new libvirtd segfaults because the nwfilter driver->nwfilters pointer is uninitialized.

How to reproduce:

1) add "<filterref filter='clean-traffic'/>" to the <interface> section of a guest.

2) start the guest

3) from a root shell prompt on the host, run "/etc/init.d/libvirtd restart"

After the current libvirtd is stopped, the new libvirtd should crash during initialization.

How reproducible: 100% for me.

Here is an example backtrace:

#0  virNWFilterObjFindByName (nwfilters=0x28, 
    name=0x7f7bec130190 "disallow-dhcp") at conf/nwfilter_conf.c:2169
#1  0x00000000004d1138 in __virNWFilterInstantiateFilter (conn=0x7f7bec1309b0, 
    teardownOld=true, ifname=0x7f7bec1301d0 "vnet0", ifindex=73, linkdev=0x0, 
    nettype=VIR_DOMAIN_NET_TYPE_NETWORK, macaddr=0x7f7bec130804 "RT", 
    filtername=0x7f7bec130190 "disallow-dhcp", filterparams=0x7f7bec1308b0, 
    useNewFilter=INSTANTIATE_ALWAYS, driver=0x0, forceWithPendingReq=false, 
    foundNewFilter=0x7f7bf0b36b4f) at nwfilter/nwfilter_gentech_driver.c:795
#2  0x00000000004d1a53 in _virNWFilterInstantiateFilter (conn=0x7f7bec1309b0, 
    net=0x7f7bec130800, teardownOld=true, useNewFilter=INSTANTIATE_ALWAYS, 
    foundNewFilter=0x7f7bf0b36b4f) at nwfilter/nwfilter_gentech_driver.c:913
#3  0x00000000004d1c2a in virNWFilterInstantiateFilter (
    conn=<value optimized out>, net=<value optimized out>)
    at nwfilter/nwfilter_gentech_driver.c:984
#4  0x0000000000484708 in qemuProcessFiltersInstantiate (
    opaque=<value optimized out>) at qemu/qemu_process.c:2258
#5  qemuProcessReconnect (opaque=<value optimized out>)
    at qemu/qemu_process.c:2578
#6  0x000000357c457512 in virThreadHelper (data=<value optimized out>)
    at util/threads-pthread.c:157
#7  0x00000035640077e1 in ?? ()
#8  0x00007f7bf0b37700 in ?? ()
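
A note on reading the backtrace: frame #1 shows driver=0x0 and frame #0 shows nwfilters=0x28. One plausible reading, assuming the caller passes the address of a member of the driver struct, is that 0x28 is simply that member's offset applied to a NULL driver pointer rather than a randomly corrupted value. The sketch below illustrates the arithmetic only. The struct demo_driver and the helper demo_member_address are hypothetical; only the 0x28 value comes from the backtrace, and the real driver struct layout is not shown in this report.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: only the 0x28 offset is taken from the backtrace.
 * The real driver struct has its own named members. */
typedef struct {
    char before[0x28];  /* placeholder for whatever precedes the list */
    void *nwfilters;    /* stand-in for the embedded filter list */
} demo_driver;

/* Compile-time check that the placeholder layout matches the assumption. */
static_assert(offsetof(demo_driver, nwfilters) == 0x28,
              "demo layout must place nwfilters at offset 0x28");

/* Address that "&driver->nwfilters" would have for a driver located at
 * `base`. With base == 0 (a NULL driver) the result is just the offset. */
static uintptr_t demo_member_address(uintptr_t base)
{
    return base + offsetof(demo_driver, nwfilters);
}
```

With a base of 0 the helper returns 0x28, which matches the nwfilters value in frame #0. If that reading is right, the thing to chase is why driver is NULL, not the contents of the filter list.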

Comment 2 Stefan Berger 2011-09-16 00:26:44 UTC
I have tried with libvirt 0.9.4 and don't see this happening at all. It looks like the nwfilters pointer is corrupted. Can you post the XML of your VM? Can you also post the XML of the 'disallow-dhcp' filter, which I don't have on my system, and of the filter referencing it?

  Stefan

Comment 3 Laine Stump 2011-09-16 01:20:39 UTC
Created attachment 523472 [details]
domain xml of the domain containing the filter reference that induces the crash.

I changed the domain xml to use the standard included "clean-traffic" filter, and the problem persists, so I'm sending just the domain xml (since the filter is part of the libvirt rpm).

Note that if the domain is not running when libvirtd starts, libvirtd *doesn't* crash if I then start the domain. So the pointer is only "improper" (whether it's corrupt or uninitialized) during virDomainLoadAllConfigs() - later on it is again back to normal.

Comment 4 Laine Stump 2011-09-16 14:18:19 UTC
Stefan found the problem and committed a fix upstream:

commit 3f2cb3ab595b3c185f6f814a5e2f46f4866b45a9
Author: Stefan Berger <stefanb.com>
Date:   Fri Sep 16 09:44:43 2011 -0400

    Fix buzzilla 738778
    
    This patch fixes the bug shown in bugzilla 738778. It's not an nwfilter
    problem but a connection sharing / closure issue.
    
    https://bugzilla.redhat.com/show_bug.cgi?id=738778
    
    Depending on the speed / #CPUs of the machine you are using you may not
    see this bug all the time.

A more detailed explanation: qemuProcessReconnectAll opens a connection and starts several threads which may use the conn data, but then closes the conn without waiting for the threads to complete. The solution is to add an extra conn open before starting each thread, then have the threads close the conn when they are finished.
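
The pattern described above can be sketched in plain C with pthreads. This is not the upstream patch: demo_conn, demo_conn_ref, demo_conn_unref, demo_reconnect_worker and demo_reconnect_all are hypothetical names, and a reference count stands in for the extra conn open and close. The caller takes one extra reference per thread before starting it, drops its own reference without waiting, and each thread drops its reference when it finishes.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for a shared, reference-counted connection. */
typedef struct {
    pthread_mutex_t lock;
    int refs;   /* holders still using the connection */
    int freed;  /* set when the last reference is dropped */
    int uses;   /* worker accesses made while the connection was alive */
} demo_conn;

static void demo_conn_ref(demo_conn *c)
{
    pthread_mutex_lock(&c->lock);
    c->refs++;
    pthread_mutex_unlock(&c->lock);
}

static void demo_conn_unref(demo_conn *c)
{
    pthread_mutex_lock(&c->lock);
    assert(c->refs > 0);
    if (--c->refs == 0)
        c->freed = 1;  /* real code would release the connection here */
    pthread_mutex_unlock(&c->lock);
}

/* Worker: uses the connection, then drops the reference taken for it. */
static void *demo_reconnect_worker(void *opaque)
{
    demo_conn *c = opaque;

    pthread_mutex_lock(&c->lock);
    assert(!c->freed);  /* our reference keeps the connection alive */
    c->uses++;
    pthread_mutex_unlock(&c->lock);

    demo_conn_unref(c);
    return NULL;
}

/* Starts nthreads workers. Returns how many used the connection, or -1
 * on allocation failure; *freed_at_end reports whether it was released. */
static int demo_reconnect_all(int nthreads, int *freed_at_end)
{
    demo_conn conn;
    pthread_t *tids = calloc((size_t)nthreads, sizeof(*tids));
    int started = 0;

    if (!tids)
        return -1;
    pthread_mutex_init(&conn.lock, NULL);
    conn.refs = 1;  /* the caller's own reference */
    conn.freed = 0;
    conn.uses = 0;

    for (int i = 0; i < nthreads; i++) {
        demo_conn_ref(&conn);  /* extra reference before the thread starts */
        if (pthread_create(&tids[started], NULL,
                           demo_reconnect_worker, &conn) != 0) {
            demo_conn_unref(&conn);  /* thread never ran; give it back */
            continue;
        }
        started++;
    }
    demo_conn_unref(&conn);  /* caller closes without waiting for workers */

    /* Joining here is only so the demo can inspect the final state. */
    for (int i = 0; i < started; i++)
        pthread_join(tids[i], NULL);
    free(tids);

    *freed_at_end = conn.freed;
    pthread_mutex_destroy(&conn.lock);
    return conn.uses;
}
```

Taking the reference in the caller, before pthread_create, is the important detail: if each worker took its own reference after starting, the caller's close could still win the race and the worker would see a dead connection.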

A rebased patch has been sent to rhvirt-patches for inclusion in RHEL6.

http://post-office.corp.redhat.com/archives/rhvirt-patches/2011-September/msg00524.html

Comment 7 errata-xmlrpc 2011-12-06 11:31:35 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2011-1513.html