Bug 696660

Summary: crash in libvirtd during brDeleteTap
Product: Red Hat Enterprise Linux 6 Reporter: Laine Stump <laine>
Component: libvirt Assignee: Michal Privoznik <mprivozn>
Status: CLOSED ERRATA QA Contact: Virtualization Bugs <virt-bugs>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 6.1 CC: dyuan, eblake, jdenemar, xhu
Target Milestone: rc   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: libvirt-0.8.7-18.el6 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2011-05-19 13:29:42 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Laine Stump 2011-04-14 14:47:16 UTC
If there is a problem setting

Program received signal SIGSEGV, Segmentation fault.
0x00007ffff59f4b52 in ?? () from /lib/libc.so.6
(gdb) bt full
#0  0x00007ffff59f4b52 in ?? () from /lib/libc.so.6
No symbol table info available.
#1  0x00007ffff7888a64 in virStrcpy (dest=0x7fffffffd910 "", src=0x5 <Address 0x5 out of bounds>, destbytes=16) at util/util.c:2458
No locals.
#2  0x00007ffff786e188 in brDeleteTap (ctl=<value optimized out>, ifname=0x5 <Address 0x5 out of bounds>) at util/bridge.c:569
        try = {ifr_ifrn = {ifrn_name = '\000' <repeats 15 times>}, ifr_ifru = {ifru_addr = {sa_family = 4098,
              sa_data = '\000' <repeats 13 times>}, ifru_dstaddr = {sa_family = 4098, sa_data = '\000' <repeats 13 times>},
            ifru_broadaddr = {sa_family = 4098, sa_data = '\000' <repeats 13 times>}, ifru_netmask = {sa_family = 4098,
              sa_data = '\000' <repeats 13 times>}, ifru_hwaddr = {sa_family = 4098, sa_data = '\000' <repeats 13 times>},
            ifru_flags = 4098, ifru_ivalue = 4098, ifru_mtu = 4098, ifru_map = {mem_start = 4098, mem_end = 0, base_addr = 0,
              irq = 0 '\000', dma = 0 '\000', port = 0 '\000'}, ifru_slave = "\002\020", '\000' <repeats 13 times>,
            ifru_newname = "\002\020", '\000' <repeats 13 times>, ifru_data = 0x1002 <Address 0x1002 out of bounds>}}
        fd = 16
#3  0x00000000004975a3 in networkStartNetworkDaemon (driver=0x709a50, network=0x70f000) at network/bridge_driver.c:1775
        ii = <value optimized out>
        err = 7477200
        v4present = 16
        v6present = 208
        save_err = 0x711550
        ipdef = <value optimized out>
        macTapIfName = 0x5 <Address 0x5 out of bounds>
        __FUNCTION__ = "networkStartNetworkDaemon"
        __func__ = "networkStartNetworkDaemon"
#4  0x00000000004980e7 in networkAutostartConfigs (driver=0x709a50) at network/bridge_driver.c:240
        i = 1
#5  0x00000000004984d5 in networkStartup (privileged=<value optimized out>) at network/bridge_driver.c:323
        uid = <value optimized out>
        base = 0x0
        err = <value optimized out>
        __FUNCTION__ = "networkStartup"
#6  0x00007ffff78dacb0 in virStateInitialize (privileged=1) at libvirt.c:793
        i = 1
        ret = 0
        __func__ = "virStateInitialize"
#7  0x000000000041f426 in main (argc=<value optimized out>, argv=<value optimized out>) at libvirtd.c:3383
        server = 0x6eb5e0
        pid_file = 0x4a77e0 "/usr/local/var/run/libvirtd.pid"
        remote_config_file = 0x4a7aa0 "/usr/local/etc/libvirt/libvirtd.conf"
        statuswrite = -1
        ret = <value optimized out>
        opts = {{name = 0x4a67f6 "verbose", has_arg = 0, flag = 0x6d7e00, val = 1}, {name = 0x4a67fe "daemon", has_arg = 0,
            flag = 0x6d7e04, val = 1}, {name = 0x4b34da "listen", has_arg = 0, flag = 0x6d7e08, val = 1}, {name = 0x4c0477 "config",
            has_arg = 1, flag = 0x0, val = 102}, {name = 0x4ba900 "timeout", has_arg = 1, flag = 0x0, val = 116}, {
            name = 0x4c059f "pid-file", has_arg = 1, flag = 0x0, val = 112}, {name = 0x4ae16d "version", has_arg = 0, flag = 0x0,
            val = 129}, {name = 0x4ae34e "help", has_arg = 0, flag = 0x0, val = 63}, {name = 0x0, has_arg = 0, flag = 0x0, val = 0}}
        __func__ = "main"

Comment 1 Laine Stump 2011-04-14 15:01:00 UTC
Initial report here: 

   https://www.redhat.com/archives/libvir-list/2011-April/msg00712.html

If there is a problem when setting the forward-delay or stp-enable of the bridge used by a libvirt virtual network, libvirt must, during cleanup, delete the "dummy tap" device it has just created. Unfortunately, the string containing the (programmatically generated) name of this tap interface had already been freed, so the attempt to delete the device caused a segfault.

The solution is, of course, to not free the string until we are certain that the network startup will be successful.

Comment 3 Laine Stump 2011-04-14 15:17:35 UTC
(Note that this same crash would occur if there were any problem further along in the startup, e.g. dnsmasq or radvd failing to start, or a failure when adding iptables rules.)
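A minimal C sketch of the ordering described in Comments 1 and 3, under stated assumptions: `runStep()` is a hypothetical stand-in for the startup steps (bridge STP/forward-delay, dnsmasq, radvd, iptables), `deleteTap()` is a hypothetical stand-in for `brDeleteTap()`, and the `<bridge>-nic` naming and 16-byte limit (IFNAMSIZ on Linux, matching `destbytes=16` in the backtrace) are assumed for illustration. This is not the actual libvirt patch; it only shows the tap name staying allocated until the startup outcome is known, so every failure path can still use it.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

#define TAP_NAME_MAX 16  /* IFNAMSIZ on Linux; matches destbytes=16 in the backtrace */

/* Hypothetical stand-in for one startup step (1 = STP/forward delay,
 * 2 = dnsmasq, 3 = radvd, 4 = iptables). Fails only when step == failStep. */
static int runStep(int step, int failStep)
{
    return step == failStep ? -1 : 0;
}

/* Hypothetical stand-in for brDeleteTap(): records which tap it "deleted".
 * Like the real function, it requires a valid, still-allocated name. */
static void deleteTap(const char *ifname, char *deleted, size_t len)
{
    assert(ifname != NULL);
    snprintf(deleted, len, "%s", ifname);
}

/* Returns 0 on success, -1 on failure. failStep == 0 means nothing fails. */
static int startNetwork(const char *bridge, int failStep,
                        char *deleted, size_t len)
{
    int ret = -1;
    char *tapName = malloc(TAP_NAME_MAX);

    if (!tapName)
        return -1;
    snprintf(tapName, TAP_NAME_MAX, "%s-nic", bridge);
    /* ...the dummy tap device would be created here... */

    for (int step = 1; step <= 4; step++) {
        if (runStep(step, failStep) < 0)
            goto cleanup;        /* tapName is still valid on this path */
    }
    ret = 0;                     /* on success the dummy tap stays up */

cleanup:
    if (ret < 0)
        deleteTap(tapName, deleted, len);
    free(tapName);               /* freed exactly once, after its last use */
    return ret;
}
```

For example, `startNetwork("virbr0", 2, buf, sizeof(buf))` simulates a dnsmasq failure: it returns -1 and leaves `virbr0-nic` in `buf`, while a `failStep` of 0 returns 0 and deletes nothing. Freeing only at the single `cleanup` label leaves no window in which an error path can see a dangling pointer.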

Comment 7 Laine Stump 2011-04-25 20:36:31 UTC
To reproduce the crash (with the unfixed libvirt), you could temporarily rename /usr/sbin/dnsmasq and then attempt to start a virtual network. I *think* that will cause the crash (but I don't have the proper unpatched libvirt to try it myself).

Comment 10 Laine Stump 2011-04-29 12:36:48 UTC
I can't explain why forcing the error on dnsmasq execution didn't trigger the crash. According to your log, renaming dnsmasq did create the error condition, and according to the code that error condition would cause an attempt to use macTapIfName after it has been freed and NULLed.

I suppose you could instead try doing exactly what the original reporter did - he accidentally started up a system instance of dnsmasq that was listening on all interfaces, so that when the dnsmasq run by libvirtd started, it failed due to "address already in use". If that doesn't crash with the new code, then I'd say the problem is fixed.
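The "address already in use" failure the original reporter hit can be shown in miniature. This sketch is an illustrative stand-in, not dnsmasq's code: instead of two dnsmasq instances competing for DNS port 53 (which needs root), it binds two UDP sockets to the same unprivileged loopback port, and the second `bind()` fails with `EADDRINUSE` because neither socket sets `SO_REUSEADDR`. `secondBindErrno()` is a hypothetical helper name.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Binds a second UDP socket to a port already held by a first socket on
 * 127.0.0.1. Returns the errno from the second bind(), 0 if it
 * unexpectedly succeeded, or -1 if the setup itself failed. */
static int secondBindErrno(void)
{
    int a = -1, b = -1, result = -1;
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;                     /* let the kernel pick a free port */

    a = socket(AF_INET, SOCK_DGRAM, 0);
    if (a < 0 || bind(a, (struct sockaddr *)&addr, sizeof(addr)) < 0)
        goto out;
    /* Learn which port the kernel chose for the first socket. */
    if (getsockname(a, (struct sockaddr *)&addr, &len) < 0)
        goto out;

    b = socket(AF_INET, SOCK_DGRAM, 0);
    if (b < 0)
        goto out;
    /* Same address and port, no SO_REUSEADDR: this bind should fail. */
    result = bind(b, (struct sockaddr *)&addr, sizeof(addr)) < 0 ? errno : 0;

out:
    if (b >= 0)
        close(b);
    if (a >= 0)
        close(a);
    return result;
}
```

On Linux, `secondBindErrno()` should return `EADDRINUSE`. That is the kind of error that made libvirt's dnsmasq exit at startup and sent network startup down the cleanup path where the freed tap name was used.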

Comment 13 Michal Privoznik 2011-05-04 08:05:13 UTC
* The problem was that we free()'d the variable but in some cases needed to access it later. In more detail: if the first part of the network setup process succeeded, we freed the interface name, but any failure in a subsequent part then tried to print an error message referring to that interface name.

* This led to libvirtd receiving SIGSEGV and thus crashing.

* We moved the place where we free the variable.

* Users now see a proper error message instead of a crash.

Comment 14 errata-xmlrpc 2011-05-19 13:29:42 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2011-0596.html