Bug 819416

Summary: libvirtd loses network configuration, bridges remain
Product: Red Hat Enterprise Linux 6
Component: libvirt
Version: 6.2
Hardware: x86_64
OS: Linux
Status: CLOSED WONTFIX
Severity: unspecified
Priority: unspecified
Reporter: Ilkka Tengvall <ikke>
Assignee: Laine Stump <laine>
QA Contact: Virtualization Bugs <virt-bugs>
CC: acathrow, cwei, dallan, danken, dyuan, laine, mzhan, pcfe, shyu, ydu
Target Milestone: rc
Target Release: ---
Keywords: Upstream
Doc Type: Bug Fix
Clones: 947385 (view as bug list)
Last Closed: 2014-04-04 21:01:08 UTC
Type: Bug
Bug Blocks: 494837, 782183

Description Ilkka Tengvall 2012-05-07 07:32:55 UTC
Description of problem:

I create three host-only networks for setting up a cluster inside KVM and install the guests using virt-install. After a guest is taken down, the network disappears from virsh net-list --all, even though the XML file remains in /var/lib/libvirt/network/ and the bridges and virtual NICs are still in place. But since the network is no longer visible to libvirtd, it refuses to restart the guest due to the missing network.


Version-Release number of selected component (if applicable):

python-virtinst-0.600.0-5.el6.noarch
libvirt-python-0.9.4-23.el6.x86_64
libvirt-client-0.9.4-23.el6.x86_64
libvirt-0.9.4-23.el6.x86_64
redhat-release-server-6Server-6.2.0.3.el6.x86_64


How reproducible:

Often enough that things don't work at all.

Steps to Reproduce:
1. I created the networks (on Fedora I also put an IP range in the definitions; RHEL won't handle it):

---------------------
cat >/tmp/fp_net.xml<<EOF
<network>
  <name>fp_commission</name>
</network>
EOF
sudo virsh net-create /tmp/fp_net.xml

cat >/tmp/fp_net_1.xml<<EOF
<network>
  <name>fp_internal_1</name>
</network>
EOF
sudo virsh net-create /tmp/fp_net_1.xml

cat >/tmp/fp_net_2.xml<<EOF
<network>
  <name>fp_internal_2</name>
</network>
EOF
sudo virsh net-create /tmp/fp_net_2.xml
---------------------
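
A quick check (not in the original report) would confirm that all three transient networks are active at this point; the expected output matches the net-list run shown later in the recovery steps:

---------------------
$ sudo virsh net-list
Name                 State      Autostart
-----------------------------------------
default              active     yes
fp_commission        active     no
fp_internal_1        active     no
fp_internal_2        active     no
---------------------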

2. I created the guests

---------------------
sudo lvcreate -n LV_KVM_fews -L 10G VG_Whipper
sudo virt-install --hvm --network network=default --network network=fp_commission --ram=512 --name=fp-fews --os-type=linux --os-variant=rhel5 --cdrom=/var/lib/libvirt/images/ipxe.iso --disk=/dev/mapper/VG_Whipper-LV_KVM_fews
---------------------

Normally this already fails, so creating the second guest is not even necessary:

---------------------
sudo lvcreate -n LV_KVM_CLA-0 -L 25G VG_Whipper
sudo virt-install --hvm --network network=fp_internal_1,mac=00:50:56:1c:ce:01,model=e1000 \
 --network network=fp_internal_2,mac=00:50:56:1c:ce:11,model=e1000 \
 --network network=fp_commission,model=e1000  --ram=512 --name=fp-cla0 \
 --os-type=linux --disk=/dev/mapper/VG_Whipper-LV_KVM_CLA--0,bus=ide --pxe
---------------------



3.


  
Actual results: At some point the network disappears, even though the bridge and the configs are still there:


---------------------
$ sudo virsh net-dumpxml fp_internal_1
error: failed to get network 'fp_internal_1'
error: Network not found: no network with matching name 'fp_internal_1'

$ sudo virsh net-start fp_internal_1
error: failed to get network 'fp_internal_1'
error: Network not found: no network with matching name 'fp_internal_1'

$ sudo ls /var/lib/libvirt/network
default.xml  fp_commission.xml	fp_internal_1.xml  fp_internal_2.xml

$ sudo cat /var/lib/libvirt/network/fp_internal_1.xml
<!--
WARNING: THIS IS AN AUTO-GENERATED FILE. CHANGES TO IT ARE LIKELY TO BE 
OVERWRITTEN AND LOST. Changes to this xml configuration should be made using:
  virsh net-edit fp_internal_1
or other application using the libvirt API.
-->

<network>
  <name>fp_internal_1</name>
  <uuid>cc6bfa67-4233-4968-c767-586beb1f4528</uuid>
  <bridge name='virbr1' stp='on' delay='0' />
  <mac address='52:54:00:DB:6D:6C'/>
</network>

$ sudo brctl show
bridge name     bridge id               STP enabled     interfaces
heppa           8000.525400775671       yes             heppa-nic
virbr0          8000.525400d3c89a       yes             virbr0-nic
virbr1          8000.525400db6d6c       yes             virbr1-nic
virbr2          8000.525400e30db6       yes             virbr2-nic
---------------------


Expected results:

The networks should come up and stay up, and the guests should start fine.

Additional info:

I can recover from this state by taking the interfaces down, deleting the virbrX bridges, and recreating the networks, but libvirtd will lose them again at some point (I'll try to figure out exactly at which point...):


---------------------
$ sudo brctl delif virbr0 virbr0-nic
$ sudo brctl delif virbr1 virbr1-nic
$ sudo brctl delif virbr2 virbr2-nic
$ sudo ip link set down dev virbr0 
$ sudo ip link set down dev virbr1 
$ sudo ip link set down dev virbr2 
$ sudo brctl delbr virbr0
$ sudo brctl delbr virbr1
$ sudo brctl delbr virbr2
$ sudo rm \
/var/lib/libvirt/network/{fp_commission.xml,fp_internal_1.xml,fp_internal_2.xml}
$ sudo virsh net-create /tmp/fp_net.xml
Network fp_commission created from /tmp/fp_net.xml
$ sudo virsh net-create /tmp/fp_net_1.xml
Network fp_internal_1 created from /tmp/fp_net_1.xml
$ sudo virsh net-create /tmp/fp_net_2.xml
Network fp_internal_2 created from /tmp/fp_net_2.xml

$ sudo virsh net-list
Name                 State      Autostart
-----------------------------------------
default              active     yes       
fp_commission        active     no        
fp_internal_1        active     no        
fp_internal_2        active     no        
---------------------

Comment 1 Ilkka Tengvall 2012-05-07 07:35:21 UTC
I forgot to mention: the same happens on an up-to-date F16.

Comment 3 Laine Stump 2012-05-07 09:50:41 UTC
The problem here is that the networks are transient (virsh net-create was used, rather than net-define), and the bridge driver doesn't properly account for active transient networks when it is being restarted - it only reads the persistent networks from /etc/libvirt. This is different behavior from, e.g., the qemu driver, which reads in all active domains from /var/lib/libvirt, then determines which are *really* active by attempting to connect to their monitors. The restart code in the bridge driver should do something similar - read the active networks from /var/lib/libvirt, then do some sort of sanity check to see which are really still there (at the very least checking for the bridge device and dummy tap device, and verifying that the process(es) with pidfiles in /var/run/libvirt/network are running). This still isn't perfect, since some networks may not have an associated dnsmasq or radvd process, but it's better than just dropping all the transient networks, or assuming they're all still active.
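
Roughly the sanity check described above, expressed as a shell sketch (the paths follow the comment; the script itself is hypothetical, not actual libvirt code):

---------------------
#!/bin/sh
# For each status XML in /var/lib/libvirt/network, check whether the
# network still looks alive: its bridge device exists, and the process
# named in its pidfile (dnsmasq/radvd, if any) is still running.
for xml in /var/lib/libvirt/network/*.xml; do
    net=$(basename "$xml" .xml)
    br=$(sed -n "s/.*<bridge name='\([^']*\)'.*/\1/p" "$xml")
    if [ ! -d "/sys/class/net/$br" ]; then
        echo "$net: bridge $br is gone - network no longer active"
        continue
    fi
    pidfile="/var/run/libvirt/network/$net.pid"
    if [ -f "$pidfile" ] && ! kill -0 "$(cat "$pidfile")" 2>/dev/null; then
        echo "$net: pidfile process dead - dnsmasq/radvd not running"
    else
        echo "$net: still looks active"
    fi
done
---------------------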

This problem has been around for a very long time, btw. I guess nobody has seriously used transient networks before.

Comment 5 Laine Stump 2012-05-16 15:59:27 UTC
Ilkka - my suspicion is that you don't actually want a transient network (since it would cease to exist the first time you rebooted the host), but instead want a persistent (i.e. permanent) network - is that correct?

(I suspect this because you are defining persistent guests with virt-install, and it would not make sense to have a persistent guest definition referencing a transient network; this could very easily lead to a situation where the network referenced by the guest didn't exist at the time the guest was started, even once the bug uncovered in this BZ is fixed.)

There may be a situation where a transient network is the correct tool for the job, but I don't think this is one of those situations.
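
For reference, a persistent version of one of these networks would look something like this (same XML, but virsh net-define plus net-autostart instead of net-create; a sketch, not part of the original report):

---------------------
cat >/tmp/fp_net.xml<<EOF
<network>
  <name>fp_commission</name>
</network>
EOF
# net-define stores the config under /etc/libvirt, so the network
# survives both libvirtd restarts and host reboots
sudo virsh net-define /tmp/fp_net.xml
sudo virsh net-autostart fp_commission
sudo virsh net-start fp_commission
---------------------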

Comment 6 Ilkka Tengvall 2012-05-18 07:35:05 UTC
You are right in this case; that's what I want for now. But the transient networks were aimed at another goal I'm doing this for: we'll set up a network of hosts that we run tests on, using KVM guests as test targets. The hosts will not be rebooted frequently, whereas guests come and go all the time. For that I was planning to use just virsh commands to create the nets and the guests, and if guests remained over a boot, it would simply be a bug in the environment.

But like I said, it's no biggie. Now that I know there is such a bug in libvirt, I can just as well work around it using the persistent configurations, and run some clean-up routines for guests after rebooting the host, in case of an accidental reboot or host crash.
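
Such a post-reboot clean-up could look roughly like the recovery steps from the description, wrapped into a script (a sketch only; the bridge and network names are the ones used in this report):

---------------------
#!/bin/bash
# After an accidental reboot or host crash: remove the stale bridges
# and status XMLs left behind by the transient networks, then
# recreate the networks from their definitions.
for br in virbr0 virbr1 virbr2; do
    sudo brctl delif "$br" "${br}-nic" 2>/dev/null
    sudo ip link set down dev "$br"
    sudo brctl delbr "$br"
done
sudo rm -f /var/lib/libvirt/network/{fp_commission,fp_internal_1,fp_internal_2}.xml
for f in /tmp/fp_net.xml /tmp/fp_net_1.xml /tmp/fp_net_2.xml; do
    sudo virsh net-create "$f"
done
---------------------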

Comment 15 Laine Stump 2013-04-30 18:15:35 UTC
I'm pretty certain this is fixed by the following commit (and *many* others in the past few months; this is only the last piece of the puzzle):

commit 446dd66b7c7641974d25423c4cb5c48e481a84e4
Author: Peter Krempa <pkrempa>
Date:   Tue Apr 16 18:35:59 2013 +0200

    network: bridge_driver: don't lose transient networks on daemon restart
    
    Until now transient networks weren't really useful as libvirtd wasn't
    able to remember them across restarts. This patch adds support for
    loading status files of transient networks (that already were generated)
    so that the status isn't lost.
    
    This patch chops up virNetworkObjUpdateParseFile and turns it into
    virNetworkLoadState and a few friends that will help us to load status
    XMLs and refactors the functions that are loading the configs to use
    them.

Comment 16 Laine Stump 2013-05-07 16:10:59 UTC
To anyone attempting to backport the patches that fix this bug - the above patch caused a regression in behavior of qemu:///session that was fixed by the following commit (post-1.0.5):


Author: Laine Stump <laine>
Date:   Thu May 2 13:59:52 2013 -0400

    network: fix network driver startup for qemu:///session
    
    This should resolve https://bugzilla.redhat.com/show_bug.cgi?id=958907

Comment 17 Laine Stump 2013-05-07 16:14:50 UTC
Oops. Missed the commit ID in the paste:

commit 2ffd87d8204c209b81610b56ee5161ae71b58b8c
Author: Laine Stump <laine>
Date:   Thu May 2 13:59:52 2013 -0400

    network: fix network driver startup for qemu:///session

Comment 22 RHEL Program Management 2014-04-04 21:01:08 UTC
Development Management has reviewed and declined this request.
You may appeal this decision by reopening this request.