Bug 1730084 - systemctl restart network disables access to running VMs [NEEDINFO]
Summary: systemctl restart network disables access to running VMs
Keywords:
Status: NEW
Alias: None
Product: Red Hat Enterprise Linux Advanced Virtualization
Classification: Red Hat
Component: libvirt
Version: 8.1
Hardware: All
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: rc
Target Release: 8.0
Assignee: Virtualization Maintenance
QA Contact: yalzhang@redhat.com
URL:
Whiteboard:
Depends On:
Blocks: 1897025
 
Reported: 2019-07-15 19:16 UTC by Venkatesh Kavtikwar
Modified: 2021-01-26 03:48 UTC
CC List: 19 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:
Type: Bug
Target Upstream Version:
Flags: achareka: needinfo? (jsuchane)




Links
Red Hat Knowledge Base (Solution) 5579291, last updated 2020-11-17 10:41:49 UTC

Description Venkatesh Kavtikwar 2019-07-15 19:16:49 UTC
Description of problem:

systemctl restart network disables network access to running VMs. The restart breaks the link between the bridge and the vnet interface, which cuts off guest connectivity.


Version-Release number of selected component (if applicable):

initscripts-9.49.46-1.el7.x86_64
kernel-3.10.0-957.10.1.el7.x86_64


How reproducible:

- Start any VM on a KVM host and access it over the network
- Restart the network service
- Try to access the VM


Steps to Reproduce:

1. Start a VM on the KVM host. libvirt creates a vnet interface for the VM's network device and attaches it to the configured bridge.

# ip a
4: ens9: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master br0 state UP group default qlen 1000
    link/ether 52:54:00:21:43:f6 brd ff:ff:ff:ff:ff:ff
6: vnet0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master br0 state UNKNOWN group default qlen 1000
    link/ether fe:54:00:7c:56:1d brd ff:ff:ff:ff:ff:ff
    inet6 fe80::fc54:ff:fe7c:561d/64 scope link 
       valid_lft forever preferred_lft forever
7: br0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 52:54:00:21:43:f6 brd ff:ff:ff:ff:ff:ff
    inet 192.168.122.190/24 brd 192.168.122.255 scope global noprefixroute dynamic br0

# brctl show
bridge name	bridge id		STP enabled	interfaces
br0		8000.5254002143f6	no		ens9
							vnet0
2. Restart the network service.

# systemctl restart network

3. Observe that the vnet interface is not added back to the bridge:

# brctl show
bridge name	bridge id		STP enabled	interfaces
br0		8000.5254002143f6	no		ens9

6: vnet0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UNKNOWN group default qlen 1000
    link/ether fe:54:00:7c:56:1d brd ff:ff:ff:ff:ff:ff
    inet6 fe80::fc54:ff:fe7c:561d/64 scope link 
 


Expected results:

Upon network restart, the "vnet" interfaces of running VMs should be re-attached to the bridge so that guest network connectivity is not lost.

The workaround is to add the vnet interface back to the bridge using "brctl addif <bridge> <vnet>".
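Using the interface names from the output above, that is (the second form is the iproute2 equivalent for hosts without bridge-utils):

# brctl addif br0 vnet0
# ip link set vnet0 master br0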



Additional info:

During the analysis we observed that "libvirtd" creates these "vnet" interfaces and links them to the bridge when the VM starts. Because the vnet interfaces have no physical existence, the network service is not aware of them and therefore does not link them back to the bridge when it is restarted. Please correct this if it is not right.
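The detached state can be confirmed after the restart (vnet0 is the device name from the output above; a sketch, not captured output):

# ip link show vnet0              # "master br0" no longer appears while detached
# ls /sys/class/net/vnet0/master  # fails while detached; this symlink exists only when the device is enslaved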

So we would like to know whether this is really a bug or an RFE, and whether there is any active work going on for this problem.

Comment 2 Venkatesh Kavtikwar 2019-07-15 19:19:12 UTC
Restarting "libvirtd" service does not resolve the vnet/bridge connectivity issue.

Comment 3 Jaroslav Suchanek 2019-08-01 12:29:46 UTC
Laine,

any insight into what can be done about this from the libvirt perspective?

Thanks.

Comment 4 Lukáš Nykrýn 2019-08-01 13:09:30 UTC
Who creates that bridge? Does it have a regular ifcfg file?

Comment 5 Venkatesh Kavtikwar 2019-08-01 13:16:42 UTC
Yes, the bridge has an "ifcfg" file and is configured as part of the customer's network setup.

Comment 6 Lukáš Nykrýn 2019-08-01 13:37:05 UTC
This is kinda hard to solve: either libvirt needs to react to such a change and add the devices back to the bridge, or the customer can work around this by adding those brctl commands to ifup-local.
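A minimal sketch of such an ifup-local hook, assuming a single bridge named br0 (as in the description) and that every vnet* tap device on the host belongs to it:

#!/bin/bash
# /sbin/ifup-local is invoked by the initscripts network service after
# each interface comes up, with the interface name as $1.
if [ "$1" = "br0" ]; then
    for tap in /sys/class/net/vnet*; do
        [ -e "$tap" ] || continue          # no vnet devices present
        [ -e "$tap/master" ] && continue   # already enslaved to a bridge
        brctl addif br0 "$(basename "$tap")"
    done
fi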

Comment 7 Laine Stump 2019-08-02 20:41:42 UTC
(In reply to Venkatesh Kavtikwar from comment #2)
> Restarting "libvirtd" service does not resolve the vnet/bridge connectivity
> issue.


All the way back in libvirt-3.2.0 (April 2017), I added commit 85bcc022 to libvirt, which caused guests to be reconnected to their configured bridge, but unfortunately I was only thinking about the bridges created by libvirt's virtual networks. So if you restart libvirtd, it will check all the guest network connections that are configured as <interface type='network'> and reconnect them properly.

This behavior was fixed *properly* by Dan Berrange in libvirt-5.3.0 commit de938b92, which moved the check for proper connection of tap devices out of libvirt's virtual network driver and into the qemu driver. Unfortunately for RHEL7, this was done as a part of a fairly major refactoring of the network driver, so it won't be possible to simply backport that patch (or even just a few patches). Instead, fixing it in RHEL7 will require making a downstream-only patch (or rebasing, but I don't think we'll be doing that for RHEL7 anymore).

Alternately, as a workaround that didn't require any changes to libvirt code, you could just define a libvirt virtual network that uses your existing bridge, e.g.:

  <network>
    <name>br0-net</name>
    <bridge name='br0'/>
    <forward mode='bridge'/>
  </network>

and then configure your guests to use that network, e.g. instead of the guest config containing:

   <interface type='bridge'>
     <source bridge='br0'/>
     ...


it would have:

   <interface type='network'>
     <source network='br0-net'/>
     ...


Of course whether you do this now, or wait until there is a patch to libvirt equivalent to Dan's de938b92, you will *still* need to restart libvirtd after restarting the network service.
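For completeness, the network above would be defined and started roughly like this (br0-net.xml is a hypothetical file name holding the XML shown earlier, and <guest-name> is a placeholder):

# virsh net-define br0-net.xml
# virsh net-start br0-net
# virsh net-autostart br0-net
# virsh edit <guest-name>    # switch the interface from type='bridge' to type='network'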

(I'm curious - has the network service always torn down and recreated configured bridges when it's restarted? If so, I'm surprised that we've never encountered (or even heard of) this problem before...)

Comment 8 Laine Stump 2019-08-02 20:43:58 UTC
Sorry, I noticed after I hit save that I had left out a crucial part of the explanation - the reconnecting of all guest tap devices to their configured bridges happens only when libvirtd is restarted. (I guess that becomes obvious in the 2nd to last paragraph, but I hadn't explicitly stated it).

Comment 12 yalzhang@redhat.com 2020-03-31 06:39:27 UTC
On RHEL 8 systems, network.service is deprecated, so I tried to use NetworkManager.service instead to reproduce the issue, but it cannot be reproduced.

Test on:
qemu-kvm-4.2.0-15.module+el8.2.0+6029+618ef2ec.x86_64
libvirt-libs-6.0.0-14.module+el8.2.0+6069+78a1cb09.x86_64
kernel-4.18.0-187.el8.x86_64

And also
libvirt-libs-4.5.0-41.module+el8.2.0+5928+db9eea38.x86_64
qemu-kvm-2.12.0-99.module+el8.2.0+5827+8c39933c.x86_64

Steps:
Scenario 1:
1. Create a Linux bridge 'br0' with nmcli (see the sketch after this scenario), and start a vm with:
<interface type='bridge'>
  <source bridge='br0'/>
</interface>

2. On the host, run "systemctl restart NetworkManager", then check the network function in the vm; it works well.
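For reference, a bridge like that can be created with nmcli roughly as follows (the slave interface name ens9 is taken from the ip output in the description; adjust as needed):

# nmcli connection add type bridge ifname br0 con-name br0
# nmcli connection add type bridge-slave ifname ens9 master br0
# nmcli connection up br0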

Scenario 2:
1. Create a network as:
# virsh net-dumpxml host-bridge --inactive
<network>
  <name>host-bridge</name>
  <forward mode='bridge'/>
  <bridge name='br0'/>
</network>

2. Start a vm with:
<interface type='network'>
  <source network='host-bridge'/>
</interface>

3. On the host, run "systemctl restart NetworkManager", then check the network function in the vm; it works well.

Comment 24 Marina Kalinin 2020-11-11 03:17:38 UTC
Ameya, can you please take care of a KCS article for this, and note that there is a manual workaround suggested in the description.

Thank you!

Comment 26 Laine Stump 2020-12-07 18:34:44 UTC
I noticed a few weeks ago that libvirt still has a bug with re-attaching tap devices when the interface is <interface type='bridge'>. I *thought* that I had written and sent a patch for this upstream, but it appears it got lost. I've put that back on my to-do list.

Beyond this bugfix (which would make a manual restart of libvirtd.service properly reattach *all* tap devices rather than just most of them), a more generalized automatic reattach cannot realistically be added to libvirt any time soon. A NetworkManager dispatcher script would be a nice automatic workaround in the meantime, but of course if the customer isn't using NetworkManager, then they will need to add something onto the network.service (or whatever networking service they're using); this could be anything from a simplistic restart of libvirtd.service (assuming the bugfix mentioned in the previous paragraph) to a script that searches for all tap devices in all guests and individually reattaches each of them to the appropriate bridge.
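A minimal sketch of such a dispatcher script, assuming the bridge is named br0 and relying on the libvirtd restart behavior described above (the script path and name are illustrative):

#!/bin/bash
# /etc/NetworkManager/dispatcher.d/90-reattach-vm-taps (hypothetical name)
# NetworkManager runs dispatcher scripts with the interface as $1 and
# the action (up, down, ...) as $2.
if [ "$1" = "br0" ] && [ "$2" = "up" ]; then
    systemctl restart libvirtd.service
fi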

Comment 27 Laine Stump 2021-01-26 03:48:07 UTC
NB: the bugfix I referenced in Comment 26 is in libvirt-7.0.0 upstream:

commit dad50cf855d26f35c44ab0eceb6436c6d6a17e06
Author: Laine Stump <laine@redhat.com>
Date:   Tue Oct 20 12:35:09 2020 -0400

    conf: make virDomainNetNotifyActualDevice() callable for all interface types

commit c2b2cdf74659bf13970b8dc8a2db1dd888ac7822
Author: Laine Stump <laine@redhat.com>
Date:   Fri Jan 8 00:36:31 2021 -0500

    call virDomainNetNotifyActualDevice() for all interface types
    
Again - it doesn't make the reconnect automatic, but it does provide a simple way to make it happen: just run "systemctl restart libvirtd.service".

