Bug 1730084 - systemctl restart network disables access to running VMs [NEEDINFO]
Summary: systemctl restart network disables access to running VMs
Keywords:
Status: NEW
Alias: None
Product: Red Hat Enterprise Linux Advanced Virtualization
Classification: Red Hat
Component: libvirt
Version: 8.1
Hardware: All
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: rc
Target Release: 8.0
Assignee: Virtualization Maintenance
QA Contact: yalzhang@redhat.com
URL:
Whiteboard:
Depends On:
Blocks: 1897025
 
Reported: 2019-07-15 19:16 UTC by Venkatesh Kavtikwar
Modified: 2021-01-26 03:48 UTC
CC List: 19 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:
Type: Bug
Target Upstream Version:
Flags: achareka: needinfo? (jsuchane)




Links
Red Hat Knowledge Base (Solution) 5579291, last updated 2020-11-17 10:41:49 UTC

Description Venkatesh Kavtikwar 2019-07-15 19:16:49 UTC
Description of problem:

systemctl restart network disables network access to running VMs. The restart breaks the link between the bridge and the vnet interface, which cuts off guest connectivity.


Version-Release number of selected component (if applicable):

initscripts-9.49.46-1.el7.x86_64
kernel-3.10.0-957.10.1.el7.x86_64


How reproducible:

- Start any VM on a KVM host and access it over the network
- Restart the network service
- Try to access the VM


Steps to Reproduce:

1. Start a VM on the KVM host. libvirt creates a vnet interface for the VM's network device and attaches it to the configured bridge.

# ip a
4: ens9: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master br0 state UP group default qlen 1000
    link/ether 52:54:00:21:43:f6 brd ff:ff:ff:ff:ff:ff
6: vnet0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master br0 state UNKNOWN group default qlen 1000
    link/ether fe:54:00:7c:56:1d brd ff:ff:ff:ff:ff:ff
    inet6 fe80::fc54:ff:fe7c:561d/64 scope link 
       valid_lft forever preferred_lft forever
7: br0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 52:54:00:21:43:f6 brd ff:ff:ff:ff:ff:ff
    inet 192.168.122.190/24 brd 192.168.122.255 scope global noprefixroute dynamic br0

# brctl show
bridge name	bridge id		STP enabled	interfaces
br0		8000.5254002143f6	no		ens9
							vnet0
2. Restart the network service.

# systemctl restart network

3. Observe that the vnet interface is not added back to the bridge:

# brctl show
bridge name	bridge id		STP enabled	interfaces
br0		8000.5254002143f6	no		ens9

6: vnet0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UNKNOWN group default qlen 1000
    link/ether fe:54:00:7c:56:1d brd ff:ff:ff:ff:ff:ff
    inet6 fe80::fc54:ff:fe7c:561d/64 scope link 
 


Expected results:

Upon network restart, the "vnet" interfaces of running VMs should be re-attached to the bridge so that guest network connectivity is not lost.

The workaround is to add the vnet interface back to the bridge using "brctl addif <bridge> <vnet>".
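Using the interface names from the output above, that is (the second form is the iproute2 equivalent for hosts without bridge-utils):

# brctl addif br0 vnet0
# ip link set vnet0 master br0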



Additional info:

During the analysis we observed that "libvirtd" creates these "vnet" interfaces and links them to the bridge when the VM starts. Because the vnet interfaces have no physical existence, the network service is not aware of them and therefore does not link them back to the bridge when it is restarted. Please correct this if it is not right.
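The detached state can be confirmed after the restart (vnet0 is the device name from the output above; a sketch, not captured output):

# ip link show vnet0              # "master br0" no longer appears while detached
# ls /sys/class/net/vnet0/master  # fails while detached; this symlink exists only when the device is enslaved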

So we would like to know whether this is really a bug or an RFE, and whether there is any active work going on for this problem.

Comment 2 Venkatesh Kavtikwar 2019-07-15 19:19:12 UTC
Restarting "libvirtd" service does not resolve the vnet/bridge connectivity issue.

Comment 3 Jaroslav Suchanek 2019-08-01 12:29:46 UTC
Laine,

any insight into what can be done about this from the libvirt perspective?

Thanks.

Comment 4 Lukáš Nykrýn 2019-08-01 13:09:30 UTC
Who creates that bridge? Does it have a regular ifcfg file?

Comment 5 Venkatesh Kavtikwar 2019-08-01 13:16:42 UTC
Yes, the bridge has an "ifcfg" file and is configured as part of the customer's network setup.

Comment 6 Lukáš Nykrýn 2019-08-01 13:37:05 UTC
This is kinda hard to solve: either libvirt needs to react to such a change and add the devices back to the bridge, or the customer can work around this by adding those brctl commands to ifup-local.
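A minimal sketch of such an ifup-local hook, assuming a single bridge named br0 (as in the description) and that every vnet* tap device on the host belongs to it:

#!/bin/bash
# /sbin/ifup-local is invoked by the initscripts network service after
# each interface comes up, with the interface name as $1.
if [ "$1" = "br0" ]; then
    for tap in /sys/class/net/vnet*; do
        [ -e "$tap" ] || continue          # no vnet devices present
        [ -e "$tap/master" ] && continue   # already enslaved to a bridge
        brctl addif br0 "$(basename "$tap")"
    done
fi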

Comment 7 Laine Stump 2019-08-02 20:41:42 UTC
(In reply to Venkatesh Kavtikwar from comment #2)
> Restarting "libvirtd" service does not resolve the vnet/bridge connectivity
> issue.


All the way back in libvirt-3.2.0 (April 2017), I added commit 85bcc022 to libvirt, which caused guests to be reconnected to their configured bridge, but unfortunately I was only thinking about the bridges created by libvirt's virtual networks. So if you restart libvirtd, it will check all the guest network connections that are configured as <interface type='network'> and reconnect them properly.

This behavior was fixed *properly* by Dan Berrange in libvirt-5.3.0 commit de938b92, which moved the check for proper connection of tap devices out of libvirt's virtual network driver and into the qemu driver. Unfortunately for RHEL7, this was done as a part of a fairly major refactoring of the network driver, so it won't be possible to simply backport that patch (or even just a few patches). Instead, fixing it in RHEL7 will require making a downstream-only patch (or rebasing, but I don't think we'll be doing that for RHEL7 anymore).

Alternately, as a workaround that didn't require any changes to libvirt code, you could just define a libvirt virtual network that uses your existing bridge, e.g.:

  <network>
    <name>br0-net</name>
    <bridge name='br0'/>
    <forward mode='bridge'/>
  </network>

and then configure your guests to use that network, e.g. instead of the guest config containing:

   <interface type='bridge'>
     <source bridge='br0'/>
     ...


it would have:

   <interface type='network'>
     <source network='br0-net'/>
     ...


Of course whether you do this now, or wait until there is a patch to libvirt equivalent to Dan's de938b92, you will *still* need to restart libvirtd after restarting the network service.
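For completeness, the network above would be defined and started roughly like this (br0-net.xml is a hypothetical file name holding the XML shown earlier, and <guest-name> is a placeholder):

# virsh net-define br0-net.xml
# virsh net-start br0-net
# virsh net-autostart br0-net
# virsh edit <guest-name>    # switch the interface from type='bridge' to type='network'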

(I'm curious - has the network service always torn down and recreated configured bridges when it's restarted? If so, I'm surprised that we've never encountered (or even heard of) this problem before...)

Comment 8 Laine Stump 2019-08-02 20:43:58 UTC
Sorry, I noticed after I hit save that I had left out a crucial part of the explanation - the reconnecting of all guest tap devices to their configured bridges happens only when libvirtd is restarted. (I guess that becomes obvious in the 2nd to last paragraph, but I hadn't explicitly stated it).

Comment 12 yalzhang@redhat.com 2020-03-31 06:39:27 UTC
On RHEL 8 systems, network.service is deprecated, so I tried to use NetworkManager.service instead to reproduce the issue, but it cannot be reproduced.

Test on:
qemu-kvm-4.2.0-15.module+el8.2.0+6029+618ef2ec.x86_64
libvirt-libs-6.0.0-14.module+el8.2.0+6069+78a1cb09.x86_64
kernel-4.18.0-187.el8.x86_64

And also
libvirt-libs-4.5.0-41.module+el8.2.0+5928+db9eea38.x86_64
qemu-kvm-2.12.0-99.module+el8.2.0+5827+8c39933c.x86_64

Steps:
Scenario 1:
1. Create a Linux bridge 'br0' with nmcli (see the sketch after this scenario), and start a vm with:
<interface type='bridge'>
  <source bridge='br0'/>
</interface>

2. On the host, run "systemctl restart NetworkManager", then check the network function in the vm; it works well.
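For reference, a bridge like that can be created with nmcli roughly as follows (the slave interface name ens9 is taken from the ip output in the description; adjust as needed):

# nmcli connection add type bridge ifname br0 con-name br0
# nmcli connection add type bridge-slave ifname ens9 master br0
# nmcli connection up br0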

Scenario 2:
1. Create a network as:
# virsh net-dumpxml host-bridge --inactive
<network>
  <name>host-bridge</name>
  <forward mode='bridge'/>
  <bridge name='br0'/>
</network>

2. Start a vm with:
<interface type='network'>
  <source network='host-bridge'/>
</interface>

3. On the host, run "systemctl restart NetworkManager", then check the network function in the vm; it works well.

Comment 24 Marina Kalinin 2020-11-11 03:17:38 UTC
Ameya, can you please take care of a KCS article for this, and note that there is a manual workaround suggested in the description.

Thank you!

Comment 26 Laine Stump 2020-12-07 18:34:44 UTC
I noticed a few weeks ago that libvirt still has a bug with re-attaching tap devices when the interface is <interface type='bridge'>. I *thought* that I had written and sent a patch for this upstream, but it appears it got lost. I've put that back on my to-do list.

Beyond this bugfix (which would make a manual restart of libvirtd.service properly reattach *all* tap devices rather than just most of them), a more generalized automatic reattach cannot realistically be added to libvirt any time soon. A NetworkManager dispatcher script would be a nice automatic workaround in the meantime, but of course if the customer isn't using NetworkManager, then they will need to add something onto the network.service (or whatever networking service they're using); this could be anything from a simplistic restart of libvirtd.service (assuming the bugfix mentioned in the previous paragraph) to a script that searches for all tap devices in all guests and individually reattaches each of them to the appropriate bridge.
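A minimal sketch of such a dispatcher script, assuming the bridge is named br0 and relying on the libvirtd restart behavior described above (the script path and name are illustrative):

#!/bin/bash
# /etc/NetworkManager/dispatcher.d/90-reattach-vm-taps (hypothetical name)
# NetworkManager runs dispatcher scripts with the interface as $1 and
# the action (up, down, ...) as $2.
if [ "$1" = "br0" ] && [ "$2" = "up" ]; then
    systemctl restart libvirtd.service
fi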

Comment 27 Laine Stump 2021-01-26 03:48:07 UTC
NB: the bugfix I referenced in Comment 26 is in libvirt-7.0.0 upstream:

commit dad50cf855d26f35c44ab0eceb6436c6d6a17e06
Author: Laine Stump <laine@redhat.com>
Date:   Tue Oct 20 12:35:09 2020 -0400

    conf: make virDomainNetNotifyActualDevice() callable for all interface types

commit c2b2cdf74659bf13970b8dc8a2db1dd888ac7822
Author: Laine Stump <laine@redhat.com>
Date:   Fri Jan 8 00:36:31 2021 -0500

    call virDomainNetNotifyActualDevice() for all interface types
    
Again - it doesn't make the reconnect automatic, but it does provide a simple way to make it happen: just run "systemctl restart libvirtd.service".

