Bug 1730084
Summary: | systemctl restart network disables access to running VMs | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux 9 | Reporter: | Venkatesh Kavtikwar <vkavtikw> |
Component: | libvirt | Assignee: | Laine Stump <laine> |
libvirt sub component: | Networking | QA Contact: | yalzhang <yalzhang> |
Status: | CLOSED DEFERRED | Docs Contact: | |
Severity: | high | ||
Priority: | medium | CC: | bcholler, dyuan, fjin, gveitmic, imomin, initscripts-maint-list, jamacku, jcoscia, jen, jmaxwell, jsuchane, kshukla, kwalker, laine, lmen, lnykryn, mkalinin, ptalbert, shipatil, virt-maint, xuzhang, yalzhang |
Version: | unspecified | Keywords: | Triaged |
Target Milestone: | rc | ||
Target Release: | --- | ||
Hardware: | All | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | If docs needed, set a value | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2022-04-21 22:30:30 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | |||
Bug Blocks: | 1897025 |
Description
Venkatesh Kavtikwar
2019-07-15 19:16:49 UTC
Restarting the "libvirtd" service does not resolve the vnet/bridge connectivity issue.

Laine, any insight into what can be done about this from the libvirt perspective? Thanks.

Who creates that bridge? Does it have a regular ifcfg file?

Yes, the bridge has an "ifcfg" file and is configured as a part of their network setup.

This is kinda hard to solve: either libvirt needs to react to such a change and add the devices to the bridge, or the customer can work around this by adding those brctl commands to ifup-local.

(In reply to Venkatesh Kavtikwar from comment #2)
> Restarting "libvirtd" service does not resolve the vnet/bridge connectivity
> issue.

All the way back in libvirt-3.2.0 (April 2017), I added commit 85bcc022 to libvirt, which caused guests to be reconnected to their configured bridge, but unfortunately I was only thinking about the bridges created by libvirt's virtual networks. So if you restart libvirtd, it will check all the guest network connections that are configured as <interface type='network'> and reconnect them properly.

This behavior was fixed *properly* by Dan Berrange in libvirt-5.3.0 commit de938b92, which moved the check for proper connection of tap devices out of libvirt's virtual network driver and into the qemu driver. Unfortunately for RHEL7, this was done as a part of a fairly major refactoring of the network driver, so it won't be possible to simply backport that patch (or even just a few patches). Instead, fixing it in RHEL7 will require making a downstream-only patch (or rebasing, but I don't think we'll be doing that for RHEL7 anymore).

Alternately, as a workaround that doesn't require any changes to libvirt code, you could just define a libvirt virtual network that uses your existing bridge, e.g.:

    <network>
      <name>br0-net</name>
      <bridge name='br0'/>
      <forward mode='bridge'/>
    </network>

and then configure your guests to use that network, e.g. instead of the guest config containing:

    <interface type='bridge'>
      <source bridge='br0'/>
      ...

it would have:

    <interface type='network'>
      <source network='br0-net'/>
      ...

Of course whether you do this now, or wait until there is a patch to libvirt equivalent to Dan's de938b92, you will *still* need to restart libvirtd after restarting the network service.

(I'm curious - has the network service always torn down and recreated configured bridges when it's restarted? If so, I'm surprised that we've never encountered (or even heard of) this problem before...)

Sorry, I noticed after I hit save that I had left out a crucial part of the explanation - the reconnecting of all guest tap devices to their configured bridges happens only when libvirtd is restarted. (I guess that becomes obvious in the 2nd to last paragraph, but I hadn't explicitly stated it).

On a RHEL8 system, network.service is deprecated, so I tried to use NetworkManager.service instead to reproduce the issue, but it cannot be reproduced.

Tested on:
    qemu-kvm-4.2.0-15.module+el8.2.0+6029+618ef2ec.x86_64
    libvirt-libs-6.0.0-14.module+el8.2.0+6069+78a1cb09.x86_64
    kernel-4.18.0-187.el8.x86_64
And also:
    libvirt-libs-4.5.0-41.module+el8.2.0+5928+db9eea38.x86_64
    qemu-kvm-2.12.0-99.module+el8.2.0+5827+8c39933c.x86_64

Steps:

Scenario 1:
1. Create the linux bridge 'br0' with nmcli, and start a VM with:
    <interface type='bridge'>
      <source bridge='br0'/>
    </interface>
2. On the host, run "systemctl restart NetworkManager", then check the network function on the VM; it works well.

Scenario 2:
1. Create a network as:
    # virsh net-dumpxml host-bridge --inactive
    <network>
      <name>host-bridge</name>
      <forward mode='bridge'/>
      <bridge name='br0'/>
    </network>
2. Start a VM with:
    <interface type='network'>
      <source network='host-bridge'/>
    </interface>
3. On the host, run "systemctl restart NetworkManager", then check the network function on the VM; it works well.

Ameya, can you please take care of a KCS for this, and note that there is a manual workaround suggested in the description. Thank you!
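For reference, the ifup-local workaround suggested earlier in this thread might look roughly like the sketch below. It is only an illustration under several assumptions: the bridge is named br0 as in the examples in this bug, the guests use <interface type='bridge'>, and the tap devices are reattached with "ip link set ... master" from iproute2 rather than the brctl commands mentioned above. The function names taps_for_bridge and reattach_taps are made up for this sketch. initscripts calls /sbin/ifup-local with the interface name as its first argument once that interface is up.

```shell
#!/bin/sh
# Sketch of /sbin/ifup-local (assumption: bridge name is br0).

BRIDGE="br0"

# Read `virsh domiflist <domain>` output on stdin and print the tap device
# of every row whose Type is "bridge" and whose Source is the bridge in $1.
# Header and separator rows never match; "-" means no tap device exists.
taps_for_bridge() {
    awk -v br="$1" '$2 == "bridge" && $3 == br && $1 != "-" { print $1 }'
}

# Reattach the tap devices of all running guests to the bridge in $1.
reattach_taps() {
    for dom in $(virsh list --name); do
        for tap in $(virsh domiflist "$dom" | taps_for_bridge "$1"); do
            ip link set dev "$tap" master "$1"
        done
    done
}

# Only act when ifup-local is invoked for the bridge itself.
if [ "${1:-}" = "$BRIDGE" ]; then
    reattach_taps "$BRIDGE"
fi
```

Guests connected through a libvirt network (<interface type='network'>) are not covered by this sketch, since domiflist reports the network name rather than the bridge name for them; restarting libvirtd handles those.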
I noticed a few weeks ago that libvirt does still have a bug with re-attaching tap devices when the interface is <interface type='bridge'>. I *thought* that I had written and sent a patch for this upstream, but it appears it got lost. I've put that back on my to-do list.

Beyond this bugfix (which would make a manual restart of libvirtd.service properly reattach *all* tap devices rather than just most of them), a more generalized automatic reattach cannot realistically be added to libvirt any time soon. A NetworkManager dispatcher script would be a nice automatic workaround in the meantime, but of course if the customer isn't using NetworkManager, then they will need to add something onto network.service (or whatever networking service they're using); this could be anything from a simplistic restart of libvirtd.service (assuming the bugfix mentioned in the previous paragraph) to a script that searches for all tap devices in all guests, and individually reattaches each of them to the appropriate bridge.

NB: the bugfix I referenced in Comment 26 is in libvirt-7.0.0 upstream:

    commit dad50cf855d26f35c44ab0eceb6436c6d6a17e06
    Author: Laine Stump <laine>
    Date:   Tue Oct 20 12:35:09 2020 -0400

        conf: make virDomainNetNotifyActualDevice() callable for all interface types

    commit c2b2cdf74659bf13970b8dc8a2db1dd888ac7822
    Author: Laine Stump <laine>
    Date:   Fri Jan 8 00:36:31 2021 -0500

        call virDomainNetNotifyActualDevice() for all interface types

Again - it doesn't make the reconnect automatic, but it does provide a simple way to make it happen - just run "systemctl restart libvirtd.service"

Tested on the latest rhel8.5 with libvirt-7.6.0-1.module+el8.5.0+12097+2c77910b.x86_64; the result is as expected.

1. Start a VM with 2 interfaces, one connected to the bridge directly, the other connected to the network:
    <network>
      <name>br0-net</name>
      <forward mode='bridge'/>
      <bridge name='br0'/>
    </network>

2. Check that the tap devices are connected to the bridge:
    # bridge link | grep master
    7: enp130s0f1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 master br0 state forwarding priority 32 cost 100
    7: enp130s0f1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 master br0 hwmode VEB
    125: vnet24: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 master br0 state forwarding priority 32 cost 100
    126: vnet25: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 master br0 state forwarding priority 32 cost 100

3. Set br0 down and up, and check that no tap devices are re-connected:
    # nmcli con down br0
    Connection 'br0' successfully deactivated (D-Bus active path: /org/freedesktop/NetworkManager/ActiveConnection/68)
    # nmcli con up br0
    Connection successfully activated (master waiting for slaves) (D-Bus active path: /org/freedesktop/NetworkManager/ActiveConnection/79)
    # bridge link | grep master
    7: enp130s0f1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 master br0 state forwarding priority 32 cost 100
    7: enp130s0f1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 master br0 hwmode VEB

4. Restart libvirtd and check that the tap devices are re-connected:
    # systemctl restart libvirtd
    # bridge link | grep master
    7: enp130s0f1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 master br0 state forwarding priority 32 cost 100
    7: enp130s0f1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 master br0 hwmode VEB
    125: vnet24: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 master br0 state forwarding priority 32 cost 100
    126: vnet25: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 master br0 state forwarding priority 32 cost 100

Also tested deleting and re-creating the bridge "br0"; the result is also as expected.

Bulk update: Move RHEL-AV bugs to RHEL9. If necessary to resolve in RHEL8, then clone to the current RHEL8 release.

There is no work being done in engineering on this issue. I had thought that the knowledgebase article was enough to consider it fixed (especially since all RHEL8 libvirts are now rebased to something beyond libvirt-7.0.0).
Basically the solution is that you should restart libvirtd.service any time you restart network.service. This is described in the KB article created by achareka (see Comment 32). (NB: my caveat in Comment 37 about users on RHEL-AV-8.3.1 using libvirt-6.6.0 should no longer be important, since nobody should be using that release any longer).

With that KB article published, I think we can/should close this BZ, as Ademar suggested in Comment 28.
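
For hosts that do use NetworkManager, the dispatcher-script workaround mentioned earlier in this bug could be sketched along these lines. This is an illustration only, not something taken from the KB article: the file name 50-libvirtd-reattach and the function name should_restart_libvirtd are made up, the bridge is assumed to be br0, and it assumes libvirt 7.0.0 or later so that a restart of libvirtd reattaches tap devices for all interface types. NetworkManager runs dispatcher scripts with the interface name as the first argument and the action (such as "up" or "down") as the second.

```shell
#!/bin/sh
# Sketch of /etc/NetworkManager/dispatcher.d/50-libvirtd-reattach
# (hypothetical file name; assumption: bridge name is br0).

BRIDGE="br0"

# Succeed only when the event is the bridge coming up.
# $1 = interface name, $2 = dispatcher action.
should_restart_libvirtd() {
    [ "$1" = "$BRIDGE" ] && [ "$2" = "up" ]
}

# Restarting libvirtd makes it reattach guest tap devices to their bridges.
if [ "$#" -ge 2 ] && should_restart_libvirtd "$1" "$2"; then
    systemctl restart libvirtd.service
fi
```

Restarting libvirtd.service does not stop running guests, which is why a blunt restart is tolerable here; a more selective script could reattach individual tap devices instead.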