+++ This bug was initially created as a clone of Bug #1837395 +++

[See the original Bug for full description and history]

A basic description of the problem:

Occasionally, if one guest is starting up while another guest is going down, the guest that is going down may "tear down" some network-related machinery that is keyed on the tap device name (e.g. nwfilter rules, OVS ports) after the tap device itself is already in use by the *new* guest. Depending on the exact circumstances, that could lead to a failure of the old guest's teardown (annoying, but not horrible), or it could lead to the old guest tearing down what was just set up for the new guest (very VERY bad).

The core cause of this problem is that libvirt lets the Linux kernel choose the exact device name to be used when creating a new tap device (by sending a parameterized name to the kernel, i.e. "vnet%d"), and the kernel's algorithm always chooses the lowest-numbered name that is currently unused, even if it was *just now* freed up due to auto-deletion of the device when a qemu process exits. This means that very often the tap device name given to a new guest will be the device name of the most recently shut down guest; if the two events are happening concurrently, this is a problem.

These upstream patches should resolve the issue (by modifying libvirt to choose the tap device names itself, using a monotonically increasing counter so that the same device name is never used twice). The patches are waiting for the libvirt release freeze to be over, and then will be pushed upstream (and so will be in libvirt-6.8.0), with appropriate backports to follow:

https://www.redhat.com/archives/libvir-list/2020-August/msg00962.html
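The difference between the two naming strategies described above can be illustrated with a small model. This is only a sketch under stated assumptions, not libvirt or kernel code: the class names LowestUnusedAllocator and MonotonicAllocator and the helper restart_guest are hypothetical, and the "kernel" here is reduced to a set of in-use indices.

```python
class LowestUnusedAllocator:
    """Toy model of "vnet%d" naming: always pick the lowest free index."""

    def __init__(self, prefix="vnet"):
        self.prefix = prefix
        self.in_use = set()

    def create(self):
        index = 0
        while index in self.in_use:
            index += 1
        self.in_use.add(index)
        return f"{self.prefix}{index}"

    def delete(self, name):
        self.in_use.discard(int(name[len(self.prefix):]))


class MonotonicAllocator:
    """Toy model of the fix: a counter that only ever moves forward."""

    def __init__(self, prefix="vnet"):
        self.prefix = prefix
        self.next_index = 0
        self.in_use = set()

    def create(self):
        index = self.next_index
        self.next_index += 1  # a freed index is never handed out again
        self.in_use.add(index)
        return f"{self.prefix}{index}"

    def delete(self, name):
        self.in_use.discard(int(name[len(self.prefix):]))


def restart_guest(allocator):
    """Create a device, delete it (guest shutdown), then create another."""
    old_name = allocator.create()
    allocator.delete(old_name)
    new_name = allocator.create()
    return old_name, new_name
```

In this model, restart_guest(LowestUnusedAllocator()) returns the same name twice, which is the situation where a late teardown keyed on the old name can hit the new guest's device. With MonotonicAllocator the two names differ, so teardown of the old name cannot touch the new device.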
These two patches were pushed upstream to fix the problem. They are included in libvirt-6.8.0.

commit d7f38beb2ee072f1f19bb91fbafc9182ce9b069e
Author: Laine Stump <laine>
Date:   Sun Aug 23 14:57:19 2020 -0400

    util: replace macvtap name reservation bitmap with a simple counter

Author: Laine Stump <laine>
Date:   Sun Aug 23 21:20:13 2020 -0400

    util: assign tap device names using a monotonically increasing integer
Accidentally omitted the commit id of the 2nd patch:

commit 95089f481e003d971fe0a082018216c58c1b80e5
Author: Laine Stump <laine>
Date:   Sun Aug 23 21:20:13 2020 -0400

    util: assign tap device names using a monotonically increasing integer
Please also include:

commit 2b6cd855042984b87beb7e3c30b67b0f586d89bb
Author:     Ján Tomko <jtomko>
CommitDate: 2020-09-14 13:02:56 +0200

    util: virNetDevTapCreate: initialize fd to -1
Cannot reproduce the bug on libvirt-6.0.0-31.module+el8.4.0+9115+3b7430a4.x86_64 with the steps in https://bugzilla.redhat.com/show_bug.cgi?id=1837395#c24

Tested on libvirt-6.0.0-32.module+el8.4.0+9115+3b7430a4.x86_64; the result is as expected.

1. Prepare a VM with 2 interfaces as below:

# virsh dumpxml rhel | grep /interface -B6
<interface type='network'>
  <mac address='52:54:00:66:27:f8'/>
  <source network='default'/>
  <model type='virtio'/>
  <filterref filter='clean-traffic'/>
  <address type='pci' domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
</interface>
<interface type='direct'>
  <mac address='52:54:00:6c:8f:64'/>
  <source dev='eno1' mode='vepa'/>
  <model type='virtio'/>
  <address type='pci' domain='0x0000' bus='0x05' slot='0x00' function='0x0'/>
</interface>

2. Restart libvirtd and start the VM:

# systemctl restart libvirtd; virsh start rhel
Domain rhel started

# ip l
...
62: vnet0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel master virbr0 state UNKNOWN mode DEFAULT group default qlen 1000
    link/ether fe:54:00:66:27:f8 brd ff:ff:ff:ff:ff:ff
63: macvtap0@eno1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP mode DEFAULT group default qlen 500
    link/ether 52:54:00:6c:8f:64 brd ff:ff:ff:ff:ff:ff

# virsh destroy rhel ; virsh start rhel
Domain rhel destroyed
Domain rhel started

# ip l
...
84: vnet1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel master virbr0 state UNKNOWN mode DEFAULT group default qlen 1000
    link/ether fe:54:00:66:27:f8 brd ff:ff:ff:ff:ff:ff
85: macvtap1@eno1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP mode DEFAULT group default qlen 500
    link/ether 52:54:00:6c:8f:64 brd ff:ff:ff:ff:ff:ff

3. Start another guest with 2 interfaces, one of type direct and one of type network. The device names are macvtap2 and vnet2, which is as expected.

4. Hotplug an interface and check the target dev name. It is vnet3, which is as expected.
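The manual check above (comparing the vnetN and macvtapN names in the ip l output before and after a restart) could be scripted roughly as follows. This is a sketch, not part of the original verification: the function names tap_indices and names_advanced are hypothetical, and it assumes the usual ip link layout in which each interface record starts on its own line with "<ifindex>: <name>:".

```python
import re

# First line of an interface record, e.g. "62: vnet0: <...>" or
# "63: macvtap0@eno1: <...>". Only vnetN and macvtapN names are captured.
_LINK_RE = re.compile(
    r"^[ \t]*\d+:[ \t]+(vnet|macvtap)(\d+)(?:@\S+)?:", re.MULTILINE
)


def tap_indices(ip_link_output):
    """Return {"vnet": [indices...], "macvtap": [indices...]} found in the text."""
    found = {}
    for match in _LINK_RE.finditer(ip_link_output):
        found.setdefault(match.group(1), []).append(int(match.group(2)))
    return found


def names_advanced(before, after):
    """True if, for every prefix seen before, all later indices are higher."""
    old, new = tap_indices(before), tap_indices(after)
    return all(
        prefix in new and min(new[prefix]) > max(indices)
        for prefix, indices in old.items()
    )
```

Feeding it the captured output of ip l taken before and after virsh destroy followed by virsh start should yield True on a fixed build, since the names move from vnet0/macvtap0 to vnet1/macvtap1 instead of being reused.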
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: virt:rhel and virt-devel:rhel security, bug fix, and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:1762