1. Please describe the problem:

Running the current Fedora 42 and 41 cloud images in libvirt on a bridged VLAN, network performance is severely degraded (~16-200KB/s). The same behavior is not seen with current EL9 images, but it was reproduced with the current OpenSUSE Tumbleweed, which is also running kernel 6.14.2.

Here are the relevant errors on boot, repeated many times:

```
[Tue Apr 22 17:15:21 2025] net_ratelimit: 19 callbacks suppressed
[Tue Apr 22 17:15:21 2025] enp1s0: bad gso: type: 4, size: 1448
[Tue Apr 22 17:15:21 2025] enp1s0: bad gso: type: 4, size: 1448
[Tue Apr 22 17:15:21 2025] enp1s0: bad gso: type: 4, size: 1448
```

iperf TCP throughput is, oddly, fine, but every HTTP/HTTPS client (dnf, wget, curl, etc.) has the same problem:

```
curl https://hil-speed.hetzner.com/1GB.bin >/dev/null
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0 1024M    0 1615k    0     0  36118      0  8:15:28  0:00:45  8:14:43 33159
```

Compare with a freshly deployed AlmaLinux 9.5 on the same hypervisor and an otherwise identical virt config:

```
curl https://hil-speed.hetzner.com/1GB.bin >/dev/null
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
 37 1024M   37  381M    0     0  13.9M      0  0:01:13  0:00:27  0:00:46 10.6M
```

We also tried e1000, e1000e, and rtl8139 instead of virtio and saw the same performance regression.

2. What is the Version-Release number of the kernel:

On Fedora 42, this behavior was seen on these kernel versions:
kernel-core-6.14.0-63.fc42.x86_64
kernel-core-6.14.2-300.fc42.x86_64
kernel-core-6.14.3-300.fc42.x86_64

On Fedora 41: 6.11.4-301.fc41.x86_64

3. Did it work previously in Fedora? If so, what kernel version did the issue *first* appear? Old kernels are available for download at https://koji.fedoraproject.org/koji/packageinfo?packageID=8 :

Yes, the issue does not occur with the Fedora 40 (40-1.14) cloud image.

4. Can you reproduce this issue?
If so, please provide the steps to reproduce the issue below:

Launch a Fedora 42 (or 41) VM from the cloud image with libvirt, using a virtio NIC on a bridged VLAN.

5. Does this problem occur with the latest Rawhide kernel? To install the Rawhide kernel, run ``sudo dnf install fedora-repos-rawhide`` followed by ``sudo dnf update --enablerepo=rawhide kernel``:

Haven't tested this yet.

6. Are you running any modules that are not shipped directly with Fedora's kernel?:

No.

7. Please attach the kernel logs. You can get the complete kernel log for a boot with ``journalctl --no-hostname -k > dmesg.txt``. If the issue occurred on a previous boot, use the journalctl ``-b`` flag.

```
Apr 22 16:53:48 zoey NetworkManager[978]: <info> [1745366028.8127] manager: NetworkManager state is now CONNECTED_SITE
Apr 22 16:53:48 zoey NetworkManager[978]: <info> [1745366028.8128] device (enp1s0): Activation: successful, device activated.
Apr 22 16:53:48 zoey NetworkManager[978]: <info> [1745366028.8132] manager: NetworkManager state is now CONNECTED_GLOBAL
Apr 22 16:53:48 zoey chronyd[896]: Source 65.74.88.213 online
Apr 22 16:53:48 zoey chronyd[896]: Source 168.235.89.132 online
Apr 22 16:53:48 zoey chronyd[896]: Source 108.61.73.244 online
Apr 22 16:53:48 zoey chronyd[896]: Source 23.168.24.210 online
Apr 22 16:53:49 zoey chronyd[896]: Selected source 23.168.24.210 (2.fedora.pool.ntp.org)
Apr 22 16:53:58 zoey systemd[1]: NetworkManager-dispatcher.service: Deactivated successfully.
Apr 22 16:53:58 zoey audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=NetworkManager-dispatcher comm="systemd" exe="/usr/lib/syst>
Apr 22 16:54:00 zoey kernel: enp1s0: bad gso: type: 4, size: 1368
Apr 22 16:54:00 zoey kernel: enp1s0: bad gso: type: 4, size: 1368
Apr 22 16:54:00 zoey kernel: enp1s0: bad gso: type: 4, size: 1368
Apr 22 16:54:00 zoey kernel: enp1s0: bad gso: type: 4, size: 1368
Apr 22 16:54:00 zoey kernel: enp1s0: bad gso: type: 4, size: 1368
Apr 22 16:54:01 zoey kernel: enp1s0: bad gso: type: 4, size: 1368
Apr 22 16:54:01 zoey kernel: enp1s0: bad gso: type: 4, size: 1368
Apr 22 16:54:02 zoey kernel: enp1s0: bad gso: type: 4, size: 1448
Apr 22 16:54:02 zoey kernel: enp1s0: bad gso: type: 4, size: 1448
Apr 22 16:54:02 zoey kernel: enp1s0: bad gso: type: 4, size: 1448
Apr 22 16:54:05 zoey kernel: net_ratelimit: 23 callbacks suppressed
```

Reproducible: Always
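Because `net_ratelimit` hides most repeats, the visible `bad gso` lines understate how often the warning fires. A small helper can summarize a saved log such as the `dmesg.txt` produced by the journalctl command in question 7. The sketch below is hypothetical (the function `summarize` and the script name are made up for illustration). It counts `bad gso` lines per interface, GSO type, and size, and adds up the `callbacks suppressed` totals. The type names assume the `VIRTIO_NET_HDR_GSO_*` values from the Linux virtio_net UAPI header (where 4 would be TCPv6), so please check them against your kernel headers.

```python
import re
import sys
from collections import Counter

# Matches warnings such as "enp1s0: bad gso: type: 4, size: 1448"
BAD_GSO = re.compile(r"(\S+): bad gso: type: (\d+), size: (\d+)")
# Matches "net_ratelimit: 19 callbacks suppressed"
SUPPRESSED = re.compile(r"net_ratelimit: (\d+) callbacks suppressed")

# Assumption: names follow VIRTIO_NET_HDR_GSO_* in
# include/uapi/linux/virtio_net.h; verify against your kernel headers.
GSO_TYPES = {0: "NONE", 1: "TCPV4", 3: "UDP", 4: "TCPV6", 5: "UDP_L4"}


def summarize(lines):
    """Return (Counter of (iface, gso_type_name, size) -> hits, suppressed total)."""
    counts = Counter()
    suppressed = 0
    for line in lines:
        m = BAD_GSO.search(line)
        if m:
            gso = int(m.group(2))
            name = GSO_TYPES.get(gso, f"unknown({gso})")
            counts[(m.group(1), name, int(m.group(3)))] += 1
            continue
        s = SUPPRESSED.search(line)
        if s:
            suppressed += int(s.group(1))
    return counts, suppressed


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python3 summarize_gso.py dmesg.txt
    with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
        counts, suppressed = summarize(f)
    for (iface, gso, size), n in sorted(counts.items()):
        print(f"{iface}: bad gso type={gso} size={size}: {n} logged")
    # net_ratelimit is shared, so not every suppressed message is a bad gso one
    print(f"net_ratelimit suppressed {suppressed} further messages")
```

The suppressed total is only an upper bound on hidden `bad gso` warnings, because other rate-limited networking messages share the same limiter.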
Libvirt hosts are SUSE Harvester v1.4.1 (released January 2025) running SLE Micro kernel 5.14.21-150500.55.88-default on kubevirt v1.2.2.
Tested it in a newer hypervisor environment, Harvester v1.4.2 (released March 11, 2025), which is based on kubevirt v1.3.1, and the issue did not reproduce there. So I suspect that something between kubevirt v1.2.2 and v1.3.1 is not playing well with 6.14. I also reproduced this on OpenSUSE Tumbleweed (6.14.2) and Ubuntu 25.04 (6.14.0), so it looks like an upstream kernel issue.
Also filed a bug with OpenSUSE Tumbleweed: https://bugzilla.opensuse.org/show_bug.cgi?id=1241662
Beyond the different kubevirt versions, the physical NICs are also different.

Affected hypervisor NICs (the NetXtreme-E's are the relevant ones here):

```
21:00.0 Ethernet controller: Broadcom Inc. and subsidiaries BCM57508 NetXtreme-E 10Gb/25Gb/40Gb/50Gb/100Gb/200Gb Ethernet (rev 12)
21:00.1 Ethernet controller: Broadcom Inc. and subsidiaries BCM57508 NetXtreme-E 10Gb/25Gb/40Gb/50Gb/100Gb/200Gb Ethernet (rev 12)
63:00.0 Ethernet controller: Intel Corporation I350 Gigabit Network Connection (rev 01)
63:00.1 Ethernet controller: Intel Corporation I350 Gigabit Network Connection (rev 01)
```

Unaffected hypervisor NICs:

```
01:00.0 Ethernet controller: Broadcom Inc. and subsidiaries NetXtreme BCM5720 Gigabit Ethernet PCIe
01:00.1 Ethernet controller: Broadcom Inc. and subsidiaries NetXtreme BCM5720 Gigabit Ethernet PCIe
17:00.0 Ethernet controller: Broadcom Inc. and subsidiaries BCM57416 NetXtreme-E Dual-Media 10G RDMA Ethernet Controller (rev 01)
17:00.1 Ethernet controller: Broadcom Inc. and subsidiaries BCM57416 NetXtreme-E Dual-Media 10G RDMA Ethernet Controller (rev 01)
```
After more testing across different clusters and versions, this is not related to the kubevirt version; it is a regression specific to the Broadcom Inc. and subsidiaries BCM57508 NetXtreme-E 10Gb/25Gb/40Gb/50Gb/100Gb/200Gb Ethernet (rev 12) NIC. The Harvester/kubevirt version didn't matter. On another cluster with mixed NICs, I reproduced the problem, then migrated the VM to a different host in the same cluster with a BCM57416 10G NIC, after which it worked. I tested a few different Broadcom NICs, and the BCM57508 was the only problematic one.
Per the cross-reported OpenSUSE ticket, this appears to be related to the following upstream report: https://lore.kernel.org/lkml/1d388413ab9cfd765cd2c5e05b5e69cdb2ec5a10.camel@webked.de/