Bug 2224236 - [RHOSP17.1] Virtual Machine With iavf Driver Flaps (up -> down -> up -> down)
Keywords:
Status: ASSIGNED
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-neutron
Version: 17.1 (Wallaby)
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: z2
Target Release: ---
Assignee: Robin Jarry
QA Contact: Eran Kuris
URL:
Whiteboard:
Depends On: 2228156
Blocks:
Reported: 2023-07-20 08:49 UTC by Vadim Khitrin
Modified: 2023-08-11 13:59 UTC

Fixed In Version:
Doc Type: Known Issue
Doc Text:
In this release of RHOSP, there is a known issue where SR-IOV interfaces that use Intel X710 and E810 series controller virtual functions (VFs) with the iavf driver can experience network connectivity issues that involve link status flapping. The affected guest kernel versions are:
* RHEL 8.7.0 -> 8.7.3 (No fixes planned. End of life.)
* RHEL 8.8.0 -> 8.8.2 (Fix planned in version 8.8.3.)
* RHEL 9.2.0 -> 9.2.2 (Fix planned in version 9.2.3.)
* Upstream Linux 4.9.0 -> 6.4.* (Fix planned in version 6.5.)
Workaround: There is none, other than to use a non-affected guest kernel.
Clone Of:
Environment:
Last Closed:
Target Upstream Version:
Embargoed:


Links
| System | ID | Private | Priority | Status | Summary | Last Updated |
|---|---|---|---|---|---|---|
| Red Hat Bugzilla | 2097656 | 1 | medium | ASSIGNED | [iavf]WARNING: CPU: 12 PID: 0 at net/sched/sch_generic.c:472 dev_watchdog+0x272/0x280 | 2023-08-03 08:02:05 UTC |
| Red Hat Issue Tracker | OSP-26766 | 0 | None | None | None | 2023-07-20 08:49:50 UTC |

Description Vadim Khitrin 2023-07-20 08:49:02 UTC
Description of problem:
NOTE: Filing this under the `qemu-kvm-rhev` component since we don't have a clear RCA. For now, this should act as a tracker bug for OSP.

When spawning a virtual machine with an SR-IOV VF interface (iavf driver inside the virtual machine) backed by an Intel X710 NIC (i40e driver on the hypervisor), we observe the link flapping (up -> down -> up -> down), and the interface is not able to send traffic.
The link flaps approximately every 6 seconds.

On the compute node, we can see this message in `dmesg`:
```
[ 2208.980821] i40e 0000:63:00.2: VF 3 in reset. Try again.
[ 2209.047800] i40e 0000:63:00.2: VF 3 in reset. Try again.
```
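To see which VFs are affected on a busy compute node, the host-side messages can be tallied per physical function and VF number. The following is a minimal sketch, assuming the `dmesg` line format matches the two lines quoted above; the function name `count_vf_resets` and the regular expression are illustrative and not part of any existing tool.

```python
import re
from collections import Counter

# Matches the host-side i40e message format quoted in this report (assumed stable).
VF_RESET_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+i40e\s+(?P<pf>[0-9a-f:.]+):\s+VF\s+(?P<vf>\d+)\s+in reset"
)


def count_vf_resets(dmesg_lines):
    """Count 'VF N in reset' messages per (PF PCI address, VF number)."""
    counts = Counter()
    for line in dmesg_lines:
        match = VF_RESET_RE.search(line)
        if match:
            counts[(match.group("pf"), int(match.group("vf")))] += 1
    return counts


host_sample = [
    "[ 2208.980821] i40e 0000:63:00.2: VF 3 in reset. Try again.",
    "[ 2209.047800] i40e 0000:63:00.2: VF 3 in reset. Try again.",
]
for (pf, vf), n in count_vf_resets(host_sample).items():
    print(f"PF {pf} VF {vf}: {n} reset message(s)")
```

In practice, the input would be the lines of `dmesg` output captured on the compute node.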

On the virtual machine, we can see these messages in `dmesg`:
```
[  456.791744] iavf 0000:05:00.0 eth1: NIC Link is Up Speed is 10 Gbps Full Duplex
[  462.411964] iavf 0000:05:00.0 eth1: NIC Link is Up Speed is 10 Gbps Full Duplex
[  468.557711] iavf 0000:05:00.0 eth1: NIC Link is Up Speed is 10 Gbps Full Duplex
[  474.709731] iavf 0000:05:00.0 eth1: NIC Link is Up Speed is 10 Gbps Full Duplex
[  480.325747] iavf 0000:05:00.0 eth1: NIC Link is Up Speed is 10 Gbps Full Duplex
```
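The flap period can be read from the timestamps of consecutive `NIC Link is Up` messages. The following is a minimal sketch, assuming the guest `dmesg` format shown above; the helper names and the `max_gap` and `min_events` thresholds are hypothetical choices for illustration, not values taken from the driver.

```python
import re

# Matches the guest-side iavf link-up message format quoted in this report.
LINK_UP_RE = re.compile(
    r"\[\s*(\d+\.\d+)\]\s+iavf\s+\S+\s+(\S+):\s+NIC Link is Up"
)


def link_up_intervals(dmesg_lines, iface):
    """Return the seconds elapsed between consecutive link-up messages for iface."""
    stamps = []
    for line in dmesg_lines:
        match = LINK_UP_RE.search(line)
        if match and match.group(2) == iface:
            stamps.append(float(match.group(1)))
    return [later - earlier for earlier, later in zip(stamps, stamps[1:])]


def looks_like_flapping(intervals, max_gap=10.0, min_events=3):
    """Heuristic (assumed thresholds): several link-up events in quick succession."""
    return len(intervals) >= min_events and all(gap <= max_gap for gap in intervals)


guest_sample = [
    "[  456.791744] iavf 0000:05:00.0 eth1: NIC Link is Up Speed is 10 Gbps Full Duplex",
    "[  462.411964] iavf 0000:05:00.0 eth1: NIC Link is Up Speed is 10 Gbps Full Duplex",
    "[  468.557711] iavf 0000:05:00.0 eth1: NIC Link is Up Speed is 10 Gbps Full Duplex",
    "[  474.709731] iavf 0000:05:00.0 eth1: NIC Link is Up Speed is 10 Gbps Full Duplex",
    "[  480.325747] iavf 0000:05:00.0 eth1: NIC Link is Up Speed is 10 Gbps Full Duplex",
]
gaps = link_up_intervals(guest_sample, "eth1")
print([f"{gap:.2f}" for gap in gaps], looks_like_flapping(gaps))
```

For the log lines above, the gaps fall between roughly 5.6 and 6.2 seconds, consistent with the observed flap period.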

Rebooting the virtual machine usually solves the issue.

Version-Release number of selected component (if applicable):
RHOS-17.1-RHEL-9-20230712.n.1
Kernel: 5.14.0-284.23.1.el9_2.x86_64 

How reproducible:
Frequently.

Steps to Reproduce:
1. Deploy OSP 17.1 with SR-IOV-enabled Intel X710 interfaces
2. Spawn virtual machines
3. Check if there is connectivity on the SR-IOV VF interface from the Intel X710 interface

Actual results:
Sometimes, there is no connectivity for the attached SR-IOV VF interface.

Expected results:
VM boots up with connectivity on the SR-IOV VF interface.

Additional info:
* Did not observe similar behavior with other drivers. For example, with the Mellanox ConnectX-6 interface, the `mlx5_core` driver is used inside the virtual machine, and it is stable.
* Reproduced this issue on both Intel CPU and AMD CPU deployments.
* Reproduced this issue on both an Intel X710 NIC and an OEM (Dell) Intel X710 NIC.
* Upgraded to a newer firmware, `22.0.9`; the issue still occurs.

