Bug 1901064 - Commit b144f013fc16a06d7a4b9a4be668a3583fafeda2 'i40e: don't report link up for a VF who hasn't enabled queues' introducing issues with VM using DPDK
Summary: Commit b144f013fc16a06d7a4b9a4be668a3583fafeda2 'i40e: don't report link up f...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: kernel
Version: 7.9
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Stefan Assmann
QA Contact: Hekai Wang
URL:
Whiteboard:
Depends On:
Blocks: 1916612 1926855
 
Reported: 2020-11-24 12:12 UTC by Andreas Karis
Modified: 2024-10-01 17:07 UTC
CC List: 30 users

Fixed In Version: kernel-3.10.0-1160.21.1.el7
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 1926855 (view as bug list)
Environment:
Last Closed: 2021-03-16 13:52:38 UTC
Target Upstream Version:
Embargoed:
yuma: needinfo+


Attachments


Links:
Red Hat Knowledge Base (Solution) 6177842 - Last Updated: 2021-07-09 17:57:10 UTC

Description Andreas Karis 2020-11-24 12:12:20 UTC
Description of problem:

## Where we currently stand ##

Issue as reported from the customer: 

Compute node with SR-IOV enabled.
VF communication stays down after a Cisco leaf switch reload or switch port bounce, even though the PF comes back UP.

The only valid scenario that we could consistently confirm is:

* spawn a VM on a compute node with an i40e VF / iavf SR-IOV interface
* identify that VF, i.e., its PCI bus address on the hypervisor (see the sketch after this list)
* run:
~~~
logger "shutting link down now for iavf test"
~~~
* shut down the link from the switch 
---> 1. Switch port connected to enp96s0f0 was brought down.
---> 2. PF went down, and the VFs went down too (link-state auto); no communication from VNF to switch

* run:
~~~
logger "enabling link now for iavf test"
~~~
* unshut the link from the switch
---> 3. Switch port from #1 brought up.
---> 4. PF came up (link state Up), but the VF remained down (link-state auto); no communication from VNF to switch
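
For reference, a minimal sketch of how to map a VF number to its PCI bus address on the hypervisor (enp96s0f0 and VF 28 are taken from the outputs later in this bug):
~~~
# List the PF's VFs and note the VF number that carries the instance's MAC:
ip link show enp96s0f0
# Resolve VF number 28 to its PCI address via the PF's virtfn symlinks:
readlink /sys/class/net/enp96s0f0/device/virtfn28
# expected to point at something like ../0000:60:05.4
~~~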

The issue can be easily reproduced by flapping/bouncing the switch port.
It has been tested multiple times: it works with kernel 1062 and does not work with 1127 & 1160.
The issue can only be reproduced with the customer's instance which uses DPDK inside the virtual machine.
The customer's instance uses:  DPDK.ORG VERSION       : DPDK 19.08.0
The issue cannot be reproduced with RHEL 7 or RHEL 8 as the instance operating system.

Findings from Red Hat:

The instance itself seems to use DPDK as the Virtual Function (VF) driver.
The issue cannot be reproduced with RHEL instances, indicating a mismatch / incompatibility between the DPDK PMD used inside the instance and the kernel's Physical Function (PF) driver.

Comment 2 Andreas Karis 2020-11-24 12:15:04 UTC
We could nail this down further:

1094 = Works good after the switch port bounce, just like 1062.
1095 = Doesn’t work after the switch port bounce, just like 1127.

We then ran a git bisect between the 2 kernel versions and found that the following kernel works good:

3.10.0-1095.el7.02774578_revert_b144f013fc16.x86_64 >>>Good

Created with:
~~~
git checkout kernel-3.10.0-1095.el7
git revert b144f013fc16
~~~
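
For completeness, the bisect between those two builds followed the usual pattern (a sketch, assuming a kernel git tree carrying the kernel-3.10.0-*.el7 tags used above):
~~~
git bisect start
git bisect bad  kernel-3.10.0-1095.el7   # first build showing the problem
git bisect good kernel-3.10.0-1094.el7   # last build that works
# build and boot each candidate the tool checks out, then mark it with
# "git bisect good" or "git bisect bad" until b144f013fc16 is reported
# as the first bad commit, then:
git bisect reset
~~~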

Here's the content of that patch:
~~~
[root@rhel-82 rhel7]# git show b144f013fc16 | cat
commit b144f013fc16a06d7a4b9a4be668a3583fafeda2
Author: Stefan Assmann <sassmann>
Date:   Wed Sep 4 10:49:53 2019 -0400

    [netdrv] i40e: don't report link up for a VF who hasn't enabled queues
    
    Message-id: <20190904105010.19041-66-sassmann>
    Patchwork-id: 270894
    O-Subject: [RHEL7.8 PATCH 65/82] i40e: don't report link up for a VF who hasn't enabled queues
    Bugzilla: 1720236
    RH-Acked-by: Jarod Wilson <jarod>
    RH-Acked-by: John Linville <linville>
    RH-Acked-by: Corinna Vinschen <vinschen>
    RH-Acked-by: Tony Camuso <tcamuso>
    
    From: Jacob Keller <jacob.e.keller>
    
    Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1720236
    Brew: https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=23319550
    
    Commit d3d657a90850 ("i40e: update VFs of link state after
    GET_VF_RESOURCES") modified the PF driver to notify a VF of
    its link status immediately after it requests resources.
    
    This was intended to fix reporting on VF drivers, so that they would
    properly report link status.
    
    However, some older VF drivers do not respond well to receiving a link
    up notification before queues are enabled. This can cause their state
    machine to think that it is safe to send traffic. This results in a Tx
    hang on the VF.
    
    More recent versions of the old i40evf and all versions of iavf are
    resilient to these early link status messages. However, if a VM happens
    to run an older version of the VF driver, this can be problematic.
    
    Record whether the PF has actually enabled queues for the VF. When
    reporting link status, always report link down if the queues aren't
    enabled. In this way, the VF driver will never receive a link up
    notification until after its queues are enabled.
    
    Signed-off-by: Jacob Keller <jacob.e.keller>
    Tested-by: Andrew Bowers <andrewx.bowers>
    Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher>
    
    Signed-off-by: Stefan Assmann <sassmann>
    (cherry picked from commit 2ad1274fa35ace5c6360762ba48d33b63da2396c)
    Signed-off-by: Jan Stancek <jstancek>

diff --git a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
index 02b09a8ad54c..12f04f36e357 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
@@ -55,7 +55,12 @@ static void i40e_vc_notify_vf_link_state(struct i40e_vf *vf)
 
 	pfe.event = VIRTCHNL_EVENT_LINK_CHANGE;
 	pfe.severity = PF_EVENT_SEVERITY_INFO;
-	if (vf->link_forced) {
+
+	/* Always report link is down if the VF queues aren't enabled */
+	if (!vf->queues_enabled) {
+		pfe.event_data.link_event.link_status = false;
+		pfe.event_data.link_event.link_speed = 0;
+	} else if (vf->link_forced) {
 		pfe.event_data.link_event.link_status = vf->link_up;
 		pfe.event_data.link_event.link_speed =
 			(vf->link_up ? VIRTCHNL_LINK_SPEED_40GB : 0);
@@ -65,6 +70,7 @@ static void i40e_vc_notify_vf_link_state(struct i40e_vf *vf)
 		pfe.event_data.link_event.link_speed =
 			i40e_virtchnl_link_speed(ls->link_speed);
 	}
+
 	i40e_aq_send_msg_to_vf(hw, abs_vf_id, VIRTCHNL_OP_EVENT,
 			       0, (u8 *)&pfe, sizeof(pfe), NULL);
 }
@@ -2364,6 +2370,8 @@ static int i40e_vc_enable_queues_msg(struct i40e_vf *vf, u8 *msg)
 		}
 	}
 
+	vf->queues_enabled = true;
+
 error_param:
 	/* send the response to the VF */
 	return i40e_vc_send_resp_to_vf(vf, VIRTCHNL_OP_ENABLE_QUEUES,
@@ -2385,6 +2393,9 @@ static int i40e_vc_disable_queues_msg(struct i40e_vf *vf, u8 *msg)
 	struct i40e_pf *pf = vf->pf;
 	i40e_status aq_ret = 0;
 
+	/* Immediately mark queues as disabled */
+	vf->queues_enabled = false;
+
 	if (!test_bit(I40E_VF_STATE_ACTIVE, &vf->vf_states)) {
 		aq_ret = I40E_ERR_PARAM;
 		goto error_param;
diff --git a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.h b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.h
index f65cc0c16550..7164b9bb294f 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.h
+++ b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.h
@@ -99,6 +99,7 @@ struct i40e_vf {
 	unsigned int tx_rate;	/* Tx bandwidth limit in Mbps */
 	bool link_forced;
 	bool link_up;		/* only valid if VF link is forced */
+	bool queues_enabled;	/* true if the VF queues are enabled */
 	bool spoofchk;
 	u16 num_mac;
 	u16 num_vlan;
~~~

Comment 3 Andreas Karis 2020-11-24 12:17:33 UTC
I have a feeling that the problem is that the customer's DPDK application is not properly re-initializing its queues after receiving the link up. Does that theory make sense, and is there anything more that we could gather from the hypervisor to prove it (e.g., is there a way to observe when the virtual machine initializes the VF's queues)?

Thanks,

Andreas

Comment 4 Stefan Assmann 2020-11-24 14:14:13 UTC
Hi Andreas,

I understand the symptoms of the issue, but I need sosreports from host and guest (taken after the issue occurred) for analysis.
From what you describe I suspect an issue with the VF driver inside the guest. You also mention DPDK and I'm not sure how exactly DPDK is used here. If possible please give me some details about that as well.
Thanks!

Comment 7 Andreas Karis 2020-11-24 14:50:26 UTC
Details about the sosreport are in the earlier private comments.

Wrt DPDK: The customer runs a 3rd-party-provided VNF which runs an application that uses the DPDK i40e or iavf PMD to manage its interfaces. In this case, at least vf 28 of interface enp96s0f0 with PCI bus address 60:05.4.

>> From the VM side, we don't see the port/VF link being down; only the traffic fails to come in or go out. It is detected by the BFD configured at the app layer, hence dmesg did not show anything.

But that makes sense anyway, as the interface is not managed by the instance's kernel. Instead, it's passed through to the customer's instance. The instance OS is IIRC based on Ubuntu. 

They could not reproduce the issue with RHEL, and the issue first appeared after upgrading our kernel beyond that aforementioned commit.

Let me know if you need anything else specific; I'm more than happy to get it for you.

Comment 8 Andreas Karis 2020-11-24 14:51:57 UTC
* They could not reproduce the issue with RHEL 

Let me clarify that: they could not reproduce the issue when using a RHEL instance, with the kernel iavf driver inside the instance. We did not test with DPDK inside RHEL, of course.

Comment 11 Andreas Karis 2020-11-24 14:59:08 UTC
Also, in case this wasn't clear:

We have the hypervisor with kernel 1094, and a Ubuntu VM using DPDK.

This setup works fine.

We then upgrade the hypervisor to kernel 1095, with the same Ubuntu VM.

This setup shows issues. When testing with a RHEL VM we don't have any problems; the same test works fine, even with kernel 1095 on the hypervisor.

We then installed kernel 3.10.0-1095.el7.02774578_revert_b144f013fc16.x86_64 on the hypervisor, with the same Ubuntu VM.

This setup works fine, too.

Comment 13 Stefan Assmann 2020-11-25 14:04:23 UTC
I've checked both sosreports for i40e issues and there's nothing abnormal in the logs from an i40e perspective.
Whatever is running on the guest side needs to deal with the i40e changes in
2ad1274fa35a i40e: don't report link up for a VF who hasn't enabled queues
So as long as it cannot be reproduced with RHEL on the guest side in conjunction with the in-kernel iavf driver there's nothing I can do to help.

Comment 14 Andreas Karis 2020-11-25 15:26:39 UTC
Hi,

Thanks for looking at that. 

> Whatever is running on the guest side needs to deal with the i40e changes in
> 2ad1274fa35a i40e: don't report link up for a VF who hasn't enabled queues

I totally understand that, and my expectation was not that there's an issue with the hypervisor RHEL, but indeed one with what the instance is doing. I was just curious to know what needs to be done inside the instance to make this work. I.e., queues need to be enabled, but I was wondering if you had some further insight into that process and could give some further hints. Alternatively, would you know a contact at Intel that I could reach out to who might be of help?

Thanks,

Andreas

Comment 18 Lihong Yang 2020-11-30 20:14:54 UTC
+ M Jay from Intel (Jayakumar, Muthurajan <muthurajan.jayakumar>) to provide some input on the issue since it is only observed with DPDK involved. 
 
Hi M Jay,
would you please help look into the issue to see whether it is a known issue with DPDK involved?

Thanks,
Lihong

Comment 19 Andreas Karis 2020-12-07 09:29:57 UTC
Hi,

According to the customer's partner's engineering, they had already run into similar issues in the past. These issues were fixed by installing Intel's 2.12.6.3 i40e PF driver (or later) on the hypervisor.

So, the customer installed [0] the i40e-2.13.10.tar.gz i40e driver by Intel on the hypervisor and that fixed their issue.

Hence there must be a bug in our kernel driver, which I assume was introduced by the aforementioned commit and then fixed in a later version of the Intel PF driver. Can we investigate that?

Thanks!

Andreas

---

[0]

https://downloadcenter.intel.com/download/24411/Intel-Network-Adapter-Driver-for-PCIe-40-Gigabit-Ethernet-Network-Connections-Under-Linux-
~~~
From the README:

Building and Installation
=========================

To build a binary RPM package of this driver
--------------------------------------------
Note: RPM functionality has only been tested in Red Hat distributions.

1. Run the following command, where <x.x.x> is the version number for the
   driver tar file.

   # rpmbuild -tb i40e-<x.x.x>.tar.gz

   NOTE: For the build to work properly, the currently running kernel MUST
   match the version and configuration of the installed kernel sources. If
   you have just recompiled the kernel, reboot the system before building.

2. After building the RPM, the last few lines of the tool output contain the
   location of the RPM file that was built. Install the RPM with one of the
   following commands, where <RPM> is the location of the RPM file:

   # rpm -Uvh <RPM>
       or
   # dnf/yum localinstall <RPM>

NOTES:
- To compile the driver on some kernel/arch combinations, you may need to
install a package with the development version of libelf (e.g. libelf-dev,
libelf-devel, elfutils-libelf-devel).
- When compiling an out-of-tree driver, details will vary by distribution.
However, you will usually need a kernel-devel RPM or some RPM that provides the
kernel headers at a minimum. The RPM kernel-devel will usually fill in the link
at /lib/modules/'uname -r'/build.
~~~

In my lab, I had to run:
~~~
curl -O https://downloadmirror.intel.com/24411/eng/i40e-2.13.10.tar.gz
yum install kernel-devel-$(uname -r)
yum install gcc -y
rpmbuild -tb i40e-2.13.10.tar.gz 
yum localinstall /root/rpmbuild/RPMS/x86_64/i40e-2.13.10-1.x86_64.rpm -y
reboot
~~~
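
After the reboot, it's worth double-checking that the out-of-tree module is really the one in use (a quick sketch; the exact version string depends on the installed tarball):
~~~
# The OOT module reports its own version string instead of the in-tree one:
ethtool -i enp96s0f0 | grep -E '^(driver|version)'
modinfo i40e | grep -E '^(filename|version)'
~~~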

Comment 20 Stefan Assmann 2020-12-07 09:39:50 UTC
The Intel OOT i40e driver is solely maintained by Intel and is very different from the upstream i40e driver. Any requests regarding the OOT driver need to be directed to Intel.

Comment 21 Andreas Karis 2020-12-07 09:44:56 UTC
Hi Stefan,

O.k., but we have a bug in the downstream kernel i40e driver and need this fixed. This is not a request regarding the OOT driver; it's a request to figure out what bug we have in the downstream RHEL driver, because with the OOT i40e driver things work fine. Intel must be aware of what went wrong / what was fixed in there, and should be able to help your team single out and port the correct fix into the in-tree downstream driver (assuming that they already pushed the fix into the upstream in-tree driver?).

- Andreas

Comment 22 Andreas Karis 2020-12-07 09:49:23 UTC
Just to clarify: 

RHEL 7 kernel in tree drivers:
1094 = Works good after the switch port bounce, just like 1062.
1095 = Doesn’t work after the switch port bounce, just like 1127.

Installing the OOT driver then fixes it. That leads me to the assumption that a bug was introduced in 1095 (the commit I singled out, likely) and then fixed by a later commit in the out of tree driver:

I looked at the changelog on sourceforge:
https://sourceforge.net/projects/e1000/files/i40e%20stable/

The aforementioned change, "i40e: don't report link up for a VF who hasn't enabled queues", appears in the out-of-tree driver starting with version 2.9.21.

That's only a guess, but I'd assume the Intel out-of-tree driver had the same issue starting with 2.9.21 and that it was fixed again by 2.12.6.3, so the relevant fix would be some commit between those two versions. But that's all speculation.
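
If someone wants to chase that down, a rough way to narrow it in the OOT sources (a sketch; the tarball names are assumptions based on the versions mentioned above, and src/ is the usual layout of Intel's OOT driver tarballs):
~~~
tar -xf i40e-2.9.21.tar.gz
tar -xf i40e-2.12.6.3.tar.gz
# Compare the PF<->VF virtchnl handling between the "broken" and "fixed" versions:
diff -u i40e-2.9.21/src/i40e_virtchnl_pf.c i40e-2.12.6.3/src/i40e_virtchnl_pf.c | less
~~~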

Thanks for the prompt answer!!!

Comment 23 Stefan Assmann 2020-12-07 10:04:31 UTC
Hi Andreas,

sorry but I disagree, there's no proof that the problem is with the upstream i40e driver. The fact that the OOT i40e works with the customer's VM setup merely suggests that the OOT driver handles things differently from the upstream i40e driver.
It's a good find though and may help Intel to spot the difference. Though I still believe the real problem is with the customer's VM being unable to cope with the change we singled out.

Comment 30 Andreas Karis 2020-12-15 16:20:34 UTC
We went into the vendor's lab and ran another test today.

On the test hypervisor, we spawned the vendor's VNF and a RHEL VM, all on the same Physical Function.

We ran a test with kernel 3.10.0-1160.6.1.el7.x86_64 with probes injected via systemtap. We also tested with kernel 5.10.0-1.el7.elrepo.x86_64 from https://elrepo.org/linux/kernel/el7/x86_64/RPMS/  (to be precise https://elrepo.org/linux/kernel/el7/x86_64/RPMS/kernel-ml-5.10.0-1.el7.elrepo.x86_64.rpm) to see if there was a change from the RHEL 7 stock kernel to upstream. 
Tests were run with the private flag link-down-on-close = off; in all cases the RHEL instance detected the link flap and its network came back up. The vendor's instance did not detect the link down event on the physical layer. Instead, it relies on higher-level protocols (BFD, I think) to detect a broken connection to its orchestrator. The orchestrator then reported the instance as down, and it did not come back up even after the physical link was brought up again.
We are going to run another test session with --priv-flag link-down-on-close = on; however, the vendor told me that this was already tested and made no difference.
We also tested with link-down-on-close = on and link-down-on-close = off with the Intel out of tree driver, and the instance's network always came back up.

I have a more complete picture now and can state the following:

The instance uses DPDK's KNI (https://doc.dpdk.org/guides/prog_guide/kernel_nic_interface.html). It does not detect link flaps on the physical link layer, or at least does not report those to the kernel via KNI. My theory is that the instance's application simply ignores underlying link status changes and relies on upper-layer protocols to detect broken connectivity.
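
One way to check that theory from inside the instance would be to watch whether the KNI netdev ever loses carrier during a flap (a sketch; kni0 is the interface name from the guest output below, and with KNI the carrier state is whatever the DPDK application chooses to report to the kernel):
~~~
# Inside the vendor instance, while the switch port is bounced:
ip -d link show kni0
cat /sys/class/net/kni0/carrier    # 1 = carrier, 0 = no carrier
dmesg -w | grep -i -e link -e kni  # any link messages from the kernel side?
~~~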

In commit b144f013fc16a06d7a4b9a4be668a3583fafeda2, Intel introduced a change in the i40e PF driver's behavior (https://bugzilla.redhat.com/show_bug.cgi?id=1901064#c2). 
"However, some older VF drivers do not respond well to receiving a link
    up notification before queues are enabled. This can cause their state
    machine to think that it is safe to send traffic. This results in a Tx
    hang on the VF. (...)
Record whether the PF has actually enabled queues for the VF. When
    reporting link status, always report link down if the queues aren't
    enabled. In this way, the VF driver will never receive a link up
    notification until after its queues are enabled."

The problem is that this change simultaneously fixes other issues and breaks compatibility with the vendor's instance. It also runs counter to the desire for a stable ABI: the instance's application uses a user-space driver (DPDK iavf), and the kernel changed its behavior, breaking compatibility of this user-space application with the kernel in-tree driver (and thus with the kernel).

The most important section in the new code is:
~~~
+
+	/* Always report link is down if the VF queues aren't enabled */
+	if (!vf->queues_enabled) {
+		pfe.event_data.link_event.link_status = false;
+		pfe.event_data.link_event.link_speed = 0;
+	} else if (vf->link_forced) {
~~~

The instance lands in this new logic because we explicitly disable the VF's queues when the PF's line goes down:
~~~
+	/* Immediately mark queues as disabled */
+	vf->queues_enabled = false;
+
~~~

Again, that's a change of behavior of the kernel. I *do* think that the instance's application is not catching a link down/link up event which it should catch. But before this change, it could get away with it.
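
To illustrate the re-enable step the guest side would have to perform, here is a rough sketch with DPDK's testpmd (an illustration only, assuming a testpmd binary built from the same DPDK release; it is not the vendor's actual application):
~~~
# Inside a DPDK guest, run testpmd in interactive mode against the VF,
# then after the PF's physical link has come back up:
testpmd> show port info 0   # the PF keeps reporting link down while the VF queues are disabled
testpmd> port stop 0
testpmd> port start 0       # re-sends VIRTCHNL_OP_ENABLE_QUEUES -> vf->queues_enabled = true
testpmd> start
~~~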

Comment 31 Andreas Karis 2020-12-15 16:37:40 UTC
For the test on the latest RHEL 7.9 kernel 3.10.0-1160.6.1.el7.x86_64, I created a stap script to catch events in critical sections of the code.

Dependencies for stap:
~~~
yum localinstall kernel-3.10.0-1160.6.1.el7.x86_64.rpm  kernel-debuginfo-3.10.0-1160.6.1.el7.x86_64.rpm  kernel-debuginfo-common-x86_64-3.10.0-1160.6.1.el7.x86_64.rpm  kernel-devel-3.10.0-1160.6.1.el7.x86_64.rpm -y
~~~

The stap script illustrates quite well what's going on, though I wish I had added a little more verbosity. Either way, here it is:
~~~
function timestamp:long() { return gettimeofday_us() }

function customprint(header, msg) {
	printf("%d %s |%s| %s\n", timestamp(), ctime(), header, msg)
}

//function printvf(header, vf) {
//	printf("=========\n")
//	printf("%d |%s| $vf->vf_id: %d\n", timestamp(), header, @cast(vf, "struct i40e_vf", "i40e")->vf_id)
//	printf("%d |%s| $vf->trusted: %d\n", timestamp(), header, @cast(vf, "struct i40e_vf", "i40e")->trusted)
//	printf("%d |%s| $vf->link_forced: %d\n", timestamp(), header, @cast(vf, "struct i40e_vf", "i40e")->link_forced)
//	printf("%d |%s| $vf->link_up: %d\n", timestamp(), header, @cast(vf, "struct i40e_vf", "i40e")->link_up)
//	printf("%d |%s| $vf->queues_enabled: %d\n", timestamp(), header, @cast(vf, "struct i40e_vf", "i40e")->queues_enabled)
//	printf("=========\n")
//}

function pci_slot(devfn) {
	return (((devfn) >> 3) & 0x1f)
}

function pci_func(devfn) {
	return ((devfn) & 0x07)
}

function printvfarrayelement(header, vfarr, i) {
	printf("=========\n")
	printf("%d %s |%s| $vfarr[i]->pf->pdev: bus %s, slot %d, func %d\n", 
		timestamp(), 
		ctime(),
		header, 
		kernel_string_n(@cast(vfarr, "struct i40e_vf", "i40e")[i]->pf->pdev->bus->name, 48),
		pci_slot(@cast(vfarr, "struct i40e_vf", "i40e")[i]->pf->pdev->devfn), 
		pci_func(@cast(vfarr, "struct i40e_vf", "i40e")[i]->pf->pdev->devfn)
	)
	printf("%d %s |%s| $vfarr[i]->vf_id: %d\n", timestamp(), ctime(), header, @cast(vfarr, "struct i40e_vf", "i40e")[i]->vf_id)
	printf("%d %s |%s| $vfarr[i]->trusted: %d\n", timestamp(), ctime(), header, @cast(vfarr, "struct i40e_vf", "i40e")[i]->trusted)
	printf("%d %s |%s| $vfarr[i]->link_forced: %d\n", timestamp(), ctime(), header, @cast(vfarr, "struct i40e_vf", "i40e")[i]->link_forced)
	printf("%d %s |%s| $vfarr[i]->link_up: %d\n", timestamp(), ctime(), header, @cast(vfarr, "struct i40e_vf", "i40e")[i]->link_up)
	printf("%d %s |%s| $vfarr[i]->queues_enabled: %d\n", timestamp(), ctime(), header, @cast(vfarr, "struct i40e_vf", "i40e")[i]->queues_enabled)
	printf("=========\n")
}

function translate_opcode(opcode) {
	if (opcode == 0) 
		return "VIRTCHNL_OP_UNKNOWN (0)"
	if (opcode == 1) 
		return "VIRTCHNL_OP_VERSION (1)"
	if (opcode == 2) 
		return "VIRTCHNL_OP_RESET_VF (2)"
	if (opcode == 3) 
		return "VIRTCHNL_OP_GET_VF_RESOURCES (3)"
	if (opcode == 4) 
		return "VIRTCHNL_OP_CONFIG_TX_QUEUE (4)"
	if (opcode == 5) 
		return "VIRTCHNL_OP_CONFIG_RX_QUEUE (5)"
	if (opcode == 6) 
		return "VIRTCHNL_OP_CONFIG_VSI_QUEUES (6)"
	if (opcode == 7) 
		return "VIRTCHNL_OP_CONFIG_IRQ_MAP (7)"
	if (opcode == 8) 
		return "VIRTCHNL_OP_ENABLE_QUEUES (8)"
	if (opcode == 9) 
		return "VIRTCHNL_OP_DISABLE_QUEUES (9)"
	if (opcode == 10) 
		return "VIRTCHNL_OP_ADD_ETH_ADDR (10)"
	if (opcode == 11) 
		return "VIRTCHNL_OP_DEL_ETH_ADDR (11)"
	if (opcode == 12) 
		return "VIRTCHNL_OP_ADD_VLAN (12)"
	if (opcode == 13) 
		return "VIRTCHNL_OP_DEL_VLAN (13)"
	if (opcode == 14) 
		return "VIRTCHNL_OP_CONFIG_PROMISCUOUS_MODE (14)"
	if (opcode == 15) 
		return "VIRTCHNL_OP_GET_STATS (15)"
	if (opcode == 16) 
		return "VIRTCHNL_OP_RSVD (16)"
	if (opcode == 17) 
		return "VIRTCHNL_OP_EVENT (17)"
	if (opcode == 20) 
		return "VIRTCHNL_OP_IWARP (20)"
	if (opcode == 21) 
		return "VIRTCHNL_OP_CONFIG_IWARP_IRQ_MAP (21)"
	if (opcode == 22) 
		return "VIRTCHNL_OP_RELEASE_IWARP_IRQ_MAP (22)"
	if (opcode == 23) 
		return "VIRTCHNL_OP_CONFIG_RSS_KEY (23)"
	if (opcode == 24) 
		return "VIRTCHNL_OP_CONFIG_RSS_LUT (24)"
	if (opcode == 25) 
		return "VIRTCHNL_OP_GET_RSS_HENA_CAPS (25)"
	if (opcode == 26) 
		return "VIRTCHNL_OP_SET_RSS_HENA (26)"
	if (opcode == 27) 
		return "VIRTCHNL_OP_ENABLE_VLAN_STRIPPING (27)"
	if (opcode == 28) 
		return "VIRTCHNL_OP_DISABLE_VLAN_STRIPPING (28)"
	if (opcode == 29) 
		return "VIRTCHNL_OP_REQUEST_QUEUES (29)"
	if (opcode == 30) 
		return "VIRTCHNL_OP_ENABLE_CHANNELS (30)"
	if (opcode == 31) 
		return "VIRTCHNL_OP_DISABLE_CHANNELS (31)"
	if (opcode == 32) 
		return "VIRTCHNL_OP_ADD_CLOUD_FILTER (32)"
	if (opcode == 33) 
		return "VIRTCHNL_OP_DEL_CLOUD_FILTER (33)"
	return "unknown opcode"
}

probe begin
{
    log("begin probe")
}

//probe module("i40e").function("i40e_vc_notify_vf_link_state")
probe module("i40e").statement("*@i40e_virtchnl_pf.c:57")
{
    // we cannot probe this here
    //printvf("i40e_virtchnl_pf.c:57", $vf)
    // instead, let's jump through a few extra hoops to get exactly the same
    vfid = $abs_vf_id - $hw->func_caps->vf_base_id
    printvfarrayelement("i40e_virtchnl_pf.c:57", $pf->vf, vfid)
}
probe module("i40e").statement("*@i40e_virtchnl_pf.c:61")
{
	customprint("i40e_virtchnl_pf.c:61", "--> Inside new logic, VF DOWN")
}
probe module("i40e").statement("*@i40e_virtchnl_pf.c:64")
{
	customprint("i40e_virtchnl_pf.c:64", "--> Inside old logic, vf->link_up FORCED")
}
probe module("i40e").statement("*@i40e_virtchnl_pf.c:69")
{
	customprint("i40e_virtchnl_pf.c:69", "--> Inside old logic, pfe.event_data.link_event.link_status = ...")
}

probe module("i40e").statement("*@i40e_virtchnl_pf.c:2336")
{
        customprint("i40e_virtchnl_pf.c:2336", "The VF called method to enable queues")
}
probe module("i40e").statement("*@i40e_virtchnl_pf.c:2376")
{
	customprint("i40e_virtchnl_pf.c:2376", "new logic - vf->queues_enabled = true")
}
probe module("i40e").statement("*@i40e_virtchnl_pf.c:2376")
{
	customprint("i40e_virtchnl_pf.c:2376", "new logic - vf->queues_enabled = true")
}
probe module("i40e").statement("*@i40e_virtchnl_pf.c:2400")
{
	customprint("i40e_virtchnl_pf.c:2400", "new logic - vf->queues_enabled = false")
}

probe module("i40e").statement("*@i40e_virtchnl_pf.c:3766") {
    printf("%d |i40e_vc_process_vf_msg:3766| $v_opcode: %s\n", timestamp(), translate_opcode($v_opcode))
}

probe end
{
    log("end probe")
}
~~~

I then executed this as root:
~~~
stap i40e.stap
~~~

#####################################

Link shutdown test 

#####################################

We are focusing on this PF here. The vendor shut down the physical link from the switch ("shutdown") and the PF gets a NO-CARRIER:
~~~
# ip link ls dev enp96s0f0
10: enp96s0f0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 9000 qdisc mq state DOWN mode DEFAULT group default qlen 10000
    link/ether 3c:fd:fe:..:..:.. brd ff:ff:ff:ff:ff:ff
    vf 0 MAC 00:00:00:00:00:00, spoof checking on, link-state auto, trust on
    vf 1 MAC 00:00:00:00:00:00, spoof checking on, link-state auto, trust on
    vf 2 MAC 00:00:00:00:00:00, spoof checking on, link-state auto, trust on
    vf 3 MAC 00:00:00:00:00:00, spoof checking on, link-state auto, trust on
    vf 4 MAC 00:00:00:00:00:00, spoof checking on, link-state auto, trust on
    vf 5 MAC 00:00:00:00:00:00, spoof checking on, link-state auto, trust on
    vf 6 MAC 00:00:00:00:00:00, spoof checking on, link-state auto, trust on
    vf 7 MAC 00:00:00:00:00:00, spoof checking on, link-state auto, trust on
    vf 8 MAC 00:00:00:00:00:00, spoof checking on, link-state auto, trust on
    vf 9 MAC 00:00:00:00:00:00, spoof checking on, link-state auto, trust on
    vf 10 MAC 00:00:00:00:00:00, spoof checking on, link-state auto, trust on
    vf 11 MAC 00:00:00:00:00:00, spoof checking on, link-state auto, trust on
    vf 12 MAC 00:00:00:00:00:00, spoof checking on, link-state auto, trust on
    vf 13 MAC 00:00:00:00:00:00, spoof checking on, link-state auto, trust on
    vf 14 MAC 00:00:00:00:00:00, spoof checking on, link-state auto, trust on
    vf 15 MAC 00:00:00:00:00:00, spoof checking on, link-state auto, trust on
    vf 16 MAC fa:16:3e:1f:c5:6a, vlan 1207, spoof checking off, link-state auto, trust on  # <--- one of the vendor instances
    vf 17 MAC 00:00:00:00:00:00, spoof checking on, link-state auto, trust on
    vf 18 MAC 00:00:00:00:00:00, spoof checking on, link-state auto, trust on
    vf 19 MAC 00:00:00:00:00:00, spoof checking on, link-state auto, trust on
    vf 20 MAC 00:00:00:00:00:00, spoof checking on, link-state auto, trust on
    vf 21 MAC 00:00:00:00:00:00, spoof checking on, link-state auto, trust on
    vf 22 MAC 00:00:00:00:00:00, spoof checking on, link-state auto, trust on
    vf 23 MAC 00:00:00:00:00:00, spoof checking on, link-state auto, trust on
    vf 24 MAC 00:00:00:00:00:00, spoof checking on, link-state auto, trust on
    vf 25 MAC 00:00:00:00:00:00, spoof checking on, link-state auto, trust on
    vf 26 MAC 00:00:00:00:00:00, spoof checking on, link-state auto, trust on
    vf 27 MAC 00:00:00:00:00:00, spoof checking on, link-state auto, trust on
    vf 28 MAC fa:16:3e:2f:2b:57, vlan 1207, spoof checking off, link-state auto, trust on  # <--- one of the vendor instances
    vf 29 MAC fa:16:3e:32:c5:e0, vlan 1207, spoof checking off, link-state auto, trust on  # <--- RHEL instance
~~~

We are simultaneously running a ping from a RHEL instance on another hypervisor to our test RHEL instance on VF 29. On link shutdown, we logically lose connectivity:
~~~
rhel-sriov-2 ~]# ping 165.251.40.136
PING 165.251.40.136 (165.251.40.136) 56(84) bytes of data.
64 bytes from 165.251.40.136: icmp_seq=1 ttl=64 time=0.103 ms
64 bytes from 165.251.40.136: icmp_seq=2 ttl=64 time=0.077 ms
(...)
64 bytes from 165.251.40.136: icmp_seq=129 ttl=64 time=0.081 ms
64 bytes from 165.251.40.136: icmp_seq=130 ttl=64 time=0.078 ms
64 bytes from 165.251.40.136: icmp_seq=131 ttl=64 time=0.081 ms
64 bytes from 165.251.40.136: icmp_seq=132 ttl=64 time=0.078 ms
64 bytes from 165.251.40.136: icmp_seq=133 ttl=64 time=0.075 ms
From 165.251.40.182 icmp_seq=151 Destination Host Unreachable
From 165.251.40.182 icmp_seq=152 Destination Host Unreachable
From 165.251.40.182 icmp_seq=153 Destination Host Unreachable
From 165.251.40.182 icmp_seq=154 Destination Host Unreachable
From 165.251.40.182 icmp_seq=155 Destination Host Unreachable
(...)
~~~

And our RHEL test instance catches the link down event:
~~~
[root@rhel-sriov ~]# [ 1472.699940] iavf 0000:00:05.0 eth0: NIC Link is Down
~~~

At the same time, the vendor's orchestrator detects that their 2 instances on this hypervisor are down and properly reports this. The instance itself, however, does not physically detect the link down:
~~~
17: gnp-40-2-1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP mode DEFAULT group default qlen 1000
    link/ether fa:16:3e:2f:2b:57 brd ff:ff:ff:ff:ff:ff
122: kni0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP mode DEFAULT group default qlen 1000
    link/ether fa:16:3e:2f:2b:57 brd ff:ff:ff:ff:ff:ff
~~~

In comes the additional debug output from the stap script:
~~~
=========
1608033193261151 Tue Dec 15 11:53:13 2020 |i40e_virtchnl_pf.c:57| $vfarr[i]->pf->pdev: bus PCI Bus 0000:60, slot 0, func 0
1608033193261158 Tue Dec 15 11:53:13 2020 |i40e_virtchnl_pf.c:57| $vfarr[i]->vf_id: 0
1608033193261160 Tue Dec 15 11:53:13 2020 |i40e_virtchnl_pf.c:57| $vfarr[i]->trusted: 1
1608033193261162 Tue Dec 15 11:53:13 2020 |i40e_virtchnl_pf.c:57| $vfarr[i]->link_forced: 0
1608033193261164 Tue Dec 15 11:53:13 2020 |i40e_virtchnl_pf.c:57| $vfarr[i]->link_up: 0
1608033193261166 Tue Dec 15 11:53:13 2020 |i40e_virtchnl_pf.c:57| $vfarr[i]->queues_enabled: 1
=========
1608033193261172 Tue Dec 15 11:53:13 2020 |i40e_virtchnl_pf.c:69| --> Inside old logic, pfe.event_data.link_event.link_status = ...
=========
1608033193261230 Tue Dec 15 11:53:13 2020 |i40e_virtchnl_pf.c:57| $vfarr[i]->pf->pdev: bus PCI Bus 0000:60, slot 0, func 0
1608033193261233 Tue Dec 15 11:53:13 2020 |i40e_virtchnl_pf.c:57| $vfarr[i]->vf_id: 1
1608033193261235 Tue Dec 15 11:53:13 2020 |i40e_virtchnl_pf.c:57| $vfarr[i]->trusted: 1
1608033193261237 Tue Dec 15 11:53:13 2020 |i40e_virtchnl_pf.c:57| $vfarr[i]->link_forced: 0
1608033193261239 Tue Dec 15 11:53:13 2020 |i40e_virtchnl_pf.c:57| $vfarr[i]->link_up: 0
1608033193261240 Tue Dec 15 11:53:13 2020 |i40e_virtchnl_pf.c:57| $vfarr[i]->queues_enabled: 1
=========
1608033193261243 Tue Dec 15 11:53:13 2020 |i40e_virtchnl_pf.c:69| --> Inside old logic, pfe.event_data.link_event.link_status = ...
=========
(...)
1608033193262188 Tue Dec 15 11:53:13 2020 |i40e_virtchnl_pf.c:57| $vfarr[i]->pf->pdev: bus PCI Bus 0000:60, slot 0, func 0
1608033193262190 Tue Dec 15 11:53:13 2020 |i40e_virtchnl_pf.c:57| $vfarr[i]->vf_id: 15
1608033193262192 Tue Dec 15 11:53:13 2020 |i40e_virtchnl_pf.c:57| $vfarr[i]->trusted: 1
1608033193262193 Tue Dec 15 11:53:13 2020 |i40e_virtchnl_pf.c:57| $vfarr[i]->link_forced: 0
1608033193262195 Tue Dec 15 11:53:13 2020 |i40e_virtchnl_pf.c:57| $vfarr[i]->link_up: 0
1608033193262197 Tue Dec 15 11:53:13 2020 |i40e_virtchnl_pf.c:57| $vfarr[i]->queues_enabled: 1
=========
1608033193262199 Tue Dec 15 11:53:13 2020 |i40e_virtchnl_pf.c:69| --> Inside old logic, pfe.event_data.link_event.link_status = ...
=========
1608033193262255 Tue Dec 15 11:53:13 2020 |i40e_virtchnl_pf.c:57| $vfarr[i]->pf->pdev: bus PCI Bus 0000:60, slot 0, func 0
1608033193262258 Tue Dec 15 11:53:13 2020 |i40e_virtchnl_pf.c:57| $vfarr[i]->vf_id: 16
1608033193262260 Tue Dec 15 11:53:13 2020 |i40e_virtchnl_pf.c:57| $vfarr[i]->trusted: 1
1608033193262262 Tue Dec 15 11:53:13 2020 |i40e_virtchnl_pf.c:57| $vfarr[i]->link_forced: 0
1608033193262264 Tue Dec 15 11:53:13 2020 |i40e_virtchnl_pf.c:57| $vfarr[i]->link_up: 0
1608033193262265 Tue Dec 15 11:53:13 2020 |i40e_virtchnl_pf.c:57| $vfarr[i]->queues_enabled: 1
=========
1608033193262268 Tue Dec 15 11:53:13 2020 |i40e_virtchnl_pf.c:69| --> Inside old logic, pfe.event_data.link_event.link_status = ...
=========
1608033193263069 Tue Dec 15 11:53:13 2020 |i40e_virtchnl_pf.c:57| $vfarr[i]->pf->pdev: bus PCI Bus 0000:60, slot 0, func 0
1608033193263072 Tue Dec 15 11:53:13 2020 |i40e_virtchnl_pf.c:57| $vfarr[i]->vf_id: 28
1608033193263073 Tue Dec 15 11:53:13 2020 |i40e_virtchnl_pf.c:57| $vfarr[i]->trusted: 1
1608033193263075 Tue Dec 15 11:53:13 2020 |i40e_virtchnl_pf.c:57| $vfarr[i]->link_forced: 0
1608033193263077 Tue Dec 15 11:53:13 2020 |i40e_virtchnl_pf.c:57| $vfarr[i]->link_up: 0
1608033193263078 Tue Dec 15 11:53:13 2020 |i40e_virtchnl_pf.c:57| $vfarr[i]->queues_enabled: 1
=========
1608033193263081 Tue Dec 15 11:53:13 2020 |i40e_virtchnl_pf.c:69| --> Inside old logic, pfe.event_data.link_event.link_status = ...
=========
1608033193263137 Tue Dec 15 11:53:13 2020 |i40e_virtchnl_pf.c:57| $vfarr[i]->pf->pdev: bus PCI Bus 0000:60, slot 0, func 0
1608033193263139 Tue Dec 15 11:53:13 2020 |i40e_virtchnl_pf.c:57| $vfarr[i]->vf_id: 29
1608033193263141 Tue Dec 15 11:53:13 2020 |i40e_virtchnl_pf.c:57| $vfarr[i]->trusted: 1
1608033193263143 Tue Dec 15 11:53:13 2020 |i40e_virtchnl_pf.c:57| $vfarr[i]->link_forced: 0
1608033193263145 Tue Dec 15 11:53:13 2020 |i40e_virtchnl_pf.c:57| $vfarr[i]->link_up: 0
1608033193263146 Tue Dec 15 11:53:13 2020 |i40e_virtchnl_pf.c:57| $vfarr[i]->queues_enabled: 1
=========
1608033193263148 Tue Dec 15 11:53:13 2020 |i40e_virtchnl_pf.c:69| --> Inside old logic, pfe.event_data.link_event.link_status = ...
(...)
~~~

So far, we can see that all VFs see link_up = 0 and their queues_enabled = 1.

Now, the new logic comes into play:
~~~
1608033201107362 |i40e_vc_process_vf_msg:3766| $v_opcode: VIRTCHNL_OP_DISABLE_QUEUES (9)
1608033201107368 Tue Dec 15 11:53:21 2020 |i40e_virtchnl_pf.c:2400| new logic - vf->queues_enabled = false
1608033201107373 Tue Dec 15 11:53:21 2020 |i40e_virtchnl_pf.c:2400| new logic - vf->queues_enabled = false
1608033201107480 |i40e_vc_process_vf_msg:3766| $v_opcode: VIRTCHNL_OP_DISABLE_QUEUES (9)
1608033201107482 Tue Dec 15 11:53:21 2020 |i40e_virtchnl_pf.c:2400| new logic - vf->queues_enabled = false
1608033201107485 Tue Dec 15 11:53:21 2020 |i40e_virtchnl_pf.c:2400| new logic - vf->queues_enabled = false
1608033201127726 |i40e_vc_process_vf_msg:3766| $v_opcode: VIRTCHNL_OP_DISABLE_QUEUES (9)
1608033201127730 Tue Dec 15 11:53:21 2020 |i40e_virtchnl_pf.c:2400| new logic - vf->queues_enabled = false
1608033201127734 Tue Dec 15 11:53:21 2020 |i40e_virtchnl_pf.c:2400| new logic - vf->queues_enabled = false
1608033201157681 |i40e_vc_process_vf_msg:3766| $v_opcode: VIRTCHNL_OP_DISABLE_QUEUES (9)
1608033201157683 Tue Dec 15 11:53:21 2020 |i40e_virtchnl_pf.c:2400| new logic - vf->queues_enabled = false
1608033201157686 Tue Dec 15 11:53:21 2020 |i40e_virtchnl_pf.c:2400| new logic - vf->queues_enabled = false
1608033201178402 |i40e_vc_process_vf_msg:3766| $v_opcode: VIRTCHNL_OP_DEL_ETH_ADDR (11)
1608033201208148 |i40e_vc_process_vf_msg:3766| $v_opcode: VIRTCHNL_OP_DEL_ETH_ADDR (11)
1608033201232863 |i40e_vc_process_vf_msg:3766| $v_opcode: VIRTCHNL_OP_DEL_ETH_ADDR (11)
1608033201258337 |i40e_vc_process_vf_msg:3766| $v_opcode: VIRTCHNL_OP_DEL_ETH_ADDR (11)
~~~

All VF queues are disabled upon PF down.

Comment 32 Andreas Karis 2020-12-15 16:43:52 UTC
===============================
LINK UP
===============================

We switch the link back up from the switch side ("no shutdown").

Connectivity comes back from the second RHEL instance towards the RHEL instance on the hypervisor:
~~~
From 165.251.40.182 icmp_seq=732 Destination Host Unreachable
From 165.251.40.182 icmp_seq=733 Destination Host Unreachable
From 165.251.40.182 icmp_seq=734 Destination Host Unreachable
From 165.251.40.182 icmp_seq=735 Destination Host Unreachable
From 165.251.40.182 icmp_seq=736 Destination Host Unreachable
From 165.251.40.182 icmp_seq=737 Destination Host Unreachable
From 165.251.40.182 icmp_seq=738 Destination Host Unreachable
From 165.251.40.182 icmp_seq=739 Destination Host Unreachable
From 165.251.40.182 icmp_seq=740 Destination Host Unreachable
From 165.251.40.182 icmp_seq=741 Destination Host Unreachable
From 165.251.40.182 icmp_seq=742 Destination Host Unreachable
From 165.251.40.182 icmp_seq=743 Destination Host Unreachable
From 165.251.40.182 icmp_seq=744 Destination Host Unreachable
From 165.251.40.182 icmp_seq=745 Destination Host Unreachable
64 bytes from 165.251.40.136: icmp_seq=746 ttl=64 time=0.135 ms
64 bytes from 165.251.40.136: icmp_seq=747 ttl=64 time=0.074 ms
64 bytes from 165.251.40.136: icmp_seq=748 ttl=64 time=0.074 ms
64 bytes from 165.251.40.136: icmp_seq=749 ttl=64 time=0.089 ms
64 bytes from 165.251.40.136: icmp_seq=750 ttl=64 time=0.078 ms
64 bytes from 165.251.40.136: icmp_seq=751 ttl=64 time=0.080 ms
64 bytes from 165.251.40.136: icmp_seq=752 ttl=64 time=0.074 ms
64 bytes from 165.251.40.136: icmp_seq=753 ttl=64 time=0.089 ms
~~~

The RHEL instance on the hypervisor catches the link up and properly initializes its queues:
~~~
[root@rhel-sriov ~]# ip link ls dev eth0
2: eth0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN mode DEFAULT group default qlen 1000
    link/ether fa:16:3e:32:c5:e0 brd ff:ff:ff:ff:ff:ff
[root@rhel-sriov ~]# [ 2084.565272] iavf 0000:00:05.0 eth0: NIC Link is Up 25 Gbps Full Duplex

[root@rhel-sriov ~]# ip link ls dev eth0
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether fa:16:3e:32:c5:e0 brd ff:ff:ff:ff:ff:ff
[root@rhel-sriov ~]# 
~~~

At the same time, the vendor's orchestrator does not detect that their instances are back up. Indeed, nothing changes; the instances remain down until they are rebooted.

Again, the stap script reveals a bit more of what's going on. First of all, we have a bunch of VFs which are bound to the hypervisor's i40evf driver, and those have their queues enabled:
~~~
=========
1608033805172241 Tue Dec 15 12:03:25 2020 |i40e_virtchnl_pf.c:57| $vfarr[i]->pf->pdev: bus PCI Bus 0000:60, slot 0, func 0
1608033805172248 Tue Dec 15 12:03:25 2020 |i40e_virtchnl_pf.c:57| $vfarr[i]->vf_id: 0
1608033805172250 Tue Dec 15 12:03:25 2020 |i40e_virtchnl_pf.c:57| $vfarr[i]->trusted: 1
1608033805172252 Tue Dec 15 12:03:25 2020 |i40e_virtchnl_pf.c:57| $vfarr[i]->link_forced: 0
1608033805172254 Tue Dec 15 12:03:25 2020 |i40e_virtchnl_pf.c:57| $vfarr[i]->link_up: 0
1608033805172256 Tue Dec 15 12:03:25 2020 |i40e_virtchnl_pf.c:57| $vfarr[i]->queues_enabled: 1
=========
1608033805172261 Tue Dec 15 12:03:25 2020 |i40e_virtchnl_pf.c:69| --> Inside old logic, pfe.event_data.link_event.link_status = ...
=========
1608033805172321 Tue Dec 15 12:03:25 2020 |i40e_virtchnl_pf.c:57| $vfarr[i]->pf->pdev: bus PCI Bus 0000:60, slot 0, func 0
1608033805172323 Tue Dec 15 12:03:25 2020 |i40e_virtchnl_pf.c:57| $vfarr[i]->vf_id: 1
1608033805172325 Tue Dec 15 12:03:25 2020 |i40e_virtchnl_pf.c:57| $vfarr[i]->trusted: 1
1608033805172327 Tue Dec 15 12:03:25 2020 |i40e_virtchnl_pf.c:57| $vfarr[i]->link_forced: 0
1608033805172329 Tue Dec 15 12:03:25 2020 |i40e_virtchnl_pf.c:57| $vfarr[i]->link_up: 0
1608033805172330 Tue Dec 15 12:03:25 2020 |i40e_virtchnl_pf.c:57| $vfarr[i]->queues_enabled: 1
=========
1608033805172333 Tue Dec 15 12:03:25 2020 |i40e_virtchnl_pf.c:69| --> Inside old logic, pfe.event_data.link_event.link_status = ...
=========
1608033805172389 Tue Dec 15 12:03:25 2020 |i40e_virtchnl_pf.c:57| $vfarr[i]->pf->pdev: bus PCI Bus 0000:60, slot 0, func 0
1608033805172392 Tue Dec 15 12:03:25 2020 |i40e_virtchnl_pf.c:57| $vfarr[i]->vf_id: 2
1608033805172394 Tue Dec 15 12:03:25 2020 |i40e_virtchnl_pf.c:57| $vfarr[i]->trusted: 1
1608033805172396 Tue Dec 15 12:03:25 2020 |i40e_virtchnl_pf.c:57| $vfarr[i]->link_forced: 0
1608033805172397 Tue Dec 15 12:03:25 2020 |i40e_virtchnl_pf.c:57| $vfarr[i]->link_up: 0
1608033805172399 Tue Dec 15 12:03:25 2020 |i40e_virtchnl_pf.c:57| $vfarr[i]->queues_enabled: 1
=========
1608033805172402 Tue Dec 15 12:03:25 2020 |i40e_virtchnl_pf.c:69| --> Inside old logic, pfe.event_data.link_event.link_status = ...
=========
(...)
~~~

The interesting part shows up for VFs 16 and 28. We can also see that VF 29 (the one connected to the RHEL instance) has its queues properly enabled:
~~~
1608033805173277 Tue Dec 15 12:03:25 2020 |i40e_virtchnl_pf.c:57| $vfarr[i]->pf->pdev: bus PCI Bus 0000:60, slot 0, func 0
1608033805173280 Tue Dec 15 12:03:25 2020 |i40e_virtchnl_pf.c:57| $vfarr[i]->vf_id: 15
1608033805173282 Tue Dec 15 12:03:25 2020 |i40e_virtchnl_pf.c:57| $vfarr[i]->trusted: 1
1608033805173284 Tue Dec 15 12:03:25 2020 |i40e_virtchnl_pf.c:57| $vfarr[i]->link_forced: 0
1608033805173285 Tue Dec 15 12:03:25 2020 |i40e_virtchnl_pf.c:57| $vfarr[i]->link_up: 0
1608033805173287 Tue Dec 15 12:03:25 2020 |i40e_virtchnl_pf.c:57| $vfarr[i]->queues_enabled: 1
=========
1608033805173289 Tue Dec 15 12:03:25 2020 |i40e_virtchnl_pf.c:69| --> Inside old logic, pfe.event_data.link_event.link_status = ...
=========
1608033805173345 Tue Dec 15 12:03:25 2020 |i40e_virtchnl_pf.c:57| $vfarr[i]->pf->pdev: bus PCI Bus 0000:60, slot 0, func 0
1608033805173347 Tue Dec 15 12:03:25 2020 |i40e_virtchnl_pf.c:57| $vfarr[i]->vf_id: 16
1608033805173349 Tue Dec 15 12:03:25 2020 |i40e_virtchnl_pf.c:57| $vfarr[i]->trusted: 1
1608033805173351 Tue Dec 15 12:03:25 2020 |i40e_virtchnl_pf.c:57| $vfarr[i]->link_forced: 0
1608033805173352 Tue Dec 15 12:03:25 2020 |i40e_virtchnl_pf.c:57| $vfarr[i]->link_up: 0
1608033805173354 Tue Dec 15 12:03:25 2020 |i40e_virtchnl_pf.c:57| $vfarr[i]->queues_enabled: 0
=========
1608033805173356 Tue Dec 15 12:03:25 2020 |i40e_virtchnl_pf.c:61| --> Inside new logic, VF DOWN
=========
1608033805173412 Tue Dec 15 12:03:25 2020 |i40e_virtchnl_pf.c:57| $vfarr[i]->pf->pdev: bus PCI Bus 0000:60, slot 0, func 0
1608033805173415 Tue Dec 15 12:03:25 2020 |i40e_virtchnl_pf.c:57| $vfarr[i]->vf_id: 17
1608033805173416 Tue Dec 15 12:03:25 2020 |i40e_virtchnl_pf.c:57| $vfarr[i]->trusted: 1
1608033805173418 Tue Dec 15 12:03:25 2020 |i40e_virtchnl_pf.c:57| $vfarr[i]->link_forced: 0
1608033805173420 Tue Dec 15 12:03:25 2020 |i40e_virtchnl_pf.c:57| $vfarr[i]->link_up: 0
1608033805173421 Tue Dec 15 12:03:25 2020 |i40e_virtchnl_pf.c:57| $vfarr[i]->queues_enabled: 1
=========
1608033805173424 Tue Dec 15 12:03:25 2020 |i40e_virtchnl_pf.c:69| --> Inside old logic, pfe.event_data.link_event.link_status = ...
=========
(...)
1608033805174093 Tue Dec 15 12:03:25 2020 |i40e_virtchnl_pf.c:57| $vfarr[i]->pf->pdev: bus PCI Bus 0000:60, slot 0, func 0
1608033805174095 Tue Dec 15 12:03:25 2020 |i40e_virtchnl_pf.c:57| $vfarr[i]->vf_id: 27
1608033805174097 Tue Dec 15 12:03:25 2020 |i40e_virtchnl_pf.c:57| $vfarr[i]->trusted: 1
1608033805174098 Tue Dec 15 12:03:25 2020 |i40e_virtchnl_pf.c:57| $vfarr[i]->link_forced: 0
1608033805174100 Tue Dec 15 12:03:25 2020 |i40e_virtchnl_pf.c:57| $vfarr[i]->link_up: 0
1608033805174102 Tue Dec 15 12:03:25 2020 |i40e_virtchnl_pf.c:57| $vfarr[i]->queues_enabled: 1
=========
1608033805174104 Tue Dec 15 12:03:25 2020 |i40e_virtchnl_pf.c:69| --> Inside old logic, pfe.event_data.link_event.link_status = ...
=========
1608033805174160 Tue Dec 15 12:03:25 2020 |i40e_virtchnl_pf.c:57| $vfarr[i]->pf->pdev: bus PCI Bus 0000:60, slot 0, func 0
1608033805174163 Tue Dec 15 12:03:25 2020 |i40e_virtchnl_pf.c:57| $vfarr[i]->vf_id: 28
1608033805174164 Tue Dec 15 12:03:25 2020 |i40e_virtchnl_pf.c:57| $vfarr[i]->trusted: 1
1608033805174166 Tue Dec 15 12:03:25 2020 |i40e_virtchnl_pf.c:57| $vfarr[i]->link_forced: 0
1608033805174168 Tue Dec 15 12:03:25 2020 |i40e_virtchnl_pf.c:57| $vfarr[i]->link_up: 0
1608033805174169 Tue Dec 15 12:03:25 2020 |i40e_virtchnl_pf.c:57| $vfarr[i]->queues_enabled: 0
=========
1608033805174172 Tue Dec 15 12:03:25 2020 |i40e_virtchnl_pf.c:61| --> Inside new logic, VF DOWN
=========
1608033805174228 Tue Dec 15 12:03:25 2020 |i40e_virtchnl_pf.c:57| $vfarr[i]->pf->pdev: bus PCI Bus 0000:60, slot 0, func 0
1608033805174230 Tue Dec 15 12:03:25 2020 |i40e_virtchnl_pf.c:57| $vfarr[i]->vf_id: 29
1608033805174232 Tue Dec 15 12:03:25 2020 |i40e_virtchnl_pf.c:57| $vfarr[i]->trusted: 1
1608033805174234 Tue Dec 15 12:03:25 2020 |i40e_virtchnl_pf.c:57| $vfarr[i]->link_forced: 0
1608033805174236 Tue Dec 15 12:03:25 2020 |i40e_virtchnl_pf.c:57| $vfarr[i]->link_up: 0
1608033805174237 Tue Dec 15 12:03:25 2020 |i40e_virtchnl_pf.c:57| $vfarr[i]->queues_enabled: 1
=========
1608033805174240 Tue Dec 15 12:03:25 2020 |i40e_virtchnl_pf.c:69| --> Inside old logic, pfe.event_data.link_event.link_status = ...
~~~

Comment 33 Andreas Karis 2020-12-15 16:45:54 UTC
The vendor instance's application does not re-enable its queues, likely because it does not catch the link flap at all. Before the aforementioned commit to the in-tree i40e kernel driver it did not have to, though. It is only since that commit that we explicitly mark the VF queues as disabled when the physical line goes down for the PF.

Comment 34 Andreas Karis 2020-12-15 16:50:24 UTC
We also tested with the most recent kernel from https://elrepo.org/linux/kernel/el7/x86_64/RPMS/:
~~~
[root@auh-akb-an-nfviplus-rk2-com2 ~]# uname -r
5.10.0-1.el7.elrepo.x86_64
[root@auh-akb-an-nfviplus-rk2-com2 ~]# uname -a
Linux auh-akb-an-nfviplus-rk2-com2 5.10.0-1.el7.elrepo.x86_64 #1 SMP Sun Dec 13 18:34:48 EST 2020 x86_64 x86_64 x86_64 GNU/Linux
[root@auh-akb-an-nfviplus-rk2-com2 ~]# 
~~~

~~~
[root@auh-akb-an-nfviplus-rk2-com2 ~]# ethtool -i enp96s0f0
driver: i40e
version: 5.10.0-1.el7.elrepo.x86_64
firmware-version: 7.10 0x800075e6 19.5.12
expansion-rom-version: 
bus-info: 0000:60:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: yes
~~~

Same result: the instance does not detect the link up, but RHEL does.

Comment 35 Andreas Karis 2020-12-15 16:52:21 UTC
* The vendor's instance does not initialize its queues, but the RHEL instance does. And thus the vendor's instance does not reestablish connectivity.

Comment 36 Stefan Assmann 2020-12-16 08:15:39 UTC
Hi Andreas,
great analysis! It does prove that the vendor VM does not re-enable its queues, while the in-kernel iavf driver works as expected. You were also able to confirm the problem is still present with the latest upstream kernel. There's nothing we as Red Hat can do about this problem, as we have no control over the vendor's VM.
From my point of view the kernel i40e driver does the right thing, so if the customer wants to pursue this further, they should engage with Intel directly to find a solution for the upstream i40e driver.

That said, even in the case of a potential upstream fix in the foreseeable future, I would object to integrating that fix into the RHEL 7 kernel at this point. RHEL 7 has entered its maintenance phase and only highly critical fixes will be approved. In my opinion this case does not meet those criteria, especially since it's not an issue with the in-kernel iavf driver.

Comment 37 Andreas Karis 2020-12-17 18:57:24 UTC
Sorry for spamming this BZ even further. Please disregard; I'd just like to keep my observations in a central place, i.e., here.

==============

Hello,

I spawned a VM with RHEL 7.9 in my lab:
~~~
(overcloud) [stack@undercloud-0 ~]$ cat overcloud-test-sriov-spawn-rhel.sh
#!/bin/bash

PROVIDER_SEGMENTATION_ID_SRIOV1=306
PROVIDER_SEGMENTATION_ID_SRIOV2=307
PROVIDER_PHYSICAL_NETWORK="external"
PROVIDER_PHYSICAL_NETWORK_SRIOV1="sriov1"
PROVIDER_PHYSICAL_NETWORK_SRIOV2="sriov2"
RHEL_INSTANCE_COUNT=0
RHEL_SRIOV_INSTANCE_COUNT=2

DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )"

source /home/stack/overcloudrc

nova quota-class-update --cores 100 default

PROVIDER_NETID=$(openstack network list | awk '/provider1/ {print $2}')
for i in `seq 1 $RHEL_SRIOV_INSTANCE_COUNT`;do
  uuid=$(uuidgen  | cut -b 1-8)
  # create one sr-iov and one without
  portid0=`neutron port-create private --name private-${uuid} | awk '$2 == "id" {print $(NF-1)}'`
  portid1=`neutron port-create sriov1 --name sriov1-${uuid} --binding:vnic-type direct  | awk '$2 == "id" {print $(NF-1)}'`
  portid2=`neutron port-create sriov2 --name sriov2-${uuid} --binding:vnic-type direct  | awk '$2 == "id" {print $(NF-1)}'`
  openstack floating ip create provider1
  FLOATINGIP=$(openstack floating ip list --network $PROVIDER_NETID | awk '($6 == "None") {print $2}' | head -1)
  openstack floating ip set --port $portid0 $FLOATINGIP
  openstack server create --flavor m1.large  --image rhel-7-sriov --nic port-id=$portid0 --nic port-id=$portid1 --nic port-id=$portid2 --key-name id_rsa   sriov_vm-${uuid}
done

for i in `seq 1 $RHEL_INSTANCE_COUNT`;do
  uuid=$(uuidgen  | cut -b 1-8)
  portid0=`neutron port-create private --name private | awk '$2 == "id" {print $(NF-1)}'`
  openstack floating ip create provider1
  FLOATINGIP=$(openstack floating ip list --network $PROVIDER_NETID | awk '($6 == "None") {print $2}' | head -1)
  openstack floating ip set --port $portid0 $FLOATINGIP
  openstack server create --flavor m1.no.pinning  --image rhel --nic port-id=$portid0 --key-name id_rsa   normal_vm-${uuid}
done
~~~

Then, I installed DPDK and compiled the KNI (Kernel NIC Interface) example application with DPDK 19.08.2.

Updating RHEL 7.9 to latest and setting hugepages and isolating cores:
~~~
subscription-manager register ...
yum install vim -y
cat <<'EOF'>/etc/default/grub
GRUB_TIMEOUT=1
GRUB_DISTRIBUTOR="$(sed 's, release .*$,,g' /etc/system-release)"
GRUB_DEFAULT=saved
GRUB_DISABLE_SUBMENU=true
GRUB_TERMINAL_OUTPUT="console"
GRUB_CMDLINE_LINUX="console=tty0 crashkernel=auto console=ttyS0,115200n8 no_timer_check net.ifnames=0  default_hugepagesz=1GB hugepagesz=1G hugepages=4 iommu=pt intel_iommu=on isolcpus=4,5,6,7"
GRUB_DISABLE_RECOVERY="true"
EOF
grub2-mkconfig -o /etc/grub2.cfg
yum update -y
reboot
~~~
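
In addition to the kernel version check below, a quick sanity check that the hugepage and isolcpus settings took effect (a sketch):
~~~
cat /proc/cmdline                                # iommu/isolcpus/hugepage options present?
grep -E 'HugePages_(Total|Free)' /proc/meminfo   # should show the 4 x 1 GiB pages
~~~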

The result after reboot:
~~~
[root@rhel ~]# uname -r
3.10.0-1160.6.1.el7.x86_64
[root@rhel ~]# cat /etc/redhat-release
Red Hat Enterprise Linux Server release 7.9 (Maipo)
~~~

After reboot of the instance, install build dependencies:
~~~
yum install '@Development Tools' -y
subscription-manager repos --enable "rhel-*-optional-rpms" --enable "rhel-*-extras-rpms"  --enable "rhel-ha-for-rhel-*-server-rpms"
rpm -ivh https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm
yum install meson -y
yum install kernel-devel -y
yum install kernel-headers -y
yum install numactl-devel -y
~~~

Install other dependencies:
~~~
yum install driverctl -y
~~~

Download and build DPDK:
~~~
curl -O http://fast.dpdk.org/rel/dpdk-19.08.2.tar.xz
tar -xf dpdk-19.08.2.tar.xz
cd dpdk-stable-19.08.2/
meson -Dexamples=all build
ninja -C build
export RTE_TARGET=x86_64-native-linuxapp-gcc
export RTE_SDK=/root/dpdk-stable-19.08.2
make defconfig
make
export RTE_TARGET=build
make -C examples
~~~

Create DPDK application scripts:
~~~
cd /root
cat <<'EOF'>runtestpmd.sh
#!/bin/bash

/root/dpdk-stable-19.08.2/build/app/dpdk-testpmd -l 4-7 -w 0000:00:05.0 -w 0000:00:06.0 -- -i --nb-cores=2 --nb-ports=2 --total-num-mbufs=2048
EOF
cat <<'EOF'>runkni.sh
#!/bin/bash

rmmod rte_kni
insmod /root/dpdk-stable-19.08.2/build/kmod/rte_kni.ko kthread_mode=multiple
/root/dpdk-stable-19.08.2/examples/kni/build/kni -l 4-7 -w 0000:00:05.0 -- -p 0x1 -m --config="(0,5,6,7)"
EOF
chmod +x runtestpmd.sh
chmod +x runkni.sh
~~~

Set overrides:
~~~
cat <<'EOF' > /etc/modprobe.d/vfio.conf
options vfio enable_unsafe_noiommu_mode=Y
EOF
rmmod vfio-pci vfio_iommu_type1 vfio
modprobe vfio enable_unsafe_noiommu_mode=Y
driverctl set-override 0000:00:05.0 vfio-pci
driverctl set-override 0000:00:06.0 vfio-pci
~~~
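
And a quick check that both ports really got rebound to vfio-pci (a sketch):
~~~
driverctl list-overrides                 # should list vfio-pci for 0000:00:05.0 and 0000:00:06.0
lspci -nnk -s 00:05.0 | grep 'in use'    # Kernel driver in use: vfio-pci
~~~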

Now, start KNI app:
~~~
# ./runkni.sh 
~~~

I also installed the stap script dependencies on the hypervisor:
~~~
yum update -y
yum install kernel-3.10.0-1160.6.1.el7.x86_64  kernel-debuginfo-3.10.0-1160.6.1.el7.x86_64  kernel-debuginfo-common-x86_64-3.10.0-1160.6.1.el7.x86_64  kernel-devel-3.10.0-1160.6.1.el7.x86_64 -y
yum install systemtap -y
reboot
~~~

Then, I ran the stap script:
~~~
[root@overcloud-computesriov-0 ~]# cat i40e.stap 
function timestamp:long() { return gettimeofday_us() }

function customprint(header, msg) {
	printf("%d |%s| %s\n", timestamp(), header, msg)
}

//function printvf(header, vf) {
//	printf("=========\n")
//	printf("%d |%s| $vf->vf_id: %d\n", timestamp(), header, @cast(vf, "struct i40e_vf", "i40e")->vf_id)
//	printf("%d |%s| $vf->trusted: %d\n", timestamp(), header, @cast(vf, "struct i40e_vf", "i40e")->trusted)
//	printf("%d |%s| $vf->link_forced: %d\n", timestamp(), header, @cast(vf, "struct i40e_vf", "i40e")->link_forced)
//	printf("%d |%s| $vf->link_up: %d\n", timestamp(), header, @cast(vf, "struct i40e_vf", "i40e")->link_up)
//	printf("%d |%s| $vf->queues_enabled: %d\n", timestamp(), header, @cast(vf, "struct i40e_vf", "i40e")->queues_enabled)
//	printf("=========\n")
//}

function pci_slot(devfn) {
	return (((devfn) >> 3) & 0x1f)
}

function pci_func(devfn) {
	return ((devfn) & 0x07)
}

function printvfarrayelement(header, vfarr, i) {
	printf("=========\n")
	printf("%d |%s| $vfarr[i]->pf->pdev: bus %s, slot %d, func %d\n", 
		timestamp(), 
		header, 
		kernel_string_n(@cast(vfarr, "struct i40e_vf", "i40e")[i]->pf->pdev->bus->name, 48),
		pci_slot(@cast(vfarr, "struct i40e_vf", "i40e")[i]->pf->pdev->devfn), 
		pci_func(@cast(vfarr, "struct i40e_vf", "i40e")[i]->pf->pdev->devfn)
	)
	printf("%d |%s| $vfarr[i]->vf_id: %d\n", timestamp(), header, @cast(vfarr, "struct i40e_vf", "i40e")[i]->vf_id)
	printf("%d |%s| $vfarr[i]->trusted: %d\n", timestamp(), header, @cast(vfarr, "struct i40e_vf", "i40e")[i]->trusted)
	printf("%d |%s| $vfarr[i]->link_forced: %d\n", timestamp(), header, @cast(vfarr, "struct i40e_vf", "i40e")[i]->link_forced)
	printf("%d |%s| $vfarr[i]->link_up: %d\n", timestamp(), header, @cast(vfarr, "struct i40e_vf", "i40e")[i]->link_up)
	printf("%d |%s| $vfarr[i]->queues_enabled: %d\n", timestamp(), header, @cast(vfarr, "struct i40e_vf", "i40e")[i]->queues_enabled)
	printf("=========\n")
}

function translate_opcode(opcode) {
	if (opcode == 0) 
		return "VIRTCHNL_OP_UNKNOWN (0)"
	if (opcode == 1) 
		return "VIRTCHNL_OP_VERSION (1)"
	if (opcode == 2) 
		return "VIRTCHNL_OP_RESET_VF (2)"
	if (opcode == 3) 
		return "VIRTCHNL_OP_GET_VF_RESOURCES (3)"
	if (opcode == 4) 
		return "VIRTCHNL_OP_CONFIG_TX_QUEUE (4)"
	if (opcode == 5) 
		return "VIRTCHNL_OP_CONFIG_RX_QUEUE (5)"
	if (opcode == 6) 
		return "VIRTCHNL_OP_CONFIG_VSI_QUEUES (6)"
	if (opcode == 7) 
		return "VIRTCHNL_OP_CONFIG_IRQ_MAP (7)"
	if (opcode == 8) 
		return "VIRTCHNL_OP_ENABLE_QUEUES (8)"
	if (opcode == 9) 
		return "VIRTCHNL_OP_DISABLE_QUEUES (9)"
	if (opcode == 10) 
		return "VIRTCHNL_OP_ADD_ETH_ADDR (10)"
	if (opcode == 11) 
		return "VIRTCHNL_OP_DEL_ETH_ADDR (11)"
	if (opcode == 12) 
		return "VIRTCHNL_OP_ADD_VLAN (12)"
	if (opcode == 13) 
		return "VIRTCHNL_OP_DEL_VLAN (13)"
	if (opcode == 14) 
		return "VIRTCHNL_OP_CONFIG_PROMISCUOUS_MODE (14)"
	if (opcode == 15) 
		return "VIRTCHNL_OP_GET_STATS (15)"
	if (opcode == 16) 
		return "VIRTCHNL_OP_RSVD (16)"
	if (opcode == 17) 
		return "VIRTCHNL_OP_EVENT (17)"
	if (opcode == 20) 
		return "VIRTCHNL_OP_IWARP (20)"
	if (opcode == 21) 
		return "VIRTCHNL_OP_CONFIG_IWARP_IRQ_MAP (21)"
	if (opcode == 22) 
		return "VIRTCHNL_OP_RELEASE_IWARP_IRQ_MAP (22)"
	if (opcode == 23) 
		return "VIRTCHNL_OP_CONFIG_RSS_KEY (23)"
	if (opcode == 24) 
		return "VIRTCHNL_OP_CONFIG_RSS_LUT (24)"
	if (opcode == 25) 
		return "VIRTCHNL_OP_GET_RSS_HENA_CAPS (25)"
	if (opcode == 26) 
		return "VIRTCHNL_OP_SET_RSS_HENA (26)"
	if (opcode == 27) 
		return "VIRTCHNL_OP_ENABLE_VLAN_STRIPPING (27)"
	if (opcode == 28) 
		return "VIRTCHNL_OP_DISABLE_VLAN_STRIPPING (28)"
	if (opcode == 29) 
		return "VIRTCHNL_OP_REQUEST_QUEUES (29)"
	if (opcode == 30) 
		return "VIRTCHNL_OP_ENABLE_CHANNELS (30)"
	if (opcode == 31) 
		return "VIRTCHNL_OP_DISABLE_CHANNELS (31)"
	if (opcode == 32) 
		return "VIRTCHNL_OP_ADD_CLOUD_FILTER (32)"
	if (opcode == 33) 
		return "VIRTCHNL_OP_DEL_CLOUD_FILTER (33)"
	return "unknown opcode"
}

probe begin
{
    log("begin probe")
}

//probe module("i40e").function("i40e_vc_notify_vf_link_state")
probe module("i40e").statement("*@i40e_virtchnl_pf.c:57")
{
    // we cannot probe this here
    //printvf("i40e_virtchnl_pf.c:57", $vf)
    // instead, let's jump through a few extra hoops to get exactly the same
    vfid = $abs_vf_id - $hw->func_caps->vf_base_id
    printvfarrayelement("i40e_virtchnl_pf.c:57", $pf->vf, vfid)
}
probe module("i40e").statement("*@i40e_virtchnl_pf.c:61")
{
	customprint("i40e_virtchnl_pf.c:61", "--> Inside new logic, VF DOWN")
}
probe module("i40e").statement("*@i40e_virtchnl_pf.c:64")
{
	customprint("i40e_virtchnl_pf.c:64", "--> Inside old logic, vf->link_up FORCED")
}
probe module("i40e").statement("*@i40e_virtchnl_pf.c:69")
{
	customprint("i40e_virtchnl_pf.c:69", "--> Inside old logic, pfe.event_data.link_event.link_status = ...")
}

probe module("i40e").statement("*@i40e_virtchnl_pf.c:2336")
{
        customprint("i40e_virtchnl_pf.c:2336", "The VF called method to enable queues")
}
probe module("i40e").statement("*@i40e_virtchnl_pf.c:2376")
{
	customprint("i40e_virtchnl_pf.c:2376", "new logic - vf->queues_enabled = true")
}
probe module("i40e").statement("*@i40e_virtchnl_pf.c:2376")
{
	customprint("i40e_virtchnl_pf.c:2376", "new logic - vf->queues_enabled = true")
}
probe module("i40e").statement("*@i40e_virtchnl_pf.c:2400")
{
	customprint("i40e_virtchnl_pf.c:2400", "new logic - vf->queues_enabled = false")
}

probe module("i40e").statement("*@i40e_virtchnl_pf.c:3766") {
    printf("%d |i40e_vc_process_vf_msg:3766| $v_opcode: %s\n", timestamp(), translate_opcode($v_opcode))
}

probe end
{
    log("end probe")
}
~~~

~~~
stap i40e.stap
~~~

Comment 38 Andreas Karis 2020-12-17 19:10:27 UTC
Let's look at:
/root/dpdk-stable-19.08.2/examples/kni/main.c  (http://git.dpdk.org/dpdk-stable/tree/examples/kni/main.c?h=19.08)

The KNI example application has a function monitor_all_ports_link_status which monitors link state every 500 milliseconds:
http://git.dpdk.org/dpdk-stable/tree/examples/kni/main.c?h=19.08#n711
~~~
/*
 * Monitor the link status of all ports and update the
 * corresponding KNI interface(s)
 */
static void *
monitor_all_ports_link_status(void *arg)
{
	uint16_t portid;
	struct rte_eth_link link;
	unsigned int i;
	struct kni_port_params **p = kni_port_params_array;
	int prev;
	(void) arg;

	while (monitor_links) {
		rte_delay_ms(500);
		RTE_ETH_FOREACH_DEV(portid) {
			if ((ports_mask & (1 << portid)) == 0)
				continue;
			memset(&link, 0, sizeof(link));
			rte_eth_link_get_nowait(portid, &link);
			for (i = 0; i < p[portid]->nb_kni; i++) {
				prev = rte_kni_update_link(p[portid]->kni[i],
						link.link_status);
				log_link_state(p[portid]->kni[i], prev, &link);
			}
		}
	}
	return NULL;
}
~~~

It updates the KNI kernel interface status (the up/down state that we see from the kernel side inside the instance) whenever the physical link goes up or down.

The KNI example application properly catches a link down:
~~~
# switch
S4048-ON-sw(conf)#int te1/42
S4048-ON-sw(conf-if-te-1/42)#shutdown

~~~

~~~
# instance output
[root@sriov-vm-4b7b253e ~]# ip a ls dev vEth0_0
8: vEth0_0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether fa:16:3e:31:b8:8e brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.156/24 scope global vEth0_0
       valid_lft forever preferred_lft forever
[root@sriov-vm-4b7b253e ~]# ip a ls dev vEth0_0
8: vEth0_0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast state DOWN group default qlen 1000
    link/ether fa:16:3e:31:b8:8e brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.156/24 scope global vEth0_0
       valid_lft forever preferred_lft forever
~~~

~~~
# hypervisor output
begin probe
=========
1608220195108495 |i40e_virtchnl_pf.c:57| $vfarr[i]->pf->pdev: bus PCI Bus 0000:18, slot 0, func 0
1608220195108501 |i40e_virtchnl_pf.c:57| $vfarr[i]->vf_id: 0
1608220195108502 |i40e_virtchnl_pf.c:57| $vfarr[i]->trusted: 0
1608220195108504 |i40e_virtchnl_pf.c:57| $vfarr[i]->link_forced: 0
1608220195108505 |i40e_virtchnl_pf.c:57| $vfarr[i]->link_up: 0
1608220195108506 |i40e_virtchnl_pf.c:57| $vfarr[i]->queues_enabled: 1
=========
1608220195108515 |i40e_virtchnl_pf.c:69| --> Inside old logic, pfe.event_data.link_event.link_status = ...
=========
1608220195108573 |i40e_virtchnl_pf.c:57| $vfarr[i]->pf->pdev: bus PCI Bus 0000:18, slot 0, func 0
1608220195108575 |i40e_virtchnl_pf.c:57| $vfarr[i]->vf_id: 1
1608220195108576 |i40e_virtchnl_pf.c:57| $vfarr[i]->trusted: 0
1608220195108577 |i40e_virtchnl_pf.c:57| $vfarr[i]->link_forced: 0
1608220195108578 |i40e_virtchnl_pf.c:57| $vfarr[i]->link_up: 0
1608220195108579 |i40e_virtchnl_pf.c:57| $vfarr[i]->queues_enabled: 1
=========
1608220195108581 |i40e_virtchnl_pf.c:69| --> Inside old logic, pfe.event_data.link_event.link_status = ...
=========
1608220195108637 |i40e_virtchnl_pf.c:57| $vfarr[i]->pf->pdev: bus PCI Bus 0000:18, slot 0, func 0
1608220195108638 |i40e_virtchnl_pf.c:57| $vfarr[i]->vf_id: 2
1608220195108639 |i40e_virtchnl_pf.c:57| $vfarr[i]->trusted: 0
1608220195108640 |i40e_virtchnl_pf.c:57| $vfarr[i]->link_forced: 0
1608220195108641 |i40e_virtchnl_pf.c:57| $vfarr[i]->link_up: 0
1608220195108642 |i40e_virtchnl_pf.c:57| $vfarr[i]->queues_enabled: 1
=========
1608220195108644 |i40e_virtchnl_pf.c:69| --> Inside old logic, pfe.event_data.link_event.link_status = ...
=========
1608220195108699 |i40e_virtchnl_pf.c:57| $vfarr[i]->pf->pdev: bus PCI Bus 0000:18, slot 0, func 0
1608220195108701 |i40e_virtchnl_pf.c:57| $vfarr[i]->vf_id: 3
1608220195108702 |i40e_virtchnl_pf.c:57| $vfarr[i]->trusted: 0
1608220195108703 |i40e_virtchnl_pf.c:57| $vfarr[i]->link_forced: 0
1608220195108704 |i40e_virtchnl_pf.c:57| $vfarr[i]->link_up: 0
1608220195108705 |i40e_virtchnl_pf.c:57| $vfarr[i]->queues_enabled: 1
=========
1608220195108706 |i40e_virtchnl_pf.c:69| --> Inside old logic, pfe.event_data.link_event.link_status = ...
=========
1608220195108762 |i40e_virtchnl_pf.c:57| $vfarr[i]->pf->pdev: bus PCI Bus 0000:18, slot 0, func 0
1608220195108764 |i40e_virtchnl_pf.c:57| $vfarr[i]->vf_id: 4
1608220195108765 |i40e_virtchnl_pf.c:57| $vfarr[i]->trusted: 0
1608220195108766 |i40e_virtchnl_pf.c:57| $vfarr[i]->link_forced: 0
1608220195108767 |i40e_virtchnl_pf.c:57| $vfarr[i]->link_up: 0
1608220195108767 |i40e_virtchnl_pf.c:57| $vfarr[i]->queues_enabled: 1
=========
1608220195108769 |i40e_virtchnl_pf.c:69| --> Inside old logic, pfe.event_data.link_event.link_status = ...
~~~

And properly catches a link up:
~~~
# switch
S4048-ON-sw(conf-if-te-1/42)#no shut
S4048-ON-sw(conf-if-te-1/42)#
~~~

~~~
# instance output
[root@sriov-vm-4b7b253e ~]# ip link ls dev vEth0_0
8: vEth0_0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP mode DEFAULT group default qlen 1000
    link/ether fa:16:3e:31:b8:8e brd ff:ff:ff:ff:ff:ff
~~~

~~~
# hypervisor output
=========
1608220313319664 |i40e_virtchnl_pf.c:57| $vfarr[i]->pf->pdev: bus PCI Bus 0000:18, slot 0, func 0
1608220313319669 |i40e_virtchnl_pf.c:57| $vfarr[i]->vf_id: 0
1608220313319671 |i40e_virtchnl_pf.c:57| $vfarr[i]->trusted: 0
1608220313319672 |i40e_virtchnl_pf.c:57| $vfarr[i]->link_forced: 0
1608220313319673 |i40e_virtchnl_pf.c:57| $vfarr[i]->link_up: 0
1608220313319674 |i40e_virtchnl_pf.c:57| $vfarr[i]->queues_enabled: 1
=========
1608220313319679 |i40e_virtchnl_pf.c:69| --> Inside old logic, pfe.event_data.link_event.link_status = ...
=========
1608220313319739 |i40e_virtchnl_pf.c:57| $vfarr[i]->pf->pdev: bus PCI Bus 0000:18, slot 0, func 0
1608220313319741 |i40e_virtchnl_pf.c:57| $vfarr[i]->vf_id: 1
1608220313319743 |i40e_virtchnl_pf.c:57| $vfarr[i]->trusted: 0
1608220313319744 |i40e_virtchnl_pf.c:57| $vfarr[i]->link_forced: 0
1608220313319745 |i40e_virtchnl_pf.c:57| $vfarr[i]->link_up: 0
1608220313319745 |i40e_virtchnl_pf.c:57| $vfarr[i]->queues_enabled: 1
=========
1608220313319747 |i40e_virtchnl_pf.c:69| --> Inside old logic, pfe.event_data.link_event.link_status = ...
=========
1608220313319803 |i40e_virtchnl_pf.c:57| $vfarr[i]->pf->pdev: bus PCI Bus 0000:18, slot 0, func 0
1608220313319805 |i40e_virtchnl_pf.c:57| $vfarr[i]->vf_id: 2
1608220313319806 |i40e_virtchnl_pf.c:57| $vfarr[i]->trusted: 0
1608220313319807 |i40e_virtchnl_pf.c:57| $vfarr[i]->link_forced: 0
1608220313319808 |i40e_virtchnl_pf.c:57| $vfarr[i]->link_up: 0
1608220313319809 |i40e_virtchnl_pf.c:57| $vfarr[i]->queues_enabled: 1
=========
1608220313319810 |i40e_virtchnl_pf.c:69| --> Inside old logic, pfe.event_data.link_event.link_status = ...
=========
1608220313319866 |i40e_virtchnl_pf.c:57| $vfarr[i]->pf->pdev: bus PCI Bus 0000:18, slot 0, func 0
1608220313319867 |i40e_virtchnl_pf.c:57| $vfarr[i]->vf_id: 3
1608220313319868 |i40e_virtchnl_pf.c:57| $vfarr[i]->trusted: 0
1608220313319869 |i40e_virtchnl_pf.c:57| $vfarr[i]->link_forced: 0
1608220313319870 |i40e_virtchnl_pf.c:57| $vfarr[i]->link_up: 0
1608220313319871 |i40e_virtchnl_pf.c:57| $vfarr[i]->queues_enabled: 1
=========
1608220313319873 |i40e_virtchnl_pf.c:69| --> Inside old logic, pfe.event_data.link_event.link_status = ...
=========
1608220313319928 |i40e_virtchnl_pf.c:57| $vfarr[i]->pf->pdev: bus PCI Bus 0000:18, slot 0, func 0
1608220313319930 |i40e_virtchnl_pf.c:57| $vfarr[i]->vf_id: 4
1608220313319931 |i40e_virtchnl_pf.c:57| $vfarr[i]->trusted: 0
1608220313319932 |i40e_virtchnl_pf.c:57| $vfarr[i]->link_forced: 0
1608220313319932 |i40e_virtchnl_pf.c:57| $vfarr[i]->link_up: 0
1608220313319933 |i40e_virtchnl_pf.c:57| $vfarr[i]->queues_enabled: 1
=========
1608220313319935 |i40e_virtchnl_pf.c:69| --> Inside old logic, pfe.event_data.link_event.link_status = ...
~~~

========================================================================================

Important observation:
----------------------------

The difference here is that upon link down we never see the queues being disabled. That is, we never see:
~~~
1608220453899951 |i40e_virtchnl_pf.c:2400| new logic - vf->queues_enabled = false
~~~

Interestingly, in order to trigger `new logic - vf->queues_enabled = false`, I *must* explicitly run `ip link set dev vEth0_0 down` (which instructs the KNI DPDK code to shut down the interface and send VIRTCHNL_OP_DISABLE_QUEUES to the physical function).

In the steps above, I do not manually trigger an admin down of the interface, and consequently I never see vf->queues_enabled = false.
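
For reference, here is roughly what `ip link set dev vEth0_0 down`/`up` ends up calling in the KNI example. This is a simplified sketch paraphrased from the kni_config_network_interface() callback in examples/kni/main.c (DPDK 19.08); logging, the kni_pause bookkeeping and some error handling are omitted. The down path only stops the port, which is what makes the VF PMD send VIRTCHNL_OP_DISABLE_QUEUES; the up path stops and then starts it again, which re-enables the queues:
~~~
#include <rte_ethdev.h>

/* Simplified sketch of the KNI example's "interface up/down" callback
 * (paraphrased, not the verbatim 19.08 source). */
static int
kni_config_network_interface(uint16_t port_id, uint8_t if_up)
{
	int ret = 0;

	if (!rte_eth_dev_is_valid_port(port_id))
		return -EINVAL;

	if (if_up != 0) {
		/* "up": stop, then start -> the PMD re-enables its queues
		 * (VIRTCHNL_OP_ENABLE_QUEUES) */
		rte_eth_dev_stop(port_id);
		ret = rte_eth_dev_start(port_id);
	} else {
		/* "down": stop only -> VIRTCHNL_OP_DISABLE_QUEUES */
		rte_eth_dev_stop(port_id);
	}

	return ret;
}
~~~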

So if we see this behavior only for the vendor application in the customer's environment, it means that the vendor application explicitly disables the VF queues upon link down:
~~~
2384 /**
2385  * i40e_vc_disable_queues_msg
2386  * @vf: pointer to the VF info
2387  * @msg: pointer to the msg buffer
2388  *
2389  * called from the VF to disable all or specific
2390  * queue(s)
2391  **/
2392 static int i40e_vc_disable_queues_msg(struct i40e_vf *vf, u8 *msg)
2393 {
2394         struct virtchnl_queue_select *vqs =
2395             (struct virtchnl_queue_select *)msg;
2396         struct i40e_pf *pf = vf->pf;
2397         i40e_status aq_ret = 0;
2398 
2399         /* Immediately mark queues as disabled */
2400         vf->queues_enabled = false;
~~~

The DPDK KNI example application does not do this, and neither does the kernel-space RHEL VF driver.

This is from the test with the vendor application; here we do see the explicit call to VIRTCHNL_OP_DISABLE_QUEUES:
~~~
1608204991712032 Thu Dec 17 11:36:31 2020 |i40e_virtchnl_pf.c:69| --> Inside old logic, pfe.event_data.link_event.link_status = ...
1608204996532259 |i40e_vc_process_vf_msg:3766| $v_opcode: VIRTCHNL_OP_DISABLE_QUEUES (9)
1608204996532264 Thu Dec 17 11:36:36 2020 |i40e_virtchnl_pf.c:2400| new logic - vf->queues_enabled = false
1608204996532269 Thu Dec 17 11:36:36 2020 |i40e_virtchnl_pf.c:2400| new logic - vf->queues_enabled = false
1608204996542577 |i40e_vc_process_vf_msg:3766| $v_opcode: VIRTCHNL_OP_DISABLE_QUEUES (9)
1608204996542579 Thu Dec 17 11:36:36 2020 |i40e_virtchnl_pf.c:2400| new logic - vf->queues_enabled = false
1608204996542582 Thu Dec 17 11:36:36 2020 |i40e_virtchnl_pf.c:2400| new logic - vf->queues_enabled = false
1608204996593258 |i40e_vc_process_vf_msg:3766| $v_opcode: VIRTCHNL_OP_DEL_ETH_ADDR (11)
1608204996643433 |i40e_vc_process_vf_msg:3766| $v_opcode: VIRTCHNL_OP_DEL_ETH_ADDR (11)
1608204997600109 |i40e_vc_process_vf_msg:3766| $v_opcode: VIRTCHNL_OP_DISABLE_QUEUES (9)
1608204997600114 Thu Dec 17 11:36:37 2020 |i40e_virtchnl_pf.c:2400| new logic - vf->queues_enabled = false
1608204997600119 Thu Dec 17 11:36:37 2020 |i40e_virtchnl_pf.c:2400| new logic - vf->queues_enabled = false
1608204997650430 |i40e_vc_process_vf_msg:3766| $v_opcode: VIRTCHNL_OP_DISABLE_QUEUES (9)
1608204997650434 Thu Dec 17 11:36:37 2020 |i40e_virtchnl_pf.c:2400| new logic - vf->queues_enabled = false
1608204997650438 Thu Dec 17 11:36:37 2020 |i40e_virtchnl_pf.c:2400| new logic - vf->queues_enabled = false
1608204997700875 |i40e_vc_process_vf_msg:3766| $v_opcode: VIRTCHNL_OP_DEL_ETH_ADDR (11)
1608204997751035 |i40e_vc_process_vf_msg:3766| $v_opcode: VIRTCHNL_OP_DEL_ETH_ADDR (11)
~~~

At the time of the test, we had 3 instances on the hypervisor: 2 vendor instances and 1 RHEL instance. We see VIRTCHNL_OP_DISABLE_QUEUES 4 times, which can only map to the 2 vendor instances (2 messages each). In other words, the RHEL kernel instance does not disable its queues when the line goes down, and neither does the KNI example application. Only the vendor application explicitly disables its queues upon a physical line down.

The DISABLE_QUEUES op is sent to the PF in (for iavf):
https://github.com/DPDK/dpdk/blob/f84d733cef13d15ad178535c5cb931851192bab0/drivers/net/iavf/iavf_vchnl.c#L553
~~~
int
iavf_disable_queues(struct iavf_adapter *adapter)
{
	struct iavf_info *vf = IAVF_DEV_PRIVATE_TO_VF(adapter);
	struct virtchnl_queue_select queue_select;
	struct iavf_cmd_info args;
	int err;

	memset(&queue_select, 0, sizeof(queue_select));
	queue_select.vsi_id = vf->vsi_res->vsi_id;

	queue_select.rx_queues = BIT(adapter->eth_dev->data->nb_rx_queues) - 1;
	queue_select.tx_queues = BIT(adapter->eth_dev->data->nb_tx_queues) - 1;

	args.ops = VIRTCHNL_OP_DISABLE_QUEUES;
	args.in_args = (u8 *)&queue_select;
	args.in_args_size = sizeof(queue_select);
	args.out_buffer = vf->aq_resp;
	args.out_size = IAVF_AQ_BUF_SZ;
	err = iavf_execute_vf_cmd(adapter, &args);
	if (err) {
		PMD_DRV_LOG(ERR,
			    "Failed to execute command of OP_DISABLE_QUEUES");
		return err;
	}
	return 0;
}
~~~

https://github.com/DPDK/dpdk/blob/f84d733cef13d15ad178535c5cb931851192bab0/drivers/net/iavf/iavf_rxtx.c#L887
~~~
void
iavf_stop_queues(struct rte_eth_dev *dev)
{
	struct iavf_adapter *adapter =
		IAVF_DEV_PRIVATE_TO_ADAPTER(dev->data->dev_private);
	struct iavf_info *vf = IAVF_DEV_PRIVATE_TO_VF(dev->data->dev_private);
	struct iavf_rx_queue *rxq;
	struct iavf_tx_queue *txq;
	int ret, i;

	/* Stop All queues */
	if (!vf->lv_enabled) {
		ret = iavf_disable_queues(adapter);
		if (ret)
			PMD_DRV_LOG(WARNING, "Fail to stop queues");
	} else {
		ret = iavf_disable_queues_lv(adapter);
		if (ret)
			PMD_DRV_LOG(WARNING, "Fail to stop queues for large VF");
	}
(...)
~~~


And for i40e_vf: https://github.com/DPDK/dpdk/blob/e8cff6142a2768983bb7950e5f2b0cc00dd59f33/drivers/net/i40e/i40e_ethdev_vf.c#L741:
~~~
static int
i40evf_switch_queue(struct rte_eth_dev *dev, bool isrx, uint16_t qid,
				bool on)
{
	struct i40e_vf *vf = I40EVF_DEV_PRIVATE_TO_VF(dev->data->dev_private);
	struct virtchnl_queue_select queue_select;
	int err;
	struct vf_cmd_info args;
	memset(&queue_select, 0, sizeof(queue_select));
	queue_select.vsi_id = vf->vsi_res->vsi_id;

	if (isrx)
		queue_select.rx_queues |= 1 << qid;
	else
		queue_select.tx_queues |= 1 << qid;

	if (on)
		args.ops = VIRTCHNL_OP_ENABLE_QUEUES;
	else
		args.ops = VIRTCHNL_OP_DISABLE_QUEUES;
	args.in_args = (u8 *)&queue_select;
	args.in_args_size = sizeof(queue_select);
	args.out_buffer = vf->aq_resp;
	args.out_size = I40E_AQ_BUF_SZ;
	err = i40evf_execute_vf_cmd(dev, &args);
	if (err)
		PMD_DRV_LOG(ERR, "fail to switch %s %u %s",
			    isrx ? "RX" : "TX", qid, on ? "on" : "off");

	return err;
}
~~~

https://github.com/DPDK/dpdk/blob/e8cff6142a2768983bb7950e5f2b0cc00dd59f33/drivers/net/i40e/i40e_ethdev_vf.c#L1790
~~~
static int
i40evf_dev_rx_queue_stop(struct rte_eth_dev *dev, uint16_t rx_queue_id)
{
	struct i40e_rx_queue *rxq;
	int err;

	rxq = dev->data->rx_queues[rx_queue_id];

	err = i40evf_switch_queue(dev, TRUE, rx_queue_id, FALSE);
	if (err) {
		PMD_DRV_LOG(ERR, "Failed to switch RX queue %u off",
			    rx_queue_id);
		return err;
	}

	i40e_rx_queue_release_mbufs(rxq);
	i40e_reset_rx_queue(rxq);
	dev->data->rx_queue_state[rx_queue_id] = RTE_ETH_QUEUE_STATE_STOPPED;

	return 0;
}
~~~

Either DPDK VF driver, iavf or i40e_vf, leads me to the conclusion that the vendor application closes the VF via the following chain:
i40evf_dev_rx_queue_stop(struct rte_eth_dev *dev, uint16_t rx_queue_id) -> i40evf_dev_stop(struct rte_eth_dev *dev) -> i40evf_dev_close(struct rte_eth_dev *dev) -> i40evf_uninit_vf(struct rte_eth_dev *dev) -> i40evf_dev_uninit(struct rte_eth_dev *eth_dev) -> eth_i40evf_pci_remove(struct rte_pci_device *pci_dev), which is registered here:
~~~
/*
 * virtual function driver struct
 */
static struct rte_pci_driver rte_i40evf_pmd = {
	.id_table = pci_id_i40evf_map,
	.drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC,
	.probe = eth_i40evf_pci_probe,
	.remove = eth_i40evf_pci_remove,
};
~~~

========================================================================================

With that knowledge, I can now reproduce the vendor application's behavior by hacking the KNI example application:
https://github.com/andreaskaris/kni-dpdk-bug-reproducer/compare/kni-default-working...kni-modified-reproducer

Here's my assumption:
When the vendor application catches a line down, it calls something equivalent to DPDK's rte_eth_dev_stop(port_id);
When the vendor application catches a line up, it calls something equivalent to DPDK's rte_eth_dev_start(port_id);
The vendor application does not set up the interface's queues again in between both steps, or does not do this correctly.

In https://github.com/andreaskaris/kni-dpdk-bug-reproducer/compare/kni-default-working...kni-modified-reproducer , I reproduce this by explicitly calling these 2 functions.
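
For illustration, here is a minimal sketch of the kind of change the reproducer branch makes to the KNI link monitor, under the assumption stated above; the function name and structure here are illustrative, and the actual diff is in the branch linked above:
~~~
#include <string.h>
#include <rte_ethdev.h>
#include <rte_cycles.h>

/* Illustrative sketch only: stop/start the whole port on every physical
 * link transition, mimicking the suspected vendor behavior. */
static void
toggle_port_on_link_change(uint16_t portid, volatile int *monitor_links)
{
	struct rte_eth_link link;
	int prev = ETH_LINK_UP;

	while (*monitor_links) {
		rte_delay_ms(500);
		memset(&link, 0, sizeof(link));
		rte_eth_link_get_nowait(portid, &link);

		if (link.link_status == ETH_LINK_DOWN && prev == ETH_LINK_UP)
			/* line down: rte_eth_dev_stop() -> VIRTCHNL_OP_DISABLE_QUEUES */
			rte_eth_dev_stop(portid);
		else if (link.link_status == ETH_LINK_UP && prev == ETH_LINK_DOWN)
			/* line up: restart the port without any further queue setup */
			rte_eth_dev_start(portid);

		prev = link.link_status;
	}
}
~~~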

rte_eth_dev_stop() will send VIRTCHNL_OP_DISABLE_QUEUES to the PF driver. Since the aforementioned Intel commit b144f013fc16a06d7a4b9a4be668a3583fafeda2, this immediately marks the VF queues as disabled on the PF side:
./drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
~~~
2399         /* Immediately mark queues as disabled */
2400         vf->queues_enabled = false;
~~~

That is actually the expected behavior anyway. Prior to the aforementioned Intel commit, the queues simply were never marked as disabled (even though, in my opinion, they should have been).
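
To tie this back to the stap probes above, here is a small self-contained model (an approximation for readability, not the actual kernel source) of the link-status decision that i40e_vc_notify_vf_link_state() makes per VF after the commit:
~~~
#include <stdbool.h>

/* Approximate model of the per-VF link reporting after commit b144f013fc16;
 * field and function names are simplified for illustration. */
struct vf_model {
	bool queues_enabled;	/* cleared by VIRTCHNL_OP_DISABLE_QUEUES */
	bool link_forced;	/* ip link set ... vf N state on/off */
	bool link_up;		/* forced value, used only if link_forced */
};

static bool
reported_vf_link_status(const struct vf_model *vf, bool pf_link_up)
{
	if (!vf->queues_enabled)
		return false;		/* "new logic, VF DOWN" (i40e_virtchnl_pf.c:61) */
	if (vf->link_forced)
		return vf->link_up;	/* "old logic, vf->link_up FORCED" (:64) */
	return pf_link_up;		/* "old logic", follow the PF link (:69) */
}
~~~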

========================================================================================

Now, let's test on a hypervisor with a kernel from before the aforementioned commit (3.10.0-1062.18.1.el7.x86_64), with the modified KNI application code inside the instance:
https://github.com/andreaskaris/kni-dpdk-bug-reproducer/blob/kni-modified-reproducer/main.c

Inside the instance, I compile with:
~~~
cd dpdk-stable-19.08.2/examples/kni/
vim main.c
export RTE_SDK=/root/dpdk-stable-19.08.2; export RTE_TARGET=build; make
~~~

~~~
[root@sriov-vm-2b826661 ~]# ./runkni.sh 
EAL: Detected 8 lcore(s)
EAL: Detected 1 NUMA nodes
EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
EAL: Selected IOVA mode 'PA'
EAL: Probing VFIO support...
EAL: VFIO support initialized
EAL: WARNING: cpu flags constant_tsc=yes nonstop_tsc=no -> using unreliable clock cycles !
EAL: PCI device 0000:00:05.0 on NUMA socket -1
EAL:   Invalid NUMA socket, default to 0
EAL:   probe driver: 8086:154c net_i40e_vf
EAL:   using IOMMU type 8 (No-IOMMU)
APP: Initialising port 0 ...

Checking link status
done
Port0 Link Up - speed 10000Mbps - full-duplex
APP: ========================
APP: KNI Running
APP: kill -SIGUSR1 3278
APP:     Show KNI Statistics.
APP: kill -SIGUSR2 3278
APP:     Zero KNI Statistics.
APP: ========================
APP: Lcore 5 is reading from port 0
APP: Lcore 6 is writing to port 0
APP: Lcore 7 has nothing to do
APP: Lcore 4 has nothing to do
APP: Configure network interface of 0 up
i40evf_add_del_all_mac_addr(): fail to execute command OP_DEL_ETHER_ADDRESS
APP: vEth0_0 NIC Link is Up 10000 Mbps (AutoNeg) Full Duplex.
APP: ###### REPRODUCER ######: Running rte_eth_dev_start(0)
Device with port_id=0 already started

Broadcast message from root@sriov-vm-2b826661 (pts/1) (Thu Dec 17 13:20:13 2020):

shutting down link
APP: vEth0_0 NIC Link is Down.
APP: ###### REPRODUCER ######: Running rte_eth_dev_stop(0)
i40evf_add_del_all_mac_addr(): fail to execute command OP_DEL_ETHER_ADDRESS


Broadcast message from root@sriov-vm-2b826661 (pts/1) (Thu Dec 17 13:20:48 2020):

unshut link
APP: vEth0_0 NIC Link is Up 10000 Mbps (AutoNeg) Full Duplex.
APP: ###### REPRODUCER ######: Running rte_eth_dev_start(0)

~~~

~~~
[root@sriov-vm-2b826661 ~]# ip a a 192.168.1.152/24 dev vEth0_0
[root@sriov-vm-2b826661 ~]# wall shutting down link
[root@sriov-vm-2b826661 ~]# 
Broadcast message from root@sriov-vm-2b826661 (pts/1) (Thu Dec 17 13:20:13 2020):

shutting down link

[root@sriov-vm-2b826661 ~]# ip a ls dev vEth0_0
8: vEth0_0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast state DOWN group default qlen 1000
    link/ether fa:16:3e:8c:a6:91 brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.152/24 scope global vEth0_0
       valid_lft forever preferred_lft forever
[root@sriov-vm-2b826661 ~]# wall unshut link

Broadcast message from root@sriov-vm-2b826661 (pts/1) (Thu Dec 17 13:20:48 2020):

unshut link
[root@sriov-vm-2b826661 ~]# ip a ls dev vEth0_0
8: vEth0_0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether fa:16:3e:8c:a6:91 brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.152/24 scope global vEth0_0
       valid_lft forever preferred_lft forever
~~~

Ping from the other instance to our instance where we shut down the physical line:
~~~
[root@sriov-vm-4b7b253e ~]# ping 192.168.1.152
PING 192.168.1.152 (192.168.1.152) 56(84) bytes of data.
64 bytes from 192.168.1.152: icmp_seq=1 ttl=64 time=1.24 ms
64 bytes from 192.168.1.152: icmp_seq=2 ttl=64 time=0.841 ms
64 bytes from 192.168.1.152: icmp_seq=3 ttl=64 time=0.467 ms
64 bytes from 192.168.1.152: icmp_seq=4 ttl=64 time=0.487 ms
64 bytes from 192.168.1.152: icmp_seq=5 ttl=64 time=0.490 ms
64 bytes from 192.168.1.152: icmp_seq=6 ttl=64 time=0.505 ms
64 bytes from 192.168.1.152: icmp_seq=7 ttl=64 time=0.507 ms
64 bytes from 192.168.1.152: icmp_seq=8 ttl=64 time=0.498 ms
64 bytes from 192.168.1.152: icmp_seq=9 ttl=64 time=0.512 ms
64 bytes from 192.168.1.152: icmp_seq=10 ttl=64 time=0.498 ms
64 bytes from 192.168.1.152: icmp_seq=11 ttl=64 time=0.516 ms
64 bytes from 192.168.1.152: icmp_seq=12 ttl=64 time=0.467 ms
64 bytes from 192.168.1.152: icmp_seq=13 ttl=64 time=0.483 ms
64 bytes from 192.168.1.152: icmp_seq=14 ttl=64 time=0.481 ms
From 192.168.1.156 icmp_seq=31 Destination Host Unreachable
From 192.168.1.156 icmp_seq=32 Destination Host Unreachable
From 192.168.1.156 icmp_seq=33 Destination Host Unreachable
From 192.168.1.156 icmp_seq=34 Destination Host Unreachable
From 192.168.1.156 icmp_seq=35 Destination Host Unreachable
From 192.168.1.156 icmp_seq=36 Destination Host Unreachable
From 192.168.1.156 icmp_seq=37 Destination Host Unreachable
From 192.168.1.156 icmp_seq=38 Destination Host Unreachable
From 192.168.1.156 icmp_seq=39 Destination Host Unreachable
From 192.168.1.156 icmp_seq=40 Destination Host Unreachable
From 192.168.1.156 icmp_seq=41 Destination Host Unreachable
From 192.168.1.156 icmp_seq=42 Destination Host Unreachable
From 192.168.1.156 icmp_seq=43 Destination Host Unreachable
From 192.168.1.156 icmp_seq=44 Destination Host Unreachable
From 192.168.1.156 icmp_seq=45 Destination Host Unreachable
From 192.168.1.156 icmp_seq=46 Destination Host Unreachable
From 192.168.1.156 icmp_seq=47 Destination Host Unreachable
From 192.168.1.156 icmp_seq=48 Destination Host Unreachable
From 192.168.1.156 icmp_seq=49 Destination Host Unreachable
From 192.168.1.156 icmp_seq=50 Destination Host Unreachable
64 bytes from 192.168.1.152: icmp_seq=51 ttl=64 time=0.440 ms
64 bytes from 192.168.1.152: icmp_seq=52 ttl=64 time=0.419 ms
64 bytes from 192.168.1.152: icmp_seq=53 ttl=64 time=0.452 ms
^C
--- 192.168.1.152 ping statistics ---
53 packets transmitted, 17 received, +20 errors, 67% packet loss, time 52010ms
rtt min/avg/max/mdev = 0.419/0.547/1.246/0.197 ms, pipe 4
[root@sriov-vm-4b7b253e ~]# 
~~~

Here's what I ran on the switch:
~~~
S4048-ON-sw(conf-if-te-1/42)#int te1/38
S4048-ON-sw(conf-if-te-1/38)#shut
S4048-ON-sw(conf-if-te-1/38)#no shut
~~~

And as you can see, this works on an older kernel.

========================================================================================

Now, we run the same test on a hypervisor with a later kernel:
~~~
[root@overcloud-computesriov-0 ~]# uname -r
3.10.0-1160.6.1.el7.x86_64
~~~

~~~
[root@sriov-vm-4b7b253e ~]# cd dpdk-stable-19.08.2/examples/kni/
[root@sriov-vm-4b7b253e kni]# vim main.c
[root@sriov-vm-4b7b253e kni]# export RTE_SDK=/root/dpdk-stable-19.08.2; export RTE_TARGET=build; make
[root@sriov-vm-4b7b253e kni]# cd /root
~~~

On the switch, I execute:
~~~
S4048-ON-sw(conf-if-te-1/38)#int te1/42
S4048-ON-sw(conf-if-te-1/42)#shut
S4048-ON-sw(conf-if-te-1/42)#no shut
S4048-ON-sw(conf-if-te-1/42)#
~~~

On the other instance, I run the same ping; note that it fails and never recovers:
~~~
[root@sriov-vm-2b826661 ~]# ping 192.168.1.156
PING 192.168.1.156 (192.168.1.156) 56(84) bytes of data.
64 bytes from 192.168.1.156: icmp_seq=1 ttl=64 time=1.06 ms
64 bytes from 192.168.1.156: icmp_seq=2 ttl=64 time=0.793 ms
64 bytes from 192.168.1.156: icmp_seq=3 ttl=64 time=0.858 ms
64 bytes from 192.168.1.156: icmp_seq=4 ttl=64 time=0.285 ms
64 bytes from 192.168.1.156: icmp_seq=5 ttl=64 time=0.554 ms
64 bytes from 192.168.1.156: icmp_seq=6 ttl=64 time=0.287 ms
64 bytes from 192.168.1.156: icmp_seq=7 ttl=64 time=0.327 ms
64 bytes from 192.168.1.156: icmp_seq=8 ttl=64 time=0.281 ms
64 bytes from 192.168.1.156: icmp_seq=9 ttl=64 time=0.544 ms
64 bytes from 192.168.1.156: icmp_seq=10 ttl=64 time=0.444 ms
From 192.168.1.152 icmp_seq=52 Destination Host Unreachable
From 192.168.1.152 icmp_seq=53 Destination Host Unreachable
From 192.168.1.152 icmp_seq=54 Destination Host Unreachable
From 192.168.1.152 icmp_seq=55 Destination Host Unreachable
From 192.168.1.152 icmp_seq=56 Destination Host Unreachable
From 192.168.1.152 icmp_seq=57 Destination Host Unreachable
From 192.168.1.152 icmp_seq=58 Destination Host Unreachable
From 192.168.1.152 icmp_seq=59 Destination Host Unreachable
From 192.168.1.152 icmp_seq=60 Destination Host Unreachable
From 192.168.1.152 icmp_seq=61 Destination Host Unreachable
From 192.168.1.152 icmp_seq=62 Destination Host Unreachable
From 192.168.1.152 icmp_seq=63 Destination Host Unreachable
From 192.168.1.152 icmp_seq=64 Destination Host Unreachable
From 192.168.1.152 icmp_seq=65 Destination Host Unreachable
From 192.168.1.152 icmp_seq=66 Destination Host Unreachable
From 192.168.1.152 icmp_seq=67 Destination Host Unreachable
From 192.168.1.152 icmp_seq=68 Destination Host Unreachable
From 192.168.1.152 icmp_seq=69 Destination Host Unreachable
From 192.168.1.152 icmp_seq=70 Destination Host Unreachable
From 192.168.1.152 icmp_seq=71 Destination Host Unreachable
From 192.168.1.152 icmp_seq=72 Destination Host Unreachable
From 192.168.1.152 icmp_seq=73 Destination Host Unreachable
From 192.168.1.152 icmp_seq=74 Destination Host Unreachable
From 192.168.1.152 icmp_seq=75 Destination Host Unreachable
From 192.168.1.152 icmp_seq=76 Destination Host Unreachable
From 192.168.1.152 icmp_seq=77 Destination Host Unreachable
From 192.168.1.152 icmp_seq=78 Destination Host Unreachable
From 192.168.1.152 icmp_seq=79 Destination Host Unreachable
^C
--- 192.168.1.156 ping statistics ---
81 packets transmitted, 10 received, +28 errors, 87% packet loss, time 80060ms
rtt min/avg/max/mdev = 0.281/0.543/1.063/0.263 ms, pipe 4
[root@sriov-vm-2b826661 ~]# 
~~~

~~~
[root@sriov-vm-4b7b253e ~]# ./runkni.sh 
EAL: Detected 8 lcore(s)
EAL: Detected 1 NUMA nodes
EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
EAL: Selected IOVA mode 'PA'
EAL: Probing VFIO support...
EAL: VFIO support initialized
EAL: WARNING: cpu flags constant_tsc=yes nonstop_tsc=no -> using unreliable clock cycles !
EAL: PCI device 0000:00:05.0 on NUMA socket -1
EAL:   Invalid NUMA socket, default to 0
EAL:   probe driver: 8086:154c net_i40e_vf
EAL:   using IOMMU type 8 (No-IOMMU)
APP: Initialising port 0 ...

Checking link status
done
Port0 Link Up - speed 10000Mbps - full-duplex
APP: ========================
APP: KNI Running
APP: kill -SIGUSR1 3718
APP:     Show KNI Statistics.
APP: kill -SIGUSR2 3718
APP:     Zero KNI Statistics.
APP: ========================
APP: Lcore 5 is reading from port 0
APP: Lcore 6 is writing to port 0
APP: Lcore 4 has nothing to do
APP: Lcore 7 has nothing to do
APP: Configure network interface of 0 up
i40evf_add_del_all_mac_addr(): fail to execute command OP_DEL_ETHER_ADDRESS
APP: vEth0_0 NIC Link is Up 10000 Mbps (AutoNeg) Full Duplex.
APP: ###### REPRODUCER ######: Running rte_eth_dev_start(0)
Device with port_id=0 already started

Broadcast message from root@sriov-vm-4b7b253e (pts/3) (Thu Dec 17 13:26:12 2020):

shutting down link
APP: vEth0_0 NIC Link is Down.
APP: ###### REPRODUCER ######: Running rte_eth_dev_stop(0)
i40evf_add_del_all_mac_addr(): fail to execute command OP_DEL_ETHER_ADDRESS

Broadcast message from root@sriov-vm-4b7b253e (pts/3) (Thu Dec 17 13:26:28 2020):

no shutdown link
~~~

~~~
=========
1608229757859994 |i40e_virtchnl_pf.c:57| $vfarr[i]->pf->pdev: bus PCI Bus 0000:18, slot 0, func 0
1608229757859999 |i40e_virtchnl_pf.c:57| $vfarr[i]->vf_id: 0
1608229757860001 |i40e_virtchnl_pf.c:57| $vfarr[i]->trusted: 0
1608229757860002 |i40e_virtchnl_pf.c:57| $vfarr[i]->link_forced: 0
1608229757860003 |i40e_virtchnl_pf.c:57| $vfarr[i]->link_up: 0
1608229757860005 |i40e_virtchnl_pf.c:57| $vfarr[i]->queues_enabled: 1
=========
1608229757860010 |i40e_virtchnl_pf.c:69| --> Inside old logic, pfe.event_data.link_event.link_status = ...
=========
1608229757860066 |i40e_virtchnl_pf.c:57| $vfarr[i]->pf->pdev: bus PCI Bus 0000:18, slot 0, func 0
1608229757860068 |i40e_virtchnl_pf.c:57| $vfarr[i]->vf_id: 1
1608229757860069 |i40e_virtchnl_pf.c:57| $vfarr[i]->trusted: 0
1608229757860070 |i40e_virtchnl_pf.c:57| $vfarr[i]->link_forced: 0
1608229757860071 |i40e_virtchnl_pf.c:57| $vfarr[i]->link_up: 0
1608229757860072 |i40e_virtchnl_pf.c:57| $vfarr[i]->queues_enabled: 1
=========
1608229757860073 |i40e_virtchnl_pf.c:69| --> Inside old logic, pfe.event_data.link_event.link_status = ...
=========
1608229757860129 |i40e_virtchnl_pf.c:57| $vfarr[i]->pf->pdev: bus PCI Bus 0000:18, slot 0, func 0
1608229757860130 |i40e_virtchnl_pf.c:57| $vfarr[i]->vf_id: 2
1608229757860131 |i40e_virtchnl_pf.c:57| $vfarr[i]->trusted: 0
1608229757860132 |i40e_virtchnl_pf.c:57| $vfarr[i]->link_forced: 0
1608229757860133 |i40e_virtchnl_pf.c:57| $vfarr[i]->link_up: 0
1608229757860134 |i40e_virtchnl_pf.c:57| $vfarr[i]->queues_enabled: 1
=========
1608229757860136 |i40e_virtchnl_pf.c:69| --> Inside old logic, pfe.event_data.link_event.link_status = ...
=========
1608229757860191 |i40e_virtchnl_pf.c:57| $vfarr[i]->pf->pdev: bus PCI Bus 0000:18, slot 0, func 0
1608229757860193 |i40e_virtchnl_pf.c:57| $vfarr[i]->vf_id: 3
1608229757860194 |i40e_virtchnl_pf.c:57| $vfarr[i]->trusted: 0
1608229757860194 |i40e_virtchnl_pf.c:57| $vfarr[i]->link_forced: 0
1608229757860195 |i40e_virtchnl_pf.c:57| $vfarr[i]->link_up: 0
1608229757860196 |i40e_virtchnl_pf.c:57| $vfarr[i]->queues_enabled: 1
=========
1608229757860198 |i40e_virtchnl_pf.c:69| --> Inside old logic, pfe.event_data.link_event.link_status = ...
=========
1608229757860253 |i40e_virtchnl_pf.c:57| $vfarr[i]->pf->pdev: bus PCI Bus 0000:18, slot 0, func 0
1608229757860255 |i40e_virtchnl_pf.c:57| $vfarr[i]->vf_id: 4
1608229757860256 |i40e_virtchnl_pf.c:57| $vfarr[i]->trusted: 0
1608229757860257 |i40e_virtchnl_pf.c:57| $vfarr[i]->link_forced: 0
1608229757860257 |i40e_virtchnl_pf.c:57| $vfarr[i]->link_up: 0
1608229757860258 |i40e_virtchnl_pf.c:57| $vfarr[i]->queues_enabled: 1
=========
1608229757860260 |i40e_virtchnl_pf.c:69| --> Inside old logic, pfe.event_data.link_event.link_status = ...
1608229758310352 |i40e_vc_process_vf_msg:3766| $v_opcode: VIRTCHNL_OP_DISABLE_QUEUES (9)
1608229758310358 |i40e_virtchnl_pf.c:2400| new logic - vf->queues_enabled = false
1608229758310361 |i40e_virtchnl_pf.c:2400| new logic - vf->queues_enabled = false
1608229758320356 |i40e_vc_process_vf_msg:3766| $v_opcode: VIRTCHNL_OP_DISABLE_QUEUES (9)
1608229758320358 |i40e_virtchnl_pf.c:2400| new logic - vf->queues_enabled = false
1608229758320359 |i40e_virtchnl_pf.c:2400| new logic - vf->queues_enabled = false
1608229758370370 |i40e_vc_process_vf_msg:3766| $v_opcode: VIRTCHNL_OP_DEL_ETH_ADDR (11)
=========
1608229765398162 |i40e_virtchnl_pf.c:57| $vfarr[i]->pf->pdev: bus PCI Bus 0000:18, slot 0, func 0
1608229765398167 |i40e_virtchnl_pf.c:57| $vfarr[i]->vf_id: 0
1608229765398168 |i40e_virtchnl_pf.c:57| $vfarr[i]->trusted: 0
1608229765398170 |i40e_virtchnl_pf.c:57| $vfarr[i]->link_forced: 0
1608229765398171 |i40e_virtchnl_pf.c:57| $vfarr[i]->link_up: 0
1608229765398173 |i40e_virtchnl_pf.c:57| $vfarr[i]->queues_enabled: 1
=========
1608229765398178 |i40e_virtchnl_pf.c:69| --> Inside old logic, pfe.event_data.link_event.link_status = ...
=========
1608229765398238 |i40e_virtchnl_pf.c:57| $vfarr[i]->pf->pdev: bus PCI Bus 0000:18, slot 0, func 0
1608229765398240 |i40e_virtchnl_pf.c:57| $vfarr[i]->vf_id: 1
1608229765398241 |i40e_virtchnl_pf.c:57| $vfarr[i]->trusted: 0
1608229765398242 |i40e_virtchnl_pf.c:57| $vfarr[i]->link_forced: 0
1608229765398243 |i40e_virtchnl_pf.c:57| $vfarr[i]->link_up: 0
1608229765398244 |i40e_virtchnl_pf.c:57| $vfarr[i]->queues_enabled: 1
=========
1608229765398246 |i40e_virtchnl_pf.c:69| --> Inside old logic, pfe.event_data.link_event.link_status = ...
=========
1608229765398301 |i40e_virtchnl_pf.c:57| $vfarr[i]->pf->pdev: bus PCI Bus 0000:18, slot 0, func 0
1608229765398303 |i40e_virtchnl_pf.c:57| $vfarr[i]->vf_id: 2
1608229765398304 |i40e_virtchnl_pf.c:57| $vfarr[i]->trusted: 0
1608229765398305 |i40e_virtchnl_pf.c:57| $vfarr[i]->link_forced: 0
1608229765398305 |i40e_virtchnl_pf.c:57| $vfarr[i]->link_up: 0
1608229765398306 |i40e_virtchnl_pf.c:57| $vfarr[i]->queues_enabled: 1
=========
1608229765398308 |i40e_virtchnl_pf.c:69| --> Inside old logic, pfe.event_data.link_event.link_status = ...
=========
1608229765398364 |i40e_virtchnl_pf.c:57| $vfarr[i]->pf->pdev: bus PCI Bus 0000:18, slot 0, func 0
1608229765398365 |i40e_virtchnl_pf.c:57| $vfarr[i]->vf_id: 3
1608229765398366 |i40e_virtchnl_pf.c:57| $vfarr[i]->trusted: 0
1608229765398367 |i40e_virtchnl_pf.c:57| $vfarr[i]->link_forced: 0
1608229765398368 |i40e_virtchnl_pf.c:57| $vfarr[i]->link_up: 0
1608229765398369 |i40e_virtchnl_pf.c:57| $vfarr[i]->queues_enabled: 1
=========
1608229765398370 |i40e_virtchnl_pf.c:69| --> Inside old logic, pfe.event_data.link_event.link_status = ...
=========
1608229765398426 |i40e_virtchnl_pf.c:57| $vfarr[i]->pf->pdev: bus PCI Bus 0000:18, slot 0, func 0
1608229765398427 |i40e_virtchnl_pf.c:57| $vfarr[i]->vf_id: 4
1608229765398428 |i40e_virtchnl_pf.c:57| $vfarr[i]->trusted: 0
1608229765398429 |i40e_virtchnl_pf.c:57| $vfarr[i]->link_forced: 0
1608229765398430 |i40e_virtchnl_pf.c:57| $vfarr[i]->link_up: 0
1608229765398431 |i40e_virtchnl_pf.c:57| $vfarr[i]->queues_enabled: 0
=========
1608229765398434 |i40e_virtchnl_pf.c:61| --> Inside new logic, VF DOWN
~~~

By the way, in the case of this modified KNI app, it's possible to recover with the following, as this will bring the port through a full init process again:
~~~
[root@sriov-vm-4b7b253e ~]# ip link set dev vEth0_0 down
[root@sriov-vm-4b7b253e ~]# ip link set dev vEth0_0 up
~~~

Comment 39 Andreas Karis 2020-12-17 19:29:56 UTC
As a conclusion, I strongly believe that the issue is due to a combination of 2 factors:

a) Upon line down, the vendor application explicitly disables the VF queues, most likely as the result of an explicit rte_eth_dev_stop or something similar. It then does not enable the queues again before restarting the device on line up with rte_eth_dev_start, even though it should. The vendor application has two options: either do not stop the device or tear down its queues at all upon line down (just like the RHEL kernel VF driver and the unmodified KNI DPDK example application), or, if it does explicitly send VIRTCHNL_OP_DISABLE_QUEUES to the kernel PF driver, re-enable the queues when the line comes back up (see the sketch below).
b) With commit b144f013fc16a06d7a4b9a4be668a3583fafeda2, the PF driver's behavior changed: whereas it tolerated a) before this commit, it no longer does.
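
For completeness, here is a hedged sketch of the second option in a): if the application does stop the port (and thus sends VIRTCHNL_OP_DISABLE_QUEUES) on line down, it has to take the port through configure/queue setup/start again on line up so that the PMD sends VIRTCHNL_OP_ENABLE_QUEUES. Queue counts, descriptor counts and the function name are illustrative assumptions, not vendor code:
~~~
#include <rte_ethdev.h>
#include <rte_mempool.h>

/* Illustrative recovery path after a line-up event, assuming the port was
 * previously stopped on line down. */
static int
reenable_vf_queues_on_link_up(uint16_t port_id, uint16_t nb_rxq, uint16_t nb_txq,
			      const struct rte_eth_conf *port_conf,
			      struct rte_mempool *mbuf_pool)
{
	uint16_t q;
	int ret;

	ret = rte_eth_dev_configure(port_id, nb_rxq, nb_txq, port_conf);
	if (ret < 0)
		return ret;

	for (q = 0; q < nb_rxq; q++) {
		ret = rte_eth_rx_queue_setup(port_id, q, 512,
					     rte_eth_dev_socket_id(port_id),
					     NULL, mbuf_pool);
		if (ret < 0)
			return ret;
	}
	for (q = 0; q < nb_txq; q++) {
		ret = rte_eth_tx_queue_setup(port_id, q, 512,
					     rte_eth_dev_socket_id(port_id), NULL);
		if (ret < 0)
			return ret;
	}

	/* This makes the PMD send VIRTCHNL_OP_ENABLE_QUEUES to the PF,
	 * clearing vf->queues_enabled = false on kernels that carry
	 * commit b144f013fc16. */
	return rte_eth_dev_start(port_id);
}
~~~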

Comment 40 Andreas Karis 2020-12-21 11:09:16 UTC
Just for confirmation and completeness.

I ran one last test with the Intel OOT drivers, combined with my purposefully misbehaving DPDK KNI test application:

a) curl -O https://downloadmirror.intel.com/29945/eng/i40e-2.12.6.tar.gz
b) curl -O https://downloadmirror.intel.com/24411/eng/i40e-2.13.10.tar.gz

a) Reproduces the issue:

~~~
[root@sriov-vm-4b7b253e ~]# ./runkni.sh  &
[1] 1923
[root@sriov-vm-4b7b253e ~]# rmmod: ERROR: Module rte_kni is not currently loaded
EAL: Detected 8 lcore(s)
EAL: Detected 1 NUMA nodes
EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
EAL: Selected IOVA mode 'PA'
EAL: Probing VFIO support...
EAL: VFIO support initialized
EAL: WARNING: cpu flags constant_tsc=yes nonstop_tsc=no -> using unreliable clock cycles !
EAL: PCI device 0000:00:05.0 on NUMA socket -1
EAL:   Invalid NUMA socket, default to 0
EAL:   probe driver: 8086:154c net_i40e_vf
EAL:   using IOMMU type 8 (No-IOMMU)
APP: Initialising port 0 ...

Checking link status
done
Port0 Link Up - speed 10000Mbps - full-duplex
APP: ========================
APP: KNI Running
APP: kill -SIGUSR1 1927
APP:     Show KNI Statistics.
APP: kill -SIGUSR2 1927
APP:     Zero KNI Statistics.
APP: ========================
APP: Lcore 5 is reading from port 0
APP: Lcore 6 is writing to port 0
APP: Lcore 4 has nothing to do
APP: Lcore 7 has nothing to do
APP: Configure network interface of 0 up
APP: vEth0_0 NIC Link is Up 10000 Mbps (AutoNeg) Full Duplex.
APP: ###### REPRODUCER ######: Running rte_eth_dev_start(0)
Device with port_id=0 already started

[root@sriov-vm-4b7b253e ~]# ip a a 192.168.1.156/24 dev vEth0_0
[root@sriov-vm-4b7b253e ~]# ping 192.168.1.152
PING 192.168.1.152 (192.168.1.152) 56(84) bytes of data.
64 bytes from 192.168.1.152: icmp_seq=1 ttl=64 time=1.12 ms
64 bytes from 192.168.1.152: icmp_seq=2 ttl=64 time=0.876 ms
(...)
64 bytes from 192.168.1.152: icmp_seq=61 ttl=64 time=0.974 ms
64 bytes from 192.168.1.152: icmp_seq=62 ttl=64 time=0.930 ms
APP: vEth0_0 NIC Link is Down.
APP: ###### REPRODUCER ######: Running rte_eth_dev_stop(0)         # <---- line down


From 192.168.1.156 icmp_seq=63 Destination Host Unreachable
From 192.168.1.156 icmp_seq=64 Destination Host Unreachable
(...)
(...)                                                              # <--- line up
(...)
From 192.168.1.156 icmp_seq=102 Destination Host Unreachable
From 192.168.1.156 icmp_seq=103 Destination Host Unreachable
From 192.168.1.156 icmp_seq=104 Destination Host Unreachable
From 192.168.1.156 icmp_seq=105 Destination Host Unreachable
From 192.168.1.156 icmp_seq=106 Destination Host Unreachable
(...   ...)                                                           
From 192.168.1.156 icmp_seq=144 Destination Host Unreachable 
From 192.168.1.156 icmp_seq=145 Destination Host Unreachable
From 192.168.1.156 icmp_seq=146 Destination Host Unreachable
From 192.168.1.156 icmp_seq=147 Destination Host Unreachable


^C
--- 192.168.1.152 ping statistics ---
148 packets transmitted, 62 received, +48 errors, 58% packet loss, time 147085ms
rtt min/avg/max/mdev = 0.359/0.935/1.121/0.084 ms, pipe 4
[root@sriov-vm-4b7b253e ~]# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
4: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc pfifo_fast state UP group default qlen 1000
    link/ether fa:16:3e:d4:e8:75 brd ff:ff:ff:ff:ff:ff
    inet 192.168.0.134/24 brd 192.168.0.255 scope global noprefixroute dynamic eth0
       valid_lft 82057sec preferred_lft 82057sec
    inet6 fe80::f816:3eff:fed4:e875/64 scope link 
       valid_lft forever preferred_lft forever
5: vEth0_0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast state DOWN group default qlen 1000    # <-------------- this NO-CARRIER here is interesting
    link/ether fa:16:3e:31:b8:8e brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.156/24 scope global vEth0_0
       valid_lft forever preferred_lft forever
[root@sriov-vm-4b7b253e ~]# ip link set dev vEth0_0 down
APP: Configure network interface of 0 down
Device with port_id=0 already stopped
[root@sriov-vm-4b7b253e ~]# ip link set dev vEth0_0 up
APP: Configure network interface of 0 up
Device with port_id=0 already stopped
[root@sriov-vm-4b7b253e ~]# APP: vEth0_0 NIC Link is Up 10000 Mbps (AutoNeg) Full Duplex.
APP: ###### REPRODUCER ######: Running rte_eth_dev_start(0)
Device with port_id=0 already started

[root@sriov-vm-4b7b253e ~]# ping 192.168.1.152
PING 192.168.1.152 (192.168.1.152) 56(84) bytes of data.
64 bytes from 192.168.1.152: icmp_seq=1 ttl=64 time=0.726 ms
64 bytes from 192.168.1.152: icmp_seq=2 ttl=64 time=0.983 ms
64 bytes from 192.168.1.152: icmp_seq=3 ttl=64 time=0.833 ms
~~~

b) Does not reproduce the issue:
~~~
[root@sriov-vm-2b826661 ~]# ./runkni.sh  &
[1] 1929
[root@sriov-vm-2b826661 ~]# EAL: Detected 8 lcore(s)
EAL: Detected 1 NUMA nodes
EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
EAL: Selected IOVA mode 'PA'
EAL: Probing VFIO support...
EAL: VFIO support initialized
EAL: WARNING: cpu flags constant_tsc=yes nonstop_tsc=no -> using unreliable clock cycles !
EAL: PCI device 0000:00:05.0 on NUMA socket -1
EAL:   Invalid NUMA socket, default to 0
EAL:   probe driver: 8086:154c net_i40e_vf
EAL:   using IOMMU type 8 (No-IOMMU)
APP: Initialising port 0 ...

Checking link status
done
Port0 Link Up - speed 10000Mbps - full-duplex
APP: ========================
APP: KNI Running
APP: kill -SIGUSR1 1933
APP:     Show KNI Statistics.
APP: kill -SIGUSR2 1933
APP:     Zero KNI Statistics.
APP: ========================
APP: Lcore 5 is reading from port 0
APP: Lcore 6 is writing to port 0
APP: Lcore 7 has nothing to do
APP: Lcore 4 has nothing to do
APP: Configure network interface of 0 up
APP: vEth0_0 NIC Link is Up 10000 Mbps (AutoNeg) Full Duplex.
APP: ###### REPRODUCER ######: Running rte_eth_dev_start(0)
Device with port_id=0 already started

[root@sriov-vm-2b826661 ~]# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc pfifo_fast state UP group default qlen 1000
    link/ether fa:16:3e:c2:4b:aa brd ff:ff:ff:ff:ff:ff
    inet 192.168.0.36/24 brd 192.168.0.255 scope global noprefixroute dynamic eth0
       valid_lft 85873sec preferred_lft 85873sec
    inet6 fe80::f816:3eff:fec2:4baa/64 scope link 
       valid_lft forever preferred_lft forever
5: vEth0_0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether fa:16:3e:8c:a6:91 brd ff:ff:ff:ff:ff:ff
[root@sriov-vm-2b826661 ~]# ip a a 192.168.1.152/24 dev vEth0_0
[root@sriov-vm-2b826661 ~]# ping 192.168.1.156
PING 192.168.1.156 (192.168.1.156) 56(84) bytes of data.
64 bytes from 192.168.1.156: icmp_seq=1 ttl=64 time=2.03 ms
(...)
64 bytes from 192.168.1.156: icmp_seq=184 ttl=64 time=1.66 ms
64 bytes from 192.168.1.156: icmp_seq=185 ttl=64 time=0.769 ms
64 bytes from 192.168.1.156: icmp_seq=186 ttl=64 time=1.67 ms
64 bytes from 192.168.1.156: icmp_seq=187 ttl=64 time=0.933 ms
64 bytes from 192.168.1.156: icmp_seq=188 ttl=64 time=1.34 ms
64 bytes from 192.168.1.156: icmp_seq=189 ttl=64 time=0.848 ms
64 bytes from 192.168.1.156: icmp_seq=190 ttl=64 time=1.11 ms
64 bytes from 192.168.1.156: icmp_seq=191 ttl=64 time=1.07 ms
APP: vEth0_0 NIC Link is Down.
APP: ###### REPRODUCER ######: Running rte_eth_dev_stop(0)              # <--- line down
APP: vEth0_0 NIC Link is Up 10000 Mbps (AutoNeg) Full Duplex.    
APP: ###### REPRODUCER ######: Running rte_eth_dev_start(0)             # <--- line up
64 bytes from 192.168.1.156: icmp_seq=205 ttl=64 time=424 ms
64 bytes from 192.168.1.156: icmp_seq=206 ttl=64 time=1.46 ms
64 bytes from 192.168.1.156: icmp_seq=207 ttl=64 time=1.05 ms
64 bytes from 192.168.1.156: icmp_seq=208 ttl=64 time=1.21 ms
64 bytes from 192.168.1.156: icmp_seq=209 ttl=64 time=1.05 ms
^C
--- 192.168.1.156 ping statistics ---
209 packets transmitted, 93 received, +80 errors, 55% packet loss, time 208294ms
rtt min/avg/max/mdev = 0.714/16.530/1003.663/111.799 ms, pipe 4
(reverse-i-search)`ip link ': ^C link set dev vEth0_0 up
[root@sriov-vm-2b826661 ~]# ip link
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc pfifo_fast state UP mode DEFAULT group default qlen 1000
    link/ether fa:16:3e:c2:4b:aa brd ff:ff:ff:ff:ff:ff
5: vEth0_0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP mode DEFAULT group default qlen 1000
    link/ether fa:16:3e:8c:a6:91 brd ff:ff:ff:ff:ff:ff
~~~

Comment 41 Andreas Karis 2020-12-21 11:11:50 UTC
~~~
[root@overcloud-computesriov-0 ~]# rpm -qa | grep i40e
i40e-2.12.6-1.x86_64

[root@overcloud-computesriov-1 ~]# rpm -qa | grep i40e
i40e-2.13.10-1.x86_64
~~~

Comment 42 Andreas Karis 2020-12-21 15:16:05 UTC
Hi,

After talking to the vendor today, here is the actual OOT fix, taken from a diff of the i40e 2.12.6-1 and 2.12.6.3-1 source code.

~~~
curl -O https://downloadmirror.intel.com/29945/eng/i40e-2.12.6.tar.gz
tar -xf i40e-2.12.6.tar.gz
~~~

~~~
# download from https://sourceforge.net/projects/e1000/files/unsupported/i40e%20unsupported/i40e-2.12.6.3/
tar -xf i40e-2.12.6.3.tar.gz
~~~

------------------------------------------------------------------

Changelog:
~~~
Changelog for i40e-2.12.6.3 /*the diffrerence from i40e-2.12.6*/
===========================================================================
Fix for link-flapping
~~~

SRPM spec file:
~~~
[root@overcloud-computesriov-0 ~]# head -n4 i40e-2.12.6/i40e.spec
Name: i40e
Summary: Intel(R) 40-10 Gigabit Ethernet Connection Network Driver
Version: 2.12.6
Release: 1
[root@overcloud-computesriov-0 ~]# head -n4 i40e-2.12.6.3/i40e.spec
Name: i40e
Summary: Intel(R) 40-10 Gigabit Ethernet Connection Network Driver
Version: 2.12.6.3
Release: 1
~~~

~~~
[root@overcloud-computesriov-0 ~]# diff -ruN i40e-2.12.6/src/ i40e-2.12.6.3/src/
diff -ruN i40e-2.12.6/src/i40e_main.c i40e-2.12.6.3/src/i40e_main.c
--- i40e-2.12.6/src/i40e_main.c	2020-07-01 12:31:56.821199294 +0000
+++ i40e-2.12.6.3/src/i40e_main.c	2020-10-12 18:12:04.300097712 +0000
@@ -39,12 +39,12 @@
 #define DRV_VERSION_LOCAL
 #endif /* DRV_VERSION_LOCAL */
 
-#define DRV_VERSION_DESC ""
+#define DRV_VERSION_DESC ".3"
 
 #define DRV_VERSION_MAJOR 2
 #define DRV_VERSION_MINOR 12
 #define DRV_VERSION_BUILD 6
-#define DRV_VERSION_SUBBUILD 0
+#define DRV_VERSION_SUBBUILD 3
 #define DRV_VERSION __stringify(DRV_VERSION_MAJOR) "." \
 	__stringify(DRV_VERSION_MINOR) "." \
 	__stringify(DRV_VERSION_BUILD) \
diff -ruN i40e-2.12.6/src/i40e_virtchnl_pf.c i40e-2.12.6.3/src/i40e_virtchnl_pf.c
--- i40e-2.12.6/src/i40e_virtchnl_pf.c	2020-07-01 12:31:57.169199290 +0000
+++ i40e-2.12.6.3/src/i40e_virtchnl_pf.c	2020-10-12 18:12:04.656097699 +0000
@@ -73,6 +73,35 @@
 }
 
 /**
+ * i40e_set_vf_link_state
+ * @vf: pointer to the VF structure
+ * @pfe: pointer to PF event structure
+ * @ls: pointer to link status structure
+ *
+ * set a link state on a single vf
+ **/
+static void i40e_set_vf_link_state(struct i40e_vf *vf,
+				   struct virtchnl_pf_event *pfe, struct i40e_link_status *ls)
+{
+	u8 link_status = ls->link_info & I40E_AQ_LINK_UP;
+
+#ifdef HAVE_NDO_SET_VF_LINK_STATE
+	if (vf->link_forced)
+		link_status = vf->link_up;
+#endif
+
+	if (vf->driver_caps & VIRTCHNL_VF_CAP_ADV_LINK_SPEED) {
+		pfe->event_data.link_event_adv.link_speed = link_status ?
+			i40e_vc_link_speed2mbps(ls->link_speed) : 0;
+		pfe->event_data.link_event_adv.link_status = link_status;
+	} else {
+		pfe->event_data.link_event.link_speed = link_status ?
+			i40e_virtchnl_link_speed(ls->link_speed) : 0;
+		pfe->event_data.link_event.link_status = link_status;
+	}
+}
+
+/**
  * i40e_vc_notify_vf_link_state
  * @vf: pointer to the VF structure
  *
@@ -89,60 +118,7 @@
 	pfe.event = VIRTCHNL_EVENT_LINK_CHANGE;
 	pfe.severity = PF_EVENT_SEVERITY_INFO;
 
-#ifdef VIRTCHNL_VF_CAP_ADV_LINK_SPEED
-	if (vf->driver_caps & VIRTCHNL_VF_CAP_ADV_LINK_SPEED) {
-		/* Always report link is down if the VF queues aren't enabled */
-		if (!vf->queues_enabled) {
-			pfe.event_data.link_event_adv.link_status = false;
-			pfe.event_data.link_event_adv.link_speed = 0;
-#ifdef HAVE_NDO_SET_VF_LINK_STATE
-		} else if (vf->link_forced) {
-			pfe.event_data.link_event_adv.link_status = vf->link_up;
-			pfe.event_data.link_event_adv.link_speed = vf->link_up ?
-				i40e_vc_link_speed2mbps(ls->link_speed) : 0;
-#endif
-		} else {
-			pfe.event_data.link_event_adv.link_status =
-				ls->link_info & I40E_AQ_LINK_UP;
-			pfe.event_data.link_event_adv.link_speed =
-				i40e_vc_link_speed2mbps(ls->link_speed);
-		}
-	} else {
-		/* Always report link is down if the VF queues aren't enabled */
-		if (!vf->queues_enabled) {
-			pfe.event_data.link_event.link_status = false;
-			pfe.event_data.link_event.link_speed = 0;
-#ifdef HAVE_NDO_SET_VF_LINK_STATE
-		} else if (vf->link_forced) {
-			pfe.event_data.link_event.link_status = vf->link_up;
-			pfe.event_data.link_event.link_speed = (vf->link_up ?
-				i40e_virtchnl_link_speed(ls->link_speed) : 0);
-#endif
-		} else {
-			pfe.event_data.link_event.link_status =
-				ls->link_info & I40E_AQ_LINK_UP;
-			pfe.event_data.link_event.link_speed =
-				i40e_virtchnl_link_speed(ls->link_speed);
-		}
-	}
-#else /* VIRTCHNL_VF_CAP_ADV_LINK_SPEED */
-	/* Always report link is down if the VF queues aren't enabled */
-	if (!vf->queues_enabled) {
-		pfe.event_data.link_event.link_status = false;
-		pfe.event_data.link_event.link_speed = 0;
-#ifdef HAVE_NDO_SET_VF_LINK_STATE
-	} else if (vf->link_forced) {
-		pfe.event_data.link_event.link_status = vf->link_up;
-		pfe.event_data.link_event.link_speed = (vf->link_up ?
-			i40e_virtchnl_link_speed(ls->link_speed) : 0);
-#endif
-	} else {
-		pfe.event_data.link_event.link_status =
-			ls->link_info & I40E_AQ_LINK_UP;
-		pfe.event_data.link_event.link_speed =
-			i40e_virtchnl_link_speed(ls->link_speed);
-	}
-#endif /* VIRTCHNL_VF_CAP_ADV_LINK_SPEED */
+	i40e_set_vf_link_state(vf, &pfe, ls);
 
 	i40e_aq_send_msg_to_vf(hw, abs_vf_id, VIRTCHNL_OP_EVENT,
 			       I40E_SUCCESS, (u8 *)&pfe, sizeof(pfe), NULL);
@@ -991,62 +967,22 @@
 	switch (link) {
 	case VFD_LINKSTATE_AUTO:
 		vf->link_forced = false;
-#ifdef VIRTCHNL_VF_CAP_ADV_LINK_SPEED
-		pfe.event_data.link_event_adv.link_status =
-			ls->link_info & I40E_AQ_LINK_UP;
-		pfe.event_data.link_event_adv.link_speed =
-			i40e_vc_link_speed2mbps(ls->link_speed);
-#else /* VIRTCHNL_VF_CAP_ADV_LINK_SPEED */
-		pfe.event_data.link_event.link_status =
-			ls->link_info & I40E_AQ_LINK_UP;
-		pfe.event_data.link_event.link_speed =
-			i40e_virtchnl_link_speed(ls->link_speed);
-#endif /* VIRTCHNL_VF_CAP_ADV_LINK_SPEED */
+		i40e_set_vf_link_state(vf, &pfe, ls);
 		break;
 	case VFD_LINKSTATE_ON:
 		vf->link_forced = true;
 		vf->link_up = true;
-#ifdef VIRTCHNL_VF_CAP_ADV_LINK_SPEED
-		pfe.event_data.link_event_adv.link_status = true;
-		pfe.event_data.link_event_adv.link_speed =
-			i40e_vc_link_speed2mbps(ls->link_speed);
-#else /* VIRTCHNL_VF_CAP_ADV_LINK_SPEED */
-		pfe.event_data.link_event.link_status = true;
-		pfe.event_data.link_event.link_speed =
-			i40e_virtchnl_link_speed(ls->link_speed);
-#endif /* VIRTCHNL_VF_CAP_ADV_LINK_SPEED */
+		i40e_set_vf_link_state(vf, &pfe, ls);
 		break;
 	case VFD_LINKSTATE_OFF:
 		vf->link_forced = true;
 		vf->link_up = false;
-#ifdef VIRTCHNL_VF_CAP_ADV_LINK_SPEED
-		pfe.event_data.link_event_adv.link_status = false;
-		pfe.event_data.link_event_adv.link_speed = 0;
-#else /* VIRTCHNL_VF_CAP_ADV_LINK_SPEED */
-		pfe.event_data.link_event.link_status = false;
-		pfe.event_data.link_event.link_speed = 0;
-#endif /* VIRTCHNL_VF_CAP_ADV_LINK_SPEED */
+		i40e_set_vf_link_state(vf, &pfe, ls);
 		break;
 	default:
 		ret = -EINVAL;
 		goto error_out;
 	}
-	/* Do not allow change link state when VF is disabled
-	 * Check if requested link state is not VFD_LINKSTATE_OFF, to prevent
-	 * false positive warning in case of reloading the driver
-	 */
-	if (vf->pf_ctrl_disable && link != VFD_LINKSTATE_OFF) {
-		vf->link_up = false;
-#ifdef VIRTCHNL_VF_CAP_ADV_LINK_SPEED
-		pfe.event_data.link_event_adv.link_status = false;
-		pfe.event_data.link_event_adv.link_speed = 0;
-#else /* VIRTCHNL_VF_CAP_ADV_LINK_SPEED */
-		pfe.event_data.link_event.link_status = false;
-		pfe.event_data.link_event.link_speed = 0;
-#endif /* VIRTCHNL_VF_CAP_ADV_LINK_SPEED */
-		dev_warn(&pf->pdev->dev,
-			 "Not possible to change VF link state, please enable it first\n");
-	}
 
 	/* Notify the VF of its new link state */
 	i40e_aq_send_msg_to_vf(hw, abs_vf_id, VIRTCHNL_OP_EVENT,
@@ -3330,8 +3266,6 @@
 		}
 	}
 
-	vf->queues_enabled = true;
-
 error_param:
 	/* send the response to the VF */
 	return i40e_vc_send_resp_to_vf(vf, VIRTCHNL_OP_ENABLE_QUEUES,
@@ -3353,9 +3287,6 @@
 	struct i40e_pf *pf = vf->pf;
 	i40e_status aq_ret = 0;
 
-	/* Immediately mark queues as disabled */
-	vf->queues_enabled = false;
-
 	if (!i40e_sync_vf_state(vf, I40E_VF_STATE_ACTIVE)) {
 		aq_ret = I40E_ERR_PARAM;
 		goto error_param;
@@ -5463,62 +5394,22 @@
 	switch (link) {
 	case IFLA_VF_LINK_STATE_AUTO:
 		vf->link_forced = false;
-#ifdef VIRTCHNL_VF_CAP_ADV_LINK_SPEED
-		pfe.event_data.link_event_adv.link_status =
-			ls->link_info & I40E_AQ_LINK_UP;
-		pfe.event_data.link_event_adv.link_speed =
-			i40e_vc_link_speed2mbps(ls->link_speed);
-#else /* VIRTCHNL_VF_CAP_ADV_LINK_SPEED */
-		pfe.event_data.link_event.link_status =
-			ls->link_info & I40E_AQ_LINK_UP;
-		pfe.event_data.link_event.link_speed =
-			i40e_virtchnl_link_speed(ls->link_speed);
-#endif /* VIRTCHNL_VF_CAP_ADV_LINK_SPEED */
+		i40e_set_vf_link_state(vf, &pfe, ls);
 		break;
 	case IFLA_VF_LINK_STATE_ENABLE:
 		vf->link_forced = true;
 		vf->link_up = true;
-#ifdef VIRTCHNL_VF_CAP_ADV_LINK_SPEED
-		pfe.event_data.link_event_adv.link_status = true;
-		pfe.event_data.link_event_adv.link_speed =
-			i40e_vc_link_speed2mbps(ls->link_speed);
-#else /* VIRTCHNL_VF_CAP_ADV_LINK_SPEED */
-		pfe.event_data.link_event.link_status = true;
-		pfe.event_data.link_event.link_speed =
-			i40e_virtchnl_link_speed(ls->link_speed);
-#endif /* VIRTCHNL_VF_CAP_ADV_LINK_SPEED */
+		i40e_set_vf_link_state(vf, &pfe, ls);
 		break;
 	case IFLA_VF_LINK_STATE_DISABLE:
 		vf->link_forced = true;
 		vf->link_up = false;
-#ifdef VIRTCHNL_VF_CAP_ADV_LINK_SPEED
-		pfe.event_data.link_event_adv.link_status = false;
-		pfe.event_data.link_event_adv.link_speed = 0;
-#else /* VIRTCHNL_VF_CAP_ADV_LINK_SPEED */
-		pfe.event_data.link_event.link_status = false;
-		pfe.event_data.link_event.link_speed = 0;
-#endif /* VIRTCHNL_VF_CAP_ADV_LINK_SPEED */
+		i40e_set_vf_link_state(vf, &pfe, ls);
 		break;
 	default:
 		ret = -EINVAL;
 		goto error_out;
 	}
-	/* Do not allow change link state when VF is disabled
-	 * Check if requested link state is not IFLA_VF_LINK_STATE_DISABLE,
-	 * to prevent false positive warning in case of reloading the driver
-	 */
-	if (vf->pf_ctrl_disable && link != IFLA_VF_LINK_STATE_DISABLE) {
-		vf->link_up = false;
-#ifdef VIRTCHNL_VF_CAP_ADV_LINK_SPEED
-		pfe.event_data.link_event_adv.link_status = false;
-		pfe.event_data.link_event_adv.link_speed = 0;
-#else /* VIRTCHNL_VF_CAP_ADV_LINK_SPEED */
-		pfe.event_data.link_event.link_status = false;
-		pfe.event_data.link_event.link_speed = 0;
-#endif /* VIRTCHNL_VF_CAP_ADV_LINK_SPEED */
-		dev_warn(&pf->pdev->dev,
-			 "Not possible to change VF link state, please enable it first\n");
-	}
 
 	/* Notify the VF of its new link state */
 	i40e_aq_send_msg_to_vf(hw, abs_vf_id, VIRTCHNL_OP_EVENT,
@@ -6415,7 +6306,6 @@
 		ret = i40e_ctrl_vf_rx_rings(vsi, q_map, enable);
 		if (ret)
 			goto err_out;
-		vf->queues_enabled = false;
 	} else {
 		/* Do nothing when there is no iavf driver loaded */
 		if (!test_bit(I40E_VF_STATE_LOADED_VF_DRIVER, &vf->vf_states))
@@ -6426,7 +6316,6 @@
 		ret = i40e_ctrl_vf_tx_rings(vsi, q_map, enable);
 		if (ret)
 			goto err_out;
-		vf->queues_enabled = true;
 		vf->pf_ctrl_disable = false;
 		/* reset need to reinit VF resources */
 		i40e_vc_notify_vf_reset(vf);
diff -ruN i40e-2.12.6/src/i40e_virtchnl_pf.h i40e-2.12.6.3/src/i40e_virtchnl_pf.h
--- i40e-2.12.6/src/i40e_virtchnl_pf.h	2020-07-01 12:31:57.237199289 +0000
+++ i40e-2.12.6.3/src/i40e_virtchnl_pf.h	2020-10-12 18:12:04.728097697 +0000
@@ -110,7 +110,6 @@
 	bool link_forced;
 	bool link_up;		/* only valid if VF link is forced */
 #endif
-	bool queues_enabled;	/* true if the VF queues are enabled */
 	bool mac_anti_spoof;
 	u16 num_vlan;
 	DECLARE_BITMAP(mirror_vlans, VLAN_N_VID);
~~~

As we can see, Intel did not include these changes in the upstream in-tree driver:
https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net.git/tree/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c#n59
https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/tree/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c#n59
https://github.com/torvalds/linux/blob/master/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c#L59

- Andreas
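
For reference, a quick way to check whether a given in-tree i40e source still carries the queues_enabled gating that the OOT 2.12.6.3 driver dropped is to grep for the field. A minimal sketch, assuming a local clone of net.git (the path is a placeholder):

~~~
# If these hits are present, the in-tree driver still gates VF link
# reporting on queues_enabled, i.e. the OOT removal has not been ported.
cd /path/to/net.git
grep -n "queues_enabled" \
    drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c \
    drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.h
~~~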

Comment 43 Andreas Karis 2020-12-21 15:22:26 UTC
And for completeness, I also tested with:

[root@overcloud-computesriov-0 ~]# ethtool -i p6p1
driver: i40e
version: 2.12.6.3
firmware-version: 7.10 0x800051a4 19.0.12
expansion-rom-version: 
bus-info: 0000:18:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: yes

on the hypervisor, and this driver version indeed fixes the issue.
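
The VF link state as seen from the PF side can also be checked with iproute2. A minimal sketch, assuming the PF name p6p1 from the ethtool output above and a hypothetical VF index 0:

~~~
# Each "vf N" line of the PF output reports the configured link-state
# (auto/enable/disable) for that VF.
ip link show p6p1

# Force the VF link up for a quick test, then return it to auto.
ip link set p6p1 vf 0 state enable
ip link set p6p1 vf 0 state auto
~~~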

Comment 44 Andreas Karis 2020-12-21 18:29:38 UTC
Stefan,

With https://bugzilla.redhat.com/show_bug.cgi?id=1901064#c42, it looks like Intel pushed a fix to the OOT i40e driver but either forgot to port it to the in-tree kernel driver or purposely omitted that port. The code in the OOT driver and the in-tree driver looks very similar, so I assume the in-tree driver would normally follow fixes made in the OOT driver.

Can we look into this further?

Thanks,

Andreas

Comment 45 Stefan Assmann 2020-12-22 07:17:10 UTC
As usual, the OOT i40e driver lacks a proper changelog entry.
From looking at the diff, Intel removed the queues_enabled toggle from the OOT driver, while it remains present upstream.

Aleksandr, why has this change not been pushed upstream?

Comment 46 John W. Linville 2021-01-04 16:47:55 UTC
Stefan, could you build a test kernel based on the patch from comment 42 and ask for customer validation? I believe that Intel is likely to post an equivalent patch for upstream consumption soon.

Comment 47 Lihong Yang 2021-01-04 17:16:22 UTC
Hi Stefan,
Aleksandr's team is working on sending the missing patch that should solve the issue observed in this bug. It will soon be available at Intel-Wired-LAN for public viewing. I will provide the link here as soon as it is available.

Thanks,
Lihong

Comment 49 Bertrand 2021-01-05 11:00:31 UTC
INTEL / Mateusz has published the patch to intel-wired at: 

https://patchwork.ozlabs.org/project/intel-wired-lan/patch/20210105103550.17075-1-mateusz.palczewski@intel.com/

Comment 50 Stefan Assmann 2021-01-05 14:12:21 UTC
Brew build available with preliminary patch
i40e: Fix for link-flapping (from https://patchwork.ozlabs.org/project/intel-wired-lan/patch/20210105103550.17075-1-mateusz.palczewski@intel.com/)
as well as
6ec12e1e9404 i40e: report correct VF link speed when link state is set to enable
which is required to make the first patch apply cleanly.

https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=34068867

Andreas, please test this build.

Comment 51 Alex Stupnikov 2021-01-12 12:32:04 UTC
Hello

We get a confirmation from customer's side: test kernel works as expected. Please let me know about next steps.

Kind Regards, Alex.

Comment 52 Stefan Assmann 2021-01-12 13:12:04 UTC
Lihong,

AFAICT the patch is not yet upstream. I'm setting needinfo on you; please make sure the patch gets prioritized for upstreaming.
Thanks!

Comment 53 Lihong Yang 2021-01-12 17:44:03 UTC
Hi Stefan,
A new version has been sent to IWL: https://patchwork.ozlabs.org/project/intel-wired-lan/patch/20210112171431.457524-1-arkadiusz.kubalewski@intel.com/, but it does not implement your split request because the team did not see your input until they had finished this spin.

BTW, do you still want the team to split it into two patches, the "revert" and the "refactor"? Just to be clear: by splitting, you mean the code removed in the patch is considered the "revert" and the new code added in the patch is considered the "refactor", is that correct?

Thanks,
Lihong

Comment 54 Stefan Assmann 2021-01-12 21:28:25 UTC
Hi Lihong, by revert I meant "reverting the code/patch that introduced the regression". Everything on top of that is code refactoring and belongs in a separate patch. If the code is already out there now, keep it the way it is. Just in case the patch needs to be respun again, I'd request splitting it up properly.

Comment 55 Lihong Yang 2021-01-12 21:55:07 UTC
Thanks for the clarification, Stefan!

For the "[net,v3] i40e: Fix for link-flapping" patch already posted at IWL: if we don't get any requests for further changes in a day or two, I will ask the validation team to prioritize it for testing, and then we can send it to the net tree.

Thanks,
Lihong

Comment 56 Andreas Karis 2021-01-18 09:30:14 UTC
Hi Lihong,

Please let us know as soon as you have made progress.

Thanks,

Andreas

Comment 58 Lihong Yang 2021-01-19 16:27:22 UTC
Our i40e team decided to make the changes per Stefan's request before sending them to Linux upstream and has split the original patch into the two below [1][2]. They were just resubmitted to IWL for review. If there are no comments requesting additional changes, our validation team should have them tested by the end of the week.

[1] https://patchwork.ozlabs.org/project/intel-wired-lan/patch/20210118193454.275037-2-arkadiusz.kubalewski@intel.com/ ([net,v4,1/2] i40e: Revert "i40e: don't report link up for a VF who hasn't enabled queues")
[2] https://patchwork.ozlabs.org/project/intel-wired-lan/patch/20210118193454.275037-3-arkadiusz.kubalewski@intel.com/ ([net,v4,2/2] i40e: refactor repeated link state reporting code)

Comment 63 Andreas Karis 2021-01-21 14:40:22 UTC
Hi,

Just to clarify: 

i) https://patchwork.ozlabs.org/project/intel-wired-lan/patch/20210118193454.275037-2-arkadiusz.kubalewski@intel.com/ is just a plain rollback of https://bugzilla.redhat.com/show_bug.cgi?id=1901064#c11

According to comments on IWL, the commit message should be improved.

This will fix our issue, and it should be committed to net.git as it is a bug fix.

ii) The patch at https://patchwork.ozlabs.org/project/intel-wired-lan/patch/20210118193454.275037-3-arkadiusz.kubalewski@intel.com/ is only a refactor, right? It does not fix anything, so we do not need it. Also, judging by the comment on IWL: "As this is refactoring, this should be for net-next."

So, in the interest of saving time and speeding this up, can we break up the patchset and focus on i), and get this through the known chain IWL -> net.git -> upstream -> RHEL 8 / RHEL 7 downstream as fast as possible?

Testing should be easier that way, too, as we only roll back that commit.

Thanks!

- Andreas

Comment 64 Andreas Karis 2021-01-21 14:43:06 UTC
Sorry, for i) from https://bugzilla.redhat.com/show_bug.cgi?id=1901064#c2

Comment 65 Lihong Yang 2021-01-21 17:26:55 UTC
Hi Andreas,
Yes, I agree with your comments. The revert patch appears to be the key to solving this issue. We requested the i40e team to resubmit the revert patch with the commit info needed so that it can be accepted by Linux upstream. I haven't seen the submission yet; let me ping them for a status update.

Comment 66 Andreas Karis 2021-01-22 10:16:20 UTC
Hi,

>> We requested the i40e team to resubmit the revert patch with the commit info needed so that it can be accepted by Linux upstream.

So let's just focus on the revert. Please keep us posted about any new patches on IWL, on the progress of your internal testing, and about when this is committed to net.git.

Intel: Looking at https://patchwork.ozlabs.org/project/intel-wired-lan/list/?state=%2A&archive=both, I do not see the new patch with the updated commit message. When will you update the commit message and post this to IWL, and is the timeline for testing still the same (end of this week = today?)

Stefan: Could we get a test kernel for a last test with the customer just with the revert from https://patchwork.ozlabs.org/project/intel-wired-lan/patch/20210118193454.275037-2-arkadiusz.kubalewski@intel.com/

Thanks,

Andreas

Comment 67 Stefan Assmann 2021-01-22 11:56:37 UTC
(In reply to Andreas Karis from comment #66)
> Stefan: Could we get a test kernel for a last test with the customer just
> with the revert from
> https://patchwork.ozlabs.org/project/intel-wired-lan/patch/20210118193454.275037-2-arkadiusz.kubalewski@intel.com/

https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=34438206

Comment 75 Lihong Yang 2021-01-26 03:52:24 UTC
The respin of the revert patch [1] is on IWL now. The current ETA from our validation team is to provide a Tested-by tag by the end of this Wednesday, PST, provided no further rework is requested by the community.

[1] [net,v5,1/2] i40e: Revert "i40e: don't report link up for a VF who hasn't enabled queues" (https://patchwork.ozlabs.org/project/intel-wired-lan/patch/20210123002223.361450-1-arkadiusz.kubalewski@intel.com/)

Comment 79 Lihong Yang 2021-01-28 23:01:09 UTC
We got the revert patch tested and sent it to the Linux upstream net tree as part of a bug-fix series [1]. I will update when it is accepted there.
 
[1] [net,4/4] i40e: Revert "i40e: don't report link up for a VF who hasn't enabled queues" (https://patchwork.kernel.org/project/netdevbpf/patch/20210128213851.2499012-5-anthony.l.nguyen@intel.com/)

Comment 81 Lihong Yang 2021-02-02 17:23:07 UTC
Just a quick update: the revert patch [1] has just been accepted on net.

[1] i40e: Revert "i40e: don't report link up for a VF who hasn't enabled queues"
https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net.git/commit/drivers/net/ethernet/intel/i40e?id=f559a356043a55bab25a4c00505ea65c50a956fb
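
To check whether a local kernel tree already contains this revert, something like the following should work (a sketch, assuming a clone of net.git with the commit fetched):

~~~
# Show the revert commit and verify it is an ancestor of the current branch.
git log --oneline -1 f559a356043a55bab25a4c00505ea65c50a956fb
git merge-base --is-ancestor f559a356043a55bab25a4c00505ea65c50a956fb HEAD \
    && echo "revert present" || echo "revert missing"
~~~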

Comment 95 Augusto Caringi 2021-02-23 00:34:30 UTC
Patch(es) committed on kernel-3.10.0-1160.21.1.el7
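
To confirm that a RHEL 7 host is actually running a kernel with the fix, comparing the running kernel against the fixed version is usually enough. A minimal sketch:

~~~
# The fix landed in kernel-3.10.0-1160.21.1.el7; the running kernel
# should be that version or newer.
uname -r
rpm -q kernel | sort -V | tail -n 1
~~~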

Comment 104 errata-xmlrpc 2021-03-16 13:52:38 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Important: kernel security and bug fix update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:0856

