Bug 2026621

Summary: VLAN filtering cannot be configured with Intel X710
Product: Red Hat Enterprise Linux 8 Reporter: Petr Horáček <phoracek>
Component: nispor Assignee: Gris Ge <fge>
Status: CLOSED ERRATA QA Contact: Mingyu Shi <mshi>
Severity: high Docs Contact:
Priority: urgent    
Version: 8.4 CC: danken, ferferna, fge, jiji, jishi, klatouch, nashok, network-qe, till, toneata
Target Milestone: rc Keywords: Triaged, ZStream
Target Release: 8.6 Flags: pm-rhel: mirror+
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: nispor-1.2.3-1.el8 Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of:
: 2040316 2040317 2075200 Environment:
Last Closed: 2022-05-10 14:07:51 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 2040316, 2040317, 2075200    
Attachments:
Description Flags
Diff of the failed configuration none

Description Petr Horáček 2021-11-25 10:58:52 UTC
Created attachment 1843563 [details]
Diff of the failed configuration

Description of problem:
When any VLAN filtering is configured on an Intel X710, the configuration fails to apply, and dmesg contains a number of messages like the following:
[79627.840049] i40e 0000:19:00.1: Error I40E_AQ_RC_ENOSPC adding RX filters on PF, promiscuous mode forced on

The odd part is that it happens even when only a single VLAN trunk ID is opened, so it **should** not be a problem with lack of memory.
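
To check how many of these messages each PF produced, something like the following could be used. This is only a rough sketch: the regular expression is modelled on the single dmesg line quoted above, and the helper name `count_enospc` is made up for illustration.

```python
import re
from collections import Counter

# Pattern modelled on the one i40e message quoted in this report;
# other kernel versions may word the message differently (assumption).
ENOSPC_RE = re.compile(
    r"i40e (?P<pci>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]): "
    r"Error I40E_AQ_RC_ENOSPC adding RX filters"
)


def count_enospc(dmesg_text):
    """Count I40E_AQ_RC_ENOSPC messages per PCI address (hypothetical helper)."""
    counts = Counter()
    for line in dmesg_text.splitlines():
        match = ENOSPC_RE.search(line)
        if match:
            counts[match.group("pci")] += 1
    return counts


sample = (
    "[79627.840049] i40e 0000:19:00.1: Error I40E_AQ_RC_ENOSPC "
    "adding RX filters on PF, promiscuous mode forced on"
)
print(dict(count_enospc(sample)))
```

Feeding it the full `dmesg` text would show whether the count follows the number of trunk IDs requested on a given PF.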


Version-Release number of selected component (if applicable):
RHEL 8.4
nmstate 1.0.2-14
NetworkManager in container, version 1.30.0-13.el8_4
NetworkManager on the host, version 1.30.0-10.el8_4


How reproducible:
Consistently on some PFs of the NIC. It always happens on the second one but did not happen on the third. The second one had a cable connected, while the third one was disconnected.


Steps to Reproduce:
1. Get a host with an Intel X710 NIC
2. Apply the following config:
    interfaces:
    - bridge:
        options:
          stp:
            enabled: false
        port:
        - name: eno2
          vlan:
            mode: trunk
            trunk-tags:
            - id: 1000
      ipv4:
        auto-dns: true
        dhcp: false
        enabled: false
      ipv6:
        enabled: false
      name: br1test
      state: up
      type: linux-bridge
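
To vary the number of trunk IDs while reproducing, the same desired state can be generated rather than written by hand. The following is a minimal sketch that mirrors the YAML above; the function name `bridge_trunk_state` is hypothetical, and the final hand-off to nmstate (for example via `libnmstate.apply`) is left out so the snippet runs without nmstate installed.

```python
import json


def bridge_trunk_state(bridge, port, vlan_ids):
    """Build a desired state shaped like the YAML in step 2 (hypothetical helper)."""
    return {
        "interfaces": [
            {
                "name": bridge,
                "type": "linux-bridge",
                "state": "up",
                "ipv4": {"auto-dns": True, "dhcp": False, "enabled": False},
                "ipv6": {"enabled": False},
                "bridge": {
                    "options": {"stp": {"enabled": False}},
                    "port": [
                        {
                            "name": port,
                            "vlan": {
                                "mode": "trunk",
                                # One {"id": N} entry per trunk ID, as in the report.
                                "trunk-tags": [{"id": vid} for vid in vlan_ids],
                            },
                        }
                    ],
                },
            }
        ]
    }


state = bridge_trunk_state("br1test", "eno2", [1000])
print(json.dumps(state, indent=2))
```

Calling it with `[1000]` yields the single-ID case from this report; passing a longer list would exercise the case where many IDs are opened.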

Actual results:
Configuration fails; nmstate notices that the requested VLANs were not applied. dmesg contains a number of messages complaining about I40E_AQ_RC_ENOSPC. The number depends on how many IDs we attempted to open.


Expected results:
We should be able to apply at least a limited number of trunk IDs on hardware that has offloading capability.


Additional info:

This reproduces even if VLAN offloading is disabled through ethtool:
rx-vlan-offload: off
tx-vlan-offload: off
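
For checking the offload state in a script, the feature lines can be parsed roughly as below. This is a sketch under assumptions: the sample text is pasted from the two lines above plus an assumed `Features for eno2:` header, and `parse_ethtool_features` is a made-up helper name, not part of ethtool.

```python
def parse_ethtool_features(text):
    """Map feature names to True/False from 'name: on|off' lines (hypothetical helper)."""
    features = {}
    for line in text.splitlines():
        name, sep, value = line.strip().partition(":")
        if not sep or not value.strip():
            continue  # skip blank lines and headers that end with a colon
        state = value.split()[0]  # ignore trailing markers such as "[fixed]"
        if state in ("on", "off"):
            features[name.strip()] = state == "on"
    return features


# Sample text: the header line is an assumption, the two feature lines
# are the ones quoted in this report.
sample = """\
Features for eno2:
rx-vlan-offload: off
tx-vlan-offload: off
"""
print(parse_ethtool_features(sample))
```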

Used NICs:
[root@cnv-qe-infra-32 /]# lspci  | grep 710
19:00.0 Ethernet controller: Intel Corporation Ethernet Controller X710 for 10GbE SFP+ (rev 02)
19:00.1 Ethernet controller: Intel Corporation Ethernet Controller X710 for 10GbE SFP+ (rev 02)
19:00.2 Ethernet controller: Intel Corporation Ethernet Controller X710 for 10GbE SFP+ (rev 02)
19:00.3 Ethernet controller: Intel Corporation Ethernet Controller X710 for 10GbE SFP+ (rev 02)

Comment 2 Fernando F. Mancera 2021-11-25 11:09:19 UTC
Debugging with nmcli and iproute2. I think this is probably a NM or kernel bug.

Comment 3 Petr Horáček 2021-11-25 11:25:07 UTC
After rebooting the host and trying again, the dmesg messages stopped appearing (is a restart needed to clear the memory?). However, it still fails to configure the VLAN trunk.

Comment 6 Gris Ge 2021-12-13 08:51:36 UTC
Hi Petr,

This sounds like a kernel bug to me.
Can I close this as a duplicate of bug 1959512?

Comment 7 Petr Horáček 2021-12-13 09:07:59 UTC
It's not clear to me that these two bugs are the same. Despite being in the same area, they are different. The one with Pensando caused the host to freeze. This Intel one just silently ignores the configuration. Marking it as a duplicate would be dangerous, as we would skip the investigation and just assume it is already fixed. Could we move the BZ to kernel and let them evaluate it instead?

Comment 16 Gris Ge 2022-01-07 02:56:27 UTC
If nmcli works, then nmstate should fix it or at least work around it.
I will take a look before 14 Jan 2022.

Comment 18 Gris Ge 2022-01-07 03:04:41 UTC
Acceptance criteria: Nmstate should support VLAN filtering on Intel X710 (i40e driver).

Comment 19 Gris Ge 2022-01-07 04:43:23 UTC
Patch posted to https://github.com/nispor/nispor/pull/166

Backport scratch build:

RHEL 8.6: https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=42249637
RHEL 8.5: https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=42249613
RHEL 8.4: https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=42249571


I have reproduced the problem and tested the 8.5 rpm on an IBM s390x(ppc64le) server with Intel Corporation Ethernet Controller X710/X557-AT 10GBASE-T.

Comment 20 Gris Ge 2022-01-07 04:57:29 UTC
Hi nijin ashok,

Could you use above scratch rpm to test in your environment?

Thank you!

Comment 23 nijin ashok 2022-01-13 02:27:25 UTC
(In reply to Gris Ge from comment #20)
> Hi nijin ashok,
> 
> Could you use above scratch rpm to test in your environment?
> 
> Thank you!

Tested with the new build and nmstatectl was able to apply the VLANs successfully.

~~~
npc iface ens1f0 |grep -A8 vlans
Unhandled AF_SPEC_BRIDGE_INFO: 0 [2, 0]
Unhandled AF_SPEC_BRIDGE_INFO: 1 [1, 0]
Unhandled AF_SPEC_BRIDGE_INFO: 0 [2, 0]
Unhandled AF_SPEC_BRIDGE_INFO: 1 [1, 0]
    vlans:
      - vid: 1
        is_pvid: true
        is_egress_untagged: true
      - vid_range:
          - 100
          - 2412
        is_pvid: false
        is_egress_untagged: false
~~~
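
To confirm that a particular VLAN ID is covered by output like the above, the `vlans` entries can be expanded into a flat list. This is a small sketch: the data is transcribed by hand from the output above, `expand_vids` is a hypothetical helper, and it assumes that `vid_range` is inclusive on both ends.

```python
def expand_vids(vlans):
    """Expand 'vid' and 'vid_range' entries into a sorted list of VLAN IDs."""
    vids = set()
    for entry in vlans:
        if "vid" in entry:
            vids.add(entry["vid"])
        elif "vid_range" in entry:
            low, high = entry["vid_range"]
            # Assumption: both ends of the range are included.
            vids.update(range(low, high + 1))
    return sorted(vids)


# Transcribed from the 'vlans:' section of the npc output above.
reported = [
    {"vid": 1, "is_pvid": True, "is_egress_untagged": True},
    {"vid_range": [100, 2412], "is_pvid": False, "is_egress_untagged": False},
]
vids = expand_vids(reported)
print(len(vids), 1000 in vids)
```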

Comment 30 Gris Ge 2022-01-20 11:34:26 UTC
*** Bug 2030197 has been marked as a duplicate of this bug. ***

Comment 33 errata-xmlrpc 2022-05-10 14:07:51 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (nispor bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2022:1881