Bug 2210164 - Do not disable SR-IOV when activation failed due to SR-IOV parameter failure [NEEDINFO]
Summary: Do not disable SR-IOV when activation failed due to SR-IOV parameter failure
Keywords:
Status: VERIFIED
Alias: None
Product: Red Hat Enterprise Linux 9
Classification: Red Hat
Component: NetworkManager
Version: 9.2
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: NetworkManager Development Team
QA Contact: Matej Berezny
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2023-05-26 02:22 UTC by Gris Ge
Modified: 2023-08-16 21:49 UTC

Fixed In Version: NetworkManager-1.43.11-1.el9
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed:
Type: Bug
Target Upstream Version:
Embargoed:
fge: needinfo? (elevin)


Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker CNF-7461 0 None None None 2023-06-28 14:44:51 UTC
Red Hat Issue Tracker NMT-581 0 None None None 2023-05-26 02:26:37 UTC
Red Hat Issue Tracker RHELPLAN-158243 0 None None None 2023-05-26 02:26:42 UTC
freedesktop.org Gitlab NetworkManager NetworkManager-ci merge_requests 1470 0 None opened sriov: added sriov_dont_disable_on_acitvation_fail test 2023-08-07 15:13:54 UTC
freedesktop.org Gitlab NetworkManager NetworkManager merge_requests 1682 0 None opened sriov: Do not fail activation on SR-IOV VF failures 2023-06-28 14:27:13 UTC

Description Gris Ge 2023-05-26 02:22:48 UTC
Description of problem:

https://issues.redhat.com/browse/OCPBUGS-14107

When an SR-IOV parameter fails to apply (for example, min_tx_rate is not
supported), NetworkManager deactivates the connection, which disables SR-IOV
and removes the VFs that existed before the activation. This breaks the user's
network connection when those pre-existing VFs are used in a VLAN, bond, or bridge.

Version-Release number of selected component (if applicable):
NetworkManager-1.43.8-32322.copr.d07383d3f3.el9.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Create SR-IOV VFs using nmstate YAML:

```
---
interfaces:
  - name: eth1
    type: ethernet
    state: up
    ethernet:
      sr-iov:
        total-vfs: 2
  - name: eth2
    type: ethernet
    state: up
    ethernet:
      sr-iov:
        total-vfs: 2
```

2. Create VLAN over VF:


```
---
interfaces:
  - name: eth1v1.101
    type: vlan
    state: up
    vlan:
      base-iface: eth1v1
      id: 101
  - name: eth2v1.101
    type: vlan
    state: up
    vlan:
      base-iface: eth2v1
      id: 101
```

3. Assign VLAN to a bond:

```
---
interfaces:
- name: bond0
  type: bond
  state: up
  link-aggregation:
    mode: balance-rr
    port:
    - eth1v1.101
    - eth2v1.101
```

4. Apply invalid SR-IOV configuration:

```
interfaces:
- name: eth1
  type: ethernet
  state: up
  ethernet:
   sr-iov:
     total-vfs: 5
     vfs:
     - id: 2
       max-tx-rate: 200
```

Actual results:

 * eth1v1 and eth1v1.101 are removed and re-added.
 * bond0 loses eth1v1.101.

Expected results:

 * eth1v1 and eth1v1.101 are left untouched despite the SR-IOV failure.
 * bond0 still has eth1v1.101 and eth2v1.101, with no detach/reattach.

Additional info:

This is a known limitation in NM: it disables SR-IOV if applying an SR-IOV
parameter fails. Disabling SR-IOV removes the VFs from the system,
which breaks network access.
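
One way to check the expected results from the host side is to snapshot the VF count and the bond's port list before and after the failing apply. The following is only a minimal sketch and is not part of NetworkManager or nmstate: the helper names `snapshot` and `ports_and_vfs_preserved` are hypothetical, and it assumes the usual sysfs files `device/sriov_numvfs` and `bonding/slaves` under `/sys/class/net`.

```python
from pathlib import Path


def snapshot(pf, bond, root="/sys/class/net"):
    """Return (number of VFs on the PF, sorted bond port names) read from sysfs."""
    base = Path(root)
    # sriov_numvfs holds the number of VFs currently enabled on the PF.
    num_vfs = int((base / pf / "device" / "sriov_numvfs").read_text().strip())
    # bonding/slaves is a whitespace-separated list of the bond's ports.
    ports = sorted((base / bond / "bonding" / "slaves").read_text().split())
    return num_vfs, ports


def ports_and_vfs_preserved(before, after):
    """True when the VF count did not drop and no bond port was detached."""
    return after[0] >= before[0] and set(before[1]) <= set(after[1])
```

Taking `before = snapshot("eth1", "bond0")` ahead of step 4 and `after = snapshot("eth1", "bond0")` once the apply has failed, `ports_and_vfs_preserved(before, after)` should hold if the expected results above are met. A VF that is removed and re-added between the two snapshots keeps the same count, so the bond port list is the more telling half of the check.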

Comment 4 Gris Ge 2023-06-28 14:25:13 UTC
Patch sent to upstream: https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1682

With this patch applied, nmstate reports the error as:

NmstateError: VerificationError: Verification failure: enp196s0f0.interface.ethernet.sr-iov.vfs[1].max-tx-rate desire '200', current '0'

Instead of

NmstateError: VerificationError: Verification failure: enp196s0f0.interface.ethernet.sr-iov.total-vfs: desire '2', current '0'

NetworkManager will also no longer disable SR-IOV during `nmstatectl apply` because of an SR-IOV VF parameter error.
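
For anyone scripting the verification, the two messages above can be told apart by the property path they name. The following is a rough sketch, assuming the message format stays exactly as quoted in this comment (other nmstate versions may word it differently); the function name `classify` and the labels it returns are hypothetical.

```python
import re

# Captures the property path and both values from messages such as:
#   "... Verification failure: <path> desire '200', current '0'"
#   "... Verification failure: <path>: desire '2', current '0'"
_VERIFY_RE = re.compile(
    r"Verification failure: (?P<path>\S+?):? desire '(?P<desire>[^']*)', "
    r"current '(?P<current>[^']*)'"
)


def classify(message):
    """Return (kind, path, desire, current), or None if the text does not match."""
    m = _VERIFY_RE.search(message)
    if m is None:
        return None
    path = m.group("path")
    if ".sr-iov.vfs[" in path:
        kind = "vf-parameter"   # a per-VF setting was not applied
    elif path.endswith(".sr-iov.total-vfs"):
        kind = "total-vfs"      # the VF count itself differs
    else:
        kind = "other"
    return kind, path, m.group("desire"), m.group("current")
```

A "total-vfs" result with a current value of '0' corresponds to the old behaviour, where SR-IOV had been disabled by the time nmstate verified the state.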

Comment 5 Gris Ge 2023-06-28 14:26:48 UTC
A RHEL 9.2 z-stream scratch build is available at https://people.redhat.com/fge/bz_2210164/

Comment 11 elevin 2023-07-06 22:59:11 UTC
Verification of the custom RPM has partially failed.

Setup:
Server Version: 4.14.0-ec.2
RPM https://people.redhat.com/fge/bz_2210164/ 

Scenario:
1) Apply a configuration that creates a bond interface over a VLAN interface based on a VF. It takes several minutes, but the bond is eventually created.
===  
   interfaces:
      - name: ens1f0
        type: ethernet
        state: up
        ethernet:
          sr-iov:
            total-vfs: 2
      - name: ens1f0v0.481
        type: vlan
        state: up
        vlan:
          base-iface: ens1f0v0
          id: 481
      - name: bond3
        type: bond
        state: up
        link-aggregation:
          mode: balance-rr
          port:
          - ens1f0v0.481
===
2: ens1f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether 3c:fd:fe:bf:f2:bc brd ff:ff:ff:ff:ff:ff
    vf 0     link/ether 2e:a1:1e:af:13:35 brd ff:ff:ff:ff:ff:ff, spoof checking on, link-state auto, trust off
    vf 1     link/ether ee:52:88:6c:6c:bc brd ff:ff:ff:ff:ff:ff, spoof checking on, link-state auto, trust off
    altname enp59s0f0
802: bond3: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether 2e:a1:1e:af:13:35 brd ff:ff:ff:ff:ff:ff
803: ens1f0v0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether 2e:a1:1e:af:13:35 brd ff:ff:ff:ff:ff:ff
    altname enp59s0f0v0
804: ens1f0v1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether ee:52:88:6c:6c:bc brd ff:ff:ff:ff:ff:ff
    altname enp59s0f0v1
806: ens1f0v0.481@ens1f0v0: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc noqueue master bond3 state UP mode DEFAULT group default qlen 1000
    link/ether 2e:a1:1e:af:13:35 brd ff:ff:ff:ff:ff:ff
===

2) Apply an incorrect configuration (the Intel NIC doesn't support min-tx-rate)
===
     interfaces:
     - name: ens1f0
       type: ethernet
       state: up
       ethernet:
         sr-iov:
           total-vfs: 2
           vfs:
           - id: 1
             max-tx-rate: 200
             min-tx-rate: 100
===

Result:
1) The wrong policy failed to apply, as expected
===
 $ oc get nncp
NAME                       STATUS      REASON
all-interface-worker-0     Available   SuccessfullyConfigured
wrong-interface-worker-1   Degraded    FailedToConfigure
===
[2023-07-06T22:37:57Z INFO  nmstate::query_apply::net_state] Rollbacked to checkpoint /org/freedesktop/NetworkManager/Checkpoint/14
NmstateError: VerificationError: Verification failure: ens1f0.interface.ethernet.sr-iov.vfs[1].min-tx-rate desire '100', current '0'

2) VLAN configuration is removed - Failed
===
2: ens1f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether 3c:fd:fe:bf:f2:bc brd ff:ff:ff:ff:ff:ff
    vf 0     link/ether 62:29:a2:8f:4c:f3 brd ff:ff:ff:ff:ff:ff, spoof checking on, link-state auto, trust off
    vf 1     link/ether ba:f7:f3:04:69:ef brd ff:ff:ff:ff:ff:ff, spoof checking on, link-state auto, trust off
    altname enp59s0f0
802: bond3: <NO-CARRIER,BROADCAST,MULTICAST,MASTER,UP> mtu 1500 qdisc noqueue state DOWN mode DEFAULT group default qlen 1000
    link/ether ca:c1:75:cf:72:5b brd ff:ff:ff:ff:ff:ff

809: ens1f0v0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether 62:29:a2:8f:4c:f3 brd ff:ff:ff:ff:ff:ff
    altname enp59s0f0v0
810: ens1f0v1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether ba:f7:f3:04:69:ef brd ff:ff:ff:ff:ff:ff
    altname enp59s0f0v1
===
sh-4.4# cat /sys/class/net/bond3/bonding/slaves 
sh-4.4# 
===
[core@worker-0 ~]$ nmcli con
NAME                          UUID                                  TYPE           DEVICE       
ovs-if-br-ex                  13953df0-3a2d-4742-b4f0-46c2afa2b933  ovs-interface  br-ex        
lo                            74ec3513-cc05-4cff-8202-d285285d7904  loopback       lo           
bond3                         68fa1e24-11cf-45cb-84db-68a6d4e65256  bond           bond3        
br-ex                         9350dc78-da4f-4663-9aee-3b0f49ebfe23  ovs-bridge     br-ex        
ens1f0                        93d576f4-20cb-4f83-9696-d61d116a4c9a  ethernet       ens1f0       
ens3f0np0                     ea1316d7-eab7-456c-b706-36ee8cd46f18  ethernet       ens3f0np0    
ens3f0v0                      fe1e62b3-7043-4502-90fb-9455f8aae8c2  ethernet       ens3f0v0     
ens3f0v0.481-slave-ovs-clone  49498d70-e77e-4f18-bcb9-6db2c61a4fa0  vlan           ens3f0v0.481 
ens3f1np1                     c3feed57-f0e9-43ed-a8d7-3d8585d6df51  ethernet       ens3f1np1    
ens3f1v0                      936f4ac1-d994-4592-a9a6-265ef5755883  ethernet       ens3f1v0     
ens3f1v0.481-slave-ovs-clone  492806be-b419-442d-a2cd-182e6d32d937  vlan           ens3f1v0.481 
ovs-if-phys0                  0024026e-e74c-470f-8cf2-240f005427e0  bond           bond0        
ovs-port-br-ex                31dc7054-f9a8-401e-b7f1-ade4225d684f  ovs-port       br-ex        
ovs-port-phys0                ec80211d-3e8e-4850-a046-43535fd5e6d3  ovs-port       bond0        
Wired connection 1            4394027f-0b8a-3d66-9e2c-f67da5937f0a  ethernet       --           
Wired connection 10           5936893f-2701-3a02-a2f4-cf597fac4d5b  ethernet       --           
Wired connection 11           4a6dc456-e01e-3667-beab-20d7daba0f51  ethernet       --           
Wired connection 12           1e514bf3-dc3a-3dc9-ae07-86077254bfd0  ethernet       --           
Wired connection 13           2fcc8186-6ed7-35a4-a831-d41d0a69f484  ethernet       --           
Wired connection 2            97d1bc50-4945-347a-9f73-feab16caa5a4  ethernet       --           
Wired connection 3            97e3255b-e92e-31cc-b935-22e159bbad5c  ethernet       --           
Wired connection 4            c50f87fd-72a4-3936-9d62-caea9991a81a  ethernet       --           
Wired connection 5            ed052a43-17df-38e9-82c1-2dda1189a16c  ethernet       --           
Wired connection 6            653ae237-87c5-3e6a-8a71-dd6a8716adb0  ethernet       --           
Wired connection 7            d9e79c4e-2cb0-3d6b-b1cb-12b47e34dcca  ethernet       --           
Wired connection 8            63aef5bb-4390-3435-b254-f85b83bff12f  ethernet       --           
Wired connection 9            8d5592a4-7d39-3955-84a6-07ab18230717  ethernet       --           
bond0                         fa3a2ece-ea18-4e0b-b504-2818b93bc977  bond           --           
ens1f0v0.481                  52aab5b1-6366-440a-bf9d-dedba7fafb41  vlan           --           
ens3f0v0.481                  b559ffc9-a185-49e7-84c7-259e16d45b19  vlan           --           
ens3f1v0.481                  0e06b134-9904-4b16-a5f7-cf73c87c6cdd  vlan           --

Journalctl:
http://pastebin.test.redhat.com/1104374

Comment 12 Carlos Goncalves 2023-07-10 11:52:13 UTC
Resetting needinfo. Info provided by Evgeny in comment #11.

Comment 13 Gris Ge 2023-07-11 07:47:01 UTC
Hi Evgeny Levin,


Thanks for the test feedback. It looks like bug https://bugzilla.redhat.com/show_bug.cgi?id=2217903, whose fix is due for release on Aug 01.

I have uploaded a new scratch build, NetworkManager-1.42.2-6.sriov.el9, to https://people.redhat.com/fge/bz_2210164/

Can you try again?

Comment 14 Gris Ge 2023-07-13 08:37:53 UTC
While checking an Intel SR-IOV NIC that uses the ice driver, we found kernel bug https://bugzilla.redhat.com/show_bug.cgi?id=2222597 regarding support for max_tx_rate/min_tx_rate.

