1854775 – [sriov] The operator reset the PF MTU to 1500 when policy is deleted

Summary: [sriov] The operator reset the PF MTU to 1500 when policy is deleted

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Networking
Sub Component:
Version:	4.6
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	high
Target Milestone:	---
Target Release:	4.6.0
Assignee:	Peng Liu
QA Contact:	zhaozhanqi
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	1854778

Reported:	2020-07-08 08:41 UTC by Peng Liu
Modified:	2020-10-27 16:13 UTC
CC List:	2 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Clones:	1854778 1854779
Environment:
Last Closed:	2020-10-27 16:12:56 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Github	openshift sriov-network-operator pull 286	0	None	closed	BUG 1854775: Improve the MTU set/reset logic	2020-08-13 06:55:18 UTC
Red Hat Product Errata	RHBA-2020:4196	0	None	None	None	2020-10-27 16:13:15 UTC

Description Peng Liu 2020-07-08 08:41:38 UTC

Description of problem:
The operator always resets the PF MTU to 1500 when the policy is deleted, regardless of the MTU the PF had before the policy was applied.

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1. Set the PF MTU to 5000, e.g. `sudo ip link set dev ens803f0 mtu 5000`

2. Deploy the SR-IOV Network Operator.

3. Apply the following policy:
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: policy-net-2
spec:
  resourceName: nic2
  nodeSelector:
    kubernetes.io/hostname: worker-0
    feature.node.kubernetes.io/network-sriov.capable: "true"
  priority: 99
  mtu: 9000
  numVfs: 4
  nicSelector:
    pfNames: ['ens803f0#0-0']
  isRdma: false

4. Remove the policy.

Actual results:
The MTU of ens803f0 was reset to 1500

Expected results:
The MTU of ens803f0 is restored to 5000, the value it had before the policy was applied.

Additional info:

Comment 3 zhaozhanqi 2020-07-14 07:41:23 UTC

It seems the PF MTU is not reset to its original value when the policy is deleted.

E.g.:
1. Set the MTU to 8800 via `ip link set dev ens1f0 mtu 8800`.
2. Create a policy with the MTU set to 9200:
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: intel-netdevice
  namespace: openshift-sriov-network-operator
spec:
  deviceType: netdevice
  nicSelector:
    pfNames:
      - ens1f0
    rootDevices:
      - '0000:3b:00.0'
    vendor: '8086'
  nodeSelector:
    feature.node.kubernetes.io/sriov-capable: 'true'
  numVfs: 5
  mtu: 9200
  priority: 99
  resourceName: intelnetdevice

3. Check that the PF and VF MTUs are both 9200.

4. Delete the above policy.

5. Check the PF MTU: the value is still 9200, but the expected value is 8800.

Comment 4 zenghui.shi 2020-07-16 02:01:03 UTC

(In reply to zhaozhanqi from comment #3)
> Seems the MTU of PF did not be reset the original value when the policy is
> deleted. 

The current fix records the MTU value when the config daemon starts, and resets the PF to that value when the policy applied to that device is deleted.

> 
> eg.  
> 1. set the MTU to 8800 via `ip link set dev ens1f0 mtu 8800`

Was the SR-IOV Operator installed before the MTU was set to 8800? If so, what was the MTU of ens1f0 when the SR-IOV Operator was installed? Was it 9200?

> 2. Create one network with MTU is 9200
> apiVersion: sriovnetwork.openshift.io/v1
> kind: SriovNetworkNodePolicy
> metadata:
>   name: intel-netdevice
>   namespace: openshift-sriov-network-operator
> spec:
>   deviceType: netdevice
>   nicSelector:
>     pfNames:
>       - ens1f0
>     rootDevices:
>       - '0000:3b:00.0'
>     vendor: '8086'
>   nodeSelector:
>     feature.node.kubernetes.io/sriov-capable: 'true'
>   numVfs: 5
>   mtu: 9200
>   priority: 99
>   resourceName: intelnetdevice
> 
> 3. Check the PF and VF mtu are 9200. 
> 
> 4. Delete the above policy 
> 
> 5. Check the MTU of PF, the value still 9200.  expected value is 8800.

This may be because the recorded PF MTU was 9200, so the PF was reset to 9200.
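
The record-and-restore behaviour described in this comment can be sketched roughly as follows. This is a minimal in-memory model in Python, not the operator's actual code: the `MtuTracker` class and its method names are hypothetical, and the `live` dictionary only stands in for the kernel's per-interface MTU.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class MtuTracker:
    """Hypothetical model of the record/restore logic; not the operator's code."""

    # Live MTU per interface, standing in for the kernel state.
    live: Dict[str, int]
    # MTU values captured when the (simulated) config daemon started.
    recorded: Dict[str, int] = field(default_factory=dict)

    def daemon_start(self) -> None:
        # Snapshot whatever MTU each PF has at this moment.
        self.recorded = dict(self.live)

    def apply_policy(self, pf: str, mtu: int) -> None:
        # Applying a policy sets the PF MTU to the policy's value.
        self.live[pf] = mtu

    def delete_policy(self, pf: str) -> None:
        # Restore the snapshot value rather than a hard-coded 1500.
        self.live[pf] = self.recorded[pf]


# Scenario from the bug description: the PF is at 5000 before the daemon starts.
tracker = MtuTracker(live={"ens803f0": 5000})
tracker.daemon_start()
tracker.apply_policy("ens803f0", 9000)
tracker.delete_policy("ens803f0")
print(tracker.live["ens803f0"])  # 5000
```

Under this model, a manual MTU change made after the daemon has started is not part of the snapshot, which would be consistent with the PF going back to 9200 rather than 8800 in the scenario quoted above.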

Comment 5 zhaozhanqi 2020-07-16 06:58:09 UTC

Let's take a new PF whose default MTU is 1500 as an example:

1. Confirm that the default MTU is 1500:
# ip a show ens3f0
8: ens3f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 98:03:9b:97:21:be brd ff:ff:ff:ff:ff:ff

2. Create a policy with the MTU set to 1900:

apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: mlx278-netdevice
  namespace: openshift-sriov-network-operator
spec:
  mtu: 1900
  nicSelector:
    pfNames:
      - ens3f0
    rootDevices:
      - '0000:5e:00.0'
    vendor: '15b3'
  nodeSelector:
    feature.node.kubernetes.io/sriov-capable: 'true'
  numVfs: 1
  resourceName: mlx278netdevice

3. Check that the PF and VF MTUs are both 1900:
# ip a show ens3f0
8: ens3f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1900 qdisc mq state UP group default qlen 1000
    link/ether 98:03:9b:97:21:be brd ff:ff:ff:ff:ff:ff

  - Vfs:
    - deviceID: "1018"
      driver: mlx5_core
      mac: 1e:fa:89:df:f4:dc
      mtu: 1900
      name: ens3f0v0
      pciAddress: 0000:5e:00.2
      vendor: 15b3
      vfID: 0
    deviceID: "1017"
    driver: mlx5_core
    linkSpeed: 40000 Mb/s
    mac: 98:03:9b:97:21:be
    mtu: 1900
    name: ens3f0
    numVfs: 1
    pciAddress: 0000:5e:00.0
    totalvfs: 1
    vendor: 15b3

4. Delete the policy.

5. Check the PF MTU: it is still 1900, but the expected value is 1500.

# ip link show ens3f0
8: ens3f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1900 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether 98:03:9b:97:21:be brd ff:ff:ff:ff:ff:ff
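
The check in step 5 can be scripted when verifying several interfaces. The sketch below is an illustration only: `parse_mtu` is a hypothetical helper that pulls the MTU out of the first line of `ip link show` output, and it assumes that output has the same shape as the sample above.

```python
import re
from typing import Optional

# Matches the "mtu <number>" token printed by `ip link show` / `ip a show`.
_MTU_RE = re.compile(r"\bmtu\s+(\d+)\b")


def parse_mtu(ip_output: str) -> Optional[int]:
    """Return the MTU found in `ip link show` output, or None if absent."""
    match = _MTU_RE.search(ip_output)
    return int(match.group(1)) if match else None


# First line of the step-5 output shown above.
sample = (
    "8: ens3f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1900 qdisc mq "
    "state UP mode DEFAULT group default qlen 1000"
)
print(parse_mtu(sample))  # 1900
```

Comparing the parsed value with the MTU recorded before the policy was applied (1500 here) shows whether the PF was restored.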

Comment 6 zhaozhanqi 2020-07-17 04:32:48 UTC

Re-tested the issue after updating the CRD.

This works well. Moving this bug to 'verified'.

Comment 8 errata-xmlrpc 2020-10-27 16:12:56 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4196
