Description of problem:
The operator always resets the PF MTU to 1500 when the policy is deleted.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1. Set the PF MTU to 5000, e.g.
   $ sudo ip link set dev ens803f0 mtu 5000
2. Deploy the SR-IOV network operator.
3. Apply the following policy:

   apiVersion: sriovnetwork.openshift.io/v1
   kind: SriovNetworkNodePolicy
   metadata:
     name: policy-net-2
   spec:
     resourceName: nic2
     nodeSelector:
       kubernetes.io/hostname: worker-0
       feature.node.kubernetes.io/network-sriov.capable: "true"
     priority: 99
     mtu: 9000
     numVfs: 4
     nicSelector:
       pfNames: ['ens803f0#0-0']
     isRdma: false

4. Remove the policy.

Actual results:
The MTU of ens803f0 was reset to 1500.

Expected results:
The MTU of ens803f0 is reset to 5000.

Additional info:
It seems the PF MTU is not reset to its original value when the policy is deleted, e.g.:

1. Set the MTU to 8800 via `ip link set dev ens1f0 mtu 8800`.
2. Create a policy with MTU 9200:

   apiVersion: sriovnetwork.openshift.io/v1
   kind: SriovNetworkNodePolicy
   metadata:
     name: intel-netdevice
     namespace: openshift-sriov-network-operator
   spec:
     deviceType: netdevice
     nicSelector:
       pfNames:
         - ens1f0
       rootDevices:
         - '0000:3b:00.0'
       vendor: '8086'
     nodeSelector:
       feature.node.kubernetes.io/sriov-capable: 'true'
     numVfs: 5
     mtu: 9200
     priority: 99
     resourceName: intelnetdevice

3. Check that the PF and VF MTUs are 9200.
4. Delete the above policy.
5. Check the MTU of the PF: the value is still 9200. The expected value is 8800.
(In reply to zhaozhanqi from comment #3)
> Seems the MTU of PF did not be reset the original value when the policy is
> deleted.

The current fix records the MTU value when the config daemon starts, and resets the PF to that value when the policy applied to that device is deleted.

>
> eg.
> 1. set the MTU to 8800 via `ip link set dev ens1f0 mtu 8800`

Was the SR-IOV Operator installed before the MTU was set to 8800? If yes, what was the MTU of ens1f0 when the SR-IOV Operator was installed? Was it 9200?

> 2. Create one network with MTU is 9200
> apiVersion: sriovnetwork.openshift.io/v1
> kind: SriovNetworkNodePolicy
> metadata:
>   name: intel-netdevice
>   namespace: openshift-sriov-network-operator
> spec:
>   deviceType: netdevice
>   nicSelector:
>     pfNames:
>       - ens1f0
>     rootDevices:
>       - '0000:3b:00.0'
>     vendor: '8086'
>   nodeSelector:
>     feature.node.kubernetes.io/sriov-capable: 'true'
>   numVfs: 5
>   mtu: 9200
>   priority: 99
>   resourceName: intelnetdevice
>
> 3. Check the PF and VF mtu are 9200.
>
> 4. Delete the above policy
>
> 5. Check the MTU of PF, the value still 9200. expected value is 8800.

This may be because the recorded PF MTU was 9200, so the PF was reset to 9200.
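The record-and-restore behaviour described in this comment can be sketched as below. This is only an illustration of the idea, not the operator's actual code: `mtuStore`, `tracker`, `newTracker`, `applyPolicy` and `deletePolicy` are hypothetical names, and an in-memory map stands in for the real interfaces, where a daemon would go through netlink.

```go
package main

import "fmt"

// mtuStore is a hypothetical stand-in for the host's PFs (name -> MTU).
// A real daemon would read and write the MTU via netlink instead.
type mtuStore map[string]int

// tracker remembers the MTU each PF had when the daemon started.
type tracker struct {
	links    mtuStore
	original map[string]int
}

// newTracker snapshots the current MTU of every known PF, mirroring
// "records the MTU value when the config daemon starts".
func newTracker(links mtuStore) *tracker {
	orig := make(map[string]int, len(links))
	for name, mtu := range links {
		orig[name] = mtu
	}
	return &tracker{links: links, original: orig}
}

// applyPolicy sets the MTU requested by a policy on the PF.
func (t *tracker) applyPolicy(pf string, mtu int) {
	t.links[pf] = mtu
}

// deletePolicy restores the MTU recorded at startup, if one was recorded.
func (t *tracker) deletePolicy(pf string) {
	if mtu, ok := t.original[pf]; ok {
		t.links[pf] = mtu
	}
}

func main() {
	// The PF was set to 5000 before the daemon started.
	links := mtuStore{"ens803f0": 5000}
	t := newTracker(links)

	t.applyPolicy("ens803f0", 9000)
	t.deletePolicy("ens803f0")
	fmt.Println(links["ens803f0"]) // prints 5000
}
```

Under this model the restored value is whatever the snapshot captured. If the daemon starts (or restarts) while the PF is already at 9200, the snapshot holds 9200 and deleting the policy leaves the MTU at 9200, which matches the explanation suggested above.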
Let's take a new PF whose MTU is 1500 by default as an example:

1. Confirm the default MTU is 1500:

   # ip a show ens3f0
   8: ens3f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
       link/ether 98:03:9b:97:21:be brd ff:ff:ff:ff:ff:ff

2. Create the policy with MTU 1900:

   apiVersion: sriovnetwork.openshift.io/v1
   kind: SriovNetworkNodePolicy
   metadata:
     name: mlx278-netdevice
     namespace: openshift-sriov-network-operator
   spec:
     mtu: 1900
     nicSelector:
       pfNames:
         - ens3f0
       rootDevices:
         - '0000:5e:00.0'
       vendor: '15b3'
     nodeSelector:
       feature.node.kubernetes.io/sriov-capable: 'true'
     numVfs: 1
     resourceName: mlx278netdevice

3. Check that the PF and VF MTUs are 1900:

   # ip a show ens3f0
   8: ens3f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1900 qdisc mq state UP group default qlen 1000
       link/ether 98:03:9b:97:21:be brd ff:ff:ff:ff:ff:ff

   - Vfs:
     - deviceID: "1018"
       driver: mlx5_core
       mac: 1e:fa:89:df:f4:dc
       mtu: 1900
       name: ens3f0v0
       pciAddress: 0000:5e:00.2
       vendor: 15b3
       vfID: 0
     deviceID: "1017"
     driver: mlx5_core
     linkSpeed: 40000 Mb/s
     mac: 98:03:9b:97:21:be
     mtu: 1900
     name: ens3f0
     numVfs: 1
     pciAddress: 0000:5e:00.0
     totalvfs: 1
     vendor: 15b3

4. Delete the policy.

5. Check the MTU of the PF: it is still 1900, but the expected value is 1500:

   # ip link show ens3f0
   8: ens3f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1900 qdisc mq state UP mode DEFAULT group default qlen 1000
       link/ether 98:03:9b:97:21:be brd ff:ff:ff:ff:ff:ff
Retested the issue after updating the CRD; it works well. Moving this bug to 'verified'.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:4196