Bug 1926279 - Pod ignores mtu setting from sriovNetworkNodePolicies in case of PF partitioning
Summary: Pod ignores mtu setting from sriovNetworkNodePolicies in case of PF partitioning
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.7
Hardware: Unspecified
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 4.8.0
Assignee: Peng Liu
QA Contact: zhaozhanqi
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2021-02-08 14:53 UTC by Nikita
Modified: 2021-07-27 22:42 UTC
CC: 2 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-07-27 22:42:10 UTC
Target Upstream Version:
Embargoed:




Links
Github openshift/sriov-network-operator pull 484 (open): "Bug 1926279: Sync upstream 2021-3-15" (last updated 2021-03-15 13:48:55 UTC)
Github openshift/sriov-network-operator pull 485 (open): "Bug 1926279: Sync OLM bundle manifests" (last updated 2021-03-22 09:22:54 UTC)
Red Hat Product Errata RHSA-2021:2438 (last updated 2021-07-27 22:42:37 UTC)

Description Nikita 2021-02-08 14:53:00 UTC
Description of problem:

If PF partitioning is configured, the VF in a pod inherits its MTU from the PF instead of using the MTU from the matching SriovNetworkNodePolicy.

Current behavior:
The SR-IOV operator configures the PF MTU using the largest value across all policies that share the PF; that much is expected. However, the operator then leaves that largest MTU on every VF handed to pods and ignores the corresponding values from the individual SriovNetworkNodePolicies. For example, with the two policies attached below (MTU 9000 and MTU 1450 partitioning the same PF), the PF and every VF end up at MTU 9000.

Issue:
The SR-IOV operator honors only the largest SriovNetworkNodePolicy.spec.mtu configured by the admin and ignores all the others. This creates a configuration conflict between policies: SriovNetworkNodePolicy A (MTU 9000), configured for user A in namespace A, changes the MTU in the environment of user B in namespace B, despite SriovNetworkNodePolicy B specifying MTU 1500.

Possible solution:
It is correct to configure the PF with the largest MTU value found across the SriovNetworkNodePolicies, but the CNI plugin should then apply the MTU from the relevant policy to the pod interface instead of letting it inherit the PF MTU. That way applications inside the pod see the configured MTU. It is of course possible to overwrite this value manually inside a pod (ip link set dev X mtu Y), but only if the user has enough permissions; a sketch of that workaround follows.
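
A minimal sketch of the manual workaround just mentioned, using the pod, namespace, interface, and MTU values from this report (it assumes the pod's security context grants NET_ADMIN, which ordinary tenant pods usually lack):

# Force the policy MTU onto the VF inside the pod (a workaround, not a fix).
oc exec -n sriov-operator-tests testpod-vkgff -- ip link set dev net1 mtu 1450

# The interface should now report the policy MTU instead of the PF MTU.
oc exec -n sriov-operator-tests testpod-vkgff -- ip link show net1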


Version-Release number of selected component (if applicable):

4.7.0-fc.4

Sriov Operator: 
Image: registry.redhat.io/openshift4/ose-sriov-network-operator@sha256:569327e96d23fa53360ade21559797c9ce27cbb61b0228bc306cafbb7897b588
Image ID: registry.redhat.io/openshift4/ose-sriov-network-operator@sha256:383428eaee9e59e138925372a1e2029b2af4e7c1ff1841b983de804c6359e40e



How reproducible:
Always

Steps to Reproduce:
1. Create two SriovNetworkNodePolicies partitioning one PF: one with jumbo MTU (9000) and one with standard MTU (1500).
2. Create a matching SriovNetwork for each policy.
3. Create a pod attached to the standard-MTU network.
4. Run "ip link show" in the pod and check the MTU: it is 9000 instead of 1500 (a quick check follows below).
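
A quick check for step 4, assuming the pod and interface names that appear later in this report:

oc exec -n sriov-operator-tests testpod-vkgff -- ip link show net1
# Expected: the MTU of the standard-MTU policy; actual: mtu 9000 inherited from the PF.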


Actual results:
ip link show:
3585: net1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether 20:04:0f:f1:88:03 brd ff:ff:ff:ff:ff:ff

Expected results:
ip link show:
3585: net1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether 20:04:0f:f1:88:03 brd ff:ff:ff:ff:ff:ff

Additional info:
The configuration from my environment is attached below.

The first policy supports jumbo frames (MTU 9000):
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  creationTimestamp: "2021-02-07T15:04:51Z"
  generateName: test-policy-jumbo
  generation: 1
  managedFields:
  - apiVersion: sriovnetwork.openshift.io/v1
    fieldsType: FieldsV1
    fieldsV1:
      f:metadata:
        f:generateName: {}
      f:spec:
        .: {}
        f:deviceType: {}
        f:mtu: {}
        f:nicSelector:
          .: {}
          f:pfNames: {}
        f:nodeSelector:
          .: {}
          f:node-role.kubernetes.io/worker-cnf: {}
        f:numVfs: {}
        f:priority: {}
        f:resourceName: {}
      f:status: {}
    manager: sriov.test
    operation: Update
    time: "2021-02-07T15:04:51Z"
  name: test-policy-jumbohq9jq
  namespace: openshift-sriov-network-operator
  resourceVersion: "9425117"
  selfLink: /apis/sriovnetwork.openshift.io/v1/namespaces/openshift-sriov-network-operator/sriovnetworknodepolicies/test-policy-jumbohq9jq
  uid: eaa3b055-11fc-4ea5-9f7f-4a43428b820a
spec:
  deviceType: netdevice
  isRdma: false
  linkType: eth
  mtu: 9000
  nicSelector:
    pfNames:
    - ens3f0#4-5
  nodeSelector:
    node-role.kubernetes.io/worker-cnf: ""
  numVfs: 6
  priority: 99
  resourceName: testresourcejumbo
The second policy supports a custom MTU of 1450:

apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  creationTimestamp: "2021-02-07T15:04:51Z"
  generateName: test-policy-custom
  generation: 1
  managedFields:
  - apiVersion: sriovnetwork.openshift.io/v1
    fieldsType: FieldsV1
    fieldsV1:
      f:metadata:
        f:generateName: {}
      f:spec:
        .: {}
        f:deviceType: {}
        f:mtu: {}
        f:nicSelector:
          .: {}
          f:pfNames: {}
        f:nodeSelector:
          .: {}
          f:node-role.kubernetes.io/worker-cnf: {}
        f:numVfs: {}
        f:priority: {}
        f:resourceName: {}
      f:status: {}
    manager: sriov.test
    operation: Update
    time: "2021-02-07T15:04:51Z"
  name: test-policy-customn87pg
  namespace: openshift-sriov-network-operator
  resourceVersion: "9425101"
  selfLink: /apis/sriovnetwork.openshift.io/v1/namespaces/openshift-sriov-network-operator/sriovnetworknodepolicies/test-policy-customn87pg
  uid: 1d671d04-c454-402c-926c-d48316cdaf7c
spec:
  deviceType: netdevice
  isRdma: false
  linkType: eth
  mtu: 1450
  nicSelector:
    pfNames:
    - ens3f0#2-3
  nodeSelector:
    node-role.kubernetes.io/worker-cnf: ""
  numVfs: 6
  priority: 99
  resourceName: testresourcecustom
Pod connected to the 1450-MTU policy:
oc get pod testpod-vkgff -o yaml -n sriov-operator-tests
apiVersion: v1
kind: Pod
metadata:
  annotations:
    k8s.ovn.org/pod-networks: '{"default":{"ip_addresses":["10.135.1.84/23"],"mac_address":"0a:58:0a:87:01:54","gateway_ips":["10.135.0.1"],"ip_address":"10.135.1.84/23","gateway_ip":"10.135.0.1"}}'
    k8s.v1.cni.cncf.io/network-status: |-
      [{
          "name": "",
          "interface": "eth0",
          "ips": [
              "10.135.1.84"
          ],
          "mac": "0a:58:0a:87:01:54",
          "default": true,
          "dns": {}
      },{
          "name": "sriov-operator-tests/test-sriov-static-custom",
          "interface": "net1",
          "ips": [
              "192.168.100.2"
          ],
          "mac": "4a:c7:eb:89:7c:b6",
          "dns": {},
          "device-info": {
              "type": "pci",
              "version": "1.0.0",
              "pci": {
                  "pci-address": "0000:d8:02.2"
              }
          }
      }]
    k8s.v1.cni.cncf.io/networks: "[\n\t\t{\n\t\t\t\"name\": \"test-sriov-static-custom\", \n\t\t\t\"mac\": \"20:04:0f:f1:88:03\",\n\t\t\t\"ips\": [\"192.168.100.2/24\"]\n\t\t}\n\t]"
    k8s.v1.cni.cncf.io/networks-status: |-
      [{
          "name": "",
          "interface": "eth0",
          "ips": [
              "10.135.1.84"
          ],
          "mac": "0a:58:0a:87:01:54",
          "default": true,
          "dns": {}
      },{
          "name": "sriov-operator-tests/test-sriov-static-custom",
          "interface": "net1",
          "ips": [
              "192.168.100.2"
          ],
          "mac": "4a:c7:eb:89:7c:b6",
          "dns": {},
          "device-info": {
              "type": "pci",
              "version": "1.0.0",
              "pci": {
                  "pci-address": "0000:d8:02.2"
              }
          }
      }]
    openshift.io/scc: privileged
  creationTimestamp: "2021-02-07T15:17:18Z"
  generateName: testpod-
  managedFields:
  - apiVersion: v1
    fieldsType: FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          f:k8s.ovn.org/pod-networks: {}
    manager: ovnkube
    operation: Update
    time: "2021-02-07T15:17:18Z"
  - apiVersion: v1
    fieldsType: FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          .: {}
          f:k8s.v1.cni.cncf.io/networks: {}
        f:generateName: {}
      f:spec:
        f:containers:
          k:{"name":"test"}:
            .: {}
            f:command: {}
            f:image: {}
            f:imagePullPolicy: {}
            f:name: {}
            f:resources:
              .: {}
              f:limits:
                f:openshift.io/testresourcecustom: {}
              f:requests:
                f:openshift.io/testresourcecustom: {}
            f:terminationMessagePath: {}
            f:terminationMessagePolicy: {}
        f:dnsPolicy: {}
        f:enableServiceLinks: {}
        f:nodeSelector:
          .: {}
          f:kubernetes.io/hostname: {}
        f:restartPolicy: {}
        f:schedulerName: {}
        f:securityContext:
          .: {}
          f:seLinuxOptions:
            f:level: {}
        f:terminationGracePeriodSeconds: {}
    manager: sriov.test
    operation: Update
    time: "2021-02-07T15:17:18Z"
  - apiVersion: v1
    fieldsType: FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          f:k8s.v1.cni.cncf.io/network-status: {}
          f:k8s.v1.cni.cncf.io/networks-status: {}
    manager: multus
    operation: Update
    time: "2021-02-07T15:17:20Z"
  - apiVersion: v1
    fieldsType: FieldsV1
    fieldsV1:
      f:status:
        f:conditions:
          k:{"type":"ContainersReady"}:
            .: {}
            f:lastProbeTime: {}
            f:lastTransitionTime: {}
            f:status: {}
            f:type: {}
          k:{"type":"Initialized"}:
            .: {}
            f:lastProbeTime: {}
            f:lastTransitionTime: {}
            f:status: {}
            f:type: {}
          k:{"type":"Ready"}:
            .: {}
            f:lastProbeTime: {}
            f:lastTransitionTime: {}
            f:status: {}
            f:type: {}
        f:containerStatuses: {}
        f:hostIP: {}
        f:phase: {}
        f:podIP: {}
        f:podIPs:
          .: {}
          k:{"ip":"10.135.1.84"}:
            .: {}
            f:ip: {}
        f:startTime: {}
    manager: kubelet
    operation: Update
    time: "2021-02-07T15:17:23Z"
  name: testpod-vkgff
  namespace: sriov-operator-tests
  resourceVersion: "9431978"
  selfLink: /api/v1/namespaces/sriov-operator-tests/pods/testpod-vkgff
  uid: be22980a-768f-4e37-8de7-fe4401d86e6c
spec:
  containers:
  - command:
    - sleep
    - INF
    image: docker-registry.upshift.redhat.com/cnf-gotests/cnf-gotests-client:v4.7
    imagePullPolicy: IfNotPresent
    name: test
    resources:
      limits:
        openshift.io/testresourcecustom: "1"
      requests:
        openshift.io/testresourcecustom: "1"
    securityContext:
      capabilities:
        drop:
        - MKNOD
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: default-token-d2f8r
      readOnly: true
    - mountPath: /etc/podnetinfo
      name: podnetinfo
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  imagePullSecrets:
  - name: default-dockercfg-cxrgv
  nodeName: cnfdt13.lab.eng.tlv2.redhat.com
  nodeSelector:
    kubernetes.io/hostname: cnfdt13.lab.eng.tlv2.redhat.com
  preemptionPolicy: PreemptLowerPriority
  priority: 0
  restartPolicy: Always
  schedulerName: default-scheduler
  securityContext:
    seLinuxOptions:
      level: s0:c26,c15
  serviceAccount: default
  serviceAccountName: default
  terminationGracePeriodSeconds: 0
  tolerations:
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 300
  volumes:
  - name: default-token-d2f8r
    secret:
      defaultMode: 420
      secretName: default-token-d2f8r
  - downwardAPI:
      defaultMode: 420
      items:
      - fieldRef:
          apiVersion: v1
          fieldPath: metadata.labels
        path: labels
      - fieldRef:
          apiVersion: v1
          fieldPath: metadata.annotations
        path: annotations
    name: podnetinfo
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2021-02-07T15:17:18Z"
    status: "True"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: "2021-02-07T15:17:23Z"
    status: "True"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: "2021-02-07T15:17:23Z"
    status: "True"
    type: ContainersReady
  - lastProbeTime: null
    lastTransitionTime: "2021-02-07T15:17:18Z"
    status: "True"
    type: PodScheduled
  containerStatuses:
  - containerID: cri-o://5ea5884825a5eb8bce9b2d6568564be6ca93a03ed43098f9a630704bf683b9ef
    image: docker-registry.upshift.redhat.com/cnf-gotests/cnf-gotests-client:v4.7
    imageID: docker-registry.upshift.redhat.com/cnf-gotests/cnf-gotests-client@sha256:ec0d2e9591e0f124be4e60a80b24ce17b552b69a5c43b638ca959c6e89eb0029
    lastState: {}
    name: test
    ready: true
    restartCount: 0
    started: true
    state:
      running:
        startedAt: "2021-02-07T15:17:22Z"
  hostIP: 10.46.55.27
  phase: Running
  podIP: 10.135.1.84
  podIPs:
  - ip: 10.135.1.84
  qosClass: BestEffort
  startTime: "2021-02-07T15:17:18Z"
Output from the pod, ip link show:
3585: net1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether 20:04:0f:f1:88:03 brd ff:ff:ff:ff:ff:ff
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
3: eth0@if3614: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1400 qdisc noqueue state UP mode DEFAULT group default
    link/ether 0a:58:0a:87:01:54 brd ff:ff:ff:ff:ff:ff link-netnsid 0

Comment 2 zhaozhanqi 2021-03-29 10:07:10 UTC
Verified this bug on 4.8.0-202103270026.p0 using the manifests below:

# cat *
apiVersion: v1
kind: Pod
metadata:
  generateName: testpod1
  labels:
    env: test
  annotations:
    k8s.v1.cni.cncf.io/networks: intel-netdevice-1400
spec:
  containers:
  - name: test-pod
    image: quay.io/openshifttest/hello-sdn@sha256:d5785550cf77b7932b090fcd1a2625472912fb3189d5973f177a5a2c347a1f95
---
apiVersion: v1
kind: Pod
metadata:
  generateName: testpod1
  namespace: z1
  labels:
    env: test
  annotations:
    k8s.v1.cni.cncf.io/networks: intel-netdevice-9000
spec:
  containers:
  - name: test-pod
    image: quay.io/openshifttest/hello-sdn@sha256:d5785550cf77b7932b090fcd1a2625472912fb3189d5973f177a5a2c347a1f95
---
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: intel-netdevice-mtu1400
  namespace: openshift-sriov-network-operator
spec:
  deviceType: netdevice
  nicSelector:
    pfNames:
      - ens1f0#3-4
    rootDevices:
      - '0000:3b:00.0'
    vendor: '8086'
  nodeSelector:
    feature.node.kubernetes.io/sriov-capable: 'true'
  priority: 99
  numVfs: 5
  mtu: 1400
  resourceName: intelmtu1400
---
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: intel-netdevice-mtu900
  namespace: openshift-sriov-network-operator
spec:
  deviceType: netdevice
  nicSelector:
    pfNames:
      - ens1f0#1-2
    rootDevices:
      - '0000:3b:00.0'
    vendor: '8086'
  nodeSelector:
    feature.node.kubernetes.io/sriov-capable: 'true'
  priority: 99
  numVfs: 5
  mtu: 9000
  resourceName: intelmtu9000
---
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetwork
metadata:
  name: intel-netdevice-1400
  namespace: openshift-sriov-network-operator
spec:
  ipam: |
    {
      "type": "host-local",
      "subnet": "10.56.215.0/24",
      "rangeStart": "10.56.215.171",
      "rangeEnd": "10.56.215.181",
      "routes": [{
        "dst": "0.0.0.0/0"
      }],
      "gateway": "10.56.215.1"
    }
  vlan: 0
  spoofChk: "on"
  trust: "off"
  resourceName: intelmtu1400
  networkNamespace: z2
---
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetwork
metadata:
  name: intel-netdevice-9000
  namespace: openshift-sriov-network-operator
spec:
  ipam: |
    {
      "type": "host-local",
      "subnet": "10.56.217.0/24",
      "rangeStart": "10.56.217.171",
      "rangeEnd": "10.56.217.181",
      "routes": [{
        "dst": "0.0.0.0/0"
      }],
      "gateway": "10.56.217.1"
    }
  vlan: 0
  spoofChk: "on"
  trust: "off"
  resourceName: intelmtu9000
  networkNamespace: z1



# oc exec -n z1 testpod1zzm5r -- ip a show net1
2487: net1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc mq state UP group default qlen 1000
    link/ether 9a:1c:8c:34:e3:ed brd ff:ff:ff:ff:ff:ff
    inet 10.56.217.171/24 brd 10.56.217.255 scope global net1
       valid_lft forever preferred_lft forever
    inet6 fe80::981c:8cff:fe34:e3ed/64 scope link 
       valid_lft forever preferred_lft forever



oc exec -n z2 testpod1pjqrr -- ip a show net1
2490: net1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1400 qdisc mq state UP group default qlen 1000
    link/ether ee:cb:17:de:50:3b brd ff:ff:ff:ff:ff:ff
    inet 10.56.215.171/24 brd 10.56.215.255 scope global net1
       valid_lft forever preferred_lft forever
    inet6 fe80::eccb:17ff:fede:503b/64 scope link 
       valid_lft forever preferred_lft forever

Comment 3 zhaozhanqi 2021-03-29 10:07:55 UTC
Verified this bug on 4.8.0-202103270026.p0.

Comment 6 errata-xmlrpc 2021-07-27 22:42:10 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2438

