---------------------------------------------------------------

# Installing OpenShift Operators directly from GitHub

Here's how I installed the operator from upstream (**not supported**).

Let's install an Operator directly from GitHub. As an example, let's use the SR-IOV Operator: [https://github.com/openshift/sriov-network-operator/blob/master/doc/quickstart.md](https://github.com/openshift/sriov-network-operator/blob/master/doc/quickstart.md)

I installed OpenShift 4.6.1, and my jump server runs RHEL 8.2:

~~~
[root@openshift-jumpserver-0 ~]# oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.6.1     True        False         17h     Cluster version is 4.6.1
~~~

## Prerequisites

### Setting up go

~~~
yum install go -y
~~~

Update your PATH and GOPATH:

~~~
[root@openshift-jumpserver-0 ~]# cat ~/.bash_profile
# .bash_profile

# Get the aliases and functions
if [ -f ~/.bashrc ]; then
        . ~/.bashrc
fi

# User specific environment and startup programs

PATH=$PATH:$HOME/bin
export GOPATH=$HOME/golang
export PATH=$PATH:/usr/local/go/bin:$GOPATH/bin
[root@openshift-jumpserver-0 ~]# mkdir /root/golang
~~~

Log back into the system.
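To confirm the environment is wired up before building, here is a minimal sketch (the paths are the ones assumed above — `/usr/local/go` for the toolchain and `$HOME/golang` as the workspace; adjust for your system):

```shell
# Recreate the Go workspace layout from the profile above and verify
# that the PATH additions took effect.
export GOPATH="$HOME/golang"
export PATH="$PATH:/usr/local/go/bin:$GOPATH/bin"
mkdir -p "$GOPATH/src" "$GOPATH/bin"
# Both Go-related directories should now appear as PATH entries:
echo "$PATH" | tr ':' '\n' | grep -E '(/usr/local/go/bin|golang/bin)$'
```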
## Building the operator from source

Pull the operator:

~~~
go get github.com/openshift/sriov-network-operator
~~~

Once that returns:

~~~
cd $GOPATH/src/github.com/openshift/sriov-network-operator
~~~

Fix [https://github.com/openshift/sriov-network-operator/issues/383](https://github.com/openshift/sriov-network-operator/issues/383)

Then, run:

~~~
make deploy-setup
~~~

To uninstall again:

~~~
make uninstall
~~~

----------------------------------------------------

After I installed the upstream SR-IOV operator, here's what my nodes look like:

~~~
[root@openshift-jumpserver-0 ~]# oc get nodes
NAME                 STATUS   ROLES    AGE   VERSION
openshift-master-0   Ready    master   18h   v1.19.0+d59ce34
openshift-master-1   Ready    master   18h   v1.19.0+d59ce34
openshift-master-2   Ready    master   18h   v1.19.0+d59ce34
openshift-worker-0   Ready    worker   17h   v1.19.0+d59ce34
openshift-worker-1   Ready    worker   17h   v1.19.0+d59ce34
[root@openshift-jumpserver-0 ~]# oc get nodes openshift-worker-0 -o yaml | grep openshift.io
    machineconfiguration.openshift.io/currentConfig: rendered-worker-2f51e2b12f6daef29d5255f032ff70d0
    machineconfiguration.openshift.io/desiredConfig: rendered-worker-2f51e2b12f6daef29d5255f032ff70d0
    machineconfiguration.openshift.io/reason: ""
    machineconfiguration.openshift.io/state: Done
    sriovnetwork.openshift.io/state: Idle
    node.openshift.io/os_id: rhcos
          f:machineconfiguration.openshift.io/currentConfig: {}
          f:machineconfiguration.openshift.io/desiredConfig: {}
          f:machineconfiguration.openshift.io/reason: {}
          f:machineconfiguration.openshift.io/state: {}
          f:sriovnetwork.openshift.io/state: {}
          f:node.openshift.io/os_id: {}
[root@openshift-jumpserver-0 ~]# oc get nodes openshift-worker-1 -o yaml | grep openshift.io
    machineconfiguration.openshift.io/currentConfig: rendered-worker-2f51e2b12f6daef29d5255f032ff70d0
    machineconfiguration.openshift.io/desiredConfig: rendered-worker-2f51e2b12f6daef29d5255f032ff70d0
    machineconfiguration.openshift.io/reason: ""
    machineconfiguration.openshift.io/state: Done
    sriovnetwork.openshift.io/state: Idle
    node.openshift.io/os_id: rhcos
          f:machineconfiguration.openshift.io/currentConfig: {}
          f:machineconfiguration.openshift.io/desiredConfig: {}
          f:machineconfiguration.openshift.io/reason: {}
          f:machineconfiguration.openshift.io/state: {}
          f:sriovnetwork.openshift.io/state: {}
          f:node.openshift.io/os_id: {}
~~~

---------------------------------------------------------------

# Configuring `SriovNetworkNodePolicy` with `nicSelector.pfNames` on worker-0

Now, I'm installing my `SriovNetworkNodePolicy`. Here's my definition:

~~~
[root@openshift-jumpserver-0 ~]# cat networkpolicy-netdevice.yaml
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: sriov-enp5s0f0-netdevice
  namespace: openshift-sriov-network-operator
spec:
  resourceName: enp5s0f0Netdev
  nodeSelector:
    kubernetes.io/hostname: openshift-worker-0
  priority: 10
  mtu: 1500
  numVfs: 5
  nicSelector:
    pfNames: ["enp5s0f0"]
  deviceType: "netdevice"
  isRdma: false
[root@openshift-jumpserver-0 ~]# oc apply -f networkpolicy-netdevice.yaml
sriovnetworknodepolicy.sriovnetwork.openshift.io/sriov-enp5s0f0-netdevice created
~~~

After 5 minutes:

~~~
[root@openshift-jumpserver-0 ~]# sleep 300 ; echo === worker0 ===; oc get nodes openshift-worker-0 -o yaml | grep -i openshift.io ; echo === worker1 === ; oc get nodes openshift-worker-1 -o yaml | grep -i openshift.io
=== worker0 ===
    machineconfiguration.openshift.io/currentConfig: rendered-worker-2f51e2b12f6daef29d5255f032ff70d0
    machineconfiguration.openshift.io/desiredConfig: rendered-worker-2f51e2b12f6daef29d5255f032ff70d0
    machineconfiguration.openshift.io/reason: ""
    machineconfiguration.openshift.io/state: Done
    sriovnetwork.openshift.io/state: Idle
    node.openshift.io/os_id: rhcos
          f:machineconfiguration.openshift.io/currentConfig: {}
          f:machineconfiguration.openshift.io/desiredConfig: {}
          f:machineconfiguration.openshift.io/reason: {}
          f:machineconfiguration.openshift.io/state: {}
          f:sriovnetwork.openshift.io/state: {}
          f:node.openshift.io/os_id: {}
          f:openshift.io/enp5s0f0Netdev: {}
          f:openshift.io/enp5s0f0Netdev: {}
    openshift.io/enp5s0f0Netdev: "5"
    openshift.io/enp5s0f0Netdev: "5"
=== worker1 ===
    machineconfiguration.openshift.io/currentConfig: rendered-worker-2f51e2b12f6daef29d5255f032ff70d0
    machineconfiguration.openshift.io/desiredConfig: rendered-worker-2f51e2b12f6daef29d5255f032ff70d0
    machineconfiguration.openshift.io/reason: ""
    machineconfiguration.openshift.io/state: Done
    sriovnetwork.openshift.io/state: Idle
    node.openshift.io/os_id: rhcos
          f:machineconfiguration.openshift.io/currentConfig: {}
          f:machineconfiguration.openshift.io/desiredConfig: {}
          f:machineconfiguration.openshift.io/reason: {}
          f:machineconfiguration.openshift.io/state: {}
          f:sriovnetwork.openshift.io/state: {}
          f:node.openshift.io/os_id: {}
~~~

---------------------------------------------------------------

# Configuring `SriovNetworkNodePolicy` with `nicSelector.pfNames` on worker-1

~~~
[root@openshift-jumpserver-0 ~]# cat networkpolicy-netdevice2.yaml
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: sriov-enp5s0f1-netdevice
  namespace: openshift-sriov-network-operator
spec:
  resourceName: enp5s0f1Netdev
  nodeSelector:
    kubernetes.io/hostname: openshift-worker-1
  priority: 10
  mtu: 1500
  numVfs: 6
  nicSelector:
    pfNames: ["enp5s0f1"]
  deviceType: "netdevice"
  isRdma: false
[root@openshift-jumpserver-0 ~]# oc apply -f networkpolicy-netdevice2.yaml
~~~

After 5 minutes:

~~~
[root@openshift-jumpserver-0 ~]# sleep 300 ; echo === worker0 ===; oc get nodes openshift-worker-0 -o yaml | grep -i openshift.io ; echo === worker1 === ; oc get nodes openshift-worker-1 -o yaml | grep -i openshift.io
=== worker0 ===
    machineconfiguration.openshift.io/currentConfig: rendered-worker-2f51e2b12f6daef29d5255f032ff70d0
    machineconfiguration.openshift.io/desiredConfig: rendered-worker-2f51e2b12f6daef29d5255f032ff70d0
    machineconfiguration.openshift.io/reason: ""
    machineconfiguration.openshift.io/state: Done
    sriovnetwork.openshift.io/state: Idle
    node.openshift.io/os_id: rhcos
          f:machineconfiguration.openshift.io/currentConfig: {}
          f:machineconfiguration.openshift.io/desiredConfig: {}
          f:machineconfiguration.openshift.io/reason: {}
          f:machineconfiguration.openshift.io/state: {}
          f:sriovnetwork.openshift.io/state: {}
          f:node.openshift.io/os_id: {}
          f:openshift.io/enp5s0f0Netdev: {}
          f:openshift.io/enp5s0f1Netdev: {}
          f:openshift.io/enp5s0f0Netdev: {}
          f:openshift.io/enp5s0f1Netdev: {}
    openshift.io/enp5s0f0Netdev: "5"
    openshift.io/enp5s0f1Netdev: "1"
    openshift.io/enp5s0f0Netdev: "5"
    openshift.io/enp5s0f1Netdev: "1"
=== worker1 ===
    machineconfiguration.openshift.io/currentConfig: rendered-worker-2f51e2b12f6daef29d5255f032ff70d0
    machineconfiguration.openshift.io/desiredConfig: rendered-worker-2f51e2b12f6daef29d5255f032ff70d0
    machineconfiguration.openshift.io/reason: ""
    machineconfiguration.openshift.io/state: Done
    sriovnetwork.openshift.io/state: Idle
    node.openshift.io/os_id: rhcos
          f:machineconfiguration.openshift.io/currentConfig: {}
          f:machineconfiguration.openshift.io/desiredConfig: {}
          f:machineconfiguration.openshift.io/reason: {}
          f:machineconfiguration.openshift.io/state: {}
          f:sriovnetwork.openshift.io/state: {}
          f:node.openshift.io/os_id: {}
          f:openshift.io/enp5s0f0Netdev: {}
          f:openshift.io/enp5s0f1Netdev: {}
          f:openshift.io/enp5s0f0Netdev: {}
          f:openshift.io/enp5s0f1Netdev: {}
    openshift.io/enp5s0f0Netdev: "1"
    openshift.io/enp5s0f1Netdev: "6"
    openshift.io/enp5s0f0Netdev: "1"
    openshift.io/enp5s0f1Netdev: "6"
~~~

You are right: there is indeed an upstream bug in the reporting here. Each policy targets exactly one node, yet both workers advertise both resources with non-zero counts.
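A quick way to spot the bogus entries is to compare a node's advertised SR-IOV resources against the single resource its policy requested. This is a hedged sketch: the sample file below simply mirrors worker-0's annotations from the output above; in practice you would feed it from `oc get node openshift-worker-0 -o yaml | grep 'openshift.io/.*Netdev'`:

```shell
# Sample data mirroring worker-0's resource annotations above.
cat > /tmp/worker0-resources.txt <<'EOF'
openshift.io/enp5s0f0Netdev: "5"
openshift.io/enp5s0f1Netdev: "1"
EOF

# worker-0's policy only defined enp5s0f0Netdev, so any other resource
# reported with a non-zero count is spurious:
grep -v 'enp5s0f0Netdev' /tmp/worker0-resources.txt | grep -v ': "0"'
```

This prints `openshift.io/enp5s0f1Netdev: "1"` — the resource that should not be on worker-0 at all.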
But on the worker nodes, I do see the correct configuration:

~~~
[core@openshift-worker-0 ~]$ ip link ls dev enp5s0f0
8: enp5s0f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether a0:36:9f:e5:e2:c0 brd ff:ff:ff:ff:ff:ff
    vf 0     link/ether 96:83:66:9b:44:50 brd ff:ff:ff:ff:ff:ff, spoof checking on, link-state auto, trust off, query_rss off
    vf 1     link/ether ee:44:7b:d2:0b:52 brd ff:ff:ff:ff:ff:ff, spoof checking on, link-state auto, trust off, query_rss off
    vf 2     link/ether 72:9e:3b:3e:e9:21 brd ff:ff:ff:ff:ff:ff, spoof checking on, link-state auto, trust off, query_rss off
    vf 3     link/ether 5a:8c:a2:e5:e8:5b brd ff:ff:ff:ff:ff:ff, spoof checking on, link-state auto, trust off, query_rss off
    vf 4     link/ether ba:95:be:d2:fb:b5 brd ff:ff:ff:ff:ff:ff, spoof checking on, link-state auto, trust off, query_rss off
[core@openshift-worker-0 ~]$ ip link ls dev enp5s0f1
9: enp5s0f1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether a0:36:9f:e5:e2:c2 brd ff:ff:ff:ff:ff:ff
~~~

~~~
[root@openshift-worker-1 ~]# ip link ls dev enp5s0f0
8: enp5s0f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether a0:36:9f:e5:df:c0 brd ff:ff:ff:ff:ff:ff
[root@openshift-worker-1 ~]# ip link ls dev enp5s0f1
9: enp5s0f1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether a0:36:9f:e5:df:c2 brd ff:ff:ff:ff:ff:ff
    vf 0     link/ether 5a:1a:fa:8b:27:d9 brd ff:ff:ff:ff:ff:ff, spoof checking on, link-state auto, trust off, query_rss off
    vf 1     link/ether 6e:0d:71:97:61:5e brd ff:ff:ff:ff:ff:ff, spoof checking on, link-state auto, trust off, query_rss off
    vf 2     link/ether 82:ec:7e:ef:df:91 brd ff:ff:ff:ff:ff:ff, spoof checking on, link-state auto, trust off, query_rss off
    vf 3     link/ether 36:45:80:61:65:17 brd ff:ff:ff:ff:ff:ff, spoof checking on, link-state auto, trust off, query_rss off
    vf 4     link/ether 32:af:f6:ef:cf:7c brd ff:ff:ff:ff:ff:ff, spoof checking on, link-state auto, trust off, query_rss off
    vf 5     link/ether 1e:ad:fa:ae:fc:b6 brd ff:ff:ff:ff:ff:ff, spoof checking on, link-state auto, trust off, query_rss off
~~~

---------------------------------------------------------------

# Deleting the `sriov-enp5s0f1-netdevice` SriovNetworkNodePolicy

~~~
[root@openshift-jumpserver-0 ~]# oc delete -f networkpolicy-netdevice2.yaml
sriovnetworknodepolicy.sriovnetwork.openshift.io "sriov-enp5s0f1-netdevice" deleted
~~~

A couple of minutes in:

~~~
[root@openshift-jumpserver-0 ~]# echo === worker0 ===; oc get nodes openshift-worker-0 -o yaml | grep -i openshift.io ; echo === worker1 === ; oc get nodes openshift-worker-1 -o yaml | grep -i openshift.io
=== worker0 ===
    machineconfiguration.openshift.io/currentConfig: rendered-worker-2f51e2b12f6daef29d5255f032ff70d0
    machineconfiguration.openshift.io/desiredConfig: rendered-worker-2f51e2b12f6daef29d5255f032ff70d0
    machineconfiguration.openshift.io/reason: ""
    machineconfiguration.openshift.io/state: Done
    sriovnetwork.openshift.io/state: Idle
    node.openshift.io/os_id: rhcos
          f:machineconfiguration.openshift.io/currentConfig: {}
          f:machineconfiguration.openshift.io/desiredConfig: {}
          f:machineconfiguration.openshift.io/reason: {}
          f:machineconfiguration.openshift.io/state: {}
          f:sriovnetwork.openshift.io/state: {}
          f:node.openshift.io/os_id: {}
          f:openshift.io/enp5s0f0Netdev: {}
          f:openshift.io/enp5s0f1Netdev: {}
          f:openshift.io/enp5s0f0Netdev: {}
          f:openshift.io/enp5s0f1Netdev: {}
    openshift.io/enp5s0f0Netdev: "5"
    openshift.io/enp5s0f1Netdev: "0"
    openshift.io/enp5s0f0Netdev: "5"
    openshift.io/enp5s0f1Netdev: "1"
=== worker1 ===
    machineconfiguration.openshift.io/currentConfig: rendered-worker-2f51e2b12f6daef29d5255f032ff70d0
    machineconfiguration.openshift.io/desiredConfig: rendered-worker-2f51e2b12f6daef29d5255f032ff70d0
    machineconfiguration.openshift.io/reason: ""
    machineconfiguration.openshift.io/state: Done
    sriovnetwork.openshift.io/state: Idle
    node.openshift.io/os_id: rhcos
          f:machineconfiguration.openshift.io/currentConfig: {}
          f:machineconfiguration.openshift.io/desiredConfig: {}
          f:machineconfiguration.openshift.io/reason: {}
          f:machineconfiguration.openshift.io/state: {}
          f:node.openshift.io/os_id: {}
          f:openshift.io/enp5s0f0Netdev: {}
          f:openshift.io/enp5s0f1Netdev: {}
          f:openshift.io/enp5s0f0Netdev: {}
          f:openshift.io/enp5s0f1Netdev: {}
          f:sriovnetwork.openshift.io/state: {}
    openshift.io/enp5s0f0Netdev: "0"
    openshift.io/enp5s0f1Netdev: "0"
    openshift.io/enp5s0f0Netdev: "1"
    openshift.io/enp5s0f1Netdev: "6"
~~~

I tried to "kick" this by deleting the device-plugin pod. However, that makes no difference; only waiting eventually leads to the correct count:

~~~
[root@openshift-jumpserver-0 ~]# oc delete pod sriov-device-plugin-tnw62
pod "sriov-device-plugin-tnw62" deleted
~~~

Just waiting:

~~~
[root@openshift-jumpserver-0 ~]# echo === worker0 ===; oc get nodes openshift-worker-0 -o yaml | grep -i openshift.io ; echo === worker1 === ; oc get nodes openshift-worker-1 -o yaml | grep -i openshift.io
=== worker0 ===
    machineconfiguration.openshift.io/currentConfig: rendered-worker-2f51e2b12f6daef29d5255f032ff70d0
    machineconfiguration.openshift.io/desiredConfig: rendered-worker-2f51e2b12f6daef29d5255f032ff70d0
    machineconfiguration.openshift.io/reason: ""
    machineconfiguration.openshift.io/state: Done
    sriovnetwork.openshift.io/state: Idle
    node.openshift.io/os_id: rhcos
          f:machineconfiguration.openshift.io/currentConfig: {}
          f:machineconfiguration.openshift.io/desiredConfig: {}
          f:machineconfiguration.openshift.io/reason: {}
          f:machineconfiguration.openshift.io/state: {}
          f:sriovnetwork.openshift.io/state: {}
          f:node.openshift.io/os_id: {}
          f:openshift.io/enp5s0f0Netdev: {}
          f:openshift.io/enp5s0f1Netdev: {}
          f:openshift.io/enp5s0f0Netdev: {}
          f:openshift.io/enp5s0f1Netdev: {}
    openshift.io/enp5s0f0Netdev: "5"
    openshift.io/enp5s0f1Netdev: "0"
    openshift.io/enp5s0f0Netdev: "5"
    openshift.io/enp5s0f1Netdev: "0"
=== worker1 ===
    machineconfiguration.openshift.io/currentConfig: rendered-worker-2f51e2b12f6daef29d5255f032ff70d0
    machineconfiguration.openshift.io/desiredConfig: rendered-worker-2f51e2b12f6daef29d5255f032ff70d0
    machineconfiguration.openshift.io/reason: ""
    machineconfiguration.openshift.io/state: Done
    sriovnetwork.openshift.io/state: Idle
    node.openshift.io/os_id: rhcos
          f:machineconfiguration.openshift.io/currentConfig: {}
          f:machineconfiguration.openshift.io/desiredConfig: {}
          f:machineconfiguration.openshift.io/reason: {}
          f:machineconfiguration.openshift.io/state: {}
          f:sriovnetwork.openshift.io/state: {}
          f:node.openshift.io/os_id: {}
          f:openshift.io/enp5s0f0Netdev: {}
          f:openshift.io/enp5s0f1Netdev: {}
          f:openshift.io/enp5s0f0Netdev: {}
          f:openshift.io/enp5s0f1Netdev: {}
    openshift.io/enp5s0f0Netdev: "0"
    openshift.io/enp5s0f1Netdev: "0"
    openshift.io/enp5s0f0Netdev: "0"
    openshift.io/enp5s0f1Netdev: "0"
~~~

---------------------------------------------------------------

# Configuring `SriovNetworkNodePolicy` with `nicSelector.pfNames` on worker-1 - take 2

I just tried this again, to be sure:

~~~
[root@openshift-jumpserver-0 ~]# oc apply -f networkpolicy-netdevice2.yaml
~~~

~~~
[root@openshift-jumpserver-0 ~]# echo === worker0 ===; oc get nodes openshift-worker-0 -o yaml | grep -i openshift.io ; echo === worker1 === ; oc get nodes openshift-worker-1 -o yaml | grep -i openshift.io
=== worker0 ===
    machineconfiguration.openshift.io/currentConfig: rendered-worker-2f51e2b12f6daef29d5255f032ff70d0
    machineconfiguration.openshift.io/desiredConfig: rendered-worker-2f51e2b12f6daef29d5255f032ff70d0
    machineconfiguration.openshift.io/reason: ""
    machineconfiguration.openshift.io/state: Done
    sriovnetwork.openshift.io/state: Idle
    node.openshift.io/os_id: rhcos
          f:machineconfiguration.openshift.io/currentConfig: {}
          f:machineconfiguration.openshift.io/desiredConfig: {}
          f:machineconfiguration.openshift.io/reason: {}
          f:machineconfiguration.openshift.io/state: {}
          f:sriovnetwork.openshift.io/state: {}
          f:node.openshift.io/os_id: {}
          f:openshift.io/enp5s0f0Netdev: {}
          f:openshift.io/enp5s0f1Netdev: {}
          f:openshift.io/enp5s0f0Netdev: {}
          f:openshift.io/enp5s0f1Netdev: {}
    openshift.io/enp5s0f0Netdev: "5"
    openshift.io/enp5s0f1Netdev: "1"
    openshift.io/enp5s0f0Netdev: "5"
    openshift.io/enp5s0f1Netdev: "1"
=== worker1 ===
    machineconfiguration.openshift.io/currentConfig: rendered-worker-2f51e2b12f6daef29d5255f032ff70d0
    machineconfiguration.openshift.io/desiredConfig: rendered-worker-2f51e2b12f6daef29d5255f032ff70d0
    machineconfiguration.openshift.io/reason: ""
    machineconfiguration.openshift.io/state: Done
    sriovnetwork.openshift.io/state: Idle
    node.openshift.io/os_id: rhcos
          f:machineconfiguration.openshift.io/currentConfig: {}
          f:machineconfiguration.openshift.io/desiredConfig: {}
          f:machineconfiguration.openshift.io/reason: {}
          f:machineconfiguration.openshift.io/state: {}
          f:sriovnetwork.openshift.io/state: {}
          f:node.openshift.io/os_id: {}
          f:openshift.io/enp5s0f0Netdev: {}
          f:openshift.io/enp5s0f1Netdev: {}
          f:openshift.io/enp5s0f0Netdev: {}
          f:openshift.io/enp5s0f1Netdev: {}
    openshift.io/enp5s0f0Netdev: "1"
    openshift.io/enp5s0f1Netdev: "6"
    openshift.io/enp5s0f0Netdev: "1"
    openshift.io/enp5s0f1Netdev: "6"
~~~

~~~
[root@openshift-jumpserver-0 ~]# oc get pods -o wide | grep device-plugin
sriov-device-plugin-t8x7n   1/1   Running   0   99s     192.168.123.221   openshift-worker-1   <none>   <none>
sriov-device-plugin-w8jn5   1/1   Running   0   2m50s   192.168.123.220   openshift-worker-0   <none>   <none>
~~~

But deleting the device-plugin pods changes nothing:

~~~
[root@openshift-jumpserver-0 ~]# oc delete pod sriov-device-plugin-t8x7n
pod "sriov-device-plugin-t8x7n" deleted
oc delete pod sriov-device-plugin-w8jn5
[root@openshift-jumpserver-0 ~]# oc delete pod sriov-device-plugin-w8jn5
pod "sriov-device-plugin-w8jn5" deleted
[root@openshift-jumpserver-0 ~]#
[root@openshift-jumpserver-0 ~]#
[root@openshift-jumpserver-0 ~]# echo === worker0 ===; oc get nodes openshift-worker-0 -o yaml | grep -i openshift.io ; echo === worker1 === ; oc get nodes openshift-worker-1 -o yaml | grep -i openshift.io
=== worker0 ===
    machineconfiguration.openshift.io/currentConfig: rendered-worker-2f51e2b12f6daef29d5255f032ff70d0
    machineconfiguration.openshift.io/desiredConfig: rendered-worker-2f51e2b12f6daef29d5255f032ff70d0
    machineconfiguration.openshift.io/reason: ""
    machineconfiguration.openshift.io/state: Done
    sriovnetwork.openshift.io/state: Idle
    node.openshift.io/os_id: rhcos
          f:machineconfiguration.openshift.io/currentConfig: {}
          f:machineconfiguration.openshift.io/desiredConfig: {}
          f:machineconfiguration.openshift.io/reason: {}
          f:machineconfiguration.openshift.io/state: {}
          f:sriovnetwork.openshift.io/state: {}
          f:node.openshift.io/os_id: {}
          f:openshift.io/enp5s0f0Netdev: {}
          f:openshift.io/enp5s0f1Netdev: {}
          f:openshift.io/enp5s0f0Netdev: {}
          f:openshift.io/enp5s0f1Netdev: {}
    openshift.io/enp5s0f0Netdev: "5"
    openshift.io/enp5s0f1Netdev: "1"
    openshift.io/enp5s0f0Netdev: "5"
    openshift.io/enp5s0f1Netdev: "1"
=== worker1 ===
    machineconfiguration.openshift.io/currentConfig: rendered-worker-2f51e2b12f6daef29d5255f032ff70d0
    machineconfiguration.openshift.io/desiredConfig: rendered-worker-2f51e2b12f6daef29d5255f032ff70d0
    machineconfiguration.openshift.io/reason: ""
    machineconfiguration.openshift.io/state: Done
    sriovnetwork.openshift.io/state: Idle
    node.openshift.io/os_id: rhcos
          f:machineconfiguration.openshift.io/currentConfig: {}
          f:machineconfiguration.openshift.io/desiredConfig: {}
          f:machineconfiguration.openshift.io/reason: {}
          f:machineconfiguration.openshift.io/state: {}
          f:sriovnetwork.openshift.io/state: {}
          f:node.openshift.io/os_id: {}
          f:openshift.io/enp5s0f0Netdev: {}
          f:openshift.io/enp5s0f1Netdev: {}
          f:openshift.io/enp5s0f0Netdev: {}
          f:openshift.io/enp5s0f1Netdev: {}
    openshift.io/enp5s0f0Netdev: "1"
    openshift.io/enp5s0f1Netdev: "6"
    openshift.io/enp5s0f0Netdev: "1"
    openshift.io/enp5s0f1Netdev: "6"
~~~

---------------------------------------------------------------

# Deleting the `sriov-enp5s0f1-netdevice` SriovNetworkNodePolicy again

~~~
[root@openshift-jumpserver-0 ~]# oc delete -f networkpolicy-netdevice2.yaml
sriovnetworknodepolicy.sriovnetwork.openshift.io "sriov-enp5s0f1-netdevice" deleted
~~~

After half a minute, deleting the pod again:

~~~
[root@openshift-jumpserver-0 ~]# oc get pods -o wide | grep -i device
sriov-device-plugin-7ss5f   1/1   Running   0   22s   192.168.123.220   openshift-worker-0   <none>   <none>
[root@openshift-jumpserver-0 ~]# oc delete pod sriov-device-plugin-7ss5f
pod "sriov-device-plugin-7ss5f" deleted
~~~

That doesn't help though, so just waiting ... after a few minutes:

~~~
[root@openshift-jumpserver-0 ~]# echo === worker0 ===; oc get nodes openshift-worker-0 -o yaml | grep -i openshift.io ; echo === worker1 === ; oc get nodes openshift-worker-1 -o yaml | grep -i openshift.io
=== worker0 ===
    machineconfiguration.openshift.io/currentConfig: rendered-worker-2f51e2b12f6daef29d5255f032ff70d0
    machineconfiguration.openshift.io/desiredConfig: rendered-worker-2f51e2b12f6daef29d5255f032ff70d0
    machineconfiguration.openshift.io/reason: ""
    machineconfiguration.openshift.io/state: Done
    sriovnetwork.openshift.io/state: Idle
    node.openshift.io/os_id: rhcos
          f:machineconfiguration.openshift.io/currentConfig: {}
          f:machineconfiguration.openshift.io/desiredConfig: {}
          f:machineconfiguration.openshift.io/reason: {}
          f:machineconfiguration.openshift.io/state: {}
          f:sriovnetwork.openshift.io/state: {}
          f:node.openshift.io/os_id: {}
          f:openshift.io/enp5s0f0Netdev: {}
          f:openshift.io/enp5s0f1Netdev: {}
          f:openshift.io/enp5s0f0Netdev: {}
          f:openshift.io/enp5s0f1Netdev: {}
    openshift.io/enp5s0f0Netdev: "5"
    openshift.io/enp5s0f1Netdev: "0"
    openshift.io/enp5s0f0Netdev: "5"
    openshift.io/enp5s0f1Netdev: "0"
=== worker1 ===
    machineconfiguration.openshift.io/currentConfig: rendered-worker-2f51e2b12f6daef29d5255f032ff70d0
    machineconfiguration.openshift.io/desiredConfig: rendered-worker-2f51e2b12f6daef29d5255f032ff70d0
    machineconfiguration.openshift.io/reason: ""
    machineconfiguration.openshift.io/state: Done
    sriovnetwork.openshift.io/state: Idle
    node.openshift.io/os_id: rhcos
          f:machineconfiguration.openshift.io/currentConfig: {}
          f:machineconfiguration.openshift.io/desiredConfig: {}
          f:machineconfiguration.openshift.io/reason: {}
          f:machineconfiguration.openshift.io/state: {}
          f:sriovnetwork.openshift.io/state: {}
          f:node.openshift.io/os_id: {}
          f:openshift.io/enp5s0f0Netdev: {}
          f:openshift.io/enp5s0f1Netdev: {}
          f:openshift.io/enp5s0f0Netdev: {}
          f:openshift.io/enp5s0f1Netdev: {}
    openshift.io/enp5s0f0Netdev: "0"
    openshift.io/enp5s0f1Netdev: "0"
    openshift.io/enp5s0f0Netdev: "0"
    openshift.io/enp5s0f1Netdev: "0"
~~~

---------------------------------------------------------------

# Using `rootDevices` for the `sriov-enp5s0f1-netdevice` SriovNetworkNodePolicy

~~~
[root@openshift-jumpserver-0 ~]# cat networkpolicy-netdevice2.yaml
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: sriov-enp5s0f1-netdevice
  namespace: openshift-sriov-network-operator
spec:
  resourceName: enp5s0f1Netdev
  nodeSelector:
    kubernetes.io/hostname: openshift-worker-1
  priority: 10
  mtu: 1500
  numVfs: 6
  nicSelector:
    rootDevices: ["0000:05:00.1"]
  deviceType: "netdevice"
  isRdma: false
[root@openshift-jumpserver-0 ~]# oc apply -f networkpolicy-netdevice2.yaml
sriovnetworknodepolicy.sriovnetwork.openshift.io/sriov-enp5s0f1-netdevice created
~~~

The reporting is also broken with this selector (even more so):

~~~
[root@openshift-jumpserver-0 ~]# sleep 300 ; echo === worker0 ===; oc get nodes openshift-worker-0 -o yaml | grep -i openshift.io ; echo === worker1 === ; oc get nodes openshift-worker-1 -o yaml | grep -i openshift.io
=== worker0 ===
(...)
    openshift.io/enp5s0f0Netdev: "5"
    openshift.io/enp5s0f1Netdev: "12"
    openshift.io/enp5s0f0Netdev: "5"
    openshift.io/enp5s0f1Netdev: "12"
=== worker1 ===
(...)
    openshift.io/enp5s0f0Netdev: "1"
    openshift.io/enp5s0f1Netdev: "13"
    openshift.io/enp5s0f0Netdev: "1"
    openshift.io/enp5s0f1Netdev: "13"
~~~

But the actual assignments are good on both nodes:

~~~
[root@openshift-worker-0 ~]# ip link ls dev enp5s0f0
8: enp5s0f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether a0:36:9f:e5:e2:c0 brd ff:ff:ff:ff:ff:ff
    vf 0     link/ether 96:83:66:9b:44:50 brd ff:ff:ff:ff:ff:ff, spoof checking on, link-state auto, trust off, query_rss off
    vf 1     link/ether ee:44:7b:d2:0b:52 brd ff:ff:ff:ff:ff:ff, spoof checking on, link-state auto, trust off, query_rss off
    vf 2     link/ether 72:9e:3b:3e:e9:21 brd ff:ff:ff:ff:ff:ff, spoof checking on, link-state auto, trust off, query_rss off
    vf 3     link/ether 5a:8c:a2:e5:e8:5b brd ff:ff:ff:ff:ff:ff, spoof checking on, link-state auto, trust off, query_rss off
    vf 4     link/ether ba:95:be:d2:fb:b5 brd ff:ff:ff:ff:ff:ff, spoof checking on, link-state auto, trust off, query_rss off
[root@openshift-worker-0 ~]# ip link ls dev enp5s0f1
9: enp5s0f1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether a0:36:9f:e5:e2:c2 brd ff:ff:ff:ff:ff:ff
~~~

~~~
[root@openshift-worker-1 ~]# ip link ls dev enp5s0f0
8: enp5s0f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether a0:36:9f:e5:df:c0 brd ff:ff:ff:ff:ff:ff
[root@openshift-worker-1 ~]# ip link ls dev enp5s0f1
9: enp5s0f1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether a0:36:9f:e5:df:c2 brd ff:ff:ff:ff:ff:ff
    vf 0     link/ether 5a:1a:fa:8b:27:d9 brd ff:ff:ff:ff:ff:ff, spoof checking on, link-state auto, trust off, query_rss off
    vf 1     link/ether 6e:0d:71:97:61:5e brd ff:ff:ff:ff:ff:ff, spoof checking on, link-state auto, trust off, query_rss off
    vf 2     link/ether 82:ec:7e:ef:df:91 brd ff:ff:ff:ff:ff:ff, spoof checking on, link-state auto, trust off, query_rss off
    vf 3     link/ether 36:45:80:61:65:17 brd ff:ff:ff:ff:ff:ff, spoof checking on, link-state auto, trust off, query_rss off
    vf 4     link/ether 32:af:f6:ef:cf:7c brd ff:ff:ff:ff:ff:ff, spoof checking on, link-state auto, trust off, query_rss off
    vf 5     link/ether 1e:ad:fa:ae:fc:b6 brd ff:ff:ff:ff:ff:ff, spoof checking on, link-state auto, trust off, query_rss off
~~~

---------------------------------------------------------------

# Deleting both `SriovNetworkNodePolicy` resources

Deleting both `SriovNetworkNodePolicy` resources will eventually set the counts to 0, but it will not delete the annotations. After many changes, a node ends up with a pile of useless annotations:

~~~
openshift.io/enp5s0f0Netdev: "0"
openshift.io/enp5s0f1Netdev: "0"
openshift.io/enp5s0f0Netdev: "0"
openshift.io/enp5s0f1Netdev: "0"
~~~

---------------------------------------------------------------

# Testing the `nodeSelector`

You can test your node selector: simply run the following (adjust the hostnames to your environment):

~~~
[root@openshift-jumpserver-0 ~]# oc get nodes -l kubernetes.io/hostname=openshift-worker-0
NAME                 STATUS   ROLES    AGE   VERSION
openshift-worker-0   Ready    worker   17h   v1.19.0+d59ce34
[root@openshift-jumpserver-0 ~]# oc get nodes -l kubernetes.io/hostname=openshift-worker-1
NAME                 STATUS   ROLES    AGE   VERSION
openshift-worker-1   Ready    worker   17h   v1.19.0+d59ce34
~~~

----------------------------------------------------------------

Kind regards,

Andreas
I reproduced this with the 4.6 downstream operator as well:

~~~
[root@openshift-jumpserver-0 ~]# oc get csv
NAME                                           DISPLAY                   VERSION                 REPLACES   PHASE
sriov-network-operator.4.6.0-202010200139.p0   SR-IOV Network Operator   4.6.0-202010200139.p0              Succeeded
[root@openshift-jumpserver-0 ~]# oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.6.1     True        False         20h     Cluster version is 4.6.1
[root@openshift-jumpserver-0 ~]#
~~~

~~~
[root@openshift-jumpserver-0 ~]# cat networkpolicy-netdevice.yaml
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: sriov-enp5s0f0-netdevice
  namespace: openshift-sriov-network-operator
spec:
  resourceName: enp5s0f0Netdev
  nodeSelector:
    kubernetes.io/hostname: openshift-worker-0
  priority: 10
  mtu: 1500
  numVfs: 5
  nicSelector:
    pfNames: ["enp5s0f0"]
  deviceType: "netdevice"
  isRdma: false
[root@openshift-jumpserver-0 ~]# cat networkpolicy-netdevice2.yaml
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: sriov-enp5s0f1-netdevice
  namespace: openshift-sriov-network-operator
spec:
  resourceName: enp5s0f1Netdev
  nodeSelector:
    kubernetes.io/hostname: openshift-worker-1
  priority: 10
  mtu: 1500
  numVfs: 6
  nicSelector:
    pfNames: ["enp5s0f1"]
  deviceType: "netdevice"
  isRdma: false
~~~

~~~
oc apply -f networkpolicy-netdevice.yaml
~~~

Wait ...
~~~
[root@openshift-jumpserver-0 ~]# echo === worker0 ===; oc get nodes openshift-worker-0 -o yaml | grep -i openshift.io ; echo === worker1 === ; oc get nodes openshift-worker-1 -o yaml | grep -i openshift.io
=== worker0 ===
    machineconfiguration.openshift.io/currentConfig: rendered-worker-2f51e2b12f6daef29d5255f032ff70d0
    machineconfiguration.openshift.io/desiredConfig: rendered-worker-2f51e2b12f6daef29d5255f032ff70d0
    machineconfiguration.openshift.io/reason: ""
    machineconfiguration.openshift.io/state: Done
    sriovnetwork.openshift.io/state: Idle
    node.openshift.io/os_id: rhcos
          f:machineconfiguration.openshift.io/currentConfig: {}
          f:machineconfiguration.openshift.io/desiredConfig: {}
          f:machineconfiguration.openshift.io/reason: {}
          f:machineconfiguration.openshift.io/state: {}
          f:sriovnetwork.openshift.io/state: {}
          f:node.openshift.io/os_id: {}
          f:openshift.io/enp5s0f0Netdev: {}
          f:openshift.io/enp5s0f1Netdev: {}
          f:openshift.io/enp5s0f0Netdev: {}
          f:openshift.io/enp5s0f1Netdev: {}
    openshift.io/enp5s0f0Netdev: "5"
    openshift.io/enp5s0f1Netdev: "0"
    openshift.io/enp5s0f0Netdev: "5"
    openshift.io/enp5s0f1Netdev: "0"
=== worker1 ===
    machineconfiguration.openshift.io/currentConfig: rendered-worker-2f51e2b12f6daef29d5255f032ff70d0
    machineconfiguration.openshift.io/desiredConfig: rendered-worker-2f51e2b12f6daef29d5255f032ff70d0
    machineconfiguration.openshift.io/reason: ""
    machineconfiguration.openshift.io/state: Done
    sriovnetwork.openshift.io/state: Idle
    node.openshift.io/os_id: rhcos
          f:machineconfiguration.openshift.io/currentConfig: {}
          f:machineconfiguration.openshift.io/desiredConfig: {}
          f:machineconfiguration.openshift.io/reason: {}
          f:machineconfiguration.openshift.io/state: {}
          f:sriovnetwork.openshift.io/state: {}
          f:node.openshift.io/os_id: {}
          f:openshift.io/enp5s0f0Netdev: {}
          f:openshift.io/enp5s0f1Netdev: {}
          f:openshift.io/enp5s0f0Netdev: {}
          f:openshift.io/enp5s0f1Netdev: {}
    openshift.io/enp5s0f0Netdev: "0"
    openshift.io/enp5s0f1Netdev: "0"
    openshift.io/enp5s0f0Netdev: "0"
    openshift.io/enp5s0f1Netdev: "0"
~~~

~~~
oc apply -f networkpolicy-netdevice2.yaml
~~~

Wait ...

~~~
[root@openshift-jumpserver-0 ~]# echo === worker0 ===; oc get nodes openshift-worker-0 -o yaml | grep -i openshift.io ; echo === worker1 === ; oc get nodes openshift-worker-1 -o yaml | grep -i openshift.io
=== worker0 ===
    machineconfiguration.openshift.io/currentConfig: rendered-worker-2f51e2b12f6daef29d5255f032ff70d0
    machineconfiguration.openshift.io/desiredConfig: rendered-worker-2f51e2b12f6daef29d5255f032ff70d0
    machineconfiguration.openshift.io/reason: ""
    machineconfiguration.openshift.io/state: Done
    sriovnetwork.openshift.io/state: Idle
    node.openshift.io/os_id: rhcos
          f:machineconfiguration.openshift.io/currentConfig: {}
          f:machineconfiguration.openshift.io/desiredConfig: {}
          f:machineconfiguration.openshift.io/reason: {}
          f:machineconfiguration.openshift.io/state: {}
          f:sriovnetwork.openshift.io/state: {}
          f:node.openshift.io/os_id: {}
          f:openshift.io/enp5s0f0Netdev: {}
          f:openshift.io/enp5s0f1Netdev: {}
          f:openshift.io/enp5s0f0Netdev: {}
          f:openshift.io/enp5s0f1Netdev: {}
    openshift.io/enp5s0f0Netdev: "5"
    openshift.io/enp5s0f1Netdev: "1"
    openshift.io/enp5s0f0Netdev: "5"
    openshift.io/enp5s0f1Netdev: "1"
=== worker1 ===
    machineconfiguration.openshift.io/currentConfig: rendered-worker-2f51e2b12f6daef29d5255f032ff70d0
    machineconfiguration.openshift.io/desiredConfig: rendered-worker-2f51e2b12f6daef29d5255f032ff70d0
    machineconfiguration.openshift.io/reason: ""
    machineconfiguration.openshift.io/state: Done
    sriovnetwork.openshift.io/state: Idle
    node.openshift.io/os_id: rhcos
          f:machineconfiguration.openshift.io/currentConfig: {}
          f:machineconfiguration.openshift.io/desiredConfig: {}
          f:machineconfiguration.openshift.io/reason: {}
          f:machineconfiguration.openshift.io/state: {}
          f:sriovnetwork.openshift.io/state: {}
          f:node.openshift.io/os_id: {}
          f:openshift.io/enp5s0f0Netdev: {}
          f:openshift.io/enp5s0f1Netdev: {}
          f:openshift.io/enp5s0f0Netdev: {}
          f:openshift.io/enp5s0f1Netdev: {}
openshift.io/enp5s0f0Netdev: "1" openshift.io/enp5s0f1Netdev: "6" openshift.io/enp5s0f0Netdev: "1" openshift.io/enp5s0f1Netdev: "6" [root@openshift-jumpserver-0 ~]# ~~~
The two different policies above each add a term to the device plugin DaemonSet's node selector:

~~~
oc get ds sriov-device-plugin -o yaml
~~~

~~~
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: kubernetes.io/hostname
            operator: In
            values:
            - openshift-worker-0
        - matchExpressions:
          - key: kubernetes.io/hostname
            operator: In
            values:
            - openshift-worker-1
~~~

However, all of the device plugin pods mount the same configMap:

~~~
- configMap:
    defaultMode: 420
    name: device-plugin-config
~~~

~~~
oc get cm device-plugin-config -o yaml
~~~

~~~
config.json: '{"resourceList":[{"resourceName":"enp5s0f0Netdev","selectors":{"pfNames":["enp5s0f0"],"IsRdma":false},"SelectorObj":null},{"resourceName":"enp5s0f1Netdev","selectors":{"pfNames":["enp5s0f1"],"IsRdma":false},"SelectorObj":null}]}'
~~~

The device plugin then iterates through this resourceList on each node:
https://github.com/k8snetworkplumbingwg/sriov-network-device-plugin/blob/047fb351807c278d4c7dc3f5d25c876260b0560e/cmd/sriovdp/manager.go#L87

So the nodeSelector does restrict which nodes run the daemon set, but shouldn't there be a unique configMap per policy?
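To see why the shared resourceList produces the overlap above, here is a small Python sketch (not the operator's actual code; the per-node VF inventory numbers are illustrative, chosen to match the annotations observed above) that mimics every device plugin pod applying the full resourceList on its own node:

~~~python
# Sketch: every device-plugin pod reads the SAME resourceList from the
# shared configMap, so each node advertises every resource whose pfNames
# selector matches an interface present on that node -- regardless of
# which node the originating policy actually selected.

# Simplified from the device-plugin-config configMap shown above
resource_list = [
    {"resourceName": "enp5s0f0Netdev", "pfNames": ["enp5s0f0"]},
    {"resourceName": "enp5s0f1Netdev", "pfNames": ["enp5s0f1"]},
]

# Hypothetical per-node inventory: pfName -> number of VFs discovered
nodes = {
    "openshift-worker-0": {"enp5s0f0": 5, "enp5s0f1": 1},
    "openshift-worker-1": {"enp5s0f0": 1, "enp5s0f1": 6},
}

def advertised(node_vfs, resources):
    """Mimic one pod iterating the whole resourceList on its node."""
    return {
        r["resourceName"]: sum(node_vfs.get(pf, 0) for pf in r["pfNames"])
        for r in resources
    }

for node, vfs in nodes.items():
    print(node, advertised(vfs, resource_list))
~~~

With a per-policy configMap, each `advertised()` call would only ever see the resources from policies whose nodeSelector matched that node, and the spurious non-zero counts would disappear.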
Hi, I talked to Robin, and he said:

"Got response from dev, it is correct that they use a centralized configMap for all device plugin pods. Meaning that one node would advertise all the resources if it contains an interface that matches any other node's policy. So we have confirmation, but he is asking whether it is a problem. So we can probably inform the customer this is expected and close the case."

But that honestly makes no sense. We are annotating each node with the number of units of a given resource that the node has. As you can see above, this overlap causes invalid accounting of resources:

~~~
worker-0:
    openshift.io/enp5s0f0Netdev: "5"
    openshift.io/enp5s0f1Netdev: "1"
worker-1:
    openshift.io/enp5s0f0Netdev: "1"
    openshift.io/enp5s0f1Netdev: "6"
~~~

Let's remember that this is the consequence of applying:

~~~
[root@openshift-jumpserver-0 ~]# cat networkpolicy-netdevice.yaml
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: sriov-enp5s0f0-netdevice
  namespace: openshift-sriov-network-operator
spec:
  resourceName: enp5s0f0Netdev
  nodeSelector:
    kubernetes.io/hostname: openshift-worker-0
  priority: 10
  mtu: 1500
  numVfs: 5
  nicSelector:
    pfNames: ["enp5s0f0"]
  deviceType: "netdevice"
  isRdma: false
[root@openshift-jumpserver-0 ~]# cat networkpolicy-netdevice2.yaml
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: sriov-enp5s0f1-netdevice
  namespace: openshift-sriov-network-operator
spec:
  resourceName: enp5s0f1Netdev
  nodeSelector:
    kubernetes.io/hostname: openshift-worker-1
  priority: 10
  mtu: 1500
  numVfs: 6
  nicSelector:
    pfNames: ["enp5s0f1"]
  deviceType: "netdevice"
  isRdma: false
~~~

Each policy selects exactly one node, yet both resources show up with non-zero counts on both nodes. So something funky is going on here and needs to be fixed. Even if it is not a problem on a functional level, it does affect what is reported to the operator.
The result should be:

~~~
worker-0:
    openshift.io/enp5s0f0Netdev: "5"
    openshift.io/enp5s0f1Netdev: "0"
worker-1:
    openshift.io/enp5s0f0Netdev: "0"
    openshift.io/enp5s0f1Netdev: "6"
~~~

Or, even better, just plain:

~~~
worker-0:
    openshift.io/enp5s0f0Netdev: "5"
worker-1:
    openshift.io/enp5s0f1Netdev: "6"
~~~
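One way to get there would be to render a separate configMap per policy (or per node), so that each device plugin pod only sees the resources from policies whose nodeSelector matched its node. The following is only a sketch of that idea — the configMap names are invented for illustration and this is not what the operator renders today:

~~~yaml
# Hypothetical per-node configMaps (names invented); each one carries
# only the resourceList entries from policies selecting that node.
apiVersion: v1
kind: ConfigMap
metadata:
  name: device-plugin-config-openshift-worker-0
  namespace: openshift-sriov-network-operator
data:
  config.json: '{"resourceList":[{"resourceName":"enp5s0f0Netdev","selectors":{"pfNames":["enp5s0f0"],"IsRdma":false}}]}'
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: device-plugin-config-openshift-worker-1
  namespace: openshift-sriov-network-operator
data:
  config.json: '{"resourceList":[{"resourceName":"enp5s0f1Netdev","selectors":{"pfNames":["enp5s0f1"],"IsRdma":false}}]}'
~~~

With that split, worker-0 would never match (and advertise) enp5s0f1Netdev, and vice versa.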
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5633