Bug 1888828

Summary: rootDevices filter not working for sriov-network-device-plugin - ListAndWatch assigns too many devices to specific resource
Product: OpenShift Container Platform
Reporter: Andreas Karis <akaris>
Component: Networking
Assignee: zenghui.shi <zshi>
Networking sub component: SR-IOV
QA Contact: zhaozhanqi <zzhao>
Status: CLOSED DUPLICATE
Severity: medium
Priority: medium
CC: mmethot, zshi
Version: 4.4
Target Release: 4.7.0
Hardware: Unspecified
OS: Unspecified
Last Closed: 2020-10-20 13:26:04 UTC
Type: Bug

Description Andreas Karis 2020-10-15 20:35:27 UTC
Description of problem:

Whereas the OpenShift SR-IOV operator (https://github.com/openshift/sriov-network-operator) knows how to deal with rootDevices, the underlying k8snetworkplumbingwg/sriov-network-device-plugin (https://github.com/k8snetworkplumbingwg/sriov-network-device-plugin/) does not filter on rootDevices.

This also explains why I ran into https://bugzilla.redhat.com/show_bug.cgi?id=1888763: as far as I understand, the rootDevices nicSelector is *only* a selector for the VFs to be created, which is part of what the SR-IOV operator does. So the aforementioned "bug" might be a security mechanism that exists because this feature is still missing.
~~~
[root@openshift-jumpserver-0 ~]# oc explain SriovNetworkNodePolicy.spec.nicSelector
KIND:     SriovNetworkNodePolicy
VERSION:  sriovnetwork.openshift.io/v1

RESOURCE: nicSelector <Object>

DESCRIPTION:
     NicSelector selects the NICs to be configured

FIELDS:
   deviceID	<string>
     The device hex code of SR-IoV device. Allowed value "1583", "158b", "10fb",
     "1015", "1017".

   pfNames	<[]string>
     Name of SR-IoV PF.

   rootDevices	<[]string>
     PCI address of SR-IoV PF.

   vendor	<string>
     The vendor hex code of SR-IoV device. Allowed value "8086", "15b3".
~~~

However, the sriov-network-device-plugin is fed the same configuration. That makes sense: the operator is responsible for VF setup, and the plugin that then selects the actual PCI device should receive and work with the same configuration. Unfortunately, the sriov-network-device-plugin does not understand the rootDevices selector, at least not as a filter. The consequence is that the following configuration:
~~~
[root@openshift-jumpserver-0 ~]# cat networkpolicy-netdevice.yaml 
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: sriov-enp5s0f0-akaris
  namespace: openshift-sriov-network-operator
spec:
  resourceName: enp5s0f0NetdevAkaris
  priority: 99
  numVfs: 2
  nicSelector:
    rootDevices:
      - '0000:05:00.0'
    vendor: "8086"
  deviceType: "netdevice"
  isRdma: false
  nodeSelector:
    kubernetes.io/hostname: openshift-worker-0.example.com
---
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: sriov-enp5s0f1-akaris2
  namespace: openshift-sriov-network-operator
spec:
  resourceName: enp5s0f1NetdevAkaris2
  priority: 99
  numVfs: 1
  nicSelector:
    rootDevices:
      - '0000:05:00.1'
    vendor: '8086'
  deviceType: "netdevice"
  isRdma: false
  nodeSelector:
    kubernetes.io/hostname: openshift-worker-0.example.com
~~~

... will create the correct VFs. But the plugin will assign all 3 VFs to each of the 2 resources. While creating the SriovNetworkNodePolicy objects and the resources enp5s0f1NetdevAkaris2 and enp5s0f0NetdevAkaris, I monitored the journal on worker-0:
~~~
journalctl -f | grep Akaris  # on the worker node
(...)
Oct 15 19:45:06 openshift-worker-0.example.com hyperkube[2159]: I1015 19:45:06.073473    2159 manager.go:447] Registered endpoint &{0xc001a1e6b8 0xc002446700 /var/lib/kubelet/plugins_registry/enp5s0f0NetdevAkaris.sock openshift.io/enp5s0f0NetdevAkaris {0 0 <nil>} {0 0} 0x1b65510}
Oct 15 19:45:06 openshift-worker-0.example.com hyperkube[2159]: I1015 19:45:06.073920    2159 endpoint.go:111] State pushed for device plugin openshift.io/enp5s0f0NetdevAkaris
Oct 15 19:45:06 openshift-worker-0.example.com hyperkube[2159]: I1015 19:45:06.074556    2159 endpoint.go:111] State pushed for device plugin openshift.io/enp5s0f1NetdevAkaris2
Oct 15 19:45:12 openshift-worker-0.example.com hyperkube[2159]: I1015 19:45:12.122736    2159 setters.go:323] Update capacity for openshift.io/enp5s0f0NetdevAkaris to 3
Oct 15 19:45:12 openshift-worker-0.example.com hyperkube[2159]: I1015 19:45:12.122736    2159 setters.go:323] Update capacity for openshift.io/enp5s0f0NetdevAkaris to 3
Oct 15 19:45:12 openshift-worker-0.example.com hyperkube[2159]: I1015 19:45:12.122758    2159 setters.go:323] Update capacity for openshift.io/enp5s0f1NetdevAkaris2 to 3
Oct 15 19:45:12 openshift-worker-0.example.com hyperkube[2159]: I1015 19:45:12.122758    2159 setters.go:323] Update capacity for openshift.io/enp5s0f1NetdevAkaris2 to 3
(...)
~~~

Each of the 2 resources gets the *total* capacity of 2 + 1 = 3 VFs, instead of 2 and 1 respectively. That's obviously wrong. The VFs themselves, though, are created correctly per interface (see below).

The pod logs for the sriov-device-plugin show the same:
~~~
oc logs -n openshift-sriov-network-operator -l app=sriov-device-plugin
(...)
I1015 19:45:06.073601      19 server.go:105] Plugin: enp5s0f1NetdevAkaris2.sock gets registered successfully at Kubelet
I1015 19:45:06.073702      19 server.go:105] Plugin: enp5s0f0NetdevAkaris.sock gets registered successfully at Kubelet
I1015 19:45:06.073772      19 server.go:130] ListAndWatch(enp5s0f0NetdevAkaris) invoked
I1015 19:45:06.073781      19 server.go:138] ListAndWatch(enp5s0f0NetdevAkaris): send devices &ListAndWatchResponse{Devices:[]*Device{&Device{ID:0000:05:10.2,Health:Healthy,Topology:&TopologyInfo{Nodes:[]*NUMANode{&NUMANode{ID:0,},},},},&Device{ID:0000:05:10.0,Health:Healthy,Topology:&TopologyInfo{Nodes:[]*NUMANode{&NUMANode{ID:0,},},},},&Device{ID:0000:05:10.1,Health:Healthy,Topology:&TopologyInfo{Nodes:[]*NUMANode{&NUMANode{ID:0,},},},},},}
I1015 19:45:06.073887      19 server.go:130] ListAndWatch(enp5s0f1NetdevAkaris2) invoked
I1015 19:45:06.074085      19 server.go:138] ListAndWatch(enp5s0f1NetdevAkaris2): send devices &ListAndWatchResponse{Devices:[]*Device{&Device{ID:0000:05:10.0,Health:Healthy,Topology:&TopologyInfo{Nodes:[]*NUMANode{&NUMANode{ID:0,},},},},&Device{ID:0000:05:10.1,Health:Healthy,Topology:&TopologyInfo{Nodes:[]*NUMANode{&NUMANode{ID:0,},},},},&Device{ID:0000:05:10.2,Health:Healthy,Topology:&TopologyInfo{Nodes:[]*NUMANode{&NUMANode{ID:0,},},},},},}
[root@openshift-jumpserver-0 ~]# 
~~~
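
To make the overlap easier to spot, the ListAndWatch lines can be reduced to one set of PCI addresses per resource. The following is only a minimal sketch: the log lines are abridged copies of the two lines above (the Topology fields are dropped), and the regular expressions and the helper name `devices_per_resource` are assumptions made for illustration, not part of the plugin.

```python
import re

# Two log lines abridged from the sriov-device-plugin pod output above
# (Topology fields dropped for brevity; device IDs are unchanged).
LOG = """\
I1015 19:45:06.073781      19 server.go:138] ListAndWatch(enp5s0f0NetdevAkaris): send devices &ListAndWatchResponse{Devices:[]*Device{&Device{ID:0000:05:10.2,Health:Healthy,},&Device{ID:0000:05:10.0,Health:Healthy,},&Device{ID:0000:05:10.1,Health:Healthy,},},}
I1015 19:45:06.074085      19 server.go:138] ListAndWatch(enp5s0f1NetdevAkaris2): send devices &ListAndWatchResponse{Devices:[]*Device{&Device{ID:0000:05:10.0,Health:Healthy,},&Device{ID:0000:05:10.1,Health:Healthy,},&Device{ID:0000:05:10.2,Health:Healthy,},},}
"""

# Hypothetical patterns, written to match the log format shown above.
RESOURCE_RE = re.compile(r"ListAndWatch\((\w+)\): send devices")
DEVICE_RE = re.compile(r"Device\{ID:([0-9a-fA-F:.]+),")


def devices_per_resource(log_text):
    """Map each resource name to the set of PCI addresses it advertised."""
    result = {}
    for line in log_text.splitlines():
        match = RESOURCE_RE.search(line)
        if match:
            result[match.group(1)] = set(DEVICE_RE.findall(line))
    return result


advertised = devices_per_resource(LOG)
# Devices that every resource claims; ideally this would be empty.
shared = set.intersection(*advertised.values())

for name, devices in sorted(advertised.items()):
    print(name, sorted(devices))
print("advertised by every resource:", sorted(shared))
```

With the data above, both resources advertise the same three VFs, which matches the capacity of 3 reported by the kubelet for each of them.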

~~~
[root@openshift-jumpserver-0 ~]# oc get nodes openshift-worker-0.example.com -o yaml | grep Akaris -B15 | grep enp5s0f[01]NetdevAkaris
    openshift.io/enp5s0f0NetdevAkaris: "3"
    openshift.io/enp5s0f0NetdevAkaris2: "0"
    openshift.io/enp5s0f1NetdevAkaris2: "3"
    openshift.io/enp5s0f0NetdevAkaris: "3"
    openshift.io/enp5s0f0NetdevAkaris2: "0"
    openshift.io/enp5s0f1NetdevAkaris2: "3"
~~~

And when creating pods, we can see that they land on VFs belonging to either PF, although they should only land on VFs of enp5s0f0:
~~~
[root@openshift-jumpserver-0 ~]# cat sriovnetwork-enp5s0f0.yaml
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetwork
metadata:
  name: sriov-net-enp5s0f0-netdev-akaris
  namespace: openshift-sriov-network-operator 
spec:
  networkNamespace: sriov-testing
  ipam: '{ "type": "static" }'
  vlan: 905
  resourceName: enp5s0f0NetdevAkaris
  trust: "on" 
  capabilities: '{ "mac": true, "ips": true }'
[root@openshift-jumpserver-0 ~]# oc project
Using project "sriov-testing" on server "https://api.cluster.example.com:6443".
[root@openshift-jumpserver-0 ~]# oc get pods
No resources found in sriov-testing namespace.
[root@openshift-jumpserver-0 ~]# oc create -f sriovnetwork-enp5s0f0.yaml
sriovnetwork.sriovnetwork.openshift.io/sriov-net-enp5s0f0-netdev-akaris created
~~~

~~~
[root@openshift-jumpserver-0 ~]# cat sriovpoda.yaml
apiVersion: v1
kind: Pod
metadata:
  name: sriovpoda
  namespace: sriov-testing
  annotations:
    k8s.v1.cni.cncf.io/networks: '[
	{
		"name": "sriov-net-enp5s0f0-netdev-akaris", 
		"ips": ["192.168.10.10/24", "2001::10/64"] 
	}
]'
spec:
  containers:
  - name: sample-container
    image: centos:8
    imagePullPolicy: IfNotPresent
    command: ["sleep", "infinity"]
[root@openshift-jumpserver-0 ~]# oc create -f sriovpoda.yaml
pod/sriovpoda created
~~~

~~~
[root@openshift-jumpserver-0 ~]# cat sriovpodb.yaml 
apiVersion: v1
kind: Pod
metadata:
  name: sriovpodb
  namespace: sriov-testing
  annotations:
    k8s.v1.cni.cncf.io/networks: '[
	{
		"name": "sriov-net-enp5s0f0-netdev-akaris", 
		"ips": ["192.168.10.11/24", "2001::11/64"] 
	}
]'
spec:
  containers:
  - name: sample-container
    image: centos:8
    imagePullPolicy: IfNotPresent
    command: ["sleep", "infinity"]
[root@openshift-jumpserver-0 ~]# oc create -f sriovpodb.yaml
pod/sriovpodb created
~~~

The VLAN assignments already show that the pods are split between interfaces, although they should be on the same interface enp5s0f0:
~~~
[root@openshift-worker-0 ~]# ip link ls dev enp5s0f0
8: enp5s0f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether a0:36:9f:e5:e2:a8 brd ff:ff:ff:ff:ff:ff
    vf 0     link/ether 9e:f0:0f:cc:6c:01 brd ff:ff:ff:ff:ff:ff, vlan 905, spoof checking on, link-state auto, trust on, query_rss off
    vf 1     link/ether 3e:b6:ef:de:1b:aa brd ff:ff:ff:ff:ff:ff, spoof checking on, link-state auto, trust off, query_rss off
[root@openshift-worker-0 ~]# ip link ls dev enp5s0f1
9: enp5s0f1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether a0:36:9f:e5:e2:aa brd ff:ff:ff:ff:ff:ff
    vf 0     link/ether 2e:02:25:e4:11:b5 brd ff:ff:ff:ff:ff:ff, vlan 905, spoof checking on, link-state auto, trust on, query_rss off
~~~

Just to be sure, checking with the PCI bus addresses, as well:
~~~
[root@openshift-jumpserver-0 ~]# oc rsh sriovpoda
sh-4.4# env | grep PCI
PCIDEVICE_OPENSHIFT_IO_ENP5S0F0NETDEVAKARIS=0000:05:10.1
sh-4.4# exit
exit
[root@openshift-jumpserver-0 ~]# oc rsh sriovpodb
sh-4.4# env | grep PCI
PCIDEVICE_OPENSHIFT_IO_ENP5S0F0NETDEVAKARIS=0000:05:10.0
~~~
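
The same check can be scripted from inside a pod. This is a sketch under assumptions: the naming rule (prefix `PCIDEVICE_`, resource name upper-cased, every non-alphanumeric character replaced by `_`) is inferred from the output above, and the comma-separated handling of several addresses is an assumption as well. The helper names are hypothetical.

```python
import re


def pcidevice_env_name(resource):
    """Derive the env var name for a resource such as
    'openshift.io/enp5s0f0NetdevAkaris' (rule inferred from the pod output)."""
    return "PCIDEVICE_" + re.sub(r"[^A-Za-z0-9]", "_", resource).upper()


def allocated_vfs(environ, resource):
    """Return the PCI addresses found in the pod environment for `resource`.

    Assumes that several addresses, if present, are comma-separated.
    """
    value = environ.get(pcidevice_env_name(resource), "")
    return [address for address in value.split(",") if address]


# Environment as seen in sriovpoda above; inside a pod, pass os.environ instead.
poda_env = {"PCIDEVICE_OPENSHIFT_IO_ENP5S0F0NETDEVAKARIS": "0000:05:10.1"}
print(allocated_vfs(poda_env, "openshift.io/enp5s0f0NetdevAkaris"))
```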

And from the VF->PF mappings, we see that podb uses VF0 on interface 0000:05:00.0 and poda uses VF0 on interface 0000:05:00.1:
~~~
[root@openshift-worker-0 ~]#  find /sys -name '*virtfn*'  | while read f ; do echo === $f === ; ls -al $f ; done
=== /sys/devices/pci0000:00/0000:00:03.0/0000:05:00.1/virtfn0 ===
lrwxrwxrwx. 1 root root 0 Oct 15 19:44 /sys/devices/pci0000:00/0000:00:03.0/0000:05:00.1/virtfn0 -> ../0000:05:10.1
=== /sys/devices/pci0000:00/0000:00:03.0/0000:05:00.0/virtfn0 ===
lrwxrwxrwx. 1 root root 0 Oct 15 19:43 /sys/devices/pci0000:00/0000:00:03.0/0000:05:00.0/virtfn0 -> ../0000:05:10.0
=== /sys/devices/pci0000:00/0000:00:03.0/0000:05:00.0/virtfn1 ===
lrwxrwxrwx. 1 root root 0 Oct 15 19:43 /sys/devices/pci0000:00/0000:00:03.0/0000:05:00.0/virtfn1 -> ../0000:05:10.2
~~~
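
The lookup from VF address to parent PF can also be done mechanically. The sketch below works on the (symlink path, link target) pairs copied from the listing above rather than reading /sys directly, so that it runs anywhere; the function name `vf_to_pf` is hypothetical.

```python
import posixpath

# (symlink path, link target) pairs copied from the `find /sys` listing above.
VIRTFN_LINKS = [
    ("/sys/devices/pci0000:00/0000:00:03.0/0000:05:00.1/virtfn0", "../0000:05:10.1"),
    ("/sys/devices/pci0000:00/0000:00:03.0/0000:05:00.0/virtfn0", "../0000:05:10.0"),
    ("/sys/devices/pci0000:00/0000:00:03.0/0000:05:00.0/virtfn1", "../0000:05:10.2"),
]

# VF addresses taken from the PCIDEVICE_* variables of the two pods.
POD_VFS = {"sriovpoda": "0000:05:10.1", "sriovpodb": "0000:05:10.0"}


def vf_to_pf(links):
    """Map each VF PCI address to (PF PCI address, VF index)."""
    mapping = {}
    for path, target in links:
        pf_dir, virtfn = posixpath.split(path)
        pf_address = posixpath.basename(pf_dir)   # directory that holds virtfnN
        vf_address = posixpath.basename(target)   # last component of the link target
        mapping[vf_address] = (pf_address, int(virtfn[len("virtfn"):]))
    return mapping


mapping = vf_to_pf(VIRTFN_LINKS)
for pod, vf in sorted(POD_VFS.items()):
    pf, index = mapping[vf]
    print(f"{pod}: VF {vf} is virtfn{index} of PF {pf}")
```

For the values in this report, sriovpoda resolves to PF 0000:05:00.1 and sriovpodb to PF 0000:05:00.0, although both pods requested the resource that was meant to cover only 0000:05:00.0.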

Now, looking at the upstream issues, it seems that the following pull request was merged very recently (mid-September):
https://github.com/k8snetworkplumbingwg/sriov-network-device-plugin/pull/264

I'm wondering if this fixes what I'm seeing. Looking at the doc file and the overall code change, I'm pretty confident that this upstream change actually solves this very issue:
https://github.com/k8snetworkplumbingwg/sriov-network-device-plugin/pull/264/commits/53e1ffc19d3580d7565f7189511f3eb147eab978

On the other hand, I'm not sure about the following code:
https://github.com/k8snetworkplumbingwg/sriov-network-device-plugin/blob/master/pkg/accelerator/accelDeviceProvider.go#L84

GetFilteredDevices filters by:
Vendors, Devices, Drivers
But not by rootDevices.

But I'm not at all sure how this code works, so it might not be part of the filter logic in question.
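
To illustrate the behaviour this report expects, here is a small model of a selector that also honours rootDevices. It is not the plugin's code (the plugin is written in Go, and its actual filter logic is not reproduced here); the types `VF` and `Selector`, the function `filter_devices`, and the `honor_root_devices` switch are all hypothetical. The device data comes from the sysfs listing and the two policies in the description, assuming that the VFs report vendor 8086 like the policies' vendor field.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VF:
    address: str      # PCI address of the VF
    vendor: str       # vendor hex code (assumed to be 8086 for these VFs)
    pf_address: str   # PCI address of the parent PF (the "root device")


@dataclass(frozen=True)
class Selector:
    vendors: tuple = ()
    root_devices: tuple = ()


def filter_devices(devices, selector, honor_root_devices=True):
    """Return the devices matching the selector; empty criteria match everything."""
    selected = []
    for dev in devices:
        if selector.vendors and dev.vendor not in selector.vendors:
            continue
        if (honor_root_devices and selector.root_devices
                and dev.pf_address not in selector.root_devices):
            continue
        selected.append(dev)
    return selected


# VFs and their parent PFs, as shown by the virtfn symlinks.
VFS = [
    VF("0000:05:10.0", "8086", "0000:05:00.0"),
    VF("0000:05:10.1", "8086", "0000:05:00.1"),
    VF("0000:05:10.2", "8086", "0000:05:00.0"),
]

# nicSelector values from the two SriovNetworkNodePolicy objects.
POLICIES = {
    "enp5s0f0NetdevAkaris": Selector(vendors=("8086",), root_devices=("0000:05:00.0",)),
    "enp5s0f1NetdevAkaris2": Selector(vendors=("8086",), root_devices=("0000:05:00.1",)),
}

for name, selector in POLICIES.items():
    ignored = len(filter_devices(VFS, selector, honor_root_devices=False))
    honored = len(filter_devices(VFS, selector))
    print(f"{name}: {ignored} VFs without the rootDevices filter, {honored} with it")
```

Ignoring rootDevices yields 3 VFs for each resource, as observed; honouring it yields 2 and 1, which is what numVfs in the two policies asks for.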


Comment 1 Andreas Karis 2020-10-15 20:38:28 UTC
The workaround for me is to use pfNames:
~~~
[root@openshift-jumpserver-0 ~]# cat networkpolicy-netdevice.yaml
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: sriov-enp5s0f0-akaris
  namespace: openshift-sriov-network-operator
spec:
  resourceName: enp5s0f0NetdevAkaris
  priority: 99
  numVfs: 2
  nicSelector:
    pfNames: ["enp5s0f0"]
#    deviceID: "154d"
#    rootDevices:
#      - '0000:05:00.0'
    vendor: "8086"
  deviceType: "netdevice"
  isRdma: false
  nodeSelector:
    kubernetes.io/hostname: openshift-worker-0.example.com
---
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: sriov-enp5s0f1-akaris2
  namespace: openshift-sriov-network-operator
spec:
  resourceName: enp5s0f1NetdevAkaris2
  priority: 99
  numVfs: 1
  nicSelector:
    pfNames: ["enp5s0f1"]
#    deviceID: '154d'
#    rootDevices:
#      - '0000:05:00.1'
    vendor: '8086'
  deviceType: "netdevice"
  isRdma: false
  nodeSelector:
    kubernetes.io/hostname: openshift-worker-0.example.com
~~~

~~~
Oct 15 20:24:23 openshift-worker-0.example.com hyperkube[2159]: I1015 20:24:23.528514    2159 setters.go:323] Update capacity for openshift.io/enp5s0f0NetdevAkaris to 2
(...)
Oct 15 20:25:15 openshift-worker-0.example.com hyperkube[2159]: I1015 20:25:15.604639    2159 endpoint.go:111] State pushed for device plugin openshift.io/enp5s0f1NetdevAkaris2
Oct 15 20:25:15 openshift-worker-0.example.com hyperkube[2159]: I1015 20:25:15.604769    2159 endpoint.go:111] State pushed for device plugin openshift.io/enp5s0f0NetdevAkaris
Oct 15 20:25:23 openshift-worker-0.example.com hyperkube[2159]: I1015 20:25:23.591637    2159 setters.go:323] Update capacity for openshift.io/enp5s0f1NetdevAkaris2 to 1
Oct 15 20:25:23 openshift-worker-0.example.com hyperkube[2159]: I1015 20:25:23.591637    2159 setters.go:323] Update capacity for openshift.io/enp5s0f1NetdevAkaris2 to 1
~~~

~~~
[root@openshift-jumpserver-0 ~]# oc get nodes openshift-worker-0.example.com -o yaml | grep Akaris -B15 | grep enp5s0f[01]NetdevAkaris
    openshift.io/enp5s0f0NetdevAkaris: "2"
    openshift.io/enp5s0f0NetdevAkaris2: "0"
    openshift.io/enp5s0f1NetdevAkaris2: "1"
    openshift.io/enp5s0f0NetdevAkaris: "2"
    openshift.io/enp5s0f0NetdevAkaris2: "0"
    openshift.io/enp5s0f1NetdevAkaris2: "1"
~~~

Comment 2 Andreas Karis 2020-10-15 20:43:05 UTC
Zenghui Shi, it seems that you implemented https://github.com/k8snetworkplumbingwg/sriov-network-device-plugin/pull/264; can you confirm that this is the same issue and that your upstream commit fixes it? Thanks :-)

Comment 3 Marc Methot 2020-10-16 16:32:01 UTC
The patch has been pulled today (5 hours ago) for 4.7:
- https://github.com/openshift/sriov-network-device-plugin/commit/fb8a4320c8780d287b9b491190cd5c2d1626a5f6
- https://github.com/openshift/sriov-network-device-plugin/pull/32

The "latest" tag does not have the commits; however, 4.7 does:
~~~
[mmethot@localhost sriov-network-operator]$ skopeo inspect docker://quay.io/openshift/origin-sriov-network-device-plugin | jq --raw-output '.Created'
2020-09-18T01:14:23.72057044Z
[mmethot@localhost sriov-network-operator]$ skopeo inspect docker://quay.io/openshift/origin-sriov-network-device-plugin:4.7 | jq --raw-output '.Created'
2020-10-16T09:58:09.12728526Z
~~~

I'll see if I can test it by just changing SRIOV_DEVICE_PLUGIN_IMAGE to the 4.7 version.

Comment 4 Andreas Karis 2020-10-17 14:26:38 UTC
Inspecting the pod and its container image:
~~~
[root@openshift-jumpserver-0 ~]# oc describe pod -n openshift-sriov-network-operator sriov-device-plugin-csb4s | grep -i image
    Image:         registry.redhat.io/openshift4/ose-sriov-network-device-plugin@sha256:b6dda4c1617da04d9ae9a8526e61373e3a7de67e8827dff171a6ca36f2a624b5
    Image ID:      registry.redhat.io/openshift4/ose-sriov-network-device-plugin@sha256:b6dda4c1617da04d9ae9a8526e61373e3a7de67e8827dff171a6ca36f2a624b5
  Normal  Pulled     57m   kubelet, openshift-worker-0.example.com  Container image "registry.redhat.io/openshift4/ose-sriov-network-device-plugin@sha256:b6dda4c1617da04d9ae9a8526e61373e3a7de67e8827dff171a6ca36f2a624b5" already present on machine
~~~

Inspecting the tags available for that image:
~~~
[root@openshift-jumpserver-0 ~]# skopeo list-tags docker://registry.redhat.io/openshift4/ose-sriov-network-device-plugin

{
    "Repository": "registry.redhat.io/openshift4/ose-sriov-network-device-plugin",
    "Tags": [
        "v4.3.26-202006160135",
        "v4.3.14",
        "v4.3.13",
        "v4.3.12",
        "v4.4.0-202007232108.p0",
        "v4.3.10",
        "v4.4.0-202007120152.p0",
        "v4.4.0",
        "v4.3.19",
        "v4.1.24",
        "v4.1.25",
        "v4.3.5",
        "v4.1.34-202002031224",
        "v4.3.3",
        "v4.1.21",
        "v4.3.1",
        "v4.3.0",
        "v4.1.0-201905191700",
        "v4.1.28",
        "v4.1.29",
        "v4.2.32-202005050921",
        "v4.2.32",
        "v4.3.28-202006290519.p0",
        "v4.3.7-202003161611",
        "v4.1.18-201909201915",
        "v4.3.19-202005041055",
        "v4.3.25-202006081335",
        "v4.4.0-202007171809.p0",
        "v4.3.3-202002171705",
        "v4.2.26",
        "v4.2.10-201912022352",
        "v4.1.16-201909100604",
        "v4.3.13-202004131016",
        "v4.2.9-201911261133",
        "v4.2.18-202002031246",
        "v4.1.10",
        "v4.1.13",
        "v4.1.15",
        "v4.1.14",
        "v4.1.17",
        "v4.1.16",
        "v4.1.18",
        "v4.4.0-202005121717",
        "v4.3.7",
        "v4.2.15-202001171551",
        "v4.1.30-202001061940",
        "v4.1.10-201908061216",
        "v4.5",
        "v4.1.26",
        "v4.2.24",
        "v4.2.27",
        "v4.1.27",
        "v4.2.21",
        "v4.2.20",
        "v4.2.23",
        "v4.2.22",
        "v4.2.21-202002240343",
        "v4.2.29",
        "v4.2.28",
        "v4.4.0-202005180840",
        "v4.3.2",
        "v4.3.31",
        "v4.5.0",
        "v4.3.33",
        "v4.3.35",
        "v4.1.22",
        "v4.3.37",
        "v4.2.11-201912100122",
        "v4.1.23",
        "v4.5.0-202008210149.p0",
        "v4.2.5-201911121709",
        "v4.3.28",
        "v4.1.4",
        "v4.1.7",
        "v4.1.6",
        "v4.1.1",
        "v4.1.0",
        "v4.1.3",
        "v4.1.9-201907311355",
        "v4.1.9",
        "v4.1.8",
        "v4.5.0-202007131801.p0",
        "v4.1.17-201909171057",
        "v4.1",
        "v4.2",
        "v4.3",
        "v4.4",
        "v4.3.9",
        "v4.2.34",
        "v4.2.34-202005252115",
        "v4.1.41-202004130646",
        "v4.1.7-201907171753",
        "v4.1.6-201907101224",
        "v4.4.0-202006160135",
        "v4.1.21-201910230924",
        "v4.3.29",
        "v4.3.26",
        "v4.3.27",
        "v4.3.25",
        "v4.3.22",
        "v4.3.23",
        "v4.3.20",
        "v4.2.8-201911190952",
        "v4.3.2-202002112006",
        "v4.2.22-202003020552",
        "v4.3.12-202004091734",
        "v4.1.26-201911260202",
        "v4.3.10-202003311428",
        "v4.2.23-202003090920",
        "v4.4.0-202007060343.p0",
        "v4.1.13-201908210601",
        "v4.2.24-202003161048",
        "v4.2.20-202002171604",
        "v4.1.24-201911120311",
        "v4.1.3-201906181537",
        "v4.5.0-202007172106.p0",
        "v4.2.14-202001061701",
        "v4.2.27-202003301126",
        "v4.4.0-202006211643.p0",
        "v4.3.23-202005250821",
        "v4.1.29-201912230303",
        "v4.3.2-202002070552",
        "v4.2.1-201910221723",
        "v4.3.35-202008311640.p0",
        "v4.3.1-202002032140",
        "v4.3.29-202007061006.p0",
        "v4.1.37-202003021622",
        "v4.5.0-202007240519.p0",
        "v4.4.0-202006080610",
        "v4.1.4-201906271212",
        "v4.1.14-201908291507",
        "v4.1.22-201910291109",
        "v4.3.5-202003020549",
        "v4.5.0-202008100413.p0",
        "v4.5.0-202009161248.p0",
        "v4.1.23-201911050122",
        "v4.3.37-202009151447.p0",
        "v4.4.0-202004261927",
        "v4.5.0-202009041228.p0",
        "v4.3.22-202005201238",
        "v4.3.31-202007272153.p0",
        "v4.2.36-202006230600.p0",
        "v4.4.0-202009041255.p0",
        "v4.2.0-201910101614",
        "v4.2.0",
        "v4.3.0-202001211731",
        "v4.2.26-202003230335",
        "v4.3.33-202008111029.p0",
        "v4.2.29-202004140532",
        "v4.2.13-201912230557",
        "v4.1.41",
        "v4.4.0-202009161309.p0",
        "v4.3.14-202004200457",
        "v4.1.20",
        "v4.2.19-202002101212",
        "v4.2.1",
        "v4.2.4",
        "v4.2.5",
        "v4.2.8",
        "v4.1.31-202001140447",
        "v4.3.20-202005121847",
        "v4.2.9",
        "v4.1.25-201911190028",
        "v4.3.9-202003230345",
        "v4.3.27-202006211650.p0",
        "v4.4.0-202008210157.p0",
        "v4.1.28-201912110241",
        "v4.4.0-202008100806.p0",
        "v4.2.18",
        "v4.2.19",
        "v4.5.0-202007281519.p0",
        "v4.2.10",
        "v4.2.11",
        "v4.2.13",
        "v4.2.14",
        "v4.2.15",
        "v4.1.37",
        "v4.1.27-201912030019",
        "v4.1.34",
        "v4.5.0-202007012112.p0",
        "v4.1.31",
        "v4.1.30",
        "v4.2.4-201911050122",
        "v4.1.15-201909041605",
        "v4.2.36",
        "v4.4.0-202005252114",
        "v4.1.1-201906040019",
        "v4.2.28-202004061218",
        "v4.1.8-201907241243",
        "v4.4.0-202007291438.p0",
        "v4.4.0-202006290400.p0",
        "latest",
        "v4.1.20-201910102034"
    ]
}
~~~

So, we use this repository here:
https://github.com/openshift/ose-sriov-network-device-plugin (archived) -> https://github.com/openshift/sriov-network-device-plugin

That can also be found by inspecting the image itself:
~~~
[root@openshift-jumpserver-0 ~]# skopeo inspect docker://registry.redhat.io/openshift4/ose-sriov-network-device-plugin 
{
    "Name": "registry.redhat.io/openshift4/ose-sriov-network-device-plugin",
    "Digest": "sha256:a7770ca0e86fa4fe95733eb8e4711e829f724e3b4ffbf8e0ccafdcac275c0372",
    "RepoTags": [
        (... same tags as in the skopeo list-tags output above ...)
    ],
    "Created": "2020-09-16T16:22:14.454607923Z",
    "DockerVersion": "1.13.1",
    "Labels": {
        "License": "GPLv2+",
        "architecture": "x86_64",
        "build-date": "2020-09-16T16:20:45.754414",
        "com.redhat.build-host": "cpt-1008.osbs.prod.upshift.rdu2.redhat.com",
        "com.redhat.component": "sriov-network-device-plugin-container",
        "com.redhat.license_terms": "https://www.redhat.com/agreements",
        "description": "This is the base image from which all OpenShift Container Platform images inherit.",
        "distribution-scope": "public",
        "io.k8s.description": "This is the base image from which all OpenShift Container Platform images inherit.",
        "io.k8s.display-name": "SRIOV Network Device Plugin",
        "io.openshift.build.commit.id": "2a152e3b4e02b2968e8f51acc9af454269983105",
        "io.openshift.build.commit.url": "https://github.com/openshift/sriov-network-device-plugin/commit/2a152e3b4e02b2968e8f51acc9af454269983105",
        "io.openshift.build.source-location": "https://github.com/openshift/sriov-network-device-plugin",
        "io.openshift.maintainer.component": "Networking",
        "io.openshift.maintainer.product": "OpenShift Container Platform",
        "io.openshift.maintainer.subcomponent": "SR-IOV",
        "io.openshift.tags": "openshift,base",
        "name": "openshift/ose-sriov-network-device-plugin",
        "release": "202009161248.p0",
        "summary": "Provides the latest release of the Red Hat Universal Base Image 7.",
        "url": "https://access.redhat.com/containers/#/registry.access.redhat.com/openshift/ose-sriov-network-device-plugin/images/v4.5.0-202009161248.p0",
        "vcs-ref": "6e2ac1ce55d4c68788df79f1da4cfd55bcf279be",
        "vcs-type": "git",
        "vendor": "Red Hat, Inc.",
        "version": "v4.5.0"
    },
    "Architecture": "amd64",
    "Os": "linux",
    "Layers": [
        "sha256:c9fa7d57b9028d4bd02b51cef3c3039fa7b23a8b2d9d26a6ce66b3428f6e2457",
        "sha256:74cbb6607642df5f9f70e8588e3c56d6de795d1a9af22866ea4cc82f2dad4f14",
        "sha256:342d2c032054c3651de9a26b47df275eb472a0e268d7b533ff14f4aab2c029e1",
        "sha256:8da07861177a6e84747ec6309336839e9be6b526931dcf967b5ed815debf00f5",
        "sha256:b5a21cdc47542e59df1dbd56326f34377ca821aa120ba3ee7e9aa44ed1ecc9fd"
    ],
    "Env": [
        "__doozer=merge",
        "BUILD_RELEASE=202009161248.p0",
        "BUILD_VERSION=v4.5.0",
        "OS_GIT_MAJOR=4",
        "OS_GIT_MINOR=5",
        "OS_GIT_PATCH=0",
        "OS_GIT_TREE_STATE=clean",
        "OS_GIT_VERSION=4.5.0-202009161248.p0-2a152e3",
        "SOURCE_GIT_TREE_STATE=clean",
        "OS_GIT_COMMIT=2a152e3",
        "SOURCE_DATE_EPOCH=1586862169",
        "SOURCE_GIT_COMMIT=2a152e3b4e02b2968e8f51acc9af454269983105",
        "SOURCE_GIT_TAG=2a152e3b",
        "SOURCE_GIT_URL=https://github.com/openshift/sriov-network-device-plugin",
        "INSTALL_PKGS=hwdata",
        "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",
        "container=oci"
    ]
}
~~~

I don't know how Marc made the connection to quay.io, but indeed this image and tag here have the fix:
~~~
[root@openshift-jumpserver-0 ~]# skopeo inspect docker://quay.io/openshift/origin-sriov-network-device-plugin:latest
{
    "Name": "quay.io/openshift/origin-sriov-network-device-plugin",
    "Digest": "sha256:d4f78377a57b30e94c27cd5fe1f4d0a404fa2980c3224606ac58bdd2e29e60f3",
    "RepoTags": [
        "v4.0",
        "v4.0.0",
        "4.1",
        "4.1.0",
        "4.2",
        "4.2.0",
        "4.3",
        "4.3.0",
        "4.4",
        "4.4.0",
        "4.5",
        "4.5.0",
        "4.6",
        "4.6.0",
        "latest",
        "4.7",
        "4.7.0"
    ],
    "Created": "2020-09-18T01:14:23.72057044Z",
    "DockerVersion": "1.13.1",
    "Labels": {
        "architecture": "x86_64",
        "build-date": "2020-09-05T01:13:15.933978",
        "com.redhat.build-host": "cpt-1003.osbs.prod.upshift.rdu2.redhat.com",
        "com.redhat.component": "openshift-enterprise-base-container",
        "com.redhat.license_terms": "https://www.redhat.com/agreements",
        "description": "The Universal Base Image is designed and engineered to be the base layer for all of your containerized applications, middleware and utilities. This base image is freely redistributable, but Red Hat only supports Red Hat technologies through subscriptions for Red Hat products. This image is maintained by Red Hat and updated regularly.",
        "distribution-scope": "public",
        "io.k8s.description": "This is the base image from which all OpenShift images inherit.",
        "io.k8s.display-name": "SRIOV Network Device Plugin",
        "io.openshift.build.commit.author": "",
        "io.openshift.build.commit.date": "",
        "io.openshift.build.commit.id": "8cc3e960f7f1831eaec2baa9b4fb52f3232f6e39",
        "io.openshift.build.commit.message": "",
        "io.openshift.build.commit.ref": "master",
        "io.openshift.build.name": "",
        "io.openshift.build.namespace": "",
        "io.openshift.build.source-context-dir": "",
        "io.openshift.build.source-location": "https://github.com/openshift/sriov-network-device-plugin",
        "io.openshift.expose-services": "",
        "io.openshift.tags": "base rhel8",
        "maintainer": "Red Hat, Inc.",
        "name": "openshift/ose-base",
        "release": "202009050041.5133",
        "summary": "Provides the latest release of Red Hat Universal Base Image 8.",
        "url": "https://access.redhat.com/containers/#/registry.access.redhat.com/openshift/ose-base/images/v4.0-202009050041.5133",
        "vcs-ref": "8cc3e960f7f1831eaec2baa9b4fb52f3232f6e39",
        "vcs-type": "git",
        "vcs-url": "https://github.com/openshift/sriov-network-device-plugin",
        "vendor": "Red Hat, Inc.",
        "version": "v4.0"
    },
    "Architecture": "amd64",
    "Os": "linux",
    "Layers": [
        "sha256:77c58f19bd6e67185938abb6bbb6ec229e07a5e607453904294d982de141d2f0",
        "sha256:47db82df7f3f4393c1f19c362a2db2c47ca049b6fb20bef041dfc9bdb12a4504",
        "sha256:9e518d0279d9c75999daa4b867d4924c26b32617223a7ff068027268dcdbf4e4",
        "sha256:c1c8ae64753f9b716109aba76c59de8d11a6f232f27aeac2ecfa4cb926e3ff70",
        "sha256:dc1710d2036e1a862c75c63a2cdb0bfcf8e1c691bbbae89fcd9f5b87c694991c",
        "sha256:42d548b84ccaf0226ae09a2012647c9eb22c29c65a538fba7d287b1e3da34535"
    ],
    "Env": [
        "foo=bar",
        "INSTALL_PKGS=hwdata",
        "OPENSHIFT_BUILD_NAME=sriov-network-device-plugin",
        "OPENSHIFT_BUILD_NAMESPACE=ci-op-ikhv9i70",
        "OPENSHIFT_CI=true",
        "OPENSHIFT_BUILD_SOURCE=https://github.com/openshift/release.git",
        "OPENSHIFT_BUILD_REFERENCE=master",
        "OPENSHIFT_BUILD_COMMIT=6bd8b00435ece6787878009f4f16cba809f9f930",
        "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",
        "container=oci"
    ]
}
[root@openshift-jumpserver-0 ~]# 
[root@openshift-jumpserver-0 ~]# 
[root@openshift-jumpserver-0 ~]# 
[root@openshift-jumpserver-0 ~]# skopeo inspect docker://quay.io/openshift/origin-sriov-network-device-plugin:4.7
{
    "Name": "quay.io/openshift/origin-sriov-network-device-plugin",
    "Digest": "sha256:2782cd28660817e59819fa07a7dd9b0e18f91bff55074a16b4d5d20639fa2aa0",
    "RepoTags": [
        "v4.0",
        "v4.0.0",
        "4.2",
        "4.2.0",
        "4.3",
        "4.3.0",
        "4.4",
        "4.4.0",
        "4.5",
        "4.5.0",
        "4.6",
        "4.6.0",
        "latest",
        "4.7",
        "4.7.0",
        "4.1",
        "4.1.0"
    ],
    "Created": "2020-10-16T09:58:09.12728526Z",
    "DockerVersion": "1.13.1",
    "Labels": {
        "architecture": "x86_64",
        "build-date": "2020-09-05T01:13:15.933978",
        "com.redhat.build-host": "cpt-1003.osbs.prod.upshift.rdu2.redhat.com",
        "com.redhat.component": "openshift-enterprise-base-container",
        "com.redhat.license_terms": "https://www.redhat.com/agreements",
        "description": "The Universal Base Image is designed and engineered to be the base layer for all of your containerized applications, middleware and utilities. This base image is freely redistributable, but Red Hat only supports Red Hat technologies through subscriptions for Red Hat products. This image is maintained by Red Hat and updated regularly.",
        "distribution-scope": "public",
        "io.k8s.description": "This is the base image from which all OpenShift images inherit.",
        "io.k8s.display-name": "SRIOV Network Device Plugin",
        "io.openshift.build.commit.author": "",
        "io.openshift.build.commit.date": "",
        "io.openshift.build.commit.id": "fb8a4320c8780d287b9b491190cd5c2d1626a5f6",
        "io.openshift.build.commit.message": "",
        "io.openshift.build.commit.ref": "master",
        "io.openshift.build.name": "",
        "io.openshift.build.namespace": "",
        "io.openshift.build.source-context-dir": "",
        "io.openshift.build.source-location": "https://github.com/openshift/sriov-network-device-plugin",
        "io.openshift.expose-services": "",
        "io.openshift.tags": "base rhel8",
        "maintainer": "Red Hat, Inc.",
        "name": "openshift/ose-base",
        "release": "202009050041.5133",
        "summary": "Provides the latest release of Red Hat Universal Base Image 8.",
        "url": "https://access.redhat.com/containers/#/registry.access.redhat.com/openshift/ose-base/images/v4.0-202009050041.5133",
        "vcs-ref": "fb8a4320c8780d287b9b491190cd5c2d1626a5f6",
        "vcs-type": "git",
        "vcs-url": "https://github.com/openshift/sriov-network-device-plugin",
        "vendor": "Red Hat, Inc.",
        "version": "v4.0"
    },
    "Architecture": "amd64",
    "Os": "linux",
    "Layers": [
        "sha256:77c58f19bd6e67185938abb6bbb6ec229e07a5e607453904294d982de141d2f0",
        "sha256:47db82df7f3f4393c1f19c362a2db2c47ca049b6fb20bef041dfc9bdb12a4504",
        "sha256:9e518d0279d9c75999daa4b867d4924c26b32617223a7ff068027268dcdbf4e4",
        "sha256:b4070df4e3ccb68562aa89b31f2d5b6b0a035c2f227770454d66510569db965c",
        "sha256:76ae5f5edcd237b0beafdde94430aefa4964cd2763f901c8928e2aaa70139fb9",
        "sha256:cff1622ad4836a42339c32ee1978904c647659aaf039a5f84a61e4bcd2fa7eb7"
    ],
    "Env": [
        "foo=bar",
        "INSTALL_PKGS=hwdata",
        "OPENSHIFT_BUILD_NAME=sriov-network-device-plugin",
        "OPENSHIFT_BUILD_NAMESPACE=ci-op-fgdb8chi",
        "GODEBUG=x509ignoreCN=0",
        "OPENSHIFT_CI=true",
        "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",
        "container=oci"
    ]
}
~~~

So the 4.7 image was built from commit:
https://github.com/openshift/sriov-network-device-plugin/commit/fb8a4320c8780d287b9b491190cd5c2d1626a5f6
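
The same lookup can be scripted instead of read off by eye. This is a minimal sketch, assuming the JSON printed by `skopeo inspect` is available as a string. The helper name `commit_url` is hypothetical, and the sample is trimmed to the two labels shown in the output above:

```python
import json


def commit_url(inspect_json: str) -> str:
    """Build the source commit URL from `skopeo inspect` JSON output."""
    labels = json.loads(inspect_json)["Labels"]
    # Both labels are present in the inspect output shown above.
    repo = labels["io.openshift.build.source-location"]
    commit = labels["io.openshift.build.commit.id"]
    return f"{repo}/commit/{commit}"


# Trimmed sample, copied from the :4.7 inspect output above.
sample = json.dumps({
    "Labels": {
        "io.openshift.build.commit.id": "fb8a4320c8780d287b9b491190cd5c2d1626a5f6",
        "io.openshift.build.source-location": "https://github.com/openshift/sriov-network-device-plugin",
    }
})

print(commit_url(sample))
```

Running it against the `:latest` output would make the difference between the two tags visible, since that image carries a different `io.openshift.build.commit.id`.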

Comment 5 Andreas Karis 2020-10-17 14:26:46 UTC
For verification, we can patch the CSV:
~~~
[root@openshift-jumpserver-0 ~]# oc  get csv -n openshift-sriov-network-operator 
NAME                                           DISPLAY                   VERSION                 REPLACES   PHASE
sriov-network-operator.4.4.0-202009281328.p0   SR-IOV Network Operator   4.4.0-202009281328.p0              Succeeded
~~~

~~~
oc edit csv -n openshift-sriov-network-operator sriov-network-operator.4.4.0-202009281328.p0
~~~

Search for SRIOV_DEVICE_PLUGIN_IMAGE and change the value of that env entry to: "quay.io/openshift/origin-sriov-network-device-plugin:4.7"

The following verification command should yield:
~~~
[root@openshift-jumpserver-0 ~]# oc  get csv -n openshift-sriov-network-operator sriov-network-operator.4.4.0-202009281328.p0  -o json | jq '.spec.install.spec.deployments[] | select(.name == "sriov-network-operator") | .spec.template.spec.containers[] | select(.name == "sriov-network-operator").env[] | select(.name =="SRIOV_DEVICE_PLUGIN_IMAGE")' 
{
  "name": "SRIOV_DEVICE_PLUGIN_IMAGE",
  "value": "quay.io/openshift/origin-sriov-network-device-plugin:4.7"
}
~~~
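
For anyone who prefers not to rely on the jq filter, the same path through the CSV can be walked in a few lines of Python. This is only a sketch: `find_env` is a hypothetical helper, and the `csv` dict below is a hand-written stand-in reduced to the fields the filter touches, not the full object returned by `oc get csv ... -o json`:

```python
from typing import Optional


def find_env(csv: dict, deployment: str, container: str, env_name: str) -> Optional[str]:
    """Return the value of env_name, following the same path as the jq filter."""
    for dep in csv["spec"]["install"]["spec"]["deployments"]:
        if dep["name"] != deployment:
            continue
        for ctr in dep["spec"]["template"]["spec"]["containers"]:
            if ctr["name"] != container:
                continue
            for env in ctr.get("env", []):
                if env["name"] == env_name:
                    return env.get("value")
    return None


# Reduced stand-in for the CSV JSON (assumption: only the relevant fields).
csv = {
    "spec": {"install": {"spec": {"deployments": [{
        "name": "sriov-network-operator",
        "spec": {"template": {"spec": {"containers": [{
            "name": "sriov-network-operator",
            "env": [{
                "name": "SRIOV_DEVICE_PLUGIN_IMAGE",
                "value": "quay.io/openshift/origin-sriov-network-device-plugin:4.7",
            }],
        }]}}},
    }]}}}
}

print(find_env(csv, "sriov-network-operator", "sriov-network-operator",
               "SRIOV_DEVICE_PLUGIN_IMAGE"))
```

The function returns `None` when the deployment, container, or variable is missing, which makes a failed edit easy to spot.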

Now, delete the sriov-network-operator deployment:
~~~
oc delete deployment -n openshift-sriov-network-operator sriov-network-operator
~~~

Watch the CSV's events to make sure that OLM picks up the change and recreates the deployment:
~~~
[root@openshift-jumpserver-0 ~]# oc describe csv -n openshift-sriov-network-operator | tail -n 15
  Type     Reason               Age                  From                        Message
  ----     ------               ----                 ----                        -------
  Normal   RequirementsUnknown  167m                 operator-lifecycle-manager  requirements not yet checked
  Normal   RequirementsNotMet   167m (x2 over 167m)  operator-lifecycle-manager  one or more requirements couldn't be found
  Normal   InstallWaiting       167m                 operator-lifecycle-manager  installing: waiting for deployment sriov-network-operator to become ready: Waiting for deployment spec update to be observed...
  Warning  ComponentUnhealthy   31m                  operator-lifecycle-manager  installing: waiting for deployment sriov-network-operator to become ready: Waiting for rollout to finish: 1 old replicas are pending termination...
  Warning  ComponentUnhealthy   21m (x3 over 24m)    operator-lifecycle-manager  installing: missing deployment with name=sriov-network-operator
  Normal   NeedsReinstall       21m (x2 over 24m)    operator-lifecycle-manager  installing: missing deployment with name=sriov-network-operator
  Normal   AllRequirementsMet   21m (x5 over 167m)   operator-lifecycle-manager  all requirements found, attempting install
  Normal   InstallSucceeded     21m (x8 over 167m)   operator-lifecycle-manager  waiting for install components to report healthy
  Normal   InstallWaiting       21m (x5 over 167m)   operator-lifecycle-manager  installing: waiting for deployment sriov-network-operator to become ready: Waiting for rollout to finish: 0 of 1 updated replicas are available...
  Normal   InstallSucceeded     21m (x6 over 167m)   operator-lifecycle-manager  install strategy completed with no errors
  Normal   NeedsReinstall       20m (x3 over 31m)    operator-lifecycle-manager  installing: waiting for deployment sriov-network-operator to become ready: Waiting for rollout to finish: 1 old replicas are pending termination...
  Warning  ComponentUnhealthy   20m (x2 over 31m)    operator-lifecycle-manager  installing: waiting for deployment sriov-network-operator to become ready: Waiting for deployment spec update to be observed...
  Normal   ComponentUnhealthy   1m                  operator-lifecycle-manager  installing: deployment changed old hash=7b88b64698, new hash=7759748fbb
~~~

The new sriov-network-operator deployment should reflect the changed image name:
~~~
[root@openshift-jumpserver-0 ~]# oc get deployment -n openshift-sriov-network-operator  sriov-network-operator -o yaml | grep -i SRIOV_DEVICE_PLUGIN_IMAGE -A1
        - name: SRIOV_DEVICE_PLUGIN_IMAGE
          value: quay.io/openshift/origin-sriov-network-device-plugin:4.7
~~~

This should cascade down into the sriov-device-plugin daemonset:
~~~
[root@openshift-jumpserver-0 ~]# oc get daemonset -n openshift-sriov-network-operator sriov-device-plugin -o yaml | grep -i image
        image: quay.io/openshift/origin-sriov-network-device-plugin:4.7
        imagePullPolicy: IfNotPresent
~~~

And new pods should spawn:
~~~
[root@openshift-jumpserver-0 ~]# oc get pods -n openshift-sriov-network-operator -o wide --show-labels -l app=sriov-device-plugin
NAME                        READY   STATUS    RESTARTS   AGE   IP                NODE                             NOMINATED NODE   READINESS GATES   LABELS
sriov-device-plugin-7jckr   1/1     Running   0          1m   192.168.123.220   openshift-worker-0.example.com   <none>           <none>            app=sriov-device-plugin,component=network,controller-revision-hash=6ccf8b4d67,openshift.io/component=network,pod-template-generation=17,type=infra
~~~

~~~
[root@openshift-jumpserver-0 ~]# oc describe pods -n openshift-sriov-network-operator -l app=sriov-device-plugin | grep -i image
    Image:         quay.io/openshift/origin-sriov-network-device-plugin:4.7
    Image ID:      quay.io/openshift/origin-sriov-network-device-plugin@sha256:2782cd28660817e59819fa07a7dd9b0e18f91bff55074a16b4d5d20639fa2aa0
  Normal  Pulled     20m   kubelet, openshift-worker-0.example.com  Container image "quay.io/openshift/origin-sriov-network-device-plugin:4.7" already present on machine
~~~

~~~
[root@openshift-jumpserver-0 ~]# skopeo inspect docker://quay.io/openshift/origin-sriov-network-device-plugin@sha256:2782cd28660817e59819fa07a7dd9b0e18f91bff55074a16b4d5d20639fa2aa0 | grep -i commit
        "io.openshift.build.commit.author": "",
        "io.openshift.build.commit.date": "",
        "io.openshift.build.commit.id": "fb8a4320c8780d287b9b491190cd5c2d1626a5f6",
        "io.openshift.build.commit.message": "",
        "io.openshift.build.commit.ref": "master",
~~~

And just as a reminder, that's the commit in question: https://github.com/openshift/sriov-network-device-plugin/commit/fb8a4320c8780d287b9b491190cd5c2d1626a5f6

Comment 7 Andreas Karis 2020-10-17 14:58:29 UTC
I'm testing with a slightly changed definition:
~~~
[root@openshift-jumpserver-0 ~]# cat networkpolicy-netdevice.yaml
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: sriov-enp5s0f0-netdevice-1888828
  namespace: openshift-sriov-network-operator
spec:
  resourceName: enp5s0f0Netdev1888828
  nodeSelector:
    kubernetes.io/hostname: openshift-worker-2.example.com
  priority: 10
  mtu: 1500
  numVfs: 2
  nicSelector:
    vendor: "8086"
    rootDevices: ["0000:05:00.0"]
  deviceType: "netdevice"
  isRdma: false
---
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: sriov-enp5s0f1-netdevice-1888828
  namespace: openshift-sriov-network-operator
spec:
  resourceName: enp5s0f1Netdev1888828
  nodeSelector:
    kubernetes.io/hostname: openshift-worker-2.example.com
  priority: 10
  mtu: 1500
  numVfs: 1
  nicSelector:
    vendor: "8086"
    rootDevices: ["0000:05:00.1"]
  deviceType: "netdevice"
  isRdma: false
[root@openshift-jumpserver-0 ~]# oc get pods -n openshift-sriov-network-operator
NAME                                      READY   STATUS    RESTARTS   AGE
network-resources-injector-68plg          1/1     Running   0          3h32m
network-resources-injector-7vjq8          1/1     Running   0          3h32m
network-resources-injector-g5h5f          1/1     Running   0          3h32m
operator-webhook-bh2dd                    1/1     Running   0          3h32m
operator-webhook-gc55c                    1/1     Running   0          3h32m
operator-webhook-tzk5s                    1/1     Running   0          3h32m
sriov-network-config-daemon-hv8bg         1/1     Running   0          3h32m
sriov-network-config-daemon-k2d2c         1/1     Running   0          3h32m
sriov-network-config-daemon-sdhjh         1/1     Running   0          3h32m
sriov-network-operator-748778b679-hh76v   1/1     Running   0          59m
[root@openshift-jumpserver-0 ~]# oc apply -f networkpolicy-netdevice.yaml
sriovnetworknodepolicy.sriovnetwork.openshift.io/sriov-enp5s0f0-netdevice-1888828 created
sriovnetworknodepolicy.sriovnetwork.openshift.io/sriov-enp5s0f1-netdevice-1888828 created
[root@openshift-jumpserver-0 ~]# oc get nodes openshift-worker-2.example.com -o yaml | grep openshift.io
    machineconfiguration.openshift.io/currentConfig: rendered-worker-f1177c2b436dafe5c85464a8c3276b11
    machineconfiguration.openshift.io/desiredConfig: rendered-worker-f1177c2b436dafe5c85464a8c3276b11
    machineconfiguration.openshift.io/reason: ""
    machineconfiguration.openshift.io/state: Done
    node.openshift.io/os_id: rhcos
    openshift.io/enp5s0f0Netdev: "0"
    openshift.io/enp5s0f0Vfiopci: "0"
    openshift.io/enp5s0f1Netdev: "0"
    openshift.io/enp5s0f0Netdev: "0"
    openshift.io/enp5s0f0Vfiopci: "0"
    openshift.io/enp5s0f1Netdev: "0"
[root@openshift-jumpserver-0 ~]# sleep 300 ; oc get nodes openshift-worker-2.example.com -o yaml | grep openshift.io
    machineconfiguration.openshift.io/currentConfig: rendered-worker-f1177c2b436dafe5c85464a8c3276b11
    machineconfiguration.openshift.io/desiredConfig: rendered-worker-f1177c2b436dafe5c85464a8c3276b11
    machineconfiguration.openshift.io/reason: ""
    machineconfiguration.openshift.io/state: Done
    node.openshift.io/os_id: rhcos
    openshift.io/enp5s0f0Netdev: "0"
    openshift.io/enp5s0f0Netdev1888828: "3"
    openshift.io/enp5s0f0Vfiopci: "0"
    openshift.io/enp5s0f1Netdev: "0"
    openshift.io/enp5s0f1Netdev1888828: "3"
    openshift.io/enp5s0f0Netdev: "0"
    openshift.io/enp5s0f0Netdev1888828: "3"
    openshift.io/enp5s0f0Vfiopci: "0"
    openshift.io/enp5s0f1Netdev: "0"
    openshift.io/enp5s0f1Netdev1888828: "3"
~~~
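
The mismatch is easier to see when the reported counts are compared against the numVfs values of the two policies. This is a minimal sketch, assuming the grep output above is pasted in as text; the helper names `parse_resources` and `overcounted` are hypothetical:

```python
def parse_resources(grep_output: str) -> dict:
    """Collect openshift.io/<resource> counts from the grepped node YAML."""
    counts = {}
    prefix = "openshift.io/"
    for line in grep_output.splitlines():
        line = line.strip()
        # Skips machineconfiguration.openshift.io/... and node.openshift.io/... lines.
        if not line.startswith(prefix):
            continue
        key, _, value = line.partition(":")
        counts[key[len(prefix):]] = int(value.strip().strip('"'))
    return counts


def overcounted(expected: dict, reported: dict) -> dict:
    """Return {resource: (expected, reported)} where the node reports too many devices."""
    return {
        name: (want, reported[name])
        for name, want in expected.items()
        if reported.get(name, 0) > want
    }


# Lines copied from the node output above.
node_output = '''
    machineconfiguration.openshift.io/state: Done
    openshift.io/enp5s0f0Netdev: "0"
    openshift.io/enp5s0f0Netdev1888828: "3"
    openshift.io/enp5s0f1Netdev: "0"
    openshift.io/enp5s0f1Netdev1888828: "3"
'''

# numVfs values from the two SriovNetworkNodePolicy objects above.
expected = {"enp5s0f0Netdev1888828": 2, "enp5s0f1Netdev1888828": 1}

print(overcounted(expected, parse_resources(node_output)))
```

Both resources report 3 devices, which is the total number of VFs across the two PFs, rather than the 2 and 1 requested per root device.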

And spawning pods:
~~~
[root@openshift-jumpserver-0 ~]# cat sriovnetwork-enp5s0f0.yaml
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetwork
metadata:
  name: sriov-net-enp5s0f0-netdev-1888828
  namespace: openshift-sriov-network-operator 
spec:
  networkNamespace: sriov-testing
  ipam: '{ "type": "static" }'
  vlan: 905
  resourceName: enp5s0f0Netdev1888828
  trust: "on" 
  capabilities: '{ "mac": true, "ips": true }'
[root@openshift-jumpserver-0 ~]# oc apply -f sriovnetwork-enp5s0f0.yaml
sriovnetwork.sriovnetwork.openshift.io/sriov-net-enp5s0f0-netdev-1888828 created
[root@openshift-jumpserver-0 ~]# oc get -f !$
oc get -f sriovnetwork-enp5s0f0.yaml
NAME                                AGE
sriov-net-enp5s0f0-netdev-1888828   13s
[root@openshift-jumpserver-0 ~]# oc apply -f sriovpod0.yaml
pod/sriovpod0 created
[root@openshift-jumpserver-0 ~]# oc apply -f sriovpod1.yaml
pod/sriovpod1 created
[root@openshift-jumpserver-0 ~]# oc apply -f sriovpod2.yaml
pod/sriovpod2 created
[root@openshift-jumpserver-0 ~]# cat sriovpod{0,1,2}.yaml
apiVersion: v1
kind: Pod
metadata:
  name: sriovpod0
  namespace: sriov-testing
  annotations:
    k8s.v1.cni.cncf.io/networks: '[
{
"name": "sriov-net-enp5s0f0-netdev-1888828", 
"ips": ["192.168.10.10/24", "2001::10/64"] 
}
]'
spec:
  containers:
  - name: sample-container
    image: centos:8
    imagePullPolicy: IfNotPresent
    command: ["sleep", "infinity"]
apiVersion: v1
kind: Pod
metadata:
  name: sriovpod1
  namespace: sriov-testing
  annotations:
    k8s.v1.cni.cncf.io/networks: '[
{
"name": "sriov-net-enp5s0f0-netdev-1888828", 
"ips": ["192.168.10.11/24", "2001::11/64"] 
}
]'
spec:
  containers:
  - name: sample-container
    image: centos:8
    imagePullPolicy: IfNotPresent
    command: ["sleep", "infinity"]
apiVersion: v1
kind: Pod
metadata:
  name: sriovpod2
  namespace: sriov-testing
  annotations:
    k8s.v1.cni.cncf.io/networks: '[
{
"name": "sriov-net-enp5s0f0-netdev-1888828", 
"ips": ["192.168.10.12/24", "2001::12/64"] 
}
]'
spec:
  containers:
  - name: sample-container
    image: centos:8
    imagePullPolicy: IfNotPresent
    command: ["sleep", "infinity"]
~~~

~~~
[root@openshift-jumpserver-0 ~]# oc get pods -n sriov-testing
NAME        READY   STATUS    RESTARTS   AGE
sriovpod0   1/1     Running   0          32s
sriovpod1   1/1     Running   0          28s
sriovpod2   1/1     Running   0          24s
~~~

~~~
[root@openshift-jumpserver-0 ~]# oc exec -it -n sriov-testing sriovpod0 env | grep PCI
PCIDEVICE_OPENSHIFT_IO_ENP5S0F0NETDEV1888828=0000:05:10.2
[root@openshift-jumpserver-0 ~]# oc exec -it -n sriov-testing sriovpod1 env | grep PCI
PCIDEVICE_OPENSHIFT_IO_ENP5S0F0NETDEV1888828=0000:05:10.0
[root@openshift-jumpserver-0 ~]# oc exec -it -n sriov-testing sriovpod2 env | grep PCI
PCIDEVICE_OPENSHIFT_IO_ENP5S0F0NETDEV1888828=0000:05:10.1
[root@openshift-jumpserver-0 ~]# 
~~~

~~~
[root@openshift-worker-2 ~]# ip link ls dev enp5s0f0
8: enp5s0f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether a0:36:9f:e5:da:30 brd ff:ff:ff:ff:ff:ff
    vf 0     link/ether 82:10:bc:d8:12:9d brd ff:ff:ff:ff:ff:ff, vlan 905, spoof checking on, link-state auto, trust on, query_rss off
    vf 1     link/ether e2:0e:13:52:80:34 brd ff:ff:ff:ff:ff:ff, vlan 905, spoof checking on, link-state auto, trust on, query_rss off
[root@openshift-worker-2 ~]# ip link ls dev enp5s0f1
9: enp5s0f1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether a0:36:9f:e5:da:32 brd ff:ff:ff:ff:ff:ff
    vf 0     link/ether 22:f1:58:4a:4c:69 brd ff:ff:ff:ff:ff:ff, vlan 905, spoof checking on, link-state auto, trust on, query_rss off
~~~
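
The env output shows which VF each pod received, but not which PF owns that VF. One way to check this on the worker node is the `physfn` symlink that the kernel exposes in sysfs for every VF. This is a minimal sketch; `parent_pf` is a hypothetical helper name, and the `sysfs_root` parameter exists only so the function can be pointed at a test directory:

```python
import os


def parent_pf(vf_addr: str, sysfs_root: str = "/sys/bus/pci/devices") -> str:
    """Return the PCI address of the PF that owns the VF at vf_addr.

    Every VF directory in sysfs contains a `physfn` symlink to its PF.
    """
    link = os.path.join(sysfs_root, vf_addr, "physfn")
    if not os.path.exists(link):
        raise FileNotFoundError(f"{vf_addr} has no physfn link (not a VF, or not present)")
    return os.path.basename(os.path.realpath(link))


if __name__ == "__main__":
    # VF addresses handed to the three pods above.
    for vf in ("0000:05:10.0", "0000:05:10.1", "0000:05:10.2"):
        try:
            print(vf, "->", parent_pf(vf))
        except FileNotFoundError as err:
            print(err)
```

Run on openshift-worker-2, any VF that resolves to 0000:05:00.1 while being handed out under the enp5s0f0Netdev1888828 resource would confirm that the rootDevices filter is not applied by the device plugin.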

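Even with the new image, the issue persists: three pods each received a VF from resource enp5s0f0Netdev1888828, although its policy sets numVfs: 2 and enp5s0f0 only has two VFs. Either the aforementioned bug fix was not supposed to fix this, or there's a problem with that bug fix, or my test is flawed.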

Comment 8 Andreas Karis 2020-10-17 15:01:32 UTC
[root@openshift-jumpserver-0 ~]# oc describe pods -n openshift-sriov-network-operator -l app=sriov-device-plugin | grep -i image
    Image:         quay.io/openshift/origin-sriov-network-device-plugin:4.7
    Image ID:      quay.io/openshift/origin-sriov-network-device-plugin@sha256:2782cd28660817e59819fa07a7dd9b0e18f91bff55074a16b4d5d20639fa2aa0
  Normal  Pulled     12m   kubelet, openshift-worker-2.example.com  Container image "quay.io/openshift/origin-sriov-network-device-plugin:4.7" already present on machine
[root@openshift-jumpserver-0 ~]# oc get pods -n openshift-sriov-network-operator -l app=sriov-device-plugin -o wide
NAME                        READY   STATUS    RESTARTS   AGE   IP                NODE                             NOMINATED NODE   READINESS GATES
sriov-device-plugin-t4tz6   1/1     Running   0          11m   192.168.123.222   openshift-worker-2.example.com   <none>           <none>
[root@openshift-jumpserver-0 ~]#

Comment 9 zenghui.shi 2020-10-19 00:41:12 UTC
@Andreas, yes, the upstream PR you mentioned is going to fix this rootDevices issue: https://github.com/k8snetworkplumbingwg/sriov-network-device-plugin/pull/264
We will need to update both the device plugin and the operator images in order to test this feature:
OpenShift SR-IOV device plugin PR: https://github.com/openshift/sriov-network-device-plugin/pull/32
OpenShift SR-IOV network operator PR: https://github.com/openshift/sriov-network-operator/pull/370

Comment 10 Andreas Karis 2020-10-19 12:53:02 UTC
Awesome, thanks for the confirmation.

Comment 11 zenghui.shi 2020-10-20 13:26:04 UTC

*** This bug has been marked as a duplicate of bug 1877648 ***