Bug 1888828
| Summary: | rootDevices filter not working for sriov-network-device-plugin - ListAndWatch assigns too many devices to specific resource | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Andreas Karis <akaris> |
| Component: | Networking | Assignee: | zenghui.shi <zshi> |
| Networking sub component: | SR-IOV | QA Contact: | zhaozhanqi <zzhao> |
| Status: | CLOSED DUPLICATE | CC: | mmethot, zshi |
| Severity: | medium | Priority: | medium |
| Version: | 4.4 | Target Release: | 4.7.0 |
| Hardware: | Unspecified | OS: | Unspecified |
| Last Closed: | 2020-10-20 13:26:04 UTC | Type: | Bug |
Description
Andreas Karis
2020-10-15 20:35:27 UTC
The workaround for me is to use pfNames:
~~~
[root@openshift-jumpserver-0 ~]# cat networkpolicy-netdevice.yaml
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: sriov-enp5s0f0-akaris
  namespace: openshift-sriov-network-operator
spec:
  resourceName: enp5s0f0NetdevAkaris
  priority: 99
  numVfs: 2
  nicSelector:
    pfNames: ["enp5s0f0"]
    # deviceID: "154d"
    # rootDevices:
    # - '0000:05:00.0'
    vendor: "8086"
  deviceType: "netdevice"
  isRdma: false
  nodeSelector:
    kubernetes.io/hostname: openshift-worker-0.example.com
---
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: sriov-enp5s0f1-akaris2
  namespace: openshift-sriov-network-operator
spec:
  resourceName: enp5s0f1NetdevAkaris2
  priority: 99
  numVfs: 1
  nicSelector:
    pfNames: ["enp5s0f1"]
    # deviceID: '154d'
    # rootDevices:
    # - '0000:05:00.1'
    vendor: '8086'
  deviceType: "netdevice"
  isRdma: false
  nodeSelector:
    kubernetes.io/hostname: openshift-worker-0.example.com
~~~
~~~
Oct 15 20:24:23 openshift-worker-0.example.com hyperkube[2159]: I1015 20:24:23.528514 2159 setters.go:323] Update capacity for openshift.io/enp5s0f0NetdevAkaris to 2
(...)
Oct 15 20:25:15 openshift-worker-0.example.com hyperkube[2159]: I1015 20:25:15.604639 2159 endpoint.go:111] State pushed for device plugin openshift.io/enp5s0f1NetdevAkaris2
Oct 15 20:25:15 openshift-worker-0.example.com hyperkube[2159]: I1015 20:25:15.604769 2159 endpoint.go:111] State pushed for device plugin openshift.io/enp5s0f0NetdevAkaris
Oct 15 20:25:23 openshift-worker-0.example.com hyperkube[2159]: I1015 20:25:23.591637 2159 setters.go:323] Update capacity for openshift.io/enp5s0f1NetdevAkaris2 to 1
~~~
~~~
[root@openshift-jumpserver-0 ~]# oc get nodes openshift-worker-0.example.com -o yaml | grep Akaris -B15 | grep enp5s0f[01]NetdevAkaris
openshift.io/enp5s0f0NetdevAkaris: "2"
openshift.io/enp5s0f0NetdevAkaris2: "0"
openshift.io/enp5s0f1NetdevAkaris2: "1"
openshift.io/enp5s0f0NetdevAkaris: "2"
openshift.io/enp5s0f0NetdevAkaris2: "0"
openshift.io/enp5s0f1NetdevAkaris2: "1"
~~~
Zenghui Shi, seems that you implemented https://github.com/k8snetworkplumbingwg/sriov-network-device-plugin/pull/264 ; can you confirm that this is the same issue, and that your upstream commit fixes this issue? Thanks :-)
The patch has been pulled today (5 hours ago) for 4.7:
- https://github.com/openshift/sriov-network-device-plugin/commit/fb8a4320c8780d287b9b491190cd5c2d1626a5f6
- https://github.com/openshift/sriov-network-device-plugin/pull/32
The "latest" tag does not have the commits, however 4.7 does:
~~~
[mmethot@localhost sriov-network-operator]$ skopeo inspect docker://quay.io/openshift/origin-sriov-network-device-plugin | jq --raw-output '.Created'
2020-09-18T01:14:23.72057044Z
[mmethot@localhost sriov-network-operator]$ skopeo inspect docker://quay.io/openshift/origin-sriov-network-device-plugin:4.7 | jq --raw-output '.Created'
2020-10-16T09:58:09.12728526Z
~~~
I'll see if I can test it by just changing the SRIOV_DEVICE_PLUGIN_IMAGE to the 4.7 version.
Inspecting the pod and its container image:
~~~
[root@openshift-jumpserver-0 ~]# oc describe pod -n openshift-sriov-network-operator sriov-device-plugin-csb4s | grep -i image
Image: registry.redhat.io/openshift4/ose-sriov-network-device-plugin@sha256:b6dda4c1617da04d9ae9a8526e61373e3a7de67e8827dff171a6ca36f2a624b5
Image ID: registry.redhat.io/openshift4/ose-sriov-network-device-plugin@sha256:b6dda4c1617da04d9ae9a8526e61373e3a7de67e8827dff171a6ca36f2a624b5
Normal Pulled 57m kubelet, openshift-worker-0.example.com Container image "registry.redhat.io/openshift4/ose-sriov-network-device-plugin@sha256:b6dda4c1617da04d9ae9a8526e61373e3a7de67e8827dff171a6ca36f2a624b5" already present on machine
~~~
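The `Created` comparison from the earlier `skopeo inspect` calls on quay.io can also be done in code. The following is a minimal sketch, assuming the timestamps are RFC 3339 strings with a trailing `Z` as printed in this report; `parse_created` is a hypothetical helper name, not part of skopeo.

```python
from datetime import datetime, timezone

def parse_created(value: str) -> datetime:
    """Parse a skopeo 'Created' timestamp such as 2020-10-16T09:58:09.12728526Z."""
    base, _, frac = value.rstrip("Z").partition(".")
    # skopeo may print more than six fractional digits; keep microseconds only.
    micro = int((frac + "000000")[:6])
    parsed = datetime.strptime(base, "%Y-%m-%dT%H:%M:%S")
    return parsed.replace(microsecond=micro, tzinfo=timezone.utc)

# Values copied from the two skopeo calls quoted in this report.
latest = parse_created("2020-09-18T01:14:23.72057044Z")
v47 = parse_created("2020-10-16T09:58:09.12728526Z")

print("4.7 is newer than latest:", v47 > latest)
print("difference in whole days:", (v47 - latest).days)
```

A newer creation date only suggests that the tag could contain the patch; the commit label of the image is the more reliable check.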
Inspecting the tags available for that image:
~~~
[root@openshift-jumpserver-0 ~]# skopeo list-tags docker://registry.redhat.io/openshift4/ose-sriov-network-device-plugin
{
"Repository": "registry.redhat.io/openshift4/ose-sriov-network-device-plugin",
"Tags": [
"v4.3.26-202006160135",
"v4.3.14",
"v4.3.13",
"v4.3.12",
"v4.4.0-202007232108.p0",
"v4.3.10",
"v4.4.0-202007120152.p0",
"v4.4.0",
"v4.3.19",
"v4.1.24",
"v4.1.25",
"v4.3.5",
"v4.1.34-202002031224",
"v4.3.3",
"v4.1.21",
"v4.3.1",
"v4.3.0",
"v4.1.0-201905191700",
"v4.1.28",
"v4.1.29",
"v4.2.32-202005050921",
"v4.2.32",
"v4.3.28-202006290519.p0",
"v4.3.7-202003161611",
"v4.1.18-201909201915",
"v4.3.19-202005041055",
"v4.3.25-202006081335",
"v4.4.0-202007171809.p0",
"v4.3.3-202002171705",
"v4.2.26",
"v4.2.10-201912022352",
"v4.1.16-201909100604",
"v4.3.13-202004131016",
"v4.2.9-201911261133",
"v4.2.18-202002031246",
"v4.1.10",
"v4.1.13",
"v4.1.15",
"v4.1.14",
"v4.1.17",
"v4.1.16",
"v4.1.18",
"v4.4.0-202005121717",
"v4.3.7",
"v4.2.15-202001171551",
"v4.1.30-202001061940",
"v4.1.10-201908061216",
"v4.5",
"v4.1.26",
"v4.2.24",
"v4.2.27",
"v4.1.27",
"v4.2.21",
"v4.2.20",
"v4.2.23",
"v4.2.22",
"v4.2.21-202002240343",
"v4.2.29",
"v4.2.28",
"v4.4.0-202005180840",
"v4.3.2",
"v4.3.31",
"v4.5.0",
"v4.3.33",
"v4.3.35",
"v4.1.22",
"v4.3.37",
"v4.2.11-201912100122",
"v4.1.23",
"v4.5.0-202008210149.p0",
"v4.2.5-201911121709",
"v4.3.28",
"v4.1.4",
"v4.1.7",
"v4.1.6",
"v4.1.1",
"v4.1.0",
"v4.1.3",
"v4.1.9-201907311355",
"v4.1.9",
"v4.1.8",
"v4.5.0-202007131801.p0",
"v4.1.17-201909171057",
"v4.1",
"v4.2",
"v4.3",
"v4.4",
"v4.3.9",
"v4.2.34",
"v4.2.34-202005252115",
"v4.1.41-202004130646",
"v4.1.7-201907171753",
"v4.1.6-201907101224",
"v4.4.0-202006160135",
"v4.1.21-201910230924",
"v4.3.29",
"v4.3.26",
"v4.3.27",
"v4.3.25",
"v4.3.22",
"v4.3.23",
"v4.3.20",
"v4.2.8-201911190952",
"v4.3.2-202002112006",
"v4.2.22-202003020552",
"v4.3.12-202004091734",
"v4.1.26-201911260202",
"v4.3.10-202003311428",
"v4.2.23-202003090920",
"v4.4.0-202007060343.p0",
"v4.1.13-201908210601",
"v4.2.24-202003161048",
"v4.2.20-202002171604",
"v4.1.24-201911120311",
"v4.1.3-201906181537",
"v4.5.0-202007172106.p0",
"v4.2.14-202001061701",
"v4.2.27-202003301126",
"v4.4.0-202006211643.p0",
"v4.3.23-202005250821",
"v4.1.29-201912230303",
"v4.3.2-202002070552",
"v4.2.1-201910221723",
"v4.3.35-202008311640.p0",
"v4.3.1-202002032140",
"v4.3.29-202007061006.p0",
"v4.1.37-202003021622",
"v4.5.0-202007240519.p0",
"v4.4.0-202006080610",
"v4.1.4-201906271212",
"v4.1.14-201908291507",
"v4.1.22-201910291109",
"v4.3.5-202003020549",
"v4.5.0-202008100413.p0",
"v4.5.0-202009161248.p0",
"v4.1.23-201911050122",
"v4.3.37-202009151447.p0",
"v4.4.0-202004261927",
"v4.5.0-202009041228.p0",
"v4.3.22-202005201238",
"v4.3.31-202007272153.p0",
"v4.2.36-202006230600.p0",
"v4.4.0-202009041255.p0",
"v4.2.0-201910101614",
"v4.2.0",
"v4.3.0-202001211731",
"v4.2.26-202003230335",
"v4.3.33-202008111029.p0",
"v4.2.29-202004140532",
"v4.2.13-201912230557",
"v4.1.41",
"v4.4.0-202009161309.p0",
"v4.3.14-202004200457",
"v4.1.20",
"v4.2.19-202002101212",
"v4.2.1",
"v4.2.4",
"v4.2.5",
"v4.2.8",
"v4.1.31-202001140447",
"v4.3.20-202005121847",
"v4.2.9",
"v4.1.25-201911190028",
"v4.3.9-202003230345",
"v4.3.27-202006211650.p0",
"v4.4.0-202008210157.p0",
"v4.1.28-201912110241",
"v4.4.0-202008100806.p0",
"v4.2.18",
"v4.2.19",
"v4.5.0-202007281519.p0",
"v4.2.10",
"v4.2.11",
"v4.2.13",
"v4.2.14",
"v4.2.15",
"v4.1.37",
"v4.1.27-201912030019",
"v4.1.34",
"v4.5.0-202007012112.p0",
"v4.1.31",
"v4.1.30",
"v4.2.4-201911050122",
"v4.1.15-201909041605",
"v4.2.36",
"v4.4.0-202005252114",
"v4.1.1-201906040019",
"v4.2.28-202004061218",
"v4.1.8-201907241243",
"v4.4.0-202007291438.p0",
"v4.4.0-202006290400.p0",
"latest",
"v4.1.20-201910102034"
]
}
~~~
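The tag list above is unsorted, which makes it hard to spot the newest build of a release stream. The following is a minimal sketch, assuming the tags follow the `vX.Y.Z-<12-digit timestamp>[.pN]` pattern seen in the output; `newest_build` is a hypothetical helper and the sample is a small subset of the list.

```python
import re

# Matches timestamped build tags only, e.g. v4.4.0-202009161309.p0
TAG_RE = re.compile(r"^v(\d+)\.(\d+)\.(\d+)-(\d{12})(?:\.p\d+)?$")

def newest_build(tags, major, minor):
    """Return the timestamped tag with the latest build date for a major.minor stream."""
    best = None
    for tag in tags:
        m = TAG_RE.match(tag)
        if not m:
            continue  # skip floating tags such as "v4.4", "v4.4.0" or "latest"
        if (int(m.group(1)), int(m.group(2))) != (major, minor):
            continue
        key = (m.group(4), int(m.group(3)))  # build timestamp first, then patch level
        if best is None or key > best[0]:
            best = (key, tag)
    return best[1] if best else None

sample = [
    "v4.4.0-202007232108.p0", "v4.4.0", "v4.4", "latest",
    "v4.4.0-202009161309.p0", "v4.5.0-202009161248.p0", "v4.4.0-202005121717",
]
print(newest_build(sample, 4, 4))
print(newest_build(sample, 4, 7))  # no 4.7 build in this repository listing
```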
So, we use this repository here:
https://github.com/openshift/ose-sriov-network-device-plugin (archived) -> https://github.com/openshift/sriov-network-device-plugin
That can also be found by inspecting the image itself:
~~~
[root@openshift-jumpserver-0 ~]# skopeo inspect docker://registry.redhat.io/openshift4/ose-sriov-network-device-plugin
{
"Name": "registry.redhat.io/openshift4/ose-sriov-network-device-plugin",
"Digest": "sha256:a7770ca0e86fa4fe95733eb8e4711e829f724e3b4ffbf8e0ccafdcac275c0372",
"RepoTags": [
"v4.3.26-202006160135",
"v4.3.14",
"v4.3.13",
"v4.3.12",
"v4.4.0-202007232108.p0",
"v4.3.10",
"v4.4.0-202007120152.p0",
"v4.4.0",
"v4.3.19",
"v4.1.24",
"v4.1.25",
"v4.3.5",
"v4.1.34-202002031224",
"v4.3.3",
"v4.1.21",
"v4.3.1",
"v4.3.0",
"v4.1.0-201905191700",
"v4.1.28",
"v4.1.29",
"v4.2.32-202005050921",
"v4.2.32",
"v4.3.28-202006290519.p0",
"v4.3.7-202003161611",
"v4.1.18-201909201915",
"v4.3.19-202005041055",
"v4.3.25-202006081335",
"v4.4.0-202007171809.p0",
"v4.3.3-202002171705",
"v4.2.26",
"v4.2.10-201912022352",
"v4.1.16-201909100604",
"v4.3.13-202004131016",
"v4.2.9-201911261133",
"v4.2.18-202002031246",
"v4.1.10",
"v4.1.13",
"v4.1.15",
"v4.1.14",
"v4.1.17",
"v4.1.16",
"v4.1.18",
"v4.4.0-202005121717",
"v4.3.7",
"v4.2.15-202001171551",
"v4.1.30-202001061940",
"v4.1.10-201908061216",
"v4.5",
"v4.1.26",
"v4.2.24",
"v4.2.27",
"v4.1.27",
"v4.2.21",
"v4.2.20",
"v4.2.23",
"v4.2.22",
"v4.2.21-202002240343",
"v4.2.29",
"v4.2.28",
"v4.4.0-202005180840",
"v4.3.2",
"v4.3.31",
"v4.5.0",
"v4.3.33",
"v4.3.35",
"v4.1.22",
"v4.3.37",
"v4.2.11-201912100122",
"v4.1.23",
"v4.5.0-202008210149.p0",
"v4.2.5-201911121709",
"v4.3.28",
"v4.1.4",
"v4.1.7",
"v4.1.6",
"v4.1.1",
"v4.1.0",
"v4.1.3",
"v4.1.9-201907311355",
"v4.1.9",
"v4.1.8",
"v4.5.0-202007131801.p0",
"v4.1.17-201909171057",
"v4.1",
"v4.2",
"v4.3",
"v4.4",
"v4.3.9",
"v4.2.34",
"v4.2.34-202005252115",
"v4.1.41-202004130646",
"v4.1.7-201907171753",
"v4.1.6-201907101224",
"v4.4.0-202006160135",
"v4.1.21-201910230924",
"v4.3.29",
"v4.3.26",
"v4.3.27",
"v4.3.25",
"v4.3.22",
"v4.3.23",
"v4.3.20",
"v4.2.8-201911190952",
"v4.3.2-202002112006",
"v4.2.22-202003020552",
"v4.3.12-202004091734",
"v4.1.26-201911260202",
"v4.3.10-202003311428",
"v4.2.23-202003090920",
"v4.4.0-202007060343.p0",
"v4.1.13-201908210601",
"v4.2.24-202003161048",
"v4.2.20-202002171604",
"v4.1.24-201911120311",
"v4.1.3-201906181537",
"v4.5.0-202007172106.p0",
"v4.2.14-202001061701",
"v4.2.27-202003301126",
"v4.4.0-202006211643.p0",
"v4.3.23-202005250821",
"v4.1.29-201912230303",
"v4.3.2-202002070552",
"v4.2.1-201910221723",
"v4.3.35-202008311640.p0",
"v4.3.1-202002032140",
"v4.3.29-202007061006.p0",
"v4.1.37-202003021622",
"v4.5.0-202007240519.p0",
"v4.4.0-202006080610",
"v4.1.4-201906271212",
"v4.1.14-201908291507",
"v4.1.22-201910291109",
"v4.3.5-202003020549",
"v4.5.0-202008100413.p0",
"v4.5.0-202009161248.p0",
"v4.1.23-201911050122",
"v4.3.37-202009151447.p0",
"v4.4.0-202004261927",
"v4.5.0-202009041228.p0",
"v4.3.22-202005201238",
"v4.3.31-202007272153.p0",
"v4.2.36-202006230600.p0",
"v4.4.0-202009041255.p0",
"v4.2.0-201910101614",
"v4.2.0",
"v4.3.0-202001211731",
"v4.2.26-202003230335",
"v4.3.33-202008111029.p0",
"v4.2.29-202004140532",
"v4.2.13-201912230557",
"v4.1.41",
"v4.4.0-202009161309.p0",
"v4.3.14-202004200457",
"v4.1.20",
"v4.2.19-202002101212",
"v4.2.1",
"v4.2.4",
"v4.2.5",
"v4.2.8",
"v4.1.31-202001140447",
"v4.3.20-202005121847",
"v4.2.9",
"v4.1.25-201911190028",
"v4.3.9-202003230345",
"v4.3.27-202006211650.p0",
"v4.4.0-202008210157.p0",
"v4.1.28-201912110241",
"v4.4.0-202008100806.p0",
"v4.2.18",
"v4.2.19",
"v4.5.0-202007281519.p0",
"v4.2.10",
"v4.2.11",
"v4.2.13",
"v4.2.14",
"v4.2.15",
"v4.1.37",
"v4.1.27-201912030019",
"v4.1.34",
"v4.5.0-202007012112.p0",
"v4.1.31",
"v4.1.30",
"v4.2.4-201911050122",
"v4.1.15-201909041605",
"v4.2.36",
"v4.4.0-202005252114",
"v4.1.1-201906040019",
"v4.2.28-202004061218",
"v4.1.8-201907241243",
"v4.4.0-202007291438.p0",
"v4.4.0-202006290400.p0",
"latest",
"v4.1.20-201910102034"
],
"Created": "2020-09-16T16:22:14.454607923Z",
"DockerVersion": "1.13.1",
"Labels": {
"License": "GPLv2+",
"architecture": "x86_64",
"build-date": "2020-09-16T16:20:45.754414",
"com.redhat.build-host": "cpt-1008.osbs.prod.upshift.rdu2.redhat.com",
"com.redhat.component": "sriov-network-device-plugin-container",
"com.redhat.license_terms": "https://www.redhat.com/agreements",
"description": "This is the base image from which all OpenShift Container Platform images inherit.",
"distribution-scope": "public",
"io.k8s.description": "This is the base image from which all OpenShift Container Platform images inherit.",
"io.k8s.display-name": "SRIOV Network Device Plugin",
"io.openshift.build.commit.id": "2a152e3b4e02b2968e8f51acc9af454269983105",
"io.openshift.build.commit.url": "https://github.com/openshift/sriov-network-device-plugin/commit/2a152e3b4e02b2968e8f51acc9af454269983105",
"io.openshift.build.source-location": "https://github.com/openshift/sriov-network-device-plugin",
"io.openshift.maintainer.component": "Networking",
"io.openshift.maintainer.product": "OpenShift Container Platform",
"io.openshift.maintainer.subcomponent": "SR-IOV",
"io.openshift.tags": "openshift,base",
"name": "openshift/ose-sriov-network-device-plugin",
"release": "202009161248.p0",
"summary": "Provides the latest release of the Red Hat Universal Base Image 7.",
"url": "https://access.redhat.com/containers/#/registry.access.redhat.com/openshift/ose-sriov-network-device-plugin/images/v4.5.0-202009161248.p0",
"vcs-ref": "6e2ac1ce55d4c68788df79f1da4cfd55bcf279be",
"vcs-type": "git",
"vendor": "Red Hat, Inc.",
"version": "v4.5.0"
},
"Architecture": "amd64",
"Os": "linux",
"Layers": [
"sha256:c9fa7d57b9028d4bd02b51cef3c3039fa7b23a8b2d9d26a6ce66b3428f6e2457",
"sha256:74cbb6607642df5f9f70e8588e3c56d6de795d1a9af22866ea4cc82f2dad4f14",
"sha256:342d2c032054c3651de9a26b47df275eb472a0e268d7b533ff14f4aab2c029e1",
"sha256:8da07861177a6e84747ec6309336839e9be6b526931dcf967b5ed815debf00f5",
"sha256:b5a21cdc47542e59df1dbd56326f34377ca821aa120ba3ee7e9aa44ed1ecc9fd"
],
"Env": [
"__doozer=merge",
"BUILD_RELEASE=202009161248.p0",
"BUILD_VERSION=v4.5.0",
"OS_GIT_MAJOR=4",
"OS_GIT_MINOR=5",
"OS_GIT_PATCH=0",
"OS_GIT_TREE_STATE=clean",
"OS_GIT_VERSION=4.5.0-202009161248.p0-2a152e3",
"SOURCE_GIT_TREE_STATE=clean",
"OS_GIT_COMMIT=2a152e3",
"SOURCE_DATE_EPOCH=1586862169",
"SOURCE_GIT_COMMIT=2a152e3b4e02b2968e8f51acc9af454269983105",
"SOURCE_GIT_TAG=2a152e3b",
"SOURCE_GIT_URL=https://github.com/openshift/sriov-network-device-plugin",
"INSTALL_PKGS=hwdata",
"PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",
"container=oci"
]
}
~~~
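The labels in the inspect output identify both the build and the source commit. The following is a minimal sketch that pulls them out of the JSON, assuming the label keys shown above are present; `build_identity` is a hypothetical helper, and the sample contains only the labels it reads.

```python
import json

def build_identity(inspect_json: str) -> dict:
    """Summarise the build string and source commit URL from skopeo inspect output."""
    labels = json.loads(inspect_json).get("Labels") or {}
    repo = labels.get("io.openshift.build.source-location", "")
    commit = labels.get("io.openshift.build.commit.id", "")
    return {
        "build": f'{labels.get("version", "")}-{labels.get("release", "")}',
        "commit_url": f"{repo}/commit/{commit}" if repo and commit else "",
    }

# Label values copied from the registry.redhat.io output above.
sample = json.dumps({"Labels": {
    "io.openshift.build.commit.id": "2a152e3b4e02b2968e8f51acc9af454269983105",
    "io.openshift.build.source-location": "https://github.com/openshift/sriov-network-device-plugin",
    "release": "202009161248.p0",
    "version": "v4.5.0",
}})
print(build_identity(sample))
```

For the image above this yields a v4.5.0 build, so the untagged reference resolves to a 4.5 image rather than the 4.4 one running in the cluster.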
I don't know how Marc made the connection to quay.io, but this image does have the fix under its 4.7 tag (the latest tag is still built from an older commit):
~~~
[root@openshift-jumpserver-0 ~]# skopeo inspect docker://quay.io/openshift/origin-sriov-network-device-plugin:latest
{
"Name": "quay.io/openshift/origin-sriov-network-device-plugin",
"Digest": "sha256:d4f78377a57b30e94c27cd5fe1f4d0a404fa2980c3224606ac58bdd2e29e60f3",
"RepoTags": [
"v4.0",
"v4.0.0",
"4.1",
"4.1.0",
"4.2",
"4.2.0",
"4.3",
"4.3.0",
"4.4",
"4.4.0",
"4.5",
"4.5.0",
"4.6",
"4.6.0",
"latest",
"4.7",
"4.7.0"
],
"Created": "2020-09-18T01:14:23.72057044Z",
"DockerVersion": "1.13.1",
"Labels": {
"architecture": "x86_64",
"build-date": "2020-09-05T01:13:15.933978",
"com.redhat.build-host": "cpt-1003.osbs.prod.upshift.rdu2.redhat.com",
"com.redhat.component": "openshift-enterprise-base-container",
"com.redhat.license_terms": "https://www.redhat.com/agreements",
"description": "The Universal Base Image is designed and engineered to be the base layer for all of your containerized applications, middleware and utilities. This base image is freely redistributable, but Red Hat only supports Red Hat technologies through subscriptions for Red Hat products. This image is maintained by Red Hat and updated regularly.",
"distribution-scope": "public",
"io.k8s.description": "This is the base image from which all OpenShift images inherit.",
"io.k8s.display-name": "SRIOV Network Device Plugin",
"io.openshift.build.commit.author": "",
"io.openshift.build.commit.date": "",
"io.openshift.build.commit.id": "8cc3e960f7f1831eaec2baa9b4fb52f3232f6e39",
"io.openshift.build.commit.message": "",
"io.openshift.build.commit.ref": "master",
"io.openshift.build.name": "",
"io.openshift.build.namespace": "",
"io.openshift.build.source-context-dir": "",
"io.openshift.build.source-location": "https://github.com/openshift/sriov-network-device-plugin",
"io.openshift.expose-services": "",
"io.openshift.tags": "base rhel8",
"maintainer": "Red Hat, Inc.",
"name": "openshift/ose-base",
"release": "202009050041.5133",
"summary": "Provides the latest release of Red Hat Universal Base Image 8.",
"url": "https://access.redhat.com/containers/#/registry.access.redhat.com/openshift/ose-base/images/v4.0-202009050041.5133",
"vcs-ref": "8cc3e960f7f1831eaec2baa9b4fb52f3232f6e39",
"vcs-type": "git",
"vcs-url": "https://github.com/openshift/sriov-network-device-plugin",
"vendor": "Red Hat, Inc.",
"version": "v4.0"
},
"Architecture": "amd64",
"Os": "linux",
"Layers": [
"sha256:77c58f19bd6e67185938abb6bbb6ec229e07a5e607453904294d982de141d2f0",
"sha256:47db82df7f3f4393c1f19c362a2db2c47ca049b6fb20bef041dfc9bdb12a4504",
"sha256:9e518d0279d9c75999daa4b867d4924c26b32617223a7ff068027268dcdbf4e4",
"sha256:c1c8ae64753f9b716109aba76c59de8d11a6f232f27aeac2ecfa4cb926e3ff70",
"sha256:dc1710d2036e1a862c75c63a2cdb0bfcf8e1c691bbbae89fcd9f5b87c694991c",
"sha256:42d548b84ccaf0226ae09a2012647c9eb22c29c65a538fba7d287b1e3da34535"
],
"Env": [
"foo=bar",
"INSTALL_PKGS=hwdata",
"OPENSHIFT_BUILD_NAME=sriov-network-device-plugin",
"OPENSHIFT_BUILD_NAMESPACE=ci-op-ikhv9i70",
"OPENSHIFT_CI=true",
"OPENSHIFT_BUILD_SOURCE=https://github.com/openshift/release.git",
"OPENSHIFT_BUILD_REFERENCE=master",
"OPENSHIFT_BUILD_COMMIT=6bd8b00435ece6787878009f4f16cba809f9f930",
"PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",
"container=oci"
]
}
[root@openshift-jumpserver-0 ~]# skopeo inspect docker://quay.io/openshift/origin-sriov-network-device-plugin:4.7
{
"Name": "quay.io/openshift/origin-sriov-network-device-plugin",
"Digest": "sha256:2782cd28660817e59819fa07a7dd9b0e18f91bff55074a16b4d5d20639fa2aa0",
"RepoTags": [
"v4.0",
"v4.0.0",
"4.2",
"4.2.0",
"4.3",
"4.3.0",
"4.4",
"4.4.0",
"4.5",
"4.5.0",
"4.6",
"4.6.0",
"latest",
"4.7",
"4.7.0",
"4.1",
"4.1.0"
],
"Created": "2020-10-16T09:58:09.12728526Z",
"DockerVersion": "1.13.1",
"Labels": {
"architecture": "x86_64",
"build-date": "2020-09-05T01:13:15.933978",
"com.redhat.build-host": "cpt-1003.osbs.prod.upshift.rdu2.redhat.com",
"com.redhat.component": "openshift-enterprise-base-container",
"com.redhat.license_terms": "https://www.redhat.com/agreements",
"description": "The Universal Base Image is designed and engineered to be the base layer for all of your containerized applications, middleware and utilities. This base image is freely redistributable, but Red Hat only supports Red Hat technologies through subscriptions for Red Hat products. This image is maintained by Red Hat and updated regularly.",
"distribution-scope": "public",
"io.k8s.description": "This is the base image from which all OpenShift images inherit.",
"io.k8s.display-name": "SRIOV Network Device Plugin",
"io.openshift.build.commit.author": "",
"io.openshift.build.commit.date": "",
"io.openshift.build.commit.id": "fb8a4320c8780d287b9b491190cd5c2d1626a5f6",
"io.openshift.build.commit.message": "",
"io.openshift.build.commit.ref": "master",
"io.openshift.build.name": "",
"io.openshift.build.namespace": "",
"io.openshift.build.source-context-dir": "",
"io.openshift.build.source-location": "https://github.com/openshift/sriov-network-device-plugin",
"io.openshift.expose-services": "",
"io.openshift.tags": "base rhel8",
"maintainer": "Red Hat, Inc.",
"name": "openshift/ose-base",
"release": "202009050041.5133",
"summary": "Provides the latest release of Red Hat Universal Base Image 8.",
"url": "https://access.redhat.com/containers/#/registry.access.redhat.com/openshift/ose-base/images/v4.0-202009050041.5133",
"vcs-ref": "fb8a4320c8780d287b9b491190cd5c2d1626a5f6",
"vcs-type": "git",
"vcs-url": "https://github.com/openshift/sriov-network-device-plugin",
"vendor": "Red Hat, Inc.",
"version": "v4.0"
},
"Architecture": "amd64",
"Os": "linux",
"Layers": [
"sha256:77c58f19bd6e67185938abb6bbb6ec229e07a5e607453904294d982de141d2f0",
"sha256:47db82df7f3f4393c1f19c362a2db2c47ca049b6fb20bef041dfc9bdb12a4504",
"sha256:9e518d0279d9c75999daa4b867d4924c26b32617223a7ff068027268dcdbf4e4",
"sha256:b4070df4e3ccb68562aa89b31f2d5b6b0a035c2f227770454d66510569db965c",
"sha256:76ae5f5edcd237b0beafdde94430aefa4964cd2763f901c8928e2aaa70139fb9",
"sha256:cff1622ad4836a42339c32ee1978904c647659aaf039a5f84a61e4bcd2fa7eb7"
],
"Env": [
"foo=bar",
"INSTALL_PKGS=hwdata",
"OPENSHIFT_BUILD_NAME=sriov-network-device-plugin",
"OPENSHIFT_BUILD_NAMESPACE=ci-op-fgdb8chi",
"GODEBUG=x509ignoreCN=0",
"OPENSHIFT_CI=true",
"PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",
"container=oci"
]
}
~~~
So, the 4.7 image is built from commit:
https://github.com/openshift/sriov-network-device-plugin/commit/fb8a4320c8780d287b9b491190cd5c2d1626a5f6
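The check of which tag carries the patch boils down to comparing the `io.openshift.build.commit.id` label of each tag with the fix commit. The following is a minimal sketch using the two commit IDs from the outputs above; `tags_built_from` is a hypothetical helper. It assumes an exact match is what we want: an image built from a later commit would also contain the fix, but confirming that needs a git ancestry check, which this sketch does not do.

```python
FIX_COMMIT = "fb8a4320c8780d287b9b491190cd5c2d1626a5f6"

def tags_built_from(commit_by_tag: dict, commit: str) -> list:
    """Return the tags whose build commit label equals the given commit."""
    return sorted(tag for tag, built in commit_by_tag.items() if built == commit)

# Commit labels copied from the two quay.io inspect outputs above.
commit_by_tag = {
    "latest": "8cc3e960f7f1831eaec2baa9b4fb52f3232f6e39",
    "4.7": "fb8a4320c8780d287b9b491190cd5c2d1626a5f6",
}
print(tags_built_from(commit_by_tag, FIX_COMMIT))
```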
For verification, we can patch the CSV:
~~~
[root@openshift-jumpserver-0 ~]# oc get csv -n openshift-sriov-network-operator
NAME DISPLAY VERSION REPLACES PHASE
sriov-network-operator.4.4.0-202009281328.p0 SR-IOV Network Operator 4.4.0-202009281328.p0 Succeeded
~~~
~~~
oc edit csv -n openshift-sriov-network-operator sriov-network-operator.4.4.0-202009281328.p0
~~~
Search for SRIOV_DEVICE_PLUGIN_IMAGE and change the value of that environment variable to "quay.io/openshift/origin-sriov-network-device-plugin:4.7".
The following verification command should yield:
~~~
[root@openshift-jumpserver-0 ~]# oc get csv -n openshift-sriov-network-operator sriov-network-operator.4.4.0-202009281328.p0 -o json | jq '.spec.install.spec.deployments[] | select(.name == "sriov-network-operator") | .spec.template.spec.containers[] | select(.name == "sriov-network-operator").env[] | select(.name =="SRIOV_DEVICE_PLUGIN_IMAGE")'
{
"name": "SRIOV_DEVICE_PLUGIN_IMAGE",
"value": "quay.io/openshift/origin-sriov-network-device-plugin:4.7"
}
~~~
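The jq filter above walks the CSV from the deployment to the container to the env entry. The following is a minimal Python sketch of the same lookup, assuming the CSV was saved with `-o json`; `operator_env` is a hypothetical helper and the sample document contains only the fields the lookup touches.

```python
import json

def operator_env(csv_json: str, env_name: str,
                 deployment: str = "sriov-network-operator",
                 container: str = "sriov-network-operator"):
    """Return the value of one env variable of the operator container, or None."""
    csv = json.loads(csv_json)
    for dep in csv["spec"]["install"]["spec"]["deployments"]:
        if dep["name"] != deployment:
            continue
        for ctr in dep["spec"]["template"]["spec"]["containers"]:
            if ctr["name"] != container:
                continue
            for env in ctr.get("env", []):
                if env["name"] == env_name:
                    return env.get("value")
    return None

# Trimmed-down stand-in for the real CSV, same path as the jq filter above.
sample = json.dumps({"spec": {"install": {"spec": {"deployments": [{
    "name": "sriov-network-operator",
    "spec": {"template": {"spec": {"containers": [{
        "name": "sriov-network-operator",
        "env": [{"name": "SRIOV_DEVICE_PLUGIN_IMAGE",
                 "value": "quay.io/openshift/origin-sriov-network-device-plugin:4.7"}],
    }]}}},
}]}}}})
print(operator_env(sample, "SRIOV_DEVICE_PLUGIN_IMAGE"))
```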
Now, delete the sriov-network-operator deployment:
~~~
oc delete deployment -n openshift-sriov-network-operator sriov-network-operator
~~~
Watch the CSV's events to make sure that it picks up the change:
~~~
[root@openshift-jumpserver-0 ~]# oc describe csv -n openshift-sriov-network-operator | tail -n 15
Type Reason Age From Message
---- ------ ---- ---- -------
Normal RequirementsUnknown 167m operator-lifecycle-manager requirements not yet checked
Normal RequirementsNotMet 167m (x2 over 167m) operator-lifecycle-manager one or more requirements couldn't be found
Normal InstallWaiting 167m operator-lifecycle-manager installing: waiting for deployment sriov-network-operator to become ready: Waiting for deployment spec update to be observed...
Warning ComponentUnhealthy 31m operator-lifecycle-manager installing: waiting for deployment sriov-network-operator to become ready: Waiting for rollout to finish: 1 old replicas are pending termination...
Warning ComponentUnhealthy 21m (x3 over 24m) operator-lifecycle-manager installing: missing deployment with name=sriov-network-operator
Normal NeedsReinstall 21m (x2 over 24m) operator-lifecycle-manager installing: missing deployment with name=sriov-network-operator
Normal AllRequirementsMet 21m (x5 over 167m) operator-lifecycle-manager all requirements found, attempting install
Normal InstallSucceeded 21m (x8 over 167m) operator-lifecycle-manager waiting for install components to report healthy
Normal InstallWaiting 21m (x5 over 167m) operator-lifecycle-manager installing: waiting for deployment sriov-network-operator to become ready: Waiting for rollout to finish: 0 of 1 updated replicas are available...
Normal InstallSucceeded 21m (x6 over 167m) operator-lifecycle-manager install strategy completed with no errors
Normal NeedsReinstall 20m (x3 over 31m) operator-lifecycle-manager installing: waiting for deployment sriov-network-operator to become ready: Waiting for rollout to finish: 1 old replicas are pending termination...
Warning ComponentUnhealthy 20m (x2 over 31m) operator-lifecycle-manager installing: waiting for deployment sriov-network-operator to become ready: Waiting for deployment spec update to be observed...
Normal ComponentUnhealthy 1m operator-lifecycle-manager installing: deployment changed old hash=7b88b64698, new hash=7759748fbb
~~~
The new sriov-network-operator should reflect the change in image name:
~~~
[root@openshift-jumpserver-0 ~]# oc get deployment -n openshift-sriov-network-operator sriov-network-operator -o yaml | grep -i SRIOV_DEVICE_PLUGIN_IMAGE -A1
- name: SRIOV_DEVICE_PLUGIN_IMAGE
value: quay.io/openshift/origin-sriov-network-device-plugin:4.7
~~~
This should cascade down into the sriov-device-plugin daemonset:
~~~
[root@openshift-jumpserver-0 ~]# oc get daemonset -n openshift-sriov-network-operator sriov-device-plugin -o yaml | grep -i image
image: quay.io/openshift/origin-sriov-network-device-plugin:4.7
imagePullPolicy: IfNotPresent
~~~
And new pods should spawn:
~~~
[root@openshift-jumpserver-0 ~]# oc get pods -n openshift-sriov-network-operator -o wide --show-labels -l app=sriov-device-plugin
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES LABELS
sriov-device-plugin-7jckr 1/1 Running 0 1m 192.168.123.220 openshift-worker-0.example.com <none> <none> app=sriov-device-plugin,component=network,controller-revision-hash=6ccf8b4d67,openshift.io/component=network,pod-template-generation=17,type=infra
~~~
~~~
[root@openshift-jumpserver-0 ~]# oc describe pods -n openshift-sriov-network-operator -l app=sriov-device-plugin | grep -i image
Image: quay.io/openshift/origin-sriov-network-device-plugin:4.7
Image ID: quay.io/openshift/origin-sriov-network-device-plugin@sha256:2782cd28660817e59819fa07a7dd9b0e18f91bff55074a16b4d5d20639fa2aa0
Normal Pulled 20m kubelet, openshift-worker-0.example.com Container image "quay.io/openshift/origin-sriov-network-device-plugin:4.7" already present on machine
~~~
~~~
[root@openshift-jumpserver-0 ~]# skopeo inspect docker://quay.io/openshift/origin-sriov-network-device-plugin@sha256:2782cd28660817e59819fa07a7dd9b0e18f91bff55074a16b4d5d20639fa2aa0 | grep -i commit
"io.openshift.build.commit.author": "",
"io.openshift.build.commit.date": "",
"io.openshift.build.commit.id": "fb8a4320c8780d287b9b491190cd5c2d1626a5f6",
"io.openshift.build.commit.message": "",
"io.openshift.build.commit.ref": "master",
~~~
And just as a reminder, that's the commit in question: https://github.com/openshift/sriov-network-device-plugin/commit/fb8a4320c8780d287b9b491190cd5c2d1626a5f6
I'm testing with a slightly changed definition:
~~~
[root@openshift-jumpserver-0 ~]# cat networkpolicy-netdevice.yaml
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: sriov-enp5s0f0-netdevice-1888828
  namespace: openshift-sriov-network-operator
spec:
  resourceName: enp5s0f0Netdev1888828
  nodeSelector:
    kubernetes.io/hostname: openshift-worker-2.example.com
  priority: 10
  mtu: 1500
  numVfs: 2
  nicSelector:
    vendor: "8086"
    rootDevices: ["0000:05:00.0"]
  deviceType: "netdevice"
  isRdma: false
---
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: sriov-enp5s0f1-netdevice-1888828
  namespace: openshift-sriov-network-operator
spec:
  resourceName: enp5s0f1Netdev1888828
  nodeSelector:
    kubernetes.io/hostname: openshift-worker-2.example.com
  priority: 10
  mtu: 1500
  numVfs: 1
  nicSelector:
    vendor: "8086"
    rootDevices: ["0000:05:00.1"]
  deviceType: "netdevice"
  isRdma: false
[root@openshift-jumpserver-0 ~]# oc get pods -n openshift-sriov-network-operator
NAME READY STATUS RESTARTS AGE
network-resources-injector-68plg 1/1 Running 0 3h32m
network-resources-injector-7vjq8 1/1 Running 0 3h32m
network-resources-injector-g5h5f 1/1 Running 0 3h32m
operator-webhook-bh2dd 1/1 Running 0 3h32m
operator-webhook-gc55c 1/1 Running 0 3h32m
operator-webhook-tzk5s 1/1 Running 0 3h32m
sriov-network-config-daemon-hv8bg 1/1 Running 0 3h32m
sriov-network-config-daemon-k2d2c 1/1 Running 0 3h32m
sriov-network-config-daemon-sdhjh 1/1 Running 0 3h32m
sriov-network-operator-748778b679-hh76v 1/1 Running 0 59m
[root@openshift-jumpserver-0 ~]# oc apply -f networkpolicy-netdevice.yaml
sriovnetworknodepolicy.sriovnetwork.openshift.io/sriov-enp5s0f0-netdevice-1888828 created
sriovnetworknodepolicy.sriovnetwork.openshift.io/sriov-enp5s0f1-netdevice-1888828 created
[root@openshift-jumpserver-0 ~]# oc get nodes openshift-worker-2.example.com -o yaml | grep openshift.io
machineconfiguration.openshift.io/currentConfig: rendered-worker-f1177c2b436dafe5c85464a8c3276b11
machineconfiguration.openshift.io/desiredConfig: rendered-worker-f1177c2b436dafe5c85464a8c3276b11
machineconfiguration.openshift.io/reason: ""
machineconfiguration.openshift.io/state: Done
node.openshift.io/os_id: rhcos
openshift.io/enp5s0f0Netdev: "0"
openshift.io/enp5s0f0Vfiopci: "0"
openshift.io/enp5s0f1Netdev: "0"
openshift.io/enp5s0f0Netdev: "0"
openshift.io/enp5s0f0Vfiopci: "0"
openshift.io/enp5s0f1Netdev: "0"
[root@openshift-jumpserver-0 ~]# sleep 300 ; oc get nodes openshift-worker-2.example.com -o yaml | grep openshift.io
[root@openshift-jumpserver-0 ~]# oc get nodes openshift-worker-2.example.com -o yaml | grep openshift.io
machineconfiguration.openshift.io/currentConfig: rendered-worker-f1177c2b436dafe5c85464a8c3276b11
machineconfiguration.openshift.io/desiredConfig: rendered-worker-f1177c2b436dafe5c85464a8c3276b11
machineconfiguration.openshift.io/reason: ""
machineconfiguration.openshift.io/state: Done
node.openshift.io/os_id: rhcos
openshift.io/enp5s0f0Netdev: "0"
openshift.io/enp5s0f0Netdev1888828: "3"
openshift.io/enp5s0f0Vfiopci: "0"
openshift.io/enp5s0f1Netdev: "0"
openshift.io/enp5s0f1Netdev1888828: "3"
openshift.io/enp5s0f0Netdev: "0"
openshift.io/enp5s0f0Netdev1888828: "3"
openshift.io/enp5s0f0Vfiopci: "0"
openshift.io/enp5s0f1Netdev: "0"
openshift.io/enp5s0f1Netdev1888828: "3"
~~~
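In the output above the node reports 3 devices for both new resources, while the policies request `numVfs: 2` and `numVfs: 1`, which looks like the over-assignment described in the bug summary. The following is a minimal sketch that flags such cases from the grep output, assuming each resource should advertise at most the `numVfs` of its policy; `parse_resources` and `over_allocated` are hypothetical helpers.

```python
import re

# Matches lines such as:  openshift.io/enp5s0f0Netdev1888828: "3"
LINE_RE = re.compile(r'^\s*(openshift\.io/\S+):\s*"(\d+)"\s*$')

def parse_resources(text: str) -> dict:
    """Collect openshift.io/<resource> counts from grep'd node YAML."""
    found = {}
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if m:
            found[m.group(1)] = int(m.group(2))
    return found

def over_allocated(resources: dict, num_vfs: dict) -> dict:
    """Return {resource: (numVfs, reported)} where the node reports more than numVfs."""
    return {name: (num_vfs[name], count)
            for name, count in resources.items()
            if name in num_vfs and count > num_vfs[name]}

# Lines copied from the second `oc get nodes` output above.
output = '''\
machineconfiguration.openshift.io/state: Done
openshift.io/enp5s0f0Netdev: "0"
openshift.io/enp5s0f0Netdev1888828: "3"
openshift.io/enp5s0f1Netdev1888828: "3"
'''
num_vfs = {
    "openshift.io/enp5s0f0Netdev1888828": 2,
    "openshift.io/enp5s0f1Netdev1888828": 1,
}
print(over_allocated(parse_resources(output), num_vfs))
```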
And spawning pods:
~~~
[root@openshift-jumpserver-0 ~]# cat sriovnetwork-enp5s0f0.yaml
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetwork
metadata:
  name: sriov-net-enp5s0f0-netdev-1888828
  namespace: openshift-sriov-network-operator
spec:
  networkNamespace: sriov-testing
  ipam: '{ "type": "static" }'
  vlan: 905
  resourceName: enp5s0f0Netdev1888828
  trust: "on"
  capabilities: '{ "mac": true, "ips": true }'
[root@openshift-jumpserver-0 ~]# oc apply -f sriovnetwork-enp5s0f0.yaml
sriovnetwork.sriovnetwork.openshift.io/sriov-net-enp5s0f0-netdev-1888828 created
[root@openshift-jumpserver-0 ~]# oc get -f !$
oc get -f sriovnetwork-enp5s0f0.yaml
NAME AGE
sriov-net-enp5s0f0-netdev-1888828 13s
[root@openshift-jumpserver-0 ~]# oc apply -f sriovpod0.yaml
pod/sriovpod0 created
[root@openshift-jumpserver-0 ~]# oc apply -f sriovpod1.yaml
pod/sriovpod1 created
[root@openshift-jumpserver-0 ~]# oc apply -f sriovpod2.yaml
pod/sriovpod2 created
[root@openshift-jumpserver-0 ~]# cat sriovpod{0,1,2}.yaml
apiVersion: v1
kind: Pod
metadata:
name: sriovpod0
namespace: sriov-testing
annotations:
k8s.v1.cni.cncf.io/networks: '[
{
"name": "sriov-net-enp5s0f0-netdev-1888828",
"ips": ["192.168.10.10/24", "2001::10/64"]
}
]'
spec:
containers:
- name: sample-container
image: centos:8
imagePullPolicy: IfNotPresent
command: ["sleep", "infinity"]
apiVersion: v1
kind: Pod
metadata:
name: sriovpod1
namespace: sriov-testing
annotations:
k8s.v1.cni.cncf.io/networks: '[
{
"name": "sriov-net-enp5s0f0-netdev-1888828",
"ips": ["192.168.10.11/24", "2001::11/64"]
}
]'
spec:
containers:
- name: sample-container
image: centos:8
imagePullPolicy: IfNotPresent
command: ["sleep", "infinity"]
apiVersion: v1
kind: Pod
metadata:
name: sriovpod2
namespace: sriov-testing
annotations:
k8s.v1.cni.cncf.io/networks: '[
{
"name": "sriov-net-enp5s0f0-netdev-1888828",
"ips": ["192.168.10.12/24", "2001::12/64"]
}
]'
spec:
containers:
- name: sample-container
image: centos:8
imagePullPolicy: IfNotPresent
command: ["sleep", "infinity"]
~~~
~~~
[root@openshift-jumpserver-0 ~]# oc get pods -n sriov-testing
NAME READY STATUS RESTARTS AGE
sriovpod0 1/1 Running 0 32s
sriovpod1 1/1 Running 0 28s
sriovpod2 1/1 Running 0 24s
~~~
~~~
[root@openshift-jumpserver-0 ~]# oc exec -it -n sriov-testing sriovpod0 env | grep PCI
PCIDEVICE_OPENSHIFT_IO_ENP5S0F0NETDEV1888828=0000:05:10.2
[root@openshift-jumpserver-0 ~]# oc exec -it -n sriov-testing sriovpod1 env | grep PCI
PCIDEVICE_OPENSHIFT_IO_ENP5S0F0NETDEV1888828=0000:05:10.0
[root@openshift-jumpserver-0 ~]# oc exec -it -n sriov-testing sriovpod2 env | grep PCI
PCIDEVICE_OPENSHIFT_IO_ENP5S0F0NETDEV1888828=0000:05:10.1
[root@openshift-jumpserver-0 ~]#
~~~
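To see which PF each of the allocated VFs above actually belongs to, one option is to follow the `physfn` symlink that the kernel exposes in sysfs for every VF. The following is only a minimal sketch, meant to be run on the worker node; the function name `vf_parent_pf` and the `SYSFS_PCI` override are hypothetical helpers introduced here for illustration and are not part of the device plugin or the operator.

```shell
#!/bin/bash
# Sketch: print the PCI address of the PF that owns a given VF.
# Assumption: the VF directory in sysfs contains a "physfn" symlink pointing
# at its parent PF, e.g. /sys/bus/pci/devices/0000:05:10.0/physfn.
# SYSFS_PCI is a hypothetical override so the function can be pointed at a
# different directory tree.
vf_parent_pf() {
    local vf="$1"
    local base="${SYSFS_PCI:-/sys/bus/pci/devices}"
    local link="$base/$vf/physfn"

    # A PF (or a non-SR-IOV device) has no physfn link; report failure.
    [ -L "$link" ] || return 1

    # The link target looks like ../0000:05:00.0; keep only the address.
    basename "$(readlink "$link")"
}

# Example (commented out so that sourcing this file has no side effects):
# for vf in 0000:05:10.0 0000:05:10.1 0000:05:10.2; do
#     echo "$vf -> $(vf_parent_pf "$vf")"
# done
```

If the rootDevices filter worked as intended, every address handed out for a given resource would resolve to the same PF.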
~~~
8: enp5s0f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
link/ether a0:36:9f:e5:da:30 brd ff:ff:ff:ff:ff:ff
vf 0 link/ether 82:10:bc:d8:12:9d brd ff:ff:ff:ff:ff:ff, vlan 905, spoof checking on, link-state auto, trust on, query_rss off
vf 1 link/ether e2:0e:13:52:80:34 brd ff:ff:ff:ff:ff:ff, vlan 905, spoof checking on, link-state auto, trust on, query_rss off
[root@openshift-worker-2 ~]# ip link ls dev enp5s0f1
9: enp5s0f1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
link/ether a0:36:9f:e5:da:32 brd ff:ff:ff:ff:ff:ff
vf 0 link/ether 22:f1:58:4a:4c:69 brd ff:ff:ff:ff:ff:ff, vlan 905, spoof checking on, link-state auto, trust on, query_rss off
~~~
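The `ip link` output above can be cross-checked against the count that the node advertises for a resource. The following is a rough sketch only; `count_vfs` and `check_resource` are hypothetical helper names, and the parsing assumes the `vf N ...` line layout shown above.

```shell
#!/bin/bash
# Sketch: count the "vf N ..." lines in `ip link ls dev <pf>` output read
# from stdin. Assumes the line layout shown in the transcript above.
count_vfs() {
    # grep -c prints 0 and exits non-zero when nothing matches, so the
    # exit status is neutralised with "|| true".
    grep -cE '^[[:space:]]*vf [0-9]+ ' || true
}

# Compare the count advertised by the node ($1) with the number of VFs
# that exist on the PF ($2).
check_resource() {
    local reported="$1" on_pf="$2"
    if [ "$reported" -gt "$on_pf" ]; then
        echo "over-reported: node=$reported pf=$on_pf"
        return 1
    fi
    echo "ok: node=$reported pf=$on_pf"
}

# Example (commented out; run on the worker node):
# check_resource 3 "$(ip link ls dev enp5s0f0 | count_vfs)"
```

With the values from this report (3 advertised for `enp5s0f0Netdev1888828`, 2 VFs listed on enp5s0f0), the check would flag the resource as over-reported.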
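So the issue persists even with the new image: three pods each received a VF from the enp5s0f0Netdev1888828 resource, although enp5s0f0 only lists two VFs. Either the aforementioned bug fix was not meant to address this, or there is a problem with that fix, or my test is flawed. The device plugin image in use: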
[root@openshift-jumpserver-0 ~]# oc describe pods -n openshift-sriov-network-operator -l app=sriov-device-plugin | grep -i image
Image: quay.io/openshift/origin-sriov-network-device-plugin:4.7
Image ID: quay.io/openshift/origin-sriov-network-device-plugin@sha256:2782cd28660817e59819fa07a7dd9b0e18f91bff55074a16b4d5d20639fa2aa0
Normal Pulled 12m kubelet, openshift-worker-2.example.com Container image "quay.io/openshift/origin-sriov-network-device-plugin:4.7" already present on machine
[root@openshift-jumpserver-0 ~]# oc get pods -n openshift-sriov-network-operator -l app=sriov-device-plugin -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
sriov-device-plugin-t4tz6 1/1 Running 0 11m 192.168.123.222 openshift-worker-2.example.com <none> <none>
[root@openshift-jumpserver-0 ~]#
@Andreas, yes, the upstream PR you mentioned is going to fix this rootDevices issue:
https://github.com/k8snetworkplumbingwg/sriov-network-device-plugin/pull/264

We will need to update both the device plugin and the operator images in order to test this feature:
openshift sriov device plugin PR: https://github.com/openshift/sriov-network-device-plugin/pull/32
openshift sriov network operator PR: https://github.com/openshift/sriov-network-operator/pull/370

Awesome, thanks for the confirmation.

*** This bug has been marked as a duplicate of bug 1877648 ***