Description of problem:
When a node policy is created with more than one label in its node selector, the CNI and device plugin pods are restarted every 30 seconds.

Version-Release number of selected component (if applicable):
4.4 and subsequent

How reproducible:
Always

Steps to Reproduce:
1. Create a policy with a node selector as follows:
   nodeSelector:
     aaa: "bbb"
     ccc: "ddd"
2. Observe the CNI pod.

Actual results:
The pod keeps restarting.

Expected results:
The pod is restarted only once.

Additional info:
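A nodeSelector with several labels is a logical AND: a node is selected only if it carries every listed key with exactly the listed value. The minimal Python sketch below illustrates that matching rule using the selector from step 1. The function name `node_matches` and the sample node labels are hypothetical, and the sketch does not reproduce the operator's own code.

```python
from typing import Mapping


def node_matches(node_labels: Mapping[str, str], node_selector: Mapping[str, str]) -> bool:
    """Return True if the node carries every key/value pair in the selector.

    Mirrors Kubernetes nodeSelector semantics: all listed labels must match (AND).
    """
    return all(node_labels.get(key) == value for key, value in node_selector.items())


# The two-label selector from step 1 of the reproducer.
SELECTOR = {"aaa": "bbb", "ccc": "ddd"}

if __name__ == "__main__":
    # Hypothetical nodes: the first has both labels, the second is missing 'ccc'.
    print(node_matches({"aaa": "bbb", "ccc": "ddd", "kubernetes.io/os": "linux"}, SELECTOR))  # True
    print(node_matches({"aaa": "bbb"}, SELECTOR))  # False
```

Extra labels on the node do not affect the result, and an empty selector matches every node because `all()` over no items is `True`.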
Verified this bug on 4.7.0-202011202119.p0:

cat policy_bug1898580
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: intel-netdevice
  namespace: openshift-sriov-network-operator
spec:
  deviceType: netdevice
  nicSelector:
    rootDevices:
    - '0000:3b:00.0'
    vendor: '8086'
  nodeSelector:
    feature.node.kubernetes.io/sriov-capable: 'true'
    sriov: worker1
  numVfs: 5
  priority: 99
  resourceName: intelnetdevice

After creating the above policy, the pods are:

NAME                                     READY   STATUS       RESTARTS   AGE
network-resources-injector-fxfcq         1/1     Running      0          39m
network-resources-injector-l7npf         1/1     Running      0          39m
network-resources-injector-l8jtb         1/1     Running      0          39m
operator-webhook-rmdk6                   1/1     Running      0          39m
operator-webhook-scrkt                   1/1     Running      0          39m
operator-webhook-vstqd                   1/1     Running      0          39m
sriov-cni-ft2fc                          2/2     Running      0          3m31s
sriov-device-plugin-kd64q                1/1     Running      0          4m7s
sriov-network-config-daemon-6hzz7        1/1     Running      0          39m
sriov-network-config-daemon-tlnfh        1/1     Running      0          39m
sriov-network-operator-5f768b8c4f-cqgxl  1/1     Running      0          39m
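This listing was taken when sriov-cni-ft2fc and sriov-device-plugin-kd64q were 3m31s and 4m7s old, which is within the 3 to 5 minute lifetime reported when the bug was reopened. Because the pods are deleted and replaced rather than restarted in place, the RESTARTS column stays at 0, so one snapshot cannot tell a stable pod from one about to be recreated. A more reliable check is to take two `oc get pod` listings a few minutes apart and look for sriov pods whose names disappeared. The minimal Python sketch below does that comparison; the helper names `pod_names` and `recreated` are hypothetical, and the sample data is trimmed from this listing and from the one posted when the bug was reopened.

```python
from __future__ import annotations

# Pod-name prefixes whose disappearance indicates the reported recreation loop.
WATCHED = ("sriov-cni-", "sriov-device-plugin-")


def pod_names(listing: str) -> set[str]:
    """Return the pod names (first column) from `oc get pod` output, skipping the header."""
    rows = [line.split() for line in listing.strip().splitlines()[1:]]
    return {cols[0] for cols in rows if cols}


def recreated(before: str, after: str, prefixes: tuple[str, ...] = WATCHED) -> set[str]:
    """Watched pods present in `before` but missing from `after`, i.e. replaced."""
    gone = pod_names(before) - pod_names(after)
    return {name for name in gone if name.startswith(prefixes)}


# Trimmed from the verification listing above.
BEFORE = """
NAME                                     READY   STATUS       RESTARTS   AGE
network-resources-injector-fxfcq         1/1     Running      0          39m
operator-webhook-rmdk6                   1/1     Running      0          39m
sriov-cni-ft2fc                          2/2     Running      0          3m31s
sriov-device-plugin-kd64q                1/1     Running      0          4m7s
"""

# Trimmed from the listing posted when the bug was reopened.
AFTER = """
NAME                                     READY   STATUS       RESTARTS   AGE
network-resources-injector-fxfcq         1/1     Running      0          16h
operator-webhook-9f7n2                   1/1     Running      0          10h
sriov-cni-fpwpt                          2/2     Terminating  0          4m29s
sriov-device-plugin-9z2bw                0/1     Terminating  0          4m59s
"""

if __name__ == "__main__":
    # Only watched prefixes are reported; the replaced webhook pod is ignored.
    print(sorted(recreated(BEFORE, AFTER)))
```

These two sample listings are hours apart, so they only show the mechanics; to catch a 3 to 5 minute loop, the snapshots should be taken a few minutes apart and compared repeatedly.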
Reopening this bug: the sriov-cni and sriov-device-plugin pods are still recreated every 3 to 5 minutes.

oc get pod
NAME                                     READY   STATUS       RESTARTS   AGE
network-resources-injector-fxfcq         1/1     Running      0          16h
network-resources-injector-l7npf         1/1     Running      0          16h
network-resources-injector-l8jtb         1/1     Running      0          16h
operator-webhook-9f7n2                   1/1     Running      0          10h
operator-webhook-lrpkg                   1/1     Running      0          10h
operator-webhook-tfmpr                   1/1     Running      0          10h
sriov-cni-fpwpt                          2/2     Terminating  0          4m29s
sriov-device-plugin-9z2bw                0/1     Terminating  0          4m59s
sriov-network-config-daemon-mnlgn        1/1     Running      0          10h
sriov-network-config-daemon-xnn82        1/1     Running      0          10h
sriov-network-operator-6d54c974c8-qsj48  1/1     Running      0          10h

The events below show that each sriov-device-plugin container is stopped (`Stopping container sriov-device-plugin`) roughly 3 to 5 minutes after it starts (`Started container sriov-device-plugin`):

66m Normal Created pod/sriov-device-plugin-ttb67 Created container sriov-device-plugin
66m Normal Started pod/sriov-device-plugin-ttb67 Started container sriov-device-plugin
61m Normal Killing pod/sriov-device-plugin-ttb67 Stopping container sriov-device-plugin
96m Normal Scheduled pod/sriov-device-plugin-vrmt2 Successfully assigned openshift-sriov-network-operator/sriov-device-plugin-vrmt2 to dell-per740-14.rhts.eng.pek2.redhat.com
92m Warning DNSConfigForming pod/sriov-device-plugin-vrmt2 Nameserver limits were exceeded, some nameservers have been omitted, the applied nameserver line is: 10.73.2.107 10.73.2.108 10.66.127.10
96m Normal Pulled pod/sriov-device-plugin-vrmt2 Container image "registry.redhat.io/openshift4/ose-sriov-network-device-plugin@sha256:7d0fb4f1cbaa889138ad90478a8eb7ef55fb53a0a8ec08cb7c0b0eadabc19cf1" already present on machine
96m Normal Created pod/sriov-device-plugin-vrmt2 Created container sriov-device-plugin
96m Normal Started pod/sriov-device-plugin-vrmt2 Started container sriov-device-plugin
91m Normal Killing pod/sriov-device-plugin-vrmt2 Stopping container sriov-device-plugin
79m Normal Scheduled pod/sriov-device-plugin-wb66k Successfully assigned openshift-sriov-network-operator/sriov-device-plugin-wb66k to dell-per740-14.rhts.eng.pek2.redhat.com
77m Warning DNSConfigForming pod/sriov-device-plugin-wb66k Nameserver limits were exceeded, some nameservers have been omitted, the applied nameserver line is: 10.73.2.107 10.73.2.108 10.66.127.10
79m Normal Pulled pod/sriov-device-plugin-wb66k Container image "registry.redhat.io/openshift4/ose-sriov-network-device-plugin@sha256:7d0fb4f1cbaa889138ad90478a8eb7ef55fb53a0a8ec08cb7c0b0eadabc19cf1" already present on machine
79m Normal Created pod/sriov-device-plugin-wb66k Created container sriov-device-plugin
79m Normal Started pod/sriov-device-plugin-wb66k Started container sriov-device-plugin
76m Normal Killing pod/sriov-device-plugin-wb66k Stopping container sriov-device-plugin
76m Normal Scheduled pod/sriov-device-plugin-znzfm Successfully assigned openshift-sriov-network-operator/sriov-device-plugin-znzfm to dell-per740-14.rhts.eng.pek2.redhat.com
71m Warning DNSConfigForming pod/sriov-device-plugin-znzfm Nameserver limits were exceeded, some nameservers have been omitted, the applied nameserver line is: 10.73.2.107 10.73.2.108 10.66.127.10
76m Normal Pulled pod/sriov-device-plugin-znzfm Container image "registry.redhat.io/openshift4/ose-sriov-network-device-plugin@sha256:7d0fb4f1cbaa889138ad90478a8eb7ef55fb53a0a8ec08cb7c0b0eadabc19cf1" already present on machine
76m Normal Created pod/sriov-device-plugin-znzfm Created container sriov-device-plugin
76m Normal Started pod/sriov-device-plugin-znzfm Started container sriov-device-plugin
71m Normal Killing pod/sriov-device-plugin-znzfm Stopping container sriov-device-plugin
86s Normal SuccessfulDelete daemonset/sriov-device-plugin (combined from similar events): Deleted pod: sriov-device-plugin-9z2bw
25s Warning DNSConfigForming pod/sriov-network-config-daemon-mnlgn Nameserver limits were exceeded, some nameservers have been omitted, the applied nameserver line is: 10.73.2.107 10.73.2.108 10.66.127.10
4m11s Warning DNSConfigForming pod/sriov-network-config-daemon-xnn82 Nameserver limits were exceeded, some nameservers have been omitted, the applied nameserver line is: 10.73.2.107 10.73.2.108 10.66.127.10
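The first column of each event row is how long ago the event was last seen, so a container's running time is the age of its `Started` event minus the age of its `Killing` event (for example, 66m - 61m = 5 minutes for sriov-device-plugin-ttb67). The minimal Python sketch below applies that arithmetic to the Started/Killing events quoted above. The helper names are hypothetical, and because `oc` rounds event ages, the results are approximate.

```python
import re

# Multipliers for the age suffixes seen in `oc get` output (days, hours, minutes, seconds).
_UNITS = {"d": 86400, "h": 3600, "m": 60, "s": 1}


def age_to_seconds(age: str) -> int:
    """Convert an age such as '66m', '4m11s' or '16h' to seconds."""
    parts = re.findall(r"(\d+)([dhms])", age)
    if not parts or "".join(n + u for n, u in parts) != age:
        raise ValueError(f"unrecognized age: {age!r}")
    return sum(int(n) * _UNITS[u] for n, u in parts)


def container_lifetimes(events):
    """Return {object: seconds between its Started and Killing events}.

    `events` holds (age, reason, object) tuples. Ages count backwards from
    when the events were listed, so lifetime = age(Started) - age(Killing).
    """
    started, killed = {}, {}
    for age, reason, obj in events:
        if reason == "Started":
            started[obj] = age_to_seconds(age)
        elif reason == "Killing":
            killed[obj] = age_to_seconds(age)
    return {obj: started[obj] - killed[obj] for obj in started if obj in killed}


# The Started/Killing pairs from the events quoted above.
EVENTS = [
    ("66m", "Started", "pod/sriov-device-plugin-ttb67"),
    ("61m", "Killing", "pod/sriov-device-plugin-ttb67"),
    ("96m", "Started", "pod/sriov-device-plugin-vrmt2"),
    ("91m", "Killing", "pod/sriov-device-plugin-vrmt2"),
    ("79m", "Started", "pod/sriov-device-plugin-wb66k"),
    ("76m", "Killing", "pod/sriov-device-plugin-wb66k"),
    ("76m", "Started", "pod/sriov-device-plugin-znzfm"),
    ("71m", "Killing", "pod/sriov-device-plugin-znzfm"),
]

if __name__ == "__main__":
    for obj, secs in sorted(container_lifetimes(EVENTS).items()):
        print(f"{obj}: ran about {secs // 60} min")
```

For these events the lifetimes come out at 5, 5, 3 and 5 minutes, consistent with the 3 to 5 minute recreation interval described in the comment.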
@zzhao I opened https://bugzilla.redhat.com/show_bug.cgi?id=1901909 because the scenario you are seeing is different: the pods are recreated every 5 minutes regardless of which configuration is applied.
Thanks, Federico. Bug 1901909 is fixed, so I am moving this bug to VERIFIED.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2020:5633