Bug 1898580 - When adding more than one node selector to the sriovnetworknodepolicy, the cni and the device plugin pods are constantly rebooted
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.4
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: medium
Target Milestone: ---
Target Release: 4.7.0
Assignee: Federico Paolinelli
QA Contact: zhaozhanqi
URL:
Whiteboard:
Depends On: 1901909
Blocks: 1898595 1898597
Reported: 2020-11-17 14:52 UTC by Federico Paolinelli
Modified: 2024-03-25 17:06 UTC

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
: 1898594 1898595 1898597 (view as bug list)
Environment:
Last Closed: 2021-02-24 15:34:16 UTC
Target Upstream Version:
Embargoed:



Links
System ID Private Priority Status Summary Last Updated
Github openshift sriov-network-operator pull 411 0 None closed Bug 1898580: fix the sriov daemon selector assignement. 2021-01-21 09:25:03 UTC
Red Hat Product Errata RHSA-2020:5633 0 None None None 2021-02-24 15:34:53 UTC

Description Federico Paolinelli 2020-11-17 14:52:27 UTC
Description of problem:

When a node policy is created with more than one label in its node selector, the cni / device plugin pods are restarted every 30 seconds.


Version-Release number of selected component (if applicable):
4.4 and later

How reproducible:
Always

Steps to Reproduce:
1. Create a policy with a node selector as follows
nodeSelector:
    aaa: "bbb"
    ccc: "ddd"

2. Observe the cni pod

Actual results:

The pod keeps restarting

Expected results:

The pod is restarted only once
Additional info:
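
The report does not state the root cause. One mechanism that would match the symptom (a single selector entry is fine, two or more cause churn) is that the selector entries are rendered into a list in an unstable order, so an order-sensitive comparison sees a "changed" spec on some reconcile rounds. The sketch below only illustrates that idea under this assumption; `selector_terms` and `count_updates` are hypothetical names, and the shuffle is a stand-in for a map with unordered iteration, not the operator's actual code.

```python
import random


def selector_terms(node_selector, rng=None):
    """Render a nodeSelector mapping as a list of match expressions.

    With rng set, the keys are shuffled to imitate a map whose iteration
    order is not stable (an assumption). Without it, the keys stay sorted.
    """
    keys = sorted(node_selector)
    if rng is not None:
        rng.shuffle(keys)
    return [{"key": k, "operator": "In", "values": [node_selector[k]]} for k in keys]


def count_updates(node_selector, stable, rounds=200, seed=0):
    """Count reconcile rounds whose rendered terms differ from the previous ones."""
    rng = None if stable else random.Random(seed)
    current = selector_terms(node_selector, rng)
    updates = 0
    for _ in range(rounds):
        desired = selector_terms(node_selector, rng)
        if desired != current:  # list comparison is order-sensitive
            updates += 1
            current = desired
    return updates


if __name__ == "__main__":
    selector = {"aaa": "bbb", "ccc": "ddd"}  # the selector from the steps above
    print("unstable order:", count_updates(selector, stable=False))
    print("sorted order:  ", count_updates(selector, stable=True))
```

With a single key there is only one possible order, so no spurious update is counted, which is in line with the problem appearing only with more than one node selector entry.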

Comment 4 zhaozhanqi 2020-11-23 11:14:59 UTC
Verified this bug on 4.7.0-202011202119.p0

cat policy_bug1898580 
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: intel-netdevice
  namespace: openshift-sriov-network-operator
spec:
  deviceType: netdevice
  nicSelector:
    rootDevices:
      - '0000:3b:00.0'
    vendor: '8086'
  nodeSelector:
    feature.node.kubernetes.io/sriov-capable: 'true'
    sriov: worker1
  numVfs: 5
  priority: 99
  resourceName: intelnetdevice


After creating the above policy, checking the pods shows:

NAME                                      READY   STATUS    RESTARTS   AGE
network-resources-injector-fxfcq          1/1     Running   0          39m
network-resources-injector-l7npf          1/1     Running   0          39m
network-resources-injector-l8jtb          1/1     Running   0          39m
operator-webhook-rmdk6                    1/1     Running   0          39m
operator-webhook-scrkt                    1/1     Running   0          39m
operator-webhook-vstqd                    1/1     Running   0          39m
sriov-cni-ft2fc                           2/2     Running   0          3m31s
sriov-device-plugin-kd64q                 1/1     Running   0          4m7s
sriov-network-config-daemon-6hzz7         1/1     Running   0          39m
sriov-network-config-daemon-tlnfh         1/1     Running   0          39m
sriov-network-operator-5f768b8c4f-cqgxl   1/1     Running   0          39m
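
RESTARTS stays at 0 when a pod is deleted and recreated, so the AGE column is the more useful signal in a listing like the one above. A minimal sketch of that check, assuming the `oc get pod` text has been captured into a string; the helper names and the 10-minute threshold are illustrative choices, not part of any tool:

```python
import re

UNIT_SECONDS = {"d": 86400, "h": 3600, "m": 60, "s": 1}


def parse_age(age):
    """Convert an age such as '3m31s', '39m' or '16h' to seconds."""
    return sum(int(n) * UNIT_SECONDS[u] for n, u in re.findall(r"(\d+)([dhms])", age))


def young_pods(table, prefixes=("sriov-cni", "sriov-device-plugin"), max_age=600):
    """Return {pod name: age in seconds} for matching pods younger than max_age."""
    result = {}
    for line in table.strip().splitlines()[1:]:  # skip the header row
        fields = line.split()
        if not fields or not fields[0].startswith(prefixes):
            continue
        age = parse_age(fields[-1])
        if age < max_age:
            result[fields[0]] = age
    return result


SAMPLE = """
NAME                                      READY   STATUS    RESTARTS   AGE
operator-webhook-vstqd                    1/1     Running   0          39m
sriov-cni-ft2fc                           2/2     Running   0          3m31s
sriov-device-plugin-kd64q                 1/1     Running   0          4m7s
sriov-network-config-daemon-6hzz7         1/1     Running   0          39m
"""

if __name__ == "__main__":
    for name, age in young_pods(SAMPLE).items():
        print(f"{name}: {age}s old")
```

A young pod is not proof of a restart loop on its own. Running the check a few times and seeing the pod names change would be stronger evidence.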

Comment 5 zhaozhanqi 2020-11-24 03:17:51 UTC
Reopening this bug: the sriov cni and device plugin pods are still being recreated every 3-5 minutes.

oc get pod
NAME                                      READY   STATUS        RESTARTS   AGE
network-resources-injector-fxfcq          1/1     Running       0          16h
network-resources-injector-l7npf          1/1     Running       0          16h
network-resources-injector-l8jtb          1/1     Running       0          16h
operator-webhook-9f7n2                    1/1     Running       0          10h
operator-webhook-lrpkg                    1/1     Running       0          10h
operator-webhook-tfmpr                    1/1     Running       0          10h
sriov-cni-fpwpt                           2/2     Terminating   0          4m29s
sriov-device-plugin-9z2bw                 0/1     Terminating   0          4m59s
sriov-network-config-daemon-mnlgn         1/1     Running       0          10h
sriov-network-config-daemon-xnn82         1/1     Running       0          10h
sriov-network-operator-6d54c974c8-qsj48   1/1     Running       0          10h

The events are shown below. About 5 minutes after each `Started container sriov-device-plugin` there is a `Stopping container sriov-device-plugin`.

66m         Normal    Created            pod/sriov-device-plugin-ttb67           Created container sriov-device-plugin
66m         Normal    Started            pod/sriov-device-plugin-ttb67           Started container sriov-device-plugin
61m         Normal    Killing            pod/sriov-device-plugin-ttb67           Stopping container sriov-device-plugin
96m         Normal    Scheduled          pod/sriov-device-plugin-vrmt2           Successfully assigned openshift-sriov-network-operator/sriov-device-plugin-vrmt2 to dell-per740-14.rhts.eng.pek2.redhat.com
92m         Warning   DNSConfigForming   pod/sriov-device-plugin-vrmt2           Nameserver limits were exceeded, some nameservers have been omitted, the applied nameserver line is: 10.73.2.107 10.73.2.108 10.66.127.10
96m         Normal    Pulled             pod/sriov-device-plugin-vrmt2           Container image "registry.redhat.io/openshift4/ose-sriov-network-device-plugin@sha256:7d0fb4f1cbaa889138ad90478a8eb7ef55fb53a0a8ec08cb7c0b0eadabc19cf1" already present on machine
96m         Normal    Created            pod/sriov-device-plugin-vrmt2           Created container sriov-device-plugin
96m         Normal    Started            pod/sriov-device-plugin-vrmt2           Started container sriov-device-plugin
91m         Normal    Killing            pod/sriov-device-plugin-vrmt2           Stopping container sriov-device-plugin
79m         Normal    Scheduled          pod/sriov-device-plugin-wb66k           Successfully assigned openshift-sriov-network-operator/sriov-device-plugin-wb66k to dell-per740-14.rhts.eng.pek2.redhat.com
77m         Warning   DNSConfigForming   pod/sriov-device-plugin-wb66k           Nameserver limits were exceeded, some nameservers have been omitted, the applied nameserver line is: 10.73.2.107 10.73.2.108 10.66.127.10
79m         Normal    Pulled             pod/sriov-device-plugin-wb66k           Container image "registry.redhat.io/openshift4/ose-sriov-network-device-plugin@sha256:7d0fb4f1cbaa889138ad90478a8eb7ef55fb53a0a8ec08cb7c0b0eadabc19cf1" already present on machine
79m         Normal    Created            pod/sriov-device-plugin-wb66k           Created container sriov-device-plugin
79m         Normal    Started            pod/sriov-device-plugin-wb66k           Started container sriov-device-plugin
76m         Normal    Killing            pod/sriov-device-plugin-wb66k           Stopping container sriov-device-plugin
76m         Normal    Scheduled          pod/sriov-device-plugin-znzfm           Successfully assigned openshift-sriov-network-operator/sriov-device-plugin-znzfm to dell-per740-14.rhts.eng.pek2.redhat.com
71m         Warning   DNSConfigForming   pod/sriov-device-plugin-znzfm           Nameserver limits were exceeded, some nameservers have been omitted, the applied nameserver line is: 10.73.2.107 10.73.2.108 10.66.127.10
76m         Normal    Pulled             pod/sriov-device-plugin-znzfm           Container image "registry.redhat.io/openshift4/ose-sriov-network-device-plugin@sha256:7d0fb4f1cbaa889138ad90478a8eb7ef55fb53a0a8ec08cb7c0b0eadabc19cf1" already present on machine
76m         Normal    Created            pod/sriov-device-plugin-znzfm           Created container sriov-device-plugin
76m         Normal    Started            pod/sriov-device-plugin-znzfm           Started container sriov-device-plugin
71m         Normal    Killing            pod/sriov-device-plugin-znzfm           Stopping container sriov-device-plugin
86s         Normal    SuccessfulDelete   daemonset/sriov-device-plugin           (combined from similar events): Deleted pod: sriov-device-plugin-9z2bw
25s         Warning   DNSConfigForming   pod/sriov-network-config-daemon-mnlgn   Nameserver limits were exceeded, some nameservers have been omitted, the applied nameserver line is: 10.73.2.107 10.73.2.108 10.66.127.10
4m11s       Warning   DNSConfigForming   pod/sriov-network-config-daemon-xnn82   Nameserver limits were exceeded, some nameservers have been omitted, the applied nameserver line is: 10.73.2.107 10.73.2.108 10.66.127.10
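
The 5-minute interval can be read off the events above by pairing each pod's `Started` and `Killing` entries. A rough sketch, assuming the event text is available as a string with the event age in the first column; `lifetimes` is a hypothetical helper, and because the ages are rounded to minutes the result is only approximate:

```python
import re

UNIT_SECONDS = {"d": 86400, "h": 3600, "m": 60, "s": 1}


def parse_age(age):
    """Convert an age such as '66m' or '4m11s' to seconds."""
    return sum(int(n) * UNIT_SECONDS[u] for n, u in re.findall(r"(\d+)([dhms])", age))


def lifetimes(events):
    """Return {object: seconds between its Started and Killing events}."""
    started, killed = {}, {}
    for line in events.strip().splitlines():
        fields = line.split(None, 4)  # age, type, reason, object, message
        if len(fields) < 4:
            continue
        age, _type, reason, obj = fields[:4]
        if reason == "Started":
            started[obj] = parse_age(age)
        elif reason == "Killing":
            killed[obj] = parse_age(age)
    # Ages count backwards from now, so the older (larger) value is the start.
    return {obj: started[obj] - killed[obj] for obj in started if obj in killed}


SAMPLE = """
66m   Normal   Started   pod/sriov-device-plugin-ttb67   Started container sriov-device-plugin
61m   Normal   Killing   pod/sriov-device-plugin-ttb67   Stopping container sriov-device-plugin
96m   Normal   Started   pod/sriov-device-plugin-vrmt2   Started container sriov-device-plugin
91m   Normal   Killing   pod/sriov-device-plugin-vrmt2   Stopping container sriov-device-plugin
"""

if __name__ == "__main__":
    for obj, seconds in lifetimes(SAMPLE).items():
        print(f"{obj}: ran for about {seconds // 60} min")
```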

Comment 6 Federico Paolinelli 2020-11-26 12:17:57 UTC
@zzhao I opened https://bugzilla.redhat.com/show_bug.cgi?id=1901909 because the scenario you are seeing is a different one.
The restart every 5 minutes happens regardless of which configuration is applied.

Comment 7 zhaozhanqi 2020-12-01 01:12:25 UTC
Thanks Federico.

Bug 1901909 is fixed, so I am moving this bug to verified.

Comment 10 errata-xmlrpc 2021-02-24 15:34:16 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5633

