Bug 1857581 - [4.5][sriov] sriov-device-plugin pods not scheduled to node with taints
Summary: [4.5][sriov] sriov-device-plugin pods not scheduled to node with taints
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.5
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: urgent
Target Milestone: ---
Target Release: 4.5.z
Assignee: zenghui.shi
QA Contact: zhaozhanqi
URL:
Whiteboard:
Duplicates: 1857509
Depends On: 1857507
Blocks: 1857510 1858668
Reported: 2020-07-16 08:03 UTC by OpenShift BugZilla Robot
Modified: 2020-08-07 15:16 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-07-22 12:21:26 UTC
Target Upstream Version:
Embargoed:




Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift sriov-network-operator pull 302 | 0 | None | closed | [release-4.5] bug 1857581: Tolerates all taints in device plugin DaemonSet | 2020-09-28 07:51:43 UTC
Red Hat Product Errata | RHBA-2020:2956 | 0 | None | None | None | 2020-07-22 12:21:30 UTC

Description OpenShift BugZilla Robot 2020-07-16 08:03:42 UTC
+++ This bug was initially created as a clone of Bug #1857507 +++

Description of problem:

When a node enabled for SR-IOV is tainted, the sriov-device-plugin pods cannot be scheduled onto the node, which in turn prevents pods that require SR-IOV from being scheduled there.



Version-Release number of selected component (if applicable):


How reproducible:
always

Steps to Reproduce:
1. Taint a node that has SR-IOV enabled, e.g. oc adm taint node worker-21 worker=load-balancer:NoSchedule
2. Reboot the node
3. sriov-device-plugin does not get scheduled onto the node

Actual results:


Expected results:
sriov-device-plugin gets scheduled onto the node

Additional info:

sriov-device-plugin tolerations:

tolerations:
  - effect: NoSchedule
    key: node-role.kubernetes.io/master
    operator: Exists
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
  - effect: NoSchedule
    key: node.kubernetes.io/disk-pressure
    operator: Exists
  - effect: NoSchedule
    key: node.kubernetes.io/memory-pressure
    operator: Exists
  - effect: NoSchedule
    key: node.kubernetes.io/pid-pressure
    operator: Exists
  - effect: NoSchedule
    key: node.kubernetes.io/unschedulable
    operator: Exists
  - effect: NoSchedule
    key: node.kubernetes.io/network-unavailable
    operator: Exists

None of these tolerations matches a custom taint such as worker=load-balancer:NoSchedule, so the pod cannot be scheduled.  The other sriov pods have an additional toleration:

tolerations:
  - operator: Exists

A toleration with operator Exists and no key matches every taint, which allows them to be scheduled.
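
To illustrate why the original list fails while the single catch-all toleration works, here is a minimal Python sketch of the toleration matching rules as described in the Kubernetes documentation. It is not the scheduler's actual code; the names Taint, Toleration, parse_taint, tolerates and schedulable are hypothetical helpers introduced only for this example.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Taint:
    key: str
    value: str
    effect: str


@dataclass(frozen=True)
class Toleration:
    key: str = ""            # empty key + Exists matches any taint key
    operator: str = "Equal"  # Kubernetes defaults the operator to Equal
    value: str = ""
    effect: str = ""         # empty effect matches any effect


def parse_taint(spec: str) -> Taint:
    """Parse 'key=value:effect' or 'key:effect' (the form used with oc adm taint)."""
    key_value, _, effect = spec.rpartition(":")
    key, _, value = key_value.partition("=")
    return Taint(key=key, value=value, effect=effect)


def tolerates(tol: Toleration, taint: Taint) -> bool:
    """Return True if this one toleration matches this one taint."""
    if tol.effect and tol.effect != taint.effect:
        return False
    if tol.key and tol.key != taint.key:
        return False
    if tol.operator == "Exists":
        return True
    return tol.value == taint.value


def schedulable(tolerations: List[Toleration], taints: List[Taint]) -> bool:
    """Every NoSchedule/NoExecute taint must be matched by some toleration."""
    hard = [t for t in taints if t.effect in ("NoSchedule", "NoExecute")]
    return all(any(tolerates(tol, t) for tol in tolerations) for t in hard)


# Tolerations copied from the device plugin pod in this report.
ORIGINAL = [
    Toleration("node-role.kubernetes.io/master", "Exists", effect="NoSchedule"),
    Toleration("node.kubernetes.io/not-ready", "Exists", effect="NoExecute"),
    Toleration("node.kubernetes.io/unreachable", "Exists", effect="NoExecute"),
    Toleration("node.kubernetes.io/disk-pressure", "Exists", effect="NoSchedule"),
    Toleration("node.kubernetes.io/memory-pressure", "Exists", effect="NoSchedule"),
    Toleration("node.kubernetes.io/pid-pressure", "Exists", effect="NoSchedule"),
    Toleration("node.kubernetes.io/unschedulable", "Exists", effect="NoSchedule"),
    Toleration("node.kubernetes.io/network-unavailable", "Exists", effect="NoSchedule"),
]

# The catch-all toleration used by the other sriov pods.
FIXED = [Toleration(operator="Exists")]

node_taints = [parse_taint("worker=load-balancer:NoSchedule")]
print(schedulable(ORIGINAL, node_taints))  # False
print(schedulable(FIXED, node_taints))     # True
```

Every entry in the original list names a specific key, so a taint with the key worker is never matched; the keyless Exists toleration skips both the key and the effect comparison and therefore matches any taint.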

Comment 2 zenghui.shi 2020-07-17 02:31:18 UTC
*** Bug 1857509 has been marked as a duplicate of this bug. ***

Comment 5 zhaozhanqi 2020-07-20 01:39:46 UTC
Verified this bug on 4.5.0-202007172106.p0

oc rsh sriov-network-operator-54df58fd7b-hdv4g
sh-4.2# cat bindata/manifests/plugins/sriov-device-plugin.yaml | grep toler -A 3
      tolerations:
      - operator: Exists
      serviceAccountName: sriov-device-plugin
      containers:

Comment 7 errata-xmlrpc 2020-07-22 12:21:26 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2956

