Bug 1880282 - performance-addon-operator should be part of control plane, and run on master nodes
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Performance Addon Operator
Version: 4.6
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: high
Target Milestone: ---
Target Release: 4.6.0
Assignee: Marcel Apfelbaum
QA Contact: Gowrishankar Rajaiyan
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2020-09-18 07:55 UTC by Francesco Romani
Modified: 2021-11-26 14:26 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-10-27 16:42:20 UTC
Target Upstream Version:
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift-kni performance-addon-operators pull 373 | 0 | None | closed | Pao run master nodes | 2020-12-02 18:49:05 UTC
Github | openshift-kni performance-addon-operators pull 374 | 0 | None | closed | [release-4.6] Pao run master nodes | 2020-12-02 18:49:05 UTC
Red Hat Product Errata | RHBA-2020:4196 | 0 | None | None | None | 2020-10-27 16:42:45 UTC

Description Francesco Romani 2020-09-18 07:55:09 UTC
Description of problem:
The performance-addon-operator should be considered part of the control plane and should run on the master nodes. Currently this is not enforced, and we have observed the operator running on worker nodes.


Version-Release number of selected component (if applicable):
<= 4.6


How reproducible:
100%


Steps to Reproduce:
1. Install any released version of performance-addon-operator.

Actual results:
The operator runs on worker nodes

Expected results:
The operator should run on master nodes.

Comment 1 Robert Love 2020-09-18 14:58:29 UTC
If we cannot guarantee the cores that we promise for low latency workloads then it's a problem we should fix for the release.

Comment 2 Marcel Apfelbaum 2020-09-21 06:54:48 UTC
PAO will use node affinity (https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/#node-affinity) to ensure it is scheduled on master nodes.

Comment 5 Gowrishankar Rajaiyan 2020-10-01 05:02:25 UTC
The test cluster setup has 6 nodes (3 masters & 3 workers)

[root@dell-r730-028 ~]# oc get node
NAME       STATUS   ROLES               AGE   VERSION
master-0   Ready    master              13h   v1.19.0+b4ffb45
master-1   Ready    master              13h   v1.19.0+b4ffb45
master-2   Ready    master              13h   v1.19.0+b4ffb45
worker-0   Ready    worker,worker-cnf   13h   v1.19.0+b4ffb45
worker-1   Ready    worker,worker-cnf   13h   v1.19.0+b4ffb45
worker-2   Ready    worker              13h   v1.19.0+b4ffb45
[root@dell-r730-028 ~]#


When the performance-operator is installed, it respects the nodeAffinity defined in its operator spec:

<snip>
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: node-role.kubernetes.io/master
            operator: Exists
</snip>

Based on this nodeAffinity, the kube-scheduler evaluates all 6 nodes and finds only the 3 masters feasible for the pod. The openshift-kube-scheduler log below shows this:

[root@dell-r730-028 ~]# oc logs openshift-kube-scheduler-master-1 -n openshift-kube-scheduler | grep performance-operator
I0930 17:52:21.386853       1 scheduler.go:597] "Successfully bound pod to node" pod="openshift-performance-addon/performance-operator-d964d967f-rbw24" node="master-2" evaluatedNodes=6 feasibleNodes=3
[root@dell-r730-028 ~]#
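
To make the evaluatedNodes=6 / feasibleNodes=3 figures concrete, here is a minimal Python sketch of how a required node-affinity term selects nodes. It is an illustration only, not the kube-scheduler's code. The node labels are an assumption inferred from the ROLES column of `oc get node` (which is derived from `node-role.kubernetes.io/<role>` labels), and the helper names `matches_expression` and `feasible_nodes` are hypothetical. Only the `Exists`, `DoesNotExist`, `In` and `NotIn` operators are modelled.

```python
from typing import Dict, List

# Assumed labels, reconstructed from the ROLES column of `oc get node`.
NODES: Dict[str, Dict[str, str]] = {
    "master-0": {"node-role.kubernetes.io/master": ""},
    "master-1": {"node-role.kubernetes.io/master": ""},
    "master-2": {"node-role.kubernetes.io/master": ""},
    "worker-0": {"node-role.kubernetes.io/worker": "",
                 "node-role.kubernetes.io/worker-cnf": ""},
    "worker-1": {"node-role.kubernetes.io/worker": "",
                 "node-role.kubernetes.io/worker-cnf": ""},
    "worker-2": {"node-role.kubernetes.io/worker": ""},
}

# The nodeSelectorTerms from the operator spec quoted above.
TERMS = [
    {"matchExpressions": [
        {"key": "node-role.kubernetes.io/master", "operator": "Exists"},
    ]},
]


def matches_expression(labels: Dict[str, str], expr: dict) -> bool:
    """Evaluate one matchExpressions entry against a node's labels."""
    key, op = expr["key"], expr["operator"]
    values = expr.get("values", [])
    if op == "Exists":
        return key in labels
    if op == "DoesNotExist":
        return key not in labels
    if op == "In":
        return key in labels and labels[key] in values
    if op == "NotIn":
        return key not in labels or labels[key] not in values
    raise ValueError(f"operator not modelled in this sketch: {op}")


def feasible_nodes(nodes: Dict[str, Dict[str, str]], terms: List[dict]) -> List[str]:
    """Terms are ORed; the expressions inside one term are ANDed."""
    return sorted(
        name for name, labels in nodes.items()
        if any(
            all(matches_expression(labels, e) for e in term["matchExpressions"])
            for term in terms
        )
    )


if __name__ == "__main__":
    selected = feasible_nodes(NODES, TERMS)
    print(f"evaluatedNodes={len(NODES)} feasibleNodes={len(selected)}: {selected}")
```

Under these assumed labels the sketch keeps only the three masters, which is consistent with the counts in the log line above.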


[root@dell-r730-028 ~]# oc get pod -o wide
NAME                                   READY   STATUS    RESTARTS   AGE   IP           NODE       NOMINATED NODE   READINESS GATES
performance-operator-d964d967f-rbw24   1/1     Running   0          10h   10.129.0.7   master-2   <none>           <none>
[root@dell-r730-028 ~]#


The above verifies that the Performance Addon Operator is deployed on a master node, as expected.
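
The manual check above could be repeated automatically. The following is a rough Python sketch, not part of the original verification: it parses the plain-text output of `oc get node` and `oc get pod -o wide` as pasted in this comment and reports any pod whose NODE is not a master. The helper names `parse_table` and `pods_off_master` are hypothetical, and the parser assumes columns are separated by at least two spaces and that no cell is empty.

```python
import re
from typing import Dict, List

# Output copied from the comment above.
NODE_TABLE = """\
NAME       STATUS   ROLES               AGE   VERSION
master-0   Ready    master              13h   v1.19.0+b4ffb45
master-1   Ready    master              13h   v1.19.0+b4ffb45
master-2   Ready    master              13h   v1.19.0+b4ffb45
worker-0   Ready    worker,worker-cnf   13h   v1.19.0+b4ffb45
worker-1   Ready    worker,worker-cnf   13h   v1.19.0+b4ffb45
worker-2   Ready    worker              13h   v1.19.0+b4ffb45
"""

POD_TABLE = """\
NAME                                   READY   STATUS    RESTARTS   AGE   IP           NODE       NOMINATED NODE   READINESS GATES
performance-operator-d964d967f-rbw24   1/1     Running   0          10h   10.129.0.7   master-2   <none>           <none>
"""


def parse_table(text: str) -> List[Dict[str, str]]:
    """Split an `oc get` style table into one dict per row, keyed by header."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    header = re.split(r"\s{2,}", lines[0])
    return [dict(zip(header, re.split(r"\s{2,}", line))) for line in lines[1:]]


def pods_off_master(node_table: str, pod_table: str) -> List[str]:
    """Return the names of pods that are not running on a master node."""
    masters = {
        row["NAME"]
        for row in parse_table(node_table)
        if "master" in row["ROLES"].split(",")
    }
    return [row["NAME"] for row in parse_table(pod_table) if row["NODE"] not in masters]


if __name__ == "__main__":
    offenders = pods_off_master(NODE_TABLE, POD_TABLE)
    print("pods not on a master node:", offenders)
```

An empty result means every listed pod runs on a master. In practice the two tables would come from running the `oc` commands rather than from pasted text.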

Comment 8 errata-xmlrpc 2020-10-27 16:42:20 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4196

