Description of problem:
The performance-addon-operator should be considered part of the control plane, and should run on the master nodes. Currently, this is not enforced, and we observed the operator running on worker nodes.

Version-Release number of selected component (if applicable):
<= 4.6

How reproducible:
100%

Steps to Reproduce:
1. Install performance-addon-operator (any released version).

Actual results:
The operator runs on worker nodes.

Expected results:
The operator should run on master nodes.
If we cannot guarantee the cores that we promise for low latency workloads, then it's a problem we should fix for the release.
PAO will use node affinity (https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/#node-affinity) to ensure it is scheduled onto master nodes.
PR: https://github.com/openshift-kni/performance-addon-operators/pull/351
The test cluster has 6 nodes (3 masters and 3 workers):

[root@dell-r730-028 ~]# oc get node
NAME       STATUS   ROLES               AGE   VERSION
master-0   Ready    master              13h   v1.19.0+b4ffb45
master-1   Ready    master              13h   v1.19.0+b4ffb45
master-2   Ready    master              13h   v1.19.0+b4ffb45
worker-0   Ready    worker,worker-cnf   13h   v1.19.0+b4ffb45
worker-1   Ready    worker,worker-cnf   13h   v1.19.0+b4ffb45
worker-2   Ready    worker              13h   v1.19.0+b4ffb45
[root@dell-r730-028 ~]#

When the performance-operator is installed, it respects the nodeAffinity defined in its operator spec:

<snip>
nodeAffinity:
  requiredDuringSchedulingIgnoredDuringExecution:
    nodeSelectorTerms:
    - matchExpressions:
      - key: node-role.kubernetes.io/master
        operator: Exists
</snip>

Based on this nodeAffinity, the kube-scheduler evaluates all 6 nodes and finds only the 3 masters feasible for the pod. This is visible in the openshift-kube-scheduler logs:

[root@dell-r730-028 ~]# oc logs openshift-kube-scheduler-master-1 -n openshift-kube-scheduler | grep performance-operator
I0930 17:52:21.386853 1 scheduler.go:597] "Successfully bound pod to node" pod="openshift-performance-addon/performance-operator-d964d967f-rbw24" node="master-2" evaluatedNodes=6 feasibleNodes=3
[root@dell-r730-028 ~]#

[root@dell-r730-028 ~]# oc get pod -o wide
NAME                                   READY   STATUS    RESTARTS   AGE   IP           NODE       NOMINATED NODE   READINESS GATES
performance-operator-d964d967f-rbw24   1/1     Running   0          10h   10.129.0.7   master-2   <none>           <none>
[root@dell-r730-028 ~]#

The above verifies that the Performance Addon Operator is deployed on a master node, as expected.
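To make the "evaluatedNodes=6 feasibleNodes=3" log line concrete, here is a minimal Python sketch of the required nodeAffinity filter only. It is not kube-scheduler code (the real scheduler is written in Go) and it ignores taints, tolerations, and resource fit. The Node class and the helper functions are hypothetical, and the node labels are assumed from the ROLES column of `oc get node`, which is derived from node-role.kubernetes.io/<role> labels.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    # Hypothetical stand-in for a Kubernetes Node: a name plus its labels.
    name: str
    labels: dict[str, str] = field(default_factory=dict)


def expression_matches(labels: dict[str, str], expr: dict) -> bool:
    """Evaluate one matchExpressions entry against a node's labels."""
    key, op = expr["key"], expr["operator"]
    values = expr.get("values", [])
    if op == "Exists":
        return key in labels
    if op == "DoesNotExist":
        return key not in labels
    if op == "In":
        return key in labels and labels[key] in values
    if op == "NotIn":
        return key not in labels or labels[key] not in values
    raise ValueError(f"operator not modeled in this sketch: {op}")


def term_matches(labels: dict[str, str], term: dict) -> bool:
    exprs = term.get("matchExpressions", [])
    # An empty term matches no nodes, so guard before all(); the
    # expressions inside one term are ANDed.
    return bool(exprs) and all(expression_matches(labels, e) for e in exprs)


def feasible_nodes(nodes: list[Node], terms: list[dict]) -> list[str]:
    # nodeSelectorTerms are ORed: a node is feasible if any term matches.
    return [n.name for n in nodes if any(term_matches(n.labels, t) for t in terms)]


# Mirrors the nodeAffinity from the operator spec shown above.
REQUIRED_TERMS = [
    {"matchExpressions": [
        {"key": "node-role.kubernetes.io/master", "operator": "Exists"},
    ]},
]

# Assumed labels, inferred from the ROLES column of `oc get node`.
MASTER = {"node-role.kubernetes.io/master": ""}
WORKER = {"node-role.kubernetes.io/worker": ""}
CNF = {**WORKER, "node-role.kubernetes.io/worker-cnf": ""}
NODES = [Node(f"master-{i}", dict(MASTER)) for i in range(3)] + [
    Node("worker-0", dict(CNF)),
    Node("worker-1", dict(CNF)),
    Node("worker-2", dict(WORKER)),
]

if __name__ == "__main__":
    feasible = feasible_nodes(NODES, REQUIRED_TERMS)
    print(f"evaluatedNodes={len(NODES)} feasibleNodes={len(feasible)}")
    print(feasible)
```

Running it prints "evaluatedNodes=6 feasibleNodes=3" followed by ['master-0', 'master-1', 'master-2'], matching the scheduler log above. The scheduler then picks one of the feasible nodes (master-2 in this run) using scoring, which the sketch does not model.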
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:4196