Description of problem:
Scheduler anti-affinity should be set for the prometheus/alertmanager pods so that the containers are evenly distributed across the available OpenShift infra nodes. If it is not set, monitoring pods can be scheduled on the same nodes, diminishing HA and complicating capacity analysis for the infra nodes.

In the following case, both the alertmanager and the prometheus instances are bunched up on the same node (ip-172-31-48-214.ec2.internal):

>>>
[root@starter-us-east-1-master-25064 ~]# oc get pods -o=wide
NAME                                   READY     STATUS    RESTARTS   AGE       IP             NODE                            NOMINATED NODE
alertmanager-main-0                    3/3       Running   0          1d        10.129.6.226   ip-172-31-48-214.ec2.internal   <none>
alertmanager-main-1                    3/3       Running   0          1d        10.129.9.124   ip-172-31-51-95.ec2.internal    <none>
alertmanager-main-2                    3/3       Running   0          1d        10.129.6.229   ip-172-31-48-214.ec2.internal   <none>
...
prometheus-k8s-0                       4/4       Running   0          1d        10.129.6.230   ip-172-31-48-214.ec2.internal   <none>
prometheus-k8s-1                       4/4       Running   0          1d        10.129.6.232   ip-172-31-48-214.ec2.internal   <none>
prometheus-operator-579779cd5c-kn2nr   1/1       Running   0          1d        10.129.6.215   ip-172-31-48-214.ec2.internal   <none>
<<<

Version-Release number of selected component (if applicable):
3.11.16

Expected results:
Pods should be distributed evenly across infra nodes.

Additional info:
https://github.com/openshift/openshift-ansible/blob/2ae9225b63d4ac9fcc7959e97d9932c99a20308c/roles/openshift_console/files/console-template.yaml#L69
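For illustration, a soft (preferred) pod anti-affinity stanza of the kind requested here could look like the sketch below. The label key/value and namespace are assumptions chosen to match the alertmanager-main pods shown above, not the exact rule that was eventually shipped:

    # Sketch only: ask the scheduler to avoid placing two alertmanager replicas
    # on the same node, without making scheduling fail if it cannot comply.
    spec:
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchExpressions:
                - key: alertmanager        # assumed pod label (alertmanager=main)
                  operator: In
                  values:
                  - main
              namespaces:
              - openshift-monitoring
              topologyKey: kubernetes.io/hostname   # spread across distinct nodes

Because this is "preferred" rather than "required", the replicas still schedule when there are fewer infra nodes than replicas, which is usually the safer default for monitoring components.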
Thanks for the hint, Justin. We have fixed this upstream [1] and will propagate it to the cluster-monitoring-operator next.

[1] https://github.com/coreos/prometheus-operator/pull/1935
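For context, the Prometheus custom resource that the operator reconciles exposes the standard Kubernetes affinity field, so a soft anti-affinity preference can be carried on the resource and rendered into the generated StatefulSet. The manifest below is only an illustrative sketch under that assumption; the label selector, weight, and other values are examples, not the settings shipped by the cluster-monitoring-operator:

    apiVersion: monitoring.coreos.com/v1
    kind: Prometheus
    metadata:
      name: k8s
      namespace: openshift-monitoring
    spec:
      # Illustrative only: prefer spreading the prometheus-k8s replicas across nodes.
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchLabels:
                  prometheus: k8s
              namespaces:
              - openshift-monitoring
              topologyKey: kubernetes.io/hostname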
There is one little problem: the prometheus-operator and prometheus-k8s pods are created on nodes selected by the same nodeSelector. In my testing, the nodeSelector is:

******************************************
nodeSelector:
  role: node
******************************************

There are 3 nodes labelled with role=node: ip-172-18-10-235.ec2.internal, ip-172-18-12-50.ec2.internal and ip-172-18-15-213.ec2.internal. But prometheus-operator-7566fcccc8-t7wc5 and prometheus-k8s-0 are created on the same node (ip-172-18-10-235.ec2.internal), and no prometheus or prometheus-operator pod is created on node ip-172-18-12-50.ec2.internal.

# oc get node --show-labels | grep role=node
ip-172-18-10-235.ec2.internal   Ready     compute   7h        v1.11.0+d4cacc0   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=m3.large,beta.kubernetes.io/os=linux,failure-domain.beta.kubernetes.io/region=us-east-1,failure-domain.beta.kubernetes.io/zone=us-east-1d,kubernetes.io/hostname=ip-172-18-10-235.ec2.internal,node-role.kubernetes.io/compute=true,role=node
ip-172-18-12-50.ec2.internal    Ready     <none>    7h        v1.11.0+d4cacc0   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=m3.large,beta.kubernetes.io/os=linux,failure-domain.beta.kubernetes.io/region=us-east-1,failure-domain.beta.kubernetes.io/zone=us-east-1d,kubernetes.io/hostname=ip-172-18-12-50.ec2.internal,registry=enabled,role=node,router=enabled
ip-172-18-15-213.ec2.internal   Ready     compute   7h        v1.11.0+d4cacc0   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=m3.large,beta.kubernetes.io/os=linux,failure-domain.beta.kubernetes.io/region=us-east-1,failure-domain.beta.kubernetes.io/zone=us-east-1d,kubernetes.io/hostname=ip-172-18-15-213.ec2.internal,node-role.kubernetes.io/compute=true,role=node
*****************************************************************

# oc get pod -o wide
NAME                                           READY     STATUS    RESTARTS   AGE       IP              NODE                            NOMINATED NODE
alertmanager-main-0                            3/3       Running   0          7h        10.131.0.4      ip-172-18-10-235.ec2.internal   <none>
alertmanager-main-1                            3/3       Running   0          7h        10.130.0.6      ip-172-18-12-50.ec2.internal    <none>
alertmanager-main-2                            3/3       Running   0          7h        10.129.0.5      ip-172-18-15-213.ec2.internal   <none>
cluster-monitoring-operator-56bb5946c4-mzqdk   1/1       Running   0          7h        10.129.0.2      ip-172-18-15-213.ec2.internal   <none>
grafana-56f6875b69-ljr8k                       2/2       Running   0          7h        10.129.0.3      ip-172-18-15-213.ec2.internal   <none>
kube-state-metrics-776f9667b-2lxdl             3/3       Running   0          7h        10.130.0.7      ip-172-18-12-50.ec2.internal    <none>
node-exporter-gkdt6                            2/2       Running   0          7h        172.18.1.217    ip-172-18-1-217.ec2.internal    <none>
node-exporter-gnj2c                            2/2       Running   0          7h        172.18.12.50    ip-172-18-12-50.ec2.internal    <none>
node-exporter-l6md2                            2/2       Running   0          7h        172.18.15.213   ip-172-18-15-213.ec2.internal   <none>
node-exporter-wnsqd                            2/2       Running   0          7h        172.18.10.235   ip-172-18-10-235.ec2.internal   <none>
prometheus-k8s-0                               4/4       Running   1          7h        10.131.0.3      ip-172-18-10-235.ec2.internal   <none>
prometheus-k8s-1                               4/4       Running   1          7h        10.129.0.4      ip-172-18-15-213.ec2.internal   <none>
prometheus-operator-7566fcccc8-t7wc5           1/1       Running   0          7h        10.131.0.2      ip-172-18-10-235.ec2.internal   <none>
*******************************************************************************

I think we should also add anti-affinity for the prometheus-operator pod:

# oc -n openshift-monitoring get pod prometheus-operator-7566fcccc8-t7wc5 -oyaml | grep -i affinity
nothing returned
Image: ose-prometheus-operator-v3.11.28-1; the other images are also at version v3.11.28-1.
I don't think anti-affinity between Prometheus and the Prometheus Operator would have any effect. The Prometheus Operator has no issue being spun up again on a different node, or on the same node as Prometheus, whereas two Prometheus servers landing on the same node does raise availability concerns.
(In reply to Frederic Branczyk from comment #4)
> I don't think anti-affinity between Prometheus and the Prometheus Operator
> would have any effect. The Prometheus Operator has no issue being spun up
> again on a different node, or on the same node as Prometheus, whereas two
> Prometheus servers landing on the same node does raise availability concerns.

Thanks for the confirmation; please change this defect to ON_QA.
Both 3.11 and 4.0 now have anti-affinity on the Prometheus and Alertmanager pods; moving to MODIFIED.
The prometheus and alertmanager pods use anti-affinity now, e.g. the prometheus-k8s-0 pod:

    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - podAffinityTerm:
          labelSelector:
            matchExpressions:
            - key: prometheus
              operator: In
              values:
              - k8s
          namespaces:
          - openshift-monitoring

Cluster monitoring images: v3.11.88
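The pasted term above appears to be cut off: a podAffinityTerm in a pod spec also carries a topologyKey that defines the spreading domain. As a sketch only, assuming the usual node-level key and not copied from this cluster's output, the remaining field would look like:

          # Sketch of the remaining field, assuming the usual node-level domain;
          # not taken verbatim from the verification output above.
          topologyKey: kubernetes.io/hostname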
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:0407