Bug 1949262 - Prometheus Statefulsets should have 2 replicas and hard affinity set
Summary: Prometheus Statefulsets should have 2 replicas and hard affinity set
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Monitoring
Version: 4.8
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 4.10.0
Assignee: Simon Pasquier
QA Contact: Junqi Zhao
Docs Contact: Brian Burt
URL:
Whiteboard:
Duplicates: 1955147 (view as bug list)
Depends On: 1995924
Blocks: 1984103 1957703
 
Reported: 2021-04-13 19:51 UTC by ravig
Modified: 2022-03-10 16:03 UTC
CC: 20 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Previously, the Prometheus service would become unavailable when the two Prometheus pods were located on the same node and that node experienced an outage. This situation occurred because the Prometheus pods had only soft anti-affinity rules regarding node placement. Consequently, metrics would not be collected, and rules would not be evaluated until the node came back online. With this release, the Cluster Monitoring Operator configures hard anti-affinity rules to ensure that the two Prometheus pods are scheduled on different nodes. As a result, Prometheus pods are now scheduled on different nodes, and a single node outage no longer creates a gap in monitoring.
Clone Of:
Clones: 1957703 (view as bug list)
Environment:
Last Closed: 2022-03-10 16:03:07 UTC
Target Upstream Version:




Links
System ID Private Priority Status Summary Last Updated
Github openshift cluster-monitoring-operator pull 1135 0 None open Bug 1949262: jsonnet: add hard anti-affinity to Prometheuses 2021-04-27 13:47:23 UTC
Github openshift cluster-monitoring-operator pull 1341 0 None open Bug 1933847: enable hard affinity + PodDisruptionBudget for Prometheus and Thanos Ruler pods 2021-10-27 13:56:30 UTC
Red Hat Product Errata RHSA-2022:0056 0 None None None 2022-03-10 16:03:35 UTC

Description ravig 2021-04-13 19:51:26 UTC
Description of problem:
As mentioned in the conventions doc https://github.com/openshift/enhancements/blob/master/CONVENTIONS.md#high-availability, both Prometheus and Alertmanager should have a replica count of 2 with hard anti-affinity set until we bring the descheduler into the product.
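
For context, a hard ("required") pod anti-affinity rule on the hostname topology key looks roughly like the following sketch; the label selector here is illustrative (the actual operator-generated rules appear in the verification comments further down):

```yaml
# Sketch only: a required (hard) pod anti-affinity rule that forbids
# scheduling two Prometheus replicas on the same node. Unlike the
# "preferred" (soft) variant, the scheduler will leave a pod Pending
# rather than co-locate it. Label values are illustrative.
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchExpressions:
        - key: prometheus
          operator: In
          values:
          - k8s
      topologyKey: kubernetes.io/hostname
```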
Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 1 Simon Pasquier 2021-04-14 09:45:31 UTC
Alertmanager needs to stay with 3 replicas and soft affinity until the StatefulSet resource implements minReadySeconds [1].

[1] https://github.com/kubernetes/kubernetes/issues/65098
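
Once StatefulSets support it, the gating field referenced in [1] would look roughly like this sketch (the field was ultimately added as `spec.minReadySeconds`; the value shown is illustrative):

```yaml
# Sketch: with minReadySeconds set, a newly started pod must stay Ready
# for this long before the StatefulSet controller counts it as available
# and moves on during a rolling update.
apiVersion: apps/v1
kind: StatefulSet
spec:
  minReadySeconds: 30   # illustrative value
```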

Comment 7 Junqi Zhao 2021-04-30 06:53:41 UTC
Tested with 4.8.0-0.nightly-2021-04-29-222100; hard anti-affinity for the Prometheus pods is added:
# oc -n openshift-monitoring get sts prometheus-k8s -oyaml
...
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: prometheus
                operator: In
                values:
                - k8s
            namespaces:
            - openshift-monitoring
            topologyKey: kubernetes.io/hostname

# oc -n openshift-user-workload-monitoring get sts prometheus-user-workload -oyaml
...
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: prometheus
                operator: In
                values:
                - user-workload
            namespaces:
            - openshift-user-workload-monitoring
            topologyKey: kubernetes.io/hostname

Comment 9 Jeremy Eder 2021-05-04 19:23:21 UTC
*** Bug 1955147 has been marked as a duplicate of this bug. ***

Comment 13 Simon Pasquier 2021-06-08 14:44:16 UTC
Unsetting target release for now.

Comment 15 Damien Grisonnet 2021-07-02 17:27:00 UTC
The path forward is to wait for bug 1974832, which adds an alert to detect workloads with persistent storage that are scheduled on the same node. With this alert, a runbook will be provided to help users fix their cluster, so that we can re-enable hard pod anti-affinity on hostname in 4.9.

Comment 33 Simon Pasquier 2021-11-23 07:55:44 UTC
https://github.com/openshift/cluster-monitoring-operator/pull/1341 has been merged
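
The PodDisruptionBudget that PR 1341 pairs with hard anti-affinity would look roughly like this sketch (the selector labels are assumptions based on the statefulset output in the verification comment below; the exact object created by the operator may differ):

```yaml
# Sketch: a PDB that keeps at least one of the two Prometheus replicas
# running during voluntary disruptions such as node drains. Combined with
# hard anti-affinity, a drained replica can only reschedule onto a
# different node, so the pair never collapses onto one host.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: prometheus-k8s
  namespace: openshift-monitoring
spec:
  minAvailable: 1
  selector:
    matchLabels:
      prometheus: k8s
```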

Comment 34 Junqi Zhao 2021-11-29 06:28:06 UTC
Checked with 4.10.0-0.nightly-2021-11-28-164900; the Prometheus statefulsets now have 2 replicas and hard anti-affinity set:
# oc -n openshift-monitoring get pod -o wide | grep prometheus-k8s
prometheus-k8s-0                              6/6     Running   0          5m36s   10.129.2.60    ip-10-0-194-46.us-east-2.compute.internal    <none>           <none>
prometheus-k8s-1                              6/6     Running   0          5m36s   10.131.0.23    ip-10-0-129-166.us-east-2.compute.internal   <none>           <none>

# oc -n openshift-monitoring get sts prometheus-k8s -oyaml
...
spec:
  podManagementPolicy: Parallel
  replicas: 2
  revisionHistoryLimit: 10
...
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchLabels:
                app.kubernetes.io/component: prometheus
                app.kubernetes.io/name: prometheus
                app.kubernetes.io/part-of: openshift-monitoring
                prometheus: k8s
            namespaces:
            - openshift-monitoring
            topologyKey: kubernetes.io/hostname

# oc -n openshift-user-workload-monitoring get pod -o wide | grep prometheus-user-workload
prometheus-user-workload-0            5/5     Running   0          2m22s   10.129.2.64    ip-10-0-194-46.us-east-2.compute.internal    <none>           <none>
prometheus-user-workload-1            5/5     Running   0          2m22s   10.128.2.122   ip-10-0-191-20.us-east-2.compute.internal    <none>           <none>

# oc -n openshift-user-workload-monitoring get sts prometheus-user-workload -oyaml
...
spec:
  podManagementPolicy: Parallel
  replicas: 2
  revisionHistoryLimit: 10
...
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchLabels:
                app.kubernetes.io/component: prometheus
                app.kubernetes.io/name: prometheus
                app.kubernetes.io/part-of: openshift-monitoring
                prometheus: user-workload
            namespaces:
            - openshift-user-workload-monitoring
            topologyKey: kubernetes.io/hostname

Comment 41 errata-xmlrpc 2022-03-10 16:03:07 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:0056

