Bug 1949262
| Summary: | Prometheus Statefulsets should have 2 replicas and hard affinity set | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | ravig <rgudimet> |
| Component: | Monitoring | Assignee: | Simon Pasquier <spasquie> |
| Status: | CLOSED ERRATA | QA Contact: | Junqi Zhao <juzhao> |
| Severity: | medium | Docs Contact: | Brian Burt <bburt> |
| Priority: | medium | | |
| Version: | 4.8 | CC: | airshad, anpicker, bburt, cruhm, david.karlsen, dgrisonn, dofinn, erooth, inecas, jeder, kgordeev, mbargenq, mbukatov, roxenham, sdodson, spasquie, sraje, vjaypurk, wking, ychoukse |
| Target Milestone: | --- | Keywords: | ServiceDeliveryImpact |
| Target Release: | 4.10.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |

Doc Text:
Previously, the Prometheus service would become unavailable when the two Prometheus pods were located on the same node and that node experienced an outage.
This situation occurred because the Prometheus pods had only soft anti-affinity rules regarding node placement.
Consequently, metrics would not be collected, and rules would not be evaluated until the node came back online.
With this release, the Cluster Monitoring Operator configures hard anti-affinity rules to ensure that the two Prometheus pods are scheduled on different nodes.
As a result, Prometheus pods are now scheduled on different nodes, and a single node outage no longer creates a gap in monitoring.
| Story Points: | --- | | |
|---|---|---|---|
| Clone Of: | | | |
| : | 1957703 | Environment: | |
| Last Closed: | 2022-03-10 16:03:07 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | 1995924 | | |
| Bug Blocks: | 1957703, 1984103 | | |
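For readers unfamiliar with the distinction drawn in the Doc Text above: a soft rule uses preferredDuringSchedulingIgnoredDuringExecution, which the scheduler may ignore under pressure, while a hard rule uses requiredDuringSchedulingIgnoredDuringExecution. Below is a minimal sketch of a soft (preferred) pod anti-affinity rule of the kind the pods previously carried; the selector and weight are illustrative and not taken from the operator's earlier manifests. The hard rules that replaced it appear verbatim in the verification output further down.

affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
    # Soft rule: the scheduler tries to spread the pods across nodes,
    # but still co-locates them if no other node fits.
    - weight: 100
      podAffinityTerm:
        labelSelector:
          matchLabels:
            prometheus: k8s        # illustrative selector
        namespaces:
        - openshift-monitoring
        topologyKey: kubernetes.io/hostname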
Description (ravig, 2021-04-13 19:51:26 UTC)
Alertmanager needs to stay at 3 replicas with soft anti-affinity until the StatefulSet resource implements minReadySeconds [1] (see the sketch of that field after the verification output below).

[1] https://github.com/kubernetes/kubernetes/issues/65098

Tested with 4.8.0-0.nightly-2021-04-29-222100; hard anti-affinity for the Prometheus pods is added:
# oc -n openshift-monitoring get sts prometheus-k8s -oyaml
...
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchExpressions:
        - key: prometheus
          operator: In
          values:
          - k8s
      namespaces:
      - openshift-monitoring
      topologyKey: kubernetes.io/hostname
# oc -n openshift-user-workload-monitoring get sts prometheus-user-workload -oyaml
...
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchExpressions:
        - key: prometheus
          operator: In
          values:
          - user-workload
      namespaces:
      - openshift-user-workload-monitoring
      topologyKey: kubernetes.io/hostname
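As referenced in the Alertmanager comment above, minReadySeconds has since been added to the StatefulSet API upstream. The following is a minimal sketch of where the field sits in a StatefulSet spec; the object name, labels, image, and replica count are illustrative and not taken from this bug.

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: example                # illustrative name, not an OpenShift object
spec:
  replicas: 3
  minReadySeconds: 10          # a new pod must stay Ready for 10s before it counts as available
  serviceName: example
  selector:
    matchLabels:
      app: example
  template:
    metadata:
      labels:
        app: example
    spec:
      containers:
      - name: example
        image: registry.example.com/example:latest   # placeholder image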
*** Bug 1955147 has been marked as a duplicate of this bug. ***

Unsetting the target release for now. The path forward is to wait for bug 1974832, which adds an alert to detect workloads with persistent storage that are scheduled on the same node. With that alert, a runbook will be provided to help users fix their clusters so that hard pod anti-affinity on hostname can be re-enabled in 4.9. (An illustrative sketch of such an alert rule follows the verification output below.)

Checked with 4.10.0-0.nightly-2021-11-28-164900; the Prometheus StatefulSets now have 2 replicas and hard anti-affinity set:
# oc -n openshift-monitoring get pod -o wide | grep prometheus-k8s
prometheus-k8s-0 6/6 Running 0 5m36s 10.129.2.60 ip-10-0-194-46.us-east-2.compute.internal <none> <none>
prometheus-k8s-1 6/6 Running 0 5m36s 10.131.0.23 ip-10-0-129-166.us-east-2.compute.internal <none> <none>
# oc -n openshift-monitoring get sts prometheus-k8s -oyaml
...
spec:
  podManagementPolicy: Parallel
  replicas: 2
  revisionHistoryLimit: 10
...
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchLabels:
                app.kubernetes.io/component: prometheus
                app.kubernetes.io/name: prometheus
                app.kubernetes.io/part-of: openshift-monitoring
                prometheus: k8s
            namespaces:
            - openshift-monitoring
            topologyKey: kubernetes.io/hostname
# oc -n openshift-user-workload-monitoring get pod -o wide | grep prometheus-user-workload
prometheus-user-workload-0 5/5 Running 0 2m22s 10.129.2.64 ip-10-0-194-46.us-east-2.compute.internal <none> <none>
prometheus-user-workload-1 5/5 Running 0 2m22s 10.128.2.122 ip-10-0-191-20.us-east-2.compute.internal <none> <none>
# oc -n openshift-user-workload-monitoring get sts prometheus-user-workload -oyaml
...
spec:
  podManagementPolicy: Parallel
  replicas: 2
  revisionHistoryLimit: 10
...
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchLabels:
                app.kubernetes.io/component: prometheus
                app.kubernetes.io/name: prometheus
                app.kubernetes.io/part-of: openshift-monitoring
                prometheus: user-workload
            namespaces:
            - openshift-user-workload-monitoring
            topologyKey: kubernetes.io/hostname
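As referenced in the comment about bug 1974832 above, the interim mitigation was an alert for replicas with persistent storage that land on the same node. The sketch below only illustrates that idea and is not the alert shipped by that bug: the rule name, threshold, duration, and query (which uses the kube-state-metrics metric kube_pod_info and ignores the persistent-storage condition) are assumptions for illustration.

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: example-colocated-prometheus    # illustrative name only
  namespace: openshift-monitoring
spec:
  groups:
  - name: example
    rules:
    - alert: PrometheusReplicasOnSameNode   # illustrative, not the shipped alert
      # Counts prometheus-k8s pods per node; more than one on a node means the
      # replicas are not spread, so a single node outage would take out both.
      expr: count by (node) (kube_pod_info{namespace="openshift-monitoring", pod=~"prometheus-k8s-.*"}) > 1
      for: 15m
      labels:
        severity: warning
      annotations:
        summary: Multiple Prometheus replicas are scheduled on the same node.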
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:0056