Bug 2039411
| Summary: | Monitoring operator reports unavailable=true while one Prometheus pod is ready | ||
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Simon Pasquier <spasquie> |
| Component: | Monitoring | Assignee: | Sunil Thaha <sthaha> |
| Status: | CLOSED ERRATA | QA Contact: | hongyan li <hongyli> |
| Severity: | medium | Docs Contact: | Brian Burt <bburt> |
| Priority: | high | ||
| Version: | 4.6 | CC: | anpicker, aos-bugs, bburt, hongyli, juzhao, sthaha, wking |
| Target Milestone: | --- | ||
| Target Release: | 4.12.0 | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | Bug Fix | |
| Doc Text: |
* Before this update, if the Cluster Monitoring Operator (CMO) failed to update Prometheus, the CMO did not verify whether a previous deployment was running and would report that cluster monitoring was unavailable even if one of the Prometheus pods was still running. With this update, the CMO now checks for running Prometheus pods in this situation and reports that cluster monitoring is unavailable only if no Prometheus pods are running.
(link:https://bugzilla.redhat.com/show_bug.cgi?id=2039411[*BZ#2039411*])
|
Story Points: | --- |
| Clone Of: | Environment: | ||
| Last Closed: | 2023-01-17 19:46:49 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
|
Description
Simon Pasquier
2022-01-11 16:54:09 UTC
Taking this bug because it shares the same PR as https://bugzilla.redhat.com/show_bug.cgi?id=2043518.
Tested with the PR:
On a cluster with 3 worker nodes, taint 2 of the worker nodes:
% oc adm taint nodes <node-name> prometheus:NoSchedule
% oc -n openshift-monitoring get pod|grep prometheus-k8s
prometheus-k8s-0 6/6 Running 0 33m
prometheus-k8s-1 0/6 Pending 0 13m
% oc get co monitoring
NAME VERSION AVAILABLE PROGRESSING DEGRADED SINCE MESSAGE
monitoring 4.12.0-0.ci.test-2022-08-25-065719-ci-ln-pjcbf32-latest True False True 95s SomePodsNotReady: shard 0: pod prometheus-k8s-1: 0/6 nodes are available: 1 node(s) didn't match pod anti-affinity rules, 2 node(s) had untolerated taint {prometheus: }, 3 node(s) had untolerated taint {node-role.kubernetes.io/master: }, 4 node(s) had no available volume zone, 4 node(s) had volume node affinity conflict. preemption: 0/6 nodes are available: 6 Preemption is not helpful for scheduling.
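The check verified above can be approximated from the command line. The following is only a rough sketch and is not how the CMO implements it: `monitoring_available` is a hypothetical helper, not part of `oc` or the operator, and it assumes the default `oc get pod` column layout (NAME, READY, STATUS). It prints "True" when at least one prometheus-k8s pod is running with all of its containers ready.

```shell
# Hypothetical helper (not part of oc or CMO): reads `oc get pod` rows on stdin
# and prints "True" if at least one prometheus-k8s pod is fully ready,
# otherwise "False". Assumes the default column order NAME READY STATUS.
monitoring_available() {
  awk '
    $1 ~ /^prometheus-k8s-[0-9]+$/ {
      split($2, ready, "/")   # READY column, for example 6/6
      if ($3 == "Running" && ready[1] + 0 == ready[2] + 0 && ready[2] + 0 > 0)
        ok = 1
    }
    END { print (ok ? "True" : "False") }
  '
}
```

Usage, assuming a logged-in `oc` session: `oc -n openshift-monitoring get pod --no-headers | monitoring_available`. With the pod listing shown above (one pod at 6/6 Running, one at 0/6 Pending), the helper reports availability, which matches the AVAILABLE=True column in the `oc get co monitoring` output.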
Changed the bug to verified because the PR has been tested and merged.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.
For information on the advisory (Moderate: OpenShift Container Platform 4.12.0 bug fix and security update), and where to find the updated files, follow the link below.
If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHSA-2022:7399