Description of problem:
The monitoring operator always goes "available=false/degraded=true" when one of the reconciliation tasks fails. In some cases it should remain "available=true".

Version-Release number of selected component (if applicable):
4.10 and before

How reproducible:
Always

Steps to Reproduce:
1. Mark the node running prometheus-k8s-1 as not schedulable
2. Trigger a rollout of the pod (delete the pod?)
3. Wait for the monitoring operator to change its conditions.

Actual results:
The operator reports "available=false/degraded=true".

Expected results:
The operator should report "available=true/degraded=true" because the other prometheus pod is up and running.

Additional info:
https://coreos.slack.com/archives/C0VMT03S5/p1641918136031400
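The reproduction steps above can be sketched as a sequence of `oc` commands (a minimal sketch assuming a logged-in `oc` client on a disposable test cluster; the pod and namespace names come from the report, the guard and the `node`/`repro` variables are illustrative):

```shell
# Reproduction sketch; only touches a cluster when the oc client is present.
if command -v oc >/dev/null 2>&1; then
  # 1. Cordon the node running prometheus-k8s-1 so it is no longer schedulable.
  node="$(oc -n openshift-monitoring get pod prometheus-k8s-1 -o jsonpath='{.spec.nodeName}')"
  oc adm cordon "$node"
  # 2. Trigger a rollout by deleting the pod; the replacement should stay Pending.
  oc -n openshift-monitoring delete pod prometheus-k8s-1
  # 3. Check the monitoring ClusterOperator conditions.
  oc get co monitoring
  repro="ran"
else
  echo "oc client not available; skipping reproduction"
  repro="skipped"
fi
```

With the fix, the last command should show AVAILABLE=True and DEGRADED=True while one prometheus-k8s pod is still running.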
Taking this bug, as it is fixed by the same PR as https://bugzilla.redhat.com/show_bug.cgi?id=2043518.
Tested with the PR on a cluster with 3 worker nodes; tainted 2 of the worker nodes:

% oc adm taint nodes <node-name> prometheus:NoSchedule
% oc -n openshift-monitoring get pod | grep prometheus-k8s
prometheus-k8s-0   6/6   Running   0   33m
prometheus-k8s-1   0/6   Pending   0   13m
% oc get co monitoring
NAME         VERSION                                                   AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
monitoring   4.12.0-0.ci.test-2022-08-25-065719-ci-ln-pjcbf32-latest   True        False         True       95s     SomePodsNotReady: shard 0: pod prometheus-k8s-1: 0/6 nodes are available: 1 node(s) didn't match pod anti-affinity rules, 2 node(s) had untolerated taint {prometheus: }, 3 node(s) had untolerated taint {node-role.kubernetes.io/master: }, 4 node(s) had no available volume zone, 4 node(s) had volume node affinity conflict. preemption: 0/6 nodes are available: 6 Preemption is not helpful for scheduling.
Changed the bug to VERIFIED, as the PR has been tested and merged.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.12.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:7399