Description of problem:
The prometheus-adapter pod is stuck in "Pending" status; it cannot be scheduled to the node because the pod anti-affinity rules cannot be satisfied.

Version-Release number of selected component (if applicable):
4.8.0-0.nightly-2021-04-18-101412

How reproducible:
Intermittently

Steps to Reproduce:
1. Deploy an SNO cluster.
2. Wait several hours, then execute:
# oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.8.0-0.nightly-2021-04-18-101412   True        False         2d5h    Error while reconciling 4.8.0-0.nightly-2021-04-18-101412: the cluster operator monitoring has not yet successfully rolled out
3. # oc get co monitoring -o yaml
...
status:
  conditions:
  - lastTransitionTime: "2021-04-21T03:00:13Z"
    message: 'Failed to rollout the stack. Error: running task Updating prometheus-adapter failed: reconciling PrometheusAdapter Deployment failed: updating Deployment object failed: waiting for DeploymentRollout of openshift-monitoring/prometheus-adapter: expected 2 replicas, got 1 updated replicas'
    reason: UpdatingprometheusAdapterFailed
    status: "True"
    type: Degraded
4. # oc get pod -n openshift-monitoring
NAME                                           READY   STATUS    RESTARTS   AGE
alertmanager-main-0                            5/5     Running   0          2d5h
cluster-monitoring-operator-655fd6d7d7-9w4zl   2/2     Running   3          2d5h
grafana-596c899699-lj2np                       2/2     Running   0          17h
kube-state-metrics-fb6d9bb6-fjfd4              3/3     Running   0          2d5h
node-exporter-z8mv4                            2/2     Running   0          2d5h
openshift-state-metrics-95cd8f6c4-jtjwk        3/3     Running   0          2d5h
prometheus-adapter-855bb6d448-8fbmm            1/1     Running   0          2d5h
prometheus-adapter-b78465c69-rcrzg             0/1     Pending   0          5h48m
prometheus-k8s-0                               7/7     Running   1          2d5h
prometheus-operator-6db86854db-f4mdp           2/2     Running   0          2d5h
telemeter-client-5859546f89-ctw2v              3/3     Running   0          17h
thanos-querier-8545ddf675-58sfz                5/5     Running   0          2d5h
5. # oc get pod prometheus-adapter-b78465c69-rcrzg -o yaml -n openshift-monitoring
...
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2021-04-21T02:34:28Z"
    message: '0/1 nodes are available: 1 node(s) didn''t match pod affinity/anti-affinity rules, 1 node(s) didn''t match pod anti-affinity rules.'
    reason: Unschedulable
    status: "False"
    type: PodScheduled
  phase: Pending
  qosClass: Burstable

Actual results:
The prometheus-adapter-b78465c69-rcrzg pod cannot be scheduled: there is only one node, the other prometheus-adapter pod is already running on it, and the new pod therefore violates the pod anti-affinity rule.

Expected results:

Additional info:
I am not sure whether it is acceptable for only one prometheus-adapter pod to run in an SNO cluster, since there should be 2 prometheus-adapter pods by design; on the other hand, the Pending pod will block the SNO upgrade process.
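For context on the anti-affinity mismatch described above: the behaviour is consistent with a hard (requiredDuringScheduling) pod anti-affinity rule keyed on the node hostname, which can never be satisfied for a second replica on a single-node cluster. The snippet below is only an illustrative sketch of such a rule; the label selector is an assumed example, not the actual prometheus-adapter Deployment spec:

# Illustrative sketch only -- not the actual prometheus-adapter Deployment spec.
# A "required" anti-affinity on kubernetes.io/hostname means no two matching pods
# may share a node, so a second replica stays Pending on a single-node cluster.
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchLabels:
          app.kubernetes.io/name: prometheus-adapter   # assumed label, for illustration
      topologyKey: kubernetes.io/hostname

A soft rule (preferredDuringSchedulingIgnoredDuringExecution) would instead allow both replicas to land on the same node.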
Also, I think the message is incorrect and contains a misspelling (didn''t). There is only one node, so it should say: '0/1 nodes are available: 1 node(s) didn't match pod anti-affinity rules.'
Issue is fixed in payload 4.8.0-0.nightly-2021-04-19-121657. Duplicate of bug https://bugzilla.redhat.com/show_bug.cgi?id=1950761.

*** This bug has been marked as a duplicate of bug 1950761 ***