Bug 1952762 - SNO: prometheus-adapter pods are stuck in "Pending" status
Summary: SNO: prometheus-adapter pods are stuck in "Pending" status
Keywords:
Status: CLOSED DUPLICATE of bug 1950761
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Monitoring
Version: 4.8
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: Sergiusz Urbaniak
QA Contact: hongyan li
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2021-04-23 04:11 UTC by MinLi
Modified: 2021-04-23 07:26 UTC
CC: 9 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-04-23 05:51:26 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)

Description MinLi 2021-04-23 04:11:08 UTC
Description of problem:
prometheus-adapter pods are stuck in "Pending" status; they cannot be scheduled to a node because the pod anti-affinity rules cannot be satisfied.

Version-Release number of selected component (if applicable):
4.8.0-0.nightly-2021-04-18-101412

How reproducible:
Intermittently

Steps to Reproduce:
1. Deploy a SNO cluster

2. Wait several hours, then execute # oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.8.0-0.nightly-2021-04-18-101412   True        False         2d5h    Error while reconciling 4.8.0-0.nightly-2021-04-18-101412: the cluster operator monitoring has not yet successfully rolled out

3. # oc get co monitoring -o yaml
...
status:
  conditions:
  - lastTransitionTime: "2021-04-21T03:00:13Z"
    message: 'Failed to rollout the stack. Error: running task Updating prometheus-adapter
      failed: reconciling PrometheusAdapter Deployment failed: updating Deployment
      object failed: waiting for DeploymentRollout of openshift-monitoring/prometheus-adapter:
      expected 2 replicas, got 1 updated replicas'
    reason: UpdatingprometheusAdapterFailed
    status: "True"
    type: Degraded

4. # oc get pod -n openshift-monitoring
NAME                                           READY   STATUS    RESTARTS   AGE
alertmanager-main-0                            5/5     Running   0          2d5h
cluster-monitoring-operator-655fd6d7d7-9w4zl   2/2     Running   3          2d5h
grafana-596c899699-lj2np                       2/2     Running   0          17h
kube-state-metrics-fb6d9bb6-fjfd4              3/3     Running   0          2d5h
node-exporter-z8mv4                            2/2     Running   0          2d5h
openshift-state-metrics-95cd8f6c4-jtjwk        3/3     Running   0          2d5h
prometheus-adapter-855bb6d448-8fbmm            1/1     Running   0          2d5h
prometheus-adapter-b78465c69-rcrzg             0/1     Pending   0         5h48m
prometheus-k8s-0                               7/7     Running   1          2d5h
prometheus-operator-6db86854db-f4mdp           2/2     Running   0          2d5h
telemeter-client-5859546f89-ctw2v              3/3     Running   0          17h
thanos-querier-8545ddf675-58sfz                5/5     Running   0          2d5h

5. # oc get pod prometheus-adapter-b78465c69-rcrzg -o yaml -n openshift-monitoring
...
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2021-04-21T02:34:28Z"
    message: '0/1 nodes are available: 1 node(s) didn''t match pod affinity/anti-affinity
      rules, 1 node(s) didn''t match pod anti-affinity rules.'
    reason: Unschedulable
    status: "False"
    type: PodScheduled
  phase: Pending
  qosClass: Burstable


Actual results:
5. The prometheus-adapter-b78465c69-rcrzg pod cannot be scheduled: the cluster has only one node, the other prometheus-adapter pod is already running on it, and the pod anti-affinity rule forbids placing a second replica on the same node.
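For reference, a hard anti-affinity rule of the following shape would produce exactly this deadlock: with replicas: 2 and a requiredDuringSchedulingIgnoredDuringExecution term keyed on the node hostname, the second replica can never be placed on a single-node cluster. This is an illustrative sketch, not the verbatim stanza shipped by cluster-monitoring-operator; the label selector is an assumption.

```yaml
# Illustrative only -- the exact labels/selector in the shipped
# prometheus-adapter Deployment may differ.
spec:
  replicas: 2
  template:
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:   # hard rule: a veto
          - labelSelector:
              matchLabels:
                app.kubernetes.io/name: prometheus-adapter  # assumed label
            topologyKey: kubernetes.io/hostname             # at most one replica per node
```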

Expected results:


Additional info:
I am not sure whether it is acceptable to run only one prometheus-adapter pod in an SNO cluster, since there should be 2 prometheus-adapter pods by design; on the other hand, the Pending pod will block the SNO upgrade process.

Comment 1 MinLi 2021-04-23 04:18:34 UTC
Also, I think the scheduler message is inaccurate: there is only one node, yet it is counted under both the "pod affinity/anti-affinity" and the "pod anti-affinity" clauses. It should simply say:
'0/1 nodes are available: 1 node(s) didn't match pod anti-affinity rules.'
(The doubled apostrophe in didn''t is YAML escaping of a single quote inside a single-quoted string, not a misspelling.)

Comment 2 hongyan li 2021-04-23 05:50:44 UTC
Issue is fixed in payload 4.8.0-0.nightly-2021-04-19-121657.
Duplicate of bug https://bugzilla.redhat.com/show_bug.cgi?id=1950761
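For context, one common way to make such a rule schedulable on SNO is to switch from a hard to a soft (preferred) anti-affinity, which the scheduler treats as a preference rather than a veto. The sketch below is an assumed shape of such a change, not the verbatim fix from bug 1950761; the label selector is likewise an assumption.

```yaml
# Sketch of an SNO-compatible soft rule -- assumed shape only.
affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:  # soft: a scheduling
    - weight: 100                                     # preference, not a veto
      podAffinityTerm:
        labelSelector:
          matchLabels:
            app.kubernetes.io/name: prometheus-adapter  # assumed label
        topologyKey: kubernetes.io/hostname
```

With this form, the scheduler still spreads the two replicas across nodes when it can, but on a single node it simply co-locates them instead of leaving one Pending.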

Comment 3 hongyan li 2021-04-23 05:51:26 UTC

*** This bug has been marked as a duplicate of bug 1950761 ***

