Bug 1952762 - SNO: prometheus-adapter pods are stuck in "Pending" status
Summary: SNO: prometheus-adapter pods are stuck in "Pending" status
Keywords:
Status: CLOSED DUPLICATE of bug 1950761
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Monitoring
Version: 4.8
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: Sergiusz Urbaniak
QA Contact: hongyan li
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2021-04-23 04:11 UTC by MinLi
Modified: 2021-04-23 07:26 UTC
CC: 9 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-04-23 05:51:26 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)

Description MinLi 2021-04-23 04:11:08 UTC
Description of problem:
prometheus-adapter pods are stuck in "Pending" status; they cannot be scheduled to a node because the pod anti-affinity rules cannot be satisfied.

Version-Release number of selected component (if applicable):
4.8.0-0.nightly-2021-04-18-101412

How reproducible:
Intermittently

Steps to Reproduce:
1. Deploy a SNO cluster

2. Wait several hours, then execute # oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.8.0-0.nightly-2021-04-18-101412   True        False         2d5h    Error while reconciling 4.8.0-0.nightly-2021-04-18-101412: the cluster operator monitoring has not yet successfully rolled out

3. # oc get co monitoring -o yaml
...
status:
  conditions:
  - lastTransitionTime: "2021-04-21T03:00:13Z"
    message: 'Failed to rollout the stack. Error: running task Updating prometheus-adapter
      failed: reconciling PrometheusAdapter Deployment failed: updating Deployment
      object failed: waiting for DeploymentRollout of openshift-monitoring/prometheus-adapter:
      expected 2 replicas, got 1 updated replicas'
    reason: UpdatingprometheusAdapterFailed
    status: "True"
    type: Degraded

4. # oc get pod -n openshift-monitoring
NAME                                           READY   STATUS    RESTARTS   AGE
alertmanager-main-0                            5/5     Running   0          2d5h
cluster-monitoring-operator-655fd6d7d7-9w4zl   2/2     Running   3          2d5h
grafana-596c899699-lj2np                       2/2     Running   0          17h
kube-state-metrics-fb6d9bb6-fjfd4              3/3     Running   0          2d5h
node-exporter-z8mv4                            2/2     Running   0          2d5h
openshift-state-metrics-95cd8f6c4-jtjwk        3/3     Running   0          2d5h
prometheus-adapter-855bb6d448-8fbmm            1/1     Running   0          2d5h
prometheus-adapter-b78465c69-rcrzg             0/1     Pending   0         5h48m
prometheus-k8s-0                               7/7     Running   1          2d5h
prometheus-operator-6db86854db-f4mdp           2/2     Running   0          2d5h
telemeter-client-5859546f89-ctw2v              3/3     Running   0          17h
thanos-querier-8545ddf675-58sfz                5/5     Running   0          2d5h

5. # oc get pod prometheus-adapter-b78465c69-rcrzg -o yaml -n openshift-monitoring
...
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2021-04-21T02:34:28Z"
    message: '0/1 nodes are available: 1 node(s) didn''t match pod affinity/anti-affinity
      rules, 1 node(s) didn''t match pod anti-affinity rules.'
    reason: Unschedulable
    status: "False"
    type: PodScheduled
  phase: Pending
  qosClass: Burstable


Actual results:
5. The prometheus-adapter-b78465c69-rcrzg pod cannot be scheduled: the cluster has only one node, the other prometheus-adapter pod is already running on it, and the pod anti-affinity rule forbids placing a second replica on the same node.
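For reference, a hard anti-affinity rule of the following shape would produce exactly this deadlock: with replicas: 2 and a requiredDuringSchedulingIgnoredDuringExecution term keyed on the node hostname, the second replica can never be placed on a single-node cluster. This is an illustrative sketch, not the verbatim stanza shipped by cluster-monitoring-operator; the label selector is an assumption.

```yaml
# Illustrative only -- the exact labels/selector in the shipped
# prometheus-adapter Deployment may differ.
spec:
  replicas: 2
  template:
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:   # hard rule: a veto
          - labelSelector:
              matchLabels:
                app.kubernetes.io/name: prometheus-adapter  # assumed label
            topologyKey: kubernetes.io/hostname             # at most one replica per node
```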

Expected results:


Additional info:
I am not sure whether it is acceptable to run only one prometheus-adapter pod in an SNO cluster, since there should be 2 prometheus-adapter pods by design; on the other hand, the Pending pod will block the SNO upgrade process.

Comment 1 MinLi 2021-04-23 04:18:34 UTC
Also, I think the scheduler message is inaccurate: there is only one node, yet it is counted under both the "pod affinity/anti-affinity" and the "pod anti-affinity" clauses. It should simply say:
'0/1 nodes are available: 1 node(s) didn't match pod anti-affinity rules.'
(The doubled apostrophe in didn''t is YAML escaping of a single quote inside a single-quoted string, not a misspelling.)

Comment 2 hongyan li 2021-04-23 05:50:44 UTC
Issue is fixed in payload 4.8.0-0.nightly-2021-04-19-121657.
Duplicate of bug https://bugzilla.redhat.com/show_bug.cgi?id=1950761
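For context, one common way to make such a rule schedulable on SNO is to switch from a hard to a soft (preferred) anti-affinity, which the scheduler treats as a preference rather than a veto. The sketch below is an assumed shape of such a change, not the verbatim fix from bug 1950761; the label selector is likewise an assumption.

```yaml
# Sketch of an SNO-compatible soft rule -- assumed shape only.
affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:  # soft: a scheduling
    - weight: 100                                     # preference, not a veto
      podAffinityTerm:
        labelSelector:
          matchLabels:
            app.kubernetes.io/name: prometheus-adapter  # assumed label
        topologyKey: kubernetes.io/hostname
```

With this form, the scheduler still spreads the two replicas across nodes when it can, but on a single node it simply co-locates them instead of leaving one Pending.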

Comment 3 hongyan li 2021-04-23 05:51:26 UTC

*** This bug has been marked as a duplicate of bug 1950761 ***

