
Bug 1816803

Summary:        Failed to rollout the stack error when upgrading to 4.3.8
Product:        OpenShift Container Platform
Component:      Monitoring
Version:        4.3.z
Hardware:       x86_64
OS:             Linux
Status:         CLOSED DUPLICATE
Severity:       high
Priority:       unspecified
Target Release: 4.5.0
Reporter:       Sam Yangsao <syangsao>
Assignee:       Simon Pasquier <spasquie>
QA Contact:     Junqi Zhao <juzhao>
CC:             alegrand, anpicker, erooth, kakkoyun, lcosic, mloibl, pkrupa, spasquie, surbania
Type:           Bug
Last Closed:    2020-03-30 12:46:00 UTC

Description Sam Yangsao 2020-03-24 19:04:08 UTC
Description of problem:

Failed to rollout the stack. Error: running task Updating Prometheus-k8s failed: waiting for Prometheus object changes failed: waiting for Prometheus: expected 2 replicas, updated 0 and available 1

Version-Release number of selected component (if applicable):

4.3.8

How reproducible:

Unsure

Steps to Reproduce:

1.  Install a new instance of OCP 4.3.1 on VMware
2.  Upgrade to OCP 4.3.8 on VMware

Actual results:

The UI shows a `Degraded` status with the following message for the `monitoring` namespace:

Failed to rollout the stack. Error: running task Updating Prometheus-k8s failed: waiting for Prometheus object changes failed: waiting for Prometheus: expected 2 replicas, updated 0 and available 1
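
The degraded monitoring operator and the pending pod can typically be confirmed with standard commands such as the following (a general check, not specific to this cluster):

# oc get clusteroperator monitoring
# oc -n openshift-monitoring get pods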

Expected results:

The monitoring stack should roll out successfully, given that it was working fine prior to the upgrade.

Additional info:

# oc get all
NAME                                               READY   STATUS    RESTARTS   AGE
pod/alertmanager-main-0                            3/3     Running   0          44m
pod/alertmanager-main-1                            3/3     Running   0          47m
pod/alertmanager-main-2                            3/3     Running   0          47m
pod/cluster-monitoring-operator-65dc45d777-qh6zv   1/1     Running   0          56m
pod/grafana-75899bb875-5hx5v                       2/2     Running   0          42m
pod/kube-state-metrics-69bdc4f895-68pmj            3/3     Running   0          48m
pod/node-exporter-4xpwx                            2/2     Running   0          47m
pod/node-exporter-9rmz6                            2/2     Running   0          45m
pod/node-exporter-btc2h                            2/2     Running   0          46m
pod/node-exporter-kvqlk                            2/2     Running   0          44m
pod/node-exporter-nrfzh                            2/2     Running   0          48m
pod/openshift-state-metrics-7fbdf5c5fc-hnks5       3/3     Running   0          53m
pod/prometheus-adapter-649f99457c-4pr6b            1/1     Running   0          42m
pod/prometheus-adapter-649f99457c-wfwwz            1/1     Running   0          42m
pod/prometheus-k8s-0                               7/7     Running   1          44m
pod/prometheus-k8s-1                               0/7     Pending   0          43m
pod/prometheus-operator-6bf6ddd5bb-w5p7r           1/1     Running   0          48m
pod/telemeter-client-67875878c6-xknk4              3/3     Running   0          48m
pod/thanos-querier-79f488c584-frfts                4/4     Running   0          52m
pod/thanos-querier-79f488c584-p5k65                4/4     Running   0          50m

NAME                                  TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)                      AGE
service/alertmanager-main             ClusterIP   172.30.233.23    <none>        9094/TCP                     4d20h
service/alertmanager-operated         ClusterIP   None             <none>        9093/TCP,9094/TCP,9094/UDP   4d20h
service/cluster-monitoring-operator   ClusterIP   None             <none>        8080/TCP                     4d20h
service/grafana                       ClusterIP   172.30.162.42    <none>        3000/TCP                     4d20h
service/kube-state-metrics            ClusterIP   None             <none>        8443/TCP,9443/TCP            4d20h
service/node-exporter                 ClusterIP   None             <none>        9100/TCP                     4d20h
service/openshift-state-metrics       ClusterIP   None             <none>        8443/TCP,9443/TCP            4d20h
service/prometheus-adapter            ClusterIP   172.30.209.162   <none>        443/TCP                      4d20h
service/prometheus-k8s                ClusterIP   172.30.107.158   <none>        9091/TCP,9092/TCP            4d20h
service/prometheus-operated           ClusterIP   None             <none>        9090/TCP,10901/TCP           4d20h
service/prometheus-operator           ClusterIP   None             <none>        8080/TCP                     4d20h
service/telemeter-client              ClusterIP   None             <none>        8443/TCP                     4h15m
service/thanos-querier                ClusterIP   172.30.41.100    <none>        9091/TCP,9092/TCP            4d20h

NAME                           DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR            AGE
daemonset.apps/node-exporter   5         5         5       5            5           kubernetes.io/os=linux   4d20h

NAME                                          READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/cluster-monitoring-operator   1/1     1            1           4d21h
deployment.apps/grafana                       1/1     1            1           4d20h
deployment.apps/kube-state-metrics            1/1     1            1           4d20h
deployment.apps/openshift-state-metrics       1/1     1            1           4d20h
deployment.apps/prometheus-adapter            2/2     2            2           4d20h
deployment.apps/prometheus-operator           1/1     1            1           4d20h
deployment.apps/telemeter-client              1/1     1            1           4h15m
deployment.apps/thanos-querier                2/2     2            2           4d20h

NAME                                                     DESIRED   CURRENT   READY   AGE
replicaset.apps/cluster-monitoring-operator-55d99d5ffc   0         0         0       4d21h
replicaset.apps/cluster-monitoring-operator-65dc45d777   1         1         1       56m
replicaset.apps/grafana-558754658                        0         0         0       48m
replicaset.apps/grafana-59c694f446                       0         0         0       4d20h
replicaset.apps/grafana-66dff46d7c                       0         0         0       3d12h
replicaset.apps/grafana-75899bb875                       1         1         1       53m
replicaset.apps/grafana-866944cdb5                       0         0         0       4d20h
replicaset.apps/kube-state-metrics-5745cc99f5            0         0         0       4d20h
replicaset.apps/kube-state-metrics-69bdc4f895            1         1         1       48m
replicaset.apps/openshift-state-metrics-7fbdf5c5fc       1         1         1       53m
replicaset.apps/openshift-state-metrics-86bbf546f        0         0         0       4d20h
replicaset.apps/prometheus-adapter-57c7596dc8            0         0         0       4d5h
replicaset.apps/prometheus-adapter-5b7ff9944c            0         0         0       4d6h
replicaset.apps/prometheus-adapter-5fd87487f4            0         0         0       4d20h
replicaset.apps/prometheus-adapter-649f99457c            2         2         2       42m
replicaset.apps/prometheus-adapter-9ccb55f69             0         0         0       4d1h
replicaset.apps/prometheus-operator-5c96664896           0         0         0       4d20h
replicaset.apps/prometheus-operator-6bf6ddd5bb           1         1         1       48m
replicaset.apps/prometheus-operator-c455cd84b            0         0         0       4d20h
replicaset.apps/telemeter-client-67875878c6              1         1         1       48m
replicaset.apps/telemeter-client-754b4c4b88              0         0         0       4h15m
replicaset.apps/thanos-querier-5d9d856f7b                0         0         0       4d20h
replicaset.apps/thanos-querier-6b58d7c5b                 0         0         0       4d20h
replicaset.apps/thanos-querier-79f488c584                2         2         2       52m

NAME                                 READY   AGE
statefulset.apps/alertmanager-main   3/3     4d20h
statefulset.apps/prometheus-k8s      1/2     4d20h

NAME                                         HOST/PORT                                                                PATH   SERVICES            PORT    TERMINATION          WILDCARD
route.route.openshift.io/alertmanager-main   alertmanager-main-openshift-monitoring.apps.devocp4.lab.msp.redhat.com          alertmanager-main   web     reencrypt/Redirect   None
route.route.openshift.io/grafana             grafana-openshift-monitoring.apps.devocp4.lab.msp.redhat.com                    grafana             https   reencrypt/Redirect   None
route.route.openshift.io/prometheus-k8s      prometheus-k8s-openshift-monitoring.apps.devocp4.lab.msp.redhat.com             prometheus-k8s      web     reencrypt/Redirect   None
route.route.openshift.io/thanos-querier      thanos-querier-openshift-monitoring.apps.devocp4.lab.msp.redhat.com             thanos-querier      web     reencrypt/Redirect   None

Comment 3 Sam Yangsao 2020-03-25 16:50:06 UTC
Output from the problematic pod:

# oc -n openshift-monitoring describe pod/prometheus-k8s-1
Name:                 prometheus-k8s-1
Namespace:            openshift-monitoring
Priority:             2000000000
Priority Class Name:  system-cluster-critical
Node:                 <none>
Labels:               app=prometheus
                      controller-revision-hash=prometheus-k8s-6d5975d66
                      prometheus=k8s
                      statefulset.kubernetes.io/pod-name=prometheus-k8s-1
Annotations:          openshift.io/scc: restricted
Status:               Pending
IP:                   
IPs:                  <none>
Controlled By:        StatefulSet/prometheus-k8s
Containers:
  prometheus:
    Image:      quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:62f7a5e596028e36518adec7d5228c062fb515bb8395eaef3aa3914891fbee21
    Port:       <none>
    Host Port:  <none>
    Args:
      --web.console.templates=/etc/prometheus/consoles
      --web.console.libraries=/etc/prometheus/console_libraries
      --config.file=/etc/prometheus/config_out/prometheus.env.yaml
      --storage.tsdb.path=/prometheus
      --storage.tsdb.retention.time=15d
      --web.enable-lifecycle
      --storage.tsdb.no-lockfile
      --web.external-url=https://prometheus-k8s-openshift-monitoring.apps.devocp4.lab.msp.redhat.com/
      --web.route-prefix=/
      --web.listen-address=127.0.0.1:9090
    Requests:
      cpu:        200m
      memory:     1Gi
    Liveness:     exec [sh -c if [ -x "$(command -v curl)" ]; then curl http://localhost:9090/-/healthy; elif [ -x "$(command -v wget)" ]; then wget -q http://localhost:9090/-/healthy; else exit 1; fi] delay=0s timeout=3s period=5s #success=1 #failure=6
    Readiness:    exec [sh -c if [ -x "$(command -v curl)" ]; then curl http://localhost:9090/-/ready; elif [ -x "$(command -v wget)" ]; then wget -q http://localhost:9090/-/ready; else exit 1; fi] delay=0s timeout=3s period=5s #success=1 #failure=120
    Environment:  <none>
    Mounts:
      /etc/pki/ca-trust/extracted/pem/ from prometheus-trusted-ca-bundle (ro)
      /etc/prometheus/certs from tls-assets (ro)
      /etc/prometheus/config_out from config-out (ro)
      /etc/prometheus/configmaps/kubelet-serving-ca-bundle from configmap-kubelet-serving-ca-bundle (ro)
      /etc/prometheus/configmaps/serving-certs-ca-bundle from configmap-serving-certs-ca-bundle (ro)
      /etc/prometheus/rules/prometheus-k8s-rulefiles-0 from prometheus-k8s-rulefiles-0 (rw)
      /etc/prometheus/secrets/kube-etcd-client-certs from secret-kube-etcd-client-certs (ro)
      /etc/prometheus/secrets/kube-rbac-proxy from secret-kube-rbac-proxy (ro)
      /etc/prometheus/secrets/prometheus-k8s-htpasswd from secret-prometheus-k8s-htpasswd (ro)
      /etc/prometheus/secrets/prometheus-k8s-proxy from secret-prometheus-k8s-proxy (ro)
      /etc/prometheus/secrets/prometheus-k8s-tls from secret-prometheus-k8s-tls (ro)
      /prometheus from prometheus-k8s-db (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from prometheus-k8s-token-s8psx (ro)
  prometheus-config-reloader:
    Image:      quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:083e5d6e7379d4043488a6093009cd2e4f19f7bdaf1a197b3daf45e0603f68ce
    Port:       <none>
    Host Port:  <none>
    Command:
      /bin/prometheus-config-reloader
    Args:
      --log-format=logfmt
      --reload-url=http://localhost:9090/-/reload
      --config-file=/etc/prometheus/config/prometheus.yaml.gz
      --config-envsubst-file=/etc/prometheus/config_out/prometheus.env.yaml
    Limits:
      cpu:     100m
      memory:  25Mi
    Requests:
      cpu:     100m
      memory:  25Mi
    Environment:
      POD_NAME:  prometheus-k8s-1 (v1:metadata.name)
    Mounts:
      /etc/prometheus/config from config (rw)
      /etc/prometheus/config_out from config-out (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from prometheus-k8s-token-s8psx (ro)
  rules-configmap-reloader:
    Image:      quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:46276473e250109b4521803aedd54946cc40498b99fcbedce34a1931f0fb6ebf
    Port:       <none>
    Host Port:  <none>
    Args:
      --webhook-url=http://localhost:9090/-/reload
      --volume-dir=/etc/prometheus/rules/prometheus-k8s-rulefiles-0
    Limits:
      cpu:     100m
      memory:  25Mi
    Requests:
      cpu:        100m
      memory:     25Mi
    Environment:  <none>
    Mounts:
      /etc/prometheus/rules/prometheus-k8s-rulefiles-0 from prometheus-k8s-rulefiles-0 (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from prometheus-k8s-token-s8psx (ro)
  thanos-sidecar:
    Image:       quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:e697e0ba41815ade9ab7cd05c275b9f6730761355435ace189b9119f89d81238
    Ports:       10902/TCP, 10901/TCP
    Host Ports:  0/TCP, 0/TCP
    Args:
      sidecar
      --prometheus.url=http://localhost:9090/
      --tsdb.path=/prometheus
      --grpc-address=[$(POD_IP)]:10901
      --http-address=127.0.0.1:10902
      --grpc-server-tls-cert=/etc/tls/grpc/server.crt
      --grpc-server-tls-key=/etc/tls/grpc/server.key
      --grpc-server-tls-client-ca=/etc/tls/grpc/ca.crt
    Requests:
      cpu:     50m
      memory:  100Mi
    Environment:
      POD_IP:   (v1:status.podIP)
    Mounts:
      /etc/tls/grpc from secret-grpc-tls (rw)
      /prometheus from prometheus-k8s-db (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from prometheus-k8s-token-s8psx (ro)
  prometheus-proxy:
    Image:      quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:443087dca8e7ca16a06771c4e711145b56ddb1eacf0fdf3f0193bf9d7dc30a37
    Port:       9091/TCP
    Host Port:  0/TCP
    Args:
      -provider=openshift
      -https-address=:9091
      -http-address=
      -email-domain=*
      -upstream=http://localhost:9090
      -htpasswd-file=/etc/proxy/htpasswd/auth
      -openshift-service-account=prometheus-k8s
      -openshift-sar={"resource": "namespaces", "verb": "get"}
      -openshift-delegate-urls={"/": {"resource": "namespaces", "verb": "get"}}
      -tls-cert=/etc/tls/private/tls.crt
      -tls-key=/etc/tls/private/tls.key
      -client-secret-file=/var/run/secrets/kubernetes.io/serviceaccount/token
      -cookie-secret-file=/etc/proxy/secrets/session_secret
      -openshift-ca=/etc/pki/tls/cert.pem
      -openshift-ca=/var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      -skip-auth-regex=^/metrics
    Requests:
      cpu:     10m
      memory:  20Mi
    Environment:
      HTTP_PROXY:   
      HTTPS_PROXY:  
      NO_PROXY:     
    Mounts:
      /etc/pki/ca-trust/extracted/pem/ from prometheus-trusted-ca-bundle (ro)
      /etc/proxy/htpasswd from secret-prometheus-k8s-htpasswd (rw)
      /etc/proxy/secrets from secret-prometheus-k8s-proxy (rw)
      /etc/tls/private from secret-prometheus-k8s-tls (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from prometheus-k8s-token-s8psx (ro)
  kube-rbac-proxy:
    Image:      quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:edbacf69d159af2598f9c5f626493c35a29d65f7f2d2c212f168001a719d8c31
    Port:       9092/TCP
    Host Port:  0/TCP
    Args:
      --secure-listen-address=0.0.0.0:9092
      --upstream=http://127.0.0.1:9095
      --config-file=/etc/kube-rbac-proxy/config.yaml
      --tls-cert-file=/etc/tls/private/tls.crt
      --tls-private-key-file=/etc/tls/private/tls.key
      --tls-cipher-suites=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_RSA_WITH_AES_128_CBC_SHA256,TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA256,TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256
      --logtostderr=true
      --v=10
    Requests:
      cpu:        10m
      memory:     20Mi
    Environment:  <none>
    Mounts:
      /etc/kube-rbac-proxy from secret-kube-rbac-proxy (rw)
      /etc/tls/private from secret-prometheus-k8s-tls (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from prometheus-k8s-token-s8psx (ro)
  prom-label-proxy:
    Image:      quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:431d605858b2b0e03ddad79854dded6176ae93a89c53b7a92f599b797cb01634
    Port:       <none>
    Host Port:  <none>
    Args:
      --insecure-listen-address=127.0.0.1:9095
      --upstream=http://127.0.0.1:9090
      --label=namespace
    Requests:
      cpu:        10m
      memory:     20Mi
    Environment:  <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from prometheus-k8s-token-s8psx (ro)
Conditions:
  Type           Status
  PodScheduled   False 
Volumes:
  config:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  prometheus-k8s
    Optional:    false
  tls-assets:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  prometheus-k8s-tls-assets
    Optional:    false
  config-out:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:     
    SizeLimit:  <unset>
  prometheus-k8s-rulefiles-0:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      prometheus-k8s-rulefiles-0
    Optional:  false
  secret-kube-etcd-client-certs:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  kube-etcd-client-certs
    Optional:    false
  secret-prometheus-k8s-tls:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  prometheus-k8s-tls
    Optional:    false
  secret-prometheus-k8s-proxy:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  prometheus-k8s-proxy
    Optional:    false
  secret-prometheus-k8s-htpasswd:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  prometheus-k8s-htpasswd
    Optional:    false
  secret-kube-rbac-proxy:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  kube-rbac-proxy
    Optional:    false
  configmap-serving-certs-ca-bundle:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      serving-certs-ca-bundle
    Optional:  false
  configmap-kubelet-serving-ca-bundle:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      kubelet-serving-ca-bundle
    Optional:  false
  prometheus-k8s-db:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:     
    SizeLimit:  <unset>
  secret-grpc-tls:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  prometheus-k8s-grpc-tls-8r79ud0kfmcr2
    Optional:    false
  prometheus-trusted-ca-bundle:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      prometheus-trusted-ca-bundle-39man1pbaa8jq
    Optional:  true
  prometheus-k8s-token-s8psx:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  prometheus-k8s-token-s8psx
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  kubernetes.io/os=linux
Tolerations:     node.kubernetes.io/memory-pressure:NoSchedule
                 node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason            Age        From               Message
  ----     ------            ----       ----               -------
  Warning  FailedScheduling  <unknown>  default-scheduler  0/5 nodes are available: 2 Insufficient cpu, 3 node(s) had taints that the pod didn't tolerate.
  (the same FailedScheduling warning is repeated many more times, verbatim, while the pod remains unschedulable)
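
To confirm the "Insufficient cpu" condition, the per-node CPU allocation can be checked with something along these lines (the node name is a placeholder; `oc adm top nodes` needs the metrics API, which prometheus-adapter normally provides):

# oc describe node <worker-node> | grep -A 10 "Allocated resources:"
# oc adm top nodes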

Comment 8 Sam Yangsao 2020-03-26 16:44:19 UTC
One workaround that I validated (thanks to Engineering) was to add an additional worker node to the cluster while the upgrade sits waiting for this to be fixed.

The cluster was originally configured with the default of:

3 master nodes (control plane) with 4 vCPU & 16 GB memory each
2 worker nodes with 2 vCPU & 8 GB memory each

This minimum configuration [1] will fail during an upgrade from 4.3.1 because the resource requirements have changed, and a backport [2] has been created to reduce the CPU requirement.
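
If the cluster's workers are managed by the machine API, one way to add a worker is to scale up a machineset; this is only a sketch (the machineset name is cluster-specific, and UPI vSphere installs instead add nodes manually per the install docs):

# oc -n openshift-machine-api get machinesets
# oc -n openshift-machine-api scale machineset <machineset-name> --replicas=3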

Here is the relevant status section from the problematic pod's YAML (`prometheus-k8s-1.yaml`):

<snip>

status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2020-03-24T18:19:32Z"
    message: '0/5 nodes are available: 2 Insufficient cpu, 3 node(s) had taints that
      the pod didn''t tolerate.'
    reason: Unschedulable
    status: "False"
    type: PodScheduled
  phase: Pending
  qosClass: Burstable

</snip>

Another way to view this error message is to describe the pod:

# oc -n openshift-monitoring describe pod/prometheus-k8s-1
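
The same scheduling condition can also be pulled out directly with a jsonpath query (an equivalent of the describe output above):

# oc -n openshift-monitoring get pod prometheus-k8s-1 -o jsonpath='{.status.conditions[?(@.type=="PodScheduled")].message}'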

NOTE:  Each prometheus-k8s pod requires 480m CPU across all of its containers, so the default minimums [2] will need to be adjusted.
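
For reference, the per-container CPU requests that sum to 480m (200m + 100m + 100m + 50m + 10m + 10m + 10m in the describe output above) can be listed with a query like:

# oc -n openshift-monitoring get pod prometheus-k8s-1 -o jsonpath='{range .spec.containers[*]}{.name}{"\t"}{.resources.requests.cpu}{"\n"}{end}'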

Once I added an additional worker with more vCPU and memory, the prometheus pods started, and the upgrade to 4.3.8 continued and completed.

[1] https://docs.openshift.com/container-platform/4.3/installing/installing_vsphere/installing-vsphere.html#minimum-resource-requirements_installing-vsphere
[2] https://bugzilla.redhat.com/show_bug.cgi?id=1812719