
Bug 1802332

Summary: CI: occasional: 'failed to initialize the cluster: Cluster operator monitoring is still updating' with insufficient compute CPU
Product: OpenShift Container Platform
Component: Node
Version: 4.4
Reporter: W. Trevor King <wking>
Assignee: Ryan Phillips <rphillips>
QA Contact: Sunil Choudhary <schoudha>
Docs Contact:
CC: aos-bugs, eparis, jokerman
Status: CLOSED DUPLICATE
Severity: unspecified
Priority: unspecified
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2020-02-24 20:01:25 UTC
Type: Bug
Attachments:
  Pod CPU request differences between 4.3 and 4.4 tips (attachment 1662829)

Description W. Trevor King 2020-02-12 22:44:48 UTC
In a handful of 4.4 CI runs today, for example [1], we see failures like:

  level=info msg="Waiting up to 30m0s for the cluster at https://api.ci-op-wj40jzv7-52597.ci.azure.devcluster.openshift.com:6443 to initialize..."
  level=error msg="Cluster operator etcd Degraded is True with TargetConfigController_SynchronizationError: TargetConfigControllerDegraded: \"configmap/kube-apiserver-pod\": node/ci-op-wj40jzv7-52597-7qblc-master-2 missing InternalDNS"
  level=info msg="Cluster operator insights Disabled is False with : "
  level=info msg="Cluster operator monitoring Available is False with : "
  level=info msg="Cluster operator monitoring Progressing is True with RollOutInProgress: Rolling out the stack."
  level=error msg="Cluster operator monitoring Degraded is True with UpdatingPrometheusK8SFailed: Failed to rollout the stack. Error: running task Updating Prometheus-k8s failed: waiting for Prometheus object changes failed: waiting for Prometheus: expected 2 replicas, updated 0 and available 0"
  level=fatal msg="failed to initialize the cluster: Cluster operator monitoring is still updating"

because of a compute CPU shortage [2]:

  0/5 nodes are available: 2 Insufficient cpu, 3 node(s) had taints that the pod didn't tolerate.

This CI run had two compute nodes [3], and both had 2 CPU capacity and 1232m allocatable [4,5].  From [2], the total CPU request for this pod was 200m (prometheus) + 100m (prometheus-config-reloader) + 100m (rules-configmap-reloader) + 50m (thanos-sidecar) + 10m (prometheus-proxy) + 10m (kube-rbac-proxy) + 10m (prom-label-proxy) = 480m, which is less than the 1232m allocatable.  I'm not sure where to look in the logs to see whether the scheduler had anything more specific to say about why it decided the prometheus pod wouldn't fit.

[1]: https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/pr-logs/pull/openshift_cluster-etcd-operator/125/pull-ci-openshift-cluster-etcd-operator-master-e2e-azure/420
[2]: https://storage.googleapis.com/origin-ci-test/pr-logs/pull/openshift_cluster-etcd-operator/125/pull-ci-openshift-cluster-etcd-operator-master-e2e-azure/420/artifacts/e2e-azure/must-gather/registry-svc-ci-openshift-org-ci-op-wj40jzv7-stable-sha256-a273f5ac7f1ad8f7ffab45205ac36c8dff92d9107ef3ae429eeb135fa8057b8b/namespaces/openshift-monitoring/pods/prometheus-k8s-1/prometheus-k8s-1.yaml
[3]: https://gcsweb-ci.svc.ci.openshift.org/gcs/origin-ci-test/pr-logs/pull/openshift_cluster-etcd-operator/125/pull-ci-openshift-cluster-etcd-operator-master-e2e-azure/420/artifacts/e2e-azure/must-gather/registry-svc-ci-openshift-org-ci-op-wj40jzv7-stable-sha256-a273f5ac7f1ad8f7ffab45205ac36c8dff92d9107ef3ae429eeb135fa8057b8b/cluster-scoped-resources/core/nodes/
[4]: https://storage.googleapis.com/origin-ci-test/pr-logs/pull/openshift_cluster-etcd-operator/125/pull-ci-openshift-cluster-etcd-operator-master-e2e-azure/420/artifacts/e2e-azure/must-gather/registry-svc-ci-openshift-org-ci-op-wj40jzv7-stable-sha256-a273f5ac7f1ad8f7ffab45205ac36c8dff92d9107ef3ae429eeb135fa8057b8b/cluster-scoped-resources/core/nodes/ci-op-wj40jzv7-52597-7qblc-worker-eastus22-nqdqk.yaml
[5]: https://storage.googleapis.com/origin-ci-test/pr-logs/pull/openshift_cluster-etcd-operator/125/pull-ci-openshift-cluster-etcd-operator-master-e2e-azure/420/artifacts/e2e-azure/must-gather/registry-svc-ci-openshift-org-ci-op-wj40jzv7-stable-sha256-a273f5ac7f1ad8f7ffab45205ac36c8dff92d9107ef3ae429eeb135fa8057b8b/cluster-scoped-resources/core/nodes/ci-op-wj40jzv7-52597-7qblc-worker-eastus23-2qhxq.yaml
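
As a sanity check, that 480m figure can be recomputed straight from the job's pods.json artifact.  A minimal jq sketch, assuming every CPU request on the pod is expressed in millicores ("NNNm"):

  $ curl -s https://storage.googleapis.com/origin-ci-test/pr-logs/pull/openshift_cluster-etcd-operator/125/pull-ci-openshift-cluster-etcd-operator-master-e2e-azure/420/artifacts/e2e-azure/pods.json |
      jq '.items[]
        | select(.metadata.namespace == "openshift-monitoring" and .metadata.name == "prometheus-k8s-1")
        # containers without a CPU request contribute 0; strip the "m" suffix and total the rest
        | [.spec.containers[].resources.requests.cpu // "0m" | rtrimstr("m") | tonumber] | add'

Given the per-container requests above, that should print 480.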

Comment 1 W. Trevor King 2020-02-12 22:47:21 UTC
Created attachment 1662829 [details]
Pod CPU request differences between 4.3 and 4.4 tips

In case it turns out to be an over-request issue, I'm attaching the changes in pod CPU requests between recent 4.3 and 4.4 release images.  Generated with:

  $ diff -U0 \
      <(curl -s https://storage.googleapis.com/origin-ci-test/logs/release-openshift-ocp-installer-e2e-aws-4.3/818/artifacts/e2e-aws/pods.json |
          jq -r '.items[] | .metadata.namespace as $namespace | .spec.containers[] | $namespace + " " + .name + " " + .resources.requests.cpu' |
          sort | uniq |
          awk 'BEGIN{a = ""}{if ($1 != a) {print ""}; a = $1; print $0}') \
      <(curl -s https://storage.googleapis.com/origin-ci-test/logs/release-openshift-ocp-installer-e2e-aws-4.4/1177/artifacts/e2e-aws/pods.json |
          jq -r '.items[] | .metadata.namespace as $namespace | .spec.containers[] | $namespace + " " + .name + " " + .resources.requests.cpu' |
          sort | uniq |
          awk 'BEGIN{a = ""}{if ($1 != a) {print ""}; a = $1; print $0}')

Comment 2 W. Trevor King 2020-02-12 22:57:06 UTC
Consumption on one of the compute nodes in the CI job I picked in comment 0:

$ curl -s https://storage.googleapis.com/origin-ci-test/pr-logs/pull/openshift_cluster-etcd-operator/125/pull-ci-openshift-cluster-etcd-operator-master-e2e-azure/420/artifacts/e2e-azure/pods.json |
    jq -r '.items[]
      | select(.spec.nodeName == "ci-op-wj40jzv7-52597-7qblc-worker-eastus22-nqdqk")
      | .metadata.namespace as $namespace
      | .metadata.name as $name
      | .status.phase as $phase
      | .spec.containers[]
      | $phase + " " + $namespace + " " + $name + " " + .name + " " + .resources.requests.cpu'
Running openshift-cluster-node-tuning-operator tuned-2pzdm tuned 10m
Running openshift-csi-snapshot-controller-operator csi-snapshot-controller-operator-5555c45855-hhfts operator 10m
Running openshift-dns dns-default-57kkz dns 100m
Running openshift-dns dns-default-57kkz dns-node-resolver 10m
Running openshift-image-registry image-registry-698d4b86f8-62pvp registry 100m
Running openshift-image-registry node-ca-qd2pl node-ca 10m
Running openshift-ingress router-default-5479579b9f-hwk9g router 100m
Running openshift-kube-storage-version-migrator migrator-78dcfcb958-x44ff migrator 100m
Running openshift-machine-config-operator machine-config-daemon-j4mm6 machine-config-daemon 20m
Running openshift-machine-config-operator machine-config-daemon-j4mm6 oauth-proxy 20m
Running openshift-marketplace certified-operators-7b56bc5f66-7gfjp certified-operators 10m
Running openshift-marketplace community-operators-58f88b46bd-xfv8p community-operators 10m
Running openshift-marketplace redhat-marketplace-6f68bc576f-t7hbf redhat-marketplace 10m
Running openshift-marketplace redhat-operators-58dcd77c84-9nnwx redhat-operators 10m
Running openshift-monitoring alertmanager-main-2 alertmanager 
Running openshift-monitoring alertmanager-main-2 config-reloader 100m
Running openshift-monitoring alertmanager-main-2 alertmanager-proxy 10m
Running openshift-monitoring kube-state-metrics-b57d78654-7zcz4 kube-rbac-proxy-main 10m
Running openshift-monitoring kube-state-metrics-b57d78654-7zcz4 kube-rbac-proxy-self 10m
Running openshift-monitoring kube-state-metrics-b57d78654-7zcz4 kube-state-metrics 10m
Running openshift-monitoring node-exporter-f52v8 node-exporter 102m
Running openshift-monitoring node-exporter-f52v8 kube-rbac-proxy 10m
Running openshift-monitoring openshift-state-metrics-68747b76d9-6cvhh kube-rbac-proxy-main 10m
Running openshift-monitoring openshift-state-metrics-68747b76d9-6cvhh kube-rbac-proxy-self 10m
Running openshift-monitoring openshift-state-metrics-68747b76d9-6cvhh openshift-state-metrics 100m
Running openshift-monitoring prometheus-adapter-5c6d59c87d-2d9rn prometheus-adapter 10m
Running openshift-monitoring prometheus-adapter-5c6d59c87d-lz7r5 prometheus-adapter 10m
Running openshift-monitoring telemeter-client-7f759bd4b-qgk4p telemeter-client 
Running openshift-monitoring telemeter-client-7f759bd4b-qgk4p reload 
Running openshift-monitoring telemeter-client-7f759bd4b-qgk4p kube-rbac-proxy 10m
Running openshift-multus multus-9rw56 kube-multus 10m
Running openshift-sdn ovs-9psjp openvswitch 200m
Running openshift-sdn sdn-dw4v9 sdn 100m


So that's 1232m.  Ah, is the node status.allocatable.cpu really "how much we've already handed out" instead of "how much we have left"?  But even if it is and we've handed out 1232m of the full 2 CPU, we should have 768m left, which is plenty for the 480m prometheus pod that is failing to schedule.  I dunno; hopefully component team folks understand this better than me ;).
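
For posterity, that manual total can also be computed in one shot from the same pods.json; a rough jq sketch (same millicore assumption as before):

  $ curl -s https://storage.googleapis.com/origin-ci-test/pr-logs/pull/openshift_cluster-etcd-operator/125/pull-ci-openshift-cluster-etcd-operator-master-e2e-azure/420/artifacts/e2e-azure/pods.json |
      jq '[.items[]
            | select(.spec.nodeName == "ci-op-wj40jzv7-52597-7qblc-worker-eastus22-nqdqk")
            # containers with no CPU request (alertmanager, telemeter-client, reload) contribute 0
            | .spec.containers[].resources.requests.cpu // "0m"
            | rtrimstr("m") | tonumber] | add'

which should reproduce the 1232m figure.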

Comment 3 Eric Paris 2020-02-12 23:24:29 UTC
Allocatable (1232m) = Capacity (2 CPU = 2000m) - System Reserved (768m)
https://github.com/openshift/machine-config-operator/blob/master/templates/worker/01-worker-kubelet/_base/files/kubelet.yaml#L25

So we need to calculate how much we handed out by summing up the pod requests manually (that sum is included in `oc describe node` output, but it doesn't help here).
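
On a live cluster both inputs to that equation are visible directly on the node object; a sketch (assumes cluster access, which a must-gather alone doesn't give you):

  $ oc get node ci-op-wj40jzv7-52597-7qblc-worker-eastus22-nqdqk \
      -o jsonpath='capacity={.status.capacity.cpu} allocatable={.status.allocatable.cpu}{"\n"}'

For these workers that should report 2 CPU capacity and 1232m allocatable, matching [4,5] from comment 0.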

Comment 4 W. Trevor King 2020-02-12 23:26:12 UTC
> So we need to calculate how much we handed out by summing up the pod requests manually.

That's what I did in comment 2, and it shows 1232m requested by running pods.  So we've exactly filled the node's 1232m allocatable?  But MemoryPressure is still False [1]?

[1]: https://storage.googleapis.com/origin-ci-test/pr-logs/pull/openshift_cluster-etcd-operator/125/pull-ci-openshift-cluster-etcd-operator-master-e2e-azure/420/artifacts/e2e-azure/must-gather/registry-svc-ci-openshift-org-ci-op-wj40jzv7-stable-sha256-a273f5ac7f1ad8f7ffab45205ac36c8dff92d9107ef3ae429eeb135fa8057b8b/cluster-scoped-resources/core/nodes/ci-op-wj40jzv7-52597-7qblc-worker-eastus22-nqdqk.yaml
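
Worth noting: node conditions like MemoryPressure reflect actual resource pressure measured by the kubelet, not the scheduler's request accounting, so a node whose allocatable CPU is fully requested can still report every condition False.  On a live cluster the conditions can be dumped with something like:

  $ oc get node ci-op-wj40jzv7-52597-7qblc-worker-eastus22-nqdqk \
      -o jsonpath='{range .status.conditions[*]}{.type}={.status}{"\n"}{end}'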

Comment 5 Ryan Phillips 2020-02-24 20:01:25 UTC
CPU reservations were bumped back down. Marking as a dup of 1803239

*** This bug has been marked as a duplicate of bug 1803239 ***