oc get clusterversion --context build01
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.3.0-0.nightly-2020-03-04-222846   True        False         7d5h    Cluster version is 4.3.0-0.nightly-2020-03-04-222846

Created attachment 1669779: alert.KubeCPUOvercommit

Alertmanager is set up on a CI build farm cluster (OCP 4.3) to send notifications to Slack. Recently alerts have been firing, and I am not sure how to debug or fix them.

[FIRING:1] KubeCPUOvercommit (openshift-monitoring/k8s warning)
Cluster has overcommitted CPU resource requests for Pods and cannot tolerate node failure.

https://coreos.slack.com/archives/CV1UZU53R/p1584031232061900

Are we supposed to silence this alert when the autoscaler is working properly (since the autoscaler should add more nodes when the cluster is short of CPUs)? If not, what should the debugging procedure be?

Another (not closely related) issue: there are 2 alerts with the same name, "KubeCPUOvercommit". See the attached screenshot. Is that intended?
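As a debugging aid, here is a minimal sketch of the arithmetic the alert message describes. It assumes the rule matches the upstream kubernetes-mixin expression of that era, which fires when total CPU requests divided by total allocatable CPU exceeds (n - 1) / n for n nodes. That expression is not confirmed in this report, and the function name and inputs are hypothetical.

```python
from typing import Sequence


def cpu_overcommitted(requested_cores: float,
                      allocatable_per_node: Sequence[float]) -> bool:
    """Return True if CPU requests could not survive the loss of one node.

    Assumed condition (not confirmed by this report):
        requests / allocatable > (n - 1) / n
    where n is the number of nodes.
    """
    n = len(allocatable_per_node)
    if n == 0:
        # No nodes reported: nothing to compare against.
        return False
    total_allocatable = sum(allocatable_per_node)
    return requested_cores / total_allocatable > (n - 1) / n


# Hypothetical cluster: three nodes with 4 allocatable cores each.
print(cpu_overcommitted(9, [4, 4, 4]))  # 9/12 exceeds 2/3
print(cpu_overcommitted(7, [4, 4, 4]))  # 7/12 is below 2/3
```

Under this assumption the alert is driven by pod CPU requests rather than actual usage, so lowering requests or adding nodes would clear it.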
Related to, and split out from, https://bugzilla.redhat.com/show_bug.cgi?id=1813069
There is some work going on to improve the resource settings for monitoring components which may help with this. https://bugzilla.redhat.com/show_bug.cgi?id=1812719
*** This bug has been marked as a duplicate of bug 1812999 ***