Description of problem:

This bug could be a duplicate of bug[1]. Creating this because the issue seems to persist even after upgrading the cluster to 4.5.4, while the errata for bug[1] says it was fixed in OpenShift 4.5.1.

[1] - https://bugzilla.redhat.com/show_bug.cgi?id=1749468

~~~
$ oc get clusterversion -oyaml
...
  - lastTransitionTime: "2020-04-15T19:32:18Z"
    message: Done applying 4.5.4
    status: "True"
    type: Available

$ oc describe hpa mongo-ss
Name:                                                  mongo-ss
Namespace:                                             default
...
Reference:                                             StatefulSet/mongo-ss
Metrics:                                               ( current / target )
  resource cpu on pods  (as a percentage of request):  <unknown> / 12%
Min replicas:                                          1
Max replicas:                                          10
StatefulSet pods:                                      1 current / 0 desired
Conditions:
  Type           Status  Reason                   Message
  ----           ------  ------                   -------
  AbleToScale    True    SucceededGetScale        the HPA controller was able to get the target's current scale
  ScalingActive  False   FailedGetResourceMetric  the HPA was unable to compute the replica count: did not receive metrics for any ready pods
Events:
  Type     Reason                        Age                    From                       Message
  ----     ------                        ----                   ----                       -------
  Warning  FailedGetResourceMetric       4m39s (x3 over 5m9s)   horizontal-pod-autoscaler  unable to get metrics for resource cpu: no metrics returned from resource metrics API
  Warning  FailedComputeMetricsReplicas  4m39s (x3 over 5m9s)   horizontal-pod-autoscaler  invalid metrics (1 invalid out of 1), first error is: failed to get cpu utilization: unable to get metrics for resource cpu: no metrics returned from resource metrics API
  Warning  FailedComputeMetricsReplicas  2m24s (x9 over 4m24s)  horizontal-pod-autoscaler  invalid metrics (1 invalid out of 1), first error is: failed to get cpu utilization: did not receive metrics for any ready pods
  Warning  FailedGetResourceMetric       9s (x18 over 4m24s)    horizontal-pod-autoscaler  did not receive metrics for any ready pods
~~~

Version-Release number of selected component (if applicable):
OpenShift 4.5.4

How reproducible:
Always

Steps to Reproduce:
The reproduction steps in bug[1] were followed; a sketch is given below.

Actual results:
The HPA does not show a proper status (ScalingActive is False with FailedGetResourceMetric).

Expected results:
The HPA should be able to handle init containers.

Additional info:
Refer to the comments section of this bug.
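A minimal sketch of the reproduction, under the assumption that it follows bug 1749468 (the name `mongo-ss`, the replica bounds, and the 12% CPU target are taken from the HPA output above; any pod names are placeholders):

~~~
# Create a CPU-based HPA against a StatefulSet whose pod spec includes an
# init container; the bounds mirror the HPA shown above.
$ oc autoscale statefulset mongo-ss --min=1 --max=10 --cpu-percent=12

# On an affected cluster, the HPA then reports <unknown> current utilization
# and the FailedGetResourceMetric warnings quoted above.
$ oc describe hpa mongo-ss
~~~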
*** Bug 1749468 has been marked as a duplicate of this bug. ***
Hello! I was reviewing the bugs linked to this Bugzilla, and I was able to find bugs for the 4.6 and 4.4 target releases, but not for 4.5. Do you know if one already exists? Regards, Oscar
Failed Test

~~~
$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.7.0-0.nightly-2020-11-11-033756   True        False         3h7m    Cluster version is 4.7.0-0.nightly-2020-11-11-033756
~~~

Got the same output as above.
Thanks, @Joel, I thought the warnings should also get cleared. As per comments #33 and #30, the issue got fixed on:

~~~
$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.7.0-0.nightly-2020-11-11-033756   True        False         3h7m    Cluster version is 4.7.0-0.nightly-2020-11-11-033756
~~~
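As a quick re-check on the fixed build (a sketch; the HPA name is the one from the original report), the conditions should recover even though old warning events linger until they age out:

~~~
# After the fix, ScalingActive should report True once fresh metrics arrive;
# stale FailedGetResourceMetric events expire rather than being cleared.
$ oc describe hpa mongo-ss | grep -A 4 'Conditions:'
~~~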
Zero CPU usage for the init container is the fix we added. It makes it so that the HPA will not consider the metrics invalid. If the metrics report a container with a memory metric but no CPU metric, then the HPA will think that something is wrong with the metrics and it won't scale. That's what caused this bug. So the metrics either have to omit the init container completely, or include it with zero values for both CPU and memory. We decided that the cleanest fix was to include it with the zero values.

If you see an init container metric like this:

~~~
{
    "name": "empty-init",
    "usage": {
        "cpu": "0",
        "memory": "0"
    }
}
~~~

that is good and expected. Because the init container finishes running before the main container starts, we would expect its CPU usage to stay at zero for the rest of the pod's lifetime.

If you see an init container metric like this:

~~~
{
    "name": "empty-init",
    "usage": {
        "memory": "0"
    }
}
~~~

then the HPA will fail due to the missing CPU metric.
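One way to inspect which shape a cluster reports is to query the resource metrics API directly (a sketch; substitute a real pod name, and note that the init container entry only appears if metrics-server includes it):

~~~
# Dump per-container usage for a pod as reported by the metrics API; look
# for the init container entry and whether its "usage" has a "cpu" key.
$ oc get --raw "/apis/metrics.k8s.io/v1beta1/namespaces/default/pods/<pod-name>" \
    | python -m json.tool
~~~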
@oarribas, do we have any other items to check when verifying this issue?
https://github.com/openshift/cucushift/pull/8246 qe_test_coverage+
Hi, I'm on OCP 4.5.19 and I'm facing the same issue. Has this been definitively resolved in OCP 4.6, or is there any workaround for OCP 4.5.19, please? Thanks, kind regards
Joel, do you have an answer to Amer's question in comment #51?
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2020:5633