Bug 1867477
Summary: | HPA monitoring cpu utilization fails for deployments which have init containers | ||
---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | Arnab Ghosh <arghosh> |
Component: | Node | Assignee: | Joel Smith <joelsmith> |
Node sub component: | Autoscaler (HPA, VPA) | QA Contact: | Weinan Liu <weinliu> |
Status: | CLOSED ERRATA | Docs Contact: | |
Severity: | high | ||
Priority: | high | CC: | adeshpan, akaris, amer.ezahir, andbartl, aos-bugs, christopher.obrien, ddelcian, fshaikh, jean.froment, jeder, jlee, joelsmith, john.macleod, jokerman, jseunghw, kperrier, ksathe, mfiedler, nmaynard, oarribas, ocasalsa, openshift-bugs-escalate, pbergene, pkanthal, rpalathi, sgarciam, skrenger, tkonishi, tmckay, tsweeney, vjaypurk, weinliu, xingli |
Version: | 4.5 | Keywords: | ServiceDeliveryImpact |
Target Milestone: | --- | ||
Target Release: | 4.7.0 | ||
Hardware: | x86_64 | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | Bug Fix | |
Doc Text: |
Cause: HPA ignores pods with incomplete metrics, such as those the Prometheus adapter sends for pods with init containers.
Consequence: Any pod with an init container would not be scaled.
Fix: The Prometheus adapter now sends complete metrics for init containers.
Result: HPA can scale pods with init containers.
|
Story Points: | --- |
Clone Of: | Environment: | ||
Last Closed: | 2021-02-24 15:15:27 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | |||
Bug Blocks: | 1895532 |
Description
Arnab Ghosh
2020-08-10 06:38:51 UTC
*** Bug 1749468 has been marked as a duplicate of this bug. ***

Hello! I was reviewing the bugs linked to this Bugzilla and was able to find ones for the 4.6 and 4.4 target releases, but not for 4.5. Do you know whether one already exists?
Regards,
Oscar

Failed Test
$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.7.0-0.nightly-2020-11-11-033756   True        False         3h7m    Cluster version is 4.7.0-0.nightly-2020-11-11-033756
Got the same output as above.

Thanks, @Joel. I thought the warnings should also get cleared.

As per comments #33 and #30, the issue is fixed on:
$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.7.0-0.nightly-2020-11-11-033756   True        False         3h7m    Cluster version is 4.7.0-0.nightly-2020-11-11-033756

Zero CPU usage for the init container is the fix we added. It ensures that HPA will not consider the metrics invalid. If the metrics report a container with a memory metric but no CPU metric, HPA will conclude that something is wrong with the metrics and will not scale. That is what caused this bug. So the metrics must either omit the init container completely, or include it with zero values for both CPU and memory. We decided that the cleanest fix was to include it with the zero values.

If you see an init container metric like this:
{ "name": "empty-init", "usage": { "cpu": "0", "memory": "0" } },
that is good, and expected. Because the init container finishes running before the main container starts, we would expect its CPU usage to stay at zero for the rest of the pod's lifetime.

If you see an init container metric like this:
{ "name": "empty-init", "usage": { "memory": "0" } },
then HPA will fail due to the missing CPU metrics.

@oarribas, do we have any other items to check in verifying this issue?
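The rule described in the comment above (a container reported with a memory metric but no CPU metric makes the pod's metrics unusable for HPA) can be sketched roughly as follows. This is only an illustration, not the actual HPA or Prometheus adapter code: the function names `containers_missing_cpu` and `pod_metrics_usable` are hypothetical, the `containers` wrapper around the per-container entries is an assumed shape, and the `app` container with its values is made up for the example. Only the `empty-init` entries come from this thread.

```python
def containers_missing_cpu(pod_metrics):
    """Return the names of containers whose usage has no "cpu" entry.

    Hypothetical helper: `pod_metrics` is assumed to be a dict with a
    "containers" list, each item shaped like the JSON quoted in the thread.
    """
    return [
        c["name"]
        for c in pod_metrics.get("containers", [])
        if "cpu" not in c.get("usage", {})
    ]


def pod_metrics_usable(pod_metrics):
    """A pod's metrics count as usable only if every container reports CPU."""
    return not containers_missing_cpu(pod_metrics)


# After the fix: the init container is reported with zero CPU and memory.
fixed = {"containers": [
    {"name": "empty-init", "usage": {"cpu": "0", "memory": "0"}},
    {"name": "app", "usage": {"cpu": "12m", "memory": "20Mi"}},  # made-up values
]}

# Before the fix: the init container has a memory metric but no CPU metric.
broken = {"containers": [
    {"name": "empty-init", "usage": {"memory": "0"}},
    {"name": "app", "usage": {"cpu": "12m", "memory": "20Mi"}},  # made-up values
]}

print(pod_metrics_usable(fixed))       # True
print(containers_missing_cpu(broken))  # ['empty-init']
```

A check like this, run against the pod metrics you retrieve from the cluster, would point at the container whose CPU metric is missing.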
https://github.com/openshift/cucushift/pull/8246
qe_test_coverage+

Hi, I'm on OCP 4.5.19 and I'm facing the same issue. Has this been definitively resolved in OCP 4.6, or is there a workaround for OCP 4.5.19?
Thanks,
Kind regards

Joel, do you have an answer to Amer's question in comment #51?

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHSA-2020:5633