Description of problem:

This bug could be a duplicate of bug[1]. Creating this because the issue seems to persist even after upgrading the cluster to 4.5.4, while the errata for bug[1] says it was fixed in OpenShift 4.5.1.

[1] - https://bugzilla.redhat.com/show_bug.cgi?id=1749468

~~~
$ oc get clusterversion -oyaml
...
  - lastTransitionTime: "2020-04-15T19:32:18Z"
    message: Done applying 4.5.4
    status: "True"
    type: Available

$ oc describe hpa mongo-ss
Name:                                                  mongo-ss
Namespace:                                             default
...
Reference:                                             StatefulSet/mongo-ss
Metrics:                                               ( current / target )
  resource cpu on pods  (as a percentage of request):  <unknown> / 12%
Min replicas:                                          1
Max replicas:                                          10
StatefulSet pods:                                      1 current / 0 desired
Conditions:
  Type           Status  Reason                   Message
  ----           ------  ------                   -------
  AbleToScale    True    SucceededGetScale        the HPA controller was able to get the target's current scale
  ScalingActive  False   FailedGetResourceMetric  the HPA was unable to compute the replica count: did not receive metrics for any ready pods
Events:
  Type     Reason                        Age                    From                       Message
  ----     ------                        ----                   ----                       -------
  Warning  FailedGetResourceMetric       4m39s (x3 over 5m9s)   horizontal-pod-autoscaler  unable to get metrics for resource cpu: no metrics returned from resource metrics API
  Warning  FailedComputeMetricsReplicas  4m39s (x3 over 5m9s)   horizontal-pod-autoscaler  invalid metrics (1 invalid out of 1), first error is: failed to get cpu utilization: unable to get metrics for resource cpu: no metrics returned from resource metrics API
  Warning  FailedComputeMetricsReplicas  2m24s (x9 over 4m24s)  horizontal-pod-autoscaler  invalid metrics (1 invalid out of 1), first error is: failed to get cpu utilization: did not receive metrics for any ready pods
  Warning  FailedGetResourceMetric       9s (x18 over 4m24s)    horizontal-pod-autoscaler  did not receive metrics for any ready pods
~~~

Version-Release number of selected component (if applicable):
OpenShift 4.5.4

How reproducible:
Always

Steps to Reproduce:
The reproduction steps in bug[1] were followed; a sketch is given below.

Actual results:
The HPA does not show a proper status (ScalingActive is False with FailedGetResourceMetric).

Expected results:
The HPA should be able to handle init containers.

Additional info:
Refer to the comments section of this bug.
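A minimal sketch of the reproduction, under the assumption that it follows bug 1749468 (the name `mongo-ss`, the replica bounds, and the 12% CPU target are taken from the HPA output above; any pod names are placeholders):

~~~
# Create a CPU-based HPA against a StatefulSet whose pod spec includes an
# init container; the bounds mirror the HPA shown above.
$ oc autoscale statefulset mongo-ss --min=1 --max=10 --cpu-percent=12

# On an affected cluster, the HPA then reports <unknown> current utilization
# and the FailedGetResourceMetric warnings quoted above.
$ oc describe hpa mongo-ss
~~~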
*** Bug 1749468 has been marked as a duplicate of this bug. ***
Hello! I was reviewing the bugs linked to this Bugzilla, and I was able to find bugs for the 4.6 and 4.4 target releases, but not for 4.5. Do you know if one already exists? Regards, Oscar
Failed Test

~~~
$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.7.0-0.nightly-2020-11-11-033756   True        False         3h7m    Cluster version is 4.7.0-0.nightly-2020-11-11-033756
~~~

Got the same output as above.
Thanks, @Joel, I thought the warnings should also get cleared. As per comments #33 and #30, the issue got fixed on:

~~~
$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.7.0-0.nightly-2020-11-11-033756   True        False         3h7m    Cluster version is 4.7.0-0.nightly-2020-11-11-033756
~~~
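As a quick re-check on the fixed build (a sketch; the HPA name is the one from the original report), the conditions should recover even though old warning events linger until they age out:

~~~
# After the fix, ScalingActive should report True once fresh metrics arrive;
# stale FailedGetResourceMetric events expire rather than being cleared.
$ oc describe hpa mongo-ss | grep -A 4 'Conditions:'
~~~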
Zero CPU usage for the init container is the fix we added. It makes it so that the HPA will not consider the metrics invalid. If the metrics report a container with a memory metric but no CPU metric, then the HPA will think that something is wrong with the metrics and it won't scale. That's what caused this bug. So the metrics either have to omit the init container completely, or include it with zero values for both CPU and memory. We decided that the cleanest fix was to include it with the zero values.

If you see an init container metric like this:

~~~
{
    "name": "empty-init",
    "usage": {
        "cpu": "0",
        "memory": "0"
    }
}
~~~

that is good and expected. Because the init container finishes running before the main container starts, we would expect its CPU usage to stay at zero for the rest of the pod's lifetime.

If you see an init container metric like this:

~~~
{
    "name": "empty-init",
    "usage": {
        "memory": "0"
    }
}
~~~

then the HPA will fail due to the missing CPU metric.
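One way to inspect which shape a cluster reports is to query the resource metrics API directly (a sketch; substitute a real pod name, and note that the init container entry only appears if metrics-server includes it):

~~~
# Dump per-container usage for a pod as reported by the metrics API; look
# for the init container entry and whether its "usage" has a "cpu" key.
$ oc get --raw "/apis/metrics.k8s.io/v1beta1/namespaces/default/pods/<pod-name>" \
    | python -m json.tool
~~~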
@oarribas, do we have any other items to check when verifying this issue?
https://github.com/openshift/cucushift/pull/8246 qe_test_coverage+
Hi, I'm on OCP 4.5.19 and I'm facing the same issue. Has this been definitively resolved in OCP 4.6, or is there any workaround for OCP 4.5.19, please? Thanks, kind regards
Joel, do you have an answer to Amer's question in comment #51?
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2020:5633