Bug 1895532

Summary: HPA monitoring cpu utilization fails for deployments which have init containers
Product: OpenShift Container Platform
Reporter: OpenShift BugZilla Robot <openshift-bugzilla-robot>
Component: Node
Assignee: Joel Smith <joelsmith>
Node sub component: Autoscaler (HPA, VPA)
QA Contact: Weinan Liu <weinliu>
Status: CLOSED ERRATA
Docs Contact:
Severity: high
Priority: high
CC: akaris, andbartl, aos-bugs, christopher.obrien, ddelcian, fshaikh, jeder, jlee, joelsmith, jokerman, ksathe, mfiedler, nagrawal, nmaynard, oarribas, ocasalsa, openshift-bugs-escalate, pbergene, pkanthal, rpalathi, skrenger, tmckay, tsweeney, weinliu
Version: 4.5
Keywords: ServiceDeliveryImpact
Target Milestone: ---
Target Release: 4.6.z
Hardware: x86_64
OS: Linux
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2020-11-30 16:45:31 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Bug Depends On: 1867477
Bug Blocks: 1897313

Comment 5 Weinan Liu 2020-11-18 14:49:21 UTC
All looks good except for the warning (the same as on 4.7)

  Type     Reason                        Age                 From                       Message
  ----     ------                        ----                ----                       -------
  Warning  FailedComputeMetricsReplicas  18m (x12 over 20m)  horizontal-pod-autoscaler  invalid metrics (1 invalid out of 1), first error is: failed to get cpu utilization: did not receive metrics for any ready pods
  Warning  FailedGetResourceMetric       17m (x13 over 20m)  horizontal-pod-autoscaler  did not receive metrics for any ready pods

Since we provide "cpu": "0", why do we still have the warning?

Comment 6 Joel Smith 2020-11-18 15:11:04 UTC
The warning is harmless as long as the age is about the same as the age of the pods.  What happens is that if the pods are created after the HPA or at about the same time, the HPA emits the warning for the first few minutes until the pod metrics are available.  There is always a delay after creating a pod before its metrics become available.

If you create the pods, then wait 5 minutes, then create the HPA, I think you won't see these warnings. That's because the HPA will be able to get metrics from the very beginning.
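
The rule of thumb above (the warning is harmless when its age is about the same as the age of the pods) can be sketched as a small check. This is only an illustration, not part of the fix: the function names, the 5-minute slack (borrowed from the "wait 5 minutes" suggestion), and the pod age passed in are all assumptions, and the age format is assumed to be the short form shown in the event table in comment 5.

```python
import re

# Seconds per unit for short ages such as "18m" or "1h5m" (assumed format).
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def parse_age(text):
    """Convert a short age string such as '18m' or '1h5m' to seconds."""
    parts = re.findall(r"(\d+)([smhd])", text)
    if not parts or "".join(n + u for n, u in parts) != text:
        raise ValueError(f"unrecognized age: {text!r}")
    return sum(int(n) * _UNIT_SECONDS[u] for n, u in parts)


def parse_event_age(column):
    """Split an event Age column like '18m (x12 over 20m)'.

    Returns (last_seen_seconds, first_seen_seconds). A plain '18m'
    means the event was seen once, so both values are equal.
    """
    match = re.fullmatch(r"(\S+)(?: \(x\d+ over (\S+)\))?", column.strip())
    if match is None:
        raise ValueError(f"unrecognized event age: {column!r}")
    last_seen = parse_age(match.group(1))
    first_seen = parse_age(match.group(2)) if match.group(2) else last_seen
    return last_seen, first_seen


def looks_like_startup_noise(event_age_column, pod_age, slack="5m"):
    """Hypothetical helper: True if the warning began around pod creation.

    The slack value is an assumption, not a documented HPA threshold.
    """
    _, first_seen = parse_event_age(event_age_column)
    return abs(first_seen - parse_age(pod_age)) <= parse_age(slack)
```

For example, with the Age value from comment 5 and a hypothetical pod age of 20m, `looks_like_startup_noise("18m (x12 over 20m)", "20m")` returns True, while a warning that first appeared 10 minutes ago on pods that are 3 hours old would return False and deserves a closer look.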

Comment 7 Weinan Liu 2020-11-18 16:27:16 UTC
Verified as per comments #5 and #6

Comment 9 errata-xmlrpc 2020-11-30 16:45:31 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6.6 bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.