Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 2029144

Summary:	Unable to get metrics for resource cpu events reported after HPA creation
Product:	OpenShift Container Platform	Reporter:	Danny <dzaken>
Component:	Node	Assignee:	Joel Smith <joelsmith>
Node sub component:	Autoscaler (HPA, VPA)	QA Contact:	Weinan Liu <weinliu>
Status:	CLOSED DEFERRED	Docs Contact:
Severity:	low
Priority:	medium	CC:	aos-bugs, joelsmith, nagrawal
Version:	4.9
Target Milestone:	---
Target Release:	---
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:		Doc Type:	If docs needed, set a value
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2023-03-09 01:09:38 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks:	1885524

Description Danny 2021-12-05 11:14:52 UTC

Description of problem:

After OCS installation, horizontalpodautoscaler/noobaa-endpoint emits multiple
Warning events complaining that OpenShift is
"unable to get metrics for resource cpu". The stream of such events stops about
15 minutes after OCS installation.

The NooBaa endpoint deployment is controlled by a horizontal pod autoscaler, which is the originator of these events.

This is the cause of https://bugzilla.redhat.com/show_bug.cgi?id=1885524

Version-Release number of selected component (if applicable):

- OCP 4.9.0-0.nightly-2021-11-24-090558
- OCS 4.9.0-249.ci


How reproducible:


Steps to Reproduce
==================

1. Install an OCP/OCS cluster
2. Log in to the OCP Console and open the Overview Cluster dashboard
   (Home -> Overview -> Cluster)
3. Check the "Recent events" list

Alternatively, open the Events page or list events with the command-line client:
`oc get events -n openshift-storage`.
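
To narrow that listing to the HPA warnings only, a field selector can be used. The following is a minimal sketch, not a verified procedure: the function name `hpa_warning_events` is hypothetical, and it assumes the `oc` client is on the PATH and logged in to the cluster.

```shell
#!/bin/sh
# hpa_warning_events NAMESPACE HPA_NAME
# Hypothetical helper: lists only Warning events whose involved object is the
# named HorizontalPodAutoscaler. Assumes `oc` is installed and logged in.
hpa_warning_events() {
    namespace="$1"
    hpa_name="$2"
    oc get events -n "$namespace" \
        --field-selector "type=Warning,involvedObject.kind=HorizontalPodAutoscaler,involvedObject.name=$hpa_name"
}

# Example: hpa_warning_events openshift-storage noobaa-endpoint
```

Filtering on the server side keeps unrelated storage events out of the output, which makes it easier to see when the warning stream stops.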

Actual results
==============

After OCS installation, I see warnings related to HPA noobaa-endpoint such as:

```
15m         Warning   FailedGetResourceMetric        horizontalpodautoscaler/noobaa-endpoint                                           unable to get metrics for resource cpu: no metrics returned from resource metrics API
15m         Warning   FailedComputeMetricsReplicas   horizontalpodautoscaler/noobaa-endpoint                                           invalid metrics (1 invalid out of 1), first error is: failed to get cpu utilization: unable to get metrics for resource cpu: no metrics returned from resource metrics API
12m         Warning   FailedGetResourceMetric        horizontalpodautoscaler/noobaa-endpoint                                           did not receive metrics for any ready pods
12m         Warning   FailedComputeMetricsReplicas   horizontalpodautoscaler/noobaa-endpoint                                           invalid metrics (1 invalid out of 1), first error is: failed to get cpu utilization: did not receive metrics for any ready pods
```
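
Both messages point at the resource metrics API, so one way to check whether it is returning pod metrics yet is to query it directly. This is a minimal sketch under stated assumptions: the function name `pod_metrics_raw` is hypothetical, a logged-in `oc` client is assumed, and the path is the standard `metrics.k8s.io/v1beta1` pod metrics endpoint rather than anything confirmed in this report.

```shell
#!/bin/sh
# pod_metrics_raw NAMESPACE
# Hypothetical helper: fetches the raw pod metrics list for a namespace from
# the resource metrics API. Assumes `oc` is installed and logged in.
pod_metrics_raw() {
    namespace="$1"
    oc get --raw "/apis/metrics.k8s.io/v1beta1/namespaces/$namespace/pods"
}

# Example: pod_metrics_raw openshift-storage
```

A response with no entry for the noobaa-endpoint pod would be consistent with the "no metrics returned" and "did not receive metrics for any ready pods" messages above.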

Expected results
================

The admin should not have to wait another 15 minutes after OCS Storage Cluster
installation for these events to stop.

There should be no such Warning events from
horizontalpodautoscaler/noobaa-endpoint right after OCS installation.

Additional info
===============

About 15 minutes after OCS installation, the horizontalpodautoscaler
noobaa-endpoint seems to work fine (I don't claim it works as expected, only
that it's not in an error state):

```

$ ./oc describe horizontalpodautoscaler/noobaa-endpoint -n openshift-storage
Name:                                                  noobaa-endpoint
Namespace:                                             openshift-storage
Labels:                                                app=noobaa
Annotations:                                           <none>
CreationTimestamp:                                     Mon, 05 Oct 2020 19:58:22 +0200
Reference:                                             Deployment/noobaa-endpoint
Metrics:                                               ( current / target )
  resource cpu on pods  (as a percentage of request):  0% (2m) / 80%
Min replicas:                                          1
Max replicas:                                          2
Deployment pods:                                       1 current / 1 desired
Conditions:
  Type            Status  Reason               Message
  ----            ------  ------               -------
  AbleToScale     True    ScaleDownStabilized  recent recommendations were higher than current one, applying the highest recent recommendation
  ScalingActive   True    ValidMetricFound     the HPA was able to successfully calculate a replica count from cpu resource utilization (percentage of request)
  ScalingLimited  False   DesiredWithinRange   the desired count is within the acceptable range
Events:
  Type     Reason                        Age                 From                       Message
  ----     ------                        ----                ----                       -------
  Warning  FailedGetResourceMetric       19m (x2 over 20m)   horizontal-pod-autoscaler  unable to get metrics for resource cpu: no metrics returned from resource metrics API
  Warning  FailedComputeMetricsReplicas  19m (x2 over 20m)   horizontal-pod-autoscaler  invalid metrics (1 invalid out of 1), first error is: failed to get cpu utilization: unable to get metrics for resource cpu: no metrics returned from resource metrics API
  Warning  FailedComputeMetricsReplicas  17m (x10 over 19m)  horizontal-pod-autoscaler  invalid metrics (1 invalid out of 1), first error is: failed to get cpu utilization: did not receive metrics for any ready pods
  Warning  FailedGetResourceMetric       17m (x11 over 19m)  horizontal-pod-autoscaler  did not receive metrics for any ready pods
```

Comment 5 Shiftzilla 2023-03-09 01:09:38 UTC

OpenShift has moved to Jira for its defect tracking! This bug can now be found in the OCPBUGS project in Jira.

https://issues.redhat.com/browse/OCPBUGS-9040

Comment 6 Red Hat Bugzilla 2023-09-18 04:29:02 UTC

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days