Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 2029144

Summary: Unable to get metrics for resource cpu events reported after HPA creation
Product: OpenShift Container Platform
Component: Node
Node sub component: Autoscaler (HPA, VPA)
Version: 4.9
Status: CLOSED DEFERRED
Severity: low
Priority: medium
Reporter: Danny <dzaken>
Assignee: Joel Smith <joelsmith>
QA Contact: Weinan Liu <weinliu>
CC: aos-bugs, joelsmith, nagrawal
Hardware: Unspecified
OS: Unspecified
Type: Bug
Last Closed: 2023-03-09 01:09:38 UTC
Bug Blocks: 1885524

Description Danny 2021-12-05 11:14:52 UTC
Description of problem:

After OCS installation, horizontalpodautoscaler/noobaa-endpoint emits multiple
Warning events complaining that OpenShift is "unable to get metrics for
resource cpu". The stream of such events stops about 15 minutes after OCS
installation.

The NooBaa endpoint deployment is controlled by a horizontal pod autoscaler, which is the originator of these events.

This is the cause of https://bugzilla.redhat.com/show_bug.cgi?id=1885524

Version-Release number of selected component (if applicable):

- OCP 4.9.0-0.nightly-2021-11-24-090558
- OCS 4.9.0-249.ci


Steps to Reproduce
==================

1. Install OCP/OCS cluster
2. Login to OCP Console and open Overview Cluster dashboard
   (Home -> Overview -> Cluster)
3. See "Recent events" list

Alternatively, go to the Events page or list events with the command line client:
`oc get events -n openshift-storage`.

Actual results
==============

After OCS installation, I see warnings related to HPA noobaa-endpoint such as:

```
15m         Warning   FailedGetResourceMetric        horizontalpodautoscaler/noobaa-endpoint                                           unable to get metrics for resource cpu: no metrics returned from resource metrics API
15m         Warning   FailedComputeMetricsReplicas   horizontalpodautoscaler/noobaa-endpoint                                           invalid metrics (1 invalid out of 1), first error is: failed to get cpu utilization: unable to get metrics for resource cpu: no metrics returned from resource metrics API
12m         Warning   FailedGetResourceMetric        horizontalpodautoscaler/noobaa-endpoint                                           did not receive metrics for any ready pods
12m         Warning   FailedComputeMetricsReplicas   horizontalpodautoscaler/noobaa-endpoint                                           invalid metrics (1 invalid out of 1), first error is: failed to get cpu utilization: did not receive metrics for any ready pods
```
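To pick these warnings out of a longer event list, a small filter over the text output of `oc get events` can help. The following is a minimal sketch, not part of the original report: the function names `hpa_warnings` and `count_by_reason` are hypothetical, and it assumes the default column layout shown above (LAST SEEN, TYPE, REASON, OBJECT, MESSAGE).

```python
from collections import Counter


def hpa_warnings(events_text, hpa_name="noobaa-endpoint"):
    """Return (reason, message) pairs for Warning events on the given HPA.

    Assumes the default `oc get events` layout:
    LAST SEEN, TYPE, REASON, OBJECT, MESSAGE.
    """
    target = f"horizontalpodautoscaler/{hpa_name}"
    found = []
    for line in events_text.splitlines():
        # Split into at most 5 fields so the free-text message stays intact.
        fields = line.split(None, 4)
        if len(fields) < 5:
            continue  # blank, truncated, or wrapped line
        _last_seen, kind, reason, obj, message = fields
        if kind == "Warning" and obj == target:
            found.append((reason, message))
    return found


def count_by_reason(events_text, hpa_name="noobaa-endpoint"):
    """Count matching Warning events per reason."""
    return Counter(reason for reason, _ in hpa_warnings(events_text, hpa_name))
```

Feeding it the captured output of `oc get events -n openshift-storage` gives a per-reason count, which makes it easier to see whether the stream of `FailedGetResourceMetric` and `FailedComputeMetricsReplicas` events is still growing.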

Expected results
================

The admin should not have to wait another 15 minutes after OCS Storage Cluster
installation for these events to stop.

There should be no such Warning events from
horizontalpodautoscaler/noobaa-endpoint right after OCS installation.

Additional info
===============

About 15 minutes after OCS installation, the horizontalpodautoscaler
noobaa-endpoint seems to work fine (I don't claim it works as expected, only
that it's not in an error state):

```

$ ./oc describe horizontalpodautoscaler/noobaa-endpoint  -n openshift-storage 
Name:                                                  noobaa-endpoint
Namespace:                                             openshift-storage
Labels:                                                app=noobaa
Annotations:                                           <none>
CreationTimestamp:                                     Mon, 05 Oct 2020 19:58:22 +0200
Reference:                                             Deployment/noobaa-endpoint
Metrics:                                               ( current / target )
  resource cpu on pods  (as a percentage of request):  0% (2m) / 80%
Min replicas:                                          1
Max replicas:                                          2
Deployment pods:                                       1 current / 1 desired
Conditions:
  Type            Status  Reason               Message
  ----            ------  ------               -------
  AbleToScale     True    ScaleDownStabilized  recent recommendations were higher than current one, applying the highest recent recommendation
  ScalingActive   True    ValidMetricFound     the HPA was able to successfully calculate a replica count from cpu resource utilization (percentage of request)
  ScalingLimited  False   DesiredWithinRange   the desired count is within the acceptable range
Events:
  Type     Reason                        Age                 From                       Message
  ----     ------                        ----                ----                       -------
  Warning  FailedGetResourceMetric       19m (x2 over 20m)   horizontal-pod-autoscaler  unable to get metrics for resource cpu: no metrics returned from resource metrics API
  Warning  FailedComputeMetricsReplicas  19m (x2 over 20m)   horizontal-pod-autoscaler  invalid metrics (1 invalid out of 1), first error is: failed to get cpu utilization: unable to get metrics for resource cpu: no metrics returned from resource metrics API
  Warning  FailedComputeMetricsReplicas  17m (x10 over 19m)  horizontal-pod-autoscaler  invalid metrics (1 invalid out of 1), first error is: failed to get cpu utilization: did not receive metrics for any ready pods
  Warning  FailedGetResourceMetric       17m (x11 over 19m)  horizontal-pod-autoscaler  did not receive metrics for any ready pods
```
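For reading the `0% (2m) / 80%` line together with `1 current / 1 desired`, the replica formula from the upstream Kubernetes HPA documentation can be sketched as below. This is a simplified illustration, not the controller's actual code: the function name `desired_replicas` is hypothetical, the 0.1 tolerance is the documented upstream default, and stabilization windows and the handling of pods with missing metrics are left out.

```python
import math


def desired_replicas(current_replicas, current_util, target_util,
                     min_replicas, max_replicas, tolerance=0.1):
    """Simplified HPA calculation:
    desired = ceil(current_replicas * current_util / target_util),
    skipped when the ratio is within the tolerance of 1.0,
    then clamped to [min_replicas, max_replicas].
    """
    ratio = current_util / target_util
    if abs(ratio - 1.0) <= tolerance:
        desired = current_replicas  # close enough to target, no change
    else:
        desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))


# Values from the describe output above: 0% current, 80% target,
# 1 replica, min 1, max 2.
print(desired_replicas(1, 0, 80, min_replicas=1, max_replicas=2))  # 1
```

With 0% utilization the raw result is 0, which is clamped up to the minimum of 1 replica, consistent with the `1 desired` shown by `oc describe`. The calculation needs a current utilization value as input, which is what the earlier `FailedGetResourceMetric` warnings report as unavailable.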


Comment 5 Shiftzilla 2023-03-09 01:09:38 UTC
OpenShift has moved to Jira for its defect tracking! This bug can now be found in the OCPBUGS project in Jira.

https://issues.redhat.com/browse/OCPBUGS-9040

Comment 6 Red Hat Bugzilla 2023-09-18 04:29:02 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days