Bug 1636453

Summary: unable to get metrics for resource cpu: no metrics returned from heapster
Product: OpenShift Container Platform Reporter: Paul Yates <pyates>
Component: Monitoring Assignee: Ruben Vargas Palma <rvargasp>
Status: CLOSED DUPLICATE QA Contact: Junqi Zhao <juzhao>
Severity: medium Docs Contact:
Priority: unspecified    
Version: 3.9.0 CC: aos-bugs, ddelcian, minden, rvargasp
Target Milestone: ---   
Target Release: 3.9.z   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2019-02-20 17:24:18 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:

Description Paul Yates 2018-10-05 12:42:10 UTC
Description of problem:

A number of customers are concerned with a HorizontalPodAutoscaler (HPA) issue.

Customers who use HPA are being flooded with Events (for example, one customer saw ~9000 events in the last 3 days) that report:

'unable to get metrics for resource cpu: no metrics returned from heapster'

From the customer's point of view, HPA is NOT functioning correctly, even though there is no issue with their HPA setup at all.

The Events are raised for pods that are newly created, terminating, or dead. Once a pod is fully functional, no further events are generated for it. However, because HPA scales pods up and down frequently, many Events and log messages are created, which customers find increasingly disruptive.

The events shown in the UI also do not indicate which pod the error message refers to, which makes it harder for customers to understand what is happening.


Version-Release number of selected component (if applicable):

oc v3.9.41
kubernetes v1.9.1+a0ce1bc657
features: Basic-Auth GSSAPI Kerberos SPNEGO

openshift v3.9.41
kubernetes v1.9.1+a0ce1bc657


How reproducible:
always

Steps to Reproduce:

1. Configure HPA: https://docs.openshift.com/container-platform/3.9/dev_guide/pod_autoscaling.html
2. Scale Pods using HPA
3. View events in the project and view Heapster logs.
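
To quantify the event volume seen in step 3, something like the following could be used. This is only a rough sketch: it assumes the project's events have been dumped as JSON (for example with `oc get events -o json`) and relies on the standard Event fields `message`, `count` and `involvedObject`. The function name is made up for illustration.

```python
import json
from collections import Counter

# Message text quoted in this report.
HPA_MESSAGE = "unable to get metrics for resource cpu: no metrics returned from heapster"


def count_hpa_metric_events(events_json: str) -> Counter:
    """Tally matching events per involved object, keyed as "Kind/name".

    Hypothetical helper: expects the JSON list produced by dumping events,
    i.e. an object with an "items" array of Event objects.
    """
    doc = json.loads(events_json)
    totals = Counter()
    for event in doc.get("items", []):
        if HPA_MESSAGE not in event.get("message", ""):
            continue  # unrelated event
        obj = event.get("involvedObject", {})
        key = f"{obj.get('kind', '?')}/{obj.get('name', '?')}"
        # "count" records how often an identical event was repeated;
        # treat a missing or zero count as a single occurrence.
        totals[key] += event.get("count") or 1
    return totals
```

Feeding the dumped JSON to `count_hpa_metric_events` gives a per-object total, which makes it easier to see how many of the events are repeats against the same object.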

Actual results:
There are many events and log entries for new, terminating, and failed pods:
unable to get metrics for resource cpu: no metrics returned from heapster

Expected results:
With HPA configured, metrics should be returned and events created for healthy pods; events should not be created repeatedly for pods that are newly created or terminating.
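
The expected behaviour could be expressed roughly as below. This is an illustrative sketch only, not the HPA controller's actual logic: it assumes pods are available as parsed JSON objects and uses the standard pod fields `metadata.deletionTimestamp`, `status.phase` and the `Ready` condition. Both function names are hypothetical.

```python
from typing import Any, Dict


def is_pod_expected_to_report_metrics(pod: Dict[str, Any]) -> bool:
    """Return True only for pods that are running, ready and not terminating."""
    metadata = pod.get("metadata", {})
    status = pod.get("status", {})
    if metadata.get("deletionTimestamp"):
        return False  # pod is terminating
    if status.get("phase") != "Running":
        return False  # Pending (newly created), Failed, Succeeded, Unknown
    # A running pod may still be starting up; require the Ready condition.
    return any(
        cond.get("type") == "Ready" and cond.get("status") == "True"
        for cond in status.get("conditions", [])
    )


def should_warn_about_missing_metrics(pod: Dict[str, Any], has_metrics: bool) -> bool:
    """Warn only when a healthy pod has no metrics."""
    return is_pod_expected_to_report_metrics(pod) and not has_metrics
```

Under this assumption, a missing sample for a pending or terminating pod would not produce an event, while a ready pod without metrics still would.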

Additional info: