Bug 1414485

Summary: CloudForms generating WARN messages from OpenShift metrics cpu average out of range
Product: Red Hat CloudForms Management Engine
Reporter: myoder
Component: C&U Capacity and Utilization
Assignee: Greg Blomquist <gblomqui>
Status: CLOSED WONTFIX
QA Contact: Einat Pacifici <epacific>
Severity: medium
Docs Contact:
Priority: unspecified
Version: 5.6.0
CC: agrare, epacific, fsimonce, jhardy, lavenel, lsmola, myoder, obarenbo, oourfali, yzamir
Target Milestone: GA
Target Release: cfme-future
Hardware: Unspecified
OS: Unspecified
Whiteboard: container
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2018-09-18 11:15:43 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: Bug
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: Container Management
Target Upstream Version:
Embargoed:

Description myoder 2017-01-18 16:04:20 UTC
Description of problem: 

CloudForms is generating a large number of WARN messages in the logs based on metrics it receives from OpenShift. The reported cpu usage rate average ranges from the hundreds to the thousands of percent. The WARN message below contains the largest value I saw in the logs.

[----] W, [2017-01-17T04:50:53.711909 #28092:3d3998]  WARN -- : MIQ(ManageIQ::Providers::Kubernetes::ContainerManager::ContainerGroup#perf_process) [realtime] ManageIQ::Providers::Kubernetes::ContainerManager::ContainerGroup name: [logstash-5-ipi6a], id: [123000000001006] Timestamp: [2017-01-17T09:37:20Z], Column [cpu_usage_rate_average]: 'percent value 22624.395684399165 is out of range, resetting to 100.0'
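
For reference, the "resetting to 100.0" in the message looks like a sanity check that clamps impossible percent values instead of storing them. A minimal Ruby sketch of that kind of check (a hypothetical simplification; the constant and method names are illustrative, not the exact ManageIQ implementation):

PERCENT_RANGE = (0.0..100.0)

def normalize_percent(column, value)
  return value if PERCENT_RANGE.cover?(value)

  # Each out-of-range sample is logged and clamped rather than dropped,
  # which is why every bad metric sample emits one WARN line.
  warn "Column [#{column}]: 'percent value #{value} is out of range, resetting to 100.0'"
  100.0
end

normalize_percent("cpu_usage_rate_average", 22624.395684399165) # => 100.0

Under that reading, the WARN volume tracks the number of bad samples one-to-one, so the log noise is a symptom of the metrics, not of the clamp itself.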


Version-Release number of selected component (if applicable):
CFME 5.6.1.2

How reproducible:
CFME seems to generate these messages consistently.

Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:  

Currently I am only seeing this WARN message for 5 Container Groups and 1 Container. The customer environment has 11 nodes. I am working on verifying whether these Container Groups and Containers are on the same node.

Over the course of about 4 hours, I'm seeing roughly 2000 log lines generated by this message.

Comment 3 Dave Johnson 2017-07-14 03:47:14 UTC
Please assess the importance of this issue and update the priority accordingly; it was missed somewhere in the bug triage process. Please refer to https://bugzilla.redhat.com/page.cgi?id=fields.html#priority for a reminder of each priority's definition.

If it's something like a tracker bug where priority doesn't matter, please set it to Low/Low.

Comment 7 Yaacov Zamir 2017-12-18 07:18:26 UTC
Hi,

BZ: https://bugzilla.redhat.com/show_bug.cgi?id=1524626

That bug sounds similar but is not a duplicate. Do you have a way to check whether the patches that fix bug 1524626 also solve this issue?

The patches are:
https://github.com/ManageIQ/manageiq-providers-kubernetes/pull/187 - scrape every 60s.
https://github.com/ManageIQ/manageiq-providers-kubernetes/pull/159 - reflector scraping.
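
For context on why the scrape-interval patch could matter here: if a percent rate is derived from accumulated CPU time but divided over a shorter interval than the samples actually span, the result can exceed 100% by a large factor. A minimal Ruby sketch of that failure mode, with hypothetical numbers and a made-up cpu_percent helper (illustrative only, not the actual manageiq-providers-kubernetes collection code):

CORES = 4

def cpu_percent(cpu_seconds_used, interval_seconds)
  # Average utilization across all cores over the interval.
  cpu_seconds_used / (interval_seconds * CORES) * 100.0
end

cpu_seconds_used = 480.0 # CPU time consumed between two samples

# Samples were really 600s apart: a sane 20% average.
puts cpu_percent(cpu_seconds_used, 600) # => 20.0

# Same delta wrongly divided over an assumed 20s window: 600%,
# which would trip the out-of-range check and be reset to 100.0.
puts cpu_percent(cpu_seconds_used, 20)  # => 600.0

If the collector was in fact assuming a fixed window while samples arrived at irregular or longer spacing, pinning the scrape to every 60s (PR #187) would plausibly eliminate these inflated values; checking whether the WARN lines stop after applying those patches would confirm it.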