Bug 1414485

Summary: CloudForms generating WARN messages from OpenShift metrics cpu average out of range
Product: Red Hat CloudForms Management Engine
Reporter: myoder
Component: C&U Capacity and Utilization
Assignee: Greg Blomquist <gblomqui>
Status: CLOSED WONTFIX
QA Contact: Einat Pacifici <epacific>
Severity: medium
Docs Contact:
Priority: unspecified
Version: 5.6.0
CC: agrare, epacific, fsimonce, jhardy, lavenel, lsmola, myoder, obarenbo, oourfali, yzamir
Target Milestone: GA
Target Release: cfme-future
Hardware: Unspecified
OS: Unspecified
Whiteboard: container
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2018-09-18 11:15:43 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: Bug
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: Container Management
Target Upstream Version:
Embargoed:

Description myoder 2017-01-18 16:04:20 UTC
Description of problem: 

CloudForms is generating a large number of WARN messages in the logs based on metrics it receives from OpenShift. The reported cpu usage rate average ranges from the hundreds to the thousands of percent. The WARN message below contains the largest value I saw in the logs.

[----] W, [2017-01-17T04:50:53.711909 #28092:3d3998]  WARN -- : MIQ(ManageIQ::Providers::Kubernetes::ContainerManager::ContainerGroup#perf_process) [realtime] ManageIQ::Providers::Kubernetes::ContainerManager::ContainerGroup name: [logstash-5-ipi6a], id: [123000000001006] Timestamp: [2017-01-17T09:37:20Z], Column [cpu_usage_rate_average]: 'percent value 22624.395684399165 is out of range, resetting to 100.0'
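
For reference, the "resetting to 100.0" in the message looks like a sanity check that clamps impossible percent values instead of storing them. A minimal Ruby sketch of that kind of check (a hypothetical simplification; the constant and method names are illustrative, not the exact ManageIQ implementation):

PERCENT_RANGE = (0.0..100.0)

def normalize_percent(column, value)
  return value if PERCENT_RANGE.cover?(value)

  # Each out-of-range sample is logged and clamped rather than dropped,
  # which is why every bad metric sample emits one WARN line.
  warn "Column [#{column}]: 'percent value #{value} is out of range, resetting to 100.0'"
  100.0
end

normalize_percent("cpu_usage_rate_average", 22624.395684399165) # => 100.0

Under that reading, the WARN volume tracks the number of bad samples one-to-one, so the log noise is a symptom of the metrics, not of the clamp itself.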


Version-Release number of selected component (if applicable):
CFME 5.6.1.2

How reproducible:
CFME seems to generate these messages consistently.

Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:  

Currently I am only seeing this WARN message for 5 Container Groups and 1 Container. The customer environment has 11 nodes. I am working on verifying whether these Container Groups and Containers are on the same node.

Over the course of about 4 hours, I'm seeing roughly 2000 log lines generated by this message.

Comment 3 Dave Johnson 2017-07-14 03:47:14 UTC
Please assess the importance of this issue and update the priority accordingly; it was missed somewhere in the bug triage process. Please refer to https://bugzilla.redhat.com/page.cgi?id=fields.html#priority for a reminder of each priority's definition.

If it's something like a tracker bug where priority doesn't matter, please set it to Low/Low.

Comment 7 Yaacov Zamir 2017-12-18 07:18:26 UTC
Hi,

BZ: https://bugzilla.redhat.com/show_bug.cgi?id=1524626

That bug sounds similar but is not a duplicate. Do you have a way to check whether the patches that fix bug 1524626 also solve this issue?

The patches are:
https://github.com/ManageIQ/manageiq-providers-kubernetes/pull/187 - scrape every 60s.
https://github.com/ManageIQ/manageiq-providers-kubernetes/pull/159 - reflector scraping.
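
For context on why the scrape-interval patch could matter here: if a percent rate is derived from accumulated CPU time but divided over a shorter interval than the samples actually span, the result can exceed 100% by a large factor. A minimal Ruby sketch of that failure mode, with hypothetical numbers and a made-up cpu_percent helper (illustrative only, not the actual manageiq-providers-kubernetes collection code):

CORES = 4

def cpu_percent(cpu_seconds_used, interval_seconds)
  # Average utilization across all cores over the interval.
  cpu_seconds_used / (interval_seconds * CORES) * 100.0
end

cpu_seconds_used = 480.0 # CPU time consumed between two samples

# Samples were really 600s apart: a sane 20% average.
puts cpu_percent(cpu_seconds_used, 600) # => 20.0

# Same delta wrongly divided over an assumed 20s window: 600%,
# which would trip the out-of-range check and be reset to 100.0.
puts cpu_percent(cpu_seconds_used, 20)  # => 600.0

If the collector was in fact assuming a fixed window while samples arrived at irregular or longer spacing, pinning the scrape to every 60s (PR #187) would plausibly eliminate these inflated values; checking whether the WARN lines stop after applying those patches would confirm it.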