Bug 1495733

Summary: Wrong units of net_usage_rate_average in containers metrics
Product: Red Hat CloudForms Management Engine
Reporter: Yaacov Zamir <yzamir>
Component: C&U Capacity and Utilization
Assignee: Yaacov Zamir <yzamir>
Status: CLOSED CURRENTRELEASE
QA Contact: Tony Khamis <tkhamis>
Severity: high
Docs Contact:
Priority: high
Version: unspecified
CC: agrare, fsimonce, gshefer, jhardy, obarenbo, simaishi, yzamir
Target Milestone: GA
Keywords: TestOnly, ZStream
Target Release: 5.9.0
Hardware: Unspecified
OS: Unspecified
Whiteboard: container
Fixed In Version: 5.9.0.1
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1511142
Environment:
Last Closed: 2018-03-06 15:39:12 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: Container Management
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1511142
Attachments:
Caps (Flags: none)

Description Yaacov Zamir 2017-09-26 11:47:30 UTC
Description of problem:

We collect the average network usage, in KB per second, from the external metrics database.
When we insert the data into the internal metrics database, we declare it to be a percentage and in average datagrams per second.
 
Version-Release number of selected component (if applicable):


How reproducible:
All metrics collected from Hawkular or Prometheus have this issue.

Steps to Reproduce:
1. Collect container metrics from Hawkular or Prometheus.

Actual results:
value * 100 (because we treat it as a percentage), with datagram units.

Expected results:
actual value, in KB units
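As a rough illustration of the expected behavior, here is a minimal Python sketch (not the ManageIQ code, which is Ruby). It assumes the external metrics source reports received (rx) and transmitted (tx) network rates in bytes per second, that net_usage_rate_average should hold their sum in KB per second, and that 1 KB = 1024 bytes. The helper name `net_usage_rate_kbps` is hypothetical.

```python
# Assumption: 1 KB = 1024 bytes (the convention Ruby's 1.kilobyte uses).
KILOBYTE = 1024


def net_usage_rate_kbps(rx_bytes_per_sec: float, tx_bytes_per_sec: float) -> float:
    """Hypothetical helper: combined network rate (rx + tx) in KB/s.

    The result is a plain rate, not a percentage, so it should be stored
    as-is, without any percent-style scaling.
    """
    if rx_bytes_per_sec < 0 or tx_bytes_per_sec < 0:
        raise ValueError("network rates must be non-negative")
    return (rx_bytes_per_sec + tx_bytes_per_sec) / KILOBYTE


# Example using the OpenShift figures quoted in Comment 7 (rx=0.7, tx=0.55 KiB/s),
# converted back to bytes per second first.
print(net_usage_rate_kbps(0.7 * KILOBYTE, 0.55 * KILOBYTE))  # 1.25
```

The point of the sketch is that the unit conversion is a single division; whatever value comes out should be stored with a KB/s unit rather than reinterpreted as a percentage.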

Additional info:

Comment 2 Dave Johnson 2017-09-26 12:04:09 UTC
Please assess the impact of this issue and update the severity accordingly.  Please refer to https://bugzilla.redhat.com/page.cgi?id=fields.html#bug_severity for a reminder on each severity's definition.

If it's something like a tracker bug where it doesn't matter, please set it to Low/Low.

Comment 3 Yaacov Zamir 2017-09-26 12:38:10 UTC
submitted upstream:
https://github.com/ManageIQ/manageiq-providers-kubernetes/pull/118

Comment 4 Yaacov Zamir 2017-09-26 12:49:55 UTC
Edit:

After talking to Adam Grare and Ladislav Smola, it turns out the bug is not *100 but *2.

So:

Actual results:
value * 2 (because we treat it as 0.5 of the value), with datagram units.

Comment 5 Gilad Shefer 2017-09-26 14:07:40 UTC
I'm going to check whether this bug affects the MetricRollups table.
Note: if it affects the rollup table (which means it also affects chargeback), we should raise the priority/severity.

Comment 6 Yaacov Zamir 2017-09-28 08:49:32 UTC
merged upstream:
https://github.com/ManageIQ/manageiq-providers-kubernetes/pull/118

Comment 7 Gilad Shefer 2017-10-02 12:02:34 UTC
After investigating, it seems we do indeed have an issue.
Conclusions:

1. In OpenShift, network consumption is rx=0.7, tx=0.55 (KiBps) (screenshot s1.png).
2. In CFME, network consumption (rx and tx combined) is 3.818 KBps (screenshot s2.png).

Note that metric rollups are also affected by this, e.g. net_usage_rate_average=3.81813151041667
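To make the comparison above repeatable, one could check the stored net_usage_rate_average against rx + tx read from OpenShift. A minimal Python sketch, assuming net_usage_rate_average is meant to be the sum of rx and tx and that both figures are in the same unit (the comment labels them KiBps and KBps respectively). The function name `check_net_usage` and the 5% tolerance are hypothetical choices.

```python
def check_net_usage(stored: float, rx: float, tx: float,
                    rel_tol: float = 0.05) -> tuple[bool, float]:
    """Hypothetical check: does the stored rate match rx + tx?

    Returns (matches, ratio), where ratio = stored / (rx + tx).
    A ratio far from 1.0 indicates a scaling or unit error.
    """
    expected = rx + tx
    if expected <= 0:
        raise ValueError("rx + tx must be positive to compare")
    ratio = stored / expected
    return abs(ratio - 1.0) <= rel_tol, ratio


# Figures from Comment 7: CFME rollup value vs. OpenShift rx/tx.
ok, ratio = check_net_usage(3.81813151041667, rx=0.7, tx=0.55)
print(ok)  # False
```

The tolerance allows for the two screenshots not being taken over exactly the same interval; a correctly stored value should give a ratio close to 1.0.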

So it looks like a real issue.

Comment 8 Gilad Shefer 2017-10-02 12:03:15 UTC
Created attachment 1333209
Caps

Comment 9 Federico Simoncelli 2017-10-13 13:23:04 UTC
IIUC, what we report for net_usage_rate_average is wrong (by a factor of 2).
If so, this BZ is high severity/priority.

Yaacov, is 5.8 affected as well?
(To be on the safe side, and not to risk missing this, I'll mark it for 5.8.)

Comment 10 Yaacov Zamir 2017-10-15 08:09:54 UTC
> Yaacov, is 5.8 affected as well?
> (To be on the safe side, and not to risk missing this, I'll mark it for 5.8.)

YES, the offending code exists in 5.8 [1].

[1] Introduced in #5334 (Nov 6, 2015):
https://github.com/ManageIQ/manageiq/pull/5334/files