Bug 1495733 - Wrong units of net_usage_rate_average in containers metrics
Summary: Wrong units of net_usage_rate_average in containers metrics
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat CloudForms Management Engine
Classification: Red Hat
Component: C&U Capacity and Utilization
Version: unspecified
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: GA
Target Release: 5.9.0
Assignee: Yaacov Zamir
QA Contact: Tony Khamis
URL:
Whiteboard: container
Depends On:
Blocks: 1511142
Reported: 2017-09-26 11:47 UTC by Yaacov Zamir
Modified: 2018-06-17 11:01 UTC (History)
CC List: 7 users

Fixed In Version: 5.9.0.1
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 1511142
Environment:
Last Closed: 2018-03-06 15:39:12 UTC
Category: ---
Cloudforms Team: Container Management
Target Upstream Version:
Embargoed:


Attachments
Caps (51.50 KB, application/x-tar), 2017-10-02 12:03 UTC, Gilad Shefer

Description Yaacov Zamir 2017-09-26 11:47:30 UTC
Description of problem:

We collect the average kb usage per second from the external metrics database.
When we insert the data into the internal metrics database, we declare it to be a percentage and an average number of datagrams per second.
 
Version-Release number of selected component (if applicable):


How reproducible:
All metrics collected from Hawkular or Prometheus have this issue.

Steps to Reproduce:
1. Collect container metrics from Hawkular or Prometheus.

Actual results:
value * 100 (because we treat it as a percentage), labeled with datagram units.

Expected results:
The actual value, in kb units.

Additional info:

Comment 2 Dave Johnson 2017-09-26 12:04:09 UTC
Please assess the impact of this issue and update the severity accordingly.  Please refer to https://bugzilla.redhat.com/page.cgi?id=fields.html#bug_severity for a reminder on each severity's definition.

If it's something like a tracker bug where it doesn't matter, please set it to Low/Low.

Comment 3 Yaacov Zamir 2017-09-26 12:38:10 UTC
submitted upstream:
https://github.com/ManageIQ/manageiq-providers-kubernetes/pull/118

Comment 4 Yaacov Zamir 2017-09-26 12:49:55 UTC
Edit:

After talking to Adam Grare and Ladislav Smola: the error is a factor of 2, not 100.

So:

Actual results:
value * 2 (because we treat the collected value as 0.5 of the real value), labeled with datagram units.
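
For illustration only, a minimal Python sketch of the discrepancy described above. The function names and the sample value are hypothetical and are not taken from the ManageIQ code base; only the factor of 2 comes from this comment.

```python
# Hypothetical illustration of the reported behaviour; not ManageIQ code.
ERRONEOUS_FACTOR = 2  # the factor stated in this comment


def stored_value_with_bug(collected_rate: float) -> float:
    """Value written to the internal metrics database while the bug is present."""
    return collected_rate * ERRONEOUS_FACTOR


def stored_value_expected(collected_rate: float) -> float:
    """Expected behaviour: store the collected rate unchanged, in its own units."""
    return collected_rate


sample_rate = 1.5  # made-up rate collected from the external metrics source
print("with bug:", stored_value_with_bug(sample_rate))
print("expected:", stored_value_expected(sample_rate))
```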

Comment 5 Gilad Shefer 2017-09-26 14:07:40 UTC
I'm going to check whether this bug affects the MetricRollups table.
Note: if it affects the rollup table (which means it also affects chargeback), we should raise the priority/severity.

Comment 6 Yaacov Zamir 2017-09-28 08:49:32 UTC
merged upstream:
https://github.com/ManageIQ/manageiq-providers-kubernetes/pull/118

Comment 7 Gilad Shefer 2017-10-02 12:02:34 UTC
After investigating, it seems we do indeed have an issue.
Conclusions:

1. In OpenShift, net consumption is rx=0.7, tx=0.55 (KiBps) (screenshot: s1.png).
2. In CFME, net consumption (both rx and tx) is 3.818 KBps (screenshot: s2.png).

Note that metric rollups are also affected by this, i.e. net_usage_rate_average=3.81813151041667
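
A quick way to compare the two readings, as a rough Python sketch. It assumes the CFME figure covers rx and tx together and that both screenshots cover a comparable interval; neither assumption is confirmed in this report, and the variable names are illustrative.

```python
# Figures quoted in this comment; variable names are illustrative.
openshift_rx_kibps = 0.7
openshift_tx_kibps = 0.55
cfme_net_usage = 3.81813151041667  # net_usage_rate_average from the rollup

# Assumption: the CFME figure is the sum of rx and tx.
openshift_total = openshift_rx_kibps + openshift_tx_kibps
ratio = cfme_net_usage / openshift_total

print(f"OpenShift rx+tx: {openshift_total:.2f} KiBps")
print(f"CFME / OpenShift ratio: {ratio:.2f}")
```

The samples were not necessarily taken at the same moment, so the ratio is only a rough indication that the CFME value is inflated.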

Comment 8 Gilad Shefer 2017-10-02 12:03:15 UTC
Created attachment 1333209
Caps

Comment 9 Federico Simoncelli 2017-10-13 13:23:04 UTC
IIUC, what we report for net_usage_rate_average is wrong by a factor of 2.
If so, the BZ is high severity/priority.

Yaacov, is 5.8 affected as well?
(To be on the safe side and not risk missing this, I'll mark it for 5.8.)

Comment 10 Yaacov Zamir 2017-10-15 08:09:54 UTC
> Yaacov is 5.8 affected as well?
> (To be on the safe side and not to risk to miss this I'll mark for 5.8)

Yes, the offending code exists in 5.8 [1].

[1]
Introduced in #5334, Nov 6, 2015
https://github.com/ManageIQ/manageiq/pull/5334/files

