1495733 – Wrong units of net_usage_rate_average in containers metrics

Summary: Wrong units of net_usage_rate_average in containers metrics

Keywords:
Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	Red Hat CloudForms Management Engine
Classification:	Red Hat
Component:	C&U Capacity and Utilization
Sub Component:
Version:	unspecified
Hardware:	Unspecified
OS:	Unspecified
Priority:	high
Severity:	high
Target Milestone:	GA
Target Release:	5.9.0
Assignee:	Yaacov Zamir
QA Contact:	Tony Khamis
Docs Contact:
URL:
Whiteboard:	container
Depends On:
Blocks:	1511142

Reported:	2017-09-26 11:47 UTC by Yaacov Zamir
Modified:	2018-06-17 11:01 UTC
CC List:	7 users
Fixed In Version:	5.9.0.1
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Clones:	1511142
Environment:
Last Closed:	2018-03-06 15:39:12 UTC
Category:	---
Cloudforms Team:	Container Management
Target Upstream Version:
Embargoed:

Attachments
Caps (51.50 KB, application/x-tar) 2017-10-02 12:03 UTC, Gilad Shefer	no flags

Description Yaacov Zamir 2017-09-26 11:47:30 UTC

Description of problem:

We collect the average kb usage per second from the external metrics database.
When we insert the data into the internal metrics database, we declare it to be a percentage, in average datagrams per second.
 
Version-Release number of selected component (if applicable):


How reproducible:
All metrics collected from Hawkular or Prometheus have this issue

Steps to Reproduce:
1. Collect container metrics from Hawkular or Prometheus.

Actual results:
value * 100 (because we think it's percent) with datagram units. 

Expected results:
actual value with kb units

Additional info:

Comment 2 Dave Johnson 2017-09-26 12:04:09 UTC

Please assess the impact of this issue and update the severity accordingly.  Please refer to https://bugzilla.redhat.com/page.cgi?id=fields.html#bug_severity for a reminder on each severity's definition.

If it's something like a tracker bug where it doesn't matter, please set it to Low/Low.

Comment 3 Yaacov Zamir 2017-09-26 12:38:10 UTC

submitted upstream:
https://github.com/ManageIQ/manageiq-providers-kubernetes/pull/118

Comment 4 Yaacov Zamir 2017-09-26 12:49:55 UTC

Edit:

After talking to Adam Grare and Ladislav Smola: the error factor is not *100 but *2.

So:

Actual results:
value * 2 (because we think it's 0.5 of value) with datagram units.
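The two readings of the bug (the original *100 and the corrected *2) can be modeled with a minimal sketch. It assumes the inflation comes from interpreting the stored number as a fraction of the real value. The function and variable names are hypothetical and are not taken from the ManageIQ code.

```python
import math

# Hypothetical names for illustration; this is not the ManageIQ code.
def reported_value(raw_rate: float, assumed_scale: float) -> float:
    """Return what a reader sees when the stored number is interpreted
    as `assumed_scale` of the real value (1.0 means "stored as-is")."""
    if assumed_scale <= 0:
        raise ValueError("assumed_scale must be positive")
    return raw_rate / assumed_scale

raw = 1.25  # example rate in kb per second, chosen arbitrarily

as_percent = reported_value(raw, 0.01)  # original report: value * 100
as_half = reported_value(raw, 0.5)      # corrected report: value * 2
as_is = reported_value(raw, 1.0)        # expected: actual value, kb units

print(as_percent, as_half, as_is)
```

With a scale of 1.0 the value passes through unchanged, which corresponds to the expected result of "actual value with kb units".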

Comment 5 Gilad Shefer 2017-09-26 14:07:40 UTC

I'm going to check whether this bug affects the MetricRollups table.
Note: if it affects the rollup table (which means it affects chargeback), we should raise the priority/severity.

Comment 6 Yaacov Zamir 2017-09-28 08:49:32 UTC

merged upstream:
https://github.com/ManageIQ/manageiq-providers-kubernetes/pull/118

Comment 7 Gilad Shefer 2017-10-02 12:02:34 UTC

After investigating the issue, it seems we do indeed have a problem.
Conclusions:

1. In OpenShift, network consumption is rx=0.7, tx=0.55 (KiBps) (screenshot: s1.png).
2. In CFME, network consumption (both rx and tx) is 3.818 KBps (screenshot: s2.png).

Note that metric rollups are also affected by this, i.e. net_usage_rate_average=3.81813151041667
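The two readings can be compared with a short calculation. This is only a sketch: it assumes CFME's "KBps" is decimal (1000 bytes), and the two screenshots may not cover the same sampling interval, so the ratio is indicative at best.

```python
import math

KIB = 1024  # bytes per KiB
KB = 1000   # bytes per KB (assumption: CFME's "KBps" is decimal)

# Readings quoted in this comment.
rx_kibps = 0.7     # OpenShift, received
tx_kibps = 0.55    # OpenShift, transmitted
cfme_kbps = 3.818  # CFME, rx and tx combined

openshift_total_kibps = rx_kibps + tx_kibps
openshift_total_kbps = openshift_total_kibps * KIB / KB

# Ratio without and with the KiB -> KB conversion.
ratio_raw = cfme_kbps / openshift_total_kibps
ratio_converted = cfme_kbps / openshift_total_kbps

print(f"OpenShift total: {openshift_total_kibps:.3f} KiBps "
      f"= {openshift_total_kbps:.3f} KBps")
print(f"CFME / OpenShift: {ratio_raw:.3f} (raw), "
      f"{ratio_converted:.3f} (after unit conversion)")
```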

Comment 8 Gilad Shefer 2017-10-02 12:03:15 UTC

Created attachment 1333209
Caps

Comment 9 Federico Simoncelli 2017-10-13 13:23:04 UTC

IIUC, what we report for net_usage_rate_average is wrong (by a factor of 2).
If so, the BZ is high severity/priority.

Yaacov, is 5.8 affected as well?
(To be on the safe side and not risk missing this, I'll mark it for 5.8.)

Comment 10 Yaacov Zamir 2017-10-15 08:09:54 UTC

> Yaacov is 5.8 affected as well?
> (To be on the safe side and not to risk to miss this I'll mark for 5.8)

Yes, the offending code exists in 5.8 [1]

[1]
Entered in #5334 Nov 6, 2015
https://github.com/ManageIQ/manageiq/pull/5334/files
