Bug 1495733

Summary:

Wrong units of net_usage_rate_average in containers metrics

Product:

Red Hat CloudForms Management Engine

Reporter:

Yaacov Zamir <yzamir>

Component:

C&U Capacity and Utilization

Assignee:

Yaacov Zamir <yzamir>

Status:

CLOSED CURRENTRELEASE

QA Contact:

Tony Khamis <tkhamis>

Severity:

high

Docs Contact:

Priority:

high

Version:

unspecified

CC:

agrare, fsimonce, gshefer, jhardy, obarenbo, simaishi, yzamir

Target Milestone:

Keywords:

TestOnly, ZStream

Target Release:

5.9.0

Hardware:

Unspecified

OS:

Unspecified

Whiteboard:

container

Fixed In Version:

5.9.0.1

Doc Type:

If docs needed, set a value

Doc Text:

Story Points:

---

Clone Of:

Clones:

1511142

Environment:

Last Closed:

2018-03-06 15:39:12 UTC

Type:

Bug

Regression:

---

Mount Type:

---

Documentation:

---

CRM:

Verified Versions:

Category:

---

oVirt Team:

---

RHEL 7.3 requirements from Atomic Host:

Cloudforms Team:

Container Management

Target Upstream Version:

Embargoed:

Bug Depends On:

Bug Blocks:

1511142

Attachments:

Description	Flags
Caps	none

Description Yaacov Zamir 2017-09-26 11:47:30 UTC

Description of problem:

We collect the average KB usage per second from the external metrics database.
When we insert the data into the internal metrics database, we declare it to be a percentage and an average of datagrams per second.
 
Version-Release number of selected component (if applicable):


How reproducible:
All metrics collected from Hawkular or Prometheus have this issue

Steps to Reproduce:
1. Collect container metrics from Hawkular or Prometheus.

Actual results:
value * 100 (because we think it's percent) with datagram units. 

Expected results:
actual value with kb units

Additional info:

Comment 2 Dave Johnson 2017-09-26 12:04:09 UTC

Please assess the impact of this issue and update the severity accordingly.  Please refer to https://bugzilla.redhat.com/page.cgi?id=fields.html#bug_severity for a reminder on each severity's definition.

If it's something like a tracker bug where it doesn't matter, please set it to Low/Low.

Comment 3 Yaacov Zamir 2017-09-26 12:38:10 UTC

submitted upstream:
https://github.com/ManageIQ/manageiq-providers-kubernetes/pull/118

Comment 4 Yaacov Zamir 2017-09-26 12:49:55 UTC

Edit:

After talking to Adam Grare and Ladislav Smola: the factor is not *100 but *2

So:

Actual results:
value * 2 (because we think it's 0.5 of value) with datagram units.
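
The corrected "Actual results" above can be illustrated with a small sketch. This is not the ManageIQ code: the `Counter` type, its `scale` field, the unit strings, and the `store` helper are all hypothetical names, and the sketch assumes that a stored value is the raw sample divided by the declared scale, so a declared scale of 0.5 doubles the value.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Counter:
    """Hypothetical counter declaration (not the real ManageIQ structure)."""
    name: str
    unit: str
    scale: float  # assumed: fraction of the real value a raw sample represents


def store(raw_value: float, counter: Counter) -> Tuple[float, str]:
    """Return (stored value, unit) under the assumed scaling model."""
    return raw_value / counter.scale, counter.unit


# Hypothetical declarations: the buggy one treats the sample as 0.5 of the
# value and labels it as datagrams; the expected one keeps the value in KB/s.
BUGGY = Counter("net_usage_rate_average", "datagrams_per_second", 0.5)
EXPECTED = Counter("net_usage_rate_average", "kilobytes_per_second", 1.0)

raw = 1.25  # illustrative sample in KB/s
print(store(raw, BUGGY))     # (2.5, 'datagrams_per_second')
print(store(raw, EXPECTED))  # (1.25, 'kilobytes_per_second')
```

Under this assumption the same sample is stored at twice its value and with the wrong unit label, which matches the "value * 2 with datagram units" symptom.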

Comment 5 Gilad Shefer 2017-09-26 14:07:40 UTC

I'm going to check whether this bug affects the MetricRollups table.
Note: if this affects the rollup table (which means it affects chargeback), we should raise the priority/severity.

Comment 6 Yaacov Zamir 2017-09-28 08:49:32 UTC

merged upstream:
https://github.com/ManageIQ/manageiq-providers-kubernetes/pull/118

Comment 7 Gilad Shefer 2017-10-02 12:02:34 UTC

After investigating the issue, it seems we do indeed have an issue.
Conclusions:

1. In OpenShift, net consumption is rx=0.7, tx=0.55 (KiBps). (screenshot=s1.png)
2. In CFME, net consumption (both rx and tx) is 3.818 KBps. (screenshot=s2.png)

Note that metric rollups are also affected by this, i.e. net_usage_rate_average=3.81813151041667

It seems we do indeed have an issue.

Comment 8 Gilad Shefer 2017-10-02 12:03:15 UTC

Created attachment 1333209
Caps

Comment 9 Federico Simoncelli 2017-10-13 13:23:04 UTC

IIUC, what we report for net_usage_rate_average is wrong (by a factor of 2).
If so the BZ is high severity/priority.

Yaacov is 5.8 affected as well?
(To be on the safe side and not to risk to miss this I'll mark for 5.8)

Comment 10 Yaacov Zamir 2017-10-15 08:09:54 UTC

> Yaacov is 5.8 affected as well?
> (To be on the safe side and not to risk to miss this I'll mark for 5.8)

YES, the offending code exists in 5.8 [1]

[1]
Introduced in #5334, Nov 6, 2015
https://github.com/ManageIQ/manageiq/pull/5334/files