Bug 1884175

Summary: Object gateway performance chart Latency should factor latency count
Product: OpenShift Container Platform Reporter: Anmol Sachan <asachan>
Component: Console Storage Plugin Assignee: Bipul Adhikari <badhikar>
Status: CLOSED ERRATA QA Contact: Filip Balák <fbalak>
Severity: high Docs Contact:
Priority: unspecified    
Version: 4.6 CC: aos-bugs, badhikar, nberry, nthomas
Target Milestone: ---   
Target Release: 4.6.0   
Hardware: Unspecified   
OS: Unspecified   
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2020-10-27 16:47:15 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Attachments:
Object Gateway latency comparison
Latency GET comparison
Latency PUT comparison
Flags: none

Description Anmol Sachan 2020-10-01 09:26:59 UTC
Description of problem: The Object gateway performance chart Latency should factor in the latency count.

Version-Release number of selected component (if applicable): On the Object Service dashboard, the Object gateway performance chart Latency should factor in the latency count.

How reproducible: 100 %

Steps to Reproduce:

Actual results:

   Current Queries : latencyGet: 'avg(rate(ceph_rgw_get_initial_lat_sum[1m]))',
                     latencyPut: 'avg(rate(ceph_rgw_put_initial_lat_sum[1m]))',

Expected results:

   Expected Queries:   latencyGet: 'avg(rate(ceph_rgw_get_initial_lat_sum[1m])) /avg(rate(ceph_rgw_get_initial_lat_count[1m])>0)',
                       latencyPut: 'avg(rate(ceph_rgw_put_initial_lat_sum[1m])) /avg(rate(ceph_rgw_put_initial_lat_count[1m])>0)',
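The difference between the current and expected queries can be illustrated with a small sketch. It is not the console's code; the helper names `rate` and `average_latency` are hypothetical, and it assumes that the `_sum` counter accumulates latency in seconds while the `_count` counter accumulates the number of requests. Under those assumptions, dividing the rate of the sum by the rate of the count yields an approximate mean latency per request, whereas the rate of the sum alone grows with traffic.

```python
from statistics import mean


def rate(first, last, window_seconds):
    """Per-second increase of a monotonically increasing counter over a window."""
    return (last - first) / window_seconds


def average_latency(sum_rates, count_rates):
    """Mirror avg(rate(sum)) / avg(rate(count) > 0) for lists of per-series rates.

    Hypothetical helper: each list holds one rate per gateway series.
    """
    # The `> 0` filter in the query drops series that served no requests.
    active = [r for r in count_rates if r > 0]
    if not active:
        # With no active series the division has no right-hand side,
        # so there is nothing to plot.
        return None
    return mean(sum_rates) / mean(active)


# Illustrative numbers only: over a 60 s window the latency sum grows by 3 s
# while 300 requests are served.
sum_rate = rate(10.0, 13.0, 60)    # what the current query plots
count_rate = rate(100, 400, 60)    # requests per second
print(sum_rate, average_latency([sum_rate], [count_rate]))
```

With these made-up numbers the current query would report the accumulated latency per second (0.05), while the expected query works out to roughly 0.01 seconds per request.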

Additional info:

Comment 3 Filip Balák 2020-10-14 08:44:41 UTC
Created attachment 1721425 [details]
Object Gateway latency comparison

The current chart gives no indication that it takes the latency count into account. There is no tooltip or label that says so.

The 'Object Gateway latency comparison' screenshot compares how the latency graph looks with the metrics queries provided in the BZ description. From the comparison, it seems that the query provided in Expected results is the one in use (units and spacing between PUT and GET look similar).

@Anmol do you think that this is enough to verify the fix? If not, can you please provide me with reproducer steps to verify it (where should I look for the change)?

@Bipul Please fill in the 'Fixed In Version' information.

Tested with:
OCP: 4.6.0-0.nightly-2020-10-13-064047
OCS: ocs-operator.v4.6.0-131.ci

Comment 4 Anmol Sachan 2020-10-14 11:31:04 UTC
@filip It's the queries that had to be changed, and that's what this BZ addresses. The units and spacing will remain the same. There will be no visual changes, only different data points.

Verification can be as simple as comparing the actual graph on the dashboard with a plot of the same queries in the Prometheus UI. If the values and graphs look the same, the fix is in place.
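The comparison described above could also be scripted. The following is only a sketch under stated assumptions: the base URL `https://prometheus.example` is a placeholder, the helper names `instant_query_url` and `series_match` are hypothetical, and the dashboard values would have to be collected by hand. It relies on the Prometheus HTTP API instant-query endpoint (`/api/v1/query`) and does not perform any network request itself.

```python
import math
from urllib.parse import urlencode

# Query strings copied from the Expected results section of this BZ.
LATENCY_QUERIES = {
    "latencyGet": "avg(rate(ceph_rgw_get_initial_lat_sum[1m])) /avg(rate(ceph_rgw_get_initial_lat_count[1m])>0)",
    "latencyPut": "avg(rate(ceph_rgw_put_initial_lat_sum[1m])) /avg(rate(ceph_rgw_put_initial_lat_count[1m])>0)",
}


def instant_query_url(base_url, promql):
    """Build a URL for the Prometheus HTTP API instant-query endpoint."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


def series_match(dashboard_values, prometheus_values, rel_tol=1e-6):
    """Return True if both series have the same length and near-equal points."""
    if len(dashboard_values) != len(prometheus_values):
        return False
    return all(
        math.isclose(a, b, rel_tol=rel_tol)
        for a, b in zip(dashboard_values, prometheus_values)
    )


# Placeholder host; substitute the cluster's own Prometheus route.
for name, promql in LATENCY_QUERIES.items():
    print(name, instant_query_url("https://prometheus.example", promql))
```

If the values read from the dashboard and those returned for the same query agree point by point, `series_match` returns `True`, which corresponds to the "values and graph look the same" check.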

Comment 6 Bipul Adhikari 2020-10-14 12:24:01 UTC
Any build after `2020-10-06 18:31:33 UTC` should have the fix. Since you tested a build from `2020-10-13`, it should contain the fix.

Comment 7 Filip Balák 2020-10-14 13:34:41 UTC
Created attachment 1721481 [details]
Latency GET comparison

Comment 8 Filip Balák 2020-10-14 13:35:29 UTC
Created attachment 1721482 [details]
Latency PUT comparison

Comment 9 Filip Balák 2020-10-14 13:38:34 UTC
After more testing, it seems that the queries provided in the 'Expected results' section of the Description of this BZ are in place. -> VERIFIED

Tested with:
OCP: 4.6.0-0.nightly-2020-10-13-064047
OCS: ocs-operator.v4.6.0-131.ci

Comment 11 errata-xmlrpc 2020-10-27 16:47:15 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.