2072821 – Top Consumers of Storage Traffic in Kubevirt Dashboard giving unexpected numbers

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Container Native Virtualization (CNV)
Classification:	Red Hat
Component:	Metrics
Sub Component:
Version:	4.10.0
Hardware:	x86_64
OS:	Linux
Priority:	high
Severity:	unspecified
Target Milestone:	---
Target Release:	4.12.0
Assignee:	Assaf Admi
QA Contact:	Ohad
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2022-04-07 03:05 UTC by Germano Veit Michel
Modified:	2023-01-24 13:37 UTC
CC List:	7 users
Fixed In Version:	4.12.0-568
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2023-01-24 13:36:09 UTC
Target Upstream Version:
Embargoed:
Dependent Products:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Issue Tracker	CNV-17463	0	None	None	None	2022-11-22 19:42:16 UTC
Red Hat Product Errata	RHSA-2023:0408	0	None	None	None	2023-01-24 13:37:30 UTC

Description Germano Veit Michel 2022-04-07 03:05:33 UTC

Description of problem:

This is in Observe -> Dashboards -> Kubevirt -> Top Consumers of Storage Traffic

Version-Release number of selected component (if applicable):
4.10.6 + CNV 4.10.0

How reproducible:
Always

Steps to Reproduce:
1. Create a few VMs, make them boot
2. Inside the VM, generate some traffic
   # dd if=/dev/urandom of=/home/testfile bs=1M count=3000 oflag=direct status=progress
3. Go to Observe -> Dashboards -> Kubevirt -> Top Consumers of Storage Traffic and monitor

Actual results:
* Numbers seem off, much lower than expected.
* Some VMs seem to be missing

Expected results:
* Actual VM storage IO

Comment 2 Krzysztof Majcher 2022-04-14 08:28:12 UTC

Assaf, please assess if this can be easily addressed.

Comment 3 Assaf Admi 2022-07-28 10:13:31 UTC

Hi,

Top Consumers of Storage Traffic in Kubevirt Dashboard uses 4h as time range in its query:

sort_desc(topk(5, sum(rate(kubevirt_vmi_storage_read_traffic_bytes_total[4h]) + rate(kubevirt_vmi_storage_write_traffic_bytes_total[4h])) by (namespace, name)))>0

While the dashboard at the bottom of virtualization/overview uses 5m as time range in its query:

sort_desc(topk(5, sum(rate(kubevirt_vmi_storage_read_traffic_bytes_total[5m]) + rate(kubevirt_vmi_storage_write_traffic_bytes_total[5m])) by (namespace, name))) > 0


I believe this is why the numbers on Top Consumers of Storage Traffic were low after generating some traffic. Its query computes the rate over a much longer time range, so a short burst of traffic is averaged down. When I changed its time range to 5m, I saw identical values in both places.
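
The effect of the window length can be illustrated with simple arithmetic. The following is a minimal Python sketch, not the dashboard's implementation. It models rate() as "counter increase divided by window length", ignores Prometheus' extrapolation and counter-reset handling, and assumes the 3000 MiB written by the dd command in the reproduction steps lands entirely inside both windows with no other traffic. The function name average_rate is hypothetical.

```python
# Simplified model of a per-second average over a lookback window.
# Assumption: this ignores Prometheus' extrapolation and counter resets.

MIB = 1024 * 1024


def average_rate(bytes_in_window: float, window_seconds: float) -> float:
    """Return average bytes/second: counter increase divided by window length."""
    return bytes_in_window / window_seconds


# Assumption: the single 3000 MiB write from the reproduction steps
# is the only traffic and falls entirely inside both windows.
burst_bytes = 3000 * MIB

rate_5m = average_rate(burst_bytes, 5 * 60)
rate_4h = average_rate(burst_bytes, 4 * 60 * 60)

print(f"5m window: {rate_5m / MIB:.2f} MiB/s")
print(f"4h window: {rate_4h / MIB:.2f} MiB/s")
print(f"ratio: {rate_5m / rate_4h:.0f}x")
```

Under these assumptions the same burst reads as 10 MiB/s over 5m but only about 0.21 MiB/s over 4h, a 48x difference, which matches the "much lower than expected" observation.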

Comment 4 Germano Veit Michel 2022-08-02 00:27:46 UTC

Hi Assaf,

Same here, if I set to 5m it does look correct. Should it just be changed to 5m then? Averaging over 4h does not make much sense if the X axis is not in 4h increments by default.

Comment 5 Shirly Radco 2022-08-02 13:21:10 UTC

The idea with 4 hours in the table panels dashboard is to get the top 5 VMs that consume the most storage resources during this time period.
In the UI it is possible to change this by choosing a different "Period" from the drop down list.
For the line charts we do use 5m. 
We can set the default period to 5m instead of 4h, but when I discussed this with Ronen Sde-Or we thought it would make more sense to check for a longer time.
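
To make the trade-off concrete, here is a hedged Python sketch of how the top-consumer ranking can change with the selected period. It loosely mirrors the shape of sort_desc(topk(5, ...)) > 0 but is not the dashboard's code. The VM names, byte counts, and the helper top_consumers are all hypothetical.

```python
# Sketch: rank VMs by average storage traffic over a chosen period.
# All names and numbers below are hypothetical illustration data.

MIB = 1024 * 1024


def top_consumers(bytes_by_vm, window_seconds, k=5):
    """Return up to k (vm, bytes_per_second) pairs, highest first, dropping zeros."""
    rates = {vm: total / window_seconds for vm, total in bytes_by_vm.items()}
    ranked = sorted(rates.items(), key=lambda item: item[1], reverse=True)
    return [(vm, rate) for vm, rate in ranked[:k] if rate > 0]


# Bytes transferred by each VM within the last 5 minutes and the last 4 hours.
last_5m = {"vm-bursty": 3000 * MIB, "vm-steady": 300 * MIB, "vm-idle": 0}
last_4h = {"vm-bursty": 3000 * MIB, "vm-steady": 14400 * MIB, "vm-idle": 0}

for label, data, seconds in (("5m", last_5m, 300), ("4h", last_4h, 14400)):
    print(label, [vm for vm, _ in top_consumers(data, seconds)])
```

With this data, the bursty VM leads over 5m, while the steady VM leads over 4h. The longer period answers "who consumed the most over time", the shorter one "who is busy right now".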

Comment 6 Germano Veit Michel 2022-08-02 21:14:51 UTC

Right, I see the point of using an average of 4h for that purpose.
However, the way the information is presented is not clear.

Perhaps if the Dashboard renamed the columns based on the period selected it would be clearer?

For example, perhaps something like this?

Top Consumers of Storage Traffic

Namespace         Virtual Machine       Average Storage Traffic Usage Over {period}
openshift-cnv     rhel8                 7.14 KiB

It's hard for the user to know what is being averaged and what is instant without inspecting the query.

Comment 7 Ronen 2022-08-04 14:22:42 UTC

I agree with Germano, we need to make sure the user understands the information.

Comment 8 Shirly Radco 2022-08-04 16:01:53 UTC

Unfortunately there is no support for dynamic headers in the OCP UI.
I did ask to add support for panel description, but I don't see that it was implemented yet.

Should we change the default to 5m like the line charts so that they are aligned for now?

Comment 9 Ronen 2022-08-04 16:08:03 UTC

Shirly, is there a limit on the timeframe we can show if we use 5m instead of 4h?
If there is no limit, let's change it to 5m so that it aligns.

Comment 10 Shirly Radco 2022-08-16 15:24:59 UTC

We should align the default period to 5m and add a suffix to the tables that explains that the data is calculated based on the selected period.

Comment 11 Assaf Admi 2022-09-28 08:29:16 UTC

It was decided to align both the Top-Consumers dashboard and the Virtualization/Overview dashboard on a 30-minute time range. This was implemented in the following PRs:
https://github.com/kubevirt-ui/kubevirt-plugin/pull/885
https://github.com/kubevirt/monitoring/pull/94

Comment 12 SATHEESARAN 2022-10-18 08:07:39 UTC

Tested with CNV-4.11.1-20. The dashboard under 'Observe' -> Dashboards -> Kubevirt -> Top Consumers of Storage Traffic
still shows the 'period' dropdown.

Confirmed with Assaf & Oren that the fix is not yet available in CNV 4.11.1.
Moving this bug back to ASSIGNED state.

Comment 13 SATHEESARAN 2022-10-18 09:51:50 UTC

The fix was not available in 4.11.1-20, so clearing the FIXED-IN-VERSION field.
The bug is now retargeted to 4.12, as the original fix is available in upstream master.

Comment 15 Ohad 2022-12-01 15:39:42 UTC

Tried on CNV-v4.12.0-745 and still get the same bug.

Comment 16 Assaf Admi 2022-12-07 10:35:15 UTC

I tried on CNV 4.12.0, and it seems to me that the bug was fixed.
Virtualization -> Overview -> Top consumers -> Storage throughput shows very similar numbers compared to the KubeVirt / Infrastructure Resources / Top Consumers dashboard (Top Consumers of Storage Traffic graph).
In addition, I could see in the cluster the changes that were made in the PRs that fixed the issue:
- I was able to see what https://github.com/kubevirt/monitoring/pull/94 changed in KubeVirt / Infrastructure Resources / Top Consumers dashboard
- I was able to see what https://github.com/kubevirt-ui/kubevirt-plugin/pull/885 changed in Virtualization -> Overview -> Top consumers.

Ohad, can you please share the steps you took to verify this bug?

Comment 17 Krzysztof Majcher 2022-12-12 15:10:17 UTC

Moving to 4.12.1, as I doubt this is a blocker bug for 4.12.0.

Comment 18 Ohad 2022-12-13 11:40:15 UTC

Tested now on CNV 4.12; the bug is fixed.

Comment 22 errata-xmlrpc 2023-01-24 13:36:09 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: OpenShift Virtualization 4.12.0 Images security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2023:0408
