Description of problem:
This is in Observe -> Dashboards -> Kubevirt -> Top Consumers of Storage Traffic

Version-Release number of selected component (if applicable):
4.10.6 + CNV 4.10.0

How reproducible:
Always

Steps to Reproduce:
1. Create a few VMs, make them boot
2. Inside the VM, generate some traffic
   # dd if=/dev/urandom of=/home/testfile bs=1M count=3000 oflag=direct status=progress
3. Go to Observe -> Dashboards -> Kubevirt -> Top Consumers of Storage Traffic and monitor

Actual results:
* Numbers seem off, much lower than expected.
* Some VMs seem to be missing

Expected results:
* Actual VM storage IO
Assaf, please assess if this can be easily addressed.
Hi,

Top Consumers of Storage Traffic in the Kubevirt dashboard uses 4h as the time range in its query:

    sort_desc(topk(5, sum(rate(kubevirt_vmi_storage_read_traffic_bytes_total[4h]) + rate(kubevirt_vmi_storage_write_traffic_bytes_total[4h])) by (namespace, name)))>0

The dashboard at the bottom of virtualization/overview uses 5m as the time range in its query:

    sort_desc(topk(5, sum(rate(kubevirt_vmi_storage_read_traffic_bytes_total[5m]) + rate(kubevirt_vmi_storage_write_traffic_bytes_total[5m])) by (namespace, name))) > 0

I believe this is why the numbers on Top Consumers of Storage Traffic were low after generating some traffic: its query computes the rate over a much longer time range. When I changed its time range to 5m, I saw identical values.
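To illustrate why the longer window produces lower numbers, here is a rough Python sketch. It assumes the VM wrote one 3000 MiB burst (the dd command from the reproduction steps) that falls entirely inside both windows, and that the counter did not otherwise change. It treats the rate as simply the increase divided by the window length, which is a simplification of what Prometheus rate() does (no extrapolation, no counter resets). The function name average_rate is made up for this example.

```python
MIB = 1024 * 1024


def average_rate(bytes_written: int, window_seconds: int) -> float:
    """Simplified rate: counter increase divided by window length (bytes/s)."""
    return bytes_written / window_seconds


# One burst of 3000 MiB, as generated by the dd command in the report.
burst = 3000 * MIB

five_min = average_rate(burst, 5 * 60)         # window used by virtualization/overview
four_hours = average_rate(burst, 4 * 60 * 60)  # window used by the Top Consumers table

print(f"5m window: {five_min / MIB:.2f} MiB/s")
print(f"4h window: {four_hours / MIB:.2f} MiB/s")
print(f"ratio:     {five_min / four_hours:.0f}x")
```

Under these assumptions the same burst reads as 10 MiB/s over 5m but only about 0.21 MiB/s over 4h, a factor of 48, which would match the "much lower than expected" observation.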
Hi Assaf,

Same here, if I set it to 5m it does look correct. Should it just be changed to 5m then? Averaging over 4h does not make much sense if the X axis is not in 4h increments by default.
The idea with 4 hours in the table panels dashboard is to get the top 5 VMs that consume the most storage resources during this time period. In the UI it is possible to change this by choosing a different "Period" from the drop down list. For the line charts we do use 5m. We can set the default period to 5m instead of 4h, but when I discussed this with Ronen Sde-Or we thought it would make more sense to check for a longer time.
Right, I see the point of using an average of 4h for that purpose. However, the way the information is presented is not clear. Perhaps if the Dashboard renamed the columns based on the period selected it would be clearer? For example, perhaps something like this?

Top Consumers of Storage Traffic

Namespace       Virtual Machine   Average Storage Traffic Usage Over {period}
openshift-cnv   rhel8             7.14 KiB

It's hard for the user to know what is being averaged and what is instant without inspecting the query.
I agree with Germano, we need to make sure the user understands the information.
Unfortunately there is no support for dynamic headers in the OCP UI. I did ask to add support for panel description, but I don't see that it was implemented yet. Should we change the default to 5m like the line charts so that they are aligned for now?
Shirly, is there a limit on the timeframe we can show if we use 5m instead of 4h? If there is no limit, let's change it to 5m so that it aligns.
We should align the default period to be 5m and add a suffix to the tables explaining that the data is calculated based on the selected period.
It was decided to align both Top-Consumers dashboard and Virtualization/Overview dashboard with a 30 minutes time range, and it was implemented in the following PRs: https://github.com/kubevirt-ui/kubevirt-plugin/pull/885 https://github.com/kubevirt/monitoring/pull/94
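As a rough sketch of what "aligned" means here: if both views build the query from the same window, their numbers are computed over the same period. The Python helper below is hypothetical (top_storage_consumers_query is not a function from either PR); it only reuses the query shape quoted earlier in this thread with the window as a parameter. The exact queries merged in the PRs are not shown in this thread.

```python
def top_storage_consumers_query(window: str = "30m", limit: int = 5) -> str:
    """Build the top-consumers PromQL string for a given rate window.

    Hypothetical helper: the query shape is copied from the earlier comment,
    only the window and limit are parameterized.
    """
    read = f"rate(kubevirt_vmi_storage_read_traffic_bytes_total[{window}])"
    write = f"rate(kubevirt_vmi_storage_write_traffic_bytes_total[{window}])"
    return (
        f"sort_desc(topk({limit}, sum({read} + {write}) "
        f"by (namespace, name))) > 0"
    )


# Both views use the same default window, so they produce the same query.
dashboard_query = top_storage_consumers_query()
overview_query = top_storage_consumers_query()

print(dashboard_query)
```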
Tested with CNV-4.11.1-20: the dashboard under 'Observe' -> Dashboards -> Kubevirt -> Top Consumers of Storage Traffic still shows the 'period' dropdown. Confirmed with Assaf & Oren that the fix is not yet available in CNV 4.11.1. Moving this bug back to the ASSIGNED state.
The fix was not available with 4.11.1-20 and so clearing the FIXED-IN-VERSION field. The bug is now retargeted for 4.12 as the original fix is available in upstream master.
Tried on CNV-v4.12.0-745 and still get the same bug.
I tried on CNV 4.12.0, and it seems to me that the bug was fixed. Virtualization -> Overview -> Top consumers -> Storage throughput shows very similar numbers compared to the KubeVirt / Infrastructure Resources / Top Consumers dashboard (Top Consumers of Storage Traffic graph).

In addition, I could see in the cluster the changes made by the PRs that fixed the issue:
- I was able to see what https://github.com/kubevirt/monitoring/pull/94 changed in the KubeVirt / Infrastructure Resources / Top Consumers dashboard
- I was able to see what https://github.com/kubevirt-ui/kubevirt-plugin/pull/885 changed in Virtualization -> Overview -> Top consumers.

Ohad, can you please share the steps you took to verify this bug?
Moving to 4.12.1, as I doubt this is a blocker bug for 4.12.0.
Tested now on CNV 4.12; the bug is fixed.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: OpenShift Virtualization 4.12.0 Images security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2023:0408