Bug 2150832 - vCPU number is not correct in Virtualization -> Overview
Summary: vCPU number is not correct in Virtualization -> Overview
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Container Native Virtualization (CNV)
Classification: Red Hat
Component: User Experience
Version: 4.13.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.13.0
Assignee: Phillip Bailey
QA Contact: Guohua Ouyang
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2022-12-05 11:50 UTC by Guohua Ouyang
Modified: 2023-05-18 02:56 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-05-18 02:56:16 UTC
Target Upstream Version:
Embargoed:


Attachments
vCPU number is not correct (171.87 KB, image/png)
2022-12-05 11:50 UTC, Guohua Ouyang


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | kubevirt-ui kubevirt-plugin pull 1030 | 0 | None | open | Bug 2150832: Fix vCPU query in cluster overview metrics card | 2023-01-27 13:45:13 UTC
Red Hat Issue Tracker | CNV-23124 | 0 | None | None | None | 2022-12-05 11:59:55 UTC
Red Hat Product Errata | RHSA-2023:3205 | 0 | None | None | None | 2023-05-18 02:56:27 UTC

Description Guohua Ouyang 2022-12-05 11:50:01 UTC
Created attachment 1930043
vCPU number is not correct

Description of problem:
vCPU number is not correct in Virtualization -> Overview

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1. Visit Virtualization -> Overview
2.
3.

Actual results:


Expected results:


Additional info:

Comment 2 Hilda Stastna 2023-01-10 18:12:14 UTC
Hi Guohua,

Can you please explain why you think the vCPU number is not correct?
IMHO it seems to be correct - consistent with the displayed graph.
That number you've marked in the attachment just shows the maximum number achieved
in the last day (24h?), not the actual number, as you can see in the graph.
WDYT? Thanks!

Comment 3 Guohua Ouyang 2023-01-11 00:05:21 UTC
Hi Hilda,
Can you explain what the number means there? The number looks too big.

Comment 4 Guohua Ouyang 2023-01-14 00:33:44 UTC
Hi Ronen,
Could you drop in and comment on the bug to tell us why you think the vCPU number does not make sense?

Comment 5 Ronen 2023-01-14 19:04:51 UTC
@gouyang I agree, the number is too big.
In the screenshot we have 3 VMs. Even if they are large, with 16 CPUs each, this number should be 48 vCPUs and not over 3,000.
@hstastna is it possible this is not the vCPU count but millicores?

Comment 6 Hilda Stastna 2023-01-16 16:51:49 UTC
So I just found that the number is really buggy, but still not sure about the expected result.
Phillip is the best person for that, he's gonna explain more and take this bug asap. So let's be patient now.

Comment 7 Phillip Bailey 2023-01-17 02:36:15 UTC
@rsdeor I apologize. I discovered the root issue after we discussed this bug previously and let it slip through the cracks. The problem is that I didn't realize the disconnect between what was expected in the design and what the metric being used provides. We don't currently have a metric that provides the number of vCPUs in use. We have two vCPU metrics that would work for the charts: kubevirt_vmi_vcpu_seconds and kubevirt_vmi_vcpu_wait_seconds. The vCPU seconds metric is what's used in the metric charts currently. 

We use the vCPU wait seconds metric in at least two charts in the kubevirt dashboard, so I think it's a good candidate for use in the metric charts card.

-------------------------------

kubevirt_vmi_vcpu_seconds [1]: Total amount of time spent in each state by each vcpu. Where id is the vcpu identifier and state can be one of the following: [OFFLINE, RUNNING, BLOCKED]. Type: Counter.

kubevirt_vmi_vcpu_wait_seconds [2]: Amount of time spent by each vcpu while waiting on I/O. Type: Counter.


[1] https://github.com/kubevirt/kubevirt/blob/main/docs/metrics.md#kubevirt_vmi_vcpu_seconds
[2] https://github.com/kubevirt/kubevirt/blob/main/docs/metrics.md#kubevirt_vmi_vcpu_wait_seconds

Comment 8 Ronen 2023-01-17 05:42:24 UTC
@sradco can you help us with which metric should be used here?

Comment 9 Shirly Radco 2023-01-25 10:53:10 UTC
I believe that the query should be 
count (kubevirt_vmi_vcpu_seconds{state="running", namespace="<namespace>"})

Comment 10 Shirly Radco 2023-01-25 11:01:17 UTC
We may also be able to use  
count(kubevirt_vmi_vcpu_wait_seconds{namespace="<namespace>"})

The query in comment #9 filters by state; we should check whether that filter is actually needed.
The metrics description can be found here https://github.com/kubevirt/kubevirt/blob/main/docs/metrics.md#kubevirt_vmi_vcpu_seconds.
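
For reference, the two candidate queries differ only in the metric name and the optional label matchers. The following is a minimal sketch of how they could be assembled for the cluster-wide and per-namespace cases. The helper name vcpu_count_query and the "default" namespace are hypothetical and are not taken from the kubevirt-plugin code.

```python
from typing import Optional


def vcpu_count_query(metric: str,
                     namespace: Optional[str] = None,
                     state: Optional[str] = None) -> str:
    """Build a PromQL count() query over a per-vCPU KubeVirt metric.

    Hypothetical helper for illustration only.
    """
    matchers = []
    if state is not None:
        matchers.append(f'state="{state}"')
    if namespace is not None:
        matchers.append(f'namespace="{namespace}"')
    # Omit the braces entirely when no label matcher is requested.
    selector = ("{" + ", ".join(matchers) + "}") if matchers else ""
    return f"count({metric}{selector})"


# Cluster-wide query, no matchers.
cluster_query = vcpu_count_query("kubevirt_vmi_vcpu_wait_seconds")

# Namespace-scoped query with the state filter from comment #9.
namespace_query = vcpu_count_query(
    "kubevirt_vmi_vcpu_seconds", namespace="default", state="running"
)
```

Dropping the namespace argument yields the cluster-wide form, which is the only difference between the two scopes in this sketch.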

Comment 11 Phillip Bailey 2023-01-25 21:36:13 UTC
@rsdeor 

I'm not sure which metric you'd like to display. Here are the relevant details from the link Shirly provided.

kubevirt_vmi_vcpu_seconds
Total amount of time spent in each state by each vcpu. Where id is the vcpu identifier and state can be one of the following: [OFFLINE, RUNNING, BLOCKED]. Type: Counter.

kubevirt_vmi_vcpu_wait_seconds
Amount of time spent by each vcpu while waiting on I/O. Type: Counter.

I'm not sure which would be more important to the user. We display the wait_seconds metric in the Top Consumers dashboard, but I'm not aware of any place in the UI where the vcpu_seconds metric is being used.

Thoughts?

Comment 12 Ronen 2023-01-26 08:19:25 UTC
@phbailey when Shirly and I did a quick test yesterday, count() on both metrics returned the same result: the number of vCPUs.
When we used kubevirt_vmi_vcpu_seconds we only looked at state="running" (see comment #9).

Comment 13 Phillip Bailey 2023-01-26 14:13:43 UTC
@rsdeor Ah, ok. I didn't realize you looked at those together and considered them one and the same. It's odd that they would return the same value since they're supposed to be counting the seconds spent in different states and not the number of vCPUs. I assume I shouldn't update the axis and header labels to indicate a unit of seconds since the metrics don't appear to be returning seconds?
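
A note on why the two queries can agree: the PromQL count() aggregation returns the number of series in the selected vector and ignores the sample values, so it never reports seconds. The sketch below mimics this on made-up data. It assumes one series per vCPU for the wait metric and one series per vCPU with state="running" for the seconds metric; the VMI names, label set, and values are hypothetical.

```python
# Hypothetical samples: 3 VMIs with 2 vCPUs each. The label names and the
# values are made up for illustration; real series carry more labels.
samples = []
for vmi in ("vm-a", "vm-b", "vm-c"):
    for vcpu_id in ("0", "1"):
        samples.append({"__name__": "kubevirt_vmi_vcpu_seconds", "name": vmi,
                        "id": vcpu_id, "state": "running", "value": 1200.5})
        samples.append({"__name__": "kubevirt_vmi_vcpu_wait_seconds", "name": vmi,
                        "id": vcpu_id, "value": 3.25})


def select(metric, **matchers):
    """Return the samples whose metric name and labels match exactly."""
    return [s for s in samples
            if s["__name__"] == metric
            and all(s.get(k) == v for k, v in matchers.items())]


def count(vector):
    """Mimic PromQL count(): number of series, sample values ignored."""
    return len(vector)


def total(vector):
    """Mimic PromQL sum(): adds the sample values (seconds here)."""
    return sum(s["value"] for s in vector)


running = select("kubevirt_vmi_vcpu_seconds", state="running")
waiting = select("kubevirt_vmi_vcpu_wait_seconds")

vcpus_from_seconds = count(running)   # one series per vCPU
vcpus_from_wait = count(waiting)      # one series per vCPU
seconds_total = total(running)        # a duration, not a vCPU count
```

Under these assumptions both counts equal the number of vCPUs, while aggregating the values themselves yields a quantity in seconds that is unrelated to the vCPU count.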

Comment 14 Ronen 2023-01-26 14:19:28 UTC
@phbailey we ran only a very basic test to check this, so please double-check.
In both cases we got the number of vCPUs, not seconds.
Try running the queries in your environment and verify both for the entire cluster and for a namespace.
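
One way to run such a check outside the console is the Prometheus HTTP API instant-query endpoint (/api/v1/query). The sketch below only builds the request URL and parses a response body, without performing any network call. The host prometheus.example and the canned response are placeholders; authentication and the actual route to the cluster's Prometheus are not covered and would need to be supplied for a real environment.

```python
import json
from urllib.parse import urlencode


def instant_query_url(base_url: str, promql: str) -> str:
    """Build the URL for a Prometheus instant query."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


def scalar_from_response(body: str) -> float:
    """Extract the single value from an instant-query vector response."""
    payload = json.loads(body)
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload.get('error', 'unknown error')}")
    result = payload["data"]["result"]
    if not result:
        # count() over an empty selection returns an empty vector.
        return 0.0
    # Each vector sample is [timestamp, "value-as-string"].
    return float(result[0]["value"][1])


# Placeholder host; replace with the real Prometheus endpoint.
url = instant_query_url("https://prometheus.example/",
                        "count(kubevirt_vmi_vcpu_wait_seconds)")

# Canned response in the documented shape, used here instead of a live call.
canned = ('{"status":"success","data":{"resultType":"vector",'
          '"result":[{"metric":{},"value":[1674742768.0,"48"]}]}}')
vcpus = scalar_from_response(canned)
```

Running the same parse against the cluster-wide query and the namespace-scoped query gives two numbers that can be compared with the VM definitions.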

Comment 15 Phillip Bailey 2023-01-28 16:14:48 UTC
My tests confirmed your result for both cluster and namespace. The PR has already merged and the 4.12 backport has been opened: https://github.com/kubevirt-ui/kubevirt-plugin/pull/1032.

Comment 18 errata-xmlrpc 2023-05-18 02:56:16 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Virtualization 4.13.0 Images security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2023:3205

