2150832 – vCPU number is not correct in Virtualization -> Overview

Summary: vCPU number is not correct in Virtualization -> Overview

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Container Native Virtualization (CNV)
Classification:	Red Hat
Component:	User Experience
Sub Component:
Version:	4.13.0
Hardware:	Unspecified
OS:	Unspecified
Priority:	high
Severity:	high
Target Milestone:	---
Target Release:	4.13.0
Assignee:	Phillip Bailey
QA Contact:	Guohua Ouyang
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2022-12-05 11:50 UTC by Guohua Ouyang
Modified:	2023-05-18 02:56 UTC
CC List:	5 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2023-05-18 02:56:16 UTC
Target Upstream Version:
Embargoed:
Dependent Products:

Attachments
vCPU number is not correct (171.87 KB, image/png) 2022-12-05 11:50 UTC, Guohua Ouyang

Links
System	ID	Priority	Status	Summary	Last Updated
Github	kubevirt-ui kubevirt-plugin pull 1030	None	open	Bug 2150832: Fix vCPU query in cluster overview metrics card	2023-01-27 13:45:13 UTC
Red Hat Issue Tracker	CNV-23124	None	None	None	2022-12-05 11:59:55 UTC
Red Hat Product Errata	RHSA-2023:3205	None	None	None	2023-05-18 02:56:27 UTC

Description Guohua Ouyang 2022-12-05 11:50:01 UTC

Created attachment 1930043
vCPU number is not correct

Description of problem:
vCPU number is not correct in Virtualization -> Overview

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1. Visit Virtualization -> Overview
2.
3.

Actual results:


Expected results:


Additional info:

Comment 2 Hilda Stastna 2023-01-10 18:12:14 UTC

Hi Guohua,

could you please specify why the vCPU number is not correct?
IMHO it seems to be correct - consistent with the displayed graph.
The number you've marked in the attachment just shows the maximum reached
in the last day (24h?), not the current number, as you can see in the graph.
WDYT? Thanks!

Comment 3 Guohua Ouyang 2023-01-11 00:05:21 UTC

Hi Hilda,
Can you explain what the number means there? The number looks too big.

Comment 4 Guohua Ouyang 2023-01-14 00:33:44 UTC

Hi Ronen,
Could you drop in and comment on the bug to tell us why you think the vCPU number does not make sense?

Comment 5 Ronen 2023-01-14 19:04:51 UTC

@gouyang I agree, the number is too big.
In the screenshot we have 3 VMs. Even if they are large, with 16 CPUs each, this number should be 48 vCPUs and not over 3,000.
@hstastna is it possible this is not vCPUs but millicores?

Comment 6 Hilda Stastna 2023-01-16 16:51:49 UTC

So I just found that the number is really buggy, but I'm still not sure about the expected result.
Phillip is the best person for that; he's going to explain more and take this bug ASAP. So let's be patient for now.

Comment 7 Phillip Bailey 2023-01-17 02:36:15 UTC

@rsdeor I apologize. I discovered the root issue after we discussed this bug previously and let it slip through the cracks. The problem is that I didn't realize the disconnect between what was expected in the design and what the metric being used provides. We don't currently have a metric that provides the number of vCPUs in use. We have two vCPU metrics that would work for the charts: kubevirt_vmi_vcpu_seconds and kubevirt_vmi_vcpu_wait_seconds. The vCPU seconds metric is what's used in the metric charts currently. 

We use the vCPU wait seconds metric in at least two charts in the kubevirt dashboard, so I think it's a good candidate for use in the metric charts card.

-------------------------------

kubevirt_vmi_vcpu_seconds [1]: Total amount of time spent in each state by each vcpu. Where id is the vcpu identifier and state can be one of the following: [OFFLINE, RUNNING, BLOCKED]. Type: Counter.

kubevirt_vmi_vcpu_wait_seconds [2]: Amount of time spent by each vcpu while waiting on I/O. Type: Counter.


[1] https://github.com/kubevirt/kubevirt/blob/main/docs/metrics.md#kubevirt_vmi_vcpu_seconds
[2] https://github.com/kubevirt/kubevirt/blob/main/docs/metrics.md#kubevirt_vmi_vcpu_wait_seconds
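
As a rough illustration of the disconnect described above, the sketch below models kubevirt_vmi_vcpu_seconds as one counter series per vCPU and state. The "id" and "state" labels come from the metric description; the "name" label, the 75/25 running/blocked split, and all values are made up for this sketch. It also assumes a chart that adds up the counter values, which is not confirmed here. Under those assumptions, the summed seconds keep growing with uptime, while counting the running series stays at the number of vCPUs.

```python
from typing import NamedTuple


class Sample(NamedTuple):
    name: str       # VMI name (hypothetical label, for illustration only)
    id: str         # vCPU identifier, as in the metric description
    state: str      # "running", "blocked", or "offline"
    seconds: float  # counter value: cumulative seconds spent in that state


def make_samples(vms: int, vcpus_per_vm: int, uptime_s: float) -> list[Sample]:
    """Build one series per (VMI, vCPU, state); values grow with uptime."""
    samples = []
    for vm in range(vms):
        for cpu in range(vcpus_per_vm):
            # Invented split: 75% of the time running, 25% blocked.
            samples.append(Sample(f"vm-{vm}", str(cpu), "running", uptime_s * 0.75))
            samples.append(Sample(f"vm-{vm}", str(cpu), "blocked", uptime_s * 0.25))
    return samples


def summed_seconds(samples: list[Sample]) -> float:
    """Add up the counter values: a number of seconds, not a number of vCPUs."""
    return sum(s.seconds for s in samples)


def running_vcpu_count(samples: list[Sample]) -> int:
    """Count the series in the running state: one per vCPU in this model."""
    return sum(1 for s in samples if s.state == "running")
```

With 3 VMs of 16 vCPUs each, running_vcpu_count stays at 48 however long the VMs have been up, whereas summed_seconds doubles when the uptime doubles.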

Comment 8 Ronen 2023-01-17 05:42:24 UTC

@sradco can you help us with which metric should be used here?

Comment 9 Shirly Radco 2023-01-25 10:53:10 UTC

I believe that the query should be 
count (kubevirt_vmi_vcpu_seconds{state="running", namespace="<namespace>"})

Comment 10 Shirly Radco 2023-01-25 11:01:17 UTC

We may also be able to use  
count(kubevirt_vmi_vcpu_wait_seconds{namespace="<namespace>"})

The query in comment 9 filters by state; we should check whether this filter is indeed needed.
The metrics description can be found here https://github.com/kubevirt/kubevirt/blob/main/docs/metrics.md#kubevirt_vmi_vcpu_seconds.
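
A minimal sketch of how the two candidate queries could be assembled for either a single namespace or the whole cluster. The helper names vcpu_count_query and instant_query_url are hypothetical, and the base URL is a placeholder; only the /api/v1/query path is the standard Prometheus HTTP API instant-query endpoint. How the console plugin actually builds and sends its queries is not shown in this bug.

```python
from typing import Optional
from urllib.parse import urlencode


def vcpu_count_query(metric: str,
                     namespace: Optional[str] = None,
                     state: Optional[str] = None) -> str:
    """Return count(<metric>{...}) with optional state and namespace matchers."""
    matchers = []
    if state is not None:
        matchers.append(f'state="{state}"')
    if namespace is not None:
        matchers.append(f'namespace="{namespace}"')
    # Omit the braces entirely for a cluster-wide query with no filters.
    selector = "{" + ", ".join(matchers) + "}" if matchers else ""
    return f"count({metric}{selector})"


def instant_query_url(base_url: str, query: str) -> str:
    """Build a Prometheus instant-query URL; base_url is an assumed placeholder."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': query})}"
```

For example, vcpu_count_query("kubevirt_vmi_vcpu_seconds", namespace="default", state="running") yields the query from comment 9 with "default" as the namespace, and calling it with only the wait-seconds metric name yields the cluster-wide form of the second query.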

Comment 11 Phillip Bailey 2023-01-25 21:36:13 UTC

@rsdeor 

I'm not sure which metric you'd like to display. Here are the relevant details from the link Shirly provided.

kubevirt_vmi_vcpu_seconds
Total amount of time spent in each state by each vcpu. Where id is the vcpu identifier and state can be one of the following: [OFFLINE, RUNNING, BLOCKED]. Type: Counter.

kubevirt_vmi_vcpu_wait_seconds
Amount of time spent by each vcpu while waiting on I/O. Type: Counter.

I'm not sure which would be more important to the user. We display the wait_seconds metric in the Top Consumers dashboard, but I'm not aware of any place in the UI where the vcpu_seconds metric is being used.

Thoughts?

Comment 12 Ronen 2023-01-26 08:19:25 UTC

@phbailey when Shirly and I did a quick test yesterday, count on both metrics gave the same result, which matched the number of vCPUs.
When we used kubevirt_vmi_vcpu_seconds we only looked at state="running" (see comment #9).

Comment 13 Phillip Bailey 2023-01-26 14:13:43 UTC

@rsdeor Ah, ok. I didn't realize you looked at those together and considered them one and the same. It's odd that they would return the same value since they're supposed to be counting the seconds spent in different states and not the number of vCPUs. I assume I shouldn't update the axis and header labels to indicate a unit of seconds since the metrics don't appear to be returning seconds?

Comment 14 Ronen 2023-01-26 14:19:28 UTC

@phbailey we did a very basic test to check this, so please double-check.
In both cases we got the number of vCPUs, not seconds.
Try running the queries in your environment and verify both for the entire cluster and for a namespace.
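
One way to do that check, sketched under assumptions: count() returns the number of matching series rather than their values, so the result is a plain number and not seconds. The helpers below read that number from a Prometheus instant-query JSON response and compare the two queries. The function names and the sample bodies are illustrative, not captured from a real cluster.

```python
import json


def scalar_from_instant_vector(body: str) -> float:
    """Extract the single value from a Prometheus instant-query response body."""
    payload = json.loads(body)
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload.get('error', 'unknown error')}")
    result = payload["data"]["result"]
    if not result:
        # count() over no matching series yields an empty vector.
        return 0.0
    # Each element looks like {"metric": {...}, "value": [<unix_ts>, "<value>"]}.
    return float(result[0]["value"][1])


def same_vcpu_count(seconds_body: str, wait_body: str) -> bool:
    """True when both count() queries report the same number of series."""
    return scalar_from_instant_vector(seconds_body) == scalar_from_instant_vector(wait_body)


# Illustrative response bodies (made up) for the two count() queries.
SAMPLE_SECONDS = ('{"status":"success","data":{"resultType":"vector",'
                  '"result":[{"metric":{},"value":[1674742765.0,"48"]}]}}')
SAMPLE_WAIT = ('{"status":"success","data":{"resultType":"vector",'
               '"result":[{"metric":{},"value":[1674742765.0,"48"]}]}}')
```

Running same_vcpu_count once on cluster-wide responses and once on namespace-scoped responses covers both cases mentioned above.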

Comment 15 Phillip Bailey 2023-01-28 16:14:48 UTC

My tests confirmed your result for both cluster and namespace. The PR has already merged and the 4.12 backport has been opened: https://github.com/kubevirt-ui/kubevirt-plugin/pull/1032.

Comment 18 errata-xmlrpc 2023-05-18 02:56:16 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Virtualization 4.13.0 Images security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2023:3205
