1772889 – VM Dashboard doesn't display historical data for CPU Utilization

Summary: VM Dashboard doesn't display historical data for CPU Utilization

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Console Kubevirt Plugin
Sub Component:
Version:	4.3.0
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	low
Target Milestone:	---
Target Release:	4.4.0
Assignee:	Marek Libra
QA Contact:	Nelly Credi
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2019-11-15 13:11 UTC by Radim Hrazdil
Modified:	2020-05-04 11:16 UTC
CC List:	5 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2020-05-04 11:15:35 UTC
Target Upstream Version:
Embargoed:

Attachments
screenshot (24.97 KB, image/png) 2019-11-15 13:11 UTC, Radim Hrazdil	no flags

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Github	openshift console pull 4227	0	None	closed	Bug 1772889: Do not show VM Utilization graphs for an off-VM	2020-03-05 10:09:54 UTC
Red Hat Product Errata	RHBA-2020:0581	0	None	None	None	2020-05-04 11:16:06 UTC

Description Radim Hrazdil 2019-11-15 13:11:03 UTC

Created attachment 1636494
screenshot

Description of problem:
When a VM is started and then stopped, the VM Dashboard Utilization card still displays historical data for Memory, Filesystem and Network, but not for CPU.



Version-Release number of selected component (if applicable):
4.3.0-0.nightly-2019-11-02-092336

How reproducible:
100%

Steps to Reproduce:
1. Start a VM
2. Stop the VM

Actual results:
CPU data is not available, while the other utilization data (Memory, Filesystem, Network) remains displayed.

Expected results:


Additional info:

Comment 1 Marek Libra 2019-11-15 13:29:56 UTC

Data points are retrieved from Prometheus. There is currently no suitable CPU metric for VMs, so a pod metric is reused here, matched via the launcher pod name.

When the VM is off, the launcher pod is unknown, so no CPU data points are matched.
Data points for the other metrics are looked up by VM name, so they remain available.
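The difference in matching can be illustrated with a minimal TypeScript sketch. The metric names (`kubevirt_vmi_memory_resident_bytes`, `pod:container_cpu_usage:sum`) and the `VmRef` shape are assumptions for illustration only, not necessarily the exact queries the console builds. The point is that a VM-level query only needs the VM name, while the CPU query cannot be built at all once the launcher pod is gone.

```typescript
// Hypothetical reference to a VM; launcherPodName is undefined when the VM is off.
interface VmRef {
  name: string;
  namespace: string;
  launcherPodName?: string;
}

// VM-level metric (assumed name): matched by VM name, so historical
// data is still found after the VM is stopped.
function memoryQuery(vm: VmRef): string {
  return `kubevirt_vmi_memory_resident_bytes{name='${vm.name}',namespace='${vm.namespace}'}`;
}

// No VM-level CPU metric: reuse a pod metric (assumed name), matched by
// the launcher pod name. With the VM off there is no pod to match.
function cpuQuery(vm: VmRef): string | null {
  if (!vm.launcherPodName) {
    return null; // launcher pod unknown: no CPU data points can be matched
  }
  return `pod:container_cpu_usage:sum{pod='${vm.launcherPodName}',namespace='${vm.namespace}'}`;
}
```

For a stopped VM, `memoryQuery` still returns a usable query, while `cpuQuery` returns `null`, which is exactly the asymmetry seen on the Utilization card.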

We can:
- disable utilization card when the VM is off
- add cpu metric for VMs
- document this state and provide a suitable tooltip (but that is still a bad user experience, imo)

I think the best option is the 2nd one, but it would take a long time to implement (cross-team).
With the 1st option we would lose only the historical data, since no new data points arrive while the VM is off. So this sounds like the best option for now.

Comment 2 Tomas Jelinek 2019-11-18 09:09:25 UTC

(In reply to Marek Libra from comment #1)
> Data points are retrieved from Prometheus. There is recently no suitable CPU
> metric for VMs, so a pod-metric is reused here, matching is via launcher-pod
> name.

Francesco: I think you were exposing the VM metrics to Prometheus. Do you know if the missing metrics are being tracked anywhere?

> 
> When the VM is off, the launcher-pod is unknown and no cpu data points are
> matched.
> Data points for other metrics are searched by VM name, so they stay
> available.
> 
> We can:
> - disable utilization card when the VM is off
> - add cpu metric for VMs
> - document this state and provide suitable tooltip (but still bad user
> experience, imo)
> 
> I think the best option is the 2nd one but will take long to implement
> (cross-team).
> With the 1st option we will use just the historical data as no new data
> points are coming if the VM is off. So this sounds like the best option for
> now.

Comment 3 Francesco Romani 2019-11-18 09:32:17 UTC

(In reply to Tomas Jelinek from comment #2)
> (In reply to Marek Libra from comment #1)
> > Data points are retrieved from Prometheus. There is recently no suitable CPU
> > metric for VMs, so a pod-metric is reused here, matching is via launcher-pod
> > name.
> 
> Francesco: I think you were exposing the VM metrics to prometheus. Do you
> know if there is some place where the missing metrics are being tracked?

Indeed I was, but I'm not working on this anymore; sgott knows who is taking care of metrics now.
That said, I'm not aware of any tracking of missing metrics.
Please also note that, AFAIK, there are no gaps in metrics with respect to oVirt.

Comment 4 Marek Libra 2019-11-18 09:40:33 UTC

sgott, can you please suggest a suitable metric to track VM CPU utilization? If it is missing, is there a place to track it? Can it be added in the 4.3 time frame?

Comment 5 Francesco Romani 2019-11-18 09:45:55 UTC

(In reply to Marek Libra from comment #4)
> sgott, can you please suggest suitable metric to track VM CPU utilization?
> If it is missing, is there a place to track it? Can it be added in 4.3
> time-frame?

Just curious, could you please elaborate on why https://github.com/kubevirt/kubevirt/blob/master/pkg/monitoring/vms/prometheus/prometheus.go#L89 isn't good enough? It is supposed to provide the raw data from which we can compute the CPU utilization.

Comment 6 Marek Libra 2019-11-18 10:56:29 UTC

IIUC, the unit for this metric is time. Per the design, however, the UI is expected to show cores/millicores, to stay aligned with the other resource pages (namely pods).
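Converting a time-based metric into cores is a rate computation: the CPU time consumed in an interval divided by the wall-clock length of that interval. The TypeScript sketch below assumes the KubeVirt metric is a cumulative CPU-time counter in seconds; `CounterSample`, `coresBetween` and `toMillicores` are hypothetical helpers. It is similar in spirit to PromQL `rate()`, but simplified: a counter reset is reported as `NaN` instead of being compensated for.

```typescript
// One sample of a cumulative CPU-time counter (assumed to be in seconds).
interface CounterSample {
  timestampSec: number; // wall-clock time of the sample, in seconds
  cpuSeconds: number;   // total CPU time consumed so far, in seconds
}

// Average cores used between two samples:
// (delta CPU seconds) / (delta wall-clock seconds).
function coresBetween(a: CounterSample, b: CounterSample): number {
  const wall = b.timestampSec - a.timestampSec;
  if (wall <= 0) {
    throw new Error("samples must be in increasing time order");
  }
  const cpu = b.cpuSeconds - a.cpuSeconds;
  // A negative delta means the counter was reset (e.g. the VM restarted);
  // this sketch reports "no value" instead of compensating like rate().
  return cpu < 0 ? NaN : cpu / wall;
}

// Cores -> millicores, as shown on the pod pages.
function toMillicores(cores: number): number {
  return Math.round(cores * 1000);
}
```

For example, 30 CPU seconds consumed over a 60-second window gives `coresBetween(...) === 0.5`, i.e. 500 millicores.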

Comment 7 Marek Libra 2019-11-19 09:12:17 UTC

The NEEDINFO flag was accidentally removed by the previous comment; adding it back.

Comment 8 sgott 2020-01-31 15:50:11 UTC

Strictly speaking, I'm not sure it's possible to collect VM CPU utilization other than the way we already do, because we can't guarantee that a guest user agent will be present.

Thus we cannot provide a CPU metric for VMs in general (as requested in Comment #1). Furthermore, adding a pod metric to a VM and calling it a VM metric is really not a reasonable course of action at the KubeVirt API level. In my opinion, "disable utilization card when the VM is off" is really the best long-term solution here.

Comment 9 Marek Libra 2020-02-05 11:18:12 UTC

There's an additional issue that can be fixed along with the original one:
Once the VM is stopped, no new data points are created, so the last data point in the graph stays stuck at the value from the time the VM was stopped.
Since the time axis has no labels and does not reflect reality, the value (Y axis) never drops to 0, as one would intuitively expect.

Both issues can be fixed by:
- showing "No datapoints found." for an off-VM
- dropping (filtering out) data points with timestamps older than the creation timestamp of the VirtualMachineInstance object
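Both fixes can be sketched together in TypeScript. This assumes that an off-VM has no VirtualMachineInstance, so its creation timestamp is `undefined`, and that the timestamp is an RFC 3339 string as in Kubernetes object metadata. `DataPoint` and `utilizationSeries` are hypothetical names, not the console's actual code.

```typescript
// One point of a utilization graph (hypothetical shape).
interface DataPoint {
  timestampMs: number; // epoch milliseconds
  value: number;
}

const NO_DATA_MESSAGE = "No datapoints found.";

// Returns the points to plot, or the empty-state message.
function utilizationSeries(
  points: DataPoint[],
  vmiCreationTimestamp: string | undefined, // undefined: no VMI, VM is off
): DataPoint[] | string {
  if (vmiCreationTimestamp === undefined) {
    return NO_DATA_MESSAGE; // off-VM: show no graph at all
  }
  const createdMs = Date.parse(vmiCreationTimestamp);
  // Drop points recorded before the current VMI existed (e.g. a previous run),
  // so stale values from an earlier run are not plotted.
  const current = points.filter((p) => p.timestampMs >= createdMs);
  return current.length > 0 ? current : NO_DATA_MESSAGE;
}
```

Filtering by the VMI creation timestamp, rather than by a fixed time window, ties the graph to the current run of the VM regardless of how long it was stopped.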

Comment 11 Guohua Ouyang 2020-02-24 03:03:44 UTC

Verified on 4.4.0-0.nightly-2020-02-22-102956.

Comment 13 errata-xmlrpc 2020-05-04 11:15:35 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0581
