2241904 – Metric cnv:vmi_status_running:count show no datapoint found

Summary: Metric cnv:vmi_status_running:count show no datapoint found

Keywords:
Status:	CLOSED DUPLICATE of bug 2240675
Alias:	None
Product:	Container Native Virtualization (CNV)
Classification:	Red Hat
Component:	Metrics
Sub Component:
Version:	4.13.5
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	high
Target Milestone:	---
Target Release:	4.13.5
Assignee:	Assaf Admi
QA Contact:	Natalie Gavrielov
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2023-10-03 10:43 UTC by Akriti Gupta
Modified:	2024-02-03 04:25 UTC
CC List:	4 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2023-10-05 10:53:59 UTC
Target Upstream Version:
Embargoed:
Dependent Products:
Flags:	akrgupta: needinfo+ akrgupta: needinfo+

Attachments
cnv:vmi_status_running:count (12.71 KB, image/png) 2023-10-03 10:43 UTC, Akriti Gupta	no flags

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Issue Tracker	CNV-33662	0	None	None	None	2023-10-03 10:45:54 UTC

Description Akriti Gupta 2023-10-03 10:43:33 UTC

Created attachment 1991815
cnv:vmi_status_running:count

Description of problem: With VMs running on the cluster, the metric cnv:vmi_status_running:count fails to appear; no values are found.


Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1. Create a VM.
2. Check the metric cnv:vmi_status_running:count.

Actual results:
No Datapoints Found

Expected results:
The metric value shows the number of running VMs.

Additional info:

Comment 1 Assaf Admi 2023-10-04 08:25:48 UTC

Hi, using CNV v4.13.4, I don't encounter this issue. Once the VMs were created and started running, it took about 30 seconds for cnv:vmi_status_running:count to appear with the correct value. Prometheus evaluates rules every 1m by default, so the delay makes sense to me.

Akriti, any chance you evaluated cnv:vmi_status_running:count right after running the first VMs, without waiting long enough? 
If not, assuming you have a cluster with this issue, it would be really useful if you could attach the output of the following command:
"oc get prometheusrule prometheus-k8s-rules-cnv -n openshift-cnv -o yaml"

Comment 2 Assaf Admi 2023-10-04 08:30:14 UTC

Akriti, it would also be useful if you can specify the CNV version you encountered this issue with.

Comment 4 Assaf Admi 2023-10-04 11:11:41 UTC

The cnv:vmi_status_running:count recording rule expression is:
sum(kubevirt_vmi_phase_count{phase="running"}) by (node,os,workload,flavor)

I can now confirm there is an issue with the kubevirt_vmi_phase_count metric, which is not working at all, and this affects the cnv:vmi_status_running:count recording rule, since its expression depends on that metric.
The issue was probably introduced in https://github.com/kubevirt/kubevirt/pull/10424. First impacted version is v4.13.5.rhel9-20, according to http://cnv-version-explorer.apps.cnv2.engineering.redhat.com/?cPRs=10424.
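
To illustrate why a missing kubevirt_vmi_phase_count leaves the recording rule with nothing to show, here is a minimal Python sketch that emulates the "sum ... by (node,os,workload,flavor)" aggregation over in-memory samples. The sample data and the sum_by helper are hypothetical and exist only for illustration; this is not how Prometheus is implemented. The point is that aggregating over an empty input produces an empty result rather than a zero, which is consistent with the "No Datapoints Found" message.

```python
from collections import defaultdict

# Labels used in the "by (...)" clause of the recording rule expression.
GROUP_BY = ("node", "os", "workload", "flavor")


def sum_by(samples, group_by=GROUP_BY, phase="running"):
    """Emulate sum(metric{phase="..."}) by (group_by).

    samples is a list of (labels_dict, value) pairs (hypothetical input shape).
    Returns a dict mapping each label-value tuple to the summed value.
    """
    totals = defaultdict(float)
    for labels, value in samples:
        # Label matcher: keep only series in the requested phase.
        if labels.get("phase") != phase:
            continue
        # A label that is absent groups the same way as an empty label value.
        key = tuple(labels.get(name, "") for name in group_by)
        totals[key] += value
    return dict(totals)


# Hypothetical samples for a cluster where the source metric is exported.
healthy = [
    ({"node": "node-a", "os": "fedora", "workload": "server",
      "flavor": "small", "phase": "running"}, 2),
    ({"node": "node-a", "os": "fedora", "workload": "server",
      "flavor": "small", "phase": "pending"}, 1),
    ({"node": "node-b", "os": "rhel9", "workload": "server",
      "flavor": "medium", "phase": "running"}, 1),
]

# The situation described in this bug: the source metric has no series at all.
broken = []

print(sum_by(healthy))  # two groups, one per distinct label combination
print(sum_by(broken))   # {} : no groups, hence nothing to record
```

With the source metric absent, the result has no groups at all, so the recording rule produces no series regardless of how many VMs are actually running.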

Joao, any idea what could be the root cause?

Comment 5 Shirly Radco 2023-10-05 08:09:52 UTC

As part of the fix for this bug, please add an upstream test to verify that the metric exists and its value is correct.
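
As a rough idea of what such a check could assert, the Python sketch below parses a Prometheus instant-query response (the JSON shape returned by the /api/v1/query HTTP endpoint) and separates the "no datapoints" case from a wrong value. It is not the upstream test itself; the helper names and the sample responses are hypothetical, and a real test would fetch the response from a live cluster after waiting at least one rule evaluation interval.

```python
import json


def running_vmi_total(response_text):
    """Sum the values of all series in an instant-query response.

    Returns None when the query succeeded but returned no series,
    which corresponds to "No Datapoints Found" in the console.
    """
    payload = json.loads(response_text)
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload.get('error', 'unknown error')}")
    result = payload["data"]["result"]
    if not result:
        return None
    # Each vector entry carries "value": [timestamp, "<value as string>"].
    return sum(float(series["value"][1]) for series in result)


def check_metric(response_text, expected_running):
    """Assert that the metric exists and matches the expected VM count."""
    total = running_vmi_total(response_text)
    assert total is not None, "cnv:vmi_status_running:count returned no datapoints"
    assert total == expected_running, f"expected {expected_running}, got {total}"


# Hypothetical responses, hand-written for illustration only.
EMPTY_RESPONSE = json.dumps(
    {"status": "success", "data": {"resultType": "vector", "result": []}}
)
OK_RESPONSE = json.dumps({
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {"metric": {"node": "node-a"}, "value": [1696500000, "2"]},
            {"metric": {"node": "node-b"}, "value": [1696500000, "1"]},
        ],
    },
})

check_metric(OK_RESPONSE, expected_running=3)  # passes silently
print(running_vmi_total(EMPTY_RESPONSE))       # None
```

Checking existence separately from the value keeps the failure message specific: an empty result points at the metric not being exported, while a mismatch points at incorrect counting.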

Comment 6 Assaf Admi 2023-10-05 10:53:59 UTC


*** This bug has been marked as a duplicate of bug 2240675 ***

Comment 7 Red Hat Bugzilla 2024-02-03 04:25:13 UTC

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days
