Bug 2241904 - Metric cnv:vmi_status_running:count show no datapoint found
Summary: Metric cnv:vmi_status_running:count show no datapoint found
Keywords:
Status: CLOSED DUPLICATE of bug 2240675
Alias: None
Product: Container Native Virtualization (CNV)
Classification: Red Hat
Component: Metrics
Version: 4.13.5
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 4.13.5
Assignee: Assaf Admi
QA Contact: Natalie Gavrielov
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2023-10-03 10:43 UTC by Akriti Gupta
Modified: 2024-02-03 04:25 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-10-05 10:53:59 UTC
Target Upstream Version:
Embargoed:
akrgupta: needinfo+
akrgupta: needinfo+


Attachments
cnv:vmi_status_running:count (12.71 KB, image/png)
2023-10-03 10:43 UTC, Akriti Gupta
no flags


Links
System                 ID         Private  Priority  Status  Summary  Last Updated
Red Hat Issue Tracker  CNV-33662  0        None      None    None     2023-10-03 10:45:54 UTC

Description Akriti Gupta 2023-10-03 10:43:33 UTC
Created attachment 1991815
cnv:vmi_status_running:count

Description of problem: With VMs running on the cluster, the metric cnv:vmi_status_running:count does not appear; the query returns no values.


Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1. Create a VM.
2. Check the metric cnv:vmi_status_running:count.

Actual results:
No Datapoints Found

Expected results:
The metric value shows the number of running VMs.
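
A minimal sketch of how step 2 could be checked programmatically, assuming the metric is read through Prometheus' instant-query HTTP endpoint (/api/v1/query). The base URL and the helper names are hypothetical; authentication and the HTTP request itself are left out. An empty result vector corresponds to the "No Datapoints Found" state reported above.

```python
import json
from urllib.parse import urlencode

# Hypothetical base URL (assumption); replace with the cluster's Prometheus route.
PROM_URL = "https://prometheus.example.invalid"


def instant_query_url(base_url: str, expr: str) -> str:
    """Build the URL for an instant query against /api/v1/query."""
    return f"{base_url}/api/v1/query?{urlencode({'query': expr})}"


def running_vmi_total(response_body: str):
    """Sum all series of an instant-vector response.

    Returns None when the result vector is empty, which is what the
    console reports as "No Datapoints Found".
    """
    payload = json.loads(response_body)
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload.get('error', 'unknown error')}")
    result = payload["data"]["result"]
    if not result:
        return None
    # Each sample is {"metric": {...}, "value": [timestamp, "value-as-string"]}.
    return sum(float(sample["value"][1]) for sample in result)
```

With this helper, the expected result is a number equal to the count of running VMs, and the actual result in this report would be None.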

Additional info:

Comment 1 Assaf Admi 2023-10-04 08:25:48 UTC
Hi, using CNV v4.13.4, I don't encounter this issue. Once I created VMs and they started running, it took about 30 seconds for cnv:vmi_status_running:count to appear with the correct value. Prometheus evaluates rules every 1m by default, so the delay makes sense to me.

Akriti, any chance you evaluated cnv:vmi_status_running:count right after running the first VMs, without waiting long enough? 
If not, assuming you have a cluster with this issue, it would be really useful if you could attach the output of the following command:
"oc get prometheusrule prometheus-k8s-rules-cnv -n openshift-cnv -o yaml"
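
To illustrate the timing argument above: a recording rule only produces a sample at its next evaluation tick, so a freshly started VM can be invisible for up to one full interval. The sketch below is a simplified model, assuming evaluations happen at fixed multiples of the interval starting from a hypothetical first_eval_s; it ignores scrape delay, and the function names are illustrative only.

```python
import math


def next_rule_evaluation(event_time_s: float,
                         interval_s: float = 60.0,
                         first_eval_s: float = 0.0) -> float:
    """Timestamp of the first evaluation tick at or after event_time_s."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    # Number of whole intervals needed to reach or pass the event.
    ticks = math.ceil((event_time_s - first_eval_s) / interval_s)
    return first_eval_s + max(ticks, 0) * interval_s


def delay_until_visible(event_time_s: float,
                        interval_s: float = 60.0,
                        first_eval_s: float = 0.0) -> float:
    """Seconds between the event and the next rule evaluation."""
    return next_rule_evaluation(event_time_s, interval_s, first_eval_s) - event_time_s
```

Under this model, a VM that starts running 30 seconds after a tick waits 30 seconds with a 1m interval, which is in line with the delay observed on v4.13.4.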

Comment 2 Assaf Admi 2023-10-04 08:30:14 UTC
Akriti, it would also be useful if you can specify the CNV version you encountered this issue with.

Comment 4 Assaf Admi 2023-10-04 11:11:41 UTC
The cnv:vmi_status_running:count recording rule expression is:
sum(kubevirt_vmi_phase_count{phase="running"}) by (node,os,workload,flavor)

I can now confirm that the kubevirt_vmi_phase_count metric is not working at all, and this breaks the cnv:vmi_status_running:count recording rule, which is computed from it.
The issue was probably introduced in https://github.com/kubevirt/kubevirt/pull/10424. The first impacted version is v4.13.5.rhel9-20, according to http://cnv-version-explorer.apps.cnv2.engineering.redhat.com/?cPRs=10424.

Joao, any idea what could be the root cause?
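
The dependency described in this comment can be sketched in plain Python. This is an emulation of what the recording rule expression computes, not the Prometheus implementation; the sample label values used with it are hypothetical. It shows why an absent kubevirt_vmi_phase_count leaves cnv:vmi_status_running:count with no data points at all, rather than a value of 0.

```python
from collections import defaultdict

# Grouping labels taken from the recording rule expression.
GROUP_BY = ("node", "os", "workload", "flavor")


def vmi_status_running_count(samples):
    """Emulate sum(kubevirt_vmi_phase_count{phase="running"}) by (node,os,workload,flavor).

    `samples` is an iterable of (labels_dict, value) pairs standing in for
    the kubevirt_vmi_phase_count series. Returns a dict keyed by the
    grouping label values; an empty input yields an empty dict.
    """
    totals = defaultdict(float)
    for labels, value in samples:
        # Label matcher: keep only series in the "running" phase.
        if labels.get("phase") != "running":
            continue
        # A missing grouping label is treated as the empty string.
        key = tuple(labels.get(name, "") for name in GROUP_BY)
        totals[key] += value
    return dict(totals)
```

When the source metric exports no series, the function returns an empty dict, matching the "No Datapoints Found" result in the description.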

Comment 5 Shirly Radco 2023-10-05 08:09:52 UTC
As part of the fix for this bug please add an upstream test to verify that the metric exists and its value is correct.

Comment 6 Assaf Admi 2023-10-05 10:53:59 UTC

*** This bug has been marked as a duplicate of bug 2240675 ***

Comment 7 Red Hat Bugzilla 2024-02-03 04:25:13 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days
