Bug 2241904

Summary:

Metric cnv:vmi_status_running:count shows "No Datapoints Found"

Product:

Container Native Virtualization (CNV)

Reporter:

Akriti Gupta <akrgupta>

Component:

Metrics

Assignee:

Assaf Admi <aadmi>

Status:

CLOSED DUPLICATE

QA Contact:

Natalie Gavrielov <ngavrilo>

Severity:

high

Docs Contact:

Priority:

unspecified

Version:

4.13.5

CC:

aadmi, jvilaca, sradco, stirabos

Target Milestone:

---

Flags:

akrgupta: needinfo+
akrgupta: needinfo+

Target Release:

4.13.5

Hardware:

Unspecified

OS:

Unspecified

Last Closed:

2023-10-05 10:53:59 UTC

Type:

Bug

Attachments:

Description	Flags
cnv:vmi_status_running:count	none

Description Akriti Gupta 2023-10-03 10:43:33 UTC

Created attachment 1991815
cnv:vmi_status_running:count

Description of problem: With VMs running on the cluster, the metric cnv:vmi_status_running:count does not appear; no values are found.


Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1. Create a VM.
2. Check the metric cnv:vmi_status_running:count.

Actual results:
No Datapoints Found

Expected results:
The metric value shows the number of running VMs.

Additional info:

Comment 1 Assaf Admi 2023-10-04 08:25:48 UTC

Hi, using CNV v4.13.4, I don't encounter this issue. Once I created VMs and they started running, it took about 30 seconds for cnv:vmi_status_running:count to appear with the correct value. Prometheus has a default of 1m for evaluating rules, so the delay makes sense to me.

Akriti, is there any chance you queried cnv:vmi_status_running:count right after starting the first VMs, without waiting long enough?
If not, assuming you have a cluster with this issue, it would be really useful if you could attach the output of the following command:
"oc get prometheusrule prometheus-k8s-rules-cnv -n openshift-cnv -o yaml"
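The delay reasoning above (a recorded series can only appear after the source metric has been scraped and the rule has then been evaluated) can be made concrete with a small Python sketch. It assumes perfectly aligned scrape and evaluation ticks and a hypothetical 30 s scrape interval; real Prometheus staggers scrapes, so the numbers are illustrative only. Under these assumptions the delay is always less than scrape_interval + eval_interval, so a ~30 second wait sits within the 1m evaluation window.

```python
import math


def first_recorded_at(t_start, scrape_interval, eval_interval):
    """Earliest time (s) a recording rule can reflect a VM that started at t_start.

    Simplifying assumptions (not Prometheus internals): scrapes and rule
    evaluations fire exactly on multiples of their intervals from t=0, and a
    sample scraped at time t is visible to a rule evaluated at the same t.
    """
    # First scrape at or after the VM starts running.
    scraped_at = math.ceil(t_start / scrape_interval) * scrape_interval
    # First rule evaluation at or after that scrape.
    return math.ceil(scraped_at / eval_interval) * eval_interval


# Hypothetical 30 s scrape interval combined with a 1m rule evaluation interval.
for t in (10, 35, 60):
    delay = first_recorded_at(t, 30, 60) - t
    print(f"VM running at t={t}s -> recorded after {delay}s")
```

With these inputs the printed delays are 50 s, 25 s, and 0 s, depending only on where the VM start falls relative to the scrape and evaluation ticks.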

Comment 2 Assaf Admi 2023-10-04 08:30:14 UTC

Akriti, it would also be useful if you could specify the CNV version you encountered this issue with.

Comment 4 Assaf Admi 2023-10-04 11:11:41 UTC

The cnv:vmi_status_running:count recording rule expression is:
sum(kubevirt_vmi_phase_count{phase="running"}) by (node,os,workload,flavor)

I can now confirm there is an issue with the kubevirt_vmi_phase_count metric, which is not working at all. Because the cnv:vmi_status_running:count recording rule is computed from that metric, the rule produces no data either.
The issue was probably introduced in https://github.com/kubevirt/kubevirt/pull/10424. The first impacted version is v4.13.5.rhel9-20, according to http://cnv-version-explorer.apps.cnv2.engineering.redhat.com/?cPRs=10424.
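To illustrate why a broken source metric leaves the recording rule empty, here is a minimal Python sketch that emulates the quoted expression over an in-memory list of samples. The (labels, value) sample model, the `pod` label, and the label values (`worker-0`, `rhel9`, `server`, `small`, `virt-handler-a/b`) are hypothetical; this is not how Prometheus evaluates rules internally.

```python
from collections import defaultdict

# Grouping labels copied from the recording rule expression above.
GROUP_BY = ("node", "os", "workload", "flavor")


def vmi_status_running_count(samples):
    """Emulate sum(kubevirt_vmi_phase_count{phase="running"}) by (node,os,workload,flavor).

    `samples` is a list of (labels, value) pairs standing in for the instant
    vector of kubevirt_vmi_phase_count (a simplified, hypothetical model).
    Returns a dict mapping each grouping-label tuple to the summed value.
    """
    totals = defaultdict(float)
    for labels, value in samples:
        # Equivalent of the label matcher phase="running".
        if labels.get("phase") != "running":
            continue
        # PromQL treats an absent label like an empty-string value, so default to "".
        key = tuple(labels.get(name, "") for name in GROUP_BY)
        totals[key] += value
    return dict(totals)


# Hypothetical series: two running series on one node (distinguished by a
# hypothetical "pod" label) and one series in a different phase.
common = {"node": "worker-0", "os": "rhel9", "workload": "server", "flavor": "small"}
samples = [
    ({**common, "phase": "running", "pod": "virt-handler-a"}, 2.0),
    ({**common, "phase": "running", "pod": "virt-handler-b"}, 1.0),
    ({**common, "phase": "scheduling", "pod": "virt-handler-a"}, 1.0),
]

print(vmi_status_running_count(samples))  # {('worker-0', 'rhel9', 'server', 'small'): 3.0}
# If the source metric is not exported at all, there is nothing to sum.
print(vmi_status_running_count([]))  # {}
```

An empty result here corresponds to the "No Datapoints Found" seen in the console: PromQL's sum over an empty vector returns no series rather than a zero.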

Joao, any idea what could be the root cause?

Comment 5 Shirly Radco 2023-10-05 08:09:52 UTC

As part of the fix for this bug, please add an upstream test to verify that the metric exists and that its value is correct.

Comment 6 Assaf Admi 2023-10-05 10:53:59 UTC


*** This bug has been marked as a duplicate of bug 2240675 ***

Comment 7 Red Hat Bugzilla 2024-02-03 04:25:13 UTC

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days