Created attachment 1991815 [details]
cnv:vmi_status_running:count

Description of problem:
With VMs running on the cluster, the metric cnv:vmi_status_running:count fails to appear; no values are found.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1. Create a VM.
2. Check the metric cnv:vmi_status_running:count.
3.

Actual results:
No Datapoints Found

Expected results:
The metric value shows the number of VMs running.

Additional info:
Hi, using CNV v4.13.4, I don't encounter this issue. Once I created VMs and they started running, it took about 30 seconds for cnv:vmi_status_running:count to appear with the correct value. Prometheus evaluates rules every 1m by default, so the delay makes sense to me. Akriti, is there any chance you checked cnv:vmi_status_running:count right after starting the first VMs, without waiting long enough? If not, and assuming you have a cluster with this issue, it would be really useful if you could attach the output of the following command: "oc get prometheusrule prometheus-k8s-rules-cnv -n openshift-cnv -o yaml"
Akriti, it would also be useful if you can specify the CNV version you encountered this issue with.
The cnv:vmi_status_running:count recording rule expression is: sum(kubevirt_vmi_phase_count{phase="running"}) by (node,os,workload,flavor). I can now confirm there is an issue with the kubevirt_vmi_phase_count metric, which is not working at all, and this affects the cnv:vmi_status_running:count recording rule. The issue was probably introduced in https://github.com/kubevirt/kubevirt/pull/10424. The first impacted version is v4.13.5.rhel9-20, according to http://cnv-version-explorer.apps.cnv2.engineering.redhat.com/?cPRs=10424. Joao, any idea what could be the root cause?
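To illustrate why a missing kubevirt_vmi_phase_count leaves the recording rule with no datapoints, here is a minimal Python sketch that emulates the aggregation sum(...) by (node,os,workload,flavor) over in-memory samples. It is not how Prometheus evaluates rules internally; the function name running_vmi_count and the sample label values are hypothetical and used only for illustration.

```python
from collections import defaultdict

# Labels the recording rule groups by, taken from the rule expression.
GROUP_BY = ("node", "os", "workload", "flavor")


def running_vmi_count(samples):
    """Emulate sum(kubevirt_vmi_phase_count{phase="running"}) by (node,os,workload,flavor).

    `samples` is a list of (labels_dict, value) pairs standing in for the
    series of kubevirt_vmi_phase_count (hypothetical in-memory form).
    """
    totals = defaultdict(float)
    for labels, value in samples:
        # Keep only series matching the selector phase="running".
        if labels.get("phase") != "running":
            continue
        # A missing label is treated as an empty string when grouping.
        group = tuple(labels.get(name, "") for name in GROUP_BY)
        totals[group] += value
    return dict(totals)


# Hypothetical samples: label values are made up for this example.
samples = [
    ({"node": "node-a", "os": "fedora", "workload": "server",
      "flavor": "small", "phase": "running"}, 2),
    ({"node": "node-a", "os": "fedora", "workload": "server",
      "flavor": "small", "phase": "pending"}, 1),
    ({"node": "node-b", "os": "rhel9", "workload": "server",
      "flavor": "medium", "phase": "running"}, 1),
]

result = running_vmi_count(samples)

# With no input series at all, the aggregation produces no output series,
# which is what shows up as "No Datapoints Found" for the recording rule.
empty = running_vmi_count([])
```

When the underlying metric exposes no series, the aggregated result is empty rather than zero, which matches the symptom reported here.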
As part of the fix for this bug, please add an upstream test to verify that the metric exists and that its value is correct.
*** This bug has been marked as a duplicate of bug 2240675 ***
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days