Bug 2172544

Summary: empty libvirt metrics output when executing a Prometheus metrics query against a VM in Paused status
Product: Container Native Virtualization (CNV) Reporter: Ahmad <ahafe>
Component: Metrics Assignee: João Vilaça <jvilaca>
Status: CLOSED ERRATA QA Contact: Ahmad <ahafe>
Severity: high Docs Contact:
Priority: unspecified    
Version: 4.13.0 CC: dbasunag, kmajcher, lpivarc, sradco, stirabos
Target Milestone: ---   
Target Release: 4.13.1   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: hco-bundle-registry-container-v4.13.1.rhel9-11 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2023-08-16 14:09:56 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Ahmad 2023-02-22 13:45:46 UTC
Description of problem:

When running a Prometheus query for any libvirt metric against a virtual machine in Paused status, the output is empty.


Version-Release number of selected component: 4.12.1 - 4.13.0

How reproducible:
100%



Steps to Reproduce:
1. Create a VM > oc create -f examples/vm-cirros.yaml (or any other VM)
2. Pause the VM by executing the following command > virtctl pause vm <VM name>
3. Wait ~30 seconds
4. Run a Prometheus query for any libvirt metric, for example:
     oc exec -n openshift-monitoring prometheus-k8s-0 -c prometheus -- curl -s \
     http://127.0.0.1:9090/api/v1/query?query=kubevirt_vmi_cpu_usage_seconds | jq .
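The empty-result symptom can be detected programmatically from the query response; a minimal sketch (Python, with a hypothetical helper name, parsing the same JSON shape shown in the results below):

```python
import json

def prometheus_result_is_empty(response_text: str) -> bool:
    """Return True when a Prometheus instant-query response carries no samples."""
    body = json.loads(response_text)
    if body.get("status") != "success":
        raise ValueError("query failed: %s" % body.get("error", "unknown"))
    # An instant query returns a vector; an empty list means no series matched.
    return len(body["data"]["result"]) == 0

# Response shaped like the "Actual results" output on affected versions:
buggy = '{"status": "success", "data": {"resultType": "vector", "result": []}}'
print(prometheus_result_is_empty(buggy))  # True
```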

Actual results:

oc exec -n openshift-monitoring prometheus-k8s-0 -c prometheus -- curl -s http://127.0.0.1:9090/api/v1/query?query=kubevirt_vmi_cpu_usage_seconds | jq .
{
  "status": "success",
  "data": {
    "resultType": "vector",
    "result": []
  }
}



Expected results:


oc exec -n openshift-monitoring prometheus-k8s-0 -c prometheus -- curl -s http://127.0.0.1:9090/api/v1/query?query=kubevirt_vmi_cpu_usage_seconds | jq .
{
  "status": "success",
  "data": {
    "resultType": "vector",
    "result": [
      {
        "metric": {
          "__name__": "kubevirt_vmi_cpu_usage_seconds",
          "container": "virt-handler",
          "endpoint": "metrics",
          "instance": "10.131.0.79:8443",
          "job": "kubevirt-prometheus-metrics",
          "kubernetes_vmi_label_kubevirt_io_nodeName": "iuo-ahmad413d-s7dpx-worker-0-clgnq",
          "kubernetes_vmi_label_kubevirt_io_vm": "vm-cirros-source-hpp",
          "name": "vm-cirros-source-hpp",
          "namespace": "default",
          "node": "iuo-ahmad413d-s7dpx-worker-0-clgnq",
          "pod": "virt-handler-dzztm",
          "service": "kubevirt-prometheus-metrics"
        },
        "value": [
          1677072210.101,
          "52"
        ]
      }
    ]
  }
}


Additional info:

Comment 1 João Vilaça 2023-03-10 11:29:05 UTC
When a VM is paused, the `virt-launcher` pod is still able to output
information for `virsh domstats`. I took a look at the KubeVirt code and
everything is working fine there, but the call into libvirt
(https://github.com/kubevirt/kubevirt/blob/74e0b46786d550ab94e74502af17f34c94b5f95b/pkg/virt-launcher/virtwrap/cli/libvirt.go#L260)
is returning empty results. So I suspect the issue must be somewhere in
https://gitlab.com/libvirt/libvirt-go-module/-/blob/v1.9000.0/connect.go#L3117.

@lpivarc is this correct?

Comment 3 Ahmad 2023-06-05 11:41:33 UTC
QA: Fixed.
Tested on v4.13.1.
The Prometheus query now shows results for a VM in Paused status.

step 1: create a VM
step 2: pause the VM
step 3: execute the Prometheus query
-----

oc get vm
NAME                      AGE   STATUS    READY
fedora-imcy5t0e8cpglqfq   25h   Running   True
[cloud-user@ocp-ipi-executor-xl ahmad]$ oc exec -n openshift-monitoring prometheus-k8s-0 -c prometheus -- curl -s http://127.0.0.1:9090/api/v1/query?query=kubevirt_vmi_cpu_usage_seconds | jq .
{
  "status": "success",
  "data": {
    "resultType": "vector",
    "result": [
      {
        "metric": {
          "__name__": "kubevirt_vmi_cpu_usage_seconds",
          "container": "virt-handler",
          "endpoint": "metrics",
          "instance": "10.128.2.55:8443",
          "job": "kubevirt-prometheus-metrics",
          "kubernetes_vmi_label_kubevirt_io_domain": "fedora-imcy5t0e8cpglqfq",
          "kubernetes_vmi_label_kubevirt_io_nodeName": "c01-ahmad414-6h2hq-worker-0-slnsf",
          "kubernetes_vmi_label_kubevirt_io_size": "small",
          "name": "fedora-imcy5t0e8cpglqfq",
          "namespace": "default",
          "node": "c01-ahmad414-6h2hq-worker-0-slnsf",
          "pod": "virt-handler-ngzkm",
          "service": "kubevirt-prometheus-metrics"
        },
        "value": [
          1685963404.392,
          "717"
        ]
      }
    ]
  }
}
[cloud-user@ocp-ipi-executor-xl ahmad]$ virtctl pause vm fedora-imcy5t0e8cpglqfq
VMI fedora-imcy5t0e8cpglqfq was scheduled to pause
[cloud-user@ocp-ipi-executor-xl ahmad]$ oc get vm
NAME                      AGE   STATUS   READY
fedora-imcy5t0e8cpglqfq   25h   Paused   False
[cloud-user@ocp-ipi-executor-xl ahmad]$ oc exec -n openshift-monitoring prometheus-k8s-0 -c prometheus -- curl -s http://127.0.0.1:9090/api/v1/query?query=kubevirt_vmi_cpu_usage_seconds | jq .
{
  "status": "success",
  "data": {
    "resultType": "vector",
    "result": [
      {
        "metric": {
          "__name__": "kubevirt_vmi_cpu_usage_seconds",
          "container": "virt-handler",
          "endpoint": "metrics",
          "instance": "10.128.2.55:8443",
          "job": "kubevirt-prometheus-metrics",
          "kubernetes_vmi_label_kubevirt_io_domain": "fedora-imcy5t0e8cpglqfq",
          "kubernetes_vmi_label_kubevirt_io_nodeName": "c01-ahmad414-6h2hq-worker-0-slnsf",
          "kubernetes_vmi_label_kubevirt_io_size": "small",
          "name": "fedora-imcy5t0e8cpglqfq",
          "namespace": "default",
          "node": "c01-ahmad414-6h2hq-worker-0-slnsf",
          "pod": "virt-handler-ngzkm",
          "service": "kubevirt-prometheus-metrics"
        },
        "value": [
          1685963535.211,
          "717"
        ]
      }
    ]
  }
}

Comment 14 errata-xmlrpc 2023-08-16 14:09:56 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: OpenShift Virtualization 4.13.3 Images security and bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2023:4664