Bug 2196251

Summary: [Kubevirt] Metrics names failed promlint linter
Product: Container Native Virtualization (CNV) Reporter: Aviv Litman <alitman>
Component: Metrics Assignee: Shirly Radco <sradco>
Status: CLOSED MIGRATED QA Contact: Ahmad <ahafe>
Severity: medium Docs Contact:
Priority: medium    
Version: 4.13.0 CC: dbasunag, kmajcher, sradco, stirabos
Target Milestone: ---   
Target Release: 4.15.0   
Hardware: All   
OS: All   
Whiteboard:
Fixed In Version: hco-bundle-registry-container-v4.15.0.rhel9-1200 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2023-12-14 16:18:08 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Aviv Litman 2023-05-08 13:13:52 UTC
Description of problem:
We created a metric name linter in kubevirt based on promlint: https://github.com/kubevirt/kubevirt/pull/9709.
The following metric names failed:

$ make lint-metrics
hack/dockerized "./hack/prom-metric-linter/metric_name_linter.sh"
go version go1.19.2 linux/amd64

go version go1.19.2 linux/amd64
kubevirt_migrate_vmi_pending_count: non-histogram and non-summary metrics should not have "_count" suffix
kubevirt_migrate_vmi_running_count: non-histogram and non-summary metrics should not have "_count" suffix
kubevirt_migrate_vmi_scheduling_count: non-histogram and non-summary metrics should not have "_count" suffix
kubevirt_vmi_cpu_affinity: counter metrics should have "_total" or "_timestamp_seconds" suffix
kubevirt_vmi_filesystem_capacity_bytes_total: non-counter metrics should not have "_total" suffix
kubevirt_vmi_memory_domain_bytes_total: non-counter metrics should not have "_total" suffix
kubevirt_vmi_memory_pgmajfault: counter metrics should have "_total" or "_timestamp_seconds" suffix
kubevirt_vmi_memory_pgminfault: counter metrics should have "_total" or "_timestamp_seconds" suffix
kubevirt_vmi_memory_swap_in_traffic_bytes_total: non-counter metrics should not have "_total" suffix
kubevirt_vmi_memory_swap_out_traffic_bytes_total: non-counter metrics should not have "_total" suffix
kubevirt_vmi_outdated_count: non-histogram and non-summary metrics should not have "_count" suffix
kubevirt_vmi_storage_flush_times_ms_total: metric names should not contain abbreviated units
kubevirt_vmi_storage_read_times_ms_total: metric names should not contain abbreviated units
kubevirt_vmi_storage_write_times_ms_total: metric names should not contain abbreviated units
kubevirt_vmi_vcpu_seconds: counter metrics should have "_total" or "_timestamp_seconds" suffix
kubevirt_vmi_vcpu_wait_seconds: counter metrics should have "_total" or "_timestamp_seconds" suffix
kubevirt_vmsnapshot_disks_restored_from_source_total: non-counter metrics should not have "_total" suffix
make: *** [Makefile:213: lint-metrics] Error 1
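
The four distinct messages in the log above come from promlint checks on metric name suffixes and units. The Go sketch below approximates those checks with plain string matching, to make it easier to see why each name fails and what a compliant rename looks like. It is not promlint itself: `lintName`, `MetricType`, and the unit list are hypothetical. The real linter (the `promlint` package in Prometheus client_golang) inspects parsed metric families and may apply additional rules. The metric types in `main` are also assumptions, because the bug does not state them.

```go
package main

import (
	"fmt"
	"strings"
)

// MetricType is a hypothetical stand-in for the Prometheus metric types
// that promlint inspects; it is not the client_golang type.
type MetricType int

const (
	Counter MetricType = iota
	Gauge
	Histogram
	Summary
	Untyped
)

// unitAbbreviations is an illustrative subset of abbreviated units;
// the list used by the real promlint may differ.
var unitAbbreviations = []string{"s", "ms", "us", "ns", "sec", "b", "kb", "mb", "gb"}

// lintName approximates the four naming rules reported in this bug and
// returns one message per violated rule (empty slice if the name passes).
func lintName(name string, t MetricType) []string {
	var problems []string
	n := strings.ToLower(name)

	// "_count" is reserved for the sample-count series of histograms and summaries.
	if t != Histogram && t != Summary && strings.HasSuffix(n, "_count") {
		problems = append(problems, `non-histogram and non-summary metrics should not have "_count" suffix`)
	}

	hasTotal := strings.HasSuffix(n, "_total")
	// Counters must end in "_total" (or "_timestamp_seconds" in the promlint version used here).
	if t == Counter && !hasTotal && !strings.HasSuffix(n, "_timestamp_seconds") {
		problems = append(problems, `counter metrics should have "_total" or "_timestamp_seconds" suffix`)
	}
	// Typed metrics that are not counters must not end in "_total".
	if t != Counter && t != Untyped && hasTotal {
		problems = append(problems, `non-counter metrics should not have "_total" suffix`)
	}

	// Abbreviated units such as "_ms_" should be spelled out as base units ("_seconds").
	for _, u := range unitAbbreviations {
		if strings.Contains(n, "_"+u+"_") || strings.HasSuffix(n, "_"+u) {
			problems = append(problems, "metric names should not contain abbreviated units")
			break
		}
	}
	return problems
}

func main() {
	// Old and new names taken from this bug; the metric types are assumptions.
	examples := []struct {
		name string
		typ  MetricType
	}{
		{"kubevirt_migrate_vmi_pending_count", Gauge},
		{"kubevirt_migrate_vmi_pending", Gauge},
		{"kubevirt_vmi_storage_read_times_ms_total", Counter},
		{"kubevirt_vmi_storage_read_times_seconds_total", Counter},
	}
	for _, e := range examples {
		fmt.Printf("%s: %v\n", e.name, lintName(e.name, e.typ))
	}
}
```

Each old name should report one problem, and its renamed counterpart should report none.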

Version-Release number of selected component (if applicable):
4.13

How reproducible:
100%

Steps to Reproduce:
1. cd kubevirt
2. make lint-metrics

Actual results:
Some metric names do not comply with promlint.

Expected results:
Metric names comply with the promlint linter and Prometheus naming best practices.

Additional info:
For now, the metrics listed above are ignored by the linter.

Comment 1 Krzysztof Majcher 2023-09-26 12:52:03 UTC
Waiting for the fix for https://bugzilla.redhat.com/show_bug.cgi?id=2239648

Comment 2 Ahmad 2023-10-11 16:41:00 UTC
QE: Verified manually on CNV v4.15.0.rhel9-1200.

All of the metrics below were renamed properly:

kubevirt_migrate_vmi_pending
kubevirt_migrate_vmi_running
kubevirt_migrate_vmi_scheduling
kubevirt_vmi_node_cpu_affinity
kubevirt_vmi_filesystem_capacity_bytes
kubevirt_vmi_memory_domain_bytes
kubevirt_vmi_memory_pgmajfault
kubevirt_vmi_memory_swap_in_traffic_bytes
kubevirt_vmi_memory_swap_out_traffic_bytes
kubevirt_vmi_number_of_outdated
kubevirt_vmi_storage_flush_times_seconds_total
kubevirt_vmi_storage_read_times_seconds_total
kubevirt_vmi_storage_write_times_seconds_total
kubevirt_vmi_vcpu_seconds_total



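Example query for one of the renamed metrics. The new name, kubevirt_vmi_number_of_outdated, returns one series per virt-controller pod, while the old name, kubevirt_vmi_outdated_count, returns an empty result: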

[cloud-user@ocp-psi-executor ahmad]$ oc exec -n openshift-monitoring prometheus-k8s-0 -c prometheus -- curl -s http://127.0.0.1:9090/api/v1/query?query=kubevirt_vmi_number_of_outdated | jq .
{
  "status": "success",
  "data": {
    "resultType": "vector",
    "result": [
      {
        "metric": {
          "__name__": "kubevirt_vmi_number_of_outdated",
          "container": "virt-controller",
          "endpoint": "metrics",
          "instance": "10.129.2.66:8443",
          "job": "kubevirt-prometheus-metrics",
          "namespace": "openshift-cnv",
          "pod": "virt-controller-75b6df89f9-5hgrs",
          "service": "kubevirt-prometheus-metrics"
        },
        "value": [
          1697042392.801,
          "0"
        ]
      },
      {
        "metric": {
          "__name__": "kubevirt_vmi_number_of_outdated",
          "container": "virt-controller",
          "endpoint": "metrics",
          "instance": "10.131.0.58:8443",
          "job": "kubevirt-prometheus-metrics",
          "namespace": "openshift-cnv",
          "pod": "virt-controller-75b6df89f9-hsp6z",
          "service": "kubevirt-prometheus-metrics"
        },
        "value": [
          1697042392.801,
          "0"
        ]
      }
    ]
  }
}
[cloud-user@ocp-psi-executor ahmad]$ oc exec -n openshift-monitoring prometheus-k8s-0 -c prometheus -- curl -s http://127.0.0.1:9090/api/v1/query?query=kubevirt_vmi_outdated_count | jq .
{
  "status": "success",
  "data": {
    "resultType": "vector",
    "result": []
  }
}
[cloud-user@ocp-psi-executor ahmad]$
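
The manual check above comes down to two facts: the new name returns at least one series, and the old name returns an empty "result" array. A minimal Go sketch of that check, decoding the Prometheus HTTP API instant-query response shown above, could look like the following. `seriesCount` and `queryResponse` are hypothetical helpers, and the sample payloads are trimmed copies of the responses in this comment (only the `__name__` and `pod` labels kept).

```go
package main

import (
	"encoding/json"
	"fmt"
)

// queryResponse models only the fields of a Prometheus /api/v1/query
// instant-query response that this check needs; other fields are ignored.
type queryResponse struct {
	Status string `json:"status"`
	Data   struct {
		ResultType string `json:"resultType"`
		Result     []struct {
			Metric map[string]string `json:"metric"`
			Value  []interface{}     `json:"value"` // [unix timestamp, "sample value"]
		} `json:"result"`
	} `json:"data"`
}

// seriesCount decodes a raw instant-query response and returns how many
// series matched. Zero means no series exist under that name (e.g. an old,
// renamed metric); a non-"success" status is reported as an error.
func seriesCount(raw []byte) (int, error) {
	var r queryResponse
	if err := json.Unmarshal(raw, &r); err != nil {
		return 0, err
	}
	if r.Status != "success" {
		return 0, fmt.Errorf("query status %q", r.Status)
	}
	return len(r.Data.Result), nil
}

func main() {
	cases := []struct {
		name string
		raw  []byte
	}{
		{"kubevirt_vmi_outdated_count", []byte(`{"status":"success","data":{"resultType":"vector","result":[]}}`)},
		{"kubevirt_vmi_number_of_outdated", []byte(`{"status":"success","data":{"resultType":"vector","result":[
			{"metric":{"__name__":"kubevirt_vmi_number_of_outdated","pod":"virt-controller-75b6df89f9-5hgrs"},"value":[1697042392.801,"0"]},
			{"metric":{"__name__":"kubevirt_vmi_number_of_outdated","pod":"virt-controller-75b6df89f9-hsp6z"},"value":[1697042392.801,"0"]}]}}`)},
	}
	for _, c := range cases {
		n, err := seriesCount(c.raw)
		if err != nil {
			fmt.Printf("%s: error: %v\n", c.name, err)
			continue
		}
		fmt.Printf("%s: %d series\n", c.name, n)
	}
}
```

In practice the raw bytes would come from the same `oc exec ... curl` call used above, run once per old and new metric name.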