Description of problem:
Issue reported in https://github.com/kubevirt/kubevirt/issues/9713.

The kubevirt_vmi_cpu_affinity metric was introduced in https://github.com/kubevirt/kubevirt/pull/5191. Given X pCPUs and Y vCPUs, up to X*Y labels can be generated for a single VM. A concrete example: 192 pCPUs and 48 vCPUs lead to 48*192 = 9216 labels. This cannot easily be digested by all kinds of metric pipelines.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1. Start a VM with a single vCPU and no explicit pinning; it gets an affinity for all host CPUs.
2.
3.

Actual results:

Expected results:
Let's see whether we can shrink or remove this metric. Since it is "just" a boolean map expressing the CPU pinning layout, it looks like something that should not be exposed by core KubeVirt via Prometheus metrics. Emitting it only for explicit CPU pinning may be an option, but even then the result would be 50-100 labels, which is probably still above an acceptable number of labels per metric, considering that VM labels and such are added as well and we want to leave room for user-specific labels.

Additional info:
Example for a single vCPU on an 8-core machine:

# virsh vcpuinfo default_fedora
VCPU:           0
CPU:            4
State:          running
CPU time:       33.5s
CPU Affinity:   yyyyyyyy

Note the 8 `y`s.
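To make the cardinality concrete, here is a minimal Go sketch of the problem. It is illustrative only: the label names and the per-pair encoding are assumptions for demonstration, not the actual KubeVirt implementation.

package main

import "fmt"

// affinityLabels is a hypothetical reconstruction of the old metric's
// layout: one boolean label per (vCPU, pCPU) combination, so a single
// VM ends up carrying pCPUs*vCPUs labels. Label names are illustrative.
func affinityLabels(pCPUs, vCPUs int) map[string]string {
	labels := make(map[string]string, pCPUs*vCPUs)
	for v := 0; v < vCPUs; v++ {
		for p := 0; p < pCPUs; p++ {
			labels[fmt.Sprintf("vcpu_%d_cpu_%d", v, p)] = "true"
		}
	}
	return labels
}

func main() {
	// 192 pCPUs and 48 vCPUs: 9216 labels for a single VM.
	fmt.Println(len(affinityLabels(192, 48)))
}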
Please also backport to release-1.0
https://github.com/kubevirt/kubevirt/pull/10266 is still open; moving back to POST.
@sradco @jvilaca @dbasunag According to https://github.com/kubevirt/kubevirt/pull/10266 and https://github.com/kubevirt/kubevirt/issues/9713, the metric kubevirt_vmi_cpu_affinity no longer appears; it was replaced with a metric with different behavior, kubevirt_vmi_node_cpu_affinity:

kubevirt_vmi_node_cpu_affinity: Number of VMI CPU affinities to node physical cores. Type: Gauge.

Do we still need QA verification here?
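For context, a minimal Go sketch of what the new metric's value appears to mean, assuming it is simply the count of enabled entries in libvirt's boolean affinity map. The helper name and map shape are assumptions for illustration, not the actual virt-handler code.

package main

import "fmt"

// countNodeCPUAffinity returns, under the assumption above, the value
// reported by kubevirt_vmi_node_cpu_affinity: the number of (vCPU, pCPU)
// pairs with affinity enabled. affinity[v][p] mirrors libvirt's boolean
// CPU map, i.e. the "yyyyyyyy" row shown by `virsh vcpuinfo`.
func countNodeCPUAffinity(affinity [][]bool) int {
	n := 0
	for _, vcpu := range affinity {
		for _, pinned := range vcpu {
			if pinned {
				n++
			}
		}
	}
	return n
}

func main() {
	// One unpinned vCPU on an 8-core host: every entry is true.
	affinity := [][]bool{{true, true, true, true, true, true, true, true}}
	fmt.Println(countNodeCPUAffinity(affinity)) // 8
}

With one unpinned vCPU on an 8-core host the gauge reads 8, which matches the QA verification below.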
QA tested on CNV-v4.14.0.rhel9-1991, OCP-4.14.0-ec.3. The metric kubevirt_vmi_node_cpu_affinity works properly:

virsh # vcpupin 1
 VCPU   CPU Affinity
----------------------
 0      0-7

virsh # vcpuinfo 1
VCPU:           0
CPU:            2
State:          running
CPU time:       36.9s
CPU Affinity:   yyyyyyyy

[cloud-user@ocp-psi-executor ahmad]$ oc exec -n openshift-monitoring prometheus-k8s-0 -c prometheus -- curl -s http://127.0.0.1:9090/api/v1/query?query=kubevirt_vmi_node_cpu_affinity | jq .
{
  "status": "success",
  "data": {
    "resultType": "vector",
    "result": [
      {
        "metric": {
          "__name__": "kubevirt_vmi_node_cpu_affinity",
          "container": "virt-handler",
          "endpoint": "metrics",
          "instance": "10.129.2.82:8443",
          "job": "kubevirt-prometheus-metrics",
          "kubernetes_vmi_label_kubevirt_io_domain": "vm-cirros-datavolumes",
          "kubernetes_vmi_label_kubevirt_io_nodeName": "c01-ahmad414-rvpjr-worker-0-bptzv",
          "name": "vm-cirros-datavolumes",
          "namespace": "default",
          "node": "c01-ahmad414-rvpjr-worker-0-bptzv",
          "pod": "virt-handler-tkfp5",
          "service": "kubevirt-prometheus-metrics"
        },
        "value": [
          1695051423.341,
          "8"
        ]
      }
    ]
  }
}
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: OpenShift Virtualization 4.14.0 Images security and bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2023:6817
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days.