Description of problem:
While reviewing labels in Prometheus, I noticed that the metric kubelet_started_pods_errors_total has a message label whose value is very large (700+ characters) because it contains a full error log message.

Example:
kubelet_started_pods_errors_total{endpoint="https-metrics", instance="10.0.0.3:10250", job="kubelet", message="rpc error: code = Unknown desc = failed to create pod network sandbox k8s_cluster-image-registry-operator-6c786c84-svlhd_openshift-image-registry_eeb66cce-0b91-4c7d-870c-bf02234342cd_0(8381f96f82dd16971b009a09a3470a19766e316236e7337f194d5dcd5dc5e540): error adding pod openshift-image-registry_cluster-image-registry-operator-6c786c84-svlhd to CNI network "multus-cni-network": Multus: [openshift-image-registry/cluster-image-registry-operator-6c786c84-svlhd]: error getting pod: Get "https://[api-int.ci-ln-121bc45-f76d1.origin-ci-int-gce.dev.openshift.com]:6443/api/v1/namespaces/openshift-image-registry/pods/cluster-image-registry-operator-6c786c84-svlhd?timeout=1m0s": dial tcp 10.0.0.2:6443: connect: connection refused", metrics_path="/metrics", namespace="kube-system", node="ci-ln-121bc45-f76d1-qg7xx-master-2", service="kubelet"}

The metric is emitted from https://github.com/kubernetes/kubernetes/blob/v1.22.0-rc.0/pkg/kubelet/kuberuntime/kuberuntime_manager.go#L813

Version-Release number of selected component (if applicable):

How reproducible:
Query the metric kubelet_started_pods_errors_total in the Prometheus UI or through the API on a cluster where the kubelet is one of the Prometheus targets. If a pod start error has occurred, the message label value will be very large, as in the example above.

Steps to Reproduce:
1.
2.
3.

Actual results:
A large label value like the one in the example above.

Expected results:
An error or log message should not be used as a metric label value.

Additional info:
An upstream issue has been created for this: https://github.com/kubernetes/kubernetes/issues/105163
Creating this bug so that we don't lose track of the issue.
Verified on 4.10.0-0.nightly-2021-12-06-201335.

$ oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.10.0-0.nightly-2021-12-06-201335   True        False         3h52m   Cluster version is 4.10.0-0.nightly-2021-12-06-201335
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:0056