1802941 – [4.3]Sometimes meet "many-to-many matching not allowed: matching labels must be unique on one side" warn info in prometheus-k8s pod

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Monitoring
Sub Component:
Version:	4.3.z
Hardware:	Unspecified
OS:	Unspecified
Priority:	low
Severity:	low
Target Milestone:	---
Target Release:	4.5.0
Assignee:	Simon Pasquier
QA Contact:	Junqi Zhao
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	1807843

Reported:	2020-02-14 07:48 UTC by Junqi Zhao
Modified:	2020-07-13 17:15 UTC
CC List:	9 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:	Cause: the evaluation of a few recording rules might occasionally fail. Consequence: the metrics generated from the recording rules are missing. Fix: the recording rules have been fixed. Result: the recording rules always evaluate successfully.
Clone Of:
Clones:	1807843 (view as bug list)
Environment:
Last Closed:	2020-07-13 17:15:07 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Priority	Status	Summary	Last Updated
Github	kubernetes-monitoring kubernetes-mixin pull 361	None	closed	Fix potential many-to-many errors	2021-02-02 20:29:08 UTC
Github	openshift cluster-monitoring-operator pull 670	None	closed	Bug 1802941: fix many-to-many errors	2021-02-02 20:29:08 UTC
Github	openshift cluster-monitoring-operator pull 675	None	closed	Bug 1802941: Fix more many to many errors	2021-02-02 20:29:08 UTC
Red Hat Product Errata	RHBA-2020:2409	None	None	None	2020-07-13 17:15:44 UTC

Description Junqi Zhao 2020-02-14 07:48:44 UTC

Description of problem:
Checked a 4.3.2 AWS cluster and found "many-to-many matching not allowed: matching labels must be unique on one side" warnings in the prometheus-k8s-1 pod's log.
Affected recording rules:
record: node:node_num_cpu:sum
record: cluster:cpu_core_node_labels
record: cluster:cpu_usage_cores:sum

# oc -n openshift-monitoring logs prometheus-k8s-1 -c prometheus | grep "many-to-many matching not allowed: matching labels must be unique on one side"
level=warn ts=2020-02-14T05:32:35.304Z caller=manager.go:525 component="rule manager" group=node.rules msg="Evaluating rule failed" rule="record: node:node_num_cpu:sum\nexpr: count by(node) (sum by(node, cpu) (node_cpu_seconds_total{job=\"node-exporter\"}\n  * on(namespace, pod) group_left(node) node_namespace_pod:kube_pod_info:))\n" err="found duplicate series for the match group {namespace=\"openshift-monitoring\", pod=\"alertmanager-main-0\"} on the right hand-side of the operation: [{__name__=\"node_namespace_pod:kube_pod_info:\", namespace=\"openshift-monitoring\", node=\"ip-10-0-60-28.us-east-2.compute.internal\", pod=\"alertmanager-main-0\"}, {__name__=\"node_namespace_pod:kube_pod_info:\", namespace=\"openshift-monitoring\", node=\"ip-10-0-59-196.us-east-2.compute.internal\", pod=\"alertmanager-main-0\"}];many-to-many matching not allowed: matching labels must be unique on one side"
level=warn ts=2020-02-14T05:32:36.735Z caller=manager.go:525 component="rule manager" group=kubernetes.rules msg="Evaluating rule failed" rule="record: cluster:cpu_core_node_labels\nexpr: cluster:nodes_roles * on(node) group_right(label_beta_kubernetes_io_instance_type,\n  label_node_role_kubernetes_io, label_node_openshift_io_os_id, label_kubernetes_io_arch,\n  label_node_role_kubernetes_io_master, label_node_role_kubernetes_io_infra) label_replace(cluster:cpu_core_hyperthreading,\n  \"node\", \"$1\", \"instance\", \"(.*)\")\n" err="found duplicate series for the match group {node=\"ip-10-0-52-100.us-east-2.compute.internal\"} on the left hand-side of the operation: [{__name__=\"cluster:nodes_roles\", endpoint=\"https-main\", instance=\"10.131.0.21:8443\", job=\"kube-state-metrics\", label_beta_kubernetes_io_arch=\"amd64\", label_beta_kubernetes_io_instance_type=\"m4.xlarge\", label_beta_kubernetes_io_os=\"linux\", label_failure_domain_beta_kubernetes_io_region=\"us-east-2\", label_failure_domain_beta_kubernetes_io_zone=\"us-east-2a\", label_kubernetes_io_arch=\"amd64\", label_kubernetes_io_hostname=\"ip-10-0-52-100.us-east-2.compute.internal\", label_kubernetes_io_os=\"linux\", label_node_openshift_io_os_id=\"rhel\", namespace=\"openshift-monitoring\", node=\"ip-10-0-52-100.us-east-2.compute.internal\", pod=\"kube-state-metrics-75679bfbf5-vg9qv\", service=\"kube-state-metrics\"}, {__name__=\"cluster:nodes_roles\", endpoint=\"https-main\", instance=\"10.130.2.11:8443\", job=\"kube-state-metrics\", label_beta_kubernetes_io_arch=\"amd64\", label_beta_kubernetes_io_instance_type=\"m4.xlarge\", label_beta_kubernetes_io_os=\"linux\", label_failure_domain_beta_kubernetes_io_region=\"us-east-2\", label_failure_domain_beta_kubernetes_io_zone=\"us-east-2a\", label_kubernetes_io_arch=\"amd64\", label_kubernetes_io_hostname=\"ip-10-0-52-100.us-east-2.compute.internal\", label_kubernetes_io_os=\"linux\", label_node_openshift_io_os_id=\"rhel\", namespace=\"openshift-monitoring\", node=\"ip-10-0-52-100.us-east-2.compute.internal\", pod=\"kube-state-metrics-75679bfbf5-nt8jx\", service=\"kube-state-metrics\"}];many-to-many matching not allowed: matching labels must be unique on one side"
level=warn ts=2020-02-14T05:32:36.737Z caller=manager.go:525 component="rule manager" group=kubernetes.rules msg="Evaluating rule failed" rule="record: cluster:cpu_usage_cores:sum\nexpr: sum(1 - rate(node_cpu_seconds_total{mode=\"idle\"}[2m]) * on(namespace, pod) group_left(node)\n  node_namespace_pod:kube_pod_info:)\n" err="found duplicate series for the match group {namespace=\"openshift-monitoring\", pod=\"alertmanager-main-0\"} on the right hand-side of the operation: [{__name__=\"node_namespace_pod:kube_pod_info:\", namespace=\"openshift-monitoring\", node=\"ip-10-0-60-28.us-east-2.compute.internal\", pod=\"alertmanager-main-0\"}, {__name__=\"node_namespace_pod:kube_pod_info:\", namespace=\"openshift-monitoring\", node=\"ip-10-0-59-196.us-east-2.compute.internal\", pod=\"alertmanager-main-0\"}];many-to-many matching not allowed: matching labels must be unique on one side"
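
To illustrate what the errors above mean: with group_left the right-hand side is the "one" side of the match, so each combination of the on(...) labels may map to at most one series there (with group_right the same holds for the left-hand side). The Python sketch below is a simplified model of that uniqueness check, not Prometheus' actual implementation; the function name find_duplicate_match_groups is made up for this example, and the label sets are copied from the first log line.

```python
from collections import defaultdict


def find_duplicate_match_groups(series, on):
    """Group label sets by the labels listed in `on` and return the groups
    that contain more than one series (hypothetical helper, illustration only)."""
    groups = defaultdict(list)
    for labels in series:
        key = tuple((name, labels.get(name, "")) for name in on)
        groups[key].append(labels)
    return {key: members for key, members in groups.items() if len(members) > 1}


# Right-hand side series of the node:node_num_cpu:sum rule, taken from the log.
kube_pod_info = [
    {"namespace": "openshift-monitoring", "pod": "alertmanager-main-0",
     "node": "ip-10-0-60-28.us-east-2.compute.internal"},
    {"namespace": "openshift-monitoring", "pod": "alertmanager-main-0",
     "node": "ip-10-0-59-196.us-east-2.compute.internal"},
]

# The rule matches on(namespace, pod), so both series fall into one match group.
duplicates = find_duplicate_match_groups(kube_pod_info, on=("namespace", "pod"))
for key, members in duplicates.items():
    print(dict(key), "->", [m["node"] for m in members])
```

In this model the same namespace/pod pair appears with two different node values, which is the situation the first and third log lines report. In the second log line the duplicated side is the left one: two cluster:nodes_roles series for the same node that differ only in the instance and pod labels of the kube-state-metrics pods.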

Version-Release number of selected component (if applicable):
# oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.3.2     True        False         128m    Cluster version is 4.3.2


How reproducible:
Sometimes

Steps to Reproduce:
1. oc -n openshift-monitoring logs prometheus-k8s-1 -c prometheus | grep "many-to-many matching not allowed: matching labels must be unique on one side"
2.
3.
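
When the grep in step 1 returns many lines, it can help to reduce them to the list of affected recording rules. The sketch below is one possible way to do that, assuming the log format shown in the description; failed_rules and RULE_RE are hypothetical names, and the sample lines are abbreviated copies of the log entries above ("..." marks text cut for brevity).

```python
import re

# Captures the rule name after `rule="record: `, stopping at the literal
# backslash of the escaped "\n" that follows it in the log line.
RULE_RE = re.compile(r'rule="record: ([^\\\s"]+)')

ERROR_TEXT = "many-to-many matching not allowed"


def failed_rules(log_lines):
    """Return the sorted, de-duplicated names of recording rules whose
    evaluation failed with the many-to-many error."""
    names = set()
    for line in log_lines:
        if ERROR_TEXT not in line:
            continue
        match = RULE_RE.search(line)
        if match:
            names.add(match.group(1))
    return sorted(names)


# Abbreviated sample lines; real input would come from the oc logs command.
sample = [
    r'level=warn ... msg="Evaluating rule failed" rule="record: node:node_num_cpu:sum\nexpr: ..." err="...;many-to-many matching not allowed: matching labels must be unique on one side"',
    r'level=warn ... msg="Evaluating rule failed" rule="record: cluster:cpu_usage_cores:sum\nexpr: ..." err="...;many-to-many matching not allowed: matching labels must be unique on one side"',
    r'level=info ... msg="unrelated line"',
]

for name in failed_rules(sample):
    print(name)
```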

Actual results:


Expected results:


Additional info:

Comment 5 Junqi Zhao 2020-03-06 10:10:00 UTC

Tested with 4.5.0-0.ci-2020-03-04-223611. The changes are already in the payload, and the errors no longer appear in the prometheus container log.

Comment 7 errata-xmlrpc 2020-07-13 17:15:07 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2409
