Bug 1802941 - [4.3]Sometimes meet "many-to-many matching not allowed: matching labels must be unique on one side" warn info in prometheus-k8s pod
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Monitoring
Version: 4.3.z
Hardware: Unspecified
OS: Unspecified
Priority: low
Severity: low
Target Milestone: ---
Target Release: 4.5.0
Assignee: Simon Pasquier
QA Contact: Junqi Zhao
URL:
Whiteboard:
Depends On:
Blocks: 1807843
Reported: 2020-02-14 07:48 UTC by Junqi Zhao
Modified: 2020-07-13 17:15 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: the evaluation of a few recording rules might occasionally fail. Consequence: the metrics generated from the recording rules are missing. Fix: the recording rules have been fixed. Result: the recording rules always evaluate successfully.
Clone Of:
Clones: 1807843
Environment:
Last Closed: 2020-07-13 17:15:07 UTC
Target Upstream Version:


Links
System                  ID                                               Priority  Status  Summary                                    Last Updated
Github                  kubernetes-monitoring kubernetes-mixin pull 361  None      closed  Fix potential many-to-many errors          2020-09-04 00:54:42 UTC
Github                  openshift cluster-monitoring-operator pull 670   None      closed  Bug 1802941: fix many-to-many errors       2020-09-04 00:54:42 UTC
Github                  openshift cluster-monitoring-operator pull 675   None      closed  Bug 1802941: Fix more many to many errors  2020-09-04 00:54:42 UTC
Red Hat Product Errata  RHBA-2020:2409                                   None      None    None                                       2020-07-13 17:15:44 UTC

Description Junqi Zhao 2020-02-14 07:48:44 UTC
Description of problem:
On a 4.3.2 AWS cluster, the prometheus-k8s-1 pod's log contains "many-to-many matching not allowed: matching labels must be unique on one side" warnings.
Affected recording rules:
record: node:node_num_cpu:sum
record: cluster:cpu_core_node_labels
record: cluster:cpu_usage_cores:sum

# oc -n openshift-monitoring logs prometheus-k8s-1 -c prometheus | grep "many-to-many matching not allowed: matching labels must be unique on one side"
level=warn ts=2020-02-14T05:32:35.304Z caller=manager.go:525 component="rule manager" group=node.rules msg="Evaluating rule failed" rule="record: node:node_num_cpu:sum\nexpr: count by(node) (sum by(node, cpu) (node_cpu_seconds_total{job=\"node-exporter\"}\n  * on(namespace, pod) group_left(node) node_namespace_pod:kube_pod_info:))\n" err="found duplicate series for the match group {namespace=\"openshift-monitoring\", pod=\"alertmanager-main-0\"} on the right hand-side of the operation: [{__name__=\"node_namespace_pod:kube_pod_info:\", namespace=\"openshift-monitoring\", node=\"ip-10-0-60-28.us-east-2.compute.internal\", pod=\"alertmanager-main-0\"}, {__name__=\"node_namespace_pod:kube_pod_info:\", namespace=\"openshift-monitoring\", node=\"ip-10-0-59-196.us-east-2.compute.internal\", pod=\"alertmanager-main-0\"}];many-to-many matching not allowed: matching labels must be unique on one side"
level=warn ts=2020-02-14T05:32:36.735Z caller=manager.go:525 component="rule manager" group=kubernetes.rules msg="Evaluating rule failed" rule="record: cluster:cpu_core_node_labels\nexpr: cluster:nodes_roles * on(node) group_right(label_beta_kubernetes_io_instance_type,\n  label_node_role_kubernetes_io, label_node_openshift_io_os_id, label_kubernetes_io_arch,\n  label_node_role_kubernetes_io_master, label_node_role_kubernetes_io_infra) label_replace(cluster:cpu_core_hyperthreading,\n  \"node\", \"$1\", \"instance\", \"(.*)\")\n" err="found duplicate series for the match group {node=\"ip-10-0-52-100.us-east-2.compute.internal\"} on the left hand-side of the operation: [{__name__=\"cluster:nodes_roles\", endpoint=\"https-main\", instance=\"10.131.0.21:8443\", job=\"kube-state-metrics\", label_beta_kubernetes_io_arch=\"amd64\", label_beta_kubernetes_io_instance_type=\"m4.xlarge\", label_beta_kubernetes_io_os=\"linux\", label_failure_domain_beta_kubernetes_io_region=\"us-east-2\", label_failure_domain_beta_kubernetes_io_zone=\"us-east-2a\", label_kubernetes_io_arch=\"amd64\", label_kubernetes_io_hostname=\"ip-10-0-52-100.us-east-2.compute.internal\", label_kubernetes_io_os=\"linux\", label_node_openshift_io_os_id=\"rhel\", namespace=\"openshift-monitoring\", node=\"ip-10-0-52-100.us-east-2.compute.internal\", pod=\"kube-state-metrics-75679bfbf5-vg9qv\", service=\"kube-state-metrics\"}, {__name__=\"cluster:nodes_roles\", endpoint=\"https-main\", instance=\"10.130.2.11:8443\", job=\"kube-state-metrics\", label_beta_kubernetes_io_arch=\"amd64\", label_beta_kubernetes_io_instance_type=\"m4.xlarge\", label_beta_kubernetes_io_os=\"linux\", label_failure_domain_beta_kubernetes_io_region=\"us-east-2\", label_failure_domain_beta_kubernetes_io_zone=\"us-east-2a\", label_kubernetes_io_arch=\"amd64\", label_kubernetes_io_hostname=\"ip-10-0-52-100.us-east-2.compute.internal\", label_kubernetes_io_os=\"linux\", label_node_openshift_io_os_id=\"rhel\", namespace=\"openshift-monitoring\", node=\"ip-10-0-52-100.us-east-2.compute.internal\", pod=\"kube-state-metrics-75679bfbf5-nt8jx\", service=\"kube-state-metrics\"}];many-to-many matching not allowed: matching labels must be unique on one side"
level=warn ts=2020-02-14T05:32:36.737Z caller=manager.go:525 component="rule manager" group=kubernetes.rules msg="Evaluating rule failed" rule="record: cluster:cpu_usage_cores:sum\nexpr: sum(1 - rate(node_cpu_seconds_total{mode=\"idle\"}[2m]) * on(namespace, pod) group_left(node)\n  node_namespace_pod:kube_pod_info:)\n" err="found duplicate series for the match group {namespace=\"openshift-monitoring\", pod=\"alertmanager-main-0\"} on the right hand-side of the operation: [{__name__=\"node_namespace_pod:kube_pod_info:\", namespace=\"openshift-monitoring\", node=\"ip-10-0-60-28.us-east-2.compute.internal\", pod=\"alertmanager-main-0\"}, {__name__=\"node_namespace_pod:kube_pod_info:\", namespace=\"openshift-monitoring\", node=\"ip-10-0-59-196.us-east-2.compute.internal\", pod=\"alertmanager-main-0\"}];many-to-many matching not allowed: matching labels must be unique on one side"
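
The duplicate-series error in the first and third log lines can be reproduced outside Prometheus with a small model of `group_left` matching. This is a minimal Python sketch, not Prometheus's implementation: `group_left_match` and `dedupe_one_side` are hypothetical helpers, series are plain dicts of labels, and the pod name `node-exporter-example` is invented for illustration. The two `alertmanager-main-0` series mirror the ones reported in the log. Keeping one series per match group is shown only as one possible way to avoid the error; the actual rule changes are in the linked pull requests.

```python
from collections import defaultdict


class ManyToManyError(Exception):
    """Raised when the 'one' side has several series for one match group."""


def match_key(labels, on):
    # The match group is the tuple of values for the labels named in on(...).
    return tuple(labels.get(name, "") for name in on)


def group_left_match(many_side, one_side, on):
    """Model `many * on(...) group_left(...) one`: every match group may
    contain at most one series on the 'one' side."""
    groups = defaultdict(list)
    for labels in one_side:
        groups[match_key(labels, on)].append(labels)
    for key, members in groups.items():
        if len(members) > 1:
            raise ManyToManyError(
                f"found duplicate series for the match group {dict(zip(on, key))}"
            )
    pairs = []
    for labels in many_side:
        hit = groups.get(match_key(labels, on))
        if hit:
            pairs.append((labels, hit[0]))
    return pairs


def dedupe_one_side(one_side, on):
    """Keep a single series per match group (first seen wins)."""
    seen = {}
    for labels in one_side:
        seen.setdefault(match_key(labels, on), labels)
    return list(seen.values())


ON = ("namespace", "pod")

# Right-hand side: the same pod is reported on two nodes, as in the log above.
pod_info = [
    {"namespace": "openshift-monitoring", "pod": "alertmanager-main-0",
     "node": "ip-10-0-60-28.us-east-2.compute.internal"},
    {"namespace": "openshift-monitoring", "pod": "alertmanager-main-0",
     "node": "ip-10-0-59-196.us-east-2.compute.internal"},
    # Hypothetical pod name, used only so the left-hand side has a match.
    {"namespace": "openshift-monitoring", "pod": "node-exporter-example",
     "node": "ip-10-0-60-28.us-east-2.compute.internal"},
]

# Left-hand side: one CPU series scraped from the hypothetical exporter pod.
cpu = [{"namespace": "openshift-monitoring", "pod": "node-exporter-example", "cpu": "0"}]

try:
    group_left_match(cpu, pod_info, ON)
except ManyToManyError as exc:
    print("evaluation failed:", exc)

pairs = group_left_match(cpu, dedupe_one_side(pod_info, ON), ON)
print(len(pairs), "pair(s) matched after deduplication")
```

Note that in this model the duplicate pair fails the whole evaluation even though no left-hand series refers to `alertmanager-main-0`, which is consistent with the log: the rules join on node-exporter series, yet the reported match group is an Alertmanager pod.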

Version-Release number of selected component (if applicable):
# oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.3.2     True        False         128m    Cluster version is 4.3.2


How reproducible:
Sometimes

Steps to Reproduce:
1. oc -n openshift-monitoring logs prometheus-k8s-1 -c prometheus | grep "many-to-many matching not allowed: matching labels must be unique on one side"
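
To list which rules are failing without reading the raw grep output, the log can be post-processed. This is a minimal Python sketch that assumes log lines in the format shown in this report; `failing_rules` is a hypothetical helper, and the regular expression only targets fields of the form `group=<name>` and `rule="record: <name>\n...`.

```python
import re

ERR = "many-to-many matching not allowed"

# Captures the rule group and the record name at the start of the rule="..."
# field. The record name ends at the first backslash (the escaped newline)
# or at a double quote.
PATTERN = re.compile(r'group=(\S+) .*?rule="record: ([^\\"]+)')


def failing_rules(lines):
    """Return the sorted, de-duplicated (group, record) pairs whose
    evaluation failed with a many-to-many error."""
    found = set()
    for line in lines:
        if ERR not in line:
            continue
        m = PATTERN.search(line)
        if m:
            found.add((m.group(1), m.group(2)))
    return sorted(found)
```

The function accepts any iterable of lines, so it can be given `sys.stdin` when the output of the `oc logs` command in step 1 is piped into a script.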

Actual results:


Expected results:


Additional info:

Comment 5 Junqi Zhao 2020-03-06 10:10:00 UTC
Tested with 4.5.0-0.ci-2020-03-04-223611: the changes are already in the payload, and the errors no longer appear in the prometheus container's log.

Comment 7 errata-xmlrpc 2020-07-13 17:15:07 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2409

