Bug 1802941

Summary: [4.3]Sometimes meet "many-to-many matching not allowed: matching labels must be unique on one side" warn info in prometheus-k8s pod
Product: OpenShift Container Platform Reporter: Junqi Zhao <juzhao>
Component: Monitoring Assignee: Simon Pasquier <spasquie>
Status: CLOSED ERRATA QA Contact: Junqi Zhao <juzhao>
Severity: low Docs Contact:
Priority: low    
Version: 4.3.z CC: alegrand, anpicker, erooth, kakkoyun, lcosic, mloibl, pkrupa, spasquie, surbania
Target Milestone: --- Keywords: Regression
Target Release: 4.5.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Cause: the evaluation of a few recording rules might occasionally fail. Consequence: the metrics generated from the recording rules are missing. Fix: the recording rules have been fixed. Result: the recording rules always evaluate successfully.
Story Points: ---
Clone Of:
: 1807843 Environment:
Last Closed: 2020-07-13 17:15:07 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1807843    

Description Junqi Zhao 2020-02-14 07:48:44 UTC
Description of problem:
On a 4.3.2 AWS cluster, the prometheus-k8s-1 pod's log contains "many-to-many matching not allowed: matching labels must be unique on one side" warnings.
Affected recording rules:
record: node:node_num_cpu:sum
record: cluster:cpu_core_node_labels
record: cluster:cpu_usage_cores:sum

# oc -n openshift-monitoring logs prometheus-k8s-1 -c prometheus | grep "many-to-many matching not allowed: matching labels must be unique on one side"
level=warn ts=2020-02-14T05:32:35.304Z caller=manager.go:525 component="rule manager" group=node.rules msg="Evaluating rule failed" rule="record: node:node_num_cpu:sum\nexpr: count by(node) (sum by(node, cpu) (node_cpu_seconds_total{job=\"node-exporter\"}\n  * on(namespace, pod) group_left(node) node_namespace_pod:kube_pod_info:))\n" err="found duplicate series for the match group {namespace=\"openshift-monitoring\", pod=\"alertmanager-main-0\"} on the right hand-side of the operation: [{__name__=\"node_namespace_pod:kube_pod_info:\", namespace=\"openshift-monitoring\", node=\"ip-10-0-60-28.us-east-2.compute.internal\", pod=\"alertmanager-main-0\"}, {__name__=\"node_namespace_pod:kube_pod_info:\", namespace=\"openshift-monitoring\", node=\"ip-10-0-59-196.us-east-2.compute.internal\", pod=\"alertmanager-main-0\"}];many-to-many matching not allowed: matching labels must be unique on one side"
level=warn ts=2020-02-14T05:32:36.735Z caller=manager.go:525 component="rule manager" group=kubernetes.rules msg="Evaluating rule failed" rule="record: cluster:cpu_core_node_labels\nexpr: cluster:nodes_roles * on(node) group_right(label_beta_kubernetes_io_instance_type,\n  label_node_role_kubernetes_io, label_node_openshift_io_os_id, label_kubernetes_io_arch,\n  label_node_role_kubernetes_io_master, label_node_role_kubernetes_io_infra) label_replace(cluster:cpu_core_hyperthreading,\n  \"node\", \"$1\", \"instance\", \"(.*)\")\n" err="found duplicate series for the match group {node=\"ip-10-0-52-100.us-east-2.compute.internal\"} on the left hand-side of the operation: [{__name__=\"cluster:nodes_roles\", endpoint=\"https-main\", instance=\"10.131.0.21:8443\", job=\"kube-state-metrics\", label_beta_kubernetes_io_arch=\"amd64\", label_beta_kubernetes_io_instance_type=\"m4.xlarge\", label_beta_kubernetes_io_os=\"linux\", label_failure_domain_beta_kubernetes_io_region=\"us-east-2\", label_failure_domain_beta_kubernetes_io_zone=\"us-east-2a\", label_kubernetes_io_arch=\"amd64\", label_kubernetes_io_hostname=\"ip-10-0-52-100.us-east-2.compute.internal\", label_kubernetes_io_os=\"linux\", label_node_openshift_io_os_id=\"rhel\", namespace=\"openshift-monitoring\", node=\"ip-10-0-52-100.us-east-2.compute.internal\", pod=\"kube-state-metrics-75679bfbf5-vg9qv\", service=\"kube-state-metrics\"}, {__name__=\"cluster:nodes_roles\", endpoint=\"https-main\", instance=\"10.130.2.11:8443\", job=\"kube-state-metrics\", label_beta_kubernetes_io_arch=\"amd64\", label_beta_kubernetes_io_instance_type=\"m4.xlarge\", label_beta_kubernetes_io_os=\"linux\", label_failure_domain_beta_kubernetes_io_region=\"us-east-2\", label_failure_domain_beta_kubernetes_io_zone=\"us-east-2a\", label_kubernetes_io_arch=\"amd64\", label_kubernetes_io_hostname=\"ip-10-0-52-100.us-east-2.compute.internal\", label_kubernetes_io_os=\"linux\", label_node_openshift_io_os_id=\"rhel\", 
namespace=\"openshift-monitoring\", node=\"ip-10-0-52-100.us-east-2.compute.internal\", pod=\"kube-state-metrics-75679bfbf5-nt8jx\", service=\"kube-state-metrics\"}];many-to-many matching not allowed: matching labels must be unique on one side"
level=warn ts=2020-02-14T05:32:36.737Z caller=manager.go:525 component="rule manager" group=kubernetes.rules msg="Evaluating rule failed" rule="record: cluster:cpu_usage_cores:sum\nexpr: sum(1 - rate(node_cpu_seconds_total{mode=\"idle\"}[2m]) * on(namespace, pod) group_left(node)\n  node_namespace_pod:kube_pod_info:)\n" err="found duplicate series for the match group {namespace=\"openshift-monitoring\", pod=\"alertmanager-main-0\"} on the right hand-side of the operation: [{__name__=\"node_namespace_pod:kube_pod_info:\", namespace=\"openshift-monitoring\", node=\"ip-10-0-60-28.us-east-2.compute.internal\", pod=\"alertmanager-main-0\"}, {__name__=\"node_namespace_pod:kube_pod_info:\", namespace=\"openshift-monitoring\", node=\"ip-10-0-59-196.us-east-2.compute.internal\", pod=\"alertmanager-main-0\"}];many-to-many matching not allowed: matching labels must be unique on one side"
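
The three failures above share one shape: for a given match group, more than one series exists on the side of the operation that has to be unique. As a rough illustration only (this is not Prometheus code; the function name `find_duplicate_match_groups` and the dictionary representation of a series are assumptions made for this sketch), the check can be modelled in Python with the two `node_namespace_pod:kube_pod_info:` series quoted in the first log line:

```python
from collections import defaultdict


def find_duplicate_match_groups(series, on):
    """Group label sets by the `on` labels; return groups with more than one series.

    Hypothetical helper for illustration, not part of any Prometheus API.
    """
    groups = defaultdict(list)
    for labels in series:
        # The match group is identified only by the labels listed in on(...).
        key = tuple((name, labels.get(name, "")) for name in on)
        groups[key].append(labels)
    return {key: members for key, members in groups.items() if len(members) > 1}


# Right-hand side series copied from the first log line above.
right_hand_side = [
    {"namespace": "openshift-monitoring", "pod": "alertmanager-main-0",
     "node": "ip-10-0-60-28.us-east-2.compute.internal"},
    {"namespace": "openshift-monitoring", "pod": "alertmanager-main-0",
     "node": "ip-10-0-59-196.us-east-2.compute.internal"},
]

duplicates = find_duplicate_match_groups(right_hand_side, on=("namespace", "pod"))
for key, members in duplicates.items():
    print(dict(key), "->", [m["node"] for m in members])
```

With `on(namespace, pod) group_left(node)`, each (namespace, pod) pair on the right-hand side has to resolve to exactly one series. The same pod being reported on two nodes at the same evaluation time breaks that assumption, which is what the sketch flags.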

Version-Release number of selected component (if applicable):
# oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.3.2     True        False         128m    Cluster version is 4.3.2


How reproducible:
Sometimes

Steps to Reproduce:
1. oc -n openshift-monitoring logs prometheus-k8s-1 -c prometheus | grep "many-to-many matching not allowed: matching labels must be unique on one side"
2.
3.
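
The grep in step 1 returns very long lines, so it can help to reduce them to a count per recording rule. The following is a minimal sketch under stated assumptions: the helper name `count_failed_rules` is hypothetical, the regular expression relies on the `rule="record: <name>\n` layout seen in the log lines of this report, and the sample lines are abbreviated copies of those log lines (the `...` parts stand for text cut out here).

```python
import re
from collections import Counter

NEEDLE = "many-to-many matching not allowed"
# In the log output the newline inside the rule is printed as a literal
# backslash followed by "n", so the pattern matches those two characters.
RULE_RE = re.compile(r'rule="record: (.+?)\\n')


def count_failed_rules(lines):
    """Count many-to-many evaluation failures per recording rule name."""
    counts = Counter()
    for line in lines:
        if NEEDLE not in line:
            continue
        match = RULE_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Abbreviated copies of the log lines in this report.
sample = [
    r'level=warn group=node.rules msg="Evaluating rule failed" rule="record: node:node_num_cpu:sum\nexpr: ..." err="...;many-to-many matching not allowed: matching labels must be unique on one side"',
    r'level=warn group=kubernetes.rules msg="Evaluating rule failed" rule="record: cluster:cpu_core_node_labels\nexpr: ..." err="...;many-to-many matching not allowed: matching labels must be unique on one side"',
    r'level=warn group=kubernetes.rules msg="Evaluating rule failed" rule="record: cluster:cpu_usage_cores:sum\nexpr: ..." err="...;many-to-many matching not allowed: matching labels must be unique on one side"',
]

counts = count_failed_rules(sample)
for rule, n in sorted(counts.items()):
    print(rule, n)
```

To use it against a cluster, the output of the `oc -n openshift-monitoring logs prometheus-k8s-1 -c prometheus` command could be passed to `count_failed_rules` line by line, for example by iterating over `sys.stdin`.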

Actual results:


Expected results:


Additional info:

Comment 5 Junqi Zhao 2020-03-06 10:10:00 UTC
Tested with 4.5.0-0.ci-2020-03-04-223611. The changes are already in the payload, and the errors no longer appear in the prometheus container's log.

Comment 7 errata-xmlrpc 2020-07-13 17:15:07 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2409