Bug 1834913 - "many-to-many matching not allowed: matching labels must be unique on one side" warn info in prometheus-k8s pod after upgrade
Summary: "many-to-many matching not allowed: matching labels must be unique on one sid...
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Monitoring
Version: 4.4
Hardware: Unspecified
OS: Unspecified
Priority: low
Severity: low
Target Milestone: ---
Target Release: 4.4.z
Assignee: Simon Pasquier
QA Contact: Junqi Zhao
URL:
Whiteboard:
Depends On: 1812006
Blocks:
 
Reported: 2020-05-12 16:17 UTC by Simon Pasquier
Modified: 2020-11-12 08:10 UTC (History)
CC List: 10 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1812006
Environment:
Last Closed: 2020-07-06 15:34:51 UTC
Target Upstream Version:
Embargoed:


Attachments
prometheus-k8s pod logs (85.71 KB, text/plain), 2020-06-30 07:32 UTC, Junqi Zhao
PrometheusRule crd file after upgrade to 4.4 (3.95 KB, text/plain), 2020-06-30 07:33 UTC, Junqi Zhao
prometheus-k8s-rules file for 4.4.0-0.nightly-2020-06-27-171816 (74.67 KB, text/plain), 2020-07-01 01:57 UTC, Junqi Zhao


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift cluster-monitoring-operator pull 787 | 0 | None | closed | Bug 1834913: bump kubernetes-monitoring/kubernetes-mixin | 2020-11-12 08:10:46 UTC

Description Simon Pasquier 2020-05-12 16:17:25 UTC
+++ This bug was initially created as a clone of Bug #1812006 +++

Description of problem:
After upgrading from 4.3.5 to 4.4.0-0.nightly-2020-03-10-042427, the prometheus-k8s pod logs "many-to-many matching not allowed: matching labels must be unique on one side" warnings. No such warnings appeared before the upgrade.

Here are the occurrence counts for the affected rules:

record rule                                                                      occurrences
record: mixin_pod_workload                                                       3
record: node_namespace_pod_container:container_cpu_usage_seconds_total:sum_rate  3
record: node_namespace_pod_container:container_memory_cache                      3
record: node_namespace_pod_container:container_memory_rss                        3
record: node_namespace_pod_container:container_memory_swap                       3
record: node_namespace_pod_container:container_memory_working_set_bytes          3
record: node:node_num_cpu:sum                                                    2

alert rule                         occurrences
alert: KubePodNotReady             3
alert: PodDisruptionBudgetAtLimit  3
alert: PodDisruptionBudgetLimit    3
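
A per-rule tally like the one above can be derived from the Prometheus log itself. The following is only a minimal sketch, assuming the log format shown in this report, where the rule text is logfmt-escaped so that newlines appear as a literal backslash followed by "n". The function name tally_failed_rules and the regular expression are illustrative and not part of any OpenShift or Prometheus tooling.

```python
import re
from collections import Counter

# Message that identifies the warnings discussed in this bug.
MARKER = "many-to-many matching not allowed"

# Capture the rule kind and name from rule="alert: Name\nexpr: ...".
# The log contains a literal backslash + "n", hence the escaped "\\n".
RULE_RE = re.compile(r'rule="(alert|record): (.+?)\\n')


def tally_failed_rules(lines):
    """Count warnings per (kind, rule name) for the many-to-many error.

    Hypothetical helper: `lines` is any iterable of log lines, for
    example the saved output of the `oc logs` command in this report.
    """
    counts = Counter()
    for line in lines:
        if MARKER not in line:
            continue
        match = RULE_RE.search(line)
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts
```

To use it, save the output of the oc logs command to a file and pass the open file object to tally_failed_rules; counts.most_common() then lists the noisiest rules first.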

See the attached file for the full logs.
# oc -n openshift-monitoring logs prometheus-k8s-1 -c prometheus | grep "many-to-many matching not allowed: matching labels must be unique on one side"
level=warn ts=2020-03-10T07:47:10.625Z caller=manager.go:525 component="rule manager" group=kubernetes-apps msg="Evaluating rule failed" rule="alert: KubePodNotReady\nexpr: sum by(namespace, pod) (max by(namespace, pod) (kube_pod_status_phase{job=\"kube-state-metrics\",namespace=~\"(openshift-.*|kube-.*|default|logging)\",phase=~\"Pending|Unknown\"})\n  * on(namespace, pod) group_left(owner_kind) max by(namespace, pod, owner_kind) (kube_pod_owner{owner_kind!=\"Job\"}))\n  > 0\nfor: 15m\nlabels:\n  severity: critical\nannotations:\n  message: Pod {{ $labels.namespace }}/{{ $labels.pod }} has been in a non-ready state\n    for longer than 15 minutes.\n" err="found duplicate series for the match group {namespace=\"openshift-kube-apiserver\", pod=\"kube-apiserver-zhsun-qqrhz-m-0.c.openshift-qe.internal\"} on the right hand-side of the operation: [{namespace=\"openshift-kube-apiserver\", owner_kind=\"Node\", pod=\"kube-apiserver-zhsun-qqrhz-m-0.c.openshift-qe.internal\"}, {namespace=\"openshift-kube-apiserver\", owner_kind=\"<none>\", pod=\"kube-apiserver-zhsun-qqrhz-m-0.c.openshift-qe.internal\"}];many-to-many matching not allowed: matching labels must be unique on one side"
level=warn ts=2020-03-10T07:47:16.084Z caller=manager.go:525 component="rule manager" group=cluster-version msg="Evaluating rule failed" rule="alert: PodDisruptionBudgetAtLimit\nexpr: kube_poddisruptionbudget_status_expected_pods == on(namespace, poddisruptionbudget,\n  service) kube_poddisruptionbudget_status_desired_healthy\nfor: 15m\nlabels:\n  severity: warning\nannotations:\n  message: The pod disruption budget is preventing further disruption to pods because\n    it is at the minimum allowed level.\n" err="found duplicate series for the match group {namespace=\"openshift-machine-config-operator\", poddisruptionbudget=\"etcd-quorum-guard\", service=\"kube-state-metrics\"} on the right hand-side of the operation: [{__name__=\"kube_poddisruptionbudget_status_desired_healthy\", endpoint=\"https-main\", instance=\"10.131.0.37:8443\", job=\"kube-state-metrics\", namespace=\"openshift-machine-config-operator\", pod=\"kube-state-metrics-d6c94499b-9j4fx\", poddisruptionbudget=\"etcd-quorum-guard\", service=\"kube-state-metrics\"}, {__name__=\"kube_poddisruptionbudget_status_desired_healthy\", endpoint=\"https-main\", instance=\"10.128.2.10:8443\", job=\"kube-state-metrics\", namespace=\"openshift-machine-config-operator\", pod=\"kube-state-metrics-d6c94499b-fs6gs\", poddisruptionbudget=\"etcd-quorum-guard\", service=\"kube-state-metrics\"}];many-to-many matching not allowed: matching labels must be unique on one side"
...
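
The KubePodNotReady warning above fails because the right-hand side of the "on(namespace, pod) group_left(owner_kind)" match holds two series for the same (namespace, pod) pair, one with owner_kind="Node" and one with owner_kind="<none>". With group_left, the right-hand side is the "one" side and must contain at most one series per match group. The sketch below is a simplified model of that uniqueness check, not Prometheus's actual implementation. The function name find_duplicate_match_groups is hypothetical, and the label sets are copied from the error message.

```python
from collections import defaultdict


def find_duplicate_match_groups(series, on):
    """Group label sets by the `on` labels and return the groups that
    contain more than one series (a simplified model of the check)."""
    groups = defaultdict(list)
    for labels in series:
        # Missing labels are treated as empty strings for grouping.
        key = tuple((name, labels.get(name, "")) for name in on)
        groups[key].append(labels)
    return {key: members for key, members in groups.items() if len(members) > 1}


# Right-hand side series as reported in the err field of the warning.
POD = "kube-apiserver-zhsun-qqrhz-m-0.c.openshift-qe.internal"
right_hand_side = [
    {"namespace": "openshift-kube-apiserver", "owner_kind": "Node", "pod": POD},
    {"namespace": "openshift-kube-apiserver", "owner_kind": "<none>", "pod": POD},
]

duplicates = find_duplicate_match_groups(right_hand_side, on=("namespace", "pod"))
for key, members in duplicates.items():
    print(dict(key), "has", len(members), "series on the 'one' side")
```

Any non-empty result means the match is ambiguous for that group, which is the condition the rule manager reports as "many-to-many matching not allowed".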


Version-Release number of selected component (if applicable):
upgrade from 4.3.5 to 4.4.0-0.nightly-2020-03-10-042427

How reproducible:
sometimes

Steps to Reproduce:
1. oc -n openshift-monitoring logs prometheus-k8s-1 -c prometheus | grep "many-to-many matching not allowed: matching labels must be unique on one side"

Actual results:


Expected results:


Additional info:

--- Additional comment from Simon Pasquier on 2020-03-10 13:49:48 UTC ---

PodDisruptionBudgetAtLimit and PodDisruptionBudgetLimit alerts are tracked here: https://bugzilla.redhat.com/show_bug.cgi?id=1806640

--- Additional comment from Junqi Zhao on 2020-04-15 03:23:25 UTC ---

Hit the same issue after upgrading from 4.3.12 to 4.4.0-0.nightly-2020-04-13-113747.

--- Additional comment from Simon Pasquier on 2020-05-05 09:15:06 UTC ---

Deferred due to higher-priority tasks.

--- Additional comment from errata-xmlrpc on 2020-05-12 14:59:22 UTC ---

This bug has been added to advisory RHBA-2020:51809 by OpenShift Release Team Bot (ocp-build/buildvm.openshift.eng.bos.redhat.com)

--- Additional comment from errata-xmlrpc on 2020-05-12 14:59:35 UTC ---

Bug report changed to ON_QA status by Errata System.
A QE request has been submitted for advisory RHBA-2020:51809-02
https://errata.devel.redhat.com/advisory/51809

Comment 7 Junqi Zhao 2020-06-28 12:23:45 UTC
After upgrading from 4.3.27 to 4.4.0-0.nightly-2020-06-27-171816 the situation is much better, but the warning still appears for "record: node:node_num_cpu:sum" and for "record: cluster:cpu_core_node_labels". The latter is tracked in bug 1851685.
# oc -n openshift-monitoring logs prometheus-k8s-0 -c prometheus | grep "many-to-many matching not allowed: matching labels must be unique on one side"
level=warn ts=2020-06-28T12:00:05.307Z caller=manager.go:525 component="rule manager" group=node.rules msg="Evaluating rule failed" rule="record: node:node_num_cpu:sum\nexpr: count by(cluster, node) (sum by(node, cpu) (node_cpu_seconds_total{job=\"node-exporter\"}\n  * on(namespace, pod) group_left(node) node_namespace_pod:kube_pod_info:))\n" err="found duplicate series for the match group {namespace=\"openshift-user-workload-monitoring\", pod=\"prometheus-user-workload-0\"} on the right hand-side of the operation: [{__name__=\"node_namespace_pod:kube_pod_info:\", namespace=\"openshift-user-workload-monitoring\", node=\"ip-10-0-149-64.us-east-2.compute.internal\", pod=\"prometheus-user-workload-0\"}, {__name__=\"node_namespace_pod:kube_pod_info:\", namespace=\"openshift-user-workload-monitoring\", node=\"ip-10-0-131-193.us-east-2.compute.internal\", pod=\"prometheus-user-workload-0\"}];many-to-many matching not allowed: matching labels must be unique on one side"
level=warn ts=2020-06-28T12:00:06.738Z caller=manager.go:525 component="rule manager" group=kubernetes.rules msg="Evaluating rule failed" rule="record: cluster:cpu_core_node_labels\nexpr: cluster:nodes_roles * on(node) group_right(label_beta_kubernetes_io_instance_type,\n  label_node_role_kubernetes_io, label_node_openshift_io_os_id, label_kubernetes_io_arch,\n  label_node_role_kubernetes_io_master, label_node_role_kubernetes_io_infra) label_replace(cluster:cpu_core_hyperthreading,\n  \"node\", \"$1\", \"instance\", \"(.*)\")\n" err="found duplicate series for the match group {node=\"ip-10-0-143-19.us-east-2.compute.internal\"} on the left hand-side of the operation: [{__name__=\"cluster:nodes_roles\", label_beta_kubernetes_io_arch=\"amd64\", label_beta_kubernetes_io_instance_type=\"m5.xlarge\", label_beta_kubernetes_io_os=\"linux\", label_failure_domain_beta_kubernetes_io_region=\"us-east-2\", label_failure_domain_beta_kubernetes_io_zone=\"us-east-2a\", label_kubernetes_io_arch=\"amd64\", label_kubernetes_io_hostname=\"ip-10-0-143-19\", label_kubernetes_io_os=\"linux\", label_node_openshift_io_os_id=\"rhcos\", label_node_role_kubernetes_io=\"master\", label_node_role_kubernetes_io_master=\"true\", namespace=\"openshift-monitoring\", node=\"ip-10-0-143-19.us-east-2.compute.internal\"}, {__name__=\"cluster:nodes_roles\", label_beta_kubernetes_io_arch=\"amd64\", label_beta_kubernetes_io_instance_type=\"m5.xlarge\", label_beta_kubernetes_io_os=\"linux\", label_failure_domain_beta_kubernetes_io_region=\"us-east-2\", label_failure_domain_beta_kubernetes_io_zone=\"us-east-2a\", label_kubernetes_io_arch=\"amd64\", label_kubernetes_io_hostname=\"ip-10-0-143-19\", label_kubernetes_io_os=\"linux\", label_node_kubernetes_io_instance_type=\"m5.xlarge\", label_node_openshift_io_os_id=\"rhcos\", label_node_role_kubernetes_io=\"master\", label_node_role_kubernetes_io_master=\"true\", label_topology_kubernetes_io_region=\"us-east-2\", label_topology_kubernetes_io_zone=\"us-east-2a\", namespace=\"openshift-monitoring\", node=\"ip-10-0-143-19.us-east-2.compute.internal\"}];many-to-many matching not allowed: matching labels must be unique on one side"
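
The two remaining warnings differ in which side of the operation carries the duplicates: the right-hand side for node:node_num_cpu:sum and the left-hand side for cluster:cpu_core_node_labels. When triaging a longer log, it may help to pull the side and the match group out of each err field. This is only a rough sketch, assuming the error wording shown in these log lines and assuming label values contain no closing brace. The name summarize_duplicates is hypothetical.

```python
import re
from collections import Counter

# Matches the part of the err field that names the ambiguous match group
# and the side of the binary operation on which duplicates were found.
ERR_RE = re.compile(
    r"found duplicate series for the match group (\{.*?\}) "
    r"on the (left|right) hand-side of the operation"
)


def summarize_duplicates(lines):
    """Count (side, match group) pairs reported in rule manager warnings."""
    counts = Counter()
    for line in lines:
        match = ERR_RE.search(line)
        if match:
            # Undo the logfmt escaping of double quotes for readability.
            group = match.group(1).replace('\\"', '"')
            counts[(match.group(2), group)] += 1
    return counts
```

Grouping by match group rather than by rule shows whether several failing rules share one underlying duplicate, such as a pod that is reported on two nodes.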

Comment 10 Junqi Zhao 2020-06-30 07:32:36 UTC
Created attachment 1699247
prometheus-k8s pod logs

Comment 11 Junqi Zhao 2020-06-30 07:33:16 UTC
Created attachment 1699248
PrometheusRule crd file after upgrade to 4.4

Comment 14 Junqi Zhao 2020-07-01 01:57:33 UTC
Created attachment 1699401
prometheus-k8s-rules file for 4.4.0-0.nightly-2020-06-27-171816

