Bug 1851685

Summary: "many-to-many matching not allowed: matching labels must be unique on one side" warn info for "cluster:cpu_core_node_labels"
Product: OpenShift Container Platform Reporter: Junqi Zhao <juzhao>
Component: Monitoring Assignee: Simon Pasquier <spasquie>
Status: CLOSED ERRATA QA Contact: Junqi Zhao <juzhao>
Severity: low Docs Contact:
Priority: low    
Version: 4.4 CC: alegrand, anpicker, erooth, kakkoyun, lcosic, mloibl, pkrupa, spasquie, surbania
Target Milestone: ---   
Target Release: 4.6.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2020-10-27 16:09:46 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Junqi Zhao 2020-06-28 08:00:55 UTC
Description of problem:
After upgrading from 4.3.26 to 4.4.9, the Prometheus logs show the warning
"many-to-many matching not allowed: matching labels must be unique on one side" for the rules "record: cluster:cpu_core_node_labels" and "record: node:node_num_cpu:sum". The issue with the "record: node:node_num_cpu:sum" rule is tracked in bug 1834913; this BZ tracks only "record: cluster:cpu_core_node_labels".

# oc -n openshift-monitoring logs prometheus-k8s-1 -c prometheus | grep "many-to-many matching not allowed: matching labels must be unique on one side"
level=warn ts=2020-06-28T07:42:05.328Z caller=manager.go:525 component="rule manager" group=node.rules msg="Evaluating rule failed" rule="record: node:node_num_cpu:sum\nexpr: count by(cluster, node) (sum by(node, cpu) (node_cpu_seconds_total{job=\"node-exporter\"}\n  * on(namespace, pod) group_left(node) node_namespace_pod:kube_pod_info:))\n" err="found duplicate series for the match group {namespace=\"openshift-monitoring\", pod=\"alertmanager-main-0\"} on the right hand-side of the operation: [{__name__=\"node_namespace_pod:kube_pod_info:\", namespace=\"openshift-monitoring\", node=\"dyan-upg4326-4qvbt-worker-westus21-zf2ks\", pod=\"alertmanager-main-0\"}, {__name__=\"node_namespace_pod:kube_pod_info:\", namespace=\"openshift-monitoring\", node=\"dyan-upg4326-4qvbt-worker-westus21-mx55p\", pod=\"alertmanager-main-0\"}];many-to-many matching not allowed: matching labels must be unique on one side"
level=warn ts=2020-06-28T07:42:06.822Z caller=manager.go:525 component="rule manager" group=kubernetes.rules msg="Evaluating rule failed" rule="record: cluster:cpu_core_node_labels\nexpr: cluster:nodes_roles * on(node) group_right(label_beta_kubernetes_io_instance_type,\n  label_node_role_kubernetes_io, label_node_openshift_io_os_id, label_kubernetes_io_arch,\n  label_node_role_kubernetes_io_master, label_node_role_kubernetes_io_infra) label_replace(cluster:cpu_core_hyperthreading,\n  \"node\", \"$1\", \"instance\", \"(.*)\")\n" err="found duplicate series for the match group {node=\"dyan-upg4326-4qvbt-worker-westus21-mx55p\"} on the left hand-side of the operation: [{__name__=\"cluster:nodes_roles\", label_beta_kubernetes_io_arch=\"amd64\", label_beta_kubernetes_io_instance_type=\"Standard_D2s_v3\", label_beta_kubernetes_io_os=\"linux\", label_failure_domain_beta_kubernetes_io_region=\"westus2\", label_failure_domain_beta_kubernetes_io_zone=\"westus2-1\", label_kubernetes_io_arch=\"amd64\", label_kubernetes_io_hostname=\"dyan-upg4326-4qvbt-worker-westus21-mx55p\", label_kubernetes_io_os=\"linux\", label_node_openshift_io_os_id=\"rhcos\", namespace=\"openshift-monitoring\", node=\"dyan-upg4326-4qvbt-worker-westus21-mx55p\"}, {__name__=\"cluster:nodes_roles\", label_beta_kubernetes_io_arch=\"amd64\", label_beta_kubernetes_io_instance_type=\"Standard_D2s_v3\", label_beta_kubernetes_io_os=\"linux\", label_failure_domain_beta_kubernetes_io_region=\"westus2\", label_failure_domain_beta_kubernetes_io_zone=\"westus2-1\", label_kubernetes_io_arch=\"amd64\", label_kubernetes_io_hostname=\"dyan-upg4326-4qvbt-worker-westus21-mx55p\", label_kubernetes_io_os=\"linux\", label_node_kubernetes_io_instance_type=\"Standard_D2s_v3\", label_node_openshift_io_os_id=\"rhcos\", label_topology_kubernetes_io_region=\"westus2\", label_topology_kubernetes_io_zone=\"westus2-1\", namespace=\"openshift-monitoring\", node=\"dyan-upg4326-4qvbt-worker-westus21-mx55p\"}];many-to-many matching not allowed: matching labels must be unique on one side"
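
The second warning can be illustrated outside Prometheus with a small Python sketch. It is only an illustration, not how Prometheus is implemented: series are modelled as plain label dictionaries, `find_duplicate_match_groups` is a hypothetical helper, the label sets are abbreviated from the log line above, and the `example-node-b` series is made up for contrast. With `on(node) group_right(...)`, the left-hand side (cluster:nodes_roles) is the "one" side, so each node value may appear there only once.

```python
from collections import defaultdict


def find_duplicate_match_groups(series, on):
    """Group series by the `on` labels; return the groups holding more than one series."""
    groups = defaultdict(list)
    for labels in series:
        # A missing label is treated as an empty string when building the match key.
        key = tuple((name, labels.get(name, "")) for name in on)
        groups[key].append(labels)
    return {key: members for key, members in groups.items() if len(members) > 1}


# Left-hand side of the rule, abbreviated from the log. The two series for the
# same node differ only in their label sets (old and new instance-type labels).
nodes_roles = [
    {"__name__": "cluster:nodes_roles",
     "node": "dyan-upg4326-4qvbt-worker-westus21-mx55p",
     "label_beta_kubernetes_io_instance_type": "Standard_D2s_v3"},
    {"__name__": "cluster:nodes_roles",
     "node": "dyan-upg4326-4qvbt-worker-westus21-mx55p",
     "label_beta_kubernetes_io_instance_type": "Standard_D2s_v3",
     "label_node_kubernetes_io_instance_type": "Standard_D2s_v3"},
    # Hypothetical second node with a single series, added for contrast.
    {"__name__": "cluster:nodes_roles", "node": "example-node-b"},
]

duplicates = find_duplicate_match_groups(nodes_roles, on=["node"])
for key, members in duplicates.items():
    print(f"duplicate series for the match group {dict(key)}: {len(members)} series")
```

Only the mx55p node is reported, because it is the only match group with more than one series on the "one" side.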


Version-Release number of selected component (if applicable):
upgrade from 4.3.26 to 4.4.9

How reproducible:
sometimes

Steps to Reproduce:
1. see the description

Actual results:


Expected results:


Additional info:

Comment 8 Junqi Zhao 2020-07-23 12:12:50 UTC
Upgraded from 4.5.3 to 4.6.0-0.nightly-2020-07-23-055513; the issue no longer occurs.
The expr is changed to:
**************
  - expr: |
      topk by(node) (1, cluster:nodes_roles) * on (node)
        group_right( label_beta_kubernetes_io_instance_type, label_node_role_kubernetes_io, label_node_openshift_io_os_id, label_kubernetes_io_arch,
                     label_node_role_kubernetes_io_master, label_node_role_kubernetes_io_infra)
      label_replace( cluster:cpu_core_hyperthreading, "node", "$1", "instance", "(.*)" )
    record: cluster:cpu_core_node_labels
**************
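
The changed expression wraps cluster:nodes_roles in `topk by(node) (1, ...)`, which keeps at most one series per node and so makes the left-hand side unique for the `on (node)` match. A rough Python sketch of that effect follows. It is a simplification, not Prometheus code: `topk1_by` is a hypothetical helper, the sample value of 1 is an assumption, the label sets are abbreviated, and ties are resolved here by keeping the first series seen, which should not be relied on in Prometheus itself.

```python
def topk1_by(samples, by):
    """Keep, for each distinct combination of `by` labels, the one sample with the highest value."""
    best = {}
    for labels, value in samples:
        key = tuple(labels.get(name, "") for name in by)
        # Strict comparison: on a tie the first sample seen is kept.
        if key not in best or value > best[key][1]:
            best[key] = (labels, value)
    return list(best.values())


# Two cluster:nodes_roles series for the same node, as seen after the upgrade.
# The sample value 1 is assumed for illustration.
nodes_roles = [
    ({"node": "dyan-upg4326-4qvbt-worker-westus21-mx55p",
      "label_beta_kubernetes_io_instance_type": "Standard_D2s_v3"}, 1),
    ({"node": "dyan-upg4326-4qvbt-worker-westus21-mx55p",
      "label_beta_kubernetes_io_instance_type": "Standard_D2s_v3",
      "label_node_kubernetes_io_instance_type": "Standard_D2s_v3"}, 1),
]

deduped = topk1_by(nodes_roles, by=["node"])
print(len(nodes_roles), "series before,", len(deduped), "after")
```

Because the surviving series keeps its full label set, the labels listed in `group_right(...)` can still be copied onto the result when the chosen series carries them.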

Comment 10 errata-xmlrpc 2020-10-27 16:09:46 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4196