Bug 1759469 - [4.2.z] sometimes find "found duplicate series for the match group" error in prometheus-k8s pod logs
Summary: [4.2.z] sometimes find "found duplicate series for the match group" error in ...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Monitoring
Version: 4.2.z
Hardware: Unspecified
OS: Unspecified
Priority: low
Severity: low
Target Milestone: ---
Target Release: 4.3.0
Assignee: Paul Gier
QA Contact: Junqi Zhao
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2019-10-08 09:18 UTC by Junqi Zhao
Modified: 2020-05-13 21:27 UTC

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-05-13 21:27:12 UTC
Target Upstream Version:
Embargoed:


Attachments
duplicate series for the match group (65.97 KB, text/plain)
2019-10-08 09:18 UTC, Junqi Zhao
monitoring dump (625.26 KB, application/gzip)
2019-10-12 09:13 UTC, Junqi Zhao
4.3 monitoring dump (617.86 KB, application/gzip)
2019-10-16 06:29 UTC, Junqi Zhao


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift cluster-monitoring-operator pull 544 | 0 | 'None' | closed | Bug 1771389: Refactor cpu recording rules | 2021-02-09 09:11:56 UTC
Red Hat Product Errata | RHBA-2020:0062 | 0 | None | None | None | 2020-05-13 21:27:14 UTC

Internal Links: 1771389

Description Junqi Zhao 2019-10-08 09:18:25 UTC
Created attachment 1623447
duplicate series for the match group

Description of problem:
See the attached file. The prometheus-k8s pod logs contain warnings like the one below, with err="found duplicate series for the match group". The error does not appear to affect functionality.
level=warn ts=2019-10-08T03:42:36.753Z caller=manager.go:513 component="rule manager" group=kubernetes.rules msg="Evaluating rule failed" rule="record: node_role_os_version_machine:cpu_capacity_cores:sum\nexpr: sum by(label_node_openshift_io_os_id, label_kubernetes_io_arch, label_node_role_kubernetes_io_master_infra,\n  label_node_role_kubernetes_io_master, label_node_role_kubernetes_io_infra) ((cluster:master_infra_nodes\n  * on(node) group_left() kube_node_status_capacity_cpu_cores) or on(node) (cluster:master_nodes\n  * on(node) group_left() kube_node_status_capacity_cpu_cores) or on(node) (cluster:infra_nodes\n  * on(node) group_left() kube_node_status_capacity_cpu_cores) or on(node) (kube_node_labels\n  * on(node) group_left() kube_node_status_capacity_cpu_cores))\n" err="found duplicate series for the match group {node=\"qe-jiazha-42-7tks4-master-0\"} on the right hand-side of the operation: [{__name__=\"kube_node_status_capacity_cpu_cores\", endpoint=\"https-main\", instance=\"10.131.0.4:8443\", job=\"kube-state-metrics\", namespace=\"openshift-monitoring\", node=\"qe-jiazha-42-7tks4-master-0\", pod=\"kube-state-metrics-65d5886446-69dhf\", service=\"kube-state-metrics\"}, {__name__=\"kube_node_status_capacity_cpu_cores\", endpoint=\"https-main\", instance=\"10.131.0.23:8443\", job=\"kube-state-metrics\", namespace=\"openshift-monitoring\", node=\"qe-jiazha-42-7tks4-master-0\", pod=\"kube-state-metrics-65d5886446-csvxl\", service=\"kube-state-metrics\"}];many-to-many matching not allowed: matching labels must be unique on one side"
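
The error above comes from PromQL one-to-many matching: with on(node) group_left(), each match group on the right-hand side must contain exactly one series. In this log, two kube-state-metrics pods each expose kube_node_status_capacity_cpu_cores for the same node. The Python sketch below is a simplified model of that uniqueness check, not the actual Prometheus implementation; the function names group_left_on and max_by, the shortened node and pod names, and the sample values are all hypothetical. It also shows that collapsing the right-hand side to one series per node, the analogue of max by(node) (...), lets the match succeed. That is only one possible way to make such a rule tolerate duplicates and is not necessarily the fix that shipped.

```python
def _key(labels, on):
    """Build the match-group key from the labels listed in `on`."""
    return tuple(labels.get(name, "") for name in on)


def group_left_on(left, right, on):
    """Simplified model of `left * on(<on>) group_left() right`.

    Each series is a (labels_dict, value) pair. The right-hand side must
    hold at most one series per match group, otherwise matching fails.
    """
    index = {}
    for labels, value in right:
        key = _key(labels, on)
        if key in index:
            raise ValueError(
                "found duplicate series for the match group "
                f"{dict(zip(on, key))} on the right hand-side of the operation"
            )
        index[key] = value
    # Keep the left-hand labels and multiply by the single matching right value.
    return [
        (labels, value * index[_key(labels, on)])
        for labels, value in left
        if _key(labels, on) in index
    ]


def max_by(series, by):
    """Analogue of `max by(<by>) (...)`: one series per distinct key."""
    best = {}
    for labels, value in series:
        key = _key(labels, by)
        best[key] = max(value, best.get(key, value))
    return [(dict(zip(by, key)), value) for key, value in best.items()]


# Hypothetical sample data: one role series on the left, and the same node
# reported by two kube-state-metrics pods on the right.
left = [({"node": "master-0", "label_node_role_kubernetes_io_master": "true"}, 1.0)]
right = [
    ({"node": "master-0", "pod": "kube-state-metrics-a"}, 4.0),
    ({"node": "master-0", "pod": "kube-state-metrics-b"}, 4.0),
]

try:
    group_left_on(left, right, ["node"])
except ValueError as exc:
    print("matching failed:", exc)

# After deduplicating per node, the same match succeeds.
print(group_left_on(left, max_by(right, ["node"]), ["node"]))
```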

Version-Release number of selected component (if applicable):
4.2.0-0.nightly-2019-10-07-161806

How reproducible:
Sometimes

Steps to Reproduce:
1. oc -n openshift-monitoring logs -c prometheus prometheus-k8s-0
2.
3.
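
To see which rule groups and match groups are affected, the output of the command in step 1 can be scanned for this error. The following is a minimal Python sketch, assuming the log lines have the format shown in the description; the function name summarize_duplicates and the regular expressions are hypothetical helpers, not part of any OpenShift or Prometheus tooling.

```python
import re
from collections import Counter

# Matches the match group reported in the error, e.g. {node=\"master-0\"}.
DUPLICATE_RE = re.compile(r"found duplicate series for the match group (\{.*?\})")
# Matches the rule group field of the log line, e.g. group=kubernetes.rules.
GROUP_RE = re.compile(r"\bgroup=(\S+)")


def summarize_duplicates(lines):
    """Count duplicate-series errors per (rule group, match group) pair."""
    counts = Counter()
    for line in lines:
        found = DUPLICATE_RE.search(line)
        if not found:
            continue
        group = GROUP_RE.search(line)
        rule_group = group.group(1) if group else "unknown"
        # Quotes inside the err="..." field are backslash-escaped in the log.
        match_group = found.group(1).replace('\\"', '"')
        counts[(rule_group, match_group)] += 1
    return counts
```

For example, after saving the output of step 1 to a file, passing the open file object to summarize_duplicates returns a count for each (rule group, match group) pair.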

Actual results:


Expected results:


Additional info:

Comment 1 Lili Cosic 2019-10-09 07:01:12 UTC
Can you have a look at this? It seems there is a duplicate match for the rule you added. Thanks!

Comment 2 Chris Hambridge 2019-10-09 12:07:16 UTC
Can you provide the results of the kube_node_role Prometheus query? I'd like to understand what the node_role labels were on the cluster in order to reproduce the issue.

Comment 3 Junqi Zhao 2019-10-10 07:03:11 UTC
(In reply to Chris Hambridge from comment #2)
> Can you provide the results of the kube_node_role prometheus query? I'd like
> to understand what the node_role labels were on the cluster in order to
> reproduce.

It does not reproduce every time. I will provide the information the next time we hit it.

Comment 5 Junqi Zhao 2019-10-12 09:13:24 UTC
Created attachment 1624943
monitoring dump

Comment 7 Junqi Zhao 2019-10-16 06:29:52 UTC
Created attachment 1626290
4.3 monitoring dump

Comment 10 Junqi Zhao 2019-11-27 00:41:29 UTC
Tested with 4.3.0-0.nightly-2019-11-25-153929; the issue no longer occurs.

Comment 12 errata-xmlrpc 2020-05-13 21:27:12 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0062

