Bug 1759469

Summary: [4.2.z] sometimes find "found duplicate series for the match group" error in prometheus-k8s pod logs
Product: OpenShift Container Platform
Reporter: Junqi Zhao <juzhao>
Component: Monitoring
Assignee: Paul Gier <pgier>
Status: CLOSED ERRATA
QA Contact: Junqi Zhao <juzhao>
Severity: low
Docs Contact:
Priority: low
Version: 4.2.z
CC: alegrand, anpicker, chambrid, erooth, kakkoyun, lcosic, mloibl, pkrupa, surbania
Target Milestone: ---
Target Release: 4.3.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2020-05-13 21:27:12 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
Description                             Flags
duplicate series for the match group    none
monitoring dump                         none
4.3 monitoring dump                     none

Description Junqi Zhao 2019-10-08 09:18:25 UTC
Created attachment 1623447 [details]
duplicate series for the match group

Description of problem:
See the attached file. The prometheus container log contains warnings like the one below, with err="found duplicate series for the match group". It does not appear to affect functionality.
level=warn ts=2019-10-08T03:42:36.753Z caller=manager.go:513 component="rule manager" group=kubernetes.rules msg="Evaluating rule failed" rule="record: node_role_os_version_machine:cpu_capacity_cores:sum\nexpr: sum by(label_node_openshift_io_os_id, label_kubernetes_io_arch, label_node_role_kubernetes_io_master_infra,\n  label_node_role_kubernetes_io_master, label_node_role_kubernetes_io_infra) ((cluster:master_infra_nodes\n  * on(node) group_left() kube_node_status_capacity_cpu_cores) or on(node) (cluster:master_nodes\n  * on(node) group_left() kube_node_status_capacity_cpu_cores) or on(node) (cluster:infra_nodes\n  * on(node) group_left() kube_node_status_capacity_cpu_cores) or on(node) (kube_node_labels\n  * on(node) group_left() kube_node_status_capacity_cpu_cores))\n" err="found duplicate series for the match group {node=\"qe-jiazha-42-7tks4-master-0\"} on the right hand-side of the operation: [{__name__=\"kube_node_status_capacity_cpu_cores\", endpoint=\"https-main\", instance=\"10.131.0.4:8443\", job=\"kube-state-metrics\", namespace=\"openshift-monitoring\", node=\"qe-jiazha-42-7tks4-master-0\", pod=\"kube-state-metrics-65d5886446-69dhf\", service=\"kube-state-metrics\"}, {__name__=\"kube_node_status_capacity_cpu_cores\", endpoint=\"https-main\", instance=\"10.131.0.23:8443\", job=\"kube-state-metrics\", namespace=\"openshift-monitoring\", node=\"qe-jiazha-42-7tks4-master-0\", pod=\"kube-state-metrics-65d5886446-csvxl\", service=\"kube-state-metrics\"}];many-to-many matching not allowed: matching labels must be unique on one side"
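
The warning above lists two kube_node_status_capacity_cpu_cores series that share node="qe-jiazha-42-7tks4-master-0" and differ only in their instance and pod labels. Because the rule joins with on(node) group_left(), each node value may appear only once on the right-hand side. The following is a minimal Python sketch of that uniqueness check, not Prometheus code: the function name find_duplicate_match_groups is hypothetical, and the label sets are abbreviated from the log line.

```python
from collections import defaultdict


def find_duplicate_match_groups(series, on):
    """Group label sets by the `on` labels and return the groups
    that contain more than one series (hypothetical helper)."""
    groups = defaultdict(list)
    for labels in series:
        # The match-group key is built only from the labels named in on(...).
        key = tuple((name, labels.get(name, "")) for name in on)
        groups[key].append(labels)
    return {key: members for key, members in groups.items() if len(members) > 1}


# Right-hand side of the failing rule, abbreviated from the log line.
right_hand_side = [
    {
        "__name__": "kube_node_status_capacity_cpu_cores",
        "instance": "10.131.0.4:8443",
        "node": "qe-jiazha-42-7tks4-master-0",
        "pod": "kube-state-metrics-65d5886446-69dhf",
    },
    {
        "__name__": "kube_node_status_capacity_cpu_cores",
        "instance": "10.131.0.23:8443",
        "node": "qe-jiazha-42-7tks4-master-0",
        "pod": "kube-state-metrics-65d5886446-csvxl",
    },
]

duplicates = find_duplicate_match_groups(right_hand_side, on=["node"])
for key, members in duplicates.items():
    print(dict(key), "matches", len(members), "series")
```

In this sketch both series fall into the same match group for node, which is the condition the rule manager reports as "many-to-many matching not allowed".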

Version-Release number of selected component (if applicable):
4.2.0-0.nightly-2019-10-07-161806

How reproducible:
Sometimes

Steps to Reproduce:
1. oc -n openshift-monitoring logs -c prometheus prometheus-k8s-0

Actual results:


Expected results:


Additional info:

Comment 1 Lili Cosic 2019-10-09 07:01:12 UTC
Can you have a look at this? It seems there is a duplicate match for the rule you added. Thanks!

Comment 2 Chris Hambridge 2019-10-09 12:07:16 UTC
Can you provide the results of the kube_node_role prometheus query? I'd like to understand what the node_role labels were on the cluster in order to reproduce.

Comment 3 Junqi Zhao 2019-10-10 07:03:11 UTC
(In reply to Chris Hambridge from comment #2)
> Can you provide the results of the kube_node_role prometheus query? I'd like
> to understand what the node_role labels were on the cluster in order to
> reproduce.

It does not reproduce every time. I will provide the info the next time we hit it.

Comment 5 Junqi Zhao 2019-10-12 09:13:24 UTC
Created attachment 1624943 [details]
monitoring dump

Comment 7 Junqi Zhao 2019-10-16 06:29:52 UTC
Created attachment 1626290 [details]
4.3 monitoring dump

Comment 10 Junqi Zhao 2019-11-27 00:41:29 UTC
Tested with 4.3.0-0.nightly-2019-11-25-153929; the issue no longer occurs.

Comment 12 errata-xmlrpc 2020-05-13 21:27:12 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0062