Bug 1759469

Summary:

[4.2.z] "found duplicate series for the match group" error sometimes appears in prometheus-k8s pod logs

Product:

OpenShift Container Platform

Reporter:

Junqi Zhao <juzhao>

Component:

Monitoring

Assignee:

Paul Gier <pgier>

Status:

CLOSED ERRATA

QA Contact:

Junqi Zhao <juzhao>

Severity:

low

Docs Contact:

Priority:

low

Version:

4.2.z

CC:

alegrand, anpicker, chambrid, erooth, kakkoyun, lcosic, mloibl, pkrupa, surbania

Target Milestone:

---

Target Release:

4.3.0

Hardware:

Unspecified

OS:

Unspecified

Whiteboard:

Fixed In Version:

Doc Type:

No Doc Update

Doc Text:

Story Points:

---

Clone Of:

Environment:

Last Closed:

2020-05-13 21:27:12 UTC

Type:

Bug

Attachments:

Description	Flags
duplicate series for the match group	none
monitoring dump	none
4.3 monitoring dump	none

Description Junqi Zhao 2019-10-08 09:18:25 UTC

Created attachment 1623447 [details]
duplicate series for the match group

Description of problem:
See the attached file. The prometheus-k8s pod logs contain warnings like the one below, with err="found duplicate series for the match group". It does not seem to affect functionality.
level=warn ts=2019-10-08T03:42:36.753Z caller=manager.go:513 component="rule manager" group=kubernetes.rules msg="Evaluating rule failed" rule="record: node_role_os_version_machine:cpu_capacity_cores:sum\nexpr: sum by(label_node_openshift_io_os_id, label_kubernetes_io_arch, label_node_role_kubernetes_io_master_infra,\n  label_node_role_kubernetes_io_master, label_node_role_kubernetes_io_infra) ((cluster:master_infra_nodes\n  * on(node) group_left() kube_node_status_capacity_cpu_cores) or on(node) (cluster:master_nodes\n  * on(node) group_left() kube_node_status_capacity_cpu_cores) or on(node) (cluster:infra_nodes\n  * on(node) group_left() kube_node_status_capacity_cpu_cores) or on(node) (kube_node_labels\n  * on(node) group_left() kube_node_status_capacity_cpu_cores))\n" err="found duplicate series for the match group {node=\"qe-jiazha-42-7tks4-master-0\"} on the right hand-side of the operation: [{__name__=\"kube_node_status_capacity_cpu_cores\", endpoint=\"https-main\", instance=\"10.131.0.4:8443\", job=\"kube-state-metrics\", namespace=\"openshift-monitoring\", node=\"qe-jiazha-42-7tks4-master-0\", pod=\"kube-state-metrics-65d5886446-69dhf\", service=\"kube-state-metrics\"}, {__name__=\"kube_node_status_capacity_cpu_cores\", endpoint=\"https-main\", instance=\"10.131.0.23:8443\", job=\"kube-state-metrics\", namespace=\"openshift-monitoring\", node=\"qe-jiazha-42-7tks4-master-0\", pod=\"kube-state-metrics-65d5886446-csvxl\", service=\"kube-state-metrics\"}];many-to-many matching not allowed: matching labels must be unique on one side"
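To illustrate what the warning above means: the rule joins with `on(node) group_left()`, so each `node` value may match at most one series on the right-hand side. The error text lists two `kube_node_status_capacity_cpu_cores` series for the same node that differ only in `instance` and `pod`. The following is a minimal Python sketch of that uniqueness check, not Prometheus's actual implementation. The function name `find_duplicate_match_groups` is hypothetical, and the label sets are trimmed copies of the two series quoted in the error.

```python
from collections import defaultdict


def find_duplicate_match_groups(series, on=("node",)):
    """Group label sets by the `on` labels and return groups holding more than one series.

    Hypothetical helper for illustration only; it mimics the uniqueness
    requirement on the "one" side of a group_left() join.
    """
    groups = defaultdict(list)
    for labels in series:
        # The match-group key is built only from the labels named in on(...).
        key = tuple((name, labels.get(name, "")) for name in on)
        groups[key].append(labels)
    return {key: members for key, members in groups.items() if len(members) > 1}


# Right-hand-side series from the error message, trimmed to the labels that differ.
rhs = [
    {
        "__name__": "kube_node_status_capacity_cpu_cores",
        "instance": "10.131.0.4:8443",
        "node": "qe-jiazha-42-7tks4-master-0",
        "pod": "kube-state-metrics-65d5886446-69dhf",
    },
    {
        "__name__": "kube_node_status_capacity_cpu_cores",
        "instance": "10.131.0.23:8443",
        "node": "qe-jiazha-42-7tks4-master-0",
        "pod": "kube-state-metrics-65d5886446-csvxl",
    },
]

duplicates = find_duplicate_match_groups(rhs)
for key, members in duplicates.items():
    print(dict(key), "is matched by pods", [m["pod"] for m in members])
```

In this sketch, grouping on `node` alone reports one duplicate group, while grouping on both `node` and `pod` reports none, which is consistent with the two series coming from two different kube-state-metrics pods.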

Version-Release number of selected component (if applicable):
4.2.0-0.nightly-2019-10-07-161806

How reproducible:
Sometimes

Steps to Reproduce:
1. oc -n openshift-monitoring logs -c prometheus prometheus-k8s-0
2.
3.
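Because the warning only shows up occasionally, it can help to scan saved output of the command in step 1 for it. The following is a rough Python sketch under that assumption. The helper name `duplicate_series_groups` is hypothetical, the first sample line is invented for contrast, and the second is a shortened copy of the warning from the description (the `rule=` field and the series list are omitted).

```python
import re

MARKER = "found duplicate series for the match group"
# Capture the label matcher inside the braces that follow the marker text.
GROUP_RE = re.compile(re.escape(MARKER) + r" \{(.*?)\}")


def duplicate_series_groups(lines):
    """Return the match group of every log line that carries the duplicate-series error."""
    groups = []
    for line in lines:
        match = GROUP_RE.search(line)
        if match:
            # The quotes are backslash-escaped inside err="..."; unescape them for readability.
            groups.append(match.group(1).replace('\\"', '"'))
    return groups


sample = [
    # Hypothetical unrelated line, included only to show that it is skipped.
    'level=info ts=2019-10-08T03:42:30.000Z caller=main.go:1 msg="hypothetical unrelated line"',
    # Shortened copy of the warning quoted in the description.
    'level=warn ts=2019-10-08T03:42:36.753Z caller=manager.go:513 component="rule manager" '
    'group=kubernetes.rules msg="Evaluating rule failed" '
    'err="found duplicate series for the match group {node=\\"qe-jiazha-42-7tks4-master-0\\"} '
    'on the right hand-side of the operation"',
]

for group in duplicate_series_groups(sample):
    print(group)
```

The same function could be fed the lines of a file that holds the saved pod log instead of the `sample` list.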

Actual results:


Expected results:


Additional info:

Comment 1 Lili Cosic 2019-10-09 07:01:12 UTC

Can you have a look at this? It seems there is a duplicate match for the rule you added. Thanks!

Comment 2 Chris Hambridge 2019-10-09 12:07:16 UTC

Can you provide the results of the kube_node_role prometheus query? I'd like to understand what the node_role labels were on the cluster in order to reproduce.
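One way to collect that output is the Prometheus HTTP API instant-query endpoint, `/api/v1/query`. The following Python sketch only builds the request URL and summarizes a response body; it makes no network call and leaves out authentication. The base URL, both function names, and the sample response are hypothetical, and the sketch assumes that `kube_node_role` series carry `node` and `role` labels.

```python
import json
from urllib.parse import urlencode


def instant_query_url(base_url, expr):
    """Build an instant-query URL for the Prometheus HTTP API (/api/v1/query)."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': expr})}"


def roles_by_node(response_text):
    """Map each node to its list of roles, given an instant-query JSON response body."""
    payload = json.loads(response_text)
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload.get('error', 'unknown error')}")
    roles = {}
    for sample in payload["data"]["result"]:
        metric = sample["metric"]
        # Assumption: kube_node_role exposes `node` and `role` labels.
        roles.setdefault(metric.get("node", ""), []).append(metric.get("role", ""))
    return roles


# Hypothetical base URL; replace it with the cluster's Prometheus route.
url = instant_query_url("https://prometheus.example.invalid/", "kube_node_role")

# Hypothetical response body in the documented instant-query format, not real cluster data.
fake_response = json.dumps({
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {"metric": {"__name__": "kube_node_role", "node": "node-a", "role": "master"},
             "value": [1570612036.0, "1"]},
            {"metric": {"__name__": "kube_node_role", "node": "node-a", "role": "infra"},
             "value": [1570612036.0, "1"]},
        ],
    },
})

print(url)
print(roles_by_node(fake_response))
```

A node that maps to more than one role in this summary would be worth noting in the reply, since the failing rule combines several role-based series.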

Comment 3 Junqi Zhao 2019-10-10 07:03:11 UTC

(In reply to Chris Hambridge from comment #2)
> Can you provide the results of the kube_node_role prometheus query? I'd like
> to understand what the node_role labels were on the cluster in order to
> reproduce.

It does not reproduce every time. I will provide the info the next time we hit it.

Comment 5 Junqi Zhao 2019-10-12 09:13:24 UTC

Created attachment 1624943 [details]
monitoring dump

Comment 7 Junqi Zhao 2019-10-16 06:29:52 UTC

Created attachment 1626290 [details]
4.3 monitoring dump

Comment 10 Junqi Zhao 2019-11-27 00:41:29 UTC

Tested with 4.3.0-0.nightly-2019-11-25-153929 and did not hit this issue.

Comment 12 errata-xmlrpc 2020-05-13 21:27:12 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0062