Bug 1922053 - [4.6.z]"Evaluating rule failed" for "record: node:node_num_cpu:sum" rule
Summary: [4.6.z]"Evaluating rule failed" for "record: node:node_num_cpu:sum" rule
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Monitoring
Version: 4.6.z
Hardware: Unspecified
OS: Unspecified
Priority: low
Severity: low
Target Milestone: ---
Assignee: Prem Saraswat
QA Contact: Junqi Zhao
URL:
Whiteboard:
Depends On: 1908655
Blocks:
 
Reported: 2021-01-29 06:18 UTC by Junqi Zhao
Modified: 2021-06-02 15:24 UTC
CC List: 8 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1908655
Environment:
[sig-instrumentation] Prometheus when installed on the cluster shouldn't have failing rules evaluation
Last Closed: 2021-06-02 15:24:50 UTC
Target Upstream Version:
Embargoed:


Attachments
prometheus container logs (33.29 KB, text/plain), 2021-01-29 06:22 UTC, Junqi Zhao
openshift-user-workload-monitoring events (27.46 KB, text/plain), 2021-01-29 06:23 UTC, Junqi Zhao

Comment 1 Junqi Zhao 2021-01-29 06:21:37 UTC
Upgraded from 4.5.30 to 4.6.0-0.nightly-2021-01-28-234643 and hit the same issue:
# oc -n openshift-monitoring logs -c prometheus prometheus-k8s-0
level=warn ts=2021-01-29T04:35:05.316Z caller=manager.go:598 component="rule manager" group=node.rules msg="Evaluating rule failed" rule="record: node:node_num_cpu:sum\nexpr: count by(cluster, node) (sum by(node, cpu) (node_cpu_seconds_total{job=\"node-exporter\"} * on(namespace, pod) group_left(node) node_namespace_pod:kube_pod_info:))\n" err="found duplicate series for the match group {namespace=\"openshift-user-workload-monitoring\", pod=\"prometheus-user-workload-1\"} on the right hand-side of the operation: [{__name__=\"node_namespace_pod:kube_pod_info:\", namespace=\"openshift-user-workload-monitoring\", node=\"ip-10-0-193-105.us-east-2.compute.internal\", pod=\"prometheus-user-workload-1\"}, {__name__=\"node_namespace_pod:kube_pod_info:\", namespace=\"openshift-user-workload-monitoring\", node=\"ip-10-0-177-99.us-east-2.compute.internal\", pod=\"prometheus-user-workload-1\"}];many-to-many matching not allowed: matching labels must be unique on one side"
level=warn ts=2021-01-29T04:35:35.309Z caller=manager.go:598 component="rule manager" group=node.rules msg="Evaluating rule failed" rule="record: node:node_num_cpu:sum\nexpr: count by(cluster, node) (sum by(node, cpu) (node_cpu_seconds_total{job=\"node-exporter\"} * on(namespace, pod) group_left(node) node_namespace_pod:kube_pod_info:))\n" err="found duplicate series for the match group {namespace=\"openshift-user-workload-monitoring\", pod=\"prometheus-user-workload-1\"} on the right hand-side of the operation: [{__name__=\"node_namespace_pod:kube_pod_info:\", namespace=\"openshift-user-workload-monitoring\", node=\"ip-10-0-193-105.us-east-2.compute.internal\", pod=\"prometheus-user-workload-1\"}, {__name__=\"node_namespace_pod:kube_pod_info:\", namespace=\"openshift-user-workload-monitoring\", node=\"ip-10-0-177-99.us-east-2.compute.internal\", pod=\"prometheus-user-workload-1\"}];many-to-many matching not allowed: matching labels must be unique on one side"
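
For reference, the duplicate series reported on the right-hand side of the match can be checked with an instant query against the same Prometheus. Illustrative query only; the label values are copied from the log above:

  node_namespace_pod:kube_pod_info:{namespace="openshift-user-workload-monitoring", pod="prometheus-user-workload-1"}

While the rule is failing, this returns two series that differ only in the node label, which is exactly what the error message reports.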

Comment 2 Junqi Zhao 2021-01-29 06:22:02 UTC
Created attachment 1751953 [details]
prometheus container logs

Comment 3 Junqi Zhao 2021-01-29 06:23:41 UTC
Created attachment 1751954 [details]
openshift-user-workload-monitoring events

The prometheus-user-workload-1 pod moved from node ip-10-0-193-105.us-east-2.compute.internal to ip-10-0-177-99.us-east-2.compute.internal.

Comment 6 Prem Saraswat 2021-05-10 14:19:06 UTC
Since rule evaluations are instant queries, the error

found duplicate series for the match group {namespace="openshift-monitoring", pod="alertmanager-main-2"} on the right hand-side of the operation: [{__name__="node_namespace_pod:kube_pod_info:", namespace="openshift-monitoring", node="ip-10-0-70-152.us-east-2.compute.internal", pod="alertmanager-main-2"}, {__name__="node_namespace_pod:kube_pod_info:", namespace="openshift-monitoring", node="ip-10-0-67-241.us-east-2.compute.internal", pod="alertmanager-main-2"}]; many-to-many matching not allowed: matching labels must be unique on one side

is a bit odd, since the pod "alertmanager-main-2" in namespace "openshift-monitoring" could not possibly be running on two nodes at once.

I am only guessing, but one possible cause is Prometheus' staleness handling. If the rule evaluation happens just after a pod gets scheduled onto a different node, Prometheus does not yet consider the old time series stale and returns both series, one with the new node label and one with the old. This causes the error above.

I doubt this has anything specific to do with user-workload monitoring, though.

In short: when a pod is rescheduled onto a different node, the "record: node:node_num_cpu:sum" rule fails until the old time series is marked stale.
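
To illustrate the summary above: one way to keep the rule from failing during that window is to make the right-hand side unique per (namespace, pod), for example by picking a single series with topk. This is only a sketch of the idea, not necessarily the exact expression that landed upstream:

  record: node:node_num_cpu:sum
  expr: count by(cluster, node) (
    sum by(node, cpu) (
      node_cpu_seconds_total{job="node-exporter"}
        * on(namespace, pod) group_left(node)
        topk by(namespace, pod) (1, node_namespace_pod:kube_pod_info:)
    )
  )

Because topk by(namespace, pod) (1, ...) keeps at most one node_namespace_pod:kube_pod_info: series per (namespace, pod) pair, the many-to-many error cannot occur even while both the old and the new series are still present.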

Comment 7 Simon Pasquier 2021-05-10 15:06:13 UTC
@Prem your explanation is correct, and it has already been fixed in the CMO master branch (see https://github.com/openshift/cluster-monitoring-operator/pull/1044 and https://github.com/kubernetes-monitoring/kubernetes-mixin/pull/553). This bug is a clone of bug 1908655 created by Junqi, meaning that he potentially wanted to backport it to 4.7. Given the bug's severity, I'm not sure that it deserves a backport, though.
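
For anyone checking whether a given cluster already carries the fix, the rendered rules can be inspected directly. Rough check only; the exact PrometheusRule object names may differ between releases:

  oc -n openshift-monitoring get prometheusrules -o yaml | grep -B2 -A8 'node_namespace_pod:kube_pod_info:'

If the definition of node_namespace_pod:kube_pod_info: (or the rules that join against it) already deduplicates per (namespace, pod), the evaluation warnings should no longer appear.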

Junqi, do you feel strongly that it needs to be backported? If not, we should close this bug.

Comment 8 Junqi Zhao 2021-05-11 02:44:26 UTC
(In reply to Simon Pasquier from comment #7)
> @Prem your explanation is correct and it's been already fixed in the CMO
> master branch (see
> https://github.com/openshift/cluster-monitoring-operator/pull/1044 and
> https://github.com/kubernetes-monitoring/kubernetes-mixin/pull/553). This
> bug is a clone of bug 1908655 created by Junqi, meaning that he wanted to
> backport it to 4.7 potentially. Given the bug's severity, I'm not sure that
> it deserves a backport though?
> 
> Junqi, do you feel strongly that it needs to be backported? If not we should
> close this bug.

From comment 0, the warning is also shown for prometheus-k8s-0, so I think we should backport to 4.6.

Comment 9 Junqi Zhao 2021-05-11 02:52:23 UTC
(In reply to Junqi Zhao from comment #8)
> From comment 0, the warning is also shown for prometheus-k8s-0, so I think
> we should backport to 4.6.

The warning is also shown for the alertmanager-main-2 pod.

Comment 10 Simon Pasquier 2021-05-17 14:35:57 UTC
The warnings are only transient (they last at most 5 minutes) and the situation returns to normal after that. I'd be in favor of not backporting the fix, given that 4.6 has been in maintenance mode since March 24, 2021.

Comment 11 Prem Saraswat 2021-05-20 09:55:10 UTC
I have limited context about when to backport and when not to, but I agree with Simon that, given its transient nature and rare reproducibility, we can skip backporting this to 4.6.

Comment 12 Prem Saraswat 2021-06-02 15:24:50 UTC
As discussed in the last sprint review, I am going ahead and closing this.

