Bug 1834232

Summary: [4.5] "many-to-many matching not allowed" for ClusterIPTablesStale rule
Product: OpenShift Container Platform
Component: Networking
Sub component: openshift-sdn
Version: 4.5
Target Release: 4.5.0
Reporter: Junqi Zhao <juzhao>
Assignee: Casey Callendrello <cdc>
QA Contact: zhaozhanqi <zzhao>
CC: alegrand, anpicker, cdc, erooth, kakkoyun, lcosic, mloibl, pkrupa, surbania
Status: CLOSED ERRATA
Severity: low
Priority: low
Hardware: Unspecified
OS: Unspecified
Type: Bug
Doc Type: No Doc Update
Last Closed: 2020-07-13 17:37:10 UTC

Description Junqi Zhao 2020-05-11 10:52:23 UTC
Description of problem:
During an upgrade from 4.4.3 to 4.5.0-0.nightly-2020-05-11-011730, the prometheus container repeatedly logs "many-to-many matching not allowed: matching labels must be unique on one side" errors.
Here is a count of occurrences per affected rule:
alert rule                          occurrences
alert: ClusterIPTablesStale         3
alert: PodDisruptionBudgetAtLimit   3
alert: PodDisruptionBudgetLimit     3

PodDisruptionBudgetAtLimit and PodDisruptionBudgetLimit are already tracked in Bug 1812006; this bug covers only ClusterIPTablesStale.
*****************************************
alert: ClusterIPTablesStale
expr: quantile(0.95,
  timestamp(kubeproxy_sync_proxy_rules_last_timestamp_seconds) - on(pod) kubeproxy_sync_proxy_rules_last_timestamp_seconds
  * on(pod) group_right() kube_pod_info{namespace="openshift-sdn",pod=~"sdn-[^-]*"})
  > 90
for: 20m
labels:
  severity: warning
annotations:
  message: The average time between iptables resyncs is too high. NOTE - There is
    some scrape delay and other offsets, 90s isn't exact but it is still too high.
*****************************************
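For reference, * binds more tightly than - in PromQL, so the expression above parses as timestamp(m) - on(pod) (m * on(pod) group_right() kube_pod_info{...}): the inner join fans out to one result per kube_pod_info series, and it is the outer one-to-one subtraction on(pod) that reports the duplicates in the log excerpt below. A minimal sketch of one way to harden the rule, assuming deduplication with max by (pod) (an illustration only, not necessarily the expression that was actually merged in the fix):

# Collapse kube_pod_info to one series per pod so the outer on(pod)
# subtraction stays one-to-one even while two kube-state-metrics
# replicas are scraped at once (e.g. mid-upgrade).
quantile(0.95,
  timestamp(kubeproxy_sync_proxy_rules_last_timestamp_seconds)
  - on(pod) kubeproxy_sync_proxy_rules_last_timestamp_seconds
  * on(pod) group_right()
    max by (pod) (kube_pod_info{namespace="openshift-sdn",pod=~"sdn-[^-]*"})
) > 90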

# oc -n openshift-monitoring logs -c prometheus prometheus-k8s-0
level=warn ts=2020-05-11T09:32:44.842Z caller=manager.go:525 component="rule manager" group=general.rules msg="Evaluating rule failed" rule="alert: ClusterIPTablesStale\nexpr: quantile(0.95, timestamp(kubeproxy_sync_proxy_rules_last_timestamp_seconds)\n  - on(pod) kubeproxy_sync_proxy_rules_last_timestamp_seconds * on(pod) group_right()\n  kube_pod_info{namespace=\"openshift-sdn\",pod=~\"sdn-[^-]*\"}) > 90\nfor: 20m\nlabels:\n  severity: warning\nannotations:\n  message: The average time between iptables resyncs is too high. NOTE - There is\n    some scrape delay and other offsets, 90s isn't exact but it is still too high.\n" err="found duplicate series for the match group {pod=\"sdn-lz98f\"} on the right hand-side of the operation: [{created_by_kind=\"DaemonSet\", created_by_name=\"sdn\", endpoint=\"https-main\", host_ip=\"10.0.0.6\", instance=\"10.131.2.10:8443\", job=\"kube-state-metrics\", namespace=\"openshift-sdn\", node=\"qe-jiazha3-up-05110345-master-2\", pod=\"sdn-lz98f\", pod_ip=\"10.0.0.6\", priority_class=\"system-node-critical\", service=\"kube-state-metrics\", uid=\"a3f33925-acf3-4302-b257-717749c4ec14\"}, {created_by_kind=\"DaemonSet\", created_by_name=\"sdn\", endpoint=\"https-main\", host_ip=\"10.0.0.6\", instance=\"10.129.2.9:8443\", job=\"kube-state-metrics\", namespace=\"openshift-sdn\", node=\"qe-jiazha3-up-05110345-master-2\", pod=\"sdn-lz98f\", pod_ip=\"10.0.0.6\", priority_class=\"system-node-critical\", service=\"kube-state-metrics\", uid=\"a3f33925-acf3-4302-b257-717749c4ec14\"}];many-to-many matching not allowed: matching labels must be unique on one side"
level=warn ts=2020-05-11T09:32:46.084Z caller=manager.go:525 component="rule manager" group=cluster-version msg="Evaluating rule failed" rule="alert: PodDisruptionBudgetAtLimit\nexpr: kube_poddisruptionbudget_status_expected_pods == on(namespace, poddisruptionbudget,\n  service) kube_poddisruptionbudget_status_desired_healthy\nfor: 15m\nlabels:\n  severity: warning\nannotations:\n  message: The pod disruption budget is preventing further disruption to pods because\n    it is at the minimum allowed level.\n" err="found duplicate series for the match group {namespace=\"openshift-ingress\", poddisruptionbudget=\"router-default\", service=\"kube-state-metrics\"} on the right hand-side of the operation: [{__name__=\"kube_poddisruptionbudget_status_desired_healthy\", endpoint=\"https-main\", instance=\"10.131.2.10:8443\", job=\"kube-state-metrics\", namespace=\"openshift-ingress\", pod=\"kube-state-metrics-d987997f7-4cwhr\", poddisruptionbudget=\"router-default\", service=\"kube-state-metrics\"}, {__name__=\"kube_poddisruptionbudget_status_desired_healthy\", endpoint=\"https-main\", instance=\"10.129.2.9:8443\", job=\"kube-state-metrics\", namespace=\"openshift-ingress\", pod=\"kube-state-metrics-d987997f7-xcl2k\", poddisruptionbudget=\"router-default\", service=\"kube-state-metrics\"}];many-to-many matching not allowed: matching labels must be unique on one side"
level=warn ts=2020-05-11T09:32:46.084Z caller=manager.go:525 component="rule manager" group=cluster-version msg="Evaluating rule failed" rule="alert: PodDisruptionBudgetLimit\nexpr: kube_poddisruptionbudget_status_expected_pods < on(namespace, poddisruptionbudget,\n  service) kube_poddisruptionbudget_status_desired_healthy\nfor: 15m\nlabels:\n  severity: critical\nannotations:\n  message: The pod disruption budget is below the minimum number allowed pods.\n" err="found duplicate series for the match group {namespace=\"openshift-ingress\", poddisruptionbudget=\"router-default\", service=\"kube-state-metrics\"} on the right hand-side of the operation: [{__name__=\"kube_poddisruptionbudget_status_desired_healthy\", endpoint=\"https-main\", instance=\"10.131.2.10:8443\", job=\"kube-state-metrics\", namespace=\"openshift-ingress\", pod=\"kube-state-metrics-d987997f7-4cwhr\", poddisruptionbudget=\"router-default\", service=\"kube-state-metrics\"}, {__name__=\"kube_poddisruptionbudget_status_desired_healthy\", endpoint=\"https-main\", instance=\"10.129.2.9:8443\", job=\"kube-state-metrics\", namespace=\"openshift-ingress\", pod=\"kube-state-metrics-d987997f7-xcl2k\", poddisruptionbudget=\"router-default\", service=\"kube-state-metrics\"}];many-to-many matching not allowed: matching labels must be unique on one side"
[... the same three warnings repeat on each subsequent evaluation cycle (ts=2020-05-11T09:33:14 through 09:33:46 UTC and onward); duplicate entries omitted ...]
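In every failure the two "duplicate" series differ only in the instance label (10.131.2.10:8443 vs. 10.129.2.9:8443): the same kube_pod_info series is exported by two kube-state-metrics endpoints at once, which is expected while the kube-state-metrics deployment is rolled during the upgrade. A quick diagnostic query (a sketch to run in the Prometheus UI, not part of any shipped rule):

# a non-empty result reproduces the precondition for the rule failure
count by (namespace, pod) (kube_pod_info{namespace="openshift-sdn"}) > 1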

Version-Release number of selected component (if applicable):
upgrade from 4.4.3 to 4.5.0-0.nightly-2020-05-11-011730

How reproducible:
always

Steps to Reproduce:
1. Upgrade a cluster from 4.4.3 to 4.5.0-0.nightly-2020-05-11-011730, then check the prometheus container logs as in the Description.

Actual results:
there are "many-to-many matching not allowed: matching labels must be unique on one side" in prometheus container

Expected results:
No rule evaluation errors in the prometheus container logs.

Additional info:

Comment 7 zhaozhanqi 2020-05-20 08:08:55 UTC
Verified this bug on 4.5.0-0.nightly-2020-05-19-041951

No such error logs are found:
oc logs prometheus-k8s-0 -n openshift-monitoring -c prometheus | grep ClusterIPTablesStale
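
Grepping only for the alert name can miss the two PodDisruptionBudget rules tracked in Bug 1812006; a broader check (mirroring the command above; suggested here rather than taken from the verification run) is to search for the generic error string:

oc logs prometheus-k8s-0 -n openshift-monitoring -c prometheus | grep "many-to-many matching not allowed"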

Comment 8 errata-xmlrpc 2020-07-13 17:37:10 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2409