Bug 1834232

Summary: [4.5] "many-to-many matching not allowed" for ClusterIPTablesStale rule
Product: OpenShift Container Platform
Component: Networking
Sub component: openshift-sdn
Version: 4.5
Target Release: 4.5.0
Reporter: Junqi Zhao <juzhao>
Assignee: Casey Callendrello <cdc>
QA Contact: zhaozhanqi <zzhao>
CC: alegrand, anpicker, cdc, erooth, kakkoyun, lcosic, mloibl, pkrupa, surbania
Status: CLOSED ERRATA
Severity: low
Priority: low
Hardware: Unspecified
OS: Unspecified
Type: Bug
Doc Type: No Doc Update
Last Closed: 2020-07-13 17:37:10 UTC

Description Junqi Zhao 2020-05-11 10:52:23 UTC
Description of problem:
During an upgrade from 4.4.3 to 4.5.0-0.nightly-2020-05-11-011730, the prometheus container repeatedly logs "many-to-many matching not allowed: matching labels must be unique on one side" errors.
Here is a count of occurrences per affected rule:
alert rule                          occurrences
alert: ClusterIPTablesStale         3
alert: PodDisruptionBudgetAtLimit   3
alert: PodDisruptionBudgetLimit     3

PodDisruptionBudgetAtLimit and PodDisruptionBudgetLimit are already tracked in Bug 1812006; this bug covers only ClusterIPTablesStale.
*****************************************
alert: ClusterIPTablesStale
expr: quantile(0.95,
  timestamp(kubeproxy_sync_proxy_rules_last_timestamp_seconds) - on(pod) kubeproxy_sync_proxy_rules_last_timestamp_seconds
  * on(pod) group_right() kube_pod_info{namespace="openshift-sdn",pod=~"sdn-[^-]*"})
  > 90
for: 20m
labels:
  severity: warning
annotations:
  message: The average time between iptables resyncs is too high. NOTE - There is
    some scrape delay and other offsets, 90s isn't exact but it is still too high.
*****************************************
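For reference, * binds more tightly than - in PromQL, so the expression above parses as timestamp(m) - on(pod) (m * on(pod) group_right() kube_pod_info{...}): the inner join fans out to one result per kube_pod_info series, and it is the outer one-to-one subtraction on(pod) that reports the duplicates in the log excerpt below. A minimal sketch of one way to harden the rule, assuming deduplication with max by (pod) (an illustration only, not necessarily the expression that was actually merged in the fix):

# Collapse kube_pod_info to one series per pod so the outer on(pod)
# subtraction stays one-to-one even while two kube-state-metrics
# replicas are scraped at once (e.g. mid-upgrade).
quantile(0.95,
  timestamp(kubeproxy_sync_proxy_rules_last_timestamp_seconds)
  - on(pod) kubeproxy_sync_proxy_rules_last_timestamp_seconds
  * on(pod) group_right()
    max by (pod) (kube_pod_info{namespace="openshift-sdn",pod=~"sdn-[^-]*"})
) > 90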

# oc -n openshift-monitoring logs -c prometheus prometheus-k8s-0
level=warn ts=2020-05-11T09:32:44.842Z caller=manager.go:525 component="rule manager" group=general.rules msg="Evaluating rule failed" rule="alert: ClusterIPTablesStale\nexpr: quantile(0.95, timestamp(kubeproxy_sync_proxy_rules_last_timestamp_seconds)\n  - on(pod) kubeproxy_sync_proxy_rules_last_timestamp_seconds * on(pod) group_right()\n  kube_pod_info{namespace=\"openshift-sdn\",pod=~\"sdn-[^-]*\"}) > 90\nfor: 20m\nlabels:\n  severity: warning\nannotations:\n  message: The average time between iptables resyncs is too high. NOTE - There is\n    some scrape delay and other offsets, 90s isn't exact but it is still too high.\n" err="found duplicate series for the match group {pod=\"sdn-lz98f\"} on the right hand-side of the operation: [{created_by_kind=\"DaemonSet\", created_by_name=\"sdn\", endpoint=\"https-main\", host_ip=\"10.0.0.6\", instance=\"10.131.2.10:8443\", job=\"kube-state-metrics\", namespace=\"openshift-sdn\", node=\"qe-jiazha3-up-05110345-master-2\", pod=\"sdn-lz98f\", pod_ip=\"10.0.0.6\", priority_class=\"system-node-critical\", service=\"kube-state-metrics\", uid=\"a3f33925-acf3-4302-b257-717749c4ec14\"}, {created_by_kind=\"DaemonSet\", created_by_name=\"sdn\", endpoint=\"https-main\", host_ip=\"10.0.0.6\", instance=\"10.129.2.9:8443\", job=\"kube-state-metrics\", namespace=\"openshift-sdn\", node=\"qe-jiazha3-up-05110345-master-2\", pod=\"sdn-lz98f\", pod_ip=\"10.0.0.6\", priority_class=\"system-node-critical\", service=\"kube-state-metrics\", uid=\"a3f33925-acf3-4302-b257-717749c4ec14\"}];many-to-many matching not allowed: matching labels must be unique on one side"
level=warn ts=2020-05-11T09:32:46.084Z caller=manager.go:525 component="rule manager" group=cluster-version msg="Evaluating rule failed" rule="alert: PodDisruptionBudgetAtLimit\nexpr: kube_poddisruptionbudget_status_expected_pods == on(namespace, poddisruptionbudget,\n  service) kube_poddisruptionbudget_status_desired_healthy\nfor: 15m\nlabels:\n  severity: warning\nannotations:\n  message: The pod disruption budget is preventing further disruption to pods because\n    it is at the minimum allowed level.\n" err="found duplicate series for the match group {namespace=\"openshift-ingress\", poddisruptionbudget=\"router-default\", service=\"kube-state-metrics\"} on the right hand-side of the operation: [{__name__=\"kube_poddisruptionbudget_status_desired_healthy\", endpoint=\"https-main\", instance=\"10.131.2.10:8443\", job=\"kube-state-metrics\", namespace=\"openshift-ingress\", pod=\"kube-state-metrics-d987997f7-4cwhr\", poddisruptionbudget=\"router-default\", service=\"kube-state-metrics\"}, {__name__=\"kube_poddisruptionbudget_status_desired_healthy\", endpoint=\"https-main\", instance=\"10.129.2.9:8443\", job=\"kube-state-metrics\", namespace=\"openshift-ingress\", pod=\"kube-state-metrics-d987997f7-xcl2k\", poddisruptionbudget=\"router-default\", service=\"kube-state-metrics\"}];many-to-many matching not allowed: matching labels must be unique on one side"
level=warn ts=2020-05-11T09:32:46.084Z caller=manager.go:525 component="rule manager" group=cluster-version msg="Evaluating rule failed" rule="alert: PodDisruptionBudgetLimit\nexpr: kube_poddisruptionbudget_status_expected_pods < on(namespace, poddisruptionbudget,\n  service) kube_poddisruptionbudget_status_desired_healthy\nfor: 15m\nlabels:\n  severity: critical\nannotations:\n  message: The pod disruption budget is below the minimum number allowed pods.\n" err="found duplicate series for the match group {namespace=\"openshift-ingress\", poddisruptionbudget=\"router-default\", service=\"kube-state-metrics\"} on the right hand-side of the operation: [{__name__=\"kube_poddisruptionbudget_status_desired_healthy\", endpoint=\"https-main\", instance=\"10.131.2.10:8443\", job=\"kube-state-metrics\", namespace=\"openshift-ingress\", pod=\"kube-state-metrics-d987997f7-4cwhr\", poddisruptionbudget=\"router-default\", service=\"kube-state-metrics\"}, {__name__=\"kube_poddisruptionbudget_status_desired_healthy\", endpoint=\"https-main\", instance=\"10.129.2.9:8443\", job=\"kube-state-metrics\", namespace=\"openshift-ingress\", pod=\"kube-state-metrics-d987997f7-xcl2k\", poddisruptionbudget=\"router-default\", service=\"kube-state-metrics\"}];many-to-many matching not allowed: matching labels must be unique on one side"
[... the same three warnings repeat on each subsequent evaluation cycle (ts=2020-05-11T09:33:14 through 09:33:46 UTC and onward); duplicate entries omitted ...]
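In every failure the two "duplicate" series differ only in the instance label (10.131.2.10:8443 vs. 10.129.2.9:8443): the same kube_pod_info series is exported by two kube-state-metrics endpoints at once, which is expected while the kube-state-metrics deployment is rolled during the upgrade. A quick diagnostic query (a sketch to run in the Prometheus UI, not part of any shipped rule):

# a non-empty result reproduces the precondition for the rule failure
count by (namespace, pod) (kube_pod_info{namespace="openshift-sdn"}) > 1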

Version-Release number of selected component (if applicable):
upgrade from 4.4.3 to 4.5.0-0.nightly-2020-05-11-011730

How reproducible:
always

Steps to Reproduce:
1. Upgrade a cluster from 4.4.3 to 4.5.0-0.nightly-2020-05-11-011730, then check the prometheus container logs as in the Description.

Actual results:
there are "many-to-many matching not allowed: matching labels must be unique on one side" in prometheus container

Expected results:
No rule evaluation errors in the prometheus container logs.

Additional info:

Comment 7 zhaozhanqi 2020-05-20 08:08:55 UTC
Verified this bug on 4.5.0-0.nightly-2020-05-19-041951

No such error logs are found:
oc logs prometheus-k8s-0 -n openshift-monitoring -c prometheus | grep ClusterIPTablesStale
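
Grepping only for the alert name can miss the two PodDisruptionBudget rules tracked in Bug 1812006; a broader check (mirroring the command above; suggested here rather than taken from the verification run) is to search for the generic error string:

oc logs prometheus-k8s-0 -n openshift-monitoring -c prometheus | grep "many-to-many matching not allowed"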

Comment 8 errata-xmlrpc 2020-07-13 17:37:10 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2409