Bug 1832272 - ClusterIPTablesStale/NodeIPTablesStale alerts fired in fresh cluster
Summary: ClusterIPTablesStale/NodeIPTablesStale alerts fired in fresh cluster
Keywords:
Status: CLOSED DUPLICATE of bug 1826339
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.5
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 4.5.0
Assignee: Ben Bennett
QA Contact: zhaozhanqi
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2020-05-06 12:44 UTC by Junqi Zhao
Modified: 2020-05-06 16:31 UTC
CC List: 1 user

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-05-06 16:31:26 UTC
Target Upstream Version:
Embargoed:


Attachments
openshift-sdn pods' log (349.67 KB, application/gzip)
2020-05-06 12:44 UTC, Junqi Zhao

Description Junqi Zhao 2020-05-06 12:44:06 UTC
Created attachment 1685701
openshift-sdn pods' log

Description of problem:
On a fresh cluster, the ClusterIPTablesStale and NodeIPTablesStale alerts are firing:
ALERTS{alertname=~"ClusterIPTablesStale|NodeIPTablesStale"}
Element 	Value
ALERTS{alertname="ClusterIPTablesStale",alertstate="firing",severity="warning"}	1
ALERTS{alertname="NodeIPTablesStale",alertstate="firing",created_by_kind="DaemonSet",created_by_name="sdn",endpoint="https-main",host_ip="10.0.0.6",instance="10.129.2.6:8443",job="kube-state-metrics",namespace="openshift-sdn",node="yinzhou-share-05060154-master-1",pod="sdn-7hbq5",pod_ip="10.0.0.6",priority_class="system-node-critical",service="kube-state-metrics",severity="warning",uid="c23d7c15-bcd0-4d3c-a94b-d9c8b5acc971"}	1
ALERTS{alertname="NodeIPTablesStale",alertstate="firing",created_by_kind="DaemonSet",created_by_name="sdn",endpoint="https-main",host_ip="10.0.0.7",instance="10.129.2.6:8443",job="kube-state-metrics",namespace="openshift-sdn",node="yinzhou-share-05060154-master-2",pod="sdn-62xcj",pod_ip="10.0.0.7",priority_class="system-node-critical",service="kube-state-metrics",severity="warning",uid="a334b395-3d09-4c9e-8817-c896d9ba25e7"}	1
ALERTS{alertname="NodeIPTablesStale",alertstate="firing",created_by_kind="DaemonSet",created_by_name="sdn",endpoint="https-main",host_ip="10.0.0.8",instance="10.129.2.6:8443",job="kube-state-metrics",namespace="openshift-sdn",node="yinzhou-share-05060154-master-0",pod="sdn-lst42",pod_ip="10.0.0.8",priority_class="system-node-critical",service="kube-state-metrics",severity="warning",uid="d02c2a73-22e3-4824-8acf-f347eeb30813"}	1
ALERTS{alertname="NodeIPTablesStale",alertstate="firing",created_by_kind="DaemonSet",created_by_name="sdn",endpoint="https-main",host_ip="10.0.1.10",instance="10.129.2.6:8443",job="kube-state-metrics",namespace="openshift-sdn",node="yinzhou-share-05060154-rhelxy-0",pod="sdn-65n48",pod_ip="10.0.1.10",priority_class="system-node-critical",service="kube-state-metrics",severity="warning",uid="dfbf961d-9bd0-40e7-9502-cfdc2997be41"}	1
ALERTS{alertname="NodeIPTablesStale",alertstate="firing",created_by_kind="DaemonSet",created_by_name="sdn",endpoint="https-main",host_ip="10.0.1.4",instance="10.129.2.6:8443",job="kube-state-metrics",namespace="openshift-sdn",node="yinzhou-share-05060154-worker-centralus-1",pod="sdn-7qh4n",pod_ip="10.0.1.4",priority_class="system-node-critical",service="kube-state-metrics",severity="warning",uid="7e4eadab-f08e-492c-a3b0-f8712f6f4fac"}	1
ALERTS{alertname="NodeIPTablesStale",alertstate="firing",created_by_kind="DaemonSet",created_by_name="sdn",endpoint="https-main",host_ip="10.0.1.5",instance="10.129.2.6:8443",job="kube-state-metrics",namespace="openshift-sdn",node="yinzhou-share-05060154-worker-centralus-3",pod="sdn-zlnxd",pod_ip="10.0.1.5",priority_class="system-node-critical",service="kube-state-metrics",severity="warning",uid="450d8477-badb-45ed-a981-bd03c175ad8a"}	1
ALERTS{alertname="NodeIPTablesStale",alertstate="firing",created_by_kind="DaemonSet",created_by_name="sdn",endpoint="https-main",host_ip="10.0.1.6",instance="10.129.2.6:8443",job="kube-state-metrics",namespace="openshift-sdn",node="yinzhou-share-05060154-worker-centralus-2",pod="sdn-zkn9s",pod_ip="10.0.1.6",priority_class="system-node-critical",service="kube-state-metrics",severity="warning",uid="a1844950-5b90-40f5-8803-9014b357a061"}	1
ALERTS{alertname="NodeIPTablesStale",alertstate="firing",created_by_kind="DaemonSet",created_by_name="sdn",endpoint="https-main",host_ip="10.0.1.9",instance="10.129.2.6:8443",job="kube-state-metrics",namespace="openshift-sdn",node="yinzhou-share-05060154-rhelxy-1",pod="sdn-pnphs",pod_ip="10.0.1.9",priority_class="system-node-critical",service="kube-state-metrics",severity="warning",uid="cdae257b-a2cd-4758-ae68-316ce66477fc"}	1

Alert details:
********************
alert: NodeIPTablesStale
expr: (timestamp(kubeproxy_sync_proxy_rules_last_timestamp_seconds)
  - on(pod) kubeproxy_sync_proxy_rules_last_timestamp_seconds) * on(pod) group_right()
  kube_pod_info{namespace="openshift-sdn",pod=~"sdn-[^-]*"} > 120
for: 20m
labels:
  severity: warning
annotations:
  message: SDN pod {{ $labels.pod }} on node {{ $labels.node }} has gone too long
    without syncing iptables rules.

alert: ClusterIPTablesStale
expr: quantile(0.95,
  timestamp(kubeproxy_sync_proxy_rules_last_timestamp_seconds) - on(pod) kubeproxy_sync_proxy_rules_last_timestamp_seconds
  * on(pod) group_right() kube_pod_info{namespace="openshift-sdn",pod=~"sdn-[^-]*"})
  > 90
for: 20m
labels:
  severity: warning
annotations:
  message: The average time between iptables resyncs is too high. NOTE - There is
    some scrape delay and other offsets, 90s isn't exact but it is still too high.
********************
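Both rules carry `for: 20m`, so an expression result has to persist across consecutive rule evaluations for 20 minutes before the alert moves from pending to firing; a single evaluation with no result resets it. A minimal Python sketch of that pending/firing transition follows, assuming evaluations are supplied as (seconds, condition) pairs. `alert_state` is a hypothetical helper that models our understanding of Prometheus' behaviour, not Prometheus code, and the 30-second evaluation interval in the usage example is an assumption.

```python
from typing import Iterable, Optional, Tuple

FOR_SECONDS = 20 * 60  # the rules' "for: 20m"


def alert_state(evaluations: Iterable[Tuple[float, bool]],
                for_seconds: float = FOR_SECONDS) -> str:
    """Replay rule evaluations and return 'inactive', 'pending' or 'firing'.

    Each item is (evaluation_time_seconds, expression_returned_a_result).
    Hypothetical helper; a simplification of the Prometheus rule manager.
    """
    active_since: Optional[float] = None
    state = "inactive"
    for ts, condition in evaluations:
        if not condition:
            # An evaluation without a result resets the alert.
            active_since = None
            state = "inactive"
            continue
        if active_since is None:
            active_since = ts  # first matching evaluation: alert becomes pending
        # Fires once the condition has held for at least the "for" duration.
        state = "firing" if ts - active_since >= for_seconds else "pending"
    return state


if __name__ == "__main__":
    # Assumed 30 s evaluation interval, condition true for 25 minutes.
    print(alert_state([(t, True) for t in range(0, 1500, 30)]))
```

Because the per-pod staleness values in this cluster stay far above both thresholds, any evaluation cadence would carry both alerts past the 20-minute hold and into the firing state.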

Querying the expressions directly in Prometheus:
NodeIPTablesStale expr:
(timestamp(kubeproxy_sync_proxy_rules_last_timestamp_seconds) - on(pod) kubeproxy_sync_proxy_rules_last_timestamp_seconds) * on(pod) group_right() kube_pod_info{namespace="openshift-sdn",pod=~"sdn-[^-]*"} > 120
result:
Element 	Value
{created_by_kind="DaemonSet",created_by_name="sdn",endpoint="https-main",host_ip="10.0.0.6",instance="10.129.2.6:8443",job="kube-state-metrics",namespace="openshift-sdn",node="yinzhou-share-05060154-master-1",pod="sdn-7hbq5",pod_ip="10.0.0.6",priority_class="system-node-critical",service="kube-state-metrics",uid="c23d7c15-bcd0-4d3c-a94b-d9c8b5acc971"}	2278.20618224144
{created_by_kind="DaemonSet",created_by_name="sdn",endpoint="https-main",host_ip="10.0.0.7",instance="10.129.2.6:8443",job="kube-state-metrics",namespace="openshift-sdn",node="yinzhou-share-05060154-master-2",pod="sdn-62xcj",pod_ip="10.0.0.7",priority_class="system-node-critical",service="kube-state-metrics",uid="a334b395-3d09-4c9e-8817-c896d9ba25e7"}	2282.0367472171783
{created_by_kind="DaemonSet",created_by_name="sdn",endpoint="https-main",host_ip="10.0.0.8",instance="10.129.2.6:8443",job="kube-state-metrics",namespace="openshift-sdn",node="yinzhou-share-05060154-master-0",pod="sdn-lst42",pod_ip="10.0.0.8",priority_class="system-node-critical",service="kube-state-metrics",uid="d02c2a73-22e3-4824-8acf-f347eeb30813"}	2291.8284714221954
{created_by_kind="DaemonSet",created_by_name="sdn",endpoint="https-main",host_ip="10.0.1.10",instance="10.129.2.6:8443",job="kube-state-metrics",namespace="openshift-sdn",node="yinzhou-share-05060154-rhelxy-0",pod="sdn-65n48",pod_ip="10.0.1.10",priority_class="system-node-critical",service="kube-state-metrics",uid="dfbf961d-9bd0-40e7-9502-cfdc2997be41"}	2271.0348105430603
{created_by_kind="DaemonSet",created_by_name="sdn",endpoint="https-main",host_ip="10.0.1.4",instance="10.129.2.6:8443",job="kube-state-metrics",namespace="openshift-sdn",node="yinzhou-share-05060154-worker-centralus-1",pod="sdn-7qh4n",pod_ip="10.0.1.4",priority_class="system-node-critical",service="kube-state-metrics",uid="7e4eadab-f08e-492c-a3b0-f8712f6f4fac"}	2287.8972160816193
{created_by_kind="DaemonSet",created_by_name="sdn",endpoint="https-main",host_ip="10.0.1.5",instance="10.129.2.6:8443",job="kube-state-metrics",namespace="openshift-sdn",node="yinzhou-share-05060154-worker-centralus-3",pod="sdn-zlnxd",pod_ip="10.0.1.5",priority_class="system-node-critical",service="kube-state-metrics",uid="450d8477-badb-45ed-a981-bd03c175ad8a"}	2274.371784210205
{created_by_kind="DaemonSet",created_by_name="sdn",endpoint="https-main",host_ip="10.0.1.6",instance="10.129.2.6:8443",job="kube-state-metrics",namespace="openshift-sdn",node="yinzhou-share-05060154-worker-centralus-2",pod="sdn-zkn9s",pod_ip="10.0.1.6",priority_class="system-node-critical",service="kube-state-metrics",uid="a1844950-5b90-40f5-8803-9014b357a061"}	2283.6773200035095
{created_by_kind="DaemonSet",created_by_name="sdn",endpoint="https-main",host_ip="10.0.1.9",instance="10.129.2.6:8443",job="kube-state-metrics",namespace="openshift-sdn",node="yinzhou-share-05060154-rhelxy-1",pod="sdn-pnphs",pod_ip="10.0.1.9",priority_class="system-node-critical",service="kube-state-metrics",uid="cdae257b-a2cd-4758-ae68-316ce66477fc"}	2282.943865060
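`timestamp(x) - on(pod) x` is the age of each SDN pod's last recorded iptables sync, measured at the time the sample was scraped. The values above are about 38 minutes for every pod, far over the 120-second threshold. The Python sketch below reproduces the `on(pod) group_right()` join with `kube_pod_info` and the `> 120` filter on those reported values. The dictionaries, the `PREFIX` constant and the `stale_pods` helper are illustrative stand-ins for the Prometheus series, not real APIs.

```python
from typing import Dict, List, Tuple

NODE_THRESHOLD_SECONDS = 120  # "> 120" in NodeIPTablesStale
PREFIX = "yinzhou-share-05060154-"  # common node-name prefix in this cluster

# Left-hand side, timestamp(x) - on(pod) x: seconds since each pod's last
# iptables sync, copied from the query result above.
staleness_by_pod: Dict[str, float] = {
    "sdn-7hbq5": 2278.20618224144,
    "sdn-62xcj": 2282.0367472171783,
    "sdn-lst42": 2291.8284714221954,
    "sdn-65n48": 2271.0348105430603,
    "sdn-7qh4n": 2287.8972160816193,
    "sdn-zlnxd": 2274.371784210205,
    "sdn-zkn9s": 2283.6773200035095,
    "sdn-pnphs": 2282.943865060,
}

# Right-hand side, kube_pod_info (value always 1); only pod and node labels kept.
pod_info: List[Dict[str, str]] = [
    {"pod": "sdn-7hbq5", "node": PREFIX + "master-1"},
    {"pod": "sdn-62xcj", "node": PREFIX + "master-2"},
    {"pod": "sdn-lst42", "node": PREFIX + "master-0"},
    {"pod": "sdn-65n48", "node": PREFIX + "rhelxy-0"},
    {"pod": "sdn-7qh4n", "node": PREFIX + "worker-centralus-1"},
    {"pod": "sdn-zlnxd", "node": PREFIX + "worker-centralus-3"},
    {"pod": "sdn-zkn9s", "node": PREFIX + "worker-centralus-2"},
    {"pod": "sdn-pnphs", "node": PREFIX + "rhelxy-1"},
]


def stale_pods(staleness: Dict[str, float],
               info: List[Dict[str, str]],
               threshold: float = NODE_THRESHOLD_SECONDS) -> List[Tuple[str, str, float]]:
    """Mimic `(lhs) * on(pod) group_right() kube_pod_info > threshold`.

    Each kube_pod_info series is paired with the staleness sample that has the
    same pod label; unmatched series are dropped, the output keeps the
    right-hand labels (pod, node), and the comparison removes samples that
    are not above the threshold.
    """
    result = []
    for labels in info:
        pod = labels["pod"]
        if pod not in staleness:
            continue  # no matching pod label: no output series
        value = staleness[pod] * 1  # kube_pod_info's sample value is 1
        if value > threshold:
            result.append((pod, labels["node"], value))
    return result


if __name__ == "__main__":
    for pod, node, age in stale_pods(staleness_by_pod, pod_info):
        print(f"{pod} on {node}: {age / 60:.1f} minutes since last sync")
```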

ClusterIPTablesStale expr:
quantile(0.95, timestamp(kubeproxy_sync_proxy_rules_last_timestamp_seconds) - on(pod) kubeproxy_sync_proxy_rules_last_timestamp_seconds * on(pod) group_right() kube_pod_info{namespace="openshift-sdn",pod=~"sdn-[^-]*"}) > 90
result:
Element 	Value
{}	1697.8125918507576
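ClusterIPTablesStale aggregates the per-pod staleness with `quantile(0.95, ...)` and compares the single result with 90 seconds. As we understand it, Prometheus computes a φ-quantile by sorting the samples and linearly interpolating between the two nearest ranks at position φ·(n−1). The Python sketch below follows that method; `prom_quantile` and `cluster_stale` are hypothetical names, and the edge-case handling (NaN for no samples, ±Inf for φ outside [0, 1]) reflects our reading of the Prometheus documentation.

```python
import math
from typing import Sequence

CLUSTER_THRESHOLD_SECONDS = 90  # "> 90" in ClusterIPTablesStale


def prom_quantile(phi: float, values: Sequence[float]) -> float:
    """phi-quantile using linear interpolation between the nearest ranks."""
    if not values:
        return math.nan
    if phi < 0:
        return -math.inf
    if phi > 1:
        return math.inf
    ordered = sorted(values)
    rank = phi * (len(ordered) - 1)
    lower = max(0, math.floor(rank))
    upper = min(len(ordered) - 1, lower + 1)
    weight = rank - math.floor(rank)  # fractional part of the rank
    return ordered[lower] * (1 - weight) + ordered[upper] * weight


def cluster_stale(values: Sequence[float],
                  threshold: float = CLUSTER_THRESHOLD_SECONDS) -> bool:
    """True when the 95th percentile of per-pod staleness exceeds the threshold."""
    q = prom_quantile(0.95, values)
    return not math.isnan(q) and q > threshold


if __name__ == "__main__":
    print(cluster_stale([1697.8125918507576]))  # the reported result above
```

With eight SDN pods, as in this cluster, the rank is 0.95 × 7 = 6.65, so the result lies between the two stalest pods' values. The cluster-wide alert therefore tracks the slowest syncs rather than a typical one.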

Version-Release number of selected component (if applicable):
UPI on Azure 4.5.0-0.nightly-2020-05-05-205255 cluster

How reproducible:
always

Steps to Reproduce:
1. Check the firing alerts in Prometheus, e.g. with the ALERTS query from the description.

Actual results:
The ClusterIPTablesStale and NodeIPTablesStale alerts fire on a freshly installed cluster.

Expected results:
Neither alert fires.

Additional info:

Comment 1 Casey Callendrello 2020-05-06 16:31:26 UTC

*** This bug has been marked as a duplicate of bug 1826339 ***

