
Bug 1832272

Summary: ClusterIPTablesStale/NodeIPTablesStale alerts fired in fresh cluster
Product: OpenShift Container Platform
Reporter: Junqi Zhao <juzhao>
Component: Networking
Assignee: Ben Bennett <bbennett>
Networking sub component: openshift-sdn
QA Contact: zhaozhanqi <zzhao>
Status: CLOSED DUPLICATE
Docs Contact:
Severity: medium
Priority: medium
CC: cdc
Version: 4.5
Target Milestone: ---
Target Release: 4.5.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2020-05-06 16:31:26 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
Description: openshift-sdn pods' log (Flags: none)

Description Junqi Zhao 2020-05-06 12:44:06 UTC
Created attachment 1685701
openshift-sdn pods' log

Description of problem:
On a fresh cluster, the ClusterIPTablesStale and NodeIPTablesStale alerts are firing. Query and result:
ALERTS{alertname=~"ClusterIPTablesStale|NodeIPTablesStale"}
Element 	Value
ALERTS{alertname="ClusterIPTablesStale",alertstate="firing",severity="warning"}	1
ALERTS{alertname="NodeIPTablesStale",alertstate="firing",created_by_kind="DaemonSet",created_by_name="sdn",endpoint="https-main",host_ip="10.0.0.6",instance="10.129.2.6:8443",job="kube-state-metrics",namespace="openshift-sdn",node="yinzhou-share-05060154-master-1",pod="sdn-7hbq5",pod_ip="10.0.0.6",priority_class="system-node-critical",service="kube-state-metrics",severity="warning",uid="c23d7c15-bcd0-4d3c-a94b-d9c8b5acc971"}	1
ALERTS{alertname="NodeIPTablesStale",alertstate="firing",created_by_kind="DaemonSet",created_by_name="sdn",endpoint="https-main",host_ip="10.0.0.7",instance="10.129.2.6:8443",job="kube-state-metrics",namespace="openshift-sdn",node="yinzhou-share-05060154-master-2",pod="sdn-62xcj",pod_ip="10.0.0.7",priority_class="system-node-critical",service="kube-state-metrics",severity="warning",uid="a334b395-3d09-4c9e-8817-c896d9ba25e7"}	1
ALERTS{alertname="NodeIPTablesStale",alertstate="firing",created_by_kind="DaemonSet",created_by_name="sdn",endpoint="https-main",host_ip="10.0.0.8",instance="10.129.2.6:8443",job="kube-state-metrics",namespace="openshift-sdn",node="yinzhou-share-05060154-master-0",pod="sdn-lst42",pod_ip="10.0.0.8",priority_class="system-node-critical",service="kube-state-metrics",severity="warning",uid="d02c2a73-22e3-4824-8acf-f347eeb30813"}	1
ALERTS{alertname="NodeIPTablesStale",alertstate="firing",created_by_kind="DaemonSet",created_by_name="sdn",endpoint="https-main",host_ip="10.0.1.10",instance="10.129.2.6:8443",job="kube-state-metrics",namespace="openshift-sdn",node="yinzhou-share-05060154-rhelxy-0",pod="sdn-65n48",pod_ip="10.0.1.10",priority_class="system-node-critical",service="kube-state-metrics",severity="warning",uid="dfbf961d-9bd0-40e7-9502-cfdc2997be41"}	1
ALERTS{alertname="NodeIPTablesStale",alertstate="firing",created_by_kind="DaemonSet",created_by_name="sdn",endpoint="https-main",host_ip="10.0.1.4",instance="10.129.2.6:8443",job="kube-state-metrics",namespace="openshift-sdn",node="yinzhou-share-05060154-worker-centralus-1",pod="sdn-7qh4n",pod_ip="10.0.1.4",priority_class="system-node-critical",service="kube-state-metrics",severity="warning",uid="7e4eadab-f08e-492c-a3b0-f8712f6f4fac"}	1
ALERTS{alertname="NodeIPTablesStale",alertstate="firing",created_by_kind="DaemonSet",created_by_name="sdn",endpoint="https-main",host_ip="10.0.1.5",instance="10.129.2.6:8443",job="kube-state-metrics",namespace="openshift-sdn",node="yinzhou-share-05060154-worker-centralus-3",pod="sdn-zlnxd",pod_ip="10.0.1.5",priority_class="system-node-critical",service="kube-state-metrics",severity="warning",uid="450d8477-badb-45ed-a981-bd03c175ad8a"}	1
ALERTS{alertname="NodeIPTablesStale",alertstate="firing",created_by_kind="DaemonSet",created_by_name="sdn",endpoint="https-main",host_ip="10.0.1.6",instance="10.129.2.6:8443",job="kube-state-metrics",namespace="openshift-sdn",node="yinzhou-share-05060154-worker-centralus-2",pod="sdn-zkn9s",pod_ip="10.0.1.6",priority_class="system-node-critical",service="kube-state-metrics",severity="warning",uid="a1844950-5b90-40f5-8803-9014b357a061"}	1
ALERTS{alertname="NodeIPTablesStale",alertstate="firing",created_by_kind="DaemonSet",created_by_name="sdn",endpoint="https-main",host_ip="10.0.1.9",instance="10.129.2.6:8443",job="kube-state-metrics",namespace="openshift-sdn",node="yinzhou-share-05060154-rhelxy-1",pod="sdn-pnphs",pod_ip="10.0.1.9",priority_class="system-node-critical",service="kube-state-metrics",severity="warning",uid="cdae257b-a2cd-4758-ae68-316ce66477fc"}	1

Alert rule definitions:
********************
alert: NodeIPTablesStale
expr: (timestamp(kubeproxy_sync_proxy_rules_last_timestamp_seconds)
  - on(pod) kubeproxy_sync_proxy_rules_last_timestamp_seconds) * on(pod) group_right()
  kube_pod_info{namespace="openshift-sdn",pod=~"sdn-[^-]*"} > 120
for: 20m
labels:
  severity: warning
annotations:
  message: SDN pod {{ $labels.pod }} on node {{ $labels.node }} has gone too long
    without syncing iptables rules.

alert: ClusterIPTablesStale
expr: quantile(0.95,
  timestamp(kubeproxy_sync_proxy_rules_last_timestamp_seconds) - on(pod) kubeproxy_sync_proxy_rules_last_timestamp_seconds
  * on(pod) group_right() kube_pod_info{namespace="openshift-sdn",pod=~"sdn-[^-]*"})
  > 90
for: 20m
labels:
  severity: warning
annotations:
  message: The average time between iptables resyncs is too high. NOTE - There is
    some scrape delay and other offsets, 90s isn't exact but it is still too high.
********************
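
For readers less familiar with PromQL, the following is a rough Python sketch of what the two rules above appear to compute: per-pod staleness (sample timestamp minus last sync timestamp) compared against 120s, and the 0.95 quantile of staleness across pods compared against 90s. All function and variable names are hypothetical, the sample data is invented, the `for: 20m` hold period is not modeled, and the quantile routine is written to mirror Prometheus's linear interpolation as an assumption rather than taken from this bug.

```python
import math
import re

NODE_THRESHOLD_S = 120.0     # threshold in the NodeIPTablesStale expr
CLUSTER_THRESHOLD_S = 90.0   # threshold in the ClusterIPTablesStale expr
SDN_POD = re.compile(r"sdn-[^-]*")  # PromQL regex matchers are fully anchored


def staleness_by_pod(samples):
    """Map pod -> seconds since last iptables sync, for matching sdn pods only.

    `samples` maps pod name -> (sample_timestamp, last_sync_timestamp).
    """
    return {
        pod: sample_ts - last_sync_ts
        for pod, (sample_ts, last_sync_ts) in samples.items()
        if SDN_POD.fullmatch(pod)
    }


def prom_quantile(q, values):
    """Quantile with linear interpolation between the two nearest ranks."""
    if not values:
        return math.nan
    ordered = sorted(values)
    n = len(ordered)
    rank = q * (n - 1)
    lower = max(0, math.floor(rank))
    upper = min(n - 1, lower + 1)
    weight = rank - math.floor(rank)
    return ordered[lower] * (1 - weight) + ordered[upper] * weight


def node_stale(samples):
    """Pods whose staleness exceeds the per-node threshold."""
    ages = staleness_by_pod(samples)
    return {pod: age for pod, age in ages.items() if age > NODE_THRESHOLD_S}


def cluster_stale(samples):
    """True when the 0.95 quantile of staleness exceeds the cluster threshold."""
    ages = list(staleness_by_pod(samples).values())
    return prom_quantile(0.95, ages) > CLUSTER_THRESHOLD_S


if __name__ == "__main__":
    # Invented timestamps, not taken from the cluster in this report.
    demo = {"sdn-aaaaa": (1000.0, 990.0), "sdn-bbbbb": (1000.0, 700.0)}
    print(node_stale(demo), cluster_stale(demo))
```

With staleness values above 2000s on every sdn pod, as in the query results in this report, both conditions in this sketch would hold.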

Results of querying each alert expression in Prometheus:
NodeIPTablesStale expr:
(timestamp(kubeproxy_sync_proxy_rules_last_timestamp_seconds) - on(pod) kubeproxy_sync_proxy_rules_last_timestamp_seconds) * on(pod) group_right() kube_pod_info{namespace="openshift-sdn",pod=~"sdn-[^-]*"} > 120
result:
Element 	Value
{created_by_kind="DaemonSet",created_by_name="sdn",endpoint="https-main",host_ip="10.0.0.6",instance="10.129.2.6:8443",job="kube-state-metrics",namespace="openshift-sdn",node="yinzhou-share-05060154-master-1",pod="sdn-7hbq5",pod_ip="10.0.0.6",priority_class="system-node-critical",service="kube-state-metrics",uid="c23d7c15-bcd0-4d3c-a94b-d9c8b5acc971"}	2278.20618224144
{created_by_kind="DaemonSet",created_by_name="sdn",endpoint="https-main",host_ip="10.0.0.7",instance="10.129.2.6:8443",job="kube-state-metrics",namespace="openshift-sdn",node="yinzhou-share-05060154-master-2",pod="sdn-62xcj",pod_ip="10.0.0.7",priority_class="system-node-critical",service="kube-state-metrics",uid="a334b395-3d09-4c9e-8817-c896d9ba25e7"}	2282.0367472171783
{created_by_kind="DaemonSet",created_by_name="sdn",endpoint="https-main",host_ip="10.0.0.8",instance="10.129.2.6:8443",job="kube-state-metrics",namespace="openshift-sdn",node="yinzhou-share-05060154-master-0",pod="sdn-lst42",pod_ip="10.0.0.8",priority_class="system-node-critical",service="kube-state-metrics",uid="d02c2a73-22e3-4824-8acf-f347eeb30813"}	2291.8284714221954
{created_by_kind="DaemonSet",created_by_name="sdn",endpoint="https-main",host_ip="10.0.1.10",instance="10.129.2.6:8443",job="kube-state-metrics",namespace="openshift-sdn",node="yinzhou-share-05060154-rhelxy-0",pod="sdn-65n48",pod_ip="10.0.1.10",priority_class="system-node-critical",service="kube-state-metrics",uid="dfbf961d-9bd0-40e7-9502-cfdc2997be41"}	2271.0348105430603
{created_by_kind="DaemonSet",created_by_name="sdn",endpoint="https-main",host_ip="10.0.1.4",instance="10.129.2.6:8443",job="kube-state-metrics",namespace="openshift-sdn",node="yinzhou-share-05060154-worker-centralus-1",pod="sdn-7qh4n",pod_ip="10.0.1.4",priority_class="system-node-critical",service="kube-state-metrics",uid="7e4eadab-f08e-492c-a3b0-f8712f6f4fac"}	2287.8972160816193
{created_by_kind="DaemonSet",created_by_name="sdn",endpoint="https-main",host_ip="10.0.1.5",instance="10.129.2.6:8443",job="kube-state-metrics",namespace="openshift-sdn",node="yinzhou-share-05060154-worker-centralus-3",pod="sdn-zlnxd",pod_ip="10.0.1.5",priority_class="system-node-critical",service="kube-state-metrics",uid="450d8477-badb-45ed-a981-bd03c175ad8a"}	2274.371784210205
{created_by_kind="DaemonSet",created_by_name="sdn",endpoint="https-main",host_ip="10.0.1.6",instance="10.129.2.6:8443",job="kube-state-metrics",namespace="openshift-sdn",node="yinzhou-share-05060154-worker-centralus-2",pod="sdn-zkn9s",pod_ip="10.0.1.6",priority_class="system-node-critical",service="kube-state-metrics",uid="a1844950-5b90-40f5-8803-9014b357a061"}	2283.6773200035095
{created_by_kind="DaemonSet",created_by_name="sdn",endpoint="https-main",host_ip="10.0.1.9",instance="10.129.2.6:8443",job="kube-state-metrics",namespace="openshift-sdn",node="yinzhou-share-05060154-rhelxy-1",pod="sdn-pnphs",pod_ip="10.0.1.9",priority_class="system-node-critical",service="kube-state-metrics",uid="cdae257b-a2cd-4758-ae68-316ce66477fc"}	2282.943865060

ClusterIPTablesStale expr:
quantile(0.95, timestamp(kubeproxy_sync_proxy_rules_last_timestamp_seconds) - on(pod) kubeproxy_sync_proxy_rules_last_timestamp_seconds * on(pod) group_right() kube_pod_info{namespace="openshift-sdn",pod=~"sdn-[^-]*"}) > 90
result:
Element 	Value
{}	1697.8125918507576

Version-Release number of selected component (if applicable):
UPI on Azure 4.5.0-0.nightly-2020-05-05-205255 cluster

How reproducible:
always

Steps to Reproduce:
1. Check the firing alerts in Prometheus.
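
The check in step 1 could also be scripted against the Prometheus HTTP API instead of the web UI. The sketch below only builds the URL for an instant query on the `/api/v1/query` endpoint using the ALERTS selector from the description; the base URL is a placeholder, the helper name is hypothetical, and the authentication that an OpenShift Prometheus route would likely require is not shown.

```python
from urllib.parse import urlencode

# Selector copied from the description of this bug.
ALERT_QUERY = 'ALERTS{alertname=~"ClusterIPTablesStale|NodeIPTablesStale"}'


def instant_query_url(base_url, promql):
    """Build a Prometheus instant-query URL for the given PromQL expression."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


if __name__ == "__main__":
    # Placeholder host; substitute the cluster's actual Prometheus route.
    print(instant_query_url("https://prometheus.example.invalid", ALERT_QUERY))
```

An empty result vector from this query would correspond to the expected result of no such alerts.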

Actual results:
ClusterIPTablesStale/NodeIPTablesStale alerts triggered

Expected results:
no such alerts

Additional info:

Comment 1 Casey Callendrello 2020-05-06 16:31:26 UTC

*** This bug has been marked as a duplicate of bug 1826339 ***