1778314 – False Alarm - SDN pod has gone too long without syncing iptables rules


Summary: False Alarm - SDN pod has gone too long without syncing iptables rules

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Networking
Sub Component:
Version:	4.2.z
Hardware:	Unspecified
OS:	Unspecified
Priority:	high
Severity:	high
Target Milestone:	---
Target Release:	4.2.z
Assignee:	Jacob Tanenbaum
QA Contact:	zhaozhanqi
Docs Contact:
URL:
Whiteboard:
Depends On:	1797033 1797041
Blocks:

Reported:	2019-11-29 21:15 UTC by Hugo Cisneiros (Eitch)
Modified:	2023-03-24 16:15 UTC
CC List:	3 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Clones:	1797033
Environment:
Last Closed:	2020-02-24 16:52:45 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Github	openshift cluster-network-operator pull 416	0	None	closed	Bug 1778314: False Alarm - SDN pod has gone too long without syncing iptables rule	2021-01-15 02:49:56 UTC
Red Hat Product Errata	RHBA-2020:0460	0	None	None	None	2020-02-24 16:52:59 UTC

Description Hugo Cisneiros (Eitch) 2019-11-29 21:15:13 UTC

Description of problem:

Cluster dashboard has these alarms:

* SDN pod sdn-9d4hc on node etcd-2.example.com has gone too long without syncing iptables rules. NOTE - There is some scrape delay and other offsets, 120s isn't exact but it is still too high.
* SDN pod sdn-sgrgk on node etcd-0.example.com has gone too long without syncing iptables rules. NOTE - There is some scrape delay and other offsets, 120s isn't exact but it is still too high.

While looking at the must-gather logs, I didn't see any problems in these pods' sync processes:

2019-11-28T18:47:27.446545877Z I1128 18:47:27.446460 59750 proxy.go:331] hybrid proxy: syncProxyRules start
2019-11-28T18:47:27.597400981Z I1128 18:47:27.597347 59750 proxy.go:334] hybrid proxy: mainProxy.syncProxyRules complete
2019-11-28T18:47:27.653302796Z I1128 18:47:27.653260 59750 proxier.go:367] userspace proxy: processing 0 service events
2019-11-28T18:47:27.653409624Z I1128 18:47:27.653389 59750 proxier.go:346] userspace syncProxyRules took 55.912068ms
2019-11-28T18:47:27.653445515Z I1128 18:47:27.653436 59750 proxy.go:337] hybrid proxy: unidlingProxy.syncProxyRules complete

These syncs run every 30 seconds, and the longest one took ~240 ms to complete, far below the 120 s in the alert.
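To double-check this from the logs, a small script can measure the gap between consecutive syncs and how long each sync takes. This is a minimal sketch under stated assumptions, not part of any OpenShift tooling: the helper names (`parse_ts`, `analyze`) are hypothetical, it assumes every line starts with the RFC 3339 timestamp shown above, and it treats `unidlingProxy.syncProxyRules complete` as the end of a sync cycle. It only examines log-side evidence; the alert itself is evaluated in Prometheus, so this does not reproduce how the alert fires.

```python
import re
from datetime import datetime, timezone

# Assumption: each log line starts with an RFC 3339 UTC timestamp with
# fractional seconds, e.g. "2019-11-28T18:47:27.446545877Z ...".
TS_RE = re.compile(r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})\.(\d+)Z\s")

# Threshold quoted in the alert text (the real alert is computed in
# Prometheus, not from these logs).
STALE_THRESHOLD_S = 120.0

START_MARK = "hybrid proxy: syncProxyRules start"
# Assumption: the unidling proxy step is the last one in a hybrid sync cycle.
END_MARK = "hybrid proxy: unidlingProxy.syncProxyRules complete"


def parse_ts(line):
    """Return the line's leading timestamp as Unix seconds, or None."""
    m = TS_RE.match(line)
    if m is None:
        return None
    base = datetime.strptime(m.group(1), "%Y-%m-%dT%H:%M:%S")
    base = base.replace(tzinfo=timezone.utc)
    return base.timestamp() + float("0." + m.group(2))


def analyze(lines, now=None):
    """Summarize sync frequency and duration from SDN pod log lines."""
    starts, durations_ms = [], []
    pending = None  # timestamp of the most recent unmatched sync start
    for line in lines:
        ts = parse_ts(line)
        if ts is None:
            continue
        if START_MARK in line:
            starts.append(ts)
            pending = ts
        elif END_MARK in line and pending is not None:
            durations_ms.append((ts - pending) * 1000.0)
            pending = None
    # Gaps between consecutive sync starts.
    gaps = [b - a for a, b in zip(starts, starts[1:])]
    if now is not None and starts:
        gaps.append(now - starts[-1])  # time since the last observed sync
    max_gap = max(gaps, default=0.0)
    return {
        "syncs": len(starts),
        "max_gap_s": max_gap,
        "max_duration_ms": max(durations_ms, default=0.0),
        "stale": max_gap > STALE_THRESHOLD_S,
    }
```

For example, `with open(path) as f: print(analyze(f))` on one of the per-pod `current.log` files from the must-gather. For the excerpt above, the single cycle from `syncProxyRules start` to `unidlingProxy.syncProxyRules complete` takes about 207 ms.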

It is not clear why these alerts are firing.

Version-Release number of selected component (if applicable):

4.2.0

How reproducible:

* The alarms appear on the web console dashboard.
* Log files: namespaces/openshift-sdn/sdn*/sdn/sdn/logs/current.log

Actual results:

Alarming in the dashboard.

Expected results:

No alarms.

Additional info:

Comment 4 Daniel Del Ciancio 2020-01-31 17:12:24 UTC

Need a status update: is this targeted for 4.2.z or 4.3, or is a fix already available?

Comment 10 errata-xmlrpc 2020-02-24 16:52:45 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0460
