Description of problem:
SRE was notified by a Zabbix alert of a performance issue on a starter cluster compute node: starter-us-east-1-node-compute-f1526. Watching htop, an iptables process was consuming 100% of a core for more than 20 minutes (same PID).
Version-Release number of selected component (if applicable):
kernel: Linux ip-172-31-53-238.ec2.internal 3.10.0-693.21.1.el7.x86_64 #1 SMP Fri Feb 23 18:54:16 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
This seems to be a recurring problem in the starter clusters.
Steps to Reproduce:
- After about 20 minutes, the iptables process appeared to exit and htop returned to normal.
- For unknown reasons, ssh to this box took about 10 times as long (~30 seconds) as to other nodes in the cluster.
- No excessive CPU/disk/memory pressure was otherwise noted (iptables was taking a full core, but plenty of other cores were free; load average: 33.85 31.63 30.86).
If this happens again, can somebody run 'debuginfo-install iptables', then 'gdb attach <pid>' and 'backtrace', so we can see what's actually going on?
Can we add a cron job that checks whether iptables has been running for more than a minute, then attaches to the PID, generates a backtrace, and emails it out?
I'd be glad to install it if someone can write it to dump to a filesystem location. I'll monitor for dumps and send an email if one is generated.
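A minimal watchdog along those lines might look like the sketch below. The one-minute threshold is from the comment above; the dump directory, the use of pgrep, and gdb batch mode are assumptions, and the emailing step is left to whoever monitors the dump directory.

```shell
#!/bin/bash
# Sketch of a watchdog: dump a gdb backtrace of any iptables process that
# has been running longer than THRESHOLD seconds. Paths are assumptions.

THRESHOLD=${THRESHOLD:-60}                      # seconds before we capture
DUMP_DIR=${DUMP_DIR:-/var/tmp/iptables-dumps}   # where backtraces land

# Print the elapsed runtime (in seconds) of a pid, or nothing if it is gone.
elapsed_secs() {
    ps -o etimes= -p "$1" 2>/dev/null | tr -d ' '
}

# Attach in batch mode, dump all thread backtraces, and detach immediately.
capture_backtrace() {
    local pid=$1 out="$DUMP_DIR/iptables-$pid-$(date +%s).bt"
    mkdir -p "$DUMP_DIR"
    gdb -batch -p "$pid" \
        -ex 'set pagination off' \
        -ex 'thread apply all backtrace' >"$out" 2>&1
}

main() {
    local pid secs
    for pid in $(pgrep -x iptables); do
        secs=$(elapsed_secs "$pid")
        if [ -n "$secs" ] && [ "$secs" -ge "$THRESHOLD" ]; then
            capture_backtrace "$pid"
        fi
    done
}

main
```

Run from cron (e.g. every minute); gdb's batch mode exits after the commands finish, so the watched process is only paused briefly.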
I believe you can do:
- debuginfo-install iptables
- gdb attach `pidof iptables`
then, at the gdb prompt:
- set logging file /tmp/gdb.txt
- set logging on
- set pagination off
- backtrace
and that should dump the required info to a filesystem location (the logging settings must be issued before the backtrace so its output lands in /tmp/gdb.txt).
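For unattended capture, the same gdb steps can be written to a command file and run in batch mode. A sketch, where the command-file path is an assumption and /tmp/gdb.txt is the log path from above:

```shell
# Write the gdb steps from above to a command file. Logging is enabled
# before the backtrace so its output goes to /tmp/gdb.txt.
cat > /tmp/iptables-bt.gdb <<'EOF'
set pagination off
set logging file /tmp/gdb.txt
set logging on
backtrace
EOF
```

It can then be run non-interactively with `gdb -batch -x /tmp/iptables-bt.gdb -p "$(pidof iptables)"`; -batch makes gdb detach and exit once the commands finish.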
Ping on this issue?
I haven't seen this in a while, so I'm fine with closing this and following the recommended capture steps if it reoccurs.
Gotcha; I've closed this - feel free to reopen / open a new one if it reoccurs.