Bug 1569128 - [starter-us-east-1] iptables running > 20 minutes with 100% CPU
Summary: [starter-us-east-1] iptables running > 20 minutes with 100% CPU
Keywords:
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 3.9.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 3.9.z
Assignee: Dan Williams
QA Contact: Meng Bo
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2018-04-18 16:17 UTC by Justin Pierce
Modified: 2019-01-03 22:33 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-01-03 22:33:18 UTC
Target Upstream Version:
Embargoed:


Description Justin Pierce 2018-04-18 16:17:08 UTC
Description of problem:
SRE was notified by a Zabbix alert of a performance issue on a starter cluster compute node: starter-us-east-1-node-compute-f1526. Watching htop, a single iptables process (same PID) was consuming 100% of a core for more than 20 minutes.

Version-Release number of selected component (if applicable):
v3.9.14
kernel: Linux ip-172-31-53-238.ec2.internal 3.10.0-693.21.1.el7.x86_64 #1 SMP Fri Feb 23 18:54:16 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

How reproducible:
Seems to be a rolling problem in the starter clusters.

Steps to Reproduce:
1. Unknown

Additional info:
- After 20 minutes, the iptables process appeared to exit and htop returned to normal. 
- For unknown reasons, connecting to this box over ssh took about 10 times as long (~30 seconds) as connecting to other nodes in the cluster.
- No excessive CPU/Disk/Memory pressure otherwise noted (iptables was taking a full core, but plenty of other cores were free; Load average: 33.85 31.63 30.86).

Comment 4 Dan Williams 2018-04-25 16:11:53 UTC
If this happens again, can somebody 'debuginfo-install iptables' and then 'gdb attach <pid>' and "backtrace" so we can see what's actually going on?

Comment 9 Ben Bennett 2018-05-16 14:15:13 UTC
Can we add a cron job to check to see if iptables has been running for more than a minute and then attach to the pid and generate a backtrace and email it out?
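A minimal sketch of the detection half of such a cron job, assuming a procps-ng ps that supports the etimes (elapsed seconds) column. The function names and the 60-second threshold are illustrative, not something agreed in this bug, and ps -C iptables only matches processes whose command name is exactly "iptables".

```shell
#!/bin/sh
# Sketch only: find iptables processes that have been running too long.
# Function names and the threshold are assumptions made for illustration.

# Read "PID ELAPSED_SECONDS" pairs on stdin and print each PID whose
# elapsed time is strictly greater than the limit passed as $1.
filter_long_running() {
    awk -v limit="$1" '$2 + 0 > limit + 0 { print $1 }'
}

# Print "PID ELAPSED_SECONDS" for every process named exactly "iptables".
# The trailing "=" on each column suppresses the header line.
list_iptables() {
    ps -C iptables -o pid=,etimes=
}

# Example use from a cron-driven script:
#   list_iptables | filter_long_running 60
```

Each PID printed by the example pipeline could then be handed to whatever capture step is chosen.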

Comment 10 Justin Pierce 2018-05-25 20:56:18 UTC
I'd be glad to install it if someone can write it to dump to a file system location. I'll monitor for dumps and email if one is generated.

Comment 11 Dan Williams 2018-08-15 20:01:46 UTC
I believe you can do:

- debuginfo-install iptables
- gdb attach `pidof iptables`
- set logging file /tmp/gdb.txt
- set logging on
- set pagination off
- backtrace
- quit

and that should dump the required info to a filesystem location.
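For unattended use, the same capture could probably be run non-interactively. This is a sketch under assumptions: it uses gdb's -p, -batch and -ex options instead of the interactive logging commands, redirects gdb's output to a file, and the OUT_DIR variable, file naming and function names are hypothetical. It needs root and the iptables debuginfo installed to produce a useful trace.

```shell
#!/bin/sh
# Sketch only: dump a backtrace of a running process to a file without
# an interactive gdb session. Names and paths here are assumptions.

# Build the output path for PID $1: <OUT_DIR or /tmp>/iptables-bt-<pid>.txt
bt_path() {
    printf '%s/iptables-bt-%s.txt\n' "${OUT_DIR:-/tmp}" "$1"
}

# Attach to PID $1, print its backtrace, then let gdb exit (batch mode
# detaches from the process when it finishes). Both stdout and stderr
# are written to the file returned by bt_path.
dump_backtrace() {
    gdb -p "$1" -batch \
        -ex 'set pagination off' \
        -ex 'backtrace' > "$(bt_path "$1")" 2>&1
}

# Example use (as root, after 'debuginfo-install iptables'):
#   for pid in $(pidof iptables); do dump_backtrace "$pid"; done
```

Writing one file per PID keeps traces from separate iptables invocations apart, which may help if the problem recurs on several nodes.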

Comment 12 Dan Williams 2018-09-25 16:58:58 UTC
Ping on this issue?

Comment 13 Justin Pierce 2018-09-25 18:37:59 UTC
I haven't seen this in a while, so I'm fine to close and follow the recommended capture steps if it reoccurs.

Comment 14 Casey Callendrello 2019-01-03 22:33:18 UTC
Gotcha; I've closed this - feel free to reopen / open a new one if it reoccurs.

