Red Hat Bugzilla – Bug 1459589
cpu soft lockup caused by iptables
Last modified: 2018-03-07 03:03:09 EST
Description of problem:
Quite often we see kernel messages such as these:
Message from syslogd@ip-172-31-50-120 at Jun 5 13:51:58 ...
kernel:NMI watchdog: BUG: soft lockup - CPU#5 stuck for 23s! [iptables:47079]
Message from syslogd@ip-172-31-50-120 at Jun 6 10:50:14 ...
kernel:NMI watchdog: BUG: soft lockup - CPU#7 stuck for 22s! [iptables:38723]
Message from syslogd@ip-172-31-62-249 at Jun 6 10:50:12 ...
kernel:NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [iptables:41786]
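To see whether the lockups cluster on particular nodes, CPUs, or processes, the syslogd broadcasts can be tallied. Below is a minimal sketch. It assumes the messages arrive in exactly the format shown above: a `Message from syslogd@HOST at ... ...` header followed by the `kernel:` soft-lockup line. `Lockup` and `parse_lockups` are hypothetical names used only for illustration.

```python
import re
from typing import Iterable, List, NamedTuple

# Header that syslogd broadcasts to terminals before each kernel message.
HEADER_RE = re.compile(r"^Message from syslogd@(?P<host>\S+) at (?P<when>.+?) \.\.\.$")

# Soft-lockup line. The command name may itself contain ':' (e.g. "kworker/5:1"),
# so the greedy (?P<comm>.+) backtracks to the last ':' before the PID.
LOCKUP_RE = re.compile(
    r"soft lockup - CPU#(?P<cpu>\d+) stuck for (?P<secs>\d+)s! "
    r"\[(?P<comm>.+):(?P<pid>\d+)\]"
)


class Lockup(NamedTuple):
    host: str
    when: str
    cpu: int
    seconds: int
    comm: str
    pid: int


def parse_lockups(lines: Iterable[str]) -> List[Lockup]:
    """Return one Lockup per soft-lockup line, tagged with the preceding header."""
    events: List[Lockup] = []
    host, when = "unknown", ""  # used if a lockup line has no header before it
    for raw in lines:
        line = raw.strip()
        header = HEADER_RE.match(line)
        if header:
            host, when = header["host"], header["when"]
            continue
        m = LOCKUP_RE.search(line)
        if m:
            events.append(Lockup(host, when, int(m["cpu"]), int(m["secs"]),
                                 m["comm"], int(m["pid"])))
    return events
```

Consider feeding the six lines above through `parse_lockups` and counting hosts with `collections.Counter(e.host for e in events)`. The result is two lockups on `ip-172-31-50-120` and one on `ip-172-31-62-249`, and all three are attributed to `iptables` processes.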
This happens on a fairly busy cluster, but the containers do not change often: we run our monitoring on it, so it is not unusual for a pod to stay up for more than a month. There is, however, a massive amount of network traffic; millions of metrics pass through these pods every day. At this point we have no idea what the cause is. Is it something Kubernetes does? Is AWS having problems? What puzzles me is that we are not changing any routes, pods, or anything else, so is iptables still churning through rules non-stop?
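One way to check whether the rules are actually changing is to snapshot `iptables-save` output periodically and compare fingerprints. Below is a minimal sketch. It assumes Python 3.7+, root privileges, and `iptables-save` on PATH. `summarize_ruleset`, `snapshot`, and `watch` are hypothetical helpers, not part of any existing tool.

```python
import hashlib
import subprocess
import time
from typing import Dict, Optional, Tuple


def summarize_ruleset(dump: str) -> Tuple[Dict[str, int], str]:
    """Count '-A' rules per table and fingerprint the ruleset text.

    '#' comment lines carry a timestamp and are skipped; the '[packets:bytes]'
    counters on chain lines change with traffic and are stripped, so the
    fingerprint only moves when tables, chains, policies, or rules change.
    """
    counts: Dict[str, int] = {}
    digest = hashlib.sha256()
    table: Optional[str] = None
    for line in dump.splitlines():
        if line.startswith("*"):            # table header, e.g. "*nat"
            table = line[1:]
            counts.setdefault(table, 0)
            digest.update(line.encode() + b"\n")
        elif line.startswith(":"):          # chain declaration, drop counters
            digest.update(line.split(" [", 1)[0].encode() + b"\n")
        elif line.startswith("-A ") and table is not None:
            counts[table] += 1
            digest.update(line.encode() + b"\n")
    return counts, digest.hexdigest()


def snapshot() -> str:
    # Assumes root privileges and iptables-save on PATH.
    return subprocess.run(["iptables-save"], capture_output=True,
                          text=True, check=True).stdout


def watch(interval_s: float = 10.0) -> None:
    """Print a line whenever the fingerprint changes; runs until interrupted.

    The first line printed is the baseline ruleset.
    """
    previous: Optional[str] = None
    while True:
        counts, fingerprint = summarize_ruleset(snapshot())
        if fingerprint != previous:
            print(time.strftime("%b %d %H:%M:%S"), "ruleset changed:", counts)
            previous = fingerprint
        time.sleep(interval_s)
```

Running `watch()` as root on one node shows whether the ruleset really changes between lockups. This only detects changes in content. If something periodically re-applies an identical ruleset, the fingerprint stays constant even though `iptables` processes keep running. In that case, process activity would have to be observed separately.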
Version-Release number of selected component (if applicable):
How reproducible:
It happens multiple times a day, but we have not found a way to trigger it on demand.
Steps to Reproduce:
1. There really aren't any steps that we know of at this time to reproduce this.
Actual results:
Usually we get one or two of these soft lockups a day, on all nodes in the cluster.
Expected results:
No soft lockups during normal operation.
Additional info:
If it would help to turn on more verbose logging in syslog, or if you need us to capture something specific, please reach out and I'll set everything up.