1667220 – iptables rule causes 4.19.X kernel hung

Keywords:
Status:	CLOSED DUPLICATE of bug 1659706
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	kernel
Sub Component:
Version:	rawhide
Hardware:	All
OS:	Linux
Priority:	unspecified
Severity:	high
Target Milestone:	---
Assignee:	Kernel Maintainer List
QA Contact:	Fedora Extras Quality Assurance
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2019-01-17 18:36 UTC by Vsevolod Volkov
Modified:	2019-01-18 19:45 UTC
CC List:	18 users
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2019-01-18 19:45:09 UTC
Type:	Bug
Embargoed:
Dependent Products:

Attachments
kernel log (33.74 KB, text/plain) 2019-01-17 18:36 UTC, Vsevolod Volkov
kernel log (32.19 KB, text/plain) 2019-01-18 08:54 UTC, Vsevolod Volkov
output of 'journalctl -b --no-hostname' (79.71 KB, text/plain) 2019-01-18 18:18 UTC, Vsevolod Volkov
output of 'journalctl -b -1 --no-hostname' (90.93 KB, text/plain) 2019-01-18 18:20 UTC, Vsevolod Volkov

Description Vsevolod Volkov 2019-01-17 18:36:09 UTC

Created attachment 1521349 [details]
kernel log

1. Please describe the problem:

The following rule causes 4.19.X kernels to hang:

iptables -A INPUT -p tcp -m tcp --tcp-flags FIN,SYN,RST,ACK SYN -m connlimit --connlimit-above 5 --connlimit-mask 32 --connlimit-saddr -j REJECT --reject-with icmp-port-unreachable


2. What is the Version-Release number of the kernel:

Any 4.19.X. The problem is reproducible with all 4.19.X kernels from the Fedora 29 repository, from 4.19.2-301.fc29.x86_64 to 4.19.14-300.fc29.x86_64 (the newest at the time of writing). 4.18.X works fine (tested up to 4.18.18-300.fc29.x86_64).
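
As a rough illustration of the version range described above, the following sketch checks whether a kernel release string belongs to the 4.19.X series. The function name in_reported_range is hypothetical and not part of any Fedora tooling, and the check covers only the range named in this report; it makes no claim about other kernel series.

```bash
# Hypothetical helper (an assumption for illustration): succeeds when the given
# kernel release string belongs to the 4.19.X series named in this report.
in_reported_range() {
    case "$1" in
        4.19.*) return 0 ;;   # e.g. 4.19.2-301.fc29.x86_64 to 4.19.14-300.fc29.x86_64
        *)      return 1 ;;   # e.g. 4.18.18-300.fc29.x86_64, reported as working
    esac
}
```

It could be called as: in_reported_range "$(uname -r)" && echo "running kernel is in the reported range"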


3. Did it work previously in Fedora?

Yes.

   If so, what kernel version did the issue *first* appear?

4.19.2-301.fc29.x86_64 (first 4.19 kernel in Fedora 29).


4. Can you reproduce this issue? If so, please provide the steps to reproduce
   the issue below:

- Upgrade to Fedora 29 or set up a new instance.

- Remove firewalld.

- Make sure iptables has no rules:

# iptables -S
-P INPUT ACCEPT
-P FORWARD ACCEPT
-P OUTPUT ACCEPT

- Add the rule:

# iptables -A INPUT -p tcp -m tcp --tcp-flags FIN,SYN,RST,ACK SYN -m connlimit --connlimit-above 5 --connlimit-mask 32 --connlimit-saddr -j REJECT --reject-with icmp-port-unreachable
# iptables -S
-P INPUT ACCEPT
-P FORWARD ACCEPT
-P OUTPUT ACCEPT
-A INPUT -p tcp -m tcp --tcp-flags FIN,SYN,RST,ACK SYN -m connlimit --connlimit-above 5 --connlimit-mask 32 --connlimit-saddr -j REJECT --reject-with icmp-port-unreachable

- Wait anywhere from a minute to a couple of hours.


5. Does this problem occur with the latest Rawhide kernel?

Not tested.


6. Are you running any modules that are not shipped directly with Fedora's kernel?:

No.


7. Please attach the kernel logs.

Attached.

PS. The bug also reproduces on Ubuntu with a 4.19 kernel.

Comment 1 Jeremy Cline 2019-01-17 20:29:15 UTC

Thanks for the detailed bug report and reproducer. I don't see anything in the kernel log (did you select the right boot?), but based on the description this sounds like https://bugzilla.redhat.com/show_bug.cgi?id=1659706. Can you check to see if the Rawhide kernel works for you?

Comment 2 Vsevolod Volkov 2019-01-18 08:54:58 UTC

Created attachment 1521445 [details]
kernel log

I've tested the new kernel 4.19.15-300.fc29.x86_64. The kernel log is in the attachment. It hangs at 9:59 without any messages.

But kernel 5.0.0-0.rc2.git1.1.fc30.x86_64 from rawhide works fine; uptime is 11:13 now.

Comment 3 Steve 2019-01-18 16:08:12 UTC

(In reply to Vsevolod Volkov from comment #0)
...
> - Wait from minute to couple of hours.
...

Are you using the system as you normally would before the system hangs?

> Jan 18 09:46:37 kernel: Kernel command line: BOOT_IMAGE=/boot/vmlinuz-4.19.15-300.fc29.x86_64 root=LABEL=root ro quiet audit=0 LANG=uk_UA.UTF-8
> Jan 18 09:46:37 kernel: audit: disabled (until reboot)

The log isn't showing any "NETFILTER_CFG" messages. Could you try removing "quiet" and "audit=0" from the kernel command-line?
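
To confirm after a reboot that the two options are really gone, a check along these lines may help. This is only a sketch: the function name cmdline_has is hypothetical, and it assumes the running kernel's command line is readable from /proc/cmdline.

```bash
# Hypothetical helper: succeeds when FLAG appears as a whole word in the given
# kernel command line (defaults to the running kernel's /proc/cmdline).
cmdline_has() {
    local flag=$1
    local cmdline=${2:-$(cat /proc/cmdline)}
    # Pad with spaces so "ro" does not match inside "root=LABEL=root".
    case " $cmdline " in
        *" $flag "*) return 0 ;;
        *)           return 1 ;;
    esac
}
```

For example: cmdline_has quiet || cmdline_has audit=0 || echo "neither option is set"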

These will give you the journalctl output for the current boot and the previous boot, respectively:

$ journalctl -b > journalctl-1.log
$ journalctl -b -1 > journalctl-2.log # The "-1" option selects the previous boot.

Comment 4 Steve 2019-01-18 16:28:57 UTC

(In reply to Steve from comment #3)
...
> $ journalctl -b > journalctl-1.log
> $ journalctl -b -1 > journalctl-2.log # The "-1" option selects the previous boot.

I forgot the "--no-hostname" option:

$ journalctl -b --no-hostname > journalctl-1.log
$ journalctl -b -1 --no-hostname > journalctl-2.log

Comment 5 Steve 2019-01-18 17:36:01 UTC

> -A INPUT -p tcp -m tcp --tcp-flags FIN,SYN,RST,ACK SYN -m connlimit --connlimit-above 5 --connlimit-mask 32 --connlimit-saddr -j REJECT --reject-with icmp-port-unreachable

It might be possible to induce the hang to occur sooner by reducing "--connlimit-above" to "1" or "2".
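
For trying different limits, the rule can be parameterized. The sketch below only prints the command for review and applies nothing; the function name connlimit_rule is hypothetical, and every option other than the limit is copied from the rule in this report.

```bash
# Hypothetical helper: print (without applying) the rule from this report with
# a caller-chosen --connlimit-above value. Nothing is changed on the system.
connlimit_rule() {
    local limit=${1:-5}   # 5 is the value used in the original report
    printf '%s\n' "iptables -A INPUT -p tcp -m tcp --tcp-flags FIN,SYN,RST,ACK SYN -m connlimit --connlimit-above $limit --connlimit-mask 32 --connlimit-saddr -j REJECT --reject-with icmp-port-unreachable"
}
```

Running connlimit_rule 2 prints the rule with the limit lowered to 2, which can then be reviewed and run as root.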

Here is a test procedure:

Boot to runlevel 3 or open a full-screen terminal window and run:

$ dmesg -w

Initiate a server stress test. Possible tools include "curl", "ab" (Apache benchmark), and "httpress":

$ rpm -q curl httpd-tools httpress
curl-7.61.1-6.fc29.x86_64
httpd-tools-2.4.37-5.fc29.x86_64  # includes "ab"
httpress-1.1.0-11.fc29.x86_64

For "ab", see the "-c" option in the man page:

-c concurrency
              Number of multiple requests to perform at a time. Default is one request at a time.

Comment 6 Vsevolod Volkov 2019-01-18 18:18:55 UTC

Created attachment 1521649 [details]
output of 'journalctl -b --no-hostname'

Comment 7 Vsevolod Volkov 2019-01-18 18:20:49 UTC

Created attachment 1521651 [details]
output of 'journalctl -b -1 --no-hostname'

Comment 8 Vsevolod Volkov 2019-01-18 18:21:55 UTC

(In reply to Steve from comment #4)
> $ journalctl -b --no-hostname > journalctl-1.log
> $ journalctl -b -1 --no-hostname > journalctl-2.log

Attached.

Comment 9 Vsevolod Volkov 2019-01-18 18:47:47 UTC

(In reply to Steve from comment #5)
> It might be possible to induce the hang to occur sooner by reducing
> "--connlimit-above" to "1" or "2".

I don't think so, but I'll try. The original value was 20.

> Initiate a server stress test. Possible tools include "curl", "ab" (Apache
> benchmark), and "httpress":

There is no Apache on the test server. The only listening service is sshd. I have no idea how to use ab or curl to stress test sshd. Some days ago I wrote a simple Perl script that generates concurrent connections, but I couldn't induce the hang with it. The rule works as it should: it just rejects the connections above the limit.
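
The Perl script mentioned above is not attached to this report. As a separate, minimal sketch of the same idea, the following bash function opens several TCP connections at once and holds them open. The function name open_connections and its parameters are hypothetical; it assumes bash 4.1 or later for the {fd} syntax and the /dev/tcp redirection, and a connect to a host that silently drops packets may block until the TCP timeout.

```bash
#!/usr/bin/env bash
# Hypothetical helper: open COUNT TCP connections to HOST:PORT, hold them for
# HOLD seconds (default 5), then print how many connects succeeded.
# Uses bash's built-in /dev/tcp redirection, so no extra tools are needed.
open_connections() {
    local host=$1 port=$2 count=$3 hold=${4:-5}
    local fd i opened=0
    local -a fds=()

    for ((i = 0; i < count; i++)); do
        # {fd} lets bash pick a free descriptor number; the brace group keeps
        # the "2>/dev/null" from applying to the rest of the shell.
        if { exec {fd}<>"/dev/tcp/$host/$port"; } 2>/dev/null; then
            fds+=("$fd")
            opened=$((opened + 1))
        fi
    done

    # Keep the successful connections open so they count against the limit.
    sleep "$hold"

    # Close every descriptor that was opened.
    for fd in "${fds[@]}"; do
        exec {fd}>&-
    done

    echo "$opened"
}
```

For example, open_connections 192.0.2.10 22 10 30 would try ten connections to sshd on a placeholder address and hold them for 30 seconds. With the rule from this report in place, the printed count would be expected to stay at or below the configured limit.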

Comment 10 Steve 2019-01-18 19:31:07 UTC

Thanks for attaching the journalctl output:

1. What do you show for this?

$ ls -F /proc/sys/net/netfilter/

Log snippet:
$ grep -n -m 2 netfilter journalctl-1.log 
437:Jan 18 20:08:57 systemd-sysctl[193]: Couldn't write '30' to 'net/netfilter/nf_conntrack_tcp_timeout_time_wait', ignoring: No such file or directory
438:Jan 18 20:08:57 systemd-sysctl[193]: Couldn't write '10' to 'net/netfilter/nf_conntrack_tcp_timeout_syn_recv', ignoring: No such file or directory

2. What iptables packages do you have installed?

$ rpm -qa iptables\*

Log snippet:
$ grep -n -m 2 iptables journalctl-1.log
623:Jan 18 20:08:59 systemd[1]: Starting IPv4 firewall with iptables...
629:Jan 18 20:08:59 iptables.init[305]: /usr/libexec/iptables/iptables.init: рядок 22: /etc/init.d/functions: No such file or directory

3. Are you only serving sshd connections? (You answered this in Comment 9, so this is just for the record.)

$ ss -tln

Log snippet:
$ grep -n -m 2 'sshd\[' journalctl-1.log
656:Jan 18 20:08:59 sshd[322]: Server listening on 0.0.0.0 port 22.
657:Jan 18 20:08:59 sshd[322]: Server listening on :: port 22.

Comment 11 Jeremy Cline 2019-01-18 19:45:09 UTC

(In reply to Vsevolod Volkov from comment #2)
> Created attachment 1521445 [details]
> kernel log
> 
> I've tested new kernel 4.19.15-300.fc29.x86_64. Kernel log in the
> attachment. It hangs at 9:59 without messages.
> 
> But kernel 5.0.0-0.rc2.git1.1.fc30.x86_64 from rawhide works fine, uptime is
> 11:13 now.

Thanks for testing that. There's a patch series queued for 4.20.4 that should fix this. A couple of other people are hitting the same problem and have reported that the series fixes it for them.

*** This bug has been marked as a duplicate of bug 1659706 ***
