Bug 1402695 - netfilter regression causes lost pings "operation not permitted"
Status: CLOSED UPSTREAM
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 24
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
Reported: 2016-12-08 07:44 UTC by Trevor Cordes
Modified: 2019-01-09 12:54 UTC (History)
Last Closed: 2017-01-25 10:08:53 UTC
Type: Bug


Attachments
perl script to demo this bug; scans all IPs on your /24 LAN (change the subnet value first) (1.87 KB, application/x-perl)
2016-12-08 07:44 UTC, Trevor Cordes

Description Trevor Cordes 2016-12-08 07:44:34 UTC
Created attachment 1229393
perl script to demo this bug; scans all IPs on your /24 LAN (change the subnet value first)

Description of problem:
Note: I've bisected already, see below.

Hi!  Since 4.8.x, a script of mine that pings every IP on my LAN's /24 subnet in about 0.5s fails on the send() call with "operation not permitted"; nmap doing the same scan fails the same way.  The error appears after a somewhat random number of packets have already been sent.  That number shrinks each time you run the script: the first run gets up to around 200 pings before the error, and later runs drop to around 50.  If you wait, it goes back up to around 200 pings.  The scan almost never completes all 253 of them.

Interestingly, the problem only occurs when you ping different IPs.  If you send the same ping count using my script to just one IP, there is no bug.
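
For readers without the attachment, here is a minimal Python sketch of the kind of sweep described above.  It is not the attached perl script; the function names, the ICMP identifier 0x1234, and the default subnet are assumptions made for illustration.  It sends one ICMP echo request to every host address in the subnet and reports how many went out before send failed with "operation not permitted".  It needs root for the raw socket.

```python
import ipaddress
import socket
import struct


def icmp_checksum(data: bytes) -> int:
    """Internet checksum (RFC 1071) over an ICMP message."""
    if len(data) % 2:
        data += b"\x00"  # pad to a whole number of 16-bit words
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    total = (total >> 16) + (total & 0xFFFF)  # fold the carries back in
    total += total >> 16
    return ~total & 0xFFFF


def build_echo_request(ident: int, seq: int, payload: bytes = b"") -> bytes:
    """Build an ICMP echo request (type 8, code 0) with a valid checksum."""
    header = struct.pack("!BBHHH", 8, 0, 0, ident, seq)
    csum = icmp_checksum(header + payload)
    return struct.pack("!BBHHH", 8, 0, csum, ident, seq) + payload


def sweep(subnet: str = "192.168.100.0/24") -> int:
    """Ping each host once; return how many sends succeeded before EPERM."""
    sent = 0
    with socket.socket(socket.AF_INET, socket.SOCK_RAW, socket.IPPROTO_ICMP) as sock:
        for seq, host in enumerate(ipaddress.ip_network(subnet).hosts()):
            try:
                sock.sendto(build_echo_request(0x1234, seq), (str(host), 0))
            except PermissionError as exc:  # EPERM, the error in this report
                print(f"send to {host} failed after {sent} packets: {exc}")
                break
            sent += 1
    return sent
```

Calling sweep() a few times in quick succession as root (with your own subnet) should, on an affected kernel, print a shrinking count.  Replies are not read, since only the send side matters here.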

Strangely, the iptable_nat module MUST be loaded for the bug to show up!  If you rmmod it, the bug goes away.  The bug occurs even if every iptables table (including -t nat) is completely empty (no rules).  All that is required is for iptable_nat to be loaded.
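
Since the bug depends on iptable_nat being loaded, it helps to confirm the module state before each test run.  A small Python sketch, assuming the usual /proc/modules layout where the module name is the first whitespace-separated field on each line; the helper names are made up for this example.

```python
def read_proc_modules(path: str = "/proc/modules") -> str:
    """Return the kernel's list of loaded modules as text."""
    with open(path) as fh:
        return fh.read()


def module_loaded(name: str, proc_modules_text: str) -> bool:
    """True if `name` is the first field of any line in /proc/modules."""
    for line in proc_modules_text.splitlines():
        fields = line.split()
        if fields and fields[0] == name:
            return True
    return False
```

For example, module_loaded("iptable_nat", read_proc_modules()) should be True after modprobe iptable_nat and False after rmmod iptable_nat.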

4.7.0 kernels don't have this problem: the pings go out and everything is fine no matter how fast you repeat the script.

I bisected the bug to:
870190a9ec9075205c0fa795a09fa931694a3ff1
7c9664351980aaa6a4b8837a314360b3a4ad382a
I had to skip 7c9664351980aaa6a4b8837a314360b3a4ad382a because it wouldn't boot... just panic on every try.  So I can't narrow it any closer than within 2 commits.

I played with all the sysctls that looked relevant, like: ratelimit, per_sec, max, etc.  I modified everything I could find but nothing made the problem go away.  I *think* some had a modest effect on how many times I could run the script before the error popped up, but even when I took them to extreme values the bug never went away.  I'm back to the Fedora defaults now, and can attach them on request.


Version-Release number of selected component (if applicable):
4.8.11-100.fc23.x86_64
(4.8.8 and 4.8.10 also have the bug; one other guy who has experienced this on Ubuntu claims it affects anything above 4.8.4.)


How reproducible:
always


Steps to Reproduce:
1. Boot into 4.8.8+ kernel
2. modprobe iptable_nat
3. nmap -PE 192.168.100.0/24 (use your subnet)
or, after modifying the subnet, run my attached script, which makes it even clearer.  You might have to repeat the nmap or script several times in rapid succession.

Actual results:
errors:
Starting Nmap 7.12 ( https://nmap.org ) at 2016-12-08 01:33 CST
sendto in send_ip_packet_sd: sendto(5, packet, 44, 0, 192.168.100.131, 16) => Operation not permitted
Offending packet: TCP 192.168.100.1:38593 > 192.168.100.131:143 S ttl=53 id=5409 iplen=44  seq=3581424515 win=1024 <mss 1460>
sendto in send_ip_packet_sd: sendto(5, packet, 44, 0, 192.168.100.2, 16) => Operation not permitted
...
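
To compare runs, it can be handy to count the failed destinations in saved nmap output.  The Python sketch below assumes error lines shaped exactly like the two sendto lines quoted above; the function name and the regular expression are illustrative only and not part of nmap.

```python
import re

# Matches lines like:
#   sendto in send_ip_packet_sd: sendto(5, packet, 44, 0, 192.168.100.131, 16) => Operation not permitted
SENDTO_ERROR = re.compile(
    r"sendto\(\d+, packet, \d+, \d+, (\d{1,3}(?:\.\d{1,3}){3}), \d+\) => (.+)"
)


def failed_destinations(nmap_output: str) -> list:
    """Return (destination IP, error text) for each failed sendto line."""
    failures = []
    for line in nmap_output.splitlines():
        match = SENDTO_ERROR.search(line)
        if match:
            failures.append((match.group(1), match.group(2).strip()))
    return failures
```

The length of the returned list gives the number of failed sends in a run, which makes the shrinking-then-recovering pattern easier to see across repeated scans.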

Expected results:
no errors, normal nmap output


Additional info:
I've also posted this info to LKML with the same subject, CC'ing the committers; no replies yet.

Comment 1 Fedora End Of Life 2016-12-20 21:44:29 UTC
Fedora 23 changed to end-of-life (EOL) status on 2016-12-20. Fedora 23 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release. If you experience problems, please add a comment to this
bug.

Thank you for reporting this bug and we are sorry it could not be fixed.

Comment 2 Trevor Cordes 2016-12-20 22:55:43 UTC
The bug remains in F24, which I just upgraded to.  LKML guys have seen my bug report and are looking into it.  I will report back.

Comment 3 Trevor Cordes 2017-01-05 05:03:36 UTC
LKML guys have come up with a patch and added it to 4.8-stable review, as per the following:

https://lkml.org/lkml/2017/1/4/946
https://lkml.org/lkml/2017/1/4/936

[PATCH 4.8 83/85] Revert "netfilter: nat: convert nat bysrc hash to rhashtable"
[PATCH 4.8 84/85] Revert "netfilter: move nat hlist_head to nf_conn"

They simply reverted the 2 commits I bisected to.  They say a "real" fix has been done in 4.9.

I suppose this will roll around into F24's kernel errata eventually and then we can close this bz?

Comment 4 Mark Ziesemer 2017-01-15 05:58:04 UTC
FYI, I found this ticket in my quest to resolve the same issues I was seeing with nmap, albeit on a different Linux 4.8 distribution.  Cross-linking to http://unix.stackexchange.com/q/337082/83846, in case the details there are helpful for anyone else experiencing this.  Looking forward to 4.9 becoming officially available on our respective distributions!

Comment 5 Trevor Cordes 2017-01-25 10:08:53 UTC
As of kernel 4.9.5-100.fc24.x86_64 this bug is fixed.  Apparently the LKML guys have tweaked the 4.8 stable to fix this bug as well, but I have not tested that.  Closing bug.  Thanks!

