Bug 1677323
Summary: | iptables -X returns iptables: No buffer space available when huge amount of chains are used. | | |
---|---|---|---|
Product: | Red Hat Enterprise Linux 8 | Reporter: | Robin Hack <rhack> |
Component: | iptables | Assignee: | Phil Sutter <psutter> |
Status: | CLOSED ERRATA | QA Contact: | Jiri Peska <jpeska> |
Severity: | medium | Docs Contact: | |
Priority: | medium | | |
Version: | 8.0 | CC: | iptables-maint-list, jpeska, qe-baseos-daemons, rkhan, todoleza |
Target Milestone: | rc | | |
Target Release: | 8.0 | | |
Hardware: | Unspecified | | |
OS: | Unspecified | | |
Whiteboard: | | | |
Fixed In Version: | iptables-1.8.2-16.el8 | Doc Type: | If docs needed, set a value |
Doc Text: | | Story Points: | --- |
Clone Of: | | Environment: | |
Last Closed: | 2019-11-05 22:17:43 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | --- |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | | | |
Description
Robin Hack
2019-02-14 14:41:09 UTC
Hi Robin,

I can't reproduce this on a beaker machine (admittedly a pretty fat one):

[root@wsfd-netdev13 ~]# ./iptables-restore-reproducer.sh
calling iptables-nft-restore

real    0m13.036s
user    0m12.262s
sys     0m4.245s
calling iptables-nft -F

real    1m2.740s
user    0m0.885s
sys     1m1.427s
calling iptables-nft -X

real    0m8.028s
user    0m0.266s
sys     0m7.734s
[root@wsfd-netdev13 ~]# uname -a
Linux wsfd-netdev13.ntdv.lab.eng.bos.redhat.com 4.18.0-68.el8.x86_64 #1 SMP Wed Feb 13 14:25:59 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
[root@wsfd-netdev13 ~]# rpm -q iptables
iptables-1.8.2-9.el8.x86_64
[root@wsfd-netdev13 ~]# rpm -q libnftnl
libnftnl-1.1.1-4.el8.x86_64
[root@wsfd-netdev13 ~]# cat iptables-restore-reproducer.sh
#! /bin/bash

echo "calling iptables-nft-restore"
time iptables-restore <(
echo "*filter"
for i in $(seq 0 200000);do
        printf ":chain_%06x - [0:0]\n" $i
done
for i in $(seq 0 200000);do
        printf -- "-A INPUT -j chain_%06x\n" $i
        printf -- "-A INPUT -j chain_%06x\n" $i
done
echo COMMIT
)
echo "calling iptables-nft -F"
time iptables -F
echo "calling iptables-nft -X"
time iptables -X
[root@wsfd-netdev13 ~]#

Hi Phil.

# free -m
              total        used        free      shared  buff/cache   available
Mem:           1829         141        1435           8         251        1534
Swap:             0           0           0

strace output of iptables -X:
sendto(3, {{len=20, type=NFNL_SUBSYS_NFTABLES<<8|NFT_MSG_GETCHAIN, flags=NLM_F_REQUEST|NLM_F_DUMP, seq=0, pid=0}, {nfgen_family=AF_INET, version=NFNETLINK_V0, res_id=htons(0)}, 20, 0, {sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, 12) = 20
recvmsg(3, ... tons of structs ) = 16488
brk(NULL) = 0x56131b143000
brk(0x56131b164000) = 0x56131b164000

... skipped ...

sendmsg(3, tons of data ... = 12000040
select(4, [3], NULL, NULL, {tv_sec=0, tv_usec=0}) = 1 (in [3], left {tv_sec=0, tv_usec=0})
recvmsg(3, {msg_namelen=12}, 0) = -1 ENOBUFS (No buffer space available)

I can attach whole strace output :).

Hi Robin,

(In reply to Robin Hack from comment #2)
> Hi Phil.
>
> # free -m
>               total        used        free      shared  buff/cache   available
> Mem:           1829         141        1435           8         251        1534
> Swap:             0           0           0
>
> strace output of iptables -X:
> sendto(3, {{len=20, type=NFNL_SUBSYS_NFTABLES<<8|NFT_MSG_GETCHAIN,
> flags=NLM_F_REQUEST|NLM_F_DUMP, seq=0, pid=0}, {nfgen_family=AF_INET,
> version=NFNETLINK_V0, res_id=htons(0)}, 20, 0, {sa_family=AF_NETLINK,
> nl_pid=0, nl_groups=00000000}, 12) = 20
> recvmsg(3, ... tons of structs ) = 16488
> brk(NULL) = 0x56131b143000
> brk(0x56131b164000) = 0x56131b164000
>
> ... skipped ...
>
> sendmsg(3, tons of data ... = 12000040
> select(4, [3], NULL, NULL, {tv_sec=0, tv_usec=0}) = 1 (in [3], left
> {tv_sec=0, tv_usec=0})
> recvmsg(3, {msg_namelen=12}, 0) = -1 ENOBUFS (No buffer space
> available)

Receiving ENOBUFS when calling recvmsg() typically happens if a single netlink message exceeds the 32k max buffer size supported by the kernel. I wonder why this doesn't happen for me though.

Could you perhaps try with a smaller number of chains and rules? The difference should still be noticeable, and for use in a test script a delay of over a minute is probably too large anyway.

Thanks, Phil

Hello.

Ok. It looks like

for ((i = 0; i < 300000; ++i)); do
        printf ":chain_%06x - [0:0]\n" $i
done

is not an issue by itself, even with that big a number.

However, in combination with:

for ((i = 0; i < 1000; ++i)); do
        printf -- "-A INPUT -j chain_%06x\n" $i
        printf -- "-A INPUT -j chain_%06x\n" $i
done

it returns

iptables: No buffer space available.

but with smaller numbers it starts to return:

100 chains - iptables v1.8.2 (nf_tables): CHAIN_USER_DEL failed (Device or resource busy): chain chain_000000
10 chains - iptables v1.8.2 (nf_tables): CHAIN_USER_DEL failed (Device or resource busy): chain chain_000000

kernel-4.18.0-69.el8.x86_64
iptables-1.8.2-9.el8.x86_64
libnftnl-1.1.1-4.el8.x86_64

I managed to reproduce the issue. It turned out I had missed the fact that it happens only if one doesn't call 'iptables -F' before calling 'iptables -X'.
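
For experimenting with smaller rulesets, as suggested earlier in this thread, the two loops can be folded into one parameterised generator. This is only a sketch: the helper name gen_ruleset and its two arguments (number of chains, number of chains jumped to from INPUT) are made up for illustration and do not come from the reproducer script. The jump rules are printed via a '%s' format instead of 'printf --', which is a stylistic change with the same output.

```bash
#!/bin/bash
# gen_ruleset CHAINS JUMPS   (hypothetical helper, not part of iptables)
# Print an iptables-restore ruleset for the filter table containing CHAINS
# user-defined chains, with the first JUMPS chains referenced twice from INPUT,
# mirroring the loops quoted in this report.
# The body runs in a subshell so the loop variables do not leak to the caller.
gen_ruleset() (
    chains=$1
    jumps=$2

    echo "*filter"

    # Chain declarations: ":chain_000000 - [0:0]", ":chain_000001 - [0:0]", ...
    i=0
    while [ "$i" -lt "$chains" ]; do
        printf ':chain_%06x - [0:0]\n' "$i"
        i=$((i + 1))
    done

    # Two jump rules per referenced chain, as in the original reproducer.
    i=0
    while [ "$i" -lt "$jumps" ]; do
        printf '%s chain_%06x\n' '-A INPUT -j' "$i"
        printf '%s chain_%06x\n' '-A INPUT -j' "$i"
        i=$((i + 1))
    done

    echo COMMIT
)
```

As root, piping the output of 'gen_ruleset 300000 1000' into iptables-nft-restore and then calling 'iptables-nft -X' without a preceding 'iptables-nft -F' should correspond to the failing combination reported above; that has not been re-verified here.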
Sent a patch upstream, mostly to ask for advice on how to properly fix it:

https://marc.info/?l=netfilter-devel&m=156208033321053&w=2

Fix sent upstream:

https://marc.info/?l=netfilter-devel&m=156209061024148&w=2

Upstream commit to backport:

commit d3e39e9c457f452540359e42fb58d64a28fe3e18 (origin/master, origin/HEAD)
Author: Phil Sutter <phil>
Date:   Tue Jul 2 20:30:49 2019 +0200

    nft: Set socket receive buffer

    When trying to delete user-defined chains in a large ruleset,
    iptables-nft aborts with "No buffer space available". This can be
    reproduced using the following script:

    | #! /bin/bash
    | iptables-nft-restore <(
    |
    | echo "*filter"
    | for i in $(seq 0 200000);do
    |         printf ":chain_%06x - [0:0]\n" $i
    | done
    | for i in $(seq 0 200000);do
    |         printf -- "-A INPUT -j chain_%06x\n" $i
    |         printf -- "-A INPUT -j chain_%06x\n" $i
    | done
    | echo COMMIT
    |
    | )
    | iptables-nft -X

    The problem seems to be the sheer amount of netlink error messages sent
    back to user space (one EBUSY for each chain). To solve this, set
    receive buffer size depending on number of commands sent to kernel.

    Suggested-by: Pablo Neira Ayuso <pablo>
    Signed-off-by: Phil Sutter <phil>
    Signed-off-by: Pablo Neira Ayuso <pablo>

Tomas,

Please consider providing qa_ack+ here. We're a bit late with RHEL8.1 but since it is a bug fix I think it's worth trying.

Cheers, Phil

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2019:3573
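
Before opening a new report, it may help to confirm that the installed package is at or above the Fixed In Version listed at the top of this bug (iptables-1.8.2-16.el8). The sketch below is an assumption-laden illustration, not an official check: has_rcvbuf_fix is a made-up helper name, the list of architecture suffixes is only a guess at common ones, and 'sort -V' is used as an approximation of RPM's own version comparison.

```bash
#!/bin/bash
# Version-release taken from the "Fixed In Version" field of this report.
FIXED_VR="1.8.2-16.el8"

# has_rcvbuf_fix PACKAGE   (hypothetical helper)
# Succeeds if PACKAGE, in the form printed by `rpm -q iptables`, sorts at or
# after FIXED_VR. `sort -V` only approximates RPM's comparison rules.
# The body runs in a subshell so its variables do not leak to the caller.
has_rcvbuf_fix() (
    vr=${1#iptables-}                      # drop the package name prefix
    case $vr in
        *.x86_64|*.aarch64|*.ppc64le|*.s390x)
            vr=${vr%.*} ;;                 # drop the architecture suffix
    esac
    # If FIXED_VR sorts first (or both are equal), the package is new enough.
    lowest=$(printf '%s\n%s\n' "$FIXED_VR" "$vr" | sort -V | head -n 1)
    [ "$lowest" = "$FIXED_VR" ]
)
```

A typical call would be 'has_rcvbuf_fix "$(rpm -q iptables)" && echo "fix present"'. For the iptables-1.8.2-9.el8.x86_64 package used earlier in this thread the check fails, which matches the versions on which the problem was reproduced.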