Bug 1677323
| Summary: | iptables -X returns iptables: No buffer space available when huge amount of chains are used. | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 8 | Reporter: | Robin Hack <rhack> |
| Component: | iptables | Assignee: | Phil Sutter <psutter> |
| Status: | CLOSED ERRATA | QA Contact: | Jiri Peska <jpeska> |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | ||
| Version: | 8.0 | CC: | iptables-maint-list, jpeska, qe-baseos-daemons, rkhan, todoleza |
| Target Milestone: | rc | Flags: | pm-rhel: mirror+ |
| Target Release: | 8.0 | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | iptables-1.8.2-16.el8 | Doc Type: | If docs needed, set a value |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2019-11-05 22:17:43 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
Description
Robin Hack
2019-02-14 14:41:09 UTC
Hi Robin,
I can't reproduce this on a beaker machine (admittedly a pretty fat one):
[root@wsfd-netdev13 ~]# ./iptables-restore-reproducer.sh
calling iptables-nft-restore
real 0m13.036s
user 0m12.262s
sys 0m4.245s
calling iptables-nft -F
real 1m2.740s
user 0m0.885s
sys 1m1.427s
calling iptables-nft -X
real 0m8.028s
user 0m0.266s
sys 0m7.734s
[root@wsfd-netdev13 ~]# uname -a
Linux wsfd-netdev13.ntdv.lab.eng.bos.redhat.com 4.18.0-68.el8.x86_64 #1 SMP Wed Feb 13 14:25:59 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
[root@wsfd-netdev13 ~]# rpm -q iptables
iptables-1.8.2-9.el8.x86_64
[root@wsfd-netdev13 ~]# rpm -q libnftnl
libnftnl-1.1.1-4.el8.x86_64
[root@wsfd-netdev13 ~]# cat iptables-restore-reproducer.sh
#! /bin/bash
echo "calling iptables-nft-restore"
time iptables-restore <(
    echo "*filter"
    for i in $(seq 0 200000);do
        printf ":chain_%06x - [0:0]\n" $i
    done
    for i in $(seq 0 200000);do
        printf -- "-A INPUT -j chain_%06x\n" $i
        printf -- "-A INPUT -j chain_%06x\n" $i
    done
    echo COMMIT
)
echo "calling iptables-nft -F"
time iptables -F
echo "calling iptables-nft -X"
time iptables -X
[root@wsfd-netdev13 ~]#
Hi Phil.
# free -m
total used free shared buff/cache available
Mem: 1829 141 1435 8 251 1534
Swap: 0 0 0
strace output of iptables -X:
sendto(3, {{len=20, type=NFNL_SUBSYS_NFTABLES<<8|NFT_MSG_GETCHAIN, flags=NLM_F_REQUEST|NLM_F_DUMP, seq=0, pid=0}, {nfgen_family=AF_INET, version=NFNETLINK_V0, res_id=htons(0)}, 20, 0, {sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, 12) = 20
recvmsg(3, ... tons of structs ) = 16488
brk(NULL) = 0x56131b143000
brk(0x56131b164000) = 0x56131b164000
... skipped ...
sendmsg(3, tons of data ... = 12000040
select(4, [3], NULL, NULL, {tv_sec=0, tv_usec=0}) = 1 (in [3], left {tv_sec=0, tv_usec=0})
recvmsg(3, {msg_namelen=12}, 0) = -1 ENOBUFS (No buffer space available)
I can attach whole strace output :).
Hi Robin,
(In reply to Robin Hack from comment #2)
> Hi Phil.
>
> # free -m
> total used free shared buff/cache available
> Mem: 1829 141 1435 8 251 1534
> Swap: 0 0 0
>
> strace output of iptables -X:
> sendto(3, {{len=20, type=NFNL_SUBSYS_NFTABLES<<8|NFT_MSG_GETCHAIN, flags=NLM_F_REQUEST|NLM_F_DUMP, seq=0, pid=0}, {nfgen_family=AF_INET, version=NFNETLINK_V0, res_id=htons(0)}, 20, 0, {sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, 12) = 20
> recvmsg(3, ... tons of structs ) = 16488
> brk(NULL) = 0x56131b143000
> brk(0x56131b164000) = 0x56131b164000
>
> ... skipped ...
>
> sendmsg(3, tons of data ... = 12000040
> select(4, [3], NULL, NULL, {tv_sec=0, tv_usec=0}) = 1 (in [3], left {tv_sec=0, tv_usec=0})
> recvmsg(3, {msg_namelen=12}, 0) = -1 ENOBUFS (No buffer space available)
Receiving ENOBUFS when calling recvmsg() typically happens if a single netlink message exceeds the 32k max buffer size supported by kernel. I wonder why this doesn't happen for me though.
Could you perhaps try with a smaller number of chains and rules? The difference should still be noticeable and for use in a test script, a delay of over a minute is probably too large, anyway.
Thanks, Phil
Hello.
OK. It looks like
for ((i = 0; i < 300000; ++i)); do
    printf ":chain_%06x - [0:0]\n" $i
done
is not an issue in itself, even with a number that large.
However, in combination with:
for ((i = 0; i < 1000; ++i)); do
    printf -- "-A INPUT -j chain_%06x\n" $i
    printf -- "-A INPUT -j chain_%06x\n" $i
done
it returns: iptables: No buffer space available.
With smaller numbers it starts to return:
100 chains - iptables v1.8.2 (nf_tables): CHAIN_USER_DEL failed (Device or resource busy): chain chain_000000
10 chains - iptables v1.8.2 (nf_tables): CHAIN_USER_DEL failed (Device or resource busy): chain chain_000000
kernel-4.18.0-69.el8.x86_64
iptables-1.8.2-9.el8.x86_64
libnftnl-1.1.1-4.el8.x86_64
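The two loops above can be folded into one parameterised generator, which may make it easier to bisect the chain and jump counts. This is only a minimal sketch: the function name gen_ruleset and its two arguments are hypothetical and not part of the original report, and the block only prints ruleset text, it does not load anything into the kernel.

```shell
#!/bin/bash
# Hypothetical helper (not from the original report): print an
# iptables-restore ruleset with NCHAINS user-defined chains and two
# jump rules from INPUT for each of the first NJUMPS chains.
gen_ruleset() {
    local nchains=$1 njumps=$2 i
    echo "*filter"
    for ((i = 0; i < nchains; ++i)); do
        printf ":chain_%06x - [0:0]\n" "$i"
    done
    for ((i = 0; i < njumps; ++i)); do
        printf -- "-A INPUT -j chain_%06x\n" "$i"
        printf -- "-A INPUT -j chain_%06x\n" "$i"
    done
    echo COMMIT
}

# Print a tiny ruleset so the format is visible.
gen_ruleset 2 1
```

Run as root, `iptables-nft-restore <(gen_ruleset 300000 1000)` followed by `iptables-nft -X` would correspond to the failing combination described above.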
I managed to reproduce the issue. Turned out I missed the fact that it happens only if one doesn't call 'iptables -F' before calling 'iptables -X'.
Sent a patch upstream, mostly to ask for advice on how to properly fix it:
https://marc.info/?l=netfilter-devel&m=156208033321053&w=2
Fix sent upstream:
https://marc.info/?l=netfilter-devel&m=156209061024148&w=2
Upstream commit to backport:
commit d3e39e9c457f452540359e42fb58d64a28fe3e18 (origin/master, origin/HEAD)
Author: Phil Sutter <phil>
Date: Tue Jul 2 20:30:49 2019 +0200
nft: Set socket receive buffer
When trying to delete user-defined chains in a large ruleset,
iptables-nft aborts with "No buffer space available". This can be
reproduced using the following script:
| #! /bin/bash
| iptables-nft-restore <(
|
| echo "*filter"
| for i in $(seq 0 200000);do
| printf ":chain_%06x - [0:0]\n" $i
| done
| for i in $(seq 0 200000);do
| printf -- "-A INPUT -j chain_%06x\n" $i
| printf -- "-A INPUT -j chain_%06x\n" $i
| done
| echo COMMIT
|
| )
| iptables-nft -X
The problem seems to be the sheer amount of netlink error messages sent
back to user space (one EBUSY for each chain). To solve this, set
receive buffer size depending on number of commands sent to kernel.
Suggested-by: Pablo Neira Ayuso <pablo>
Signed-off-by: Phil Sutter <phil>
Signed-off-by: Pablo Neira Ayuso <pablo>
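The commit message attributes the failure to one EBUSY error message per chain. A rough back-of-the-envelope estimate of that reply volume can be sketched as follows. This is not the formula used by the upstream patch: the helper name estimate_err_bytes is hypothetical, and the 36-byte default (a 16-byte nlmsghdr plus a 20-byte nlmsgerr) is an assumed lower bound, since real error replies may carry more data.

```shell
#!/bin/bash
# Hypothetical helper: rough lower bound on the bytes of netlink error
# replies for NCMDS failed commands. The per-message size defaults to
# 36 bytes (16-byte nlmsghdr + 20-byte nlmsgerr); this is an
# assumption, real replies can be larger.
estimate_err_bytes() {
    local ncmds=$1 err_bytes=${2:-36}
    echo $((ncmds * err_bytes))
}

# 200001 chains, as created by `seq 0 200000` in the reproducer.
need=$(estimate_err_bytes 200001)
echo "estimated error reply volume: ${need} bytes"

# Compare with the system default socket receive buffer, if readable.
if [ -r /proc/sys/net/core/rmem_default ]; then
    echo "net.core.rmem_default: $(cat /proc/sys/net/core/rmem_default) bytes"
fi
```

Even under this conservative assumption the estimate runs to several megabytes, which is consistent with the commit's decision to scale the receive buffer with the number of commands sent.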
Tomas,
Please consider providing qa_ack+ here. We're a bit late with RHEL8.1 but since it is a bug fix I think it's worth trying.
Cheers, Phil
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.
For information on the advisory, and where to find the updated files, follow the link below.
If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHEA-2019:3573
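On systems that do not yet have the fixed package, the observation earlier in this thread (the failure appears only when 'iptables -X' is called without a prior 'iptables -F') suggests flushing first. A minimal sketch, assuming the default filter table is the one affected; the IPT variable and the function name flush_then_delete are hypothetical and exist only so the sequence can be dry-run.

```shell
#!/bin/bash
# IPT is a hypothetical override for dry runs (e.g. IPT=echo);
# it defaults to the iptables-nft binary.
IPT=${IPT:-iptables-nft}

# Flush all rules first so that no user-defined chain is still
# referenced, then delete the user-defined chains.
flush_then_delete() {
    "$IPT" -F && "$IPT" -X
}

# Dry run: print the two invocations instead of touching the ruleset.
IPT=echo flush_then_delete
```

Calling `flush_then_delete` as root without the `IPT=echo` prefix would run the real commands. Note that flushing removes every rule in the table, so this is only suitable where that is acceptable.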