Bug 1752765

Summary: conntrack tool crashes when deleting an entry by CIDR
Product: Red Hat Enterprise Linux 8
Reporter: yiche <yiche>
Component: kernel
Assignee: Phil Sutter <psutter>
kernel sub component: Netfilter
QA Contact: yiche <yiche>
Status: CLOSED ERRATA
Severity: low
Priority: medium
CC: bperkins, jiji, lmiksik, network-qe, psutter
Version: 8.1
Target Milestone: rc
Target Release: 8.2
Hardware: Unspecified
OS: Unspecified
Fixed In Version: kernel-4.18.0-181.el8
Last Closed: 2020-04-28 16:26:58 UTC
Type: Bug

Comment 4 Phil Sutter 2019-09-18 16:15:36 UTC
Finally, I found something:

| # conntrack -I -s 1.1.1.1 -d 2.2.2.2 -p tcp --sport 10 --dport 20 --state LISTEN -u SEEN_REPLY -t 50
| conntrack v1.4.4 (conntrack-tools): 1 flow entries have been created.
| # conntrack -L
| tcp      6 47 src=0.0.0.0 dst=0.0.0.0 sport=0 dport=0 src=0.0.0.0 dst=0.0.0.0 sport=0 dport=0 mark=0 secctx=system_u:object_r:unlabeled_t:s0 use=1
| conntrack v1.4.4 (conntrack-tools): 1 flow entries have been shown.

As you can see, the conntrack entry is not correctly created in the first place.
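
A minimal sketch for spotting this symptom in a script, assuming the listing format shown above. The helper name has_zeroed_entry is made up for illustration and is not part of conntrack-tools; it only greps for a tuple whose addresses and ports are all zero.

```shell
# Hypothetical helper (not part of conntrack-tools): succeed if any line on
# stdin contains a tuple whose addresses and ports are all zero, as in the
# broken listing above.
has_zeroed_entry() {
    grep -Eq 'src=0\.0\.0\.0 dst=0\.0\.0\.0 sport=0 dport=0( |$)'
}

# Usage (assumption: run as root on the affected machine):
#   conntrack -L 2>/dev/null | has_zeroed_entry && echo "zeroed entry present"
```

The pattern is anchored on the full src/dst/sport/dport sequence so that an ordinary entry which merely contains a zero port elsewhere is not flagged.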

While comparing my VM (which didn't have the problem) with your testing env
(thanks for preparing it, BTW) I noticed that on your machine no conntrack
entries were created automatically. Turns out on my VM firewalld was active and
therefore NAT tables loaded. After stopping the firewall and unloading related
modules my VM shows the same symptoms.

Comment 5 yiche 2019-09-19 02:24:59 UTC
Thanks Phil,
I found that if only the nat table is loaded, the conntrack entry is also not created correctly, with these modules loaded:
nf_conntrack
nf_conntrack_netlink
nft_nat
nf_nat
nf_tables
nfnetlink
# conntrack -I -s 1.1.1.1 -d 2.2.2.2 -p tcp --sport 10 --dport 20 --state LISTEN -u SEEN_REPLY -t 500
conntrack v1.4.4 (conntrack-tools): 1 flow entries have been created.
# conntrack -L
tcp      6 498 src=0.0.0.0 dst=0.0.0.0 sport=0 dport=0 src=0.0.0.0 dst=0.0.0.0 sport=0 dport=0 mark=0 secctx=system_u:object_r:unlabeled_t:s0 use=1
conntrack v1.4.4 (conntrack-tools): 1 flow entries have been shown.


If nf_conntrack_ipv4 is loaded, the crash is not triggered, but the entry is not deleted and its TCP state becomes "NONE":
nf_conntrack_ipv4
nf_defrag_ipv4
nf_conntrack
nf_conntrack_netlink
nfnetlink
# modprobe nf_conntrack_ipv4
# conntrack -D -s 1.1.1.0/24 -d 2.2.2.0/24
conntrack v1.4.4 (conntrack-tools): 0 flow entries have been deleted.
# conntrack -L
tcp      6 399 NONE src=0.0.0.0 dst=0.0.0.0 sport=0 dport=0 src=0.0.0.0 dst=0.0.0.0 sport=0 dport=0 mark=0 secctx=system_u:object_r:unlabeled_t:s0 use=1
conntrack v1.4.4 (conntrack-tools): 1 flow entries have been shown.
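
The failed delete in the transcript above shows up only in the status line ("0 flow entries have been deleted"). A small sketch for extracting that count in a test script follows; flow_count is a hypothetical helper name, and the status line format is taken from the v1.4.4 output quoted in this report.

```shell
# Hypothetical helper: print the number from a conntrack status line such as
# "conntrack v1.4.4 (conntrack-tools): 0 flow entries have been deleted."
flow_count() {
    sed -n 's/^conntrack v[^:]*: \([0-9][0-9]*\) flow entries have been .*$/\1/p'
}

# Usage (assumption: run as root; 2>&1 is used in case the status line is
# written to stderr rather than stdout):
#   n=$(conntrack -D -s 1.1.1.0/24 -d 2.2.2.0/24 2>&1 | flow_count)
#   [ "$n" -gt 0 ] || echo "nothing was deleted"
```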

If both nf_conntrack_ipv4 and nf_nat are loaded, creating and deleting the entry both succeed:
nf_conntrack_ipv4
nf_defrag_ipv4
nf_conntrack
nf_conntrack_netlink
nf_nat
nfnetlink

Hope this helps.

Comment 6 yiche 2019-09-19 02:30:25 UTC
I think that in the customer's environment both nf_conntrack_ipv4 and nf_nat are probably loaded,
so this issue is not as serious as first thought.

Comment 7 Phil Sutter 2019-09-19 11:08:31 UTC
Hi,

In fact, you only need nf_conntrack_ipv4.ko loaded. The problem is that when
creating the entry a lookup for given l3proto number is performed and without
the module loaded a generic entry is returned.

Upstream doesn't have the problem (anymore) because l3proto abstraction and related modules were removed in this commit:

commit a0ae2562c6c4b2721d9fddba63b7286c13517d9f
Author: Florian Westphal <fw>
Date:   Fri Jun 29 07:46:51 2018 +0200

    netfilter: conntrack: remove l3proto abstraction

    This unifies ipv4 and ipv6 protocol trackers and removes the l3proto
    abstraction.

    This gets rid of all l3proto indirect calls and the need to do
    a lookup on the function to call for l3 demux.

    It increases module size by only a small amount (12kbyte), so this reduces
    size because nf_conntrack.ko is useless without either nf_conntrack_ipv4
    or nf_conntrack_ipv6 module.

    before:
       text    data     bss     dec     hex filename
       7357    1088       0    8445    20fd nf_conntrack_ipv4.ko
       7405    1084       4    8493    212d nf_conntrack_ipv6.ko
      72614   13689     236   86539   1520b nf_conntrack.ko
     19K nf_conntrack_ipv4.ko
     19K nf_conntrack_ipv6.ko
    179K nf_conntrack.ko

    after:
       text    data     bss     dec     hex filename
      79277   13937     236   93450   16d0a nf_conntrack.ko
      191K nf_conntrack.ko

    Signed-off-by: Florian Westphal <fw>
    Signed-off-by: Pablo Neira Ayuso <pablo>

Comment 8 yiche 2019-11-18 03:20:05 UTC
This bug is covered by upstream test http://git.netfilter.org/conntrack-tools/tree/tests/conntrack/testsuite/01delete

Comment 12 Phil Sutter 2020-01-30 16:45:11 UTC
To avoid the problem described here, I wrote a small kernel patch to request loading an appropriate kernel module in case initial l3proto number lookup fails. Here's the commit message which provides more rationale:

If __nf_ct_l3proto_find() returns a pointer to
nf_conntrack_l3proto_generic the passed layer3 proto number is not
known. This may happen for actually well known values if the
corresponding kernel module has not been loaded, leading to unexpected
behaviour:

| # conntrack -I -s 1.1.1.1 -d 2.2.2.2 -p tcp --sport 10 --dport 20 --state LISTEN -u SEEN_REPLY -t 50
| conntrack v1.4.4 (conntrack-tools): 1 flow entries have been created.
| # conntrack -L
| tcp      6 47 src=0.0.0.0 dst=0.0.0.0 sport=0 dport=0 src=0.0.0.0 dst=0.0.0.0 sport=0 dport=0 mark=0 secctx=system_u:object_r:unlabeled_t:s0 use=1
| conntrack v1.4.4 (conntrack-tools): 1 flow entries have been shown.
| # conntrack -D -s 1.1.1.0/24
| conntrack v1.4.4 (conntrack-tools): 0 flow entries have been deleted.

So on one hand the created conntrack entry is not as expected, on the
other it can't be deleted anymore using selectors which would otherwise
correctly match it.

Since typical users of 'conntrack' tool will have an appropriate
netfilter setup in place and therefore the required kernel modules
loaded (nf_conntrack_ipv4.ko in this case), this is a corner-case.

Upstream merged the various l3proto modules into nf_conntrack.ko itself
in commit a0ae2562c6c4b ("netfilter: conntrack: remove l3proto
abstraction"), hence doesn't suffer from the problem anymore. This patch
is non-trivial to backport and likely to break kABI, though.

Given the circumstances, go with a RHEL-only fix to sanitize behaviour
until a valid reason to backport a0ae2562c6c4b comes up.
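
On kernels without this fix, the same effect can be approximated from userspace by loading the module before inserting entries. The following is only a sketch of that workaround, not the kernel patch itself. The function names and the MODULES_FILE and MODPROBE variables are hypothetical knobs introduced here for illustration and testing. It assumes nf_conntrack_ipv4 is built as a module and listed in /proc/modules once loaded.

```shell
# Sketch only: pre-load the l3proto module before inserting IPv4 entries.
# MODULES_FILE and MODPROBE are hypothetical overrides, not standard variables.
MODULES_FILE="${MODULES_FILE:-/proc/modules}"
MODPROBE="${MODPROBE:-modprobe}"

# Succeed if the module named $1 appears in the first column of $MODULES_FILE.
module_loaded() {
    awk -v m="$1" '$1 == m { found = 1 } END { exit !found }' "$MODULES_FILE"
}

# Load module $1 only if it is not already listed.
ensure_module() {
    module_loaded "$1" || "$MODPROBE" "$1"
}

# Usage (as root, on a kernel older than kernel-4.18.0-181.el8):
#   ensure_module nf_conntrack_ipv4 &&
#       conntrack -I -s 1.1.1.1 -d 2.2.2.2 -p tcp --sport 10 --dport 20 \
#           --state LISTEN -u SEEN_REPLY -t 50
```

Checking first and loading only on a miss mirrors the idea described in the commit message (request the module when the initial lookup fails), but here the check is on the module list rather than on the l3proto lookup result.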

Comment 13 Phil Sutter 2020-01-31 00:13:47 UTC
Patch submitted to rhkernel list, Message ID is 20200131000057.19973-1-psutter (archives are outdated currently).

Comment 14 yiche 2020-01-31 02:51:25 UTC
Provide qa_ack.

Comment 15 Herton R. Krzesinski 2020-02-21 17:59:07 UTC
Patch(es) available on kernel-4.18.0-181.el8

Comment 22 errata-xmlrpc 2020-04-28 16:26:58 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:1769