Bug 2030476

Summary:	Kernel 4.18.0-348.2.1 secpath_cache memory leak involving strongswan tunnel
Product:	Red Hat Enterprise Linux 8	Reporter:	kegbeach <ryan>
Component:	kernel	Assignee:	Xin Long <lxin>
kernel sub component:	Networking	QA Contact:	Jianlin Shi <jishi>
Status:	CLOSED ERRATA	Docs Contact:
Severity:	urgent
Priority:	high	CC:	cye, hasingh, jiji, jishi, kzhang, lxin, mleitner, mtesar, nmurray, nyelle, pabeni, pasteur, prpatel, skamboj, sukulkar, xmu, yuma
Version:	8.5	Keywords:	Triaged, ZStream
Target Milestone:	rc	Flags:	pm-rhel: mirror+
Target Release:	---
Hardware:	x86_64
OS:	Linux
Whiteboard:
Fixed In Version:	kernel-4.18.0-358.el8	Doc Type:	If docs needed, set a value
Doc Text:		Story Points:	---
Clone Of:
Clones:	2047427		Environment:
Last Closed:	2022-05-10 15:09:30 UTC	Type:	Bug
Bug Blocks:	2047427

Description kegbeach 2021-12-08 21:42:00 UTC

Description of problem:
I have reason to believe that kernel 4.18.0-348.2.1 has an issue when running strongswan. secpath_cache keeps growing at a fast rate until the server runs out of memory. Stopping strongswan stops secpath_cache from growing. With the older kernels 4.18.0-348 or 4.18.0-305.7.1 the issue goes away. The system runs CentOS Stream 8. An older server running 4.18.0-348 with the most recent strongswan shows no issue, and the problematic system also runs without issue after being rebooted into 4.18.0-305.7.1.


Version-Release number of selected component (if applicable):
4.18.0-348.2.1 / strongswan 5.9.4-2.el8


How reproducible:
This happens after every reboot; within a couple of hours, available memory starts decreasing at a consistent rate until none is left.


Steps to Reproduce:
1. Set up a strongswan site-to-site tunnel
2. Initiate the connection and pass traffic
3. Monitor available memory and watch secpath_cache steadily increase using slabtop or cat /proc/slabinfo | grep secpath_cache
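
A minimal sketch of the monitoring in step 3, assuming a POSIX shell and awk. The helper name slab_active is hypothetical and not part of the original report; it only extracts the active-object column for one cache so the growth can be logged over time:

```shell
#!/bin/sh
# Hypothetical helper: print the <active_objs> value (column 2) for the slab
# cache named in $1, reading slabinfo-formatted text from stdin.
slab_active() {
    awk -v cache="$1" '$1 == cache { print $2 }'
}

# Example sampling loop, one reading per minute. /proc/slabinfo is typically
# readable only by root, so this part is left commented out:
#   while sleep 60; do
#       printf '%s %s\n' "$(date +%T)" \
#           "$(slab_active secpath_cache < /proc/slabinfo)"
#   done
```

A count that keeps rising while the tunnel carries traffic, and stops rising when the tunnel is stopped, matches the behavior described in this report.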


Actual results:
The system runs out of memory (after about 1.25 days with 4 GB of RAM).


Expected results:
Memory should stay consistent with the load of the system.


Additional info:
n/a

Comment 1 Xin Long 2021-12-09 19:48:46 UTC

(In reply to kegbeach from comment #0)
> Description of problem:
...
> 
> Steps to Reproduce:
> 1. setup strongswan site to site tunnel
> 2. initiate connection and pass traffic
> 3. keep an eye out on avail memory and watch secpath_cache steadily increase
> using slabtop or cat /proc/slabinfo | grep secpath_cache
As we don't really use strongswan on RHEL, but libreswan instead, can you please provide a reproducer, including how to install and configure strongswan on RHEL to reproduce this issue, if you want us to investigate it? Also, have you tested it on an upstream kernel?

Thanks.

Comment 2 kegbeach 2021-12-10 17:08:35 UTC

I have set up a libreswan site-to-site tunnel in place of strongswan, and after a few hours the system starts losing memory with exactly the same symptoms. Stopping libreswan (systemctl stop ipsec) stops secpath_cache from growing, and upon restarting the tunnel it begins to grow again. This tells me the problem is not strongswan itself but something in the kernel. If you have a testbed for libreswan, it will reproduce the issue, but if you would still like the configs I will send them over.

The only kernel I have tried that shows the issue is 4.18.0-348.2.1, so if there is a more recent kernel, please point me to it and I will re-test.

Thanks

Comment 4 Xin Long 2021-12-15 18:21:44 UTC

(In reply to kegbeach from comment #2)
> I have installed libreswan site to site tunnel in replace of strongswan and
> after a few hours the system starts losing memory with the exact same
> symptoms. Stopping libreswan (systemctl stop ipsec) halts secpath_cache from
> increasing and upon restarting the tunnel it begins to increase again. This
> is telling me the problem is not strongswan itself but something with the
> kernel. If you have a testbed for libreswan this will successfully reproduce
> the issue but if you would still like the configs I will send them over.
> 
> The only kernel I have tried that has the issue is 4.18.0-348.2.1 so if
> there is another more recent kernel can you point me in the direction and I
> will re-test.
> 
I can reproduce it with 'ip xfrm' commands now; it is indeed a kernel problem.

Thanks for reporting it.

Comment 6 Xin Long 2021-12-16 18:50:45 UTC

This leak was introduced by:

commit acc00ba5d8d48f8749572597b051b3e7ba9ab3ff
Author: Paolo Abeni <pabeni>
Date:   Mon Sep 13 12:32:20 2021 +0200

    net: re-initialize slow_gro flag at gro_list_prepare time

The leaked object was created in:

    [<000000004241fc10>] kmem_cache_alloc+0x156/0x390
    [<0000000053d8cf53>] secpath_dup+0x23/0x1d0
    [<00000000a5fa59b1>] secpath_set+0x9f/0x160
    [<00000000266babc4>] xfrm_input+0x29c/0x2850
    [<0000000080081871>] xfrm4_esp_rcv+0x9f/0x190
    [<000000004b63ecc5>] ip_protocol_deliver_rcu+0x5ae/0x7d0
    [<000000005accc408>] ip_local_deliver_finish+0x222/0x330
    [<0000000073afae7a>] ip_local_deliver+0x1a0/0x410
    [<000000003af25303>] ip_rcv+0xa7d/0x123d
    [<00000000b58cea8c>] __netif_receive_skb_core+0x2051/0x3330
    [<000000007186f64a>] netif_receive_skb_internal+0xed/0x340
    [<000000000c4ddbf8>] napi_gro_receive+0x27f/0x3c0


Before the skb arrives in xfrm_input(), skb->slow_gro is already 1. xfrm_input() then calls gro_cells_receive() to start GRO again. However, gro_list_prepare() resets slow_gro to 0, because the skb's sk, dst, active_extensions and nfct are all NULL. Later, when napi_skb_finish() calls napi_skb_free_stolen_head(), skb->sp is supposed to be freed in skb_ext_put(), but it is not, because slow_gro is 0.

I'm thinking of fixing this by also considering skb->sp when setting slow_gro in gro_list_prepare():

diff --git a/net/core/dev.c b/net/core/dev.c
index d3f3336d3edf..0c87487f93b2 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -5659,7 +5659,7 @@ static void gro_list_prepare(struct napi_struct *napi, struct sk_buff *skb)
        /* RHEL-only: out-of-tree drivers build vs prior release don't set
         * correctly the slow_gro flag, re-initialize it here
         */
-       skb->slow_gro = !!(skb->sk || skb->_skb_refdst ||
+       skb->slow_gro = !!(skb->sk || skb->_skb_refdst || skb->sp ||
 #ifdef CONFIG_SKB_EXTENSIONS
                           skb->active_extensions ||
 #endif

Thanks.

Comment 7 Paolo Abeni 2021-12-17 16:33:10 UTC

(In reply to Xin Long from comment #6)
> I'm thinking to fix this by also considering skb->sp when set slow_gro in
> gro_list_prepare():
> 
> diff --git a/net/core/dev.c b/net/core/dev.c
> index d3f3336d3edf..0c87487f93b2 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -5659,7 +5659,7 @@ static void gro_list_prepare(struct napi_struct *napi,
> struct sk_buff *skb)
>         /* RHEL-only: out-of-tree drivers build vs prior release don't set
>          * correctly the slow_gro flag, re-initialize it here
>          */
> -       skb->slow_gro = !!(skb->sk || skb->_skb_refdst ||
> +       skb->slow_gro = !!(skb->sk || skb->_skb_refdst || skb->sp ||
>  #ifdef CONFIG_SKB_EXTENSIONS
>                            skb->active_extensions ||
>  #endif

I double checked the relevant code paths, and I think the above fix is the correct one!

Thanks for catching it!

Comment 13 Jianlin Shi 2022-01-10 07:24:45 UTC

Tested with the following steps:

client:
systemctl stop NetworkManager
ip addr add 192.168.4.2/24 dev ens1f0
ip link set ens1f0 up

ip xfrm state add src 192.168.4.2 dst 192.168.4.1 spi 0x1001 proto esp enc aes 0x0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f  mode tunnel sel src 192.168.4.2 dst 192.168.4.1
ip xfrm state add src 192.168.4.1 dst 192.168.4.2 spi 0x1000 proto esp enc aes 0x0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f  mode tunnel sel src 192.168.4.1 dst 192.168.4.2
ip xfrm policy add dir out src 192.168.4.2 dst 192.168.4.1 tmpl src 192.168.4.2 dst 192.168.4.1 proto esp mode tunnel
ip xfrm policy add dir in src 192.168.4.1 dst 192.168.4.2 tmpl src 192.168.4.1 dst 192.168.4.2 proto esp mode tunnel level use

cat /proc/slabinfo | grep secpath_cache
netserver

server:

systemctl stop NetworkManager
ip addr add 192.168.4.1/24 dev ens1f0
ip link set ens1f0 up

ip xfrm state add src 192.168.4.1 dst 192.168.4.2 spi 0x1000 proto esp enc aes 0x0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f  mode tunnel sel src 192.168.4.1 dst 192.168.4.2
ip xfrm state add src 192.168.4.2 dst 192.168.4.1 spi 0x1001 proto esp enc aes 0x0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f  mode tunnel sel src 192.168.4.2 dst 192.168.4.1
ip xfrm policy add dir out src 192.168.4.1 dst 192.168.4.2 tmpl src 192.168.4.1 dst 192.168.4.2 proto esp mode tunnel
ip xfrm policy add dir in src 192.168.4.2 dst 192.168.4.1 tmpl src 192.168.4.2 dst 192.168.4.1 proto esp mode tunnel level use

ping 192.168.4.2 -c 1
netperf -H 192.168.4.2 -t TCP_STREAM -l 120


Reproduced on 4.18.0-348.2.1:

+ cat /proc/slabinfo
+ grep secpath_cache
secpath_cache          0      0    128   32    1 : tunables    0    0    0 : slabdata      0      0      0
+ netserver

After the server run:

[root@wsfd-advnetlab19 bz2030476]# cat /proc/slabinfo | grep secpath_cache
secpath_cache     6181120 6181120    128   32    1 : tunables    0    0    0 : slabdata 193160 193160      0

[root@wsfd-advnetlab19 bz2030476]# uname -a
Linux wsfd-advnetlab19.anl.lab.eng.bos.redhat.com 4.18.0-348.2.1.el8_5.x86_64 #1 SMP Mon Nov 8 13:30:15 EST 2021 x86_64 x86_64 x86_64 GNU/Linux

Verified on 4.18.0-358:

[root@wsfd-advnetlab19 bz2030476]# uname -a
Linux wsfd-advnetlab19.anl.lab.eng.bos.redhat.com 4.18.0-358.el8.x86_64 #1 SMP Tue Dec 28 11:15:35 EST 2021 x86_64 x86_64 x86_64 GNU/Linux

After the server run:

[root@wsfd-advnetlab19 bz2030476]# cat /proc/slabinfo | grep secpath_cache
secpath_cache       2400   2464    128   32    1 : tunables    0    0    0 : slabdata     77     77      0
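
To put the two slabinfo rows in perspective, here is a small sketch that converts a row into bytes held by active objects (active_objs multiplied by objsize, columns 2 and 4 of the slabinfo format). The helper name slab_bytes is hypothetical and was not part of the test run:

```shell
#!/bin/sh
# Hypothetical helper: for each slabinfo row on stdin, print the cache name
# and the bytes held by active objects (column 2 * column 4).
slab_bytes() {
    awk '{ printf "%s %d\n", $1, $2 * $4 }'
}

# The two rows recorded in this comment, before and after the fix.
printf '%s\n' \
  'secpath_cache     6181120 6181120    128   32    1 : tunables    0    0    0 : slabdata 193160 193160      0' \
  'secpath_cache       2400   2464    128   32    1 : tunables    0    0    0 : slabdata     77     77      0' \
  | slab_bytes
```

This prints 791183360 bytes (about 754 MiB) for the 4.18.0-348.2.1 run and 307200 bytes (300 KiB) for the 4.18.0-358 run.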

Comment 25 errata-xmlrpc 2022-05-10 15:09:30 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: kernel security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:1988
