Created attachment 1842645 [details]
Logs and ovn-appctl out

Description of problem:
On a 4.10 nightly baremetal cluster (500 nodes), the ovnkube-node pod consumes a considerable amount of memory while running a cluster-density workload (30 pods per node), but it does not return that memory after the test pods are deleted. The ovn-controller container appears to hold on to the memory indefinitely until we restart it manually.

Version-Release number of selected component (if applicable):
OCP - 4.10.0-0.nightly-2021-10-21-105053

[kni@e16-h12-b02-fc640 ~]$ oc rsh -c ovn-controller ovnkube-node-v4prw
sh-4.4# rpm -qa | grep ovn
ovn21.09-central-21.09.0-25.el8fdp.x86_64
ovn21.09-vtep-21.09.0-25.el8fdp.x86_64
ovn21.09-21.09.0-25.el8fdp.x86_64
ovn21.09-host-21.09.0-25.el8fdp.x86_64

How reproducible:
Often reproducible on a baremetal cluster.

Steps to Reproduce:
1. Deploy a healthy cluster.
2. Run a pod creation workload (30 pods per node) and watch the memory grow during the workload.
3. Delete the test pods; ovnkube-node does not release the memory.

Actual results:
ovnkube-node does not release the memory until the pod is restarted.

Expected results:
The memory should be released gradually, as it used to be.

Additional info:
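For reference, one rough way to watch the pod's memory while reproducing this (sketch only; assumes `oc adm top` works on the cluster and uses the pod name from the output above):

# Per-container memory of the ovnkube-node pod, refreshed every 30 seconds
watch -n 30 'oc adm top pod ovnkube-node-v4prw -n openshift-ovn-kubernetes --containers'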
[root@worker417-r640 ~]# ovn-appctl -t ovn-controller lflow-cache/show-stats
Enabled: true
high-watermark : 46699
total          : 32557
cache-conj-id  : 0
cache-expr     : 23326
cache-matches  : 9231
trim count     : 2
Mem usage (KB) : 93930

[root@worker417-r640 ~]# ovn-appctl -t ovn-controller lflow-cache/flush
CACHE FLUSHED

[root@worker417-r640 ~]# ovn-appctl -t ovn-controller lflow-cache/show-stats
Enabled: true
high-watermark : 16546
total          : 16546
cache-conj-id  : 0
cache-expr     : 12052
cache-matches  : 4494
trim count     : 3
Mem usage (KB) : 46946

Maybe the cache didn't fall below the watermark enough and thus it didn't trigger the automatic trim?
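A quick way to check that ratio from the stats above (sketch only; the 50% default threshold is the one described in the next comment):

total=32557
high_watermark=46699
# Cache entries as a percentage of the high watermark
echo "scale=1; $total * 100 / $high_watermark" | bc    # ~69.7%, well above the 50% default, so no automatic trim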
There's at least one problem with the way ovn-controller trims memory when scaling down. That's due to the fact that one load balancer VIP generates 3 openflows per backend but only one logical flow. ovn-controller is configured by default to trim memory when the lflow cache goes down under 50% of the previous high water mark. With load balancer flows that means we stop trimming memory a bit too early. We can actually see in the logs that automatic trimming stops happening and that the ratio between lflow cache entries and high watermark is approximately 65%.

We can fix this by making ovn-controller perform an unconditional trim, just once, a fixed number of seconds after the lflow cache was last updated. This would allow the system to reclaim all possible memory when ovn-controller becomes idle. I sent a patch for that upstream:
http://patchwork.ozlabs.org/project/ovn/list/?series=273500&state=*

Nevertheless, I'd like to make sure we're not hitting other issues too. Murali, would it be possible to run another test as follows?

1. Use the same ovn-kubernetes image as when the bug was reported:
   quay.io/itssurya/dev-images:scale-fixes-PR-839-second-deadlock
2. Make sure all ovnkube-node and ovnkube-master pods have been restarted and are using the new image.
3. Before running the test workload, choose one node, find its ovnkube-node pod and delete it, e.g.:
   oc delete pod ovnkube-node-xxx
   # This will recreate a pod, ovnkube-node-yyy, but we know for sure
   # ovn-controller started "clean" there.
4. Raise the memory trimming percentage:
   oc exec ovnkube-node-yyy -c ovn-controller -- ovs-vsctl set open . external_ids:ovn-trim-wmark-perc-lflow-cache=70
5. Run the test workload.
6. Clean up the test resources, wait a bit (30 seconds should be enough), then check memory usage of ovnkube-node-yyy.

Thanks,
Dumitru
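Independent of the steps above, a minimal polling sketch that could be run inside the ovn-controller container to watch RSS and cache size over time; the pidfile path and stats command are the same ones used elsewhere in this bug, the rest is an assumption about the environment:

controller_pid=$(cat /var/run/ovn/ovn-controller.pid)
while true; do
    # Process RSS as seen by the kernel
    grep VmRSS /proc/$controller_pid/status
    # Cache entries vs. high watermark; auto trim should fire once
    # "total" drops far enough below "high-watermark"
    ovn-appctl -t ovn-controller lflow-cache/show-stats | grep -E 'high-watermark|total|trim count|Mem usage'
    sleep 30
done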
Dumitru, I followed the steps (but using your image - quay.io/dceara0/dev-images:PR839-1118-01):

$ oc get pods -o wide | grep 139-
ovnkube-node-78lsf   4/4   Running   2 (5d3h ago)   5d3h   192.168.216.152   worker139-fc640   <none>   <none>

$ oc delete pod ovnkube-node-78lsf
pod "ovnkube-node-78lsf" deleted

$ oc get pods -o wide | grep 139-
ovnkube-node-rdl5d   4/4   Running   2 (40s ago)   44s   192.168.216.152   worker139-fc640   <none>   <none>

$ oc exec ovnkube-node-rdl5d -c ovn-controller -- ovs-vsctl set open . external_ids:ovn-trim-wmark-perc-lflow-cache=70

Memory stats - After restart
----------------------------
$ oc exec ovnkube-node-rdl5d -c ovn-controller -- ovn-appctl -t ovn-controller lflow-cache/show-stats
Enabled: true
high-watermark : 16114
total          : 16113
cache-conj-id  : 0
cache-expr     : 11191
cache-matches  : 4922
trim count     : 0
Mem usage (KB) : 47243

During Workload
---------------
$ oc exec ovnkube-node-rdl5d -c ovn-controller -- ovn-appctl -t ovn-controller lflow-cache/show-stats
Enabled: true
high-watermark : 77804
total          : 77801
cache-conj-id  : 0
cache-expr     : 26795
cache-matches  : 51006
trim count     : 0
Mem usage (KB) : 224347

After cleanup
-------------
$ oc exec ovnkube-node-rdl5d -c ovn-controller -- ovn-appctl -t ovn-controller lflow-cache/show-stats
Enabled: true
high-watermark : 18680
total          : 16113
cache-conj-id  : 0
cache-expr     : 11191
cache-matches  : 4922
trim count     : 4
Mem usage (KB) : 47243

I still noticed the same problem; see the Grafana snapshot of ovnkube-node pod memory utilization:
https://snapshot.raintank.io/dashboard/snapshot/p8Vm5vRdEtrjZg4SGLipDNSlK3XcS8eu?viewPanel=142&orgId=2
Hi Murali,

Thanks for the test! Looking at the lflow cache stats "after cleanup" I see:

high-watermark : 18680
total          : 16113

This means we're still above the 70% watermark percentage configured for auto cache trimming. I connected to the setup and forced an additional memory trim by increasing the watermark percentage:

$ ovs-vsctl set open . external_ids:ovn-trim-wmark-perc-lflow-cache=90

This immediately triggered a trim in ovn-controller and memory usage went down from 2.3g RSS to ~1.0g RSS.

With the patch I sent for review (http://patchwork.ozlabs.org/project/ovn/list/?series=273500&state=*) this would happen automatically whenever ovn-controller detects that no logical flows have been added or removed for at least 30 seconds. So, when that patch (or something similar) is accepted we shouldn't see this problem anymore.

Moving to POST.

Regards,
Dumitru
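As a side note (not part of the reply above): until the patch lands, the same forced-trim workaround could in principle be applied to every node. A rough sketch, assuming the standard openshift-ovn-kubernetes namespace and app=ovnkube-node label:

for pod in $(oc -n openshift-ovn-kubernetes get pods -l app=ovnkube-node -o name); do
    # Same command Dumitru ran above, raising the trim watermark percentage
    oc -n openshift-ovn-kubernetes exec "$pod" -c ovn-controller -- \
        ovs-vsctl set open . external_ids:ovn-trim-wmark-perc-lflow-cache=90
done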
*** Bug 1988565 has been marked as a duplicate of this bug. ***
@dceara iiuc this should require no CMS configuration to work, right? I noticed in your comment you did "$ ovs-vsctl set open . external_ids:ovn-trim-wmark-perc-lflow-cache=90", but then you go on to say that is automatic with your patch. So I'm thinking the only potential configuration here for ovn-k is the timer (in case we want something more/less often than 30 sec). Is that right?
(In reply to Tim Rozet from comment #7)
> @dceara iiuc this should require no CMS configuration to work
> right? I noticed in your comment you did "$ ovs-vsctl set open .
> external_ids:ovn-trim-wmark-perc-lflow-cache=90". But then you go onto say
> that is automatic with your patch. So I'm thinking the only potential
> configuration here for ovn-k is the timer (in case we want something
> more/less often than 30 sec). Is that right?

Correct, ovn-k shouldn't need to do more than tweaking the timer at this point. However, a smaller value might be detrimental, as memory trimming can be a costly operation.
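For illustration only: if the new timer ends up exposed as an Open_vSwitch external_ids option like the existing lflow-cache knobs, ovn-k could tweak it the same way. The option name below is an assumption, not confirmed by the patch:

# Assumed/hypothetical option name; the real knob is whatever the upstream
# patch defines.
oc exec ovnkube-node-yyy -c ovn-controller -- \
    ovs-vsctl set open . external_ids:ovn-trim-timeout-ms=60000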
Thanks @dceara. Is this easily backportable to earlier versions of OVN? I'm thinking of backporting it to OCP, which would need 21.09 and 20.12.
(In reply to Tim Rozet from comment #10)
> thanks @dceara. Is this easily backportable to earlier versions of OVN?
> Thinking of backporting it in OCP, which would need 21.09 and 20.12.

Replying just from the perspective of feasibility:
- 21.09: should be straightforward
- 20.12: we would need to first port the patches added for bug 1967882

However, I think we need a wider audience discussion to see if we should backport these features downstream-only instead of bumping OCP to a newer (and better) OVN version (cc @mmichels).
Tested with the following script:

systemctl start openvswitch
systemctl start ovn-northd
ovn-nbctl set-connection ptcp:6641
ovn-sbctl set-connection ptcp:6642
ovs-vsctl set open . external_ids:system-id=hv1 external_ids:ovn-remote=tcp:1.1.184.25:6642 external_ids:ovn-encap-type=geneve external_ids:ovn-encap-ip=1.1.184.25 external_ids:ovn-enable-lflow-cache=true external_ids:ovn-trim-wmark-perc-lflow-cache=10
systemctl restart ovn-controller

ovn-nbctl set NB_GLOBAL . options:northd_probe_interval=180000
ovn-nbctl set connection . inactivity_probe=180000
ovs-vsctl set open . external_ids:ovn-openflow-probe-interval=180
ovs-vsctl set open . external_ids:ovn-remote-probe-interval=180000
ovn-sbctl set connection . inactivity_probe=180000

ovn-nbctl ls-add public
ovn-nbctl lsp-add public ln_p1
ovn-nbctl lsp-set-addresses ln_p1 unknown
ovn-nbctl lsp-set-type ln_p1 localnet
ovn-nbctl lsp-set-options ln_p1 network_name=nattest

controller_pid=$(cat /var/run/ovn/ovn-controller.pid)
grep RSS /proc/$controller_pid/status > test_stat

i=1
for m in `seq 0 9`; do
    for n in `seq 1 99`; do
        ovn-nbctl lr-add r${i}
        ovn-nbctl lrp-add r${i} r${i}_public 00:de:ad:ff:$m:$n 172.16.$m.$n/16
        ovn-nbctl lrp-add r${i} r${i}_s${i} 00:de:ad:fe:$m:$n 173.$m.$n.1/24
        ovn-nbctl lr-nat-add r${i} dnat_and_snat 172.16.${m}.$((n+100)) 173.$m.$n.2
        ovn-nbctl lrp-set-gateway-chassis r${i}_public hv1
        # s1
        ovn-nbctl ls-add s${i}
        # s1 - r1
        ovn-nbctl lsp-add s${i} s${i}_r${i}
        ovn-nbctl lsp-set-type s${i}_r${i} router
        ovn-nbctl lsp-set-addresses s${i}_r${i} router
        ovn-nbctl lsp-set-options s${i}_r${i} router-port=r${i}_s${i}
        # s1 - vm1
        ovn-nbctl lsp-add s$i vm$i
        ovn-nbctl lsp-set-addresses vm$i "00:de:ad:01:$m:$n 173.$m.$n.2"
        ovs-vsctl add-port br-int vm$i -- set interface vm$i type=internal external_ids:iface-id=vm$i
        ovn-nbctl lrp-add r$i r${i}_public 40:44:00:00:$m:$n 172.16.$m.$n/16
        ovn-nbctl lsp-add public public_r${i}
        ovn-nbctl lsp-set-type public_r${i} router
        ovn-nbctl lsp-set-addresses public_r${i} router
        ovn-nbctl lsp-set-options public_r${i} router-port=r${i}_public
        let i++
        if [ $i -gt 300 ]; then break; fi
    done
    if [ $i -gt 300 ]; then break; fi
done

#add host vm1
ip netns add vm1
ovs-vsctl add-port br-int vm1 -- set interface vm1 type=internal
ip link set vm1 netns vm1
ip netns exec vm1 ip link set vm1 address 00:de:ad:01:00:01
ip netns exec vm1 ip addr add 173.0.1.2/24 dev vm1
ip netns exec vm1 ip link set vm1 up
ovs-vsctl set Interface vm1 external_ids:iface-id=vm1

ip netns add vm2
ovs-vsctl add-port br-int vm2 -- set interface vm2 type=internal
ip link set vm2 netns vm2
ip netns exec vm2 ip link set vm2 address 00:de:ad:01:00:02
ip netns exec vm2 ip addr add 173.0.2.2/24 dev vm2
ip netns exec vm2 ip link set vm2 up
ovs-vsctl set Interface vm2 external_ids:iface-id=vm2

#set provide network
ovs-vsctl add-br nat_test
ip link set nat_test up
ovs-vsctl set Open_vSwitch . external-ids:ovn-bridge-mappings=nattest:nat_test

ip netns add vm0
ovs-vsctl add-port nat_test vm0 -- set interface vm0 type=internal
ip link set vm0 netns vm0
ip netns exec vm0 ip link set vm0 address 00:00:00:00:00:01
ip netns exec vm0 ip addr add 172.16.0.100/16 dev vm0
ip netns exec vm0 ip link set vm0 up
ovs-vsctl set Interface vm0 external_ids:iface-id=vm0

ip netns exec vm1 ip route add default via 173.0.1.1
ip netns exec vm2 ip route add default via 173.0.2.1

ovn-nbctl --wait=hv sync
sleep 30
ip netns exec vm1 ping 172.16.0.102 -c 1
ip netns exec vm1 ping 172.16.0.100 -c 1

echo "after add all ls" >> test_stat
grep RSS /proc/$controller_pid/status >> test_stat
ovn-appctl -t ovn-controller lflow-cache/show-stats >> test_stat

i=100
for m in `seq 0 9`; do
    for n in `seq 1 99`; do
        ovn-nbctl lr-del r${i}
        ovs-vsctl del-port vm$i
        ovn-nbctl ls-del s${i}
        let i++
        if [ $i -gt 300 ]; then break; fi
    done
    if [ $i -gt 300 ]; then break; fi
done

ovn-nbctl --wait=hv sync
sleep 60
ip netns exec vm1 ping 172.16.0.102 -c 1
ip netns exec vm1 ping 172.16.0.100 -c 1
echo "after del ls" >> test_stat
grep RSS /proc/$controller_pid/status >> test_stat

result on ovn-2021-21.09.1-24:

VmRSS: 4628 kB
after add all ls
VmRSS: 986720 kB
Enabled: true
high-watermark : 201103
total          : 201103
cache-conj-id  : 0
cache-expr     : 195607
cache-matches  : 5496
trim count     : 0
Mem usage (KB) : 247754
after del ls
VmRSS: 986872 kB   <=== memory doesn't decrease
Enabled: true
high-watermark : 201103
total          : 27037
cache-conj-id  : 0
cache-expr     : 25159
cache-matches  : 1878
trim count     : 0   <== trim count is 0
Mem usage (KB) : 45089

result on ovn-2021-21.12.0-11:

VmRSS: 4676 kB
after add all ls
VmRSS: 1009264 kB
Enabled: true
high-watermark : 202005
total          : 202005
cache-expr     : 196507
cache-matches  : 5498
trim count     : 1
Mem usage (KB) : 229471
after del ls
VmRSS: 481368 kB   <=== memory decreased
Enabled: true
high-watermark : 27336
total          : 27336
cache-expr     : 25456
cache-matches  : 1880
trim count     : 2   <=== trim count is 2
Mem usage (KB) : 38447

Dumitru, does the result show that the feature takes effect?
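A quick sanity check on those numbers (my reading, assuming auto trim only fires once the cache drops below the configured percentage of the high watermark, as described earlier in this bug):

# 21.09 run: entries left after deletion vs. high watermark
echo "scale=1; 27037 * 100 / 201103" | bc    # ~13.4%, still above the configured 10%, so no auto trim
# 21.12 run: similar ratio, but the idle trim fires anyway and the watermark resets
echo "scale=1; 27336 * 100 / 202005" | bc    # ~13.5%, yet trim count reached 2 and RSS dropped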
(In reply to Jianlin Shi from comment #15)
> result on ovn-2021-21.09.1-24:
>
> VmRSS: 4628 kB
> after add all ls
> VmRSS: 986720 kB
> Enabled: true
> high-watermark : 201103
> total          : 201103
> cache-conj-id  : 0
> cache-expr     : 195607
> cache-matches  : 5496
> trim count     : 0
> Mem usage (KB) : 247754
> after del ls
> VmRSS: 986872 kB   <=== memory doesn't decrease
> Enabled: true
> high-watermark : 201103
> total          : 27037
> cache-conj-id  : 0
> cache-expr     : 25159
> cache-matches  : 1878
> trim count     : 0   <== trim count is 0
> Mem usage (KB) : 45089
>
> result on ovn-2021-21.12.0-11:
>
> VmRSS: 4676 kB
> after add all ls
> VmRSS: 1009264 kB
> Enabled: true
> high-watermark : 202005
> total          : 202005
> cache-expr     : 196507
> cache-matches  : 5498
> trim count     : 1
> Mem usage (KB) : 229471
> after del ls
> VmRSS: 481368 kB   <=== memory decreased
> Enabled: true
> high-watermark : 27336
> total          : 27336
> cache-expr     : 25456
> cache-matches  : 1880
> trim count     : 2   <=== trim count is 2
> Mem usage (KB) : 38447
>
> Dumitru, does the result show that the feature takes effect?

Looks good to me, thanks!
set VERIFIED per comment 15 and comment 16
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (ovn bug fix and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2022:0674
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days