Bug 2183967

Summary: "ip -b" gets OOM-killed when adding routes with NetworkManager active
Product: Red Hat Enterprise Linux 9
Reporter: David Jaša <djasa>
Component: iproute
Assignee: Andrea Claudi <aclaudi>
Status: CLOSED MIGRATED
QA Contact: Mingyu Shi <mshi>
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: 9.2
CC: jiji, network-qe
Target Milestone: rc
Keywords: MigratedToJIRA
Target Release: ---
Flags: pm-rhel: mirror+
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2023-05-24 09:53:28 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
Description Flags
bird_routes.py none

Description David Jaša 2023-04-03 09:32:34 UTC
Created attachment 1955430
bird_routes.py

Description of problem:
"ip -b" gets OOM-killed when adding routes with NetworkManager active.

Version-Release number of selected component (if applicable):
iproute-6.1.0-1.el9.x86_64

How reproducible:
frequently

Steps to Reproduce:
1. make sure that NetworkManager.service is active
2. prepare a routes file:
bird_routes.py testX6 6 $NUM > bird-routes
3. apply these commands:
ip -b bird-routes
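
For readers without the attachment, a batch file of similar shape can be approximated with a short generator. This is a minimal sketch and not the attached bird_routes.py: the function name generate_routes, the documentation prefix 2001:db8::/32, the /64 route size and the exact "route add ... dev ..." line format are all assumptions made for illustration.

```python
import ipaddress
from itertools import islice


def generate_routes(dev, num, base="2001:db8::/32", new_prefix=64):
    """Yield `num` batch lines, one route per line.

    Hypothetical stand-in for bird_routes.py; the real script may emit a
    different line format. Lines in an `ip -b` batch file omit the leading
    "ip" command name.
    """
    network = ipaddress.ip_network(base)
    # subnets() is lazy, so only the first `num` prefixes are ever built.
    for subnet in islice(network.subnets(new_prefix=new_prefix), num):
        yield f"route add {subnet} dev {dev}"


if __name__ == "__main__":
    # Print a small sample; redirect stdout to a file to feed `ip -b`.
    for line in generate_routes("testX6", 3):
        print(line)
```

Because the generator is lazy, producing a million lines costs little memory on the generating side, so any memory growth seen in the reproducer comes from the consumer of the file.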

Actual results:
on a machine with 4GB memory:
- NUM=100000 always seems to pass
- NUM between 400000 and 500000 sometimes works, sometimes OOMs
- NUM around one million or larger reliably triggers the OOM killer

OOM:
Mar 30 10:21:16 kernel: oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/user.slice/user-0.slice/session-3.scope,task=ip,pid=26861,uid=0
Mar 30 10:21:16 kernel: Out of memory: Killed process 26861 (ip) total-vm:26388012kB, anon-rss:3335200kB, file-rss:0kB, shmem-rss:0kB, UID:0 pgtables:51676kB oom_score_adj:0
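
To put the counters in the kernel message into perspective, the kB fields can be extracted and converted to GiB. This is only an illustrative sketch; the helper name parse_oom_kb and the regular expression are assumptions, and the syslog prefix is left out of the sample line.

```python
import re

# The "Out of memory" line from the log above, without the syslog prefix.
OOM_LINE = (
    "Out of memory: Killed process 26861 (ip) total-vm:26388012kB, "
    "anon-rss:3335200kB, file-rss:0kB, shmem-rss:0kB, UID:0 "
    "pgtables:51676kB oom_score_adj:0"
)


def parse_oom_kb(line):
    """Return every `name:<number>kB` field of the line as {name: int}."""
    return {name: int(value) for name, value in re.findall(r"([\w-]+):(\d+)kB", line)}


fields = parse_oom_kb(OOM_LINE)
anon_rss_gib = fields["anon-rss"] / 1024 ** 2
total_vm_gib = fields["total-vm"] / 1024 ** 2

if __name__ == "__main__":
    print(f"anon-rss: {anon_rss_gib:.2f} GiB, total-vm: {total_vm_gib:.2f} GiB")
```

The anonymous resident set of roughly 3.18 GiB means that ip alone held most of the 4GB machine's memory when it was killed.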

Expected results:
ip keeps its memory usage reasonable
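
One way to quantify "reasonable" is to record the peak resident set size of the command under test. The following is a rough sketch, assuming Linux (where ru_maxrss is reported in KiB); the helper name peak_child_rss_kib is hypothetical, and the value returned is the maximum over all children the calling process has waited for so far, not only the last one.

```python
import resource
import subprocess
import sys


def peak_child_rss_kib(cmd):
    """Run `cmd` to completion and return (exit code, peak child RSS in KiB).

    A negative exit code means the child was killed by a signal,
    e.g. -9 for SIGKILL, which is what the OOM killer sends.
    """
    proc = subprocess.run(cmd, check=False)
    usage = resource.getrusage(resource.RUSAGE_CHILDREN)
    return proc.returncode, usage.ru_maxrss


if __name__ == "__main__":
    # Harmless stand-in command. For the reproducer, one would pass
    # ["ip", "-b", "bird-routes"] instead and run as root.
    code, rss = peak_child_rss_kib([sys.executable, "-c", "pass"])
    print(f"exit={code} peak_rss={rss} KiB")
```

Running this once per NUM value would give a memory-versus-route-count curve instead of a pass/OOM result.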

Additional info:
* IPv4 shows similar behaviour, but a larger NUM is needed to make it visible
* the generating script is the one used in NetworkManager CI: https://gitlab.freedesktop.org/NetworkManager/NetworkManager-ci/-/blob/main/prepare/bird_routes.py
* we created this scenario as a verification of bug 1861527, to mimic the environment of BGP routers (where 2M routes per interface are nothing exceptional). Over time, the scenario started failing for unclear reasons, which turned out to be this bug

Comment 1 David Jaša 2023-04-03 14:02:01 UTC
A few more additional info bullets:
* NM itself behaves as expected. It gets flooded by the sheer number of new routes, but it keeps its resource usage under control
* when NM is disabled, the memory issues of ip do not occur. However, as we need to verify NM behaviour, we need this to work with NM running

Comment 2 Andrea Claudi 2023-05-24 09:53:28 UTC
This issue was migrated to JIRA. You can track it on https://issues.redhat.com/browse/RHEL-522