Bug 1861527
| Summary: | Excessive memory and CPU usage on a router with IPv6 BGP feed |
|---|---|
| Product: | Red Hat Enterprise Linux 8 |
| Component: | NetworkManager |
| Version: | 8.2 |
| Hardware: | Unspecified |
| OS: | Unspecified |
| Status: | CLOSED ERRATA |
| Severity: | medium |
| Priority: | high |
| Reporter: | Tomasz Kepczynski <tomek> |
| Assignee: | Beniamino Galvani <bgalvani> |
| QA Contact: | David Jaša <djasa> |
| CC: | acardace, bgalvani, ferferna, fge, fpokryvk, lrintel, rkhan, sukulkar, support, thaller, till, vbenes, wenliang |
| Target Milestone: | rc |
| Target Release: | 8.6 |
| Keywords: | Triaged |
| Flags: | pm-rhel: mirror+ |
| Doc Type: | No Doc Update |
| Story Points: | --- |
| Cloned To: | 2063175 (view as bug list) |
| Bug Blocks: | 2063175 |
| Type: | Bug |
| Regression: | --- |
| Last Closed: | 2022-05-10 14:54:06 UTC |
Description — Tomasz Kepczynski, 2020-07-28 20:56:10 UTC
This is a known problem (and of course needs fixing).
> For me - it CLEARLY disqualifies NetworkManager as ONLY solution to manage interfaces on a system. Is there any alternative beyond deprecated network-scripts which cannot handle some of the tunnel types (gretap) I need?
Right. There are certain use-cases where NetworkManager is badly suited today. The solution for that will be to fix NetworkManager for those use-cases.
The workaround until then is indeed network-scripts (or a script of your choice to configure your interfaces).
Sorry if that is disappointing and thank you for the elaborate report.
It's a pity these limitations are not documented. I've put some effort into transitioning to NetworkManager only to learn I would have better spent that time elsewhere. Would it be possible to document which setups are better avoided with NetworkManager (e.g. in the known-issues section of the RHEL release notes)?

Yes, that should be better documented. Release notes don't seem like the right place, as those usually describe changes, and nothing changed here. What is your BGP setup? Are you using quagga? Maybe this should be documented alongside the BGP documentation. Did you consult a particular document where you would have expected to be notified about this problem? In practice, the most relevant use case where NetworkManager is unsuited is a system with a large number of IP routes, IP addresses, or other netlink objects. What exactly "large" means is unclear, but at some point the performance overhead becomes a problem. Of course, that is a problem on a BGP router; BGP is probably the most prominent case with this problem.

Well, I basically didn't know until I hit it. A few Google searches didn't reveal anything beyond a cryptic "unsuitable for some use cases" at first. I think I only hit one result that specifically mentioned a BGP router (and I wasn't even sure how relevant it still was), and only after I realized it might be BGP triggering the issue and added that to the search. I am using bird, but I guess quagga/frr would trigger similar issues. I think I only noticed it when I added a SECOND tunnel. The first one was providing a limited number of routes (below 30000) and I don't recall noticing any trouble. When I added the second tunnel and the total number of routes jumped to just over 90000, it clearly became a problem.

```
boholt:~> ip -6 route | wc -l
90769
```
What do those routes look like? Do they have a special "protocol" value that can be used to distinguish them from regular routes?
If so, we can probably add a BPF filter to the netlink socket so that they are not passed back to user space.
I would be cautious with EXCLUDING routes this way. Bird uses 'proto bird', which is visible in the listing:
```
boholt:~> ip -6 route show root fd00::/7
fd00:114:514::/48 proto bird src fddd:fdef:2ea1::1 metric 32 pref medium
	nexthop via fe80::1588 dev dn42_chrismoos weight 1
	nexthop via fe80::ade0 dev dn42_kioubit weight 1
fd00:191e:1470::/48 proto bird src fddd:fdef:2ea1::1 metric 32 pref medium
	nexthop via fe80::1588 dev dn42_chrismoos weight 1
	nexthop via fe80::ade0 dev dn42_kioubit weight 1
fd00:1926:817::/48 via fe80::ade0 dev dn42_kioubit proto bird src fddd:fdef:2ea1::1 metric 32 pref medium
fd00:1953:615::/48 proto bird src fddd:fdef:2ea1::1 metric 32 pref medium
	nexthop via fe80::1588 dev dn42_chrismoos weight 1
	nexthop via fe80::ade0 dev dn42_kioubit weight 1
[--cut--]
```
but you'll have to include and maintain at least a few more exclusions, judging from the /etc/iproute2/rt_protos file (I can see at least zebra and gated causing similar problems). Allowing only the protocols NetworkManager needs to see (does it need any?) would be a better solution in my opinion.

I am also not sure how ECMP routes are presented to the application and whether they carry the protocol indication (it is not visible next to them).

Currently I am on a mix of network-scripts and systemd-networkd, but that also comes with its own challenges (where is PPPoE support in systemd-networkd?). Anyway, it performs much better than NetworkManager.
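Before committing to an allow-list or exclusion list, it helps to see which protocols actually dominate a routing table. Here is a small sketch (a hypothetical helper, parsing the plain `ip -6 route show` text rather than netlink) that tallies routes per protocol; note that ECMP `nexthop` continuation lines carry no `proto` field of their own:

```python
import re
from collections import Counter

def count_by_proto(route_listing: str) -> Counter:
    """Tally `ip -6 route show` output lines by routing protocol.

    ECMP `nexthop` continuation lines have no `proto` field and are
    counted under "(none)".
    """
    counts = Counter()
    for line in route_listing.splitlines():
        if not line.strip():
            continue
        m = re.search(r"\bproto (\S+)", line)
        counts[m.group(1) if m else "(none)"] += 1
    return counts

# Example using the listing quoted above:
sample = """\
fd00:114:514::/48 proto bird src fddd:fdef:2ea1::1 metric 32 pref medium
\tnexthop via fe80::1588 dev dn42_chrismoos weight 1
\tnexthop via fe80::ade0 dev dn42_kioubit weight 1
fd00:1926:817::/48 via fe80::ade0 dev dn42_kioubit proto bird src fddd:fdef:2ea1::1 metric 32 pref medium
"""
print(count_by_proto(sample))  # Counter({'bird': 2, '(none)': 2})
```

On a real BGP router, a tally like this would show tens of thousands of `bird` entries against a handful of kernel/static/RA routes, which is what makes protocol-based filtering attractive.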
(In reply to Tomasz Kepczynski from comment #9)
> but you'll have to include and maintain at least a few more exclusions
> judging from /etc/iproute2/rt_protos file (I can see at least zebra and
> gated causing similar problem). Allowing only the protocols Network Manager
> needs (does it?) to see would be a better solution in my opinion.

Routing protocol values are defined in /usr/include/linux/rtnetlink.h:

```
/* rtm_protocol */
#define RTPROT_UNSPEC     0
#define RTPROT_REDIRECT   1   /* Route installed by ICMP redirects; not used by current IPv4 */
#define RTPROT_KERNEL     2   /* Route installed by kernel */
#define RTPROT_BOOT       3   /* Route installed during boot */
#define RTPROT_STATIC     4   /* Route installed by administrator */

/* Values of protocol >= RTPROT_STATIC are not interpreted by kernel; they
   are just passed from user and back as is. It will be used by hypothetical
   multiple routing daemons. Note that protocol values should be
   standardized in order to avoid conflicts. */

#define RTPROT_GATED      8   /* Apparently, GateD */
#define RTPROT_RA         9   /* RDISC/ND router advertisements */
#define RTPROT_MRT        10  /* Merit MRT */
#define RTPROT_ZEBRA      11  /* Zebra */
#define RTPROT_BIRD       12  /* BIRD */
#define RTPROT_DNROUTED   13  /* DECnet routing daemon */
#define RTPROT_XORP       14  /* XORP */
#define RTPROT_NTK        15  /* Netsukuku */
#define RTPROT_DHCP       16  /* DHCP client */
#define RTPROT_MROUTED    17  /* Multicast daemon */
#define RTPROT_KEEPALIVED 18  /* Keepalived daemon */
#define RTPROT_BABEL      42  /* Babel daemon */
#define RTPROT_OPENR      99  /* Open Routing (Open/R) Routes */
#define RTPROT_BGP        186 /* BGP Routes */
#define RTPROT_ISIS       187 /* ISIS Routes */
#define RTPROT_OSPF       188 /* OSPF Routes */
#define RTPROT_RIP        189 /* RIP Routes */
#define RTPROT_EIGRP      192 /* EIGRP Routes */
```

Of those, NM currently uses values <= 4, RTPROT_RA (9) and RTPROT_DHCP (16). I think it should ignore all the other protocol values.
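The allow-list described here (keep values <= RTPROT_STATIC plus RTPROT_RA and RTPROT_DHCP) can be sketched as a simple predicate. This is an illustration of the stated policy, not NetworkManager's actual code:

```python
# Kernel rtm_protocol values (from <linux/rtnetlink.h>)
RTPROT_UNSPEC, RTPROT_REDIRECT, RTPROT_KERNEL, RTPROT_BOOT, RTPROT_STATIC = range(5)
RTPROT_RA = 9
RTPROT_BIRD = 12
RTPROT_DHCP = 16
RTPROT_BGP = 186

def nm_keeps_route(rtm_protocol: int) -> bool:
    """True if a route with this protocol is of interest to NetworkManager.

    Mirrors the policy stated in the comment above: values up to
    RTPROT_STATIC (4) plus RTPROT_RA (9) and RTPROT_DHCP (16).
    Everything else (bird, zebra, BGP, OSPF, ...) can be filtered out
    before it ever reaches user space.
    """
    return rtm_protocol <= RTPROT_STATIC or rtm_protocol in (RTPROT_RA, RTPROT_DHCP)

print(nm_keeps_route(RTPROT_KERNEL))  # True
print(nm_keeps_route(RTPROT_DHCP))    # True
print(nm_keeps_route(RTPROT_BIRD))    # False: the 90k bird routes are dropped
print(nm_keeps_route(RTPROT_BGP))     # False
```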
> I am also not sure how ECMP routes are presented to the application and if
> they carry the protocol indication (which is not visible next to them).

From what I see, as long as the next hops are added with the right protocol, as in:

```
ip route append dev enp1s0 default via fe80::1240 proto bird
```

or the route itself has the protocol specified:

```
ip route add default proto bird \
    nexthop via fe80::1 dev enp1s0 \
    nexthop via fe80::2 dev enp1s0 \
    nexthop via fe80::3 dev enp1s0
```

then NM will ignore them, as each next hop is presented over netlink as an individual route with its own protocol.

At least zebra and bird can also install routes learned by the neighbor discovery protocol. I think they are installed with their respective protocol ids, not RTPROT_RA. I'm not sure what side effects (beyond not showing them in 'nmcli conn show' output) dropping those from reaching NetworkManager can have.

(In reply to Tomasz Kepczynski from comment #15)
> At least zebra and bird can also install routes learned by neighbor
> discovery protocol. I think they are installed with their respective
> protocol ids, not RTPROT_RA. I'm not sure what side effect (beyond not
> showing them in 'nmcli conn show' output) dropping those from reaching
> NetworkManager can have.

NM doesn't do anything with routes installed by other tools, so this shouldn't be a problem. I prepared a patch that adds a BPF filter dropping all undesired routes from the netlink socket:

https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/commit/b8507db5f48a23d555041f6d2a5763cc55d6eebd

I tested it by adding 2 million routes ("ip route add dev eth0 $addr/32 proto bird") and it seems to work well. Do you know how I can reproduce your BGP scenario more accurately? In the bug description you mention an IPv6 BGP feed; is that something that can be configured easily on any machine? (Sorry, I'm not familiar with BGP.)

I can share my bird config, but you need a peer and your own ASN (autonomous system number).
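The bulk-route reproducer described above boils down to feeding `ip` a batch file rather than forking one `ip route add` per route. Here is a sketch that generates such a file; the prefix, device name and protocol are placeholders to adjust for your setup:

```python
import ipaddress

def make_route_batch(n: int, dev: str = "eth0", proto: str = "bird",
                     base: str = "2001:db8::") -> str:
    """Build an `ip -batch` input that adds n /128 routes tagged with a
    foreign routing protocol, similar to the 2-million-route test above.
    Batch lines are `ip` subcommands without the leading `ip`.
    """
    start = int(ipaddress.IPv6Address(base))
    return "".join(
        f"route add {ipaddress.IPv6Address(start + i)}/128 dev {dev} proto {proto}\n"
        for i in range(n)
    )

# Usage (as root):
#   python3 gen.py > routes.batch && ip -6 -batch routes.batch
print(make_route_batch(2), end="")
```

Loading the routes in batch mode keeps the route-injection side fast, so any slowdown observed is attributable to the listener (NetworkManager) rather than to `ip` itself.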
I am taking three live BGP feeds over three tunnels; this requires some effort. And I think the important part is not necessarily the number of routes but the number of route updates, which trigger NetworkManager activity. I am pretty sure Red Hat and/or IBM take a full BGP feed from the Internet and can expose that feed to you; the question is how willing your IT department is to do that. If you can build the package against RHEL 8.5 and put it somewhere (maybe on https://copr.fedorainfracloud.org/?) I can try to test it.

I did a scratch build for RHEL 8.5 with the patch and put the RPMs here: http://people.redhat.com/~bgalvani/NM/rh1861527/ — you can download them and do a "rpm -Fvh *rpm" to upgrade.

I managed once to hit a similar condition, but at that time it was the kernel that took all the memory (on a 1.25 GB VM) during batch addition of the routes. Since I didn't manage to repeat it, let's put this bug to Verified:Tested.

I've enabled NetworkManager overnight on one of the affected systems and it got killed by the OOM killer a few times in less than 24 hours. These were the following packages from Alma 8.5:

- NetworkManager-libnm-1.32.10-4.el8.x86_64
- NetworkManager-1.32.10-4.el8.x86_64

I am upgrading them to:

- NetworkManager-1.32.10-4.1.rh1861527.el8_5.x86_64
- NetworkManager-libnm-1.32.10-4.1.rh1861527.el8_5.x86_64

and will let them run for a couple of days to see what happens. If they survive, I'll try to migrate the network-scripts and systemd-networkd configuration to them and see how it works.

I have it running for over 24 hours and it behaves reasonably well. I have nearly 140000 IPv6 routes from one Internet BGP feed and a few hundred BGP routes from the DN42 experimental network from a couple of peers. NetworkManager keeps its appetite for memory under control:

```
coen:~> ps -C NetworkManager u
USER  PID %CPU %MEM    VSZ   RSS TTY STAT START TIME COMMAND
root  743  0.0  1.1 382460 11516 ?   Ssl  gru25 0:16 /usr/sbin/Net
```

This is the same system where the distro-provided NetworkManager was killed a couple of times in less than 24 hrs.

Originally, the proposed solution was a BPF filter, but that had problems and got reverted ([1]). The solution is now to do the same filtering entirely in user space ([2]). According to my tests, the performance still seems good when doing it in user space. @Tomasz, it would be very interesting if you could retest. Sorry about the effort. Would you wish for a rhel-8.5 scratch build? There are also the copr builds at https://copr.fedorainfracloud.org/coprs/networkmanager/NetworkManager-main/builds/ .

[1] https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/commit/c37a21ea2906aaafec82b76f4ae5d1aae81ac057
[2] https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/commit/3416b00988cbaa53ce6edd81762ff3cdd7b1a1a5

I've updated to:

```
coen:~# rpm -qa NetworkManager\*
NetworkManager-libnm-1.35.5-29598.copr.26c43e4bcc.el8.x86_64
NetworkManager-1.35.5-29598.copr.26c43e4bcc.el8.x86_64
```

and will let it run for some time. 26 minutes after boot it looks reasonable:

```
coen:~# ps -C NetworkManager u
USER  PID %CPU %MEM    VSZ   RSS TTY STAT START TIME COMMAND
root  722  0.4  1.6 381836 16368 ?   Ssl  15:08 0:07 /usr/sbin/NetworkManager --no-daemon
coen:~# uptime
 15:34:51 up 26 min, 1 user, load average: 1,52, 1,15, 0,78
```

(In reply to Tomasz Kepczynski from comment #27)
> I've updated to:
>
> coen:~# rpm -qa NetworkManager\*
> NetworkManager-libnm-1.35.5-29598.copr.26c43e4bcc.el8.x86_64
> NetworkManager-1.35.5-29598.copr.26c43e4bcc.el8.x86_64

Thank you for your effort!! That's very useful. I have a question. As I am not familiar with running real BGP software, how busy is this system? You write there are 90k routes (which is not small, but not huge). But how many routes are added/removed per minute? Is `ip monitor route` busy (during normal operation, not during startup)?
Ultimately the question is, how much CPU time does NetworkManager "waste" in, e.g., one hour?

I've run:

```
ip -ts monitor route | tee routes.log
```

for 2 minutes and it resulted in 1293 entries in the log file. This is on the system with 1 IPv6 BGP feed from the Internet, 9 BGP feeds from https://dn42.net and 2 OSPFv3 neighbors. So far NetworkManager seems to behave very reasonably:

```
coen:~> ps -C NetworkManager u
USER  PID %CPU %MEM    VSZ  RSS TTY STAT START TIME COMMAND
root  722  0.3  0.8 382036 7940 ?   Ssl  15:08 1:14 /usr/sbin/NetworkManager --no-daemon
```

This is nothing compared to what I originally reported. I have to mention one thing: I haven't moved to NetworkManager completely; it seems to still be missing some WireGuard functionality. But this doesn't seem relevant to this case: as mentioned in comment #23, the distro-provided NetworkManager got killed a few times with the same configuration.

> for 2 minutes and it resulted in 1293 entries in the log file.
OK, that is rather busy (~10/sec). It means NetworkManager will constantly wake up and waste CPU time, which is a problem.
Thanks!!
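The ~10/sec figure follows directly from the capture (1293 entries over 2 minutes). A sketch that counts events in such a log and derives the rate; it assumes a short bracketed-timestamp line format, which is an illustration only — adjust the pattern to whatever your iproute2 version actually emits:

```python
import re

def count_route_events(log_text: str) -> int:
    """Count route messages in an `ip -ts monitor route` capture.

    Assumes (hypothetically) that each event line starts with a bracketed
    ISO-style timestamp such as `[2022-01-25T15:08:01.100000]`.
    """
    return sum(1 for line in log_text.splitlines()
               if re.match(r"\[\d{4}-\d{2}-\d{2}T", line))

sample = """\
[2022-01-25T15:08:01.100000] fd00:114:514::/48 via fe80::1588 dev dn42_chrismoos proto bird metric 32
[2022-01-25T15:08:01.400000] Deleted fd00:1926:817::/48 via fe80::ade0 dev dn42_kioubit proto bird metric 32
[2022-01-25T15:08:02.000000] fd00:1953:615::/48 via fe80::1588 dev dn42_chrismoos proto bird metric 32
"""
print(count_route_events(sample))              # 3 events in this toy log
print(f"{1293 / 120:.1f} updates/sec")         # the 2-minute capture above
```

Each of those updates is a netlink message that wakes any subscribed listener, which is why the update rate matters more than the absolute table size.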
Yes, and in the normal BGP case one would expect at least two BGP feeds, which would likely double that number. In my case it is a single feed because it is an upstream to my other site (and one of the few). Circling back to current NetworkManager usage:

```
coen:~> ps -C NetworkManager u
USER  PID %CPU %MEM    VSZ  RSS TTY STAT START TIME COMMAND
root  722  0.3  0.6 382384 6936 ?   Ssl  sty24 4:27 /usr/sbin/NetworkManager --no-daemon
```

and I am surprised to see RSS go down since the restart. But that's good. I don't think we need to wait for the 24 hrs to elapse; it looks good.

"TIME 4:27" (this is the total CPU time, right? 4 minutes). If I read this correctly, you had process 722 running for more than 16 hours at that point (which would be 0.4% of one CPU). Which is still a problem. Guess the only solution is an improved BPF filter. Thanks.

According to 'man ps', TIME is 'accumulated cpu time, user + system'. %CPU seems to be what you are after. I cannot say whether 0.3/0.4% is too much or not.

We can observe this problem also with IPv4 entries if there are a lot of them.

The IPv6 test is failing with 2000000 routes: it takes 65 minutes. With 400000 it is a few seconds; with 500000 it is nearly a minute. Also, NM seems to consume 100% of one CPU core, while `ip -b` uses only 10-20% of a core (on a 4-core machine). So I mark this as FailedQA, even though the IPv4 version works fine. I have lowered the IPv6 limit to 500000, which might be a good indicator of the fix (if it takes a few seconds it is fine; if around a minute it is too slow).

(In reply to Filip Pokryvka from comment #35)
> The ipv6 test is failing with 2000000 routes, it takes 65 minutes. With
> 400000 it is few seconds, 500000 is nearly a minute. Also, NM seems to
> consume 100% of one CPU core, while `ip -b` only 10-20% CPU core (on 4 core
> machine). So, I mark this as failedQA, even though ipv4 version works fine.
> I have lowered ipv6 limit to 500000, which might be good indicator of fix
> (if it takes few seconds it is fine, if around minute it is too slow).

Verifying: IPv4 seems fixed and performance has improved; for 400000 IPv6 routes it is bearable (performs like 2000000 IPv4 routes). Cloning this to RHEL 8.7 to investigate the IPv6 part and possibly improve it to handle 2000000 IPv6 routes. Adding routes is still notably faster with NetworkManager stopped; that might be a race between the `ip -b` and NetworkManager processes for some resource (the same kernel memory or interface), and NetworkManager uses even more CPU time than the ip process, which is interesting.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (NetworkManager bug fix and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2022:1985