Bug 1211133
Summary: | high cpu use with many IPv6 cloned routes | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux 7 | Reporter: | Rik van Riel <riel> |
Component: | NetworkManager | Assignee: | Thomas Haller <thaller> |
Status: | CLOSED ERRATA | QA Contact: | Desktop QE <desktop-qa-list> |
Severity: | high | Docs Contact: | |
Priority: | high | ||
Version: | 7.1 | CC: | dcbw, jklimes, thaller, vbenes |
Target Milestone: | rc | ||
Target Release: | --- | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | | Doc Type: | Bug Fix |
Doc Text: | | Story Points: | --- |
Clone Of: | | Environment: | |
Last Closed: | 2015-11-19 11:01:34 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | --- |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: |
Description
Rik van Riel
2015-04-13 06:25:47 UTC
Rik pointed out on IRC that his greps are wrong because 'cache' is on a separate line. NM is not (and should not be) duplicating the cached routes as static routes, as the original summary implied.

Possibly we could have src/platform/nm-linux-platform.c::event_notification() return early when an RTM_NEWROUTE is seen for an RTM_F_CLONED route, and just skip any processing of it.

I can reduce the NM CPU usage in response to this ping6-bomb script:

```bash
#!/bin/bash
for i in `seq 1 4000`; do
    ping6 -c 4 2001:470:20::${i} &
done
```

from around 80% down to 6% with this change to src/platform/nm-linux-platform.c::event_notification():

```diff
         debug ("netlink event (type %d)", event);
     }
 
+    if (   event == RTM_NEWROUTE
+        && type == OBJECT_TYPE_IP6_ROUTE
+        && rtnl_route_get_flags ((struct rtnl_route*) object) & RTM_F_CLONED)
+        return NL_OK;
+
     cache = choose_cache_by_type (platform, type);
     cached_object = nm_nl_cache_search (cache, object);
```

This at least keeps the cloned routes out of the NM cache. It may not be the correct solution here, but since NM shouldn't really care about cloned routes anyway, possibly this is OK? Thomas?

Scratch build with that patch for testing: https://brewweb.devel.redhat.com/taskinfo?taskID=8986680

CPU use spikes appear to coincide with ipv6 route cache garbage collection, and subsequent reinsertion of the route cache entries.

It makes me wonder if this should be fixed in the kernel, instead of in userspace.

Are there any applications at all that want NETLINK_ROUTE to mean "notify me every time the ipv6 route cache table is changed", instead of the ipv4 meaning of "notify me every time a static network route changes"?

Should ipv6 behaviour be changed to match ipv4 behaviour?

(In reply to Rik van Riel from comment #5)
> CPU use spikes appear to coincide with ipv6 route cache garbage collection,
> and subsequent reinsertion of the route cache entries.
>
> It makes me wonder if this should be fixed in the kernel, instead of in
> userspace.
>
> Are there any applications at all that want NETLINK_ROUTE to mean "notify me
> every time the ipv6 route cache table is changed", instead of the ipv4
> meaning of "notify me every time a static network route changes"?
>
> Should ipv6 behaviour be changed to match ipv4 behaviour?

Currently, NM uses rtnl_route_alloc_cache(), which explicitly requests a dump including RTM_F_CLONED routes, so we see them there. We also enable netlink events via

```c
nle = nl_socket_add_memberships (priv->nlh_event,
                                 RTNLGRP_LINK,
                                 RTNLGRP_IPV4_IFADDR, RTNLGRP_IPV6_IFADDR,
                                 RTNLGRP_IPV4_ROUTE, RTNLGRP_IPV6_ROUTE,
                                 0);
```

and there we also get the RTM_F_CLONED routes. While this could all be improved, I think the upstream branch that completely refactors NMPlatform is the right fix. We should merge https://bugzilla.gnome.org/show_bug.cgi?id=747981 and backport it to nm-1-0/rhel-7.

Fixed upstream as: http://cgit.freedesktop.org/NetworkManager/NetworkManager/commit/?id=35dcd8ac33d631c54532aa3bd0b3d2e026ec6407

The new NetworkManager seems to have much lower CPU use on my "problem system".

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2015-2315.html
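The early-return idea from the scratch patch is to ignore RTM_NEWROUTE notifications for IPv6 routes whose flags include RTM_F_CLONED. The sketch below illustrates that idea outside libnl, using the raw `<linux/rtnetlink.h>` structures instead of `rtnl_route_get_flags()`. It is not NetworkManager code and not the fix that was eventually merged (that was the NMPlatform refactor). The helpers `is_cloned_ipv6_route()`, `count_uncloned_routes()` and `append_route()` are hypothetical names used only for illustration. The code walks an in-memory buffer of synthetic messages rather than a live socket, so it can be exercised without a ping6 bomb.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>
#include <sys/socket.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>

/* Hypothetical helper: true if nlh is an RTM_NEWROUTE for an IPv6 route
 * whose rtm_flags include RTM_F_CLONED (i.e. a route-cache entry). */
bool is_cloned_ipv6_route(const struct nlmsghdr *nlh)
{
    const struct rtmsg *rtm;

    if (nlh->nlmsg_type != RTM_NEWROUTE)
        return false;
    if (nlh->nlmsg_len < NLMSG_LENGTH(sizeof(struct rtmsg)))
        return false; /* too short to carry a struct rtmsg */
    rtm = NLMSG_DATA(nlh);
    return rtm->rtm_family == AF_INET6 && (rtm->rtm_flags & RTM_F_CLONED) != 0;
}

/* Hypothetical helper: walk a buffer of netlink messages (as recv() would
 * return them) and count the route messages that would still be processed
 * once cloned IPv6 routes are skipped early. */
int count_uncloned_routes(const void *buf, size_t len)
{
    const struct nlmsghdr *nlh;
    int remaining = (int) len;
    int count = 0;

    for (nlh = buf; NLMSG_OK(nlh, remaining); nlh = NLMSG_NEXT(nlh, remaining)) {
        if (is_cloned_ipv6_route(nlh))
            continue; /* the equivalent of "return NL_OK" in the patch */
        if (nlh->nlmsg_type == RTM_NEWROUTE || nlh->nlmsg_type == RTM_DELROUTE)
            count++;
    }
    return count;
}

/* Hypothetical test helper: append one minimal route message (header plus
 * struct rtmsg, no attributes) at offset off; returns the new offset.
 * buf must be suitably aligned for struct nlmsghdr. */
size_t append_route(char *buf, size_t cap, size_t off, unsigned short type,
                    unsigned char family, unsigned int flags)
{
    struct nlmsghdr *nlh = (struct nlmsghdr *) (buf + off);
    struct rtmsg *rtm;

    assert(off + NLMSG_SPACE(sizeof(struct rtmsg)) <= cap);
    memset(nlh, 0, NLMSG_SPACE(sizeof(struct rtmsg)));
    nlh->nlmsg_len = NLMSG_LENGTH(sizeof(struct rtmsg));
    nlh->nlmsg_type = type;
    rtm = NLMSG_DATA(nlh);
    rtm->rtm_family = family;
    rtm->rtm_flags = flags;
    return off + NLMSG_SPACE(sizeof(struct rtmsg));
}
```

In a real listener, the buffer would come from `recv()` on a `socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE)` bound with `nl_groups = RTMGRP_IPV6_ROUTE`. Filtering in userspace like this only reduces the processing cost. The kernel still delivers every cache notification, which is why Rik raised the question of whether IPv6 behaviour should change in the kernel.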