Bug 1221915

Summary:	IPv6 routing/neighbor table suspected memory leak
Product:	[Fedora] Fedora	Reporter:	Seb L. <D8F55524>
Component:	kernel	Assignee:	Neil Horman <nhorman>
Status:	CLOSED NOTABUG	QA Contact:	Fedora Extras Quality Assurance <extras-qa>
Severity:	medium	Docs Contact:
Priority:	unspecified
Version:	21	CC:	gansalmon, itamar, jonathan, kernel-maint, lnykryn, madhu.chinakonda, mchehab, nhorman, zbyszek
Target Milestone:	---
Target Release:	---
Hardware:	All
OS:	Linux
Whiteboard:
Fixed In Version:		Doc Type:	Bug Fix
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2015-05-15 17:05:06 UTC	Type:	Bug

Description Seb L. 2015-05-15 09:02:02 UTC

Description of problem:
The default IPv6 routing table max. size is insanely low: 4096 bytes (esp. if we compare it to the default IPv4 routing table max. size which is... 2 GiB).
This leads to network malfunctions very difficult to diagnose.

Version-Release number of selected component (if applicable):
Fedora 21/all kernels to date

How reproducible:
100% (insanely low default value in the Linux kernel)

Steps to Reproduce:
1. sysctl net.ipv6.route.max_size
This gives you the max. size (in bytes, not records!)
of the IPv6 routing table.
2. sysctl net.ipv4.route.max_size
This gives you the max. size of the IPv4 routing table.
3. Compare the values...
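
The three steps above can be run in one go. The following is a minimal sketch that reads the two tunables directly from /proc/sys; the PROC_SYS variable and the show_max_size helper are names made up for this example, not part of any tool.

```shell
#!/bin/sh
# Root of the sysctl tree; overridable so the helper can be pointed at a copy.
PROC_SYS="${PROC_SYS:-/proc/sys}"

# Print the route max_size tunable for one address family (ipv4 or ipv6).
show_max_size() {
    file="$PROC_SYS/net/$1/route/max_size"
    if [ -r "$file" ]; then
        printf '%s route max_size: %s\n' "$1" "$(cat "$file")"
    else
        printf '%s route max_size: unavailable\n' "$1"
    fi
}

show_max_size ipv6
show_max_size ipv4
```

The printed numbers are raw tunable values; the script makes no claim about their unit.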

Actual results:
The default IPv6 routing table max. size is 4096 bytes, that is, it can barely hold 27 routes (across all tables: main/local/custom, including the link-local records automatically added for every network interface).

This limit is extremely easy to reach and overflow (esp. on a machine with a few physical network interfaces -- which is not uncommon for a workstation or even more on a server, not to speak of virtual network interfaces brought by tunnels or virtual machines).

This excessively low default value leads to networking malfunctions quite difficult to diagnose: sudden loss of all IPv6 network connectivity after a few hours uptime, including link-local traffic, malfunction of NDP, oscillations of the whole IPv6 routing subsystem:
http://ur1.ca/kcb3j

Expected results:
The IPv6 routing table should be increased to a decent size (at least 64 kiB for a workstation, many orders of magnitude higher for a server/router).

Additional info:

For reference, please also see:
http://ipv6.web.cern.ch/content/linux-ipv6-routing-table-maxsize-badly-dimensioned-compared-ipv4

Comment 1 Josh Boyer 2015-05-15 11:33:59 UTC

You can change the limit via the sysctl.  If you want the default limit changed, you can add a file to /etc/sysctl.d/ to set it.
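
A minimal sketch of such a drop-in file follows. The file name 90-ipv6-route.conf and the write_route_conf helper are arbitrary choices for illustration, and 65536 is only an example value, not a recommendation.

```shell
#!/bin/sh
# Write a sysctl drop-in that sets net.ipv6.route.max_size.
#   $1: directory to write into (normally /etc/sysctl.d)
#   $2: value to set
# The file name is a hypothetical choice for this sketch.
write_route_conf() {
    printf 'net.ipv6.route.max_size = %s\n' "$2" > "$1/90-ipv6-route.conf"
}

# As root:
#   write_route_conf /etc/sysctl.d 65536
#   sysctl --system    # re-read all sysctl configuration files
```

The setting can also be changed for the running kernel only, with `sysctl -w net.ipv6.route.max_size=65536`.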

If you want the distribution to increase the limit by default, then this should be assigned to initscripts so I'm doing it now.  If you want the in-kernel default changed, you need to take it upstream.

Comment 2 Lukáš Nykrýn 2015-05-15 11:50:08 UTC

If the limit is insanely low, then it is a kernel problem and the default value should be changed (of course upstream first). But I don't see a reason why this can't be handled by kernel maintainers.

Comment 3 Josh Boyer 2015-05-15 11:55:11 UTC

The kernel has runtime tunables to be used at runtime.  We are not going to carry a patch for this in the Fedora kernel, and sysctl should be used until it works itself out upstream.  This is no different than shmmax or shmall that we already set via sysctl in initscripts.

Comment 4 Lukáš Nykrýn 2015-05-15 15:13:43 UTC

(In reply to Josh Boyer from comment #3)
> The kernel has runtime tunables to be used at runtime.  We are not going to
> carry a patch for this in the Fedora kernel, and sysctl should be used until
> it works itself out upstream.  This is no different than shmmax or shmall
> that we already set via sysctl in initscripts.

Please see:
https://bugzilla.redhat.com/show_bug.cgi?id=1056547
https://git.fedorahosted.org/cgit/initscripts.git/commit/?id=032a604fe9c24483c9bc67c76494995d88115abe

And please don't return this to initscripts, unless there is a clear reason why this can't be changed in kernel upstream.

Comment 5 Neil Horman 2015-05-15 17:05:06 UTC

Please see the documentation for these sysctls as your understanding is incorrect:
http://lxr.free-electrons.com/source/Documentation/networking/ip-sysctl.txt

ipv6.route.max_size isn't a measure of the number of bytes available to the ipv6 route cache; it's a measure of the number of entries allowed in the ipv6 route cache before the garbage collector runs.  4096 entries is more than sufficient for most server workloads, which is what the defaults are tuned for.  If you intend to use Linux as a core router, for which 4096 is too small a value, tune max_size (and all the other settings that need adjusting) for such a workload.
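
One way to compare the current entry count against that limit is to decode /proc/net/rt6_stats, which prints seven hexadecimal counters. The sketch below is a rough aid only: the field names, and the reading of the sixth field as the number of allocated IPv6 dst entries, are assumptions taken from the order in which kernels of this era print the line, so check them against the running kernel's source. parse_rt6_stats is a helper name made up for this example.

```shell
#!/bin/sh
# Convert the seven hex fields of one /proc/net/rt6_stats line to decimal.
# Field names are an assumption about the kernel's output order.
parse_rt6_stats() {
    # Split the single argument into positional parameters on whitespace.
    set -- $1
    printf 'fib_nodes=%d fib_route_nodes=%d rt_alloc=%d rt_entries=%d rt_cache=%d dst_entries=%d discarded=%d\n' \
        "0x$1" "0x$2" "0x$3" "0x$4" "0x$5" "0x$6" "0x$7"
}

# Print the decoded counters next to the configured limit, if both are readable.
if [ -r /proc/net/rt6_stats ] && [ -r /proc/sys/net/ipv6/route/max_size ]; then
    parse_rt6_stats "$(cat /proc/net/rt6_stats)"
    printf 'max_size=%s\n' "$(cat /proc/sys/net/ipv6/route/max_size)"
fi
```

If the dst_entries value sits close to max_size while /proc/net/ipv6_route holds only a few dozen lines, the limit is being consumed by something other than configured routes.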

Comment 6 Neil Horman 2015-05-15 17:58:48 UTC

Also, be aware that upstream's solution to the general problem of route cache sizing is going to be removing the route cache from ipv6, just as they did with ipv4.  That's a large undertaking though, not just a simple bug fix.  Until then, the solution is going to be to bump up your cache size to fit your workload.

Comment 7 Seb L. 2015-05-15 18:23:33 UTC

Dear Neil,

Only net.ipv4.route.max_size is documented on the link you gave, and all it says (for kernel >= 3.6) is that this is now deprecated (which probably explains the returned 2 Gi value).

There is no information in this document regarding net.ipv6.route.max_size.

It is however clear that in my case (observed both with kernel-3.18.8-201.fc21.x86_64 and kernel-3.19.5-200.fc21.x86_64), I hit the limit with only around 70 routes (which is *very* far from 4096!).

Symptoms at that time were: total loss of IPv6 connectivity (even link-local destinations, returning "connect: Network is unreachable"!), loss of NDP (all neighbour entries being marked as "FAILED" instead of "REACHABLE").

Restarting the network (systemctl restart network; systemctl restart NetworkManager) did not help. The only remedy was to reboot the machine, which restored IPv6 connectivity for a few hours before it was lost again.

Sometimes, I could observe global IPv6 system routing instability: some IPv6 destinations started to be reachable for a few seconds, then unreachable, then again reachable, etc., see e.g.:
  http://paste.fedoraproject.org/220685/13553921/

Trying to manually add any IPv6 route instantly led to the following error:
  RTNETLINK answers: Cannot allocate memory

Again, at that time I got less than 80 entries (lines) in /proc/net/ipv6_route.

All of this on a workstation (not a server, and certainly not a core router) connected to basically three networks (one LAN, one test network and an IPv6 tunnel) and less than 80 lines in /proc/net/ipv6_route.

Increasing net.ipv6.route.max_size to a higher value (64 Ki instead of 4 Ki) instantly restored all IPv6 connectivity and functionality.

Moreover, the PowerDNS issue described in this link is extremely similar to the issue I got:
  http://marc.info/?l=linux-netdev&m=139352943109400&w=2

  "Note, this system is not functioning as a router or anything.
   It is just serving IPv6 DNS to a reasonable number of clients."

See also:
  http://ipv6.web.cern.ch/content/linux-ipv6-routing-table-maxsize-badly-dimensioned-compared-ipv4


From there, there are two possibilities:

- either net.ipv6.route.max_size is actually supposed to set the maximum number of IPv6 route entries (i.e. the number of lines in /proc/net/ipv6_route) before GC, and then there is clearly a bug in the kernel since I did not even reach 2% of this value when I triggered the "Cannot allocate memory" error;

- or net.ipv6.route.max_size is counting something else (the size of the IPv6 routing table in bytes, or whatever) and, as a matter of fact, this limit could be reached with less than 80 routes (most of them being automatically generated in the "local" table, not in "main"), which is *really* low and could easily be reached even on a simple development workstation.

I would be happy to run any test you could think of on my workstation in order to try to understand why I hit the net.ipv6.route.max_size with less than 80 IPv6 routes.


Best regards,
Sébastien

Comment 8 Seb L. 2015-05-21 20:52:12 UTC

Hi,

New elements regarding this bug (which is definitely a bug, but actually a routing memory leak and not a bad default parameter as I first thought).

So to summarize, on my Fedora 21 workstation (with IPv6 connectivity), after a dozen hours of uptime, I was always losing all IPv6 connectivity because adding any route would lead to: "RTNETLINK answers: Cannot allocate memory".

I then increased the net.ipv6.route.max_size from 4096 to 65536 (x16).

Now (and this is the news), after 8 days of uptime (so 16x a dozen hours), I got exactly the same issue, with still exactly the same number of lines in my /proc/net/ipv6_route, and I have a few hundred neighbors on my LAN.

I now increased this value from 65536 to 131072, and I bet in 8 more days, I will again run into the same issue.

Could I run any command in order to check the exact memory allocated by the IPv6 routing/neighbor table and how to diagnose what really looks like a kernel memory leak?
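
As a starting point, something along these lines could log the relevant counters over time, so that growth between reboots becomes visible. This is a minimal sketch, not a diagnosis tool: count_lines and sample are helper names made up for this example, the log path is arbitrary, and the rt6_stats line is recorded raw (hexadecimal) without interpreting it.

```shell
#!/bin/sh
# Count the lines of a file, printing 0 if it cannot be read.
count_lines() {
    if [ -r "$1" ]; then
        wc -l < "$1" | tr -d ' '
    else
        echo 0
    fi
}

# Emit one timestamped sample: route lines, neighbour entries, raw rt6_stats.
sample() {
    routes=$(count_lines /proc/net/ipv6_route)
    neigh=$(ip -6 neigh show 2>/dev/null | wc -l | tr -d ' ')
    stats=$(cat /proc/net/rt6_stats 2>/dev/null) || stats=unavailable
    printf '%s routes=%s neigh=%s rt6_stats=%s\n' \
        "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$routes" "$neigh" "$stats"
}

# Example: take a sample every 10 minutes and append it to a log.
#   while sleep 600; do sample; done >> /var/tmp/ipv6-route-samples.log
```

A counter that climbs steadily while the number of route lines stays flat would support the leak hypothesis.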

This Ubuntu bug report might or might not be related to this bug:
  https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1065159
(note that on my machine, proxy_ndp is disabled and I don't have any bridge but a couple of tunnels and sometimes a few virtual machines/docker containers)

Comment 9 Seb L. 2015-05-29 13:35:14 UTC

Indeed, 8 days later, the machine again ran out of memory (RTNETLINK answers: Cannot allocate memory). As usual, no change in the (static) routing configuration of the machine.

Increased again by a factor of 2 (now 262144, that is 64 times its original size), and everything works again.

Obviously a kernel memory leak.