Red Hat Bugzilla – Bug 1283676
net.core.netdev_max_backlog not set high enough
Last modified: 2016-06-23 14:18:55 EDT
Description of problem:
/proc/sys/net/core/netdev_max_backlog default setting of 1000 is extremely low/unsuitable for medium to large cloud environments.
Things simply stop working once more than 1,000 dhcp-agent ports
are in use, which is very common in medium to large clouds.
We first saw this on Trystack and thought it was a dnsmasq bug
until this bugzilla was brought to our attention:
We've set net.core.netdev_max_backlog to around 100k, which has resolved
Neutron grinding to a halt, the backlog of dhcp-agent/L3 churn, and the
system being brought down by backlog queues.
I understand that setting sysctl settings from an installer
perspective might seem a little invasive, but it's something that I
think would trip up large customers and deployments as it's not immediately apparent this might be the cause.
On a recent RHOS-D OSP7 installation I don't see this default changed, so I figured
I'd bring it to light here.
[root@overcloud-controller-0 ~]# cat
Whether or not the installer is the best place to set this I am not sure.
Version-Release number of selected component (if applicable):
RHEL-OSP7 (or anything with Kilo)
Steps to Reproduce:
1. Deploy RHEL-OSP7
2. Create over 1,000 Neutron networks with gateways set (using dhcp-agent)
3. Note things grinding to a halt.
Will, can you provide an appropriate default value?
@hughbrock we use 100k for the /proc/sys/net/core/netdev_max_backlog setting; the best way to set this is probably a tuned profile.
*** Bug 1299080 has been marked as a duplicate of this bug. ***
From the tuning guide, it looks like this number should be a factor of the CPU capabilities (both the number of cores and speed) and the NIC capabilities (again both the number of links and speed).
It is described as a queue within the Linux kernel where traffic is stored after reception from the NIC, but before processing by the protocol stacks.
The /proc/net/softnet_stat file contains a counter in the 2nd column that is incremented when the netdev backlog queue overflows. If this value is incrementing over time, then netdev_max_backlog needs to be increased.
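For anyone who wants to check this on a controller, here is a minimal Python sketch of reading that counter. It assumes the usual layout of /proc/net/softnet_stat: one row per CPU, with whitespace-separated hexadecimal columns. The function names are hypothetical and not part of any existing tool.

```python
def dropped_per_cpu(softnet_text):
    """Return the drop counter (2nd column) for each CPU row of softnet_stat."""
    drops = []
    for line in softnet_text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed rows
        drops.append(int(fields[1], 16))  # columns are hexadecimal
    return drops


def read_total_drops(path="/proc/net/softnet_stat"):
    """Sum the drop counters across all CPUs on the running system."""
    with open(path) as f:
        return sum(dropped_per_cpu(f.read()))
```

Calling read_total_drops() twice, a few minutes apart, and comparing the results shows whether the backlog queue is still overflowing. The counters are cumulative since boot, so a single reading is not enough to tell.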
The guide suggests doubling it and trying again until no overflows are observed. I will set a value that is 10x the default in hiera so that it can be customized further if necessary.
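The doubling procedure can be sketched roughly as follows. This is only an illustration of the "double and re-check" heuristic, not something taken from the tuning guide. The helper name and the idea of comparing two drop-counter samples are assumptions.

```python
DEFAULT_BACKLOG = 1000  # kernel default cited in this report


def next_backlog(current, drops_before, drops_after):
    """Suggest the next netdev_max_backlog value to try.

    drops_before and drops_after are two samples of the cumulative
    overflow counter taken some time apart (hypothetical inputs).
    """
    if drops_after > drops_before:
        return current * 2  # still overflowing: double and try again
    return current  # no new overflows observed: keep the current value
```

For example, starting from DEFAULT_BACKLOG with a counter that rose between samples, the next value to try would be 2000. The value actually has to be applied with sysctl, or whatever mechanism the deployment uses, before sampling again.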