
Bug 1316981

Summary: Arp table kernel tuning necessary for large neutron environments
Product: Red Hat OpenStack
Reporter: David Peacock <dpeacock>
Component: openstack-tripleo-heat-templates
Assignee: Emilien Macchi <emacchi>
Status: CLOSED ERRATA
QA Contact: Ofer Blaut <oblaut>
Severity: medium
Priority: high
Version: 7.0 (Kilo)
CC: amuller, dbecker, fleitner, jcoufal, mburns, morazi, rhel-osp-director-maint, tvignaud
Target Milestone: ga
Keywords: Triaged, ZStream
Target Release: 12.0 (Pike)
Hardware: Unspecified
OS: Unspecified
Fixed In Version: openstack-tripleo-heat-templates-7.0.0-0.20170628002128.el7ost.noarch
Doc Type: Bug Fix
Last Closed: 2017-12-13 20:40:44 UTC
Type: Bug

Description David Peacock 2016-03-11 15:40:54 UTC
Description of problem:

On deployments with large neutron environments, customers experience timeouts and other failures because the ARP (neighbor) table is exhausted: the Linux kernel's default neighbor-cache thresholds are too low for these environments.

These problems are trivially solved with kernel tuning, but should be avoided entirely by making such tuning default during deployment of neutron networking in the overcloud.

Version-Release number of selected component (if applicable):


How reproducible:

100%

Steps to Reproduce:
1. Deploy an overcloud in a large network environment

Actual results:

Ping timeouts, other general routing failures.

Expected results:

Normal network functionality

Additional info:

Before tuning, packet loss in such an environment resembles the following:

64 bytes from 192.168.10.2: icmp_seq=150 ttl=64 time=0.054 ms
64 bytes from 192.168.10.2: icmp_seq=151 ttl=64 time=0.066 ms
64 bytes from 192.168.10.2: icmp_seq=152 ttl=64 time=0.051 ms
ping: sendmsg: Invalid argument
ping: sendmsg: Invalid argument
ping: sendmsg: Invalid argument
ping: sendmsg: Invalid argument
ping: sendmsg: Invalid argument
64 bytes from 192.168.10.2: icmp_seq=158 ttl=64 time=0.972 ms
64 bytes from 192.168.10.2: icmp_seq=159 ttl=64 time=0.035 ms
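
To quantify the loss in a capture like the one above, the output can be tallied with a short script. This is only a sketch: the function name summarize_ping is made up for illustration, and it assumes iputils-style ping output with "icmp_seq=" in each reply and "ping: sendmsg:" on each failed send.

```python
import re

# Sample copied from the ping output quoted in this report.
SAMPLE = """\
64 bytes from 192.168.10.2: icmp_seq=150 ttl=64 time=0.054 ms
64 bytes from 192.168.10.2: icmp_seq=151 ttl=64 time=0.066 ms
64 bytes from 192.168.10.2: icmp_seq=152 ttl=64 time=0.051 ms
ping: sendmsg: Invalid argument
ping: sendmsg: Invalid argument
ping: sendmsg: Invalid argument
ping: sendmsg: Invalid argument
ping: sendmsg: Invalid argument
64 bytes from 192.168.10.2: icmp_seq=158 ttl=64 time=0.972 ms
64 bytes from 192.168.10.2: icmp_seq=159 ttl=64 time=0.035 ms
"""

SEQ_RE = re.compile(r"icmp_seq=(\d+)")


def summarize_ping(output):
    """Return (replies, send_errors, missing_seqs) for ping output text.

    Hypothetical helper: counts reply lines, counts local send failures,
    and lists the sequence numbers skipped between consecutive replies.
    """
    seqs = []
    send_errors = 0
    for line in output.splitlines():
        match = SEQ_RE.search(line)
        if match:
            seqs.append(int(match.group(1)))
        elif line.startswith("ping: sendmsg:"):
            send_errors += 1

    missing = []
    for prev, cur in zip(seqs, seqs[1:]):
        missing.extend(range(prev + 1, cur))
    return len(seqs), send_errors, missing


replies, send_errors, missing = summarize_ping(SAMPLE)
print(f"replies={replies} send_errors={send_errors} missing_seq={missing}")
# prints: replies=5 send_errors=5 missing_seq=[153, 154, 155, 156, 157]
```

In this sample the five sendmsg errors line up with the five missing sequence numbers (153 to 157), which suggests the packets failed locally at send time rather than being dropped on the wire.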

This is resolved with the following tuning:

Add the following lines to /etc/sysctl.conf:

net.ipv4.neigh.default.gc_thresh1=1024
net.ipv4.neigh.default.gc_thresh2=2048
net.ipv4.neigh.default.gc_thresh3=4096

Then run:

sysctl -p

The improvement takes effect immediately.
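
To see how close a host is to the limits before or after tuning, the current thresholds can be compared against the number of neighbor entries. The following is a rough sketch, not part of the fix: the helper names are invented for illustration, and the labels reflect my reading of the kernel documentation (gc_thresh1 is the floor below which garbage collection does not run, gc_thresh2 is a soft limit, gc_thresh3 is the hard limit). It also assumes that counting lines in /proc/net/arp is a good enough estimate; that file only shows the current network namespace, so on a host with many neutron namespaces the real table usage may be higher.

```python
from pathlib import Path

# Standard procfs locations on Linux for the IPv4 neighbor settings.
NEIGH_DIR = Path("/proc/sys/net/ipv4/neigh/default")
ARP_TABLE = Path("/proc/net/arp")


def read_thresholds(base=NEIGH_DIR):
    """Return (gc_thresh1, gc_thresh2, gc_thresh3) as integers."""
    return tuple(int((base / f"gc_thresh{i}").read_text()) for i in (1, 2, 3))


def count_arp_entries(table_text):
    """Count entries in /proc/net/arp-style text; the first line is a header."""
    lines = [line for line in table_text.splitlines() if line.strip()]
    return max(len(lines) - 1, 0)


def classify(count, thresh1, thresh2, thresh3):
    """Label the table usage relative to the three thresholds (illustrative)."""
    if count >= thresh3:
        return "at or above gc_thresh3 (hard limit, new entries may fail)"
    if count >= thresh2:
        return "at or above gc_thresh2 (soft limit, aggressive gc)"
    if count >= thresh1:
        return "at or above gc_thresh1 (gc may purge entries)"
    return "below gc_thresh1 (gc does not run)"


if __name__ == "__main__":
    # Only attempt the live check where the procfs files are present.
    if ARP_TABLE.exists() and NEIGH_DIR.exists():
        t1, t2, t3 = read_thresholds()
        entries = count_arp_entries(ARP_TABLE.read_text())
        print(f"entries={entries} thresholds={t1}/{t2}/{t3}")
        print(classify(entries, t1, t2, t3))
```

With the values proposed here (1024/2048/4096), a host holding 1500 entries would be reported as above gc_thresh1 but still below the soft limit.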

The purpose of this RFE / bug report is to have these tuning values applied by default, so that customers never encounter these errors.

Comment 2 Mike Burns 2016-04-07 21:14:44 UTC
This bug did not make the OSP 8.0 release.  It is being deferred to OSP 10.

Comment 3 Assaf Muller 2017-01-16 18:31:13 UTC
@Flavio, can you please sanity check this request, specifically upping:
net.ipv4.neigh.default.gc_thresh1=1024
net.ipv4.neigh.default.gc_thresh2=2048
net.ipv4.neigh.default.gc_thresh3=4096

From their lower defaults?

Comment 4 Flavio Leitner 2017-03-24 12:37:11 UTC
(In reply to Assaf Muller from comment #3)
> @Flavio, can you please sanity check this request, specifically upping:
> net.ipv4.neigh.default.gc_thresh1=1024
> net.ipv4.neigh.default.gc_thresh2=2048
> net.ipv4.neigh.default.gc_thresh3=4096
> 
> From their lower defaults?

They look good to me.

Comment 10 errata-xmlrpc 2017-12-13 20:40:44 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:3462