Bug 1316981 - Arp table kernel tuning necessary for large neutron environments
Summary: Arp table kernel tuning necessary for large neutron environments
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-heat-templates
Version: 7.0 (Kilo)
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: medium
Target Milestone: ga
Target Release: 12.0 (Pike)
Assignee: Emilien Macchi
QA Contact: Ofer Blaut
URL:
Whiteboard:
Keywords: Triaged, ZStream
Depends On:
Blocks:
Reported: 2016-03-11 15:40 UTC by David Peacock
Modified: 2018-02-05 19:02 UTC

Fixed In Version: openstack-tripleo-heat-templates-7.0.0-0.20170628002128.el7ost.noarch
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-12-13 20:40:44 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


External Trackers
Tracker | ID | Priority | Status | Summary | Last Updated
Red Hat Product Errata | RHEA-2017:3462 | normal | SHIPPED_LIVE | Red Hat OpenStack Platform 12.0 Enhancement Advisory | 2018-02-16 01:43:25 UTC
Launchpad | 1690087 | None | None | None | 2017-05-11 08:18 UTC

Description David Peacock 2016-03-11 15:40:54 UTC
Description of problem:

On deployments with large neutron environments, customers experience timeouts and other failures because the ARP (neighbor) tables are exhausted: the Linux kernel's default neighbor-cache thresholds are too low for environments of this size.

These problems are trivially solved with kernel tuning, but they should be avoided entirely by applying that tuning by default when neutron networking is deployed in the overcloud.
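
For triage, one way to confirm that a host is hitting these limits is to compare its neighbor-entry count with the three gc_thresh values. The sketch below is illustrative only: classify_neigh_count is a hypothetical helper, not part of any package, and the labels it prints were made up for this example. It assumes the kernel-documented meanings of the thresholds: below gc_thresh1 the garbage collector leaves entries alone, above gc_thresh2 it purges more aggressively, and gc_thresh3 is the hard maximum.

```shell
# classify_neigh_count COUNT THRESH1 THRESH2 THRESH3
# Hypothetical helper: prints a label describing where COUNT sits
# relative to the three neighbor-cache thresholds.
classify_neigh_count() {
    count=$1; t1=$2; t2=$3; t3=$4
    if [ "$count" -ge "$t3" ]; then
        echo "at-hard-limit"       # new entries are likely to be refused
    elif [ "$count" -gt "$t2" ]; then
        echo "aggressive-gc"       # above the soft limit
    elif [ "$count" -ge "$t1" ]; then
        echo "gc-eligible"         # garbage collection may purge entries
    else
        echo "below-gc-floor"      # garbage collector leaves entries alone
    fi
}

# Example on a live Linux host (assumes iproute2 is installed):
#   d=/proc/sys/net/ipv4/neigh/default
#   classify_neigh_count "$(ip -4 neigh show | wc -l)" \
#       "$(cat $d/gc_thresh1)" "$(cat $d/gc_thresh2)" "$(cat $d/gc_thresh3)"
```

A result of at-hard-limit or aggressive-gc on an affected node would be consistent with the table exhaustion described above.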

Version-Release number of selected component (if applicable):


How reproducible:

100%

Steps to Reproduce:
1. Deploy an overcloud in a large network environment.

Actual results:

Ping timeouts and other general routing failures.

Expected results:

Normal network functionality

Additional info:

Before tuning, packet loss in such an environment resembles the following:

64 bytes from 192.168.10.2: icmp_seq=150 ttl=64 time=0.054 ms
64 bytes from 192.168.10.2: icmp_seq=151 ttl=64 time=0.066 ms
64 bytes from 192.168.10.2: icmp_seq=152 ttl=64 time=0.051 ms
ping: sendmsg: Invalid argument
ping: sendmsg: Invalid argument
ping: sendmsg: Invalid argument
ping: sendmsg: Invalid argument
ping: sendmsg: Invalid argument
64 bytes from 192.168.10.2: icmp_seq=158 ttl=64 time=0.972 ms
64 bytes from 192.168.10.2: icmp_seq=159 ttl=64 time=0.035 ms
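
To quantify failures like those in the output above, a small filter can count the sendmsg errors in captured ping output. This is a minimal sketch; count_sendmsg_errors is a hypothetical name introduced here, not an existing tool.

```shell
# count_sendmsg_errors: read ping output on stdin and print the number of
# "sendmsg: Invalid argument" lines it contains.
count_sendmsg_errors() {
    # grep -c prints 0 but exits non-zero when nothing matches;
    # "|| true" keeps the function's exit status at 0 in that case.
    grep -c 'sendmsg: Invalid argument' || true
}

# Example (ping writes these errors to stderr, hence 2>&1):
#   ping -c 300 192.168.10.2 2>&1 | count_sendmsg_errors
```

Running the same count before and after the tuning gives a simple comparison.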

The packet loss is resolved with the following tuning.

Add the following lines to /etc/sysctl.conf:

net.ipv4.neigh.default.gc_thresh1=1024
net.ipv4.neigh.default.gc_thresh2=2048
net.ipv4.neigh.default.gc_thresh3=4096

Then run:

sysctl -p

The improvement is immediate once the new values are loaded.
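
As an alternative to editing /etc/sysctl.conf by hand, the same three settings could be generated and placed in a drop-in file. The sketch below is an assumption-laden illustration: render_neigh_sysctl and the file name 98-neigh-gc.conf are hypothetical, and the ascending-order check is a sanity check of this sketch, not something the kernel enforces.

```shell
# render_neigh_sysctl THRESH1 THRESH2 THRESH3
# Hypothetical helper: prints the three sysctl lines for the given values.
render_neigh_sysctl() {
    t1=$1; t2=$2; t3=$3
    # Sanity check used by this sketch only: values should be ascending.
    if [ "$t1" -ge "$t2" ] || [ "$t2" -ge "$t3" ]; then
        echo "expected thresh1 < thresh2 < thresh3" >&2
        return 1
    fi
    printf 'net.ipv4.neigh.default.gc_thresh1=%s\n' "$t1"
    printf 'net.ipv4.neigh.default.gc_thresh2=%s\n' "$t2"
    printf 'net.ipv4.neigh.default.gc_thresh3=%s\n' "$t3"
}

# Example (as root; the file name is an assumption):
#   render_neigh_sysctl 1024 2048 4096 > /etc/sysctl.d/98-neigh-gc.conf
#   sysctl --system
#   sysctl net.ipv4.neigh.default.gc_thresh3   # confirm the loaded value
```

Keeping the values in their own file under /etc/sysctl.d makes the change easier to track and revert than an edit to the main sysctl.conf.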

The purpose of this RFE / bug request is to have these tuning values applied by default; with higher defaults, customers would never encounter these errors.

Comment 2 Mike Burns 2016-04-07 21:14:44 UTC
This bug did not make the OSP 8.0 release.  It is being deferred to OSP 10.

Comment 3 Assaf Muller 2017-01-16 18:31:13 UTC
@Flavio, can you please sanity check this request, specifically upping:
net.ipv4.neigh.default.gc_thresh1=1024
net.ipv4.neigh.default.gc_thresh2=2048
net.ipv4.neigh.default.gc_thresh3=4096

From their lower defaults?

Comment 4 Flavio Leitner 2017-03-24 12:37:11 UTC
(In reply to Assaf Muller from comment #3)
> @Flavio, can you please sanity check this request, specifically upping:
> net.ipv4.neigh.default.gc_thresh1=1024
> net.ipv4.neigh.default.gc_thresh2=2048
> net.ipv4.neigh.default.gc_thresh3=4096
> 
> From their lower defaults?

They look good to me.

Comment 10 errata-xmlrpc 2017-12-13 20:40:44 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:3462

