Bug 1384746 - [DOCS] [3.4] Document tuning options for arp cache
Summary: [DOCS] [3.4] Document tuning options for arp cache
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Documentation
Version: 3.4.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: brice
QA Contact: Johnny Liu
Docs Contact: Vikram Goyal
URL:
Whiteboard:
Duplicates: 1566671
Depends On:
Blocks:
Reported: 2016-10-14 04:15 UTC by Vikram Goyal
Modified: 2021-09-09 11:58 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-01-24 05:13:20 UTC
Target Upstream Version:



Description Vikram Goyal 2016-10-14 04:15:37 UTC
Eng card: https://trello.com/c/DZb8ghlZ/228-5-scale-document-tuning-options-for-arp-cache

@jialiu - I am not sure if you are the right contact for this. I apologize if you are not, but I couldn't figure out who the right person might be based on the Trello Card or the QE page.

Comment 1 Jiří Mencák 2016-10-20 14:58:16 UTC
Linking a thread: http://post-office.corp.redhat.com/archives/atomic-networking/2016-October/msg00046.html

Consensus seems to be that

net.ipv4.neigh.default.gc_thresh1 = 8192
net.ipv4.neigh.default.gc_thresh2 = 32768
net.ipv4.neigh.default.gc_thresh3 = 65536

are sane values to use when the default settings

net.ipv4.neigh.default.gc_thresh1 = 128
net.ipv4.neigh.default.gc_thresh2 = 512
net.ipv4.neigh.default.gc_thresh3 = 1024

begin to cause issues.

Comment 2 Jeremy Harris 2016-10-28 11:06:28 UTC
Note that Documentation/networking/ip-sysctl.txt only mentions
gc_thresh1 and gc_thresh3. Possibly a kernel component bug rather
than a documentation one?

We have a KCS (https://access.redhat.com/solutions/23454) which says
"gc_thresh1 is not used anymore and is present for backwards compatibility".
I'm unsure about that: as of commit d5f963fc3f, "net/core/neighbour.c" uses
gc_thresh1 at line 766 and gc_thresh2 at line 280.

Comment 3 Jiří Mencák 2016-11-16 20:14:56 UTC
I propose adding the following text to the documentation. As it is HAProxy-specific, it probably belongs right after "Preventing Connection Failures During Restarts"?

== ARP cache tuning for large-scale deployments

In OCP deployments with a large number of routes (more than
the value of ```net.ipv4.neigh.default.gc_thresh3```, which is 1024 by default),
the default values of the sysctl variables must be increased
to allow more entries in the ARP cache.

=== Issue description

In such large deployments, the problem shows up as kernel messages
similar to the following:
```
[ 1738.811139] net_ratelimit: 1045 callbacks suppressed
[ 1743.823136] net_ratelimit: 293 callbacks suppressed
```

Also, commands run with the ```oc``` client fail with a message similar to the following:
```
Unable to connect to the server: dial tcp: lookup <hostname> on <ip>:<port>: write udp <ip>:<port>-><ip>:<port>: write: invalid argument
```

=== Resolution

To check the current number of ARP entries for IPv4, run the following:
```
# ip -4 neigh show nud all | wc -l
```
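
As a rough sketch, the count can be compared with the configured limit in a small script. The function names and the `WARN_PERCENT` cutoff of 80 are illustrative assumptions, not part of any OCP tooling:

```shell
#!/bin/sh
# Sketch: report how close the IPv4 neighbour table is to gc_thresh3.
# WARN_PERCENT and both function names are illustrative assumptions.

WARN_PERCENT=${WARN_PERCENT:-80}

# Integer percentage of the table in use: usage_percent <entries> <limit>
usage_percent() {
    echo $(( $1 * 100 / $2 ))
}

# Print a one-line verdict: check_arp_usage <entries> <limit>
check_arp_usage() {
    pct=$(usage_percent "$1" "$2")
    if [ "$pct" -ge "$WARN_PERCENT" ]; then
        echo "WARNING: $1 of $2 neighbour entries in use (${pct}%)"
    else
        echo "OK: $1 of $2 neighbour entries in use (${pct}%)"
    fi
}

# Live check on a node, using the same count command as above:
#   check_arp_usage "$(ip -4 neigh show nud all | wc -l)" \
#                   "$(sysctl -n net.ipv4.neigh.default.gc_thresh3)"
```

The live check is left as a comment so that the functions can be sourced and tried with sample numbers first.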

If the number begins to approach the ```net.ipv4.neigh.default.gc_thresh3``` threshold,
the following sysctl values are recommended for large-scale deployments:

```
net.ipv4.neigh.default.gc_thresh1 = 8192
net.ipv4.neigh.default.gc_thresh2 = 32768
net.ipv4.neigh.default.gc_thresh3 = 65536
```
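
One possible way to try these values on a running node is `sysctl -w`, which changes them immediately but does not survive a reboot. The sketch below only prints the commands unless `DRY_RUN=0` is set; `apply_thresholds` and `DRY_RUN` are illustrative names:

```shell
#!/bin/sh
# Sketch: apply the recommended thresholds at runtime (not persistent).
# apply_thresholds and DRY_RUN are illustrative names, not OCP tooling.

apply_thresholds() {
    for setting in \
        net.ipv4.neigh.default.gc_thresh1=8192 \
        net.ipv4.neigh.default.gc_thresh2=32768 \
        net.ipv4.neigh.default.gc_thresh3=65536
    do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "sysctl -w $setting"   # dry run: only print the command
        else
            sysctl -w "$setting"        # needs root; lost on reboot
        fi
    done
}
```

Running `apply_thresholds` prints the three commands for review. Running it as root with `DRY_RUN=0` applies them.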

To make these settings persist across reboots, create a
https://access.redhat.com/solutions/1305833[custom tuned profile].
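
A minimal sketch of such a profile follows, assuming tuned's usual layout of one `tuned.conf` per directory under `/etc/tuned`. The profile name `ocp-arp-cache` and the function `write_arp_profile` are hypothetical. Activating a profile replaces the currently active one, so on a real node it may be appropriate to add an `include=` line to `[main]` naming the existing profile.

```shell
#!/bin/sh
# Sketch: write a custom tuned profile carrying the recommended thresholds.
# The profile name and function name are hypothetical examples.

# write_arp_profile <profiles-root> <profile-name>
write_arp_profile() {
    profile_dir="$1/$2"
    mkdir -p "$profile_dir"
    cat > "$profile_dir/tuned.conf" <<'EOF'
[main]
summary=Larger ARP cache thresholds for large-scale deployments

[sysctl]
net.ipv4.neigh.default.gc_thresh1=8192
net.ipv4.neigh.default.gc_thresh2=32768
net.ipv4.neigh.default.gc_thresh3=65536
EOF
}

# On a node (as root), assuming the profile name above:
#   write_arp_profile /etc/tuned ocp-arp-cache
#   tuned-adm profile ocp-arp-cache
```

Writing to a scratch directory first makes it possible to inspect the generated file before anything under `/etc/tuned` is changed.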

Comment 4 brice 2016-11-17 06:41:39 UTC
Thanks, all.

I've submitted a PR for this. Feel free to make any comments there or here:

https://github.com/openshift/openshift-docs/pull/3242

Comment 5 openshift-github-bot 2016-11-30 05:29:09 UTC
Commit pushed to master at https://github.com/openshift/openshift-docs

https://github.com/openshift/openshift-docs/commit/70602f79bb94648a911da30d0ec4ce6dd14ab070
Merge pull request #3242 from bfallonf/haproxy_1384746

Bug 1384746 added section on tuning sysctl values

Comment 7 Ryan Howe 2018-04-12 21:55:00 UTC
*** Bug 1566671 has been marked as a duplicate of this bug. ***

