Bug 1425388 - Increase the ARP cache size in the atomic-openshift-master and atomic-openshift-node tuned profiles
Status: CLOSED ERRATA
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 3.5.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Assigned To: Phil Cameron
QA Contact: Meng Bo
aos-scalability-35
Reported: 2017-02-21 05:18 EST by jmencak
Modified: 2017-08-16 15 EDT
Doc Type: Enhancement
Doc Text:
Feature: New default values for the ARP cache (docs PR 3803).
Reason: The cluster fails with more than 1024 routes.
Result: The problem does not occur.
Last Closed: 2017-08-10 01:18:47 EDT
Type: Bug
External Trackers:
Origin (Github) 13034 (priority: None, status: None, summary: None, last updated: 2017-02-21 14:27 EST)
Red Hat Product Errata RHEA-2017:1716 (priority: normal, status: SHIPPED_LIVE, summary: Red Hat OpenShift Container Platform 3.6 RPM Release Advisory, last updated: 2017-08-10 05:02:50 EDT)

Description jmencak 2017-02-21 05:18:59 EST
Description of problem:

In OCP clusters with a large number of routes (more than the value of net.ipv4.neigh.default.gc_thresh3, which is 1024 by default), the ARP cache is not large enough to accommodate all the entries needed by the nodes running the router pods. The tuning for this is documented here:

https://docs.openshift.com/container-platform/3.4/install_config/router/default_haproxy_router.html#deploy-router-arp-cach-tuning-for-large-scale-clusters

However, I believe the larger cache size should be the default in the atomic-openshift-master and atomic-openshift-node tuned profiles.
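
To see how close a node is to the limit described above, the number of ARP entries can be compared with gc_thresh3. The following is a minimal Python sketch and not part of the original report. It assumes a Linux host that exposes /proc/net/arp and /proc/sys/net/ipv4/neigh/default/gc_thresh3. The helper names and the 0.85 warning ratio are hypothetical choices made for illustration.

```python
from pathlib import Path

ARP_TABLE = Path("/proc/net/arp")
GC_THRESH3 = Path("/proc/sys/net/ipv4/neigh/default/gc_thresh3")


def count_arp_entries(arp_text: str) -> int:
    """Count entries in /proc/net/arp-style text; the first line is a header."""
    rows = [line for line in arp_text.splitlines() if line.strip()]
    return max(len(rows) - 1, 0)


def near_limit(entries: int, gc_thresh3: int, ratio: float = 0.85) -> bool:
    """True when the table holds at least `ratio` of gc_thresh3 entries.

    The 0.85 default is an arbitrary illustrative choice, not a kernel rule.
    """
    return entries >= ratio * gc_thresh3


def check_node() -> str:
    """Read the live values on a Linux host and summarize them."""
    entries = count_arp_entries(ARP_TABLE.read_text())
    limit = int(GC_THRESH3.read_text())
    state = "near or over" if near_limit(entries, limit) else "below"
    return f"{entries} ARP entries, gc_thresh3={limit}: {state} the threshold"
```

Calling check_node() on a node running router pods returns a one-line summary. The parsing helpers take plain text, so they can also be run against a saved copy of /proc/net/arp.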

Version-Release number of selected component (if applicable):
All


How reproducible:
Always


Steps to Reproduce:
1. Create an OCP environment with around 1024 routes (I started noticing problems at around 900 routes).

Actual results:
1) Kernel messages:
[ 1738.811139] net_ratelimit: 1045 callbacks suppressed
[ 1743.823136] net_ratelimit: 293 callbacks suppressed

2) The oc client and networking in general stop working properly.
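
When checking a node for the kernel messages listed above, it can help to extract the suppressed-callback counts from the log. The following is a minimal Python sketch and not part of the original report. It assumes dmesg-style lines in the format shown above, and the function name is hypothetical. Note that net_ratelimit lines only show that some kernel network messages were rate-limited; on their own they do not identify the ARP cache as the cause.

```python
import re
from typing import List, Tuple

# Matches lines such as "[ 1738.811139] net_ratelimit: 1045 callbacks suppressed".
SUPPRESSED = re.compile(
    r"^\[\s*(\d+\.\d+)\]\s+net_ratelimit:\s+(\d+) callbacks suppressed"
)


def suppressed_callbacks(log_text: str) -> List[Tuple[float, int]]:
    """Return (timestamp, count) pairs for net_ratelimit lines in the log text."""
    hits = []
    for line in log_text.splitlines():
        match = SUPPRESSED.match(line.strip())
        if match:
            hits.append((float(match.group(1)), int(match.group(2))))
    return hits


sample = """\
[ 1738.811139] net_ratelimit: 1045 callbacks suppressed
[ 1743.823136] net_ratelimit: 293 callbacks suppressed
"""
hits = suppressed_callbacks(sample)
total = sum(count for _, count in hits)
print(total)  # 1338
```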

Expected results:
None of the issues in "Actual results".

Additional info:
http://post-office.corp.redhat.com/archives/atomic-networking/2016-November/msg00082.html
Comment 6 openshift-github-bot 2017-02-22 11:52:12 EST
Commit pushed to master at https://github.com/openshift/openshift-docs

https://github.com/openshift/openshift-docs/commit/59be5f894be526396d8b160adccc4481f489f765
Change default arp cache size on nodes

In OCP clusters with large numbers of routes (greater than the value of
net.ipv4.neigh.default.gc_thresh3, which is 1024 by default) the ARP
cache is not large enough to accommodate for all the entries needed by
the nodes running the router pods.

This change increases the cache size.

bug 1425388
https://bugzilla.redhat.com/show_bug.cgi?id=1425388

Signed-off-by: Phil Cameron <pcameron@redhat.com>
Comment 7 openshift-github-bot 2017-02-23 08:08:57 EST
Commit pushed to master at https://github.com/openshift/origin

https://github.com/openshift/origin/commit/ba842078f3bba0282d62a2c9db70ca4d9339e733
Change default arp cache size on nodes

In OCP clusters with large numbers of routes (greater than the value of
net.ipv4.neigh.default.gc_thresh3, which is 1024 by default) the ARP
cache is not large enough to accommodate for all the entries needed by
the nodes running the router pods.

This change increases the cache size.

bug 1425388
https://bugzilla.redhat.com/show_bug.cgi?id=1425388

Signed-off-by: Phil Cameron <pcameron@redhat.com>
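
For illustration, a change of this kind could be expressed as a [sysctl] section in a tuned profile. The following is a minimal Python sketch that renders such a section. It is not the committed change: this bug report does not state the thresholds that were actually chosen, so the numbers in the usage line are hypothetical, as are the function name and the ordering check.

```python
def render_sysctl_section(thresh1: int, thresh2: int, thresh3: int) -> str:
    """Render the three neighbor-table thresholds as an INI-style [sysctl] section."""
    # Sanity check assumed for this sketch: the thresholds should not decrease.
    if not thresh1 <= thresh2 <= thresh3:
        raise ValueError("expected gc_thresh1 <= gc_thresh2 <= gc_thresh3")
    prefix = "net.ipv4.neigh.default"
    return (
        "[sysctl]\n"
        f"{prefix}.gc_thresh1={thresh1}\n"
        f"{prefix}.gc_thresh2={thresh2}\n"
        f"{prefix}.gc_thresh3={thresh3}\n"
    )


# Hypothetical values, chosen only to show the output format.
section = render_sysctl_section(8192, 32768, 65536)
print(section)
```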
Comment 9 Troy Dawson 2017-04-11 17:00:37 EDT
This has been merged into ocp and is in OCP v3.6.27 or newer.
Comment 13 errata-xmlrpc 2017-08-10 01:18:47 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:1716
