Bug 1498213

Summary: Increase ARP cache size on loadbalancers
Product: OpenShift Container Platform
Reporter: Jiří Mencák <jmencak>
Component: Installer
Assignee: Jiří Mencák <jmencak>
Status: CLOSED ERRATA
QA Contact: Johnny Liu <jialiu>
Severity: medium
Docs Contact:
Priority: unspecified
Version: 3.7.0
CC: aos-bugs, jeder, jokerman, mmccomas
Target Milestone: ---
Keywords: NeedsTestCase
Target Release: 3.7.0
Hardware: Unspecified
OS: Unspecified
Whiteboard: aos-scalability-37
Fixed In Version:
Doc Type: Known Issue
Doc Text:
This is a known issue because the openshift tuned profiles were never set on RHEL Atomic Host, only on RHEL.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-11-28 22:14:33 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Jiří Mencák 2017-10-03 18:21:16 UTC
Description of problem:
On RHEL Atomic Host, the ARP garbage collection thresholds are too low, which causes problems in OCP HA deployments with 1k+ nodes.

Version-Release number of selected component (if applicable):
All

How reproducible:
Always

Steps to Reproduce:
1. Install an OCP HA cluster with a load balancer on RHEL Atomic Host.
2. Query the sysctl values for net.ipv[46].neigh.default.gc_thresh[1-3].
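
The bracket notation net.ipv[46].neigh.default.gc_thresh[1-3] is shorthand for six separate sysctl keys. A minimal sketch of expanding it for the query step follows; gc_thresh_keys is a hypothetical helper name used only for illustration and is not part of sysctl or openshift-ansible.

```shell
# Expand net.ipv[46].neigh.default.gc_thresh[1-3] into the six concrete keys.
# gc_thresh_keys is a hypothetical helper name, not part of any existing tool.
gc_thresh_keys() {
    for family in ipv4 ipv6; do
        for n in 1 2 3; do
            printf 'net.%s.neigh.default.gc_thresh%s\n' "$family" "$n"
        done
    done
}

# Usage on a Linux host (left unquoted on purpose so that each key becomes
# a separate argument to sysctl):
#   sysctl $(gc_thresh_keys)
```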

Actual results:
net.ipv4.neigh.default.gc_thresh1 = 128
net.ipv4.neigh.default.gc_thresh2 = 512
net.ipv4.neigh.default.gc_thresh3 = 1024
net.ipv6.neigh.default.gc_thresh1 = 128
net.ipv6.neigh.default.gc_thresh2 = 512
net.ipv6.neigh.default.gc_thresh3 = 1024

Expected results:
net.ipv4.neigh.default.gc_thresh1 = 8192
net.ipv4.neigh.default.gc_thresh2 = 32768
net.ipv4.neigh.default.gc_thresh3 = 65536
net.ipv6.neigh.default.gc_thresh1 = 8192
net.ipv6.neigh.default.gc_thresh2 = 32768
net.ipv6.neigh.default.gc_thresh3 = 65536
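
The comparison between actual and expected results can be scripted. The sketch below assumes input in the "key = value" form that sysctl prints and uses the expected values listed in this report (8192, 32768, 65536) as minimums. check_gc_thresh is a hypothetical helper name, and treating the expected values as lower bounds rather than exact matches is an assumption.

```shell
# Hypothetical helper: reads sysctl-style "key = value" lines on stdin and
# reports every neigh.default.gc_thresh value below the expected value from
# this report. Exits 1 if any value is too low, 0 otherwise.
check_gc_thresh() {
    awk -F' *= *' '
        BEGIN { want[1] = 8192; want[2] = 32768; want[3] = 65536; bad = 0 }
        $1 ~ /\.neigh\.default\.gc_thresh[123]$/ {
            n = substr($1, length($1), 1)   # trailing digit selects the threshold
            if ($2 + 0 < want[n]) {
                print "LOW: " $1 " = " $2 " (expected " want[n] ")"
                bad = 1
            }
        }
        END { exit bad }
    '
}
```

On a host, this could be fed with `sysctl -a 2>/dev/null | grep "neigh.default.gc_thresh" | check_gc_thresh`; no output and exit status 0 means all six thresholds are at or above the expected values.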

Additional info:
https://github.com/openshift/openshift-ansible/pull/5645

Comment 1 Johnny Liu 2017-10-13 05:41:43 UTC
Verified this bug with openshift-ansible-3.7.0-0.148.0.git.0.b35eb14.el7.noarch, and it passes.

After installation, check the values.
On LB host:
# sysctl -a |grep "neigh.default.gc_thresh"
net.ipv4.neigh.default.gc_thresh1 = 8192
net.ipv4.neigh.default.gc_thresh2 = 32768
net.ipv4.neigh.default.gc_thresh3 = 65536
net.ipv6.neigh.default.gc_thresh1 = 8192
net.ipv6.neigh.default.gc_thresh2 = 32768
net.ipv6.neigh.default.gc_thresh3 = 65536


On node host:
#  sysctl -a |grep "neigh.default.gc_thresh"
net.ipv4.neigh.default.gc_thresh1 = 8192
net.ipv4.neigh.default.gc_thresh2 = 32768
net.ipv4.neigh.default.gc_thresh3 = 65536
net.ipv6.neigh.default.gc_thresh1 = 8192
net.ipv6.neigh.default.gc_thresh2 = 32768
net.ipv6.neigh.default.gc_thresh3 = 65536

Comment 4 errata-xmlrpc 2017-11-28 22:14:33 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2017:3188