Description of problem:
After setting up an OCP 3.11 cluster, the ARP cache thresholds were not set on the nodes. The values found were:

net.ipv4.neigh.default.gc_thresh1 = 128
net.ipv4.neigh.default.gc_thresh2 = 512
net.ipv4.neigh.default.gc_thresh3 = 1024
net.ipv6.neigh.default.gc_thresh1 = 128
net.ipv6.neigh.default.gc_thresh2 = 512
net.ipv6.neigh.default.gc_thresh3 = 1024

Version-Release number of the following components:
rpm -q openshift-ansible
openshift-ansible-roles-3.11.0-0.11.0.git.0.3c66516None.noarch.rpm

oc v3.11.0-0.11.0
kubernetes v1.11.0+d4cacc0
features: Basic-Auth GSSAPI Kerberos SPNEGO

How reproducible:

Steps to Reproduce:
1. Set up the 3.11 OCP cluster with openshift-ansible.
2. Check the ARP cache size on the nodes (except the master node):
   sysctl -a | grep "neigh.default.gc_thresh"

Actual results:
sysctl -a | grep "neigh.default.gc_thresh"
net.ipv4.neigh.default.gc_thresh1 = 128
net.ipv4.neigh.default.gc_thresh2 = 512
net.ipv4.neigh.default.gc_thresh3 = 1024
net.ipv6.neigh.default.gc_thresh1 = 128
net.ipv6.neigh.default.gc_thresh2 = 512
net.ipv6.neigh.default.gc_thresh3 = 1024

Expected results:
net.ipv4.neigh.default.gc_thresh1 = 8192
net.ipv4.neigh.default.gc_thresh2 = 32768
net.ipv4.neigh.default.gc_thresh3 = 65536
net.ipv6.neigh.default.gc_thresh1 = 8192
net.ipv6.neigh.default.gc_thresh2 = 32768
net.ipv6.neigh.default.gc_thresh3 = 65536

Additional info:
After restarting the 'tuned' service on the node, the values become correct.
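The check in step 2 can be scripted so that a node is flagged automatically. The following is only a minimal sketch: the function names expected_thresholds and check_gc_thresh are hypothetical helpers, not part of openshift-ansible or tuned, and the expected values are copied from "Expected results" in this report.

```shell
#!/bin/sh
# Hypothetical helper: print the thresholds this report expects on a node.
expected_thresholds() {
    cat <<'EOF'
net.ipv4.neigh.default.gc_thresh1 = 8192
net.ipv4.neigh.default.gc_thresh2 = 32768
net.ipv4.neigh.default.gc_thresh3 = 65536
net.ipv6.neigh.default.gc_thresh1 = 8192
net.ipv6.neigh.default.gc_thresh2 = 32768
net.ipv6.neigh.default.gc_thresh3 = 65536
EOF
}

# Hypothetical helper: read "key = value" lines (the format printed by
# `sysctl -a`) on stdin, keep only the neigh.default.gc_thresh entries,
# and compare them with the expected set.
# Prints OK and returns 0 on a match; prints MISMATCH and returns 1 otherwise.
check_gc_thresh() {
    actual=$(grep 'neigh.default.gc_thresh' | sort)
    wanted=$(expected_thresholds | sort)
    if [ "$actual" = "$wanted" ]; then
        echo "OK"
        return 0
    fi
    echo "MISMATCH"
    return 1
}
```

On a node this could be used as `sysctl -a 2>/dev/null | check_gc_thresh`. If it reports MISMATCH, restarting tuned (`systemctl restart tuned`) is the workaround noted under "Additional info".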
This is probably a case where the node-config.yaml update does not cause tuned to apply the desired configuration, because tuned does not yet have the expected labels at the time it is started. node-config.yaml is updated during the bootstrap process, so we need to figure out a way to trigger a tuned restart.
We might be able to use what Seth suggests in BZ1569917, i.e. match against BOOTSTRAP_CONFIG_NAME in /etc/sysconfig/atomic-openshift-node until tuned is shipped containerized and uses its own logic to get node labels. Will try to look into it and create a PR today/tomorrow.
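As a rough illustration of matching against BOOTSTRAP_CONFIG_NAME, the sketch below reads the variable from a sysconfig-style file and maps it to a tuned profile. This is not the change that was eventually merged. The helper names, the example value node-config-compute, and the mapping to the openshift-control-plane and openshift-node profiles are all assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print the value of BOOTSTRAP_CONFIG_NAME from a
# sysconfig-style file ($1), e.g. /etc/sysconfig/atomic-openshift-node.
# If the variable is set more than once, the last assignment wins;
# surrounding quotes are stripped.
bootstrap_config_name() {
    sed -n 's/^BOOTSTRAP_CONFIG_NAME=//p' "$1" | tail -n 1 | tr -d "\"'"
}

# Hypothetical mapping from a bootstrap config name ($1) to a tuned profile.
# Assumption: names containing "master" get the control-plane profile and
# every other non-empty name gets the node profile.
profile_for_config() {
    case "$1" in
        '')       return 1 ;;
        *master*) echo "openshift-control-plane" ;;
        *)        echo "openshift-node" ;;
    esac
}
```

A possible use, under the same assumptions, would be `tuned-adm profile "$(profile_for_config "$(bootstrap_config_name /etc/sysconfig/atomic-openshift-node)")"`, run once the bootstrap process has written the file.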
Because this isn't a regression between 3.10 and 3.11 (it's broken in 3.10 too), I'm moving this to 3.11.z, as it does not meet the criteria for a 3.11.0 blocker.
Addressed upstream by https://github.com/openshift/openshift-ansible/pull/10102
Verified this bug on openshift-ansible-4.0.0-0.52.0.git.0.24249e7.el7.noarch.rpm
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:0024