Description of problem:
Install atomic openshift on Atomic Host, then check the ARP cache garbage-collection thresholds inside the node container:

# docker exec $node_container_id sysctl -a | grep net.ipv4.neigh.default.gc_thresh
net.ipv4.neigh.default.gc_thresh1 = 128
net.ipv4.neigh.default.gc_thresh2 = 512
net.ipv4.neigh.default.gc_thresh3 = 1024

The correct values should be:
net.ipv4.neigh.default.gc_thresh1 = 8192
net.ipv4.neigh.default.gc_thresh2 = 32768
net.ipv4.neigh.default.gc_thresh3 = 65536

Version-Release number of selected component (if applicable):
v3.6.173.0.2

How reproducible:
Always

Steps to Reproduce:
1. Install OCP on Atomic Host.
2. Check the ARP cache thresholds in the node container, e.g.:
   # docker exec ca9caa515b04 sysctl -a | grep net.ipv4.neigh.default.gc_thresh

Actual results:
# docker exec ca9caa515b04 sysctl -a | grep net.ipv4.neigh.default.gc_thresh
net.ipv4.neigh.default.gc_thresh1 = 128
net.ipv4.neigh.default.gc_thresh2 = 512
net.ipv4.neigh.default.gc_thresh3 = 1024

Expected results:
net.ipv4.neigh.default.gc_thresh1 = 8192
net.ipv4.neigh.default.gc_thresh2 = 32768
net.ipv4.neigh.default.gc_thresh3 = 65536
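For reference, the expected values are the kind of thing a tuned profile's sysctl plugin applies. A minimal sketch of a profile that would set them by hand looks like this; the profile name and path are illustrative, not the shipped OpenShift packaging:

# /etc/tuned/openshift-node-arp/tuned.conf  (illustrative profile name)
[main]
summary=Raise ARP cache garbage-collection thresholds for OpenShift nodes

[sysctl]
net.ipv4.neigh.default.gc_thresh1=8192
net.ipv4.neigh.default.gc_thresh2=32768
net.ipv4.neigh.default.gc_thresh3=65536

It would be activated on the host with:

# tuned-adm profile openshift-node-arp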
@jmencak, how should we handle tuned profiles on Atomic Host? On RHEL they're installed via OpenShift RPMs, which are largely copies of the upstream atomic profiles. Should we push this change into the upstream atomic profile, or should we have the installer plop down a profile via Ansible? If we go that route, I'd prefer removing the profiles from the RPM packaging so that they only need to be updated in one place. Or maybe the profile could be instantiated from within the node container?
@sdodson, pushing the change into the upstream atomic profiles sounds like a hack (if not just settings duplication), and having the installer set the profile sounds like a conflict with the "recommend" philosophy of the tuned package. Also, this would result in a bug similar to BZ1459146 on Atomic Host too. As for instantiating the profile from within the node container, I'm not sure what exactly you mean. Instantiating from a router pod, for example? I'm all for updating tuned profiles in one place, though. How about making the profiles more aware of the OS and environment they run in (RHEL/Atomic Host, virtual/bare metal), so they set the relevant parent profile and "recommend" the correct profile to use? Something similar to https://github.com/openshift/openshift-ansible/pull/4566 and https://github.com/openshift/origin/pull/14859, with Atomic Host taken into account? How would removing the profiles from RPM packaging help update the profiles in one place? Where would you like to have them instead?
The main question is how we deliver the profiles to Atomic Host. We don't install RPMs there; we execute our Ansible playbooks, and when the installation is done there's a set of systemd units that run Docker containers or system containers. We could make the Ansible playbooks responsible for copying the files to Atomic Host. They could get installed inside the node container or perhaps the router container, as you've suggested. Both of those are privileged containers.
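As a rough illustration of that approach, an Ansible task list that copies a profile directory onto the host and activates it might look like the sketch below. The source path and profile name are assumptions for illustration, not the actual openshift-ansible implementation:

# Hypothetical task list; not the shipped openshift-ansible role.
- name: Install the OpenShift node tuned profile on the host
  copy:
    src: tuned/openshift-node/          # assumed location of the profile in the role
    dest: /etc/tuned/openshift-node/
    owner: root
    group: root
    mode: '0644'

- name: Activate the profile
  command: tuned-adm profile openshift-node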
When I installed an Atomic Host env using the latest openshift-ansible, I found the bug can still be reproduced. BTW, I found that this template https://github.com/openshift/openshift-ansible/blob/master/roles/openshift_node/templates/tuned/openshift/tuned.conf is not used during an Atomic Host install. Could you help confirm this?
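A quick way to check whether any tuned profile actually got applied is the standard tuned-adm query; the container ID is illustrative, and the in-container check only works if tuned runs inside the node container:

# tuned-adm active                                   (on the host)
# docker exec <node_container_id> tuned-adm active   (inside the node container, if tuned runs there)
# docker exec <node_container_id> sysctl net.ipv4.neigh.default.gc_thresh1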
I can confirm that the current openshift-ansible no longer fixes this issue; see https://github.com/openshift/openshift-ansible/issues/5351. The latest openshift-ansible that worked seems to be openshift-ansible-3.7.0-0.111.0-2-gf1d983527 (haven't tested, just looked at the Ansible code).
Verified on openshift v3.7.0-0.126.4 using the latest openshift-ansible; this issue was fixed.
Please ignore comment 12. It seems this bug can still be reproduced when using openshift-ansible-3.7.0-0.126.4.git.0.3fc2b9b.el7.noarch.rpm:

-bash-4.2# docker exec 19b8b171d9c2 oc version
oc v3.7.0-0.126.4
kubernetes v1.7.0+80709908fd
features: Basic-Auth GSSAPI Kerberos SPNEGO

-bash-4.2# docker exec 19b8b171d9c2 sysctl -a | grep net.ipv4.neigh.default.gc_thresh
net.ipv4.neigh.default.gc_thresh1 = 128
net.ipv4.neigh.default.gc_thresh2 = 512
net.ipv4.neigh.default.gc_thresh3 = 1024

Here is the Jenkins log: https://openshift-qe-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/job/Launch%20Environment%20Flexy/21879/consoleFull
Verified this bug with 'region=infra' set during the installation. Comment 13 is due to the node not carrying the label 'region=primary' or 'region=infra':

# cat /etc/tuned/recommend.conf
[openshift-node]
/etc/origin/node/node-config.yaml=.*region=primary

[openshift-control-plane,master]
/etc/origin/master/master-config.yaml=.*

[openshift-control-plane,node]
/etc/origin/node/node-config.yaml=.*region=infra
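For anyone hitting the same mismatch: the recommend rules above match against the contents of node-config.yaml, so the region label has to end up in that file, e.g. via openshift_node_labels in the Ansible inventory at install time. A sketch of an inventory host entry (the hostname is illustrative):

# [nodes] section of the inventory; openshift_node_labels feeds node-config.yaml
node1.example.com openshift_node_labels="{'region': 'primary', 'zone': 'default'}"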
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2017:3188