Bug 1477518 - The neighbor cache should be also updated for atomic host env
Status: CLOSED ERRATA
Product: OpenShift Container Platform
Classification: Red Hat
Component: Installer
Version: 3.6.0
Hardware: All
OS: All
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 3.7.0
Assigned To: jmencak
QA Contact: Johnny Liu
 
Reported: 2017-08-02 06:09 EDT by zhaozhanqi
Modified: 2017-11-28 17:06 EST (History)
Fixed In Version: openshift-ansible-3.7.0-0.126.4
Doc Type: Known Issue
Doc Text:
Never implemented in the installer for Atomic Host. Relevant docs merged: https://github.com/openshift/openshift-docs/pull/5115
Last Closed: 2017-11-28 17:06:30 EST
Type: Bug


External Trackers
Tracker: Red Hat Product Errata
ID: RHSA-2017:3188
Priority: normal
Status: SHIPPED_LIVE
Summary: Moderate: Red Hat OpenShift Container Platform 3.7 security, bug, and enhancement update
Last Updated: 2017-11-28 21:34:54 EST

Description zhaozhanqi 2017-08-02 06:09:55 EDT
Description of problem:

Install Atomic OpenShift on Atomic Host, then check the default ARP (neighbor) cache thresholds in the node container:

# docker exec $node_container_id sysctl -a | grep net.ipv4.neigh.default.gc_thresh
net.ipv4.neigh.default.gc_thresh1 = 128
net.ipv4.neigh.default.gc_thresh2 = 512
net.ipv4.neigh.default.gc_thresh3 = 1024

The correct values should be:
net.ipv4.neigh.default.gc_thresh1 = 8192
net.ipv4.neigh.default.gc_thresh2 = 32768
net.ipv4.neigh.default.gc_thresh3 = 65536

Version-Release number of selected component (if applicable):
v3.6.173.0.2

How reproducible:
always

Steps to Reproduce:
1. Install OCP on Atomic Host.
2. Check the neighbor cache thresholds in the node container, for example:
 # docker exec ca9caa515b04 sysctl -a | grep net.ipv4.neigh.default.gc_thresh

Actual results:
docker exec ca9caa515b04 sysctl -a | grep net.ipv4.neigh.default.gc_thresh
net.ipv4.neigh.default.gc_thresh1 = 128
net.ipv4.neigh.default.gc_thresh2 = 512
net.ipv4.neigh.default.gc_thresh3 = 1024

Expected results:
net.ipv4.neigh.default.gc_thresh1 = 8192
net.ipv4.neigh.default.gc_thresh2 = 32768
net.ipv4.neigh.default.gc_thresh3 = 65536
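
The comparison between actual and expected results can be scripted. The following is a minimal sketch, assuming `sysctl -a` style output is piped in on stdin; the function name `check_neigh_thresh` is hypothetical and is not part of openshift-ansible or tuned.

```shell
#!/bin/bash
# Hypothetical helper: compare the neighbor-cache GC thresholds against the
# values this report lists as expected. Reads `sysctl -a` style lines
# ("key = value") on stdin; exits 0 if all three match, 1 otherwise.
check_neigh_thresh() {
  awk -F' *= *' '
    BEGIN {
      bad = 0
      want["net.ipv4.neigh.default.gc_thresh1"] = 8192
      want["net.ipv4.neigh.default.gc_thresh2"] = 32768
      want["net.ipv4.neigh.default.gc_thresh3"] = 65536
    }
    ($1 in want) {
      seen[$1] = 1
      if ($2 + 0 != want[$1]) {
        printf "MISMATCH %s: got %s, want %d\n", $1, $2, want[$1]
        bad = 1
      }
    }
    END {
      # Report any expected key that never appeared in the input.
      for (k in want) if (!(k in seen)) { printf "MISSING %s\n", k; bad = 1 }
      exit bad
    }'
}
```

It could be used as `docker exec "$node_container_id" sysctl -a 2>/dev/null | check_neigh_thresh`, where `$node_container_id` is the node container ID as in the commands above.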

Additional info:


Comment 1 Scott Dodson 2017-08-02 08:40:42 EDT
@jmencak, how should we handle tuned profiles on Atomic Host? On RHEL they're installed via the OpenShift RPMs and are largely copies of the upstream atomic profiles. Should we push this change into the upstream atomic profile, or should we have the installer plop down a profile via Ansible? If we go that route, I'd prefer removing the profiles from the RPM packaging so that they only need to be updated in one place.

Or maybe the profile could be instantiated from within the node container?
Comment 2 jmencak 2017-08-07 06:02:41 EDT
@sdodson, pushing the change into the upstream atomic profiles sounds like a hack (if not just duplication of settings), and having the installer set the profile sounds like a conflict with the "recommend" philosophy of the tuned package.  Also, this would result in a bug similar to BZ1459146 for Atomic Host too.  As for instantiating the profile from within the node container, I'm not sure what exactly you mean.  Instantiating from a router pod, for example?

I'm all for updating tuned profiles in one place, though.  How about making the profiles more aware of the OS and environment they run in (RHEL/atomic-host, virtual/bare_metal), so that they set the relevant parent profile and "recommend" the correct profile to use?  Something similar to https://github.com/openshift/openshift-ansible/pull/4566 and https://github.com/openshift/origin/pull/14859, with Atomic Host taken into account?

How would removing the profiles from RPM packaging help update the profiles in one place?  Where would you like to have them instead?
Comment 3 Scott Dodson 2017-08-07 08:41:10 EDT
The main question is how do we deliver the profiles to atomic host? We don't install RPMs there, we execute our ansible playbooks and when the installation is done there's a set of systemd units that run docker containers or system containers.

We could make the ansible playbooks responsible for copying the files to atomic host. They could get installed inside the node container or perhaps router container as you've suggested. Both of those are privileged containers.
Comment 6 zhaozhanqi 2017-09-11 03:05:23 EDT
When I installed an Atomic Host environment using the latest openshift-ansible, I found that the bug can still be reproduced.

BTW, I found that this template, https://github.com/openshift/openshift-ansible/blob/master/roles/openshift_node/templates/tuned/openshift/tuned.conf, is not used when installing an Atomic Host environment. Could you help confirm this?
Comment 7 jmencak 2017-09-11 08:40:03 EDT
I can confirm that the current openshift-ansible no longer fixes this issue.

see https://github.com/openshift/openshift-ansible/issues/5351

The latest openshift-ansible that worked seems to be openshift-ansible-3.7.0-0.111.0-2-gf1d983527 (haven't tested, just looked at the ansible code).
Comment 12 zhaozhanqi 2017-09-17 22:20:21 EDT
Verified on OpenShift v3.7.0-0.126.4 using the latest openshift-ansible.

This issue is fixed.
Comment 13 zhaozhanqi 2017-09-18 03:07:01 EDT
Please ignore comment 12.

It seems this bug can still be reproduced when using openshift-ansible-3.7.0-0.126.4.git.0.3fc2b9b.el7.noarch.rpm:

-bash-4.2# docker exec 19b8b171d9c2 oc version
oc v3.7.0-0.126.4
kubernetes v1.7.0+80709908fd
features: Basic-Auth GSSAPI Kerberos SPNEGO
-bash-4.2# docker exec 19b8b171d9c2 sysctl -a | grep net.ipv4.neigh.default.gc_thresh
net.ipv4.neigh.default.gc_thresh1 = 128
net.ipv4.neigh.default.gc_thresh2 = 512
net.ipv4.neigh.default.gc_thresh3 = 1024

Here is the Jenkins log:
https://openshift-qe-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/job/Launch%20Environment%20Flexy/21879/consoleFull
Comment 14 zhaozhanqi 2017-09-18 04:56:31 EDT
Verified this bug with 'region=infra' set during the installation.

The failure in comment 13 occurred because the node had neither the 'region=infra' nor the 'region=primary' label set.

# cat /etc/tuned/recommend.conf 
[openshift-node]
/etc/origin/node/node-config.yaml=.*region=primary

[openshift-control-plane,master]
/etc/origin/master/master-config.yaml=.*

[openshift-control-plane,node]
/etc/origin/node/node-config.yaml=.*region=infra
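
The recommend.conf above suggests why the node in comment 13 kept the default thresholds. As far as I understand tuned's recommend logic, sections are evaluated top to bottom, each `path=regex` option is matched against the contents of that file, and the first section whose options all match names the recommended profile (the text after a comma in a section name only keeps section names unique). The sketch below mimics that selection for the three rules shown, under those assumptions. `recommend_profile` is a hypothetical helper, not tuned's implementation, and it takes the config file paths as arguments instead of reading /etc/origin directly.

```shell
#!/bin/bash
# Hypothetical sketch of the profile selection implied by the recommend.conf
# above. Usage: recommend_profile NODE_CONFIG [MASTER_CONFIG]
# Prints the matching profile name; returns 1 if no section matches.
recommend_profile() {
  # $1: path to node-config.yaml, $2: optional path to master-config.yaml
  if [ -f "$1" ] && grep -q 'region=primary' "$1"; then
    # [openshift-node]: node config mentions region=primary
    echo openshift-node
  elif [ -n "${2:-}" ] && [ -f "$2" ]; then
    # [openshift-control-plane,master]: master config exists (regex .*)
    echo openshift-control-plane
  elif [ -f "$1" ] && grep -q 'region=infra' "$1"; then
    # [openshift-control-plane,node]: node config mentions region=infra
    echo openshift-control-plane
  else
    echo "no openshift profile matched"
    return 1
  fi
}
```

With a node config that carries neither region label and no master config, none of the three sections match, which is consistent with the behavior reported in comment 13.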
Comment 18 errata-xmlrpc 2017-11-28 17:06:30 EST
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2017:3188
