Bug 1477518 - The neighbor cache should be also updated for atomic host env
Status: VERIFIED
Product: OpenShift Container Platform
Classification: Red Hat
Component: Installer
Version: 3.6.0
Hardware: All
OS: All
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 3.7.0
Assigned To: jmencak
QA Contact: Johnny Liu
Depends On:
Blocks:
Reported: 2017-08-02 06:09 EDT by zhaozhanqi
Modified: 2017-09-18 05:23 EDT

See Also:
Fixed In Version: openshift-ansible-3.7.0-0.126.4
Doc Type: Known Issue
Doc Text:
Never implemented in the installer for Atomic Host. Relevant docs merged: https://github.com/openshift/openshift-docs/pull/5115
Story Points: ---
Clone Of:
Environment:
Last Closed:
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments: None

Description zhaozhanqi 2017-08-02 06:09:55 EDT
Description of problem:

Install Atomic OpenShift on Atomic Host, then check the default ARP (neighbor) cache thresholds in the node container:

# docker exec $node_container_id sysctl -a | grep net.ipv4.neigh.default.gc_thresh
net.ipv4.neigh.default.gc_thresh1 = 128
net.ipv4.neigh.default.gc_thresh2 = 512
net.ipv4.neigh.default.gc_thresh3 = 1024

The correct values should be:
net.ipv4.neigh.default.gc_thresh1 = 8192
net.ipv4.neigh.default.gc_thresh2 = 32768
net.ipv4.neigh.default.gc_thresh3 = 65536

Version-Release number of selected component (if applicable):
v3.6.173.0.2

How reproducible:
always

Steps to Reproduce:
1. Install OCP on Atomic Host.
2. Check the neighbor cache thresholds in the node container, for example:
 # docker exec ca9caa515b04 sysctl -a | grep net.ipv4.neigh.default.gc_thresh

Actual results:
docker exec ca9caa515b04 sysctl -a | grep net.ipv4.neigh.default.gc_thresh
net.ipv4.neigh.default.gc_thresh1 = 128
net.ipv4.neigh.default.gc_thresh2 = 512
net.ipv4.neigh.default.gc_thresh3 = 1024

Expected results:
net.ipv4.neigh.default.gc_thresh1 = 8192
net.ipv4.neigh.default.gc_thresh2 = 32768
net.ipv4.neigh.default.gc_thresh3 = 65536
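
The comparison between actual and expected results can be scripted. The following is a minimal sketch, not part of the installer: `check_gc_thresh` is a hypothetical helper name, and the expected values are the ones listed in this report. It reads `sysctl -a` style output on stdin and reports any threshold that is missing or differs.

```shell
#!/bin/sh
# Hypothetical helper: compare gc_thresh values read from stdin against
# the expected values given in this report.
# Prints one line per problem; returns 1 if any key is wrong or missing.
# Intended use (assumes $node_container_id is set):
#   docker exec "$node_container_id" sysctl -a | check_gc_thresh
check_gc_thresh() {
    awk '
        BEGIN {
            want["net.ipv4.neigh.default.gc_thresh1"] = 8192
            want["net.ipv4.neigh.default.gc_thresh2"] = 32768
            want["net.ipv4.neigh.default.gc_thresh3"] = 65536
        }
        # sysctl prints lines of the form "key = value"
        ($1 in want) && $2 == "=" { got[$1] = $3 }
        END {
            bad = 0
            for (k in want) {
                if (!(k in got)) {
                    printf "MISSING %s\n", k
                    bad = 1
                } else if (got[k] != want[k]) {
                    printf "MISMATCH %s: got %s, want %s\n", k, got[k], want[k]
                    bad = 1
                }
            }
            exit bad
        }
    '
}
```

Fed the output under "Actual results", the function reports three mismatches and returns a non-zero status; fed the output under "Expected results", it prints nothing and returns zero.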

Additional info:


Comment 1 Scott Dodson 2017-08-02 08:40:42 EDT
@jmencak, how should we handle tuned profiles on Atomic Host? On RHEL they're installed via the OpenShift RPMs, which are largely copies of the upstream atomic profiles. Should we push this change into the upstream atomic profile, or should we have the installer plop down a profile via Ansible? If we go that route I'd prefer removing the profiles from the RPM packaging so that they only need to be updated in one place.

Or maybe the profile could be instantiated from within the node container?
Comment 2 jmencak 2017-08-07 06:02:41 EDT
@sdodson, pushing the change into the upstream atomic profiles sounds like a hack (if not just duplication of settings), and having the installer set the profile sounds like a conflict with the "recommend" philosophy of the tuned package.  Also, this would result in a bug similar to BZ1459146 for Atomic Host too.  As for instantiating the profile from within the node container, I'm not sure what exactly you mean.  Instantiating from a router pod, for example?

I'm all for updating tuned profiles in one place though.  How about making the profiles more aware of the OS and environment they run in (RHEL/atomic-host, virtual/bare_metal) to set the relevant parent profile and "recommend" the correct profile to use?  Something similar to https://github.com/openshift/openshift-ansible/pull/4566 and https://github.com/openshift/origin/pull/14859, with Atomic Host taken into account?

How would removing the profiles from RPM packaging help update the profiles in one place?  Where would you like to have them instead?
Comment 3 Scott Dodson 2017-08-07 08:41:10 EDT
The main question is how do we deliver the profiles to atomic host? We don't install RPMs there, we execute our ansible playbooks and when the installation is done there's a set of systemd units that run docker containers or system containers.

We could make the ansible playbooks responsible for copying the files to atomic host. They could get installed inside the node container or perhaps router container as you've suggested. Both of those are privileged containers.
Comment 6 zhaozhanqi 2017-09-11 03:05:23 EDT
When I installed an Atomic Host env using the latest openshift-ansible, I found that the bug can still be reproduced.

BTW, I found that this template https://github.com/openshift/openshift-ansible/blob/master/roles/openshift_node/templates/tuned/openshift/tuned.conf is not used when installing an Atomic Host env. Could you help confirm this?
Comment 7 jmencak 2017-09-11 08:40:03 EDT
I can confirm that the current openshift-ansible no longer fixes this issue.

See https://github.com/openshift/openshift-ansible/issues/5351

The latest openshift-ansible that worked seems to be openshift-ansible-3.7.0-0.111.0-2-gf1d983527 (haven't tested, just looked at the ansible code).
Comment 12 zhaozhanqi 2017-09-17 22:20:21 EDT
Verified on OpenShift v3.7.0-0.126.4 using the latest openshift-ansible.

This issue was fixed.
Comment 13 zhaozhanqi 2017-09-18 03:07:01 EDT
Please ignore comment 12.

It seems this bug can still be reproduced when using openshift-ansible-3.7.0-0.126.4.git.0.3fc2b9b.el7.noarch.rpm

-bash-4.2# docker exec 19b8b171d9c2 oc version
oc v3.7.0-0.126.4
kubernetes v1.7.0+80709908fd
features: Basic-Auth GSSAPI Kerberos SPNEGO
-bash-4.2# docker exec 19b8b171d9c2 sysctl -a | grep net.ipv4.neigh.default.gc_thresh
net.ipv4.neigh.default.gc_thresh1 = 128
net.ipv4.neigh.default.gc_thresh2 = 512
net.ipv4.neigh.default.gc_thresh3 = 1024

Here is the Jenkins log:
https://openshift-qe-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/job/Launch%20Environment%20Flexy/21879/consoleFull
Comment 14 zhaozhanqi 2017-09-18 04:56:31 EDT
Verified this bug with 'region=infra' set during the installation.

The failure in comment 13 occurred because the node was not labeled 'region=infra' or 'region=primary'.

# cat /etc/tuned/recommend.conf 
[openshift-node]
/etc/origin/node/node-config.yaml=.*region=primary

[openshift-control-plane,master]
/etc/origin/master/master-config.yaml=.*

[openshift-control-plane,node]
/etc/origin/node/node-config.yaml=.*region=infra
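
To see why a node without a region label ends up with no OpenShift profile, the matching in the file above can be approximated as follows. This is a rough sketch under stated assumptions, not tuned's actual implementation: `recommend_profile` is a hypothetical helper, it only handles `/path=regex` rules, it matches line by line with `grep -E`, it assumes the first matching section wins, and it assumes the text after the comma in a section name is only a label that lets the same profile appear in more than one section.

```shell
#!/bin/sh
# Hypothetical approximation of recommend.conf matching.
# Usage: recommend_profile /etc/tuned/recommend.conf
# Prints the profile of the first section with a matching rule;
# returns 1 if no section matches.
recommend_profile() {
    conf=$1
    section=
    while IFS= read -r line; do
        case $line in
            ''|'#'*)
                continue
                ;;
            '['*']')
                # Section header, e.g. [openshift-control-plane,node]
                section=${line#\[}
                section=${section%\]}
                ;;
            /*=*)
                # Rule of the form /path/to/file=regex
                path=${line%%=*}
                regex=${line#*=}
                if [ -n "$section" ] && [ -f "$path" ] &&
                   grep -Eq -- "$regex" "$path"; then
                    # Assumption: drop the ",label" suffix to get the profile
                    printf '%s\n' "${section%%,*}"
                    return 0
                fi
                ;;
        esac
    done < "$conf"
    return 1
}
```

Under these assumptions, a node whose node-config.yaml contains neither `region=primary` nor `region=infra` matches no section, which is consistent with the default thresholds seen in comment 13.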
