1477518 – The neighbor cache should also be updated for Atomic Host environments

Summary:	The neighbor cache should also be updated for Atomic Host environments

Status:	CLOSED ERRATA
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Installer
Version:	3.6.0
Hardware:	All
OS:	All
Priority:	medium
Severity:	medium
Target Milestone:	---
Target Release:	3.7.0
Assignee:	Jiří Mencák
QA Contact:	Johnny Liu

Reported:	2017-08-02 10:09 UTC by zhaozhanqi
Modified:	2017-11-28 22:06 UTC
CC List:	7 users
Fixed In Version:	openshift-ansible-3.7.0-0.126.4
Doc Type:	Known Issue
Doc Text:	Never implemented in the installer for Atomic Host. Relevant docs merged: https://github.com/openshift/openshift-docs/pull/5115
Last Closed:	2017-11-28 22:06:30 UTC

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHSA-2017:3188	0	normal	SHIPPED_LIVE	Moderate: Red Hat OpenShift Container Platform 3.7 security, bug, and enhancement update	2017-11-29 02:34:54 UTC

Description zhaozhanqi 2017-08-02 10:09:55 UTC

Description of problem:

Install OpenShift (atomic-openshift) on Atomic Host. Check the default neighbor (ARP) cache thresholds in the node container:

# docker exec $node_container_id sysctl -a | grep net.ipv4.neigh.default.gc_thresh
net.ipv4.neigh.default.gc_thresh1 = 128
net.ipv4.neigh.default.gc_thresh2 = 512
net.ipv4.neigh.default.gc_thresh3 = 1024

The expected values are:
net.ipv4.neigh.default.gc_thresh1 = 8192
net.ipv4.neigh.default.gc_thresh2 = 32768
net.ipv4.neigh.default.gc_thresh3 = 65536
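The comparison above can be scripted. The following Python sketch is a hypothetical helper, not part of openshift-ansible or tuned: it parses `sysctl` output in the form shown in this report and lists every gc_thresh key whose value differs from the expected one. The function names and the `docker exec <container> sysctl <keys>` invocation in `read_from_container` are illustrative assumptions.

```python
import re
import subprocess
from typing import Dict, Optional, Tuple

# Expected neighbor (ARP) cache thresholds, as stated in this bug report.
EXPECTED: Dict[str, int] = {
    "net.ipv4.neigh.default.gc_thresh1": 8192,
    "net.ipv4.neigh.default.gc_thresh2": 32768,
    "net.ipv4.neigh.default.gc_thresh3": 65536,
}

# Matches lines such as "net.ipv4.neigh.default.gc_thresh1 = 128".
LINE_RE = re.compile(r"^(net\.ipv4\.neigh\.default\.gc_thresh[123])\s*=\s*(\d+)$")


def parse_gc_thresh(sysctl_output: str) -> Dict[str, int]:
    """Extract the gc_thresh1..3 values from sysctl output, ignoring other keys."""
    values: Dict[str, int] = {}
    for line in sysctl_output.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            values[match.group(1)] = int(match.group(2))
    return values


def check_gc_thresh(sysctl_output: str) -> Dict[str, Tuple[Optional[int], int]]:
    """Return {key: (actual, expected)} for every threshold that is missing or wrong."""
    actual = parse_gc_thresh(sysctl_output)
    return {
        key: (actual.get(key), want)
        for key, want in EXPECTED.items()
        if actual.get(key) != want
    }


def read_from_container(container_id: str) -> str:
    """Hypothetical helper: query the three keys inside a container via docker exec."""
    result = subprocess.run(
        ["docker", "exec", container_id, "sysctl", *EXPECTED],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


if __name__ == "__main__":
    # Output reported in the "Actual results" section of this bug.
    reported = (
        "net.ipv4.neigh.default.gc_thresh1 = 128\n"
        "net.ipv4.neigh.default.gc_thresh2 = 512\n"
        "net.ipv4.neigh.default.gc_thresh3 = 1024\n"
    )
    for key, (actual, want) in check_gc_thresh(reported).items():
        print(f"{key}: got {actual}, expected {want}")
```

Fed the reported output, the check flags all three keys; an empty dict means the thresholds match the expected values.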

Version-Release number of selected component (if applicable):
v3.6.173.0.2

How reproducible:
always

Steps to Reproduce:
1. Install OCP on Atomic Host.
2. Check the neighbor cache thresholds in the node container, for example:
 # docker exec ca9caa515b04 sysctl -a | grep net.ipv4.neigh.default.gc_thresh

Actual results:
docker exec ca9caa515b04 sysctl -a | grep net.ipv4.neigh.default.gc_thresh
net.ipv4.neigh.default.gc_thresh1 = 128
net.ipv4.neigh.default.gc_thresh2 = 512
net.ipv4.neigh.default.gc_thresh3 = 1024

Expected results:
net.ipv4.neigh.default.gc_thresh1 = 8192
net.ipv4.neigh.default.gc_thresh2 = 32768
net.ipv4.neigh.default.gc_thresh3 = 65536

Comment 1 Scott Dodson 2017-08-02 12:40:42 UTC

@jmencak, how should we handle tuned profiles on Atomic Host? On RHEL they're installed via the openshift RPMs, which are largely copies of the upstream atomic profiles. Should we push this change into the upstream atomic profile, or should we have the installer plop down a profile via ansible? If we go that route, I'd prefer removing the profiles from the RPM packaging so that they only need to be updated in one place.

Or maybe the profile could be instantiated from within the node container?

Comment 2 Jiří Mencák 2017-08-07 10:02:41 UTC

@sdodson, pushing the change into the upstream atomic profiles sounds like a hack (if not just a duplication of settings), and having the installer set the profile sounds like a conflict with the "recommend" philosophy of the tuned package.  Also, this would result in a bug similar to BZ1459146 for Atomic Host.  As for instantiating the profile from within the node container, I'm not sure what exactly you mean.  Instantiating from a router pod, for example?

I'm all for updating tuned profiles in one place though.  How about making the profiles more aware of the OS and environment they run in (RHEL/atomic-host, virtual/bare_metal), so that they set the relevant parent profile and "recommend" the correct profile to use?  Something similar to https://github.com/openshift/openshift-ansible/pull/4566 and https://github.com/openshift/origin/pull/14859, but taking Atomic Host into account?

How would removing the profiles from RPM packaging help update the profiles in one place?  Where would you like to have them instead?

Comment 3 Scott Dodson 2017-08-07 12:41:10 UTC

The main question is how we deliver the profiles to Atomic Host. We don't install RPMs there; we execute our ansible playbooks, and when the installation is done there's a set of systemd units that run docker containers or system containers.

We could make the ansible playbooks responsible for copying the files to atomic host. They could get installed inside the node container or perhaps router container as you've suggested. Both of those are privileged containers.

Comment 6 zhaozhanqi 2017-09-11 07:05:23 UTC

When I installed an Atomic Host environment using the latest openshift-ansible, I found that the bug can still be reproduced.

BTW, I found that this template https://github.com/openshift/openshift-ansible/blob/master/roles/openshift_node/templates/tuned/openshift/tuned.conf is not used when installing an Atomic Host environment. Could you help confirm this?

Comment 7 Jiří Mencák 2017-09-11 12:40:03 UTC

I can confirm that the current openshift-ansible no longer fixes this issue.

See https://github.com/openshift/openshift-ansible/issues/5351

The latest openshift-ansible that worked seems to be openshift-ansible-3.7.0-0.111.0-2-gf1d983527 (haven't tested, just looked at the ansible code).

Comment 12 zhaozhanqi 2017-09-18 02:20:21 UTC

Verified on OpenShift v3.7.0-0.126.4 using the latest openshift-ansible.

This issue is fixed.

Comment 13 zhaozhanqi 2017-09-18 07:07:01 UTC

Please ignore comment 12.

It seems this bug can still be reproduced when using openshift-ansible-3.7.0-0.126.4.git.0.3fc2b9b.el7.noarch.rpm

-bash-4.2# docker exec 19b8b171d9c2 oc version
oc v3.7.0-0.126.4
kubernetes v1.7.0+80709908fd
features: Basic-Auth GSSAPI Kerberos SPNEGO
-bash-4.2# docker exec 19b8b171d9c2 sysctl -a | grep net.ipv4.neigh.default.gc_thresh
net.ipv4.neigh.default.gc_thresh1 = 128
net.ipv4.neigh.default.gc_thresh2 = 512
net.ipv4.neigh.default.gc_thresh3 = 1024

Here is the Jenkins log:
https://openshift-qe-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/job/Launch%20Environment%20Flexy/21879/consoleFull

Comment 14 zhaozhanqi 2017-09-18 08:56:31 UTC

Verified this bug when setting the label 'region=infra' during installation.

The result in comment 13 was because the node did not have the label 'region=infra' or 'region=primary', which the node-config.yaml rules in /etc/tuned/recommend.conf require:

# cat /etc/tuned/recommend.conf 
[openshift-node]
/etc/origin/node/node-config.yaml=.*region=primary

[openshift-control-plane,master]
/etc/origin/master/master-config.yaml=.*

[openshift-control-plane,node]
/etc/origin/node/node-config.yaml=.*region=infra
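The recommend.conf above shows why a node without the 'region=infra' or 'region=primary' label (comment 13) got no OpenShift profile. The Python sketch below imitates the selection under assumptions worth checking against the tuned documentation: sections are tried in order and the first one whose conditions all hold wins, and a condition keyed by a file path holds only if the file exists and the regex is found somewhere in its contents (modeled here with `re.search` and `re.MULTILINE`). The `recommend` function and the sample node-config.yaml contents are hypothetical.

```python
import re
from typing import Dict, List, Optional, Tuple

# Rules copied from /etc/tuned/recommend.conf in comment 14:
# (section name, {file path: regex the file's contents must match}).
RULES: List[Tuple[str, Dict[str, str]]] = [
    ("openshift-node", {"/etc/origin/node/node-config.yaml": r".*region=primary"}),
    ("openshift-control-plane,master", {"/etc/origin/master/master-config.yaml": r".*"}),
    ("openshift-control-plane,node", {"/etc/origin/node/node-config.yaml": r".*region=infra"}),
]


def recommend(files: Dict[str, str]) -> Optional[str]:
    """Return the first section whose conditions all match, else None.

    `files` maps a path to its contents; a path that is absent from the
    dict stands for a file that does not exist (assumed to fail the condition).
    """
    for section, conditions in RULES:
        if all(
            path in files and re.search(regex, files[path], re.MULTILINE)
            for path, regex in conditions.items()
        ):
            return section
    return None


if __name__ == "__main__":
    # Hypothetical node-config.yaml fragments; real files contain much more.
    unlabeled = {"/etc/origin/node/node-config.yaml": "node-labels:\n- zone=default\n"}
    infra = {"/etc/origin/node/node-config.yaml": "node-labels:\n- region=infra\n"}
    print(recommend(unlabeled))  # None: no OpenShift section matches
    print(recommend(infra))      # openshift-control-plane,node
```

Under these assumptions, labeling the node 'region=infra' or 'region=primary' at install time is what makes one of the node sections match, which is consistent with comment 14.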

Comment 18 errata-xmlrpc 2017-11-28 22:06:30 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2017:3188
