Bug 1561196

Summary:	Update the logging role to use facts from current deployment in lieu of role defaults for ES memory limits
Product:	OpenShift Container Platform	Reporter:	Peter Portante <pportant>
Component:	Logging	Assignee:	ewolinet
Status:	CLOSED ERRATA	QA Contact:	Junqi Zhao <juzhao>
Severity:	high	Docs Contact:
Priority:	high
Version:	3.9.0	CC:	anli, aos-bugs, ewolinet, jcantril, juzhao, pportant, rmeggins, sreber, tkatarki
Target Milestone:	---
Target Release:	3.9.z
Hardware:	All
OS:	Linux
Whiteboard:
Fixed In Version:		Doc Type:	Enhancement
Doc Text:	Feature: In the absence of inventory values, reuse the values from the current deployment to preserve sane, tuned settings. Reason: When a customer had tuned the Elasticsearch cluster but had not propagated those values into inventory variables, upgrading logging applied the role defaults, which could put the cluster in a bad state and lead to loss of log data. Result: For EFK, values are honored in this order: inventory -> existing environment -> role defaults.	Story Points:	---
Clone Of:		Environment:
Last Closed:	2018-05-17 06:43:34 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:

Description Peter Portante 2018-03-27 20:54:14 UTC

The Elasticsearch v2.x sizing guidelines [1] state that machines with less than 8 GB of memory tend to be counterproductive because you end up needing many small instances, that 64 GB is the sweet spot, and that 32 GB and 16 GB are also common sizes.

Let's update the default ES pod size to 16 GB (8 GB for the Java heap and 8 GB reserved for the buffer cache) to stay in line with what is considered common.

[1] https://www.elastic.co/guide/en/elasticsearch/guide/current/hardware.html#_memory
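
As a rough illustration of the arithmetic in the proposal above, the sketch below splits a pod memory limit evenly between the Java heap and the memory left over for the buffer cache. The even split and the helper name `split_es_memory` are assumptions made for illustration; they are not taken from the logging role.

```python
def split_es_memory(pod_limit_gb: int) -> dict:
    """Split an ES pod memory limit into heap and cache parts.

    Assumes a 50/50 split, as in the 16 GB = 8 GB heap + 8 GB cache
    proposal above. Hypothetical helper, not part of openshift-ansible.
    """
    if pod_limit_gb <= 0:
        raise ValueError("pod memory limit must be positive")
    heap_gb = pod_limit_gb // 2
    return {"heap_gb": heap_gb, "cache_gb": pod_limit_gb - heap_gb}


print(split_es_memory(16))
```

With a 16 GB limit this yields 8 GB of heap and 8 GB for the cache, matching the sizes proposed above.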

Comment 1 Rich Megginson 2018-03-27 21:18:38 UTC

That would be here: https://github.com/openshift/openshift-ansible/blob/master/roles/openshift_logging/defaults/main.yml#L102
and here: https://github.com/openshift/openshift-ansible/blob/master/roles/openshift_logging/defaults/main.yml#L139
?

Comment 2 Peter Portante 2018-03-28 23:58:03 UTC

(In reply to Rich Megginson from comment #1)
> https://github.com/openshift/openshift-ansible/blob/master/roles/
> openshift_logging/defaults/main.yml#L102
> https://github.com/openshift/openshift-ansible/blob/master/roles/
> openshift_logging/defaults/main.yml#L139

Yes, thanks!

Comment 10 ewolinet 2018-04-16 21:42:38 UTC

https://github.com/openshift/openshift-ansible/pull/7985

Comment 15 Junqi Zhao 2018-05-07 06:30:07 UTC

First deploy logging, then change the fluentd nodeSelector to a non-default value, logging-infra-test-fluentd=true:

# oc get ds
NAME              DESIRED   CURRENT   READY     UP-TO-DATE   AVAILABLE   NODE SELECTOR                     AGE
logging-fluentd   2         2         2         2            2           logging-infra-test-fluentd=true   10m

After updating logging with the same inventory, the fluentd nodeSelector reverts to the default, logging-infra-fluentd=true, instead of picking up the existing nodeSelector from the environment:

# oc get ds
NAME              DESIRED   CURRENT   READY     UP-TO-DATE   AVAILABLE   NODE SELECTOR                AGE
logging-fluentd   2         2         2         2            2           logging-infra-fluentd=true   15m


# rpm -qa | grep openshift-ansible
openshift-ansible-roles-3.9.28-1.git.0.4fc2ce4.el7.noarch
openshift-ansible-docs-3.9.28-1.git.0.4fc2ce4.el7.noarch
openshift-ansible-playbooks-3.9.28-1.git.0.4fc2ce4.el7.noarch
openshift-ansible-3.9.28-1.git.0.4fc2ce4.el7.noarch
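
The before-and-after comparison above could be scripted roughly as follows. This is only a sketch: it assumes the tabular `oc get ds` output shown in this comment, with columns separated by at least two spaces, and the helper name `node_selectors` is hypothetical.

```python
import re


def node_selectors(oc_get_ds_output: str) -> dict:
    """Map each daemonset name to its NODE SELECTOR column value.

    Assumes columns are separated by two or more spaces, as in the
    `oc get ds` output pasted in this bug.
    """
    lines = [ln for ln in oc_get_ds_output.splitlines() if ln.strip()]
    header = re.split(r"\s{2,}", lines[0].strip())
    name_idx = header.index("NAME")
    selector_idx = header.index("NODE SELECTOR")
    result = {}
    for line in lines[1:]:
        cols = re.split(r"\s{2,}", line.strip())
        result[cols[name_idx]] = cols[selector_idx]
    return result


# Output copied from the two `oc get ds` runs in this comment.
before = """
NAME              DESIRED   CURRENT   READY     UP-TO-DATE   AVAILABLE   NODE SELECTOR                     AGE
logging-fluentd   2         2         2         2            2           logging-infra-test-fluentd=true   10m
"""
after = """
NAME              DESIRED   CURRENT   READY     UP-TO-DATE   AVAILABLE   NODE SELECTOR                AGE
logging-fluentd   2         2         2         2            2           logging-infra-fluentd=true   15m
"""

changed = node_selectors(before) != node_selectors(after)
print("nodeSelector changed by update:", changed)
```

For the output pasted here, the selectors differ, which is the behavior this comment reports.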

Comment 17 Jeff Cantrill 2018-05-07 17:30:10 UTC

The reported BZ is specific to memory and CPU settings. I am of the opinion that the fluentd nodeSelector issue should not block this test. We should consider opening a separate BZ to resolve fluentd-related issues.

Comment 18 Junqi Zhao 2018-05-08 08:18:33 UTC

Tested: ES memory limits are now taken from the existing deployment instead of the role defaults.

Polarion test case OCP-18917
# rpm -qa | grep openshift-ansible
openshift-ansible-roles-3.9.27-1.git.0.52e35b5.el7.noarch
openshift-ansible-docs-3.9.27-1.git.0.52e35b5.el7.noarch
openshift-ansible-playbooks-3.9.27-1.git.0.52e35b5.el7.noarch
openshift-ansible-3.9.27-1.git.0.52e35b5.el7.noarch
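
The precedence described in the Doc Text (inventory, then existing environment, then role defaults) can be sketched as a simple lookup chain. This is an illustration only, not the code from the linked pull request; the function name `resolve_setting` and the sample values are assumptions.

```python
from typing import Mapping


def resolve_setting(
    name: str,
    inventory: Mapping[str, str],
    existing: Mapping[str, str],
    role_defaults: Mapping[str, str],
) -> str:
    """Return the first non-empty value for `name`, checking the
    inventory, then the existing deployment, then the role defaults."""
    for source in (inventory, existing, role_defaults):
        value = source.get(name)
        if value:  # skip missing and empty values
            return value
    raise KeyError(name)


# Illustrative values only; the real defaults live in the role.
role_defaults = {"openshift_logging_es_memory_limit": "8Gi"}
existing = {"openshift_logging_es_memory_limit": "32Gi"}  # tuned by hand
inventory = {}  # the tuned value was never written to the inventory

print(resolve_setting("openshift_logging_es_memory_limit",
                      inventory, existing, role_defaults))
```

In this example the hand-tuned limit from the existing deployment wins over the role default because the inventory does not set the variable; setting it in the inventory would override both.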

Comment 19 Junqi Zhao 2018-05-08 08:49:36 UTC

The issue in comment 15 is reported as bug 1575901.

Comment 22 errata-xmlrpc 2018-05-17 06:43:34 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:1566