Bug 1561196 - Update the logging role to use facts from current deployment in lieu of role defaults for ES memory limits
Summary: Update the logging role to use facts from current deployment in lieu of role defaults for ES memory limits
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Logging
Version: 3.9.0
Hardware: All
OS: Linux
Priority: high
Severity: high
Target Milestone: ---
Target Release: 3.9.z
Assignee: ewolinet
QA Contact: Junqi Zhao
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2018-03-27 20:54 UTC by Peter Portante
Modified: 2018-05-17 06:44 UTC
9 users

Fixed In Version:
Doc Type: Enhancement
Doc Text:
Feature: In the absence of inventory values, the logging role reuses the values from the current deployment to preserve sane/tuned values.
Reason: In the case of Elasticsearch, when a customer had tuned the cluster but did not propagate those values into inventory variables, upgrading logging would apply the role default values, which could put the cluster in a bad state and lead to loss of log data.
Result: For EFK, values are now honored in the order: inventory -> existing environment -> role defaults.
Clone Of:
Environment:
Last Closed: 2018-05-17 06:43:34 UTC
Target Upstream Version:
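
The precedence described in the Doc Text above can be sketched as follows. This is a minimal illustration only; the function name and the example values are assumptions, not the actual openshift-ansible code:

```python
def resolve_memory_limit(inventory_value, deployed_value, role_default):
    """Return the ES memory limit honoring the EFK precedence:
    inventory -> existing environment -> role defaults."""
    if inventory_value is not None:
        return inventory_value
    if deployed_value is not None:
        return deployed_value
    return role_default

# An upgraded cluster with a tuned limit but no inventory override keeps
# its tuned value instead of reverting to the role default.
print(resolve_memory_limit(None, "16Gi", "8Gi"))  # -> 16Gi
```

The point of the ordering is that an explicit inventory value always wins, and the role default is only a last resort when neither the inventory nor the running deployment supplies a value.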


Attachments


Links
System ID Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2018:1566 None None None 2018-05-17 06:44:07 UTC

Description Peter Portante 2018-03-27 20:54:14 UTC
The Elasticsearch v2.x sizing guidelines [1] state that less than 8 GB ends up with too many small instances, with 64 GB being the sweet spot, but 32 GB and 16 GB being common sizes.

Let's update the default ES pod size to 16 GB (8 GB Java heap and 8 GB reserved for the buffer cache) to stay in line with what is considered common.

[1] https://www.elastic.co/guide/en/elasticsearch/guide/current/hardware.html#_memory
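
The split proposed above follows the common Elasticsearch guidance of giving half of a node's memory to the JVM heap and leaving the rest for the OS buffer cache. A quick sketch (the function name is illustrative, not part of any deployed tooling):

```python
def es_sizing(pod_memory_gb):
    """Split an Elasticsearch pod's memory per the elastic.co guidance:
    half to the JVM heap, the remainder left for the OS buffer cache."""
    heap = pod_memory_gb // 2
    return {"java_heap_gb": heap, "buffer_cache_gb": pod_memory_gb - heap}

# The proposed 16 GB default yields the 8 GB / 8 GB split described above.
print(es_sizing(16))  # -> {'java_heap_gb': 8, 'buffer_cache_gb': 8}
```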

Comment 2 Peter Portante 2018-03-28 23:58:03 UTC
(In reply to Rich Megginson from comment #1)
> https://github.com/openshift/openshift-ansible/blob/master/roles/
> openshift_logging/defaults/main.yml#L102
> https://github.com/openshift/openshift-ansible/blob/master/roles/
> openshift_logging/defaults/main.yml#L139

Yes, thanks!
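
For context, the linked defaults file holds entries of roughly the following shape. The variable name matches the openshift_logging role's naming convention, but the value shown is an example, not verified against the file at those line numbers:

```yaml
# Illustrative shape of a role default referenced above; check
# roles/openshift_logging/defaults/main.yml for the actual entries.
openshift_logging_es_memory_limit: 8Gi
```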

Comment 15 Junqi Zhao 2018-05-07 06:30:07 UTC
Deployed logging first, then changed the fluentd nodeSelector to a non-default value, logging-infra-test-fluentd=true:

# oc get ds
NAME              DESIRED   CURRENT   READY     UP-TO-DATE   AVAILABLE   NODE SELECTOR                     AGE
logging-fluentd   2         2         2         2            2           logging-infra-test-fluentd=true   10m

Updated logging with the same inventory; the fluentd nodeSelector reverts to the default logging-infra-fluentd=true instead of picking up the existing nodeSelector from the environment:

# oc get ds
NAME              DESIRED   CURRENT   READY     UP-TO-DATE   AVAILABLE   NODE SELECTOR                AGE
logging-fluentd   2         2         2         2            2           logging-infra-fluentd=true   15m


# rpm -qa | grep openshift-ansible
openshift-ansible-roles-3.9.28-1.git.0.4fc2ce4.el7.noarch
openshift-ansible-docs-3.9.28-1.git.0.4fc2ce4.el7.noarch
openshift-ansible-playbooks-3.9.28-1.git.0.4fc2ce4.el7.noarch
openshift-ansible-3.9.28-1.git.0.4fc2ce4.el7.noarch

Comment 17 Jeff Cantrill 2018-05-07 17:30:10 UTC
The reported BZ is specific to memory and CPU settings.  I am of the opinion that it should not block this test.  We should consider opening a separate BZ to resolve the fluentd-related issues.

Comment 18 Junqi Zhao 2018-05-08 08:18:33 UTC
Tested; ES memory limits are taken from the existing deployment instead of using the defaults.

Polarion test case OCP-18917
# rpm -qa | grep openshift-ansible
openshift-ansible-roles-3.9.27-1.git.0.52e35b5.el7.noarch
openshift-ansible-docs-3.9.27-1.git.0.52e35b5.el7.noarch
openshift-ansible-playbooks-3.9.27-1.git.0.52e35b5.el7.noarch
openshift-ansible-3.9.27-1.git.0.52e35b5.el7.noarch

Comment 19 Junqi Zhao 2018-05-08 08:49:36 UTC
The issue in Comment 15 is reported in bug 1575901.

Comment 22 errata-xmlrpc 2018-05-17 06:43:34 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:1566

