1462277 – Change the Elasticsearch setting "node.max_local_storage_nodes" to 1 to prevent sharing EBS volumes

Bug 1462277 - Change the Elasticsearch setting "node.max_local_storage_nodes" to 1 to prevent sharing EBS volumes

Summary: Change the Elasticsearch setting "node.max_local_storage_nodes" to 1 to preve...

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Logging
Sub Component:
Version:	3.4.1
Hardware:	All
OS:	All
Priority:	unspecified
Severity:	high
Target Milestone:	---
Target Release:	3.4.z
Assignee:	Jeff Cantrill
QA Contact:	Xia Zhao
Docs Contact:
URL:
Whiteboard:
Depends On:	1460564
Blocks:	1462281 1463046
TreeView+	depends on / blocked

Reported:	2017-06-16 15:13 UTC by Rich Megginson
Modified:	2019-11-04 16:48 UTC (History)
CC List:	8 users (show)
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:	Cause: Elasticsearch default value for sharing storage between ES instances was wrong Consequence: The incorrect default value allowed an ES pod starting up (when another ES pod was shutting down, e.g. during dc redeployments) to create a new location on the PV for managing the storage volume, duplicating data, and in some instances, potentially causing data loss. Fix: All ES pods now run with "node.max_local_storage_nodes" set to 1. Result: The ES pods starting up/shutting down will no longer share the same storage and prevent the data duplication and/or data loss.
Clone Of:	1460564
Clones:	1462281 1463046 (view as bug list)
Environment:
Last Closed:	2017-07-11 10:47:38 UTC
Target Upstream Version:
Embargoed:

Attachments	(Terms of Use)

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHBA-2017:1640	0	normal	SHIPPED_LIVE	OpenShift Container Platform 3.5 and 3.4 bug fix update	2017-07-11 14:47:16 UTC

Description Rich Megginson 2017-06-16 15:13:28 UTC

+++ This bug was initially created as a clone of Bug #1460564 +++

Change the setting for node.max_local_storage_nodes to 1 for all ES pods, as this would prevent us from seeing problems where two ES pods end up sharing the same EBS volume if one pod does not shut down properly.

For an example of this, see https://bugzilla.redhat.com/show_bug.cgi?id=1443350#c33

See discussion from https://discuss.elastic.co/t/multiple-folders-inside-nodes-folder/85358, and the documentation at https://www.elastic.co/guide/en/elasticsearch/reference/2.4/modules-node.html#max-local-storage-nodes.

Comment 1 Jeff Cantrill 2017-06-20 01:57:37 UTC

merged in https://github.com/openshift/openshift-ansible/pull/4466/

Comment 2 Jeff Cantrill 2017-06-20 02:03:29 UTC

Modifying this BZ to ref 3.4.1 as it clones the one for which comment 1 PR references the cloned BZ

Comment 3 Jeff Cantrill 2017-06-20 15:42:54 UTC

Upstream fix: https://github.com/openshift/origin-aggregated-logging/pull/49

Comment 4 Jeff Cantrill 2017-06-20 20:22:16 UTC

brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/logging-deployer:rhaos-3.4-rhel-7-docker-candidate-88845-20170620200020, 
brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/logging-deployer:3.4.1, 
brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/logging-deployer:latest, 
brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/logging-deployer:v3.4, 
brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/logging-deployer:v3.4.1.41, 
brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/logging-deployer:v3.4.1.41-2

Comment 6 Junqi Zhao 2017-06-23 03:06:25 UTC

max_local_storage_nodes is 1 now
# oc get configmap logging-elasticsearch -o yaml | grep -i max_local_storage_nodes
      max_local_storage_nodes: 1

Testing env:
# openshift version
openshift v3.4.1.42
kubernetes v1.4.0+776c994
etcd 3.1.0-rc.0


Images from brew registry
# docker images | grep logging
logging-deployer           3.4.1               80ca9c90d261        35 hours ago        857.5 MB
logging-kibana             3.4.1               0c2759ddfcd9        35 hours ago        338.8 MB
logging-elasticsearch      3.4.1               2240ae237369        35 hours ago        399.6 MB
logging-fluentd            3.4.1               059b92a39419        35 hours ago        232.7 MB
logging-curator            3.4.1               46fd26ad9a8b        35 hours ago        244.5 MB
logging-auth-proxy         3.4.1               990787824baf        35 hours ago        215.3 MB

Comment 7 Praveen Varma 2017-06-28 04:05:37 UTC

@Jeff - We have a situation here with regards to the errata - https://errata.devel.redhat.com/advisory/29143 where the release date is tomorrow (29th June) and the customer is looking for this for quite some time. Customer also escalated this several times and Mustafa, Sudhir, Satish and a lot of others from the senior management is directly involved to get the issues taken care for the customer. Just received an update from Xiaoli Tan that if these bugs are fixed today, we could still have the timely release tomorrow.

Thanks,
Praveen
Escalation Manager

Comment 9 errata-xmlrpc 2017-07-11 10:47:38 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:1640

Note You need to log in before you can comment on or make changes to this bug.