Bug 1371220 - Scaling down ElasticSearch creates new node directories
Summary: Scaling down ElasticSearch creates new node directories
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Logging
Version: 3.2.1
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Luke Meyer
QA Contact: Xia Zhao
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2016-08-29 15:28 UTC by Eric Jones
Modified: 2019-12-16 06:32 UTC
CC: 6 users

Fixed In Version:
Doc Type: Enhancement
Doc Text:
Feature: The EFK deployer now configures terminationGracePeriodSeconds for the Elasticsearch and Fluentd pods.
Reason: We observed that Elasticsearch in particular would sometimes end up in a state where it did not remove its node.lock file at shutdown. If Elasticsearch shuts down properly, this file is deleted, but if shutdown takes too long, OpenShift hard-kills the pod after 30 seconds by default. If the node.lock is not removed from persistent storage, then when the instance starts again Elasticsearch treats the existing data directory as locked and starts with a fresh data directory, effectively losing all of its data.
Result: The explicit terminationGracePeriodSeconds gives both Fluentd and Elasticsearch more time to flush data and terminate properly, so this situation should occur less often. It cannot be completely eliminated; for example, if ES runs into an out-of-memory situation, it may hang indefinitely and still end up being killed, leaving the node.lock file behind. But this extended termination time should make normal shutdown scenarios safer.
Clone Of:
Environment:
Last Closed: 2016-10-27 15:43:15 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2016:2085 0 normal SHIPPED_LIVE Red Hat OpenShift Container Platform 3.3.1.3 images bug fix update 2016-10-27 19:41:37 UTC

Description Eric Jones 2016-08-29 15:28:16 UTC
Description of problem:
Engineering indicated that they recently ran into a problem with a deployment where scaling down ES does not shut Elasticsearch down cleanly, so on the next start it creates new node directories, essentially stranding the existing data and triggering new relocation operations.

Anytime you scale down your ES nodes, or they fail, this can happen.

It leaves a lock file behind in those node directories, which causes a pod scale-up to see the EBS volume as "in-use" by another node, so Elasticsearch creates another copy of the data.

Additional info:
The information below indicates that this cluster has had the issue occur twice previously:

sh-4.2$ ls /elasticsearch/persistent/logging-es/data/logging-es/nodes/
0  1  2

Comment 8 Eric Jones 2016-08-30 17:51:06 UTC
Given that this issue is a current problem, does it change the recommendation we provide in our documentation [0]? Keep in mind that this recommendation has become the go-to procedure for most changes that need to be made to the EFK stack.

[0] https://docs.openshift.com/enterprise/3.2/install_config/upgrading/manual_upgrades.html#manual-upgrading-efk-logging-stack

Comment 12 ewolinet 2016-09-08 21:16:44 UTC
We can probably recommend increasing the terminationGracePeriodSeconds in the Elasticsearch pod spec (within the DC).  The default is 30 seconds, and if ES isn't able to finish its tasks during this time, it will then be issued a SIGKILL.

If ES isn't able to release its locks before it is killed, it will create these additional node directories on the next start.
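
As a rough illustration only (the DC name logging-es and the 600-second value are examples, not values prescribed by the deployer), the grace period could be raised with something like:

oc patch dc/logging-es -p '{"spec":{"template":{"spec":{"terminationGracePeriodSeconds":600}}}}'   # hypothetical DC name and value
oc deploy dc/logging-es --latest   # redeploy so running pods pick up the new pod template

The change only takes effect on pods created from the updated template, hence the redeploy.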

Comment 13 Peter Portante 2016-09-12 11:13:30 UTC
I think Eric W. has covered all the right steps to take here.  Any word on how this worked out in the field?

Comment 17 Luke Meyer 2016-10-12 14:26:22 UTC
Did the change in https://github.com/openshift/origin-aggregated-logging/pull/227 get into a release yet? If so, we should probably attach this bug to an errata or close it. I don't see any more helpful fix for this coming along.

Comment 18 ewolinet 2016-10-12 14:40:47 UTC
I didn't see it in there, but I'll sync it over now for the 3.3 and 3.4 deployer images

Comment 21 Xia Zhao 2016-10-19 11:40:47 UTC
Verified with this image; the issue has been fixed:

registry.ops.openshift.com/openshift3/logging-deployer        3.3.1               1e85b37518ba        14 hours ago        761.6 MB

--Scaled down the ES cluster 3 times; only 1 node directory was created in the PV:

$ ls /elasticsearch/persistent/logging-es/data/logging-es/nodes/
0

# openshift version
openshift v3.3.1.3
kubernetes v1.3.0+52492b4
etcd 2.3.0+git

Comment 22 Luke Meyer 2016-10-21 20:46:45 UTC
Hey Eric Jones, do we have a kbase on what it looks like when node.lock is left behind in ES storage and what to do about it?
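
In lieu of a kbase, a rough sketch of what to look for (paths follow the layout shown in the description; the lock path and cleanup step are illustrative, and any removal should only happen while the ES pod is scaled down):

ls /elasticsearch/persistent/logging-es/data/logging-es/nodes/   # more than one numbered directory suggests the issue has occurred
find /elasticsearch/persistent -name node.lock                   # a lock file present while ES is down is stale
rm <path-to-stale-node-dir>/node.lock                            # hypothetical cleanup, only with the pod stopped

Removing a stale node.lock before scaling back up should let Elasticsearch reuse the existing node directory instead of starting a fresh one, though data already written to extra node directories would still need manual attention.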

Comment 24 errata-xmlrpc 2016-10-27 15:43:15 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2016:2085

