The current readiness probe will not declare an ES pod ready until the cluster reaches a "yellow" state and the ES pod's SearchGuard index is properly initialized. The ES pod is also given a 10 minute timeout by the recreate deployment strategy. If an ES pod does not become ready within those 10 minutes, the pod is killed. This behavior is bad, as it prevents long-running recoveries from taking place. For example, if one of the ES pod's PVs is lost for some reason, or if a new ES pod is added to a cluster with no room left on its PVs, it can take much longer than 10 minutes for Elasticsearch to relocate shards to the new PVs. If the recreate strategy kills the pod before this can be accomplished, the Elasticsearch instance will never come up.
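For reference, the recreate-strategy timeout is adjustable on the DeploymentConfig; a sketch of raising it (the `logging-es` name and the 24-hour value are illustrative assumptions, not shipped defaults):

```shell
# Illustrative only: raise the recreate-strategy timeout on the ES
# DeploymentConfig so long-running shard relocations are not cut short.
# "logging-es" and the 86400s (24h) value are assumptions for this sketch.
oc patch dc/logging-es --type=merge \
  -p '{"spec":{"strategy":{"recreateParams":{"timeoutSeconds":86400}}}}'
```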
We need a much longer default timeout. What about 24 hours?
I believe the readiness probe is not being applied correctly. What does it mean to be "not ready"? The intention is that an ES pod does not participate in the Elasticsearch "service" until the ES pod is ready. But that has nothing to do with the formation of an Elasticsearch cluster and the internal housekeeping that has to be performed to get ready. So the readiness check ends up causing the deployment to be killed when it was actually deployed successfully and doing what it was supposed to do. There will be times when 24 hours might not be enough, depending on the size of the data sets in play during a recovery. Instead, I think we need to leverage the notion of client, master, and data nodes here, as Anton has been suggesting for a while now, applying an appropriate readiness probe to each:

- Master nodes would be one DC with a replica count, with each "readiness" probe just verifying that the Java process is up and responding to HTTPS requests. Master nodes would use a Java heap matching the pod memory limits since they do not handle on-disk data.
- Data nodes would be one DC per PV, as they are today, each with a simple readiness probe verifying that the node is responding to HTTPS requests.
- Client nodes would be one DC with a replica count, with a Java heap matching the pod memory limits since they also don't handle data, and with each "readiness" probe verifying that the client has joined the cluster properly.

I don't think readiness probes should reflect the state of the cluster, as ES clients get that information in response to API requests.
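A minimal probe of the kind described above, which only checks that the local node answers HTTPS rather than inspecting cluster health, might look like this (the certificate paths and port 9200 are assumptions based on a typical secured deployment, not verified against the image):

```shell
#!/bin/bash
# Sketch of a minimal readiness probe: succeed as soon as the local ES
# process answers HTTPS, regardless of cluster state. The cert paths and
# port below are assumptions, not taken from the shipped image.
curl --silent --fail --max-time 5 \
     --cacert /etc/elasticsearch/secret/admin-ca \
     --cert   /etc/elasticsearch/secret/admin-cert \
     --key    /etc/elasticsearch/secret/admin-key \
     "https://localhost:9200/" > /dev/null
```

The exit code of `curl --fail` (non-zero on connection failure or an HTTP error status) is what the kubelet would use to decide readiness.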
Peter, which part of the readiness probe checks for the ES "yellow" state? Or is it meant transitively: the ES run script tries to insert index_templates and SG files, they do not get inserted into a red-state index, and therefore the readiness probe is unable to fetch them?
Backport to 3.7->3.6
Commits pushed to master at https://github.com/openshift/origin-aggregated-logging https://github.com/openshift/origin-aggregated-logging/commit/a316868a2b16b4813642bfec0ebd5efb2ab38665 Bug 1510697 - Simplify ES readiness probe https://bugzilla.redhat.com/show_bug.cgi?id=1510697 https://github.com/openshift/origin-aggregated-logging/commit/4be0c5293b38337281bf681cc7823a035243f7d2 Merge pull request #812 from wozniakjan/bz1510697/simplify_es_rp Automatic merge from submit-queue. Bug 1510697 - Simplify ES readiness probe
The fix is in openshift3/logging-elasticsearch/images/v3.7.14-5. Moving bug to VERIFIED.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2018:0113
Commit pushed to master at https://github.com/openshift/origin-aggregated-logging https://github.com/openshift/origin-aggregated-logging/commit/168a33b9cbc8e04dc9fbd828d40f3308a94b28af Bug 1510697 - Simplify ES readiness probe https://bugzilla.redhat.com/show_bug.cgi?id=1510697 (cherry picked from commit a316868a2b16b4813642bfec0ebd5efb2ab38665)