1929688 – Sometimes the elasticsearch-delete-xxx job failed at "Unexpected exception indices:admin/aliases/get" - OCP 4.6.16

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Logging
Sub Component:
Version:	4.6
Hardware:	x86_64
OS:	Linux
Priority:	medium
Severity:	medium
Target Milestone:	---
Target Release:	4.6.z
Assignee:	ewolinet
QA Contact:	Qiaoling Tang
Docs Contact:
URL:
Whiteboard:	logging-exploration
Depends On:
Blocks:

Reported:	2021-02-17 13:06 UTC by Victor Hernando
Modified:	2024-10-01 17:30 UTC
CC List:	8 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:	* Previously, while under load, Elasticsearch responded to some requests with an HTTP 500 error, even though there was nothing wrong with the cluster. Retrying the request was successful. This release fixes the issue by updating the cron jobs to be more resilient when encountering temporary HTTP 500 errors. Now, they retry a request multiple times before failing. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1929688[BZ#1929688])
Clone Of:
Environment:
Last Closed:	2021-03-30 16:54:58 UTC
Target Upstream Version:
Embargoed:
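
The retry behavior described in the Doc Text can be sketched as follows. This is a minimal illustration in Python, not the operator's actual cron job script: the function name `request_with_retry`, the `send` callable, and the attempt count and delay are assumptions made for the example. The bug report states only that a request is retried multiple times before the job fails.

```python
import time
from typing import Callable, Tuple

# Hypothetical helper: retry a request while the server answers HTTP 500.
# `send` is any zero-argument callable returning (status_code, body);
# the defaults for `attempts` and `delay_seconds` are illustrative only.
def request_with_retry(
    send: Callable[[], Tuple[int, str]],
    attempts: int = 5,
    delay_seconds: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str]:
    if attempts < 1:
        raise ValueError("attempts must be at least 1")

    status, body = send()
    for _ in range(attempts - 1):
        if status != 500:
            # Anything other than a 500 is returned to the caller as-is.
            break
        # Treat the 500 as possibly transient: wait, then try again.
        sleep(delay_seconds)
        status, body = send()

    # Either a non-500 response, or the last 500 after all attempts.
    return status, body
```

A caller would wrap the alias lookup in `send` and treat a final status of 500 as a job failure, so that a single temporary error under load no longer fails the whole run.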

Links
System	ID	Priority	Status	Summary	Last Updated
Github	openshift elasticsearch-operator pull 678	None	open	[release-4.6] Bug 1928772: Adding safeguards to cronjob to be more resilient to es 500 responses	2021-03-15 19:44:20 UTC
Red Hat Knowledge Base (Solution)	5410091	None	None	None	2021-03-12 10:03:28 UTC
Red Hat Product Errata	RHBA-2021:0954	None	None	None	2021-03-30 16:55:09 UTC

Comment 7 David Hernández Fernández 2021-02-24 09:47:35 UTC

Same here; let us know if you need anything else. This is on OCP 4.6.16 with the latest logging CSV.
{"error":{"root_cause":[{"type":"security_exception","reason":"Unexpected exception indices:admin/aliases/get"}],"type":"security_exception","reason":"Unexpected exception indices:admin/aliases/get"},"status":500}
Error while attemping to determine the active write alias: {"error":{"root_cause":[{"type":"security_exception","reason":"Unexpected exception indices:admin/aliases/get"}],"type":"security_exception","reason":"Unexpected exception indices:admin/aliases/get"},"status":500}
{"error":{"root_cause":[{"type":"security_exception","reason":"Unexpected exception indices:admin/aliases/get"}],"type":"security_exception","reason":"Unexpected exception indices:admin/aliases/get"},"status":500}
Error while attemping to determine the active write alias: {"error":{"root_cause":[{"type":"security_exception","reason":"Unexpected exception indices:admin/aliases/get"}],"type":"security_exception","reason":"Unexpected exception indices:admin/aliases/get"},"status":500}

Comment 16 Qiaoling Tang 2021-03-25 08:46:18 UTC

Tested with elasticsearch-operator.4.6.0-202103202154.p0. I set the index management cronjobs to run every 3 minutes; the ES cluster has been running for about 29 hours and no job has failed.

$ oc get pod
NAME                                            READY   STATUS             RESTARTS   AGE
cluster-logging-operator-6f66778f94-7zpmh       1/1     Running            0          29h
elasticsearch-cdm-kbvuvj7o-1-5989bcf7c4-vkxrc   2/2     Running            0          29h
elasticsearch-cdm-kbvuvj7o-2-57468594c7-5n8kf   2/2     Running            0          29h
elasticsearch-cdm-kbvuvj7o-3-5df4bc888d-5dx8h   2/2     Running            0          29h
elasticsearch-im-app-1616659740-dx989           0/1     Completed          0          79s
elasticsearch-im-audit-1616659740-p26qw         0/1     Completed          0          79s
elasticsearch-im-infra-1616659740-swdt7         0/1     Completed          0          79s
fluentd-bsjzw                                   1/1     Running            0          29h
fluentd-fsl9g                                   1/1     Running            0          29h
fluentd-pjqzd                                   1/1     Running            0          29h
fluentd-rdfkt                                   1/1     Running            0          29h
fluentd-tv9hh                                   1/1     Running            0          29h
fluentd-v6w9f                                   1/1     Running            0          29h
kibana-8685fbf674-c9fct                         2/2     Running            0          29h

Moving this BZ to VERIFIED.

Comment 18 errata-xmlrpc 2021-03-30 16:54:58 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6.23 extras update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:0954