Bug 1929688 - Sometimes the elasticsearch-delete-xxx job failed at "Unexpected exception indices:admin/aliases/get" - OCP 4.6.16
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Logging
Version: 4.6
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 4.6.z
Assignee: ewolinet
QA Contact: Qiaoling Tang
URL:
Whiteboard: logging-exploration
Depends On:
Blocks:
Reported: 2021-02-17 13:06 UTC by Victor Hernando
Modified: 2022-10-14 03:51 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
* Previously, while under load, Elasticsearch responded to some requests with an HTTP 500 error, even though there was nothing wrong with the cluster. Retrying the request was successful. This release fixes the issue by updating the cron jobs to be more resilient when encountering temporary HTTP 500 errors. The cron jobs now retry a request multiple times before failing. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1929688[*BZ#1929688*])
Clone Of:
Environment:
Last Closed: 2021-03-30 16:54:58 UTC
Target Upstream Version:
Embargoed:
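
The Doc Text above describes the fix as retrying a request several times when Elasticsearch returns a temporary HTTP 500. The following is a minimal Python sketch of that idea, not the code shipped in the operator. The function name `retry_on_500`, the default of 3 attempts, and the 1-second delay are all assumptions made for illustration; the real values live in the elasticsearch-operator change linked below.

```python
import time
from typing import Callable, Tuple


def retry_on_500(
    request: Callable[[], Tuple[int, str]],
    attempts: int = 3,            # assumed value, not taken from the actual fix
    delay_seconds: float = 1.0,   # assumed value, not taken from the actual fix
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str]:
    """Call `request` until it returns a non-500 status or attempts run out.

    `request` is any callable returning (http_status, response_body).
    The last response is returned either way, so the caller can still
    fail the job if every attempt ended in a 500.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")

    status, body = request()
    for _ in range(attempts - 1):
        if status != 500:
            break
        # Elasticsearch answered 500; wait briefly and ask again.
        sleep(delay_seconds)
        status, body = request()
    return status, body
```

A caller would wrap each Elasticsearch call (for example the alias lookup that fails in this bug) in `retry_on_500` and only exit non-zero if the returned status is still 500.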


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift elasticsearch-operator pull 678 | 0 | None | open | [release-4.6] Bug 1928772: Adding safeguards to cronjob to be more resilient to es 500 responses | 2021-03-15 19:44:20 UTC
Red Hat Knowledge Base (Solution) | 5410091 | 0 | None | None | None | 2021-03-12 10:03:28 UTC
Red Hat Product Errata | RHBA-2021:0954 | 0 | None | None | None | 2021-03-30 16:55:09 UTC

Comment 7 David Hernández Fernández 2021-02-24 09:47:35 UTC
Same here, on OCP 4.6.16 with the latest logging CSV. Let us know if you need anything else.
{"error":{"root_cause":[{"type":"security_exception","reason":"Unexpected exception indices:admin/aliases/get"}],"type":"security_exception","reason":"Unexpected exception indices:admin/aliases/get"},"status":500}
Error while attemping to determine the active write alias: {"error":{"root_cause":[{"type":"security_exception","reason":"Unexpected exception indices:admin/aliases/get"}],"type":"security_exception","reason":"Unexpected exception indices:admin/aliases/get"},"status":500}
{"error":{"root_cause":[{"type":"security_exception","reason":"Unexpected exception indices:admin/aliases/get"}],"type":"security_exception","reason":"Unexpected exception indices:admin/aliases/get"},"status":500}
Error while attemping to determine the active write alias: {"error":{"root_cause":[{"type":"security_exception","reason":"Unexpected exception indices:admin/aliases/get"}],"type":"security_exception","reason":"Unexpected exception indices:admin/aliases/get"},"status":500}
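
For anyone scripting around this, the error body in the log above can be recognized programmatically before deciding whether a retry is worthwhile. This is a rough Python sketch under assumptions: the helper name `is_transient_alias_error` is hypothetical, and treating this exact `security_exception` with status 500 as transient is based only on the behavior reported in this bug.

```python
import json

# Response body copied from the job log in this comment.
SAMPLE_BODY = (
    '{"error":{"root_cause":[{"type":"security_exception",'
    '"reason":"Unexpected exception indices:admin/aliases/get"}],'
    '"type":"security_exception",'
    '"reason":"Unexpected exception indices:admin/aliases/get"},"status":500}'
)


def is_transient_alias_error(body: str) -> bool:
    """Return True if `body` matches the 500 response reported in this bug."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        # Not JSON at all, so it is not the response we are looking for.
        return False
    if not isinstance(payload, dict) or payload.get("status") != 500:
        return False
    error = payload.get("error")
    if not isinstance(error, dict):
        return False
    reason = str(error.get("reason", ""))
    return (
        error.get("type") == "security_exception"
        and "indices:admin/aliases/get" in reason
    )
```

With the sample above, `is_transient_alias_error(SAMPLE_BODY)` is true, while a body with a different status or type is rejected.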

Comment 16 Qiaoling Tang 2021-03-25 08:46:18 UTC
Tested with elasticsearch-operator.4.6.0-202103202154.p0. I set the index management cronjobs to run every 3 minutes; the ES cluster has been running for about 29 hours and no job has failed.

$ oc get pod
NAME                                            READY   STATUS             RESTARTS   AGE
cluster-logging-operator-6f66778f94-7zpmh       1/1     Running            0          29h
elasticsearch-cdm-kbvuvj7o-1-5989bcf7c4-vkxrc   2/2     Running            0          29h
elasticsearch-cdm-kbvuvj7o-2-57468594c7-5n8kf   2/2     Running            0          29h
elasticsearch-cdm-kbvuvj7o-3-5df4bc888d-5dx8h   2/2     Running            0          29h
elasticsearch-im-app-1616659740-dx989           0/1     Completed          0          79s
elasticsearch-im-audit-1616659740-p26qw         0/1     Completed          0          79s
elasticsearch-im-infra-1616659740-swdt7         0/1     Completed          0          79s
fluentd-bsjzw                                   1/1     Running            0          29h
fluentd-fsl9g                                   1/1     Running            0          29h
fluentd-pjqzd                                   1/1     Running            0          29h
fluentd-rdfkt                                   1/1     Running            0          29h
fluentd-tv9hh                                   1/1     Running            0          29h
fluentd-v6w9f                                   1/1     Running            0          29h
kibana-8685fbf674-c9fct                         2/2     Running            0          29h

Moving this BZ to VERIFIED.
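
The manual check in this comment (reading the `oc get pod` listing for failed index management pods) could be automated roughly as follows. This is a sketch in Python, assuming the default column layout shown above (NAME, READY, STATUS, ...); the helper name `failed_im_pods` and the set of statuses treated as healthy are assumptions for illustration.

```python
from typing import List

# Statuses treated as "not failed" in this sketch (an assumption).
HEALTHY_STATUSES = {"Completed", "Running", "Pending", "ContainerCreating"}


def failed_im_pods(oc_output: str) -> List[str]:
    """Return names of elasticsearch-im-* pods whose STATUS is not healthy.

    `oc_output` is the text printed by `oc get pod`, header line included.
    """
    failed = []
    for line in oc_output.strip().splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 3:
            continue
        name, status = fields[0], fields[2]
        if name.startswith("elasticsearch-im-") and status not in HEALTHY_STATUSES:
            failed.append(name)
    return failed
```

Fed the listing from this comment, the function returns an empty list, which matches the "no job has failed" observation.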

Comment 18 errata-xmlrpc 2021-03-30 16:54:58 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6.23 extras update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:0954

