* Previously, while under load, Elasticsearch responded to some requests with an HTTP 500 error even though nothing was wrong with the cluster. Retrying the request succeeded. This release fixes the issue by making the index management cron jobs more resilient to temporary HTTP 500 errors: they now retry a request multiple times before failing.
(link:https://bugzilla.redhat.com/show_bug.cgi?id=1929688[*BZ#1929688*])
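The retry behavior described above can be illustrated with a minimal sketch. This is not the code shipped in the cron jobs; the function name `request_with_retries`, the `send` callable, the attempt count, and the delay are all hypothetical and chosen only to show the general idea of retrying on a temporary HTTP 500 before giving up.

```python
import time
from typing import Callable, Tuple

# Hypothetical type: a callable that performs one HTTP request and returns
# (status_code, body). It stands in for whatever the real jobs use.
Request = Callable[[], Tuple[int, str]]


def request_with_retries(
    send: Request,
    max_attempts: int = 5,          # assumed value, not taken from the fix
    delay_seconds: float = 1.0,     # assumed value, not taken from the fix
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str]:
    """Call send() until it returns a non-500 status or attempts run out."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        status, body = send()
        # Only HTTP 500 is treated as temporary; anything else is returned as-is.
        if status != 500:
            return status, body
        # Wait before the next attempt, but not after the final one.
        if attempt < max_attempts:
            sleep(delay_seconds)
    # Every attempt returned HTTP 500: hand the last response to the caller.
    return status, body
```

A caller would pass a function that issues the actual request, for example the alias lookup, and treat a final status of 500 as a real failure only after all attempts are exhausted.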
Comment 7, David Hernández Fernández, 2021-02-24 09:47:35 UTC
Same here; let us know if you need anything else. This is on OCP 4.6.16 with the latest logging CSV.
{"error":{"root_cause":[{"type":"security_exception","reason":"Unexpected exception indices:admin/aliases/get"}],"type":"security_exception","reason":"Unexpected exception indices:admin/aliases/get"},"status":500}
Error while attemping to determine the active write alias: {"error":{"root_cause":[{"type":"security_exception","reason":"Unexpected exception indices:admin/aliases/get"}],"type":"security_exception","reason":"Unexpected exception indices:admin/aliases/get"},"status":500}
{"error":{"root_cause":[{"type":"security_exception","reason":"Unexpected exception indices:admin/aliases/get"}],"type":"security_exception","reason":"Unexpected exception indices:admin/aliases/get"},"status":500}
Error while attemping to determine the active write alias: {"error":{"root_cause":[{"type":"security_exception","reason":"Unexpected exception indices:admin/aliases/get"}],"type":"security_exception","reason":"Unexpected exception indices:admin/aliases/get"},"status":500}
Tested with elasticsearch-operator.4.6.0-202103202154.p0: I set the index management cron jobs to run every 3 minutes, and the ES cluster has been running for about 29 hours with no job failures.
$ oc get pod
NAME                                           READY   STATUS      RESTARTS   AGE
cluster-logging-operator-6f66778f94-7zpmh      1/1     Running     0          29h
elasticsearch-cdm-kbvuvj7o-1-5989bcf7c4-vkxrc  2/2     Running     0          29h
elasticsearch-cdm-kbvuvj7o-2-57468594c7-5n8kf  2/2     Running     0          29h
elasticsearch-cdm-kbvuvj7o-3-5df4bc888d-5dx8h  2/2     Running     0          29h
elasticsearch-im-app-1616659740-dx989          0/1     Completed   0          79s
elasticsearch-im-audit-1616659740-p26qw        0/1     Completed   0          79s
elasticsearch-im-infra-1616659740-swdt7        0/1     Completed   0          79s
fluentd-bsjzw                                  1/1     Running     0          29h
fluentd-fsl9g                                  1/1     Running     0          29h
fluentd-pjqzd                                  1/1     Running     0          29h
fluentd-rdfkt                                  1/1     Running     0          29h
fluentd-tv9hh                                  1/1     Running     0          29h
fluentd-v6w9f                                  1/1     Running     0          29h
kibana-8685fbf674-c9fct                        2/2     Running     0          29h
Moving this BZ to VERIFIED.
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.
For information on the advisory (OpenShift Container Platform 4.6.23 extras update), and where to find the updated
files, follow the link below.
If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHBA-2021:0954