NOTE: This is *not* a duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=1866019 / https://bugzilla.redhat.com/show_bug.cgi?id=1866963 / https://bugzilla.redhat.com/show_bug.cgi?id=1868675, which are caused by 500s from ES; or https://bugzilla.redhat.com/show_bug.cgi?id=1881709, where the ES query hangs.

Description of problem:
elasticsearch-delete-* pods fail with log:

Traceback (most recent call last):
  File "<string>", line 4, in <module>
KeyError: 'is_write_index'

Version-Release number of selected component (if applicable):
4.5.0-202009041228.p0 (and others)

How reproducible:
Unknown -- but once a cluster is in this state, it seems to stick there for "a while".

Additional info:
Debugging the failed pods, they consistently show the following, indicating that the error is the result of an unexpected alias response from elasticsearch: the alias body is empty, so it carries no is_write_index flag.

sh-4.2$ bash -x /tmp/scripts/delete
+ set -euo pipefail
+++ cat /var/run/secrets/kubernetes.io/serviceaccount/token
++ curl -s 'https://elasticsearch:9200/infra-*/_alias/infra-write' --cacert /etc/indexmanagement/keys/admin-ca '-HAuthorization: Bearer {redacted}' -HContent-Type:application/json
+ writeIndices='{"infra-000001":{"aliases":{"infra-write":{}}}}'
++ cat
+ CMD='import json,sys
r=json.load(sys.stdin)
alias="infra-write"
indices = [index for index in r if r[index]['\''aliases'\''][alias]['\''is_write_index'\'']]
if len(indices) > 0:
  print indices[0]
'
++ echo '{"infra-000001":{"aliases":{"infra-write":{}}}}'
++ python -c 'import json,sys
r=json.load(sys.stdin)
alias="infra-write"
indices = [index for index in r if r[index]['\''aliases'\''][alias]['\''is_write_index'\'']]
if len(indices) > 0:
  print indices[0]
'
Traceback (most recent call last):
  File "<string>", line 4, in <module>
KeyError: 'is_write_index'
+ writeIndex=
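For clarity, here is a minimal standalone reproduction of the parsing failure, using the exact alias payload returned by ES above. The defensive variant at the end is illustrative only (it is not claimed to be the change made in PR #488); it simply treats an alias entry without the is_write_index flag as "not the write index" instead of raising:

import json

# Exact payload from the trace above: the alias body for infra-write is {},
# i.e. is_write_index was never set on the alias.
payload = '{"infra-000001":{"aliases":{"infra-write":{}}}}'
r = json.loads(payload)
alias = "infra-write"

# The original lookup in the delete script,
#   [index for index in r if r[index]['aliases'][alias]['is_write_index']]
# raises KeyError: 'is_write_index' on this payload.

# Illustrative defensive variant (an assumption, not the shipped fix):
indices = [index for index in r
           if r[index].get('aliases', {}).get(alias, {}).get('is_write_index', False)]
print(indices)  # [] -- no write index can be determined from this response

Either way, the script is left with an empty writeIndex, which is the state shown at the end of the trace.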
(In reply to Eric Fried from comment #0)
> NOTE: This is *not* a duplicate of
> https://bugzilla.redhat.com/show_bug.cgi?id=1866019 /
> https://bugzilla.redhat.com/show_bug.cgi?id=1866963 /
> https://bugzilla.redhat.com/show_bug.cgi?id=1868675 which are caused by 500s
> from ES; or https://bugzilla.redhat.com/show_bug.cgi?id=1881709 where the ES
> query hangs.

No, but it is likely resolved when https://github.com/openshift/elasticsearch-operator/pull/488 merges. Closing as a duplicate for the time being until it can be verified otherwise.

*** This bug has been marked as a duplicate of bug 1868675 ***
I ran the delete script from https://github.com/openshift/elasticsearch-operator/pull/488 in the failing environment and it still fails the same way.

sh-4.2$ ./delete.latest
Error trying to determine the 'write' index from '{u'app-000002': {u'aliases': {u'app-write': {}}}}': <type 'exceptions.KeyError'>
sh-4.2$ echo $?
1
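If one wanted to unstick a cluster in this state by hand, a sketch along the following lines could mark the existing rollover index as the write index via the Elasticsearch _aliases API. This is an assumption/illustration only, not the fix shipped in the operator, and it assumes the requests library is available in the pod and reuses the same token and CA paths as the cron scripts above:

import json
import requests  # assumed available; otherwise the same call can be made with curl

ES = "https://elasticsearch:9200"
TOKEN = open("/var/run/secrets/kubernetes.io/serviceaccount/token").read().strip()
HEADERS = {"Authorization": "Bearer " + TOKEN, "Content-Type": "application/json"}

# Explicitly flag app-000002 as the write index behind the app-write alias,
# so the alias body is no longer empty.
body = {"actions": [{"add": {"index": "app-000002",
                             "alias": "app-write",
                             "is_write_index": True}}]}
resp = requests.post(ES + "/_aliases",
                     headers=HEADERS,
                     data=json.dumps(body),
                     verify="/etc/indexmanagement/keys/admin-ca")
print(resp.status_code, resp.text)

After that, re-querying /_alias/app-write should include "is_write_index": true and the delete script should be able to determine the write index again.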
4.5 backport added to https://github.com/openshift/elasticsearch-operator/pull/488
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6.1 extras update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4198