Description of problem:

Elasticsearch rollover pods failed with resource_already_exists_exception

Version-Release number of selected component (if applicable):

OpenShift Container Platform 4.5

How reproducible:

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:

Additional info:

Elasticsearch rollover pod failures are constant.

$ parallel oc -n openshift-logging logs --prefix {} ::: elasticsearch-delete-app-1603918800-jn8xk elasticsearch-delete-audit-1603918800-sdhns elasticsearch-delete-infra-1603918800-bwf9h elasticsearch-rollover-app-1603964700-zkmmg elasticsearch-rollover-audit-1603964700-nx8pq elasticsearch-rollover-infra-1603964700-jghrf
[pod/elasticsearch-delete-app-1603918800-jn8xk/indexmanagement]
[pod/elasticsearch-delete-app-1603918800-jn8xk/indexmanagement] Traceback (most recent call last):
[pod/elasticsearch-delete-app-1603918800-jn8xk/indexmanagement]   File "<string>", line 2, in <module>
[pod/elasticsearch-delete-app-1603918800-jn8xk/indexmanagement]   File "/usr/lib64/python2.7/json/__init__.py", line 290, in load
[pod/elasticsearch-delete-app-1603918800-jn8xk/indexmanagement]     **kw)
[pod/elasticsearch-delete-app-1603918800-jn8xk/indexmanagement]   File "/usr/lib64/python2.7/json/__init__.py", line 338, in loads
[pod/elasticsearch-delete-app-1603918800-jn8xk/indexmanagement]     return _default_decoder.decode(s)
[pod/elasticsearch-delete-app-1603918800-jn8xk/indexmanagement]   File "/usr/lib64/python2.7/json/decoder.py", line 366, in decode
[pod/elasticsearch-delete-app-1603918800-jn8xk/indexmanagement]     obj, end = self.raw_decode(s, idx=_w(s, 0).end())
[pod/elasticsearch-delete-app-1603918800-jn8xk/indexmanagement]   File "/usr/lib64/python2.7/json/decoder.py", line 384, in raw_decode
[pod/elasticsearch-delete-app-1603918800-jn8xk/indexmanagement]     raise ValueError("No JSON object could be decoded")
[pod/elasticsearch-delete-app-1603918800-jn8xk/indexmanagement] ValueError: No JSON object could be decoded
[pod/elasticsearch-delete-audit-1603918800-sdhns/indexmanagement]
[pod/elasticsearch-delete-audit-1603918800-sdhns/indexmanagement] Traceback (most recent call last):
[pod/elasticsearch-delete-audit-1603918800-sdhns/indexmanagement]   File "<string>", line 2, in <module>
[pod/elasticsearch-delete-audit-1603918800-sdhns/indexmanagement]   File "/usr/lib64/python2.7/json/__init__.py", line 290, in load
[pod/elasticsearch-delete-audit-1603918800-sdhns/indexmanagement]     **kw)
[pod/elasticsearch-delete-audit-1603918800-sdhns/indexmanagement]   File "/usr/lib64/python2.7/json/__init__.py", line 338, in loads
[pod/elasticsearch-delete-audit-1603918800-sdhns/indexmanagement]     return _default_decoder.decode(s)
[pod/elasticsearch-delete-audit-1603918800-sdhns/indexmanagement]   File "/usr/lib64/python2.7/json/decoder.py", line 366, in decode
[pod/elasticsearch-delete-audit-1603918800-sdhns/indexmanagement]     obj, end = self.raw_decode(s, idx=_w(s, 0).end())
[pod/elasticsearch-delete-audit-1603918800-sdhns/indexmanagement]   File "/usr/lib64/python2.7/json/decoder.py", line 384, in raw_decode
[pod/elasticsearch-delete-audit-1603918800-sdhns/indexmanagement]     raise ValueError("No JSON object could be decoded")
[pod/elasticsearch-delete-audit-1603918800-sdhns/indexmanagement] ValueError: No JSON object could be decoded
[pod/elasticsearch-delete-infra-1603918800-bwf9h/indexmanagement]
[pod/elasticsearch-delete-infra-1603918800-bwf9h/indexmanagement] Traceback (most recent call last):
[pod/elasticsearch-delete-infra-1603918800-bwf9h/indexmanagement]   File "<string>", line 2, in <module>
[pod/elasticsearch-delete-infra-1603918800-bwf9h/indexmanagement]   File "/usr/lib64/python2.7/json/__init__.py", line 290, in load
[pod/elasticsearch-delete-infra-1603918800-bwf9h/indexmanagement]     **kw)
[pod/elasticsearch-delete-infra-1603918800-bwf9h/indexmanagement]   File "/usr/lib64/python2.7/json/__init__.py", line 338, in loads
[pod/elasticsearch-delete-infra-1603918800-bwf9h/indexmanagement]     return _default_decoder.decode(s)
[pod/elasticsearch-delete-infra-1603918800-bwf9h/indexmanagement]   File "/usr/lib64/python2.7/json/decoder.py", line 366, in decode
[pod/elasticsearch-delete-infra-1603918800-bwf9h/indexmanagement]     obj, end = self.raw_decode(s, idx=_w(s, 0).end())
[pod/elasticsearch-delete-infra-1603918800-bwf9h/indexmanagement]   File "/usr/lib64/python2.7/json/decoder.py", line 384, in raw_decode
[pod/elasticsearch-delete-infra-1603918800-bwf9h/indexmanagement]     raise ValueError("No JSON object could be decoded")
[pod/elasticsearch-delete-infra-1603918800-bwf9h/indexmanagement] ValueError: No JSON object could be decoded
[pod/elasticsearch-rollover-app-1603964700-zkmmg/indexmanagement] {
[pod/elasticsearch-rollover-app-1603964700-zkmmg/indexmanagement]   "error" : {
[pod/elasticsearch-rollover-app-1603964700-zkmmg/indexmanagement]     "root_cause" : [
[pod/elasticsearch-rollover-app-1603964700-zkmmg/indexmanagement]       {
[pod/elasticsearch-rollover-app-1603964700-zkmmg/indexmanagement]         "type" : "resource_already_exists_exception",
[pod/elasticsearch-rollover-app-1603964700-zkmmg/indexmanagement]         "reason" : "index [app-000004/36SzdIGFS0aQMqN3dIOxxQ] already exists",
[pod/elasticsearch-rollover-app-1603964700-zkmmg/indexmanagement]         "index_uuid" : "36SzdIGFS0aQMqN3dIOxxQ",
[pod/elasticsearch-rollover-app-1603964700-zkmmg/indexmanagement]         "index" : "app-000004"
[pod/elasticsearch-rollover-app-1603964700-zkmmg/indexmanagement]       }
[pod/elasticsearch-rollover-app-1603964700-zkmmg/indexmanagement]     ],
[pod/elasticsearch-rollover-app-1603964700-zkmmg/indexmanagement]     "type" : "resource_already_exists_exception",
[pod/elasticsearch-rollover-app-1603964700-zkmmg/indexmanagement]     "reason" : "index [app-000004/36SzdIGFS0aQMqN3dIOxxQ] already exists",
[pod/elasticsearch-rollover-app-1603964700-zkmmg/indexmanagement]     "index_uuid" : "36SzdIGFS0aQMqN3dIOxxQ",
[pod/elasticsearch-rollover-app-1603964700-zkmmg/indexmanagement]     "index" : "app-000004"
[pod/elasticsearch-rollover-app-1603964700-zkmmg/indexmanagement]   },
[pod/elasticsearch-rollover-app-1603964700-zkmmg/indexmanagement]   "status" : 400
[pod/elasticsearch-rollover-app-1603964700-zkmmg/indexmanagement] }
[pod/elasticsearch-rollover-audit-1603964700-nx8pq/indexmanagement] {
[pod/elasticsearch-rollover-audit-1603964700-nx8pq/indexmanagement]   "error" : {
[pod/elasticsearch-rollover-audit-1603964700-nx8pq/indexmanagement]     "root_cause" : [
[pod/elasticsearch-rollover-audit-1603964700-nx8pq/indexmanagement]       {
[pod/elasticsearch-rollover-audit-1603964700-nx8pq/indexmanagement]         "type" : "resource_already_exists_exception",
[pod/elasticsearch-rollover-audit-1603964700-nx8pq/indexmanagement]         "reason" : "index [audit-000002/jt_t-wDtQ2-X8h_qO_1cDw] already exists",
[pod/elasticsearch-rollover-audit-1603964700-nx8pq/indexmanagement]         "index_uuid" : "jt_t-wDtQ2-X8h_qO_1cDw",
[pod/elasticsearch-rollover-audit-1603964700-nx8pq/indexmanagement]         "index" : "audit-000002"
[pod/elasticsearch-rollover-audit-1603964700-nx8pq/indexmanagement]       }
[pod/elasticsearch-rollover-audit-1603964700-nx8pq/indexmanagement]     ],
[pod/elasticsearch-rollover-audit-1603964700-nx8pq/indexmanagement]     "type" : "resource_already_exists_exception",
[pod/elasticsearch-rollover-audit-1603964700-nx8pq/indexmanagement]     "reason" : "index [audit-000002/jt_t-wDtQ2-X8h_qO_1cDw] already exists",
[pod/elasticsearch-rollover-audit-1603964700-nx8pq/indexmanagement]     "index_uuid" : "jt_t-wDtQ2-X8h_qO_1cDw",
[pod/elasticsearch-rollover-audit-1603964700-nx8pq/indexmanagement]     "index" : "audit-000002"
[pod/elasticsearch-rollover-audit-1603964700-nx8pq/indexmanagement]   },
[pod/elasticsearch-rollover-audit-1603964700-nx8pq/indexmanagement]   "status" : 400
[pod/elasticsearch-rollover-audit-1603964700-nx8pq/indexmanagement] }
[pod/elasticsearch-rollover-infra-1603964700-jghrf/indexmanagement] {
[pod/elasticsearch-rollover-infra-1603964700-jghrf/indexmanagement]   "error" : {
[pod/elasticsearch-rollover-infra-1603964700-jghrf/indexmanagement]     "root_cause" : [
[pod/elasticsearch-rollover-infra-1603964700-jghrf/indexmanagement]       {
[pod/elasticsearch-rollover-infra-1603964700-jghrf/indexmanagement]         "type" : "resource_already_exists_exception",
[pod/elasticsearch-rollover-infra-1603964700-jghrf/indexmanagement]         "reason" : "index [infra-000004/VFr9HBz9QD6fWWq69HDlNA] already exists",
[pod/elasticsearch-rollover-infra-1603964700-jghrf/indexmanagement]         "index_uuid" : "VFr9HBz9QD6fWWq69HDlNA",
[pod/elasticsearch-rollover-infra-1603964700-jghrf/indexmanagement]         "index" : "infra-000004"
[pod/elasticsearch-rollover-infra-1603964700-jghrf/indexmanagement]       }
[pod/elasticsearch-rollover-infra-1603964700-jghrf/indexmanagement]     ],
[pod/elasticsearch-rollover-infra-1603964700-jghrf/indexmanagement]     "type" : "resource_already_exists_exception",
[pod/elasticsearch-rollover-infra-1603964700-jghrf/indexmanagement]     "reason" : "index [infra-000004/VFr9HBz9QD6fWWq69HDlNA] already exists",
[pod/elasticsearch-rollover-infra-1603964700-jghrf/indexmanagement]     "index_uuid" : "VFr9HBz9QD6fWWq69HDlNA",
[pod/elasticsearch-rollover-infra-1603964700-jghrf/indexmanagement]     "index" : "infra-000004"
[pod/elasticsearch-rollover-infra-1603964700-jghrf/indexmanagement]   },
[pod/elasticsearch-rollover-infra-1603964700-jghrf/indexmanagement]   "status" : 400
[pod/elasticsearch-rollover-infra-1603964700-jghrf/indexmanagement] }

The conditions causing these failures are intermittent; often when a job fails, a subsequent job completes successfully:

$ oc -n openshift-logging get pod | grep elasticsearch-
elasticsearch-cdm-pve9r608-1-76b7c8b6f7-s8x7x   2/2   Running             1   14h
elasticsearch-cdm-pve9r608-2-67d7787cb4-8wzxn   2/2   Running             2   14h
elasticsearch-cdm-pve9r608-3-669fb8b647-9fkmt   0/2   ContainerCreating   0   12h
elasticsearch-delete-app-1603918800-jn8xk       0/1   Error               0   13h
elasticsearch-delete-app-1603965600-pvf2q       0/1   Completed           0   105s
elasticsearch-delete-audit-1603918800-sdhns     0/1   Error               0   13h
elasticsearch-delete-audit-1603965600-x77tg     0/1   Completed           0   105s
elasticsearch-delete-infra-1603918800-bwf9h     0/1   Error               0   13h
elasticsearch-delete-infra-1603965600-r9lqq     0/1   Completed           0   105s
elasticsearch-rollover-app-1603965600-m5w7q     0/1   Error               0   105s
elasticsearch-rollover-audit-1603965600-85wbr   0/1   Error               0   105s
elasticsearch-rollover-infra-1603965600-wpqvx   0/1   Error               0   104s
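For reference, one way to see which index each write alias currently points to, and whether the index named in the resource_already_exists_exception was created but never attached to its alias, is to query Elasticsearch from one of the elasticsearch pods. This is a minimal sketch, assuming the es_util helper available inside the elasticsearch container and reusing a pod name from the listing above (substitute the names from your own cluster):

$ oc -n openshift-logging exec -c elasticsearch elasticsearch-cdm-pve9r608-1-76b7c8b6f7-s8x7x -- \
    es_util --query="_cat/aliases/app-write,infra-write,audit-write?v"
$ oc -n openshift-logging exec -c elasticsearch elasticsearch-cdm-pve9r608-1-76b7c8b6f7-s8x7x -- \
    es_util --query="_cat/indices/app-*,infra-*,audit-*?v&h=index,docs.count,store.size"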
The part of this issue affecting the delete script is solved by the PR tracked in: https://bugzilla.redhat.com/show_bug.cgi?id=1899905

I am continuing the investigation for the rollover script.
@naygupta After carefully reviewing the must-gather contents and investigating the behavior of the Elasticsearch Rollover API a bit more, I can conclude the following:

- The nodes of this cluster look to be under pressure, especially the elected master node. In addition, GC takes a lot of time according to the logs.

> ip          heap.percent ram.percent cpu load_1m load_5m load_15m node.role master name
> 10.128.4.13 32           99          30  19.01   18.50   18.00    mdi       *      elasticsearch-cdm-pve9r608-2
> 10.130.2.5  47           73          14  17.91   15.25   14.62    mdi       -      elasticsearch-cdm-pve9r608-1

- Rolling an index over via an alias is a multi-step task and asynchronous by nature. This means that when the rollover fails but a new index was nevertheless created, the alias update step likely failed because the cluster was under pressure. This is a bug that has been reported to Elastic for a long time now [1]. For now, it implies some manual intervention if you hit this case.

Taking the infra logs rollover job as an example, where the alias `infra-write` currently points to `infra-000003` and the rollover to `infra-000004` fails because of cluster pressure, you should (see the command sketch below):

- First remove the empty index `infra-000004`
- Adapt the ES cluster CPU/Memory resources

[1] https://github.com/elastic/elasticsearch/issues/30340
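A minimal sketch of that manual cleanup, assuming the es_util helper available inside the elasticsearch container and reusing a pod name from this cluster (substitute your own); always confirm that the stranded index is empty and that the write alias still points to the previous generation before deleting anything:

$ # confirm infra-write still points to infra-000003
$ oc -n openshift-logging exec -c elasticsearch elasticsearch-cdm-pve9r608-1-76b7c8b6f7-s8x7x -- \
    es_util --query="_cat/aliases/infra-write?v"
$ # confirm infra-000004 holds no documents (docs.count should be 0)
$ oc -n openshift-logging exec -c elasticsearch elasticsearch-cdm-pve9r608-1-76b7c8b6f7-s8x7x -- \
    es_util --query="_cat/indices/infra-000004?v&h=index,docs.count"
$ # remove the empty index so the next rollover job can recreate it and move the alias
$ oc -n openshift-logging exec -c elasticsearch elasticsearch-cdm-pve9r608-1-76b7c8b6f7-s8x7x -- \
    es_util --query="infra-000004" -XDELETE

The CPU/Memory adjustment is then made on the ClusterLogging custom resource (spec.logStore.elasticsearch.resources), for example via `oc -n openshift-logging edit clusterlogging instance`, so the nodes have enough headroom to complete the alias update step.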
Tested with elasticsearch-operator.5.0.0-18; unable to reproduce this issue. Moving to verified. If you hit this issue, please feel free to reopen it.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Errata Advisory for Openshift Logging 5.0.0), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:0652