Bug 1985826

Summary: The elasticsearch-im-xxx job failed when trying to start the index management process for a non-existent (empty-named) index
Product: OpenShift Container Platform
Component: Logging
Version: 4.6.z
Target Release: 4.6.z
Target Milestone: ---
Hardware: Unspecified
OS: Unspecified
Status: CLOSED ERRATA
Severity: medium
Priority: medium
Keywords: Reopened
Whiteboard: logging-exploration
Type: Bug
Reporter: yhe
Assignee: Gerard Vanloo <gvanloo>
QA Contact: Anping Li <anli>
CC: ableisch, achakrat, afurbach, aharchin, aos-bugs, asheth, ausov, bmehra, ChetRHosey, chrzhang, cruhm, ctakahas, dahernan, dgautam, dkulkarn, gvanloo, ivan.wong, jaliang, jcall, jcantril, kiyyappa, mchebbi, mmarkand, morgan.peterman, mzali, ncarmich, ocasalsa, periklis, pkhaire, pwaghmod, sascha.gruen, yener.yumlu
Doc Type: If docs needed, set a value
Last Closed: 2022-01-27 08:26:04 UTC

Description yhe 2021-07-26 01:52:31 UTC
Description of problem:
The elasticsearch-im-xxx job failed when trying to start the index management process for a non-existent (empty-named) index.

The following errors can be seen in the logs of the elasticsearch-im-xxx-xxxxx pods that are in Error status.

2021-07-25T05:15:12.824232244Z Index management delete process starting for {"error":{"root_cause":[{"type":"security_exception","reason":"_opendistro_security_dls_query
2021-07-25T05:15:12.824232244Z 
2021-07-25T05:15:12.839052478Z Received an empty response from elasticsearch -- server may not be ready

And also:

2021-07-25T20:45:15.629014714Z Index management rollover process starting for {"error":{"root_cause":[{"type":"security_exception","reason":"_opendistro_security_dls_query
2021-07-25T20:45:15.629014714Z 
2021-07-25T20:45:15.640236263Z Received an empty response from elasticsearch -- server may not be ready

whereas the normal logs should look like this:

2021-07-25T05:15:12.518377608Z Index management delete process starting for infra
2021-07-25T05:15:12.518377608Z 
2021-07-25T05:15:12.714427228Z deleting indices: infra-003321
2021-07-25T05:15:12.817929576Z Done!

2021-07-25T05:15:12.892345720Z Index management rollover process starting for infra
2021-07-25T05:15:12.892345720Z 
2021-07-25T05:15:12.978155500Z Current write index for infra-write: infra-003341
2021-07-25T05:15:13.029990544Z Checking results from _rollover call
2021-07-25T05:15:13.070198489Z Next write index for infra-write: infra-003341
2021-07-25T05:15:13.070198489Z Checking if infra-003341 exists
2021-07-25T05:15:13.116276007Z Checking if infra-003341 is the write index for infra-write
2021-07-25T05:15:13.194983109Z Done!
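
To illustrate why raw JSON shows up where an index name belongs: a minimal bash sketch, assuming (this is not the shipped indexmanagement script) that the job derives the target alias from an Elasticsearch API response and interpolates it into the log line before checking for an error payload. ES_SERVICE and POLICY are hypothetical placeholders.

~~~
# Hypothetical illustration only -- NOT the actual indexmanagement script from
# the elasticsearch-im image. ES_SERVICE and POLICY are placeholder variables.

# Ask Elasticsearch which alias the policy should act on.
response=$(curl -s "${ES_SERVICE}/_cat/aliases/${POLICY}-write?h=alias")

# If Elasticsearch answers with an error document instead of an alias name
# (for example a security_exception while the security plugin is still
# initializing), the raw JSON lands where the index name was expected,
# matching the 'starting for {"error":...' lines above.
echo "Index management delete process starting for ${response}"

# A guard along these lines would catch the error/empty case before proceeding:
if [[ -z "${response}" || "${response}" == '{"error"'* || "${response}" == '{"code"'* ]]; then
  echo "Received an empty response from elasticsearch -- server may not be ready"
  exit 1
fi
~~~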

Will upload other related information later.

Version-Release number of selected component (if applicable):

How reproducible:
Sometimes

Steps to Reproduce:
1. Check the elasticsearch-im-xxx job's status
2. Find the failed elasticsearch-im-xxx job
3. Check the logs of the elasticsearch-im-xxx-xxxxx pod (example commands below)
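
A sketch of these checks, assuming the default openshift-logging namespace:

~~~
# 1. Check the index management cron jobs and their recent runs
oc -n openshift-logging get cronjobs
oc -n openshift-logging get jobs | grep elasticsearch-im

# 2. Find pods left behind by failed runs
oc -n openshift-logging get pods --field-selector=status.phase==Failed

# 3. Inspect the logs of a failed pod (the pod name here is a placeholder)
oc -n openshift-logging logs elasticsearch-im-app-xxxxxxxxxx-xxxxx
~~~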

Actual results:
The above errors can be seen in the logs.

Expected results:
The above errors do not occur and the job completes successfully.

Additional info:

Comment 3 bmehra 2021-08-18 10:56:47 UTC
Facing a similar issue in OpenShift 4.7, as mentioned in the description.

Below are the logs of the failed Elasticsearch index management jobs:

$  omg logs elasticsearch-im-app-1628712900-8nsjt

++++
2021-08-11T20:15:19.515097868Z Index management delete process starting for {"code":500,"message":"Internal
2021-08-11T20:15:19.515097868Z 
2021-08-11T20:15:19.524015986Z Received an empty response from elasticsearch -- server may not be ready
++++

$ omg logs elasticsearch-im-infra-1628748000-wv7gf

++++
2021-08-12T06:00:12.478110967Z Index management delete process starting for {"error":{"root_cause":[{"type":"security_exception","reason":"_opendistro_security_dls_query
2021-08-12T06:00:12.478178196Z 
2021-08-12T06:00:12.488256306Z Received an empty response from elasticsearch -- server may not be ready
++++


If you need any more information, please let me know.

Regards,
Bobby Mehra 
Red Hat.

Comment 5 Jace Liang 2021-08-19 07:48:19 UTC
Facing a similar issue in OpenShift 4.7.9, using IPI on AWS.
Cluster Logging 5.1.0-75 
OpenShift Elasticsearch Operator 5.1.0-96

Log of elasticsearch-im-infra-1629355500-2flpr

========================
Index management delete process starting for infra

No indices to delete
========================
Index management rollover process starting for infra

Current write index for infra-write: infra-000493
Checking results from _rollover call
Next write index for infra-write: infra-000494
Checking if infra-000494 exists
Checking if infra-000494 is the write index for infra-write
Done!
========================
Index management rollover process starting for {"error":{"root_cause":[{"type":"security_exception","reason":"_opendistro_security_dls_query

Received an empty response from elasticsearch -- server may not be ready

Comment 6 aharchin 2021-08-23 10:50:44 UTC

Facing a similar issue in OpenShift 4.8.4
Cluster Logging 5.1.0-75 
OpenShift Elasticsearch Operator 5.1.0-96

**app**

~~~
2021-08-18T07:45:02.880651084Z ========================
2021-08-18T07:45:02.880651084Z Index management delete process starting for app
2021-08-18T07:45:02.880651084Z 
2021-08-18T07:45:03.024691766Z No indices to delete
2021-08-18T07:45:03.079613499Z ========================
2021-08-18T07:45:03.079613499Z Index management rollover process starting for app
2021-08-18T07:45:03.079636900Z 
2021-08-18T07:45:03.157619939Z Current write index for app-write: app-001391
2021-08-18T07:45:03.174729379Z Checking results from _rollover call
2021-08-18T07:45:03.174785694Z Calculating next write index based on current write index...
2021-08-18T07:45:03.185094156Z Next write index for app-write: app-001392
2021-08-18T07:45:03.185094156Z Checking if app-001392 exists
2021-08-18T07:45:03.446187155Z {"error":{"root_cause":[{"type":"index_not_found_exception","reason":"no such index","resource.type":"index_or_alias","resource.id":"app-001392","index_uuid":"_na_","index":"app-001392"}],"type":"index_not_found_exception","reason":"no such index","resource.type":"index_or_alias","resource.id":"app-001392","index_uuid":"_na_","index":"app-001392"},"status":404}
~~~

------------------------------------------------
**audit**

~~~
2021-08-18T11:15:03.170981328Z ========================
2021-08-18T11:15:03.170981328Z Index management delete process starting for Received
2021-08-18T11:15:03.170981328Z 
2021-08-18T11:15:03.273149997Z Error while attemping to determine the active write alias: {'error': 'alias [Received-write] missing', 'status': 404}
2021-08-18T11:15:03.356975699Z ========================
2021-08-18T11:15:03.356975699Z Index management rollover process starting for audit
2021-08-18T11:15:03.356975699Z 
2021-08-18T11:15:03.428212646Z Current write index for audit-write: audit-001393
2021-08-18T11:15:04.067360313Z Checking results from _rollover call
2021-08-18T11:15:04.098107464Z Next write index for audit-write: audit-001394
2021-08-18T11:15:04.098107464Z Checking if audit-001394 exists
2021-08-18T11:15:04.136677540Z Checking if audit-001394 is the write index for audit-write
2021-08-18T11:15:04.203876863Z Done!
~~~

-------------------------------------------------
**infra**

~~~
2021-08-18T10:30:03.390222987Z ========================
2021-08-18T10:30:03.390222987Z Index management delete process starting for infra
2021-08-18T10:30:03.390222987Z 
2021-08-18T10:30:03.408917414Z Received an empty response from elasticsearch -- server may not be ready
~~~

Pod status:

~~~
$ oc -n openshift-logging get po  --field-selector=status.phase==Failed
NAME                                    READY   STATUS   RESTARTS   AGE
elasticsearch-im-app-27155985-xxxx     0/1     Error    0          14m
elasticsearch-im-audit-27155820-xxxx   0/1     Error    0          179m
elasticsearch-im-infra-27155310-xxxx   0/1     Error    0          11h
~~~
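
When a run fails like this, it can also help to confirm which indices and *-write aliases actually exist at that moment. Below is a sketch using the es_util helper inside the elasticsearch container; the pod label and selection are assumptions based on a default deployment:

~~~
# Pick one Elasticsearch pod (label assumed for a default openshift-logging deployment)
ES_POD=$(oc -n openshift-logging get pods -l component=elasticsearch -o name | head -1)

# List indices and the *-write aliases the im jobs operate on
oc -n openshift-logging exec -c elasticsearch ${ES_POD} -- es_util --query="_cat/indices?v"
oc -n openshift-logging exec -c elasticsearch ${ES_POD} -- es_util --query="_cat/aliases/*-write?v"
~~~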

Comment 9 Aleksey Usov 2021-09-07 21:37:41 UTC
Seeing the same issue in 4.7.21:

# oc logs elasticsearch-im-audit-1631027700-8gqhd
========================
Index management delete process starting for audit

No indices to delete
========================
Index management delete process starting for {"error":{"root_cause":[{"type":"security_exception","reason":"_opendistro_security_dls_query

Received an empty response from elasticsearch -- server may not be ready
========================
Index management rollover process starting for audit

Current write index for audit-write: audit-000011
Checking results from _rollover call
Next write index for audit-write: audit-000011
Checking if audit-000011 exists
Checking if audit-000011 is the write index for audit-write
Done!

# oc get csv
NAME                               DISPLAY                            VERSION    REPLACES                           PHASE
cluster-logging.5.2.0-57           Red Hat OpenShift Logging          5.2.0-57   cluster-logging.5.1.0-75           Succeeded
elasticsearch-operator.5.2.0-58    OpenShift Elasticsearch Operator   5.2.0-58   elasticsearch-operator.5.1.0-96    Succeeded

Comment 10 yener 2021-09-14 07:29:37 UTC
Facing a similar issue in 4.7.24.
Cluster Logging 5.1.0-75 
OpenShift Elasticsearch Operator 5.1.0-96

$ oc logs elasticsearch-im-audit-1631232000-8mx4p
========================
Index management delete process starting for audit

No indices to delete
========================
Index management delete process starting for {"error":{"root_cause":[{"type":"security_exception","reason":"_opendistro_security_dls_query

Received an empty response from elasticsearch -- server may not be ready
========================
Index management rollover process starting for audit

Current write index for audit-write: audit-000590
Checking results from _rollover call
Next write index for audit-write: audit-000590
Checking if audit-000590 exists
Checking if audit-000590 is the write index for audit-write
Done!

Regards,
Yener Yumlu

Comment 16 David Hernández Fernández 2021-10-11 13:54:20 UTC
Still facing the issue; it can be reproduced with the latest Cluster Logging version and the corresponding latest OpenShift Elasticsearch Operator.

Comment 24 Sashank Agarwal 2021-11-18 13:53:30 UTC
*** Bug 2000309 has been marked as a duplicate of this bug. ***

Comment 28 Anping Li 2022-01-12 15:03:56 UTC
Verified on csv: elasticsearch-operator.4.6.0-202201111633.  image: openshift/ose-elasticsearch-proxy/images/v4.6.0-202201111633.p0.g12d80b2.assembly.stream

Comment 33 errata-xmlrpc 2022-01-27 08:26:04 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.6.54 extras and security update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:0181