Bug 1985826 - The elasticsearch-im-xxx job failed when trying to start the index management process for a non-existent (empty-named) index
Summary: The elasticsearch-im-xxx job failed when trying to start the index management process for a non-existent (empty-named) index
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Logging
Version: 4.6.z
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 4.6.z
Assignee: Gerard Vanloo
QA Contact: Anping Li
URL:
Whiteboard: logging-exploration
Duplicates: 2000309
Depends On:
Blocks:
 
Reported: 2021-07-26 01:52 UTC by yhe
Modified: 2024-12-20 20:32 UTC
CC: 32 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-01-27 08:26:04 UTC
Target Upstream Version:
Embargoed:




Links
- GitHub: openshift elasticsearch-proxy pull 96 (open), [release-4.6] Bug 1985826: Only applying default role when no other roles are given, last updated 2021-12-16 17:32:52 UTC
- Red Hat Knowledge Base (Solution) 6376201, last updated 2021-10-01 04:20:24 UTC
- Red Hat Product Errata RHSA-2022:0181, last updated 2022-01-27 08:26:18 UTC

Description yhe 2021-07-26 01:52:31 UTC
Description of problem:
The elasticsearch-im-xxx job failed when trying to start the index management process for a non-existent (empty-named) index.

The following errors can be seen in the logs of the elasticsearch-im-xxx-xxxxx pods that are in Error status.

2021-07-25T05:15:12.824232244Z Index management delete process starting for {"error":{"root_cause":[{"type":"security_exception","reason":"_opendistro_security_dls_query
2021-07-25T05:15:12.824232244Z 
2021-07-25T05:15:12.839052478Z Received an empty response from elasticsearch -- server may not be ready

And also:

2021-07-25T20:45:15.629014714Z Index management rollover process starting for {"error":{"root_cause":[{"type":"security_exception","reason":"_opendistro_security_dls_query
2021-07-25T20:45:15.629014714Z 
2021-07-25T20:45:15.640236263Z Received an empty response from elasticsearch -- server may not be ready

whereas the normal logs look like this:

2021-07-25T05:15:12.518377608Z Index management delete process starting for infra
2021-07-25T05:15:12.518377608Z 
2021-07-25T05:15:12.714427228Z deleting indices: infra-003321
2021-07-25T05:15:12.817929576Z Done!

2021-07-25T05:15:12.892345720Z Index management rollover process starting for infra
2021-07-25T05:15:12.892345720Z 
2021-07-25T05:15:12.978155500Z Current write index for infra-write: infra-003341
2021-07-25T05:15:13.029990544Z Checking results from _rollover call
2021-07-25T05:15:13.070198489Z Next write index for infra-write: infra-003341
2021-07-25T05:15:13.070198489Z Checking if infra-003341 exists
2021-07-25T05:15:13.116276007Z Checking if infra-003341 is the write index for infra-write
2021-07-25T05:15:13.194983109Z Done!
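
For reference, the index and alias state that these rollover steps report can be cross-checked directly against Elasticsearch. A minimal sketch from inside one of the elasticsearch pods (the pod name is a placeholder, and the es_util helper is assumed to be available as in a standard OpenShift Logging Elasticsearch deployment):

$ oc -n openshift-logging exec -c elasticsearch elasticsearch-cdm-xxxxxxxx-1-xxxxx -- es_util --query=_cat/indices?v
$ oc -n openshift-logging exec -c elasticsearch elasticsearch-cdm-xxxxxxxx-1-xxxxx -- es_util --query=infra-write/_alias?pretty

The first call lists the indices and their state; the second shows which index currently holds the infra-write alias.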

Will upload other related information later.

Version-Release number of selected component (if applicable):

How reproducible:
Sometimes

Steps to Reproduce:
1. Check the status of the elasticsearch-im-xxx jobs
2. Find a failed elasticsearch-im-xxx job
3. Check the logs of its elasticsearch-im-xxx-xxxxx pod
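
A minimal sketch of these steps with oc, assuming the standard openshift-logging namespace (the job and pod names are placeholders):

$ oc -n openshift-logging get jobs | grep elasticsearch-im
$ oc -n openshift-logging get pods --field-selector=status.phase==Failed
$ oc -n openshift-logging logs elasticsearch-im-app-xxxxxxxxxx-xxxxx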

Actual results:
The above error can be seen in the logs

Expected results:
The above error does not appear and the job completes successfully

Additional info:

Comment 3 bmehra 2021-08-18 10:56:47 UTC
Facing a similar issue in OpenShift 4.7, as mentioned in the description.

Below are the logs of the failed Elasticsearch index management jobs:

$ omg logs elasticsearch-im-app-1628712900-8nsjt

++++
2021-08-11T20:15:19.515097868Z Index management delete process starting for {"code":500,"message":"Internal
2021-08-11T20:15:19.515097868Z 
2021-08-11T20:15:19.524015986Z Received an empty response from elasticsearch -- server may not be ready
++++

$ omg logs elasticsearch-im-infra-1628748000-wv7gf

++++
2021-08-12T06:00:12.478110967Z Index management delete process starting for {"error":{"root_cause":[{"type":"security_exception","reason":"_opendistro_security_dls_query
2021-08-12T06:00:12.478178196Z 
2021-08-12T06:00:12.488256306Z Received an empty response from elasticsearch -- server may not be ready
++++


If you need any more information, please let me know.

Regards,
Bobby Mehra 
Red Hat.

Comment 5 Jace Liang 2021-08-19 07:48:19 UTC
Facing a similar issue in OpenShift 4.7.9, using IPI on AWS.
Cluster Logging 5.1.0-75 
OpenShift Elasticsearch Operator 5.1.0-96

Log of elasticsearch-im-infra-1629355500-2flpr

========================
Index management delete process starting for infra

No indices to delete
========================
Index management rollover process starting for infra

Current write index for infra-write: infra-000493
Checking results from _rollover call
Next write index for infra-write: infra-000494
Checking if infra-000494 exists
Checking if infra-000494 is the write index for infra-write
Done!
========================
Index management rollover process starting for {"error":{"root_cause":[{"type":"security_exception","reason":"_opendistro_security_dls_query

Received an empty response from elasticsearch -- server may not be ready

Comment 6 aharchin 2021-08-23 10:50:44 UTC

Facing a similar issue in OpenShift 4.8.4.
Cluster Logging 5.1.0-75 
OpenShift Elasticsearch Operator 5.1.0-96

**app**

~~~
2021-08-18T07:45:02.880651084Z ========================
2021-08-18T07:45:02.880651084Z Index management delete process starting for app
2021-08-18T07:45:02.880651084Z 
2021-08-18T07:45:03.024691766Z No indices to delete
2021-08-18T07:45:03.079613499Z ========================
2021-08-18T07:45:03.079613499Z Index management rollover process starting for app
2021-08-18T07:45:03.079636900Z 
2021-08-18T07:45:03.157619939Z Current write index for app-write: app-001391
2021-08-18T07:45:03.174729379Z Checking results from _rollover call
2021-08-18T07:45:03.174785694Z Calculating next write index based on current write index...
2021-08-18T07:45:03.185094156Z Next write index for app-write: app-001392
2021-08-18T07:45:03.185094156Z Checking if app-001392 exists
2021-08-18T07:45:03.446187155Z {"error":{"root_cause":[{"type":"index_not_found_exception","reason":"no such index","resource.type":"index_or_alias","resource.id":"app-001392","index_uuid":"_na_","index":"app-001392"}],"type":"index_not_found_exception","reason":"no such index","resource.type":"index_or_alias","resource.id":"app-001392","index_uuid":"_na_","index":"app-001392"},"status":404}
~~~

------------------------------------------------
**audit**

~~~
2021-08-18T11:15:03.170981328Z ========================
2021-08-18T11:15:03.170981328Z Index management delete process starting for Received
2021-08-18T11:15:03.170981328Z 
2021-08-18T11:15:03.273149997Z Error while attemping to determine the active write alias: {'error': 'alias [Received-write] missing', 'status': 404}
2021-08-18T11:15:03.356975699Z ========================
2021-08-18T11:15:03.356975699Z Index management rollover process starting for audit
2021-08-18T11:15:03.356975699Z 
2021-08-18T11:15:03.428212646Z Current write index for audit-write: audit-001393
2021-08-18T11:15:04.067360313Z Checking results from _rollover call
2021-08-18T11:15:04.098107464Z Next write index for audit-write: audit-001394
2021-08-18T11:15:04.098107464Z Checking if audit-001394 exists
2021-08-18T11:15:04.136677540Z Checking if audit-001394 is the write index for audit-write
2021-08-18T11:15:04.203876863Z Done!
~~~

-------------------------------------------------
**infra**

~~~
2021-08-18T10:30:03.390222987Z ========================
2021-08-18T10:30:03.390222987Z Index management delete process starting for infra
2021-08-18T10:30:03.390222987Z 
2021-08-18T10:30:03.408917414Z Received an empty response from elasticsearch -- server may not be ready
~~~

Pod status:

~~~
$ oc -n openshift-logging get po  --field-selector=status.phase==Failed
NAME                                    READY   STATUS   RESTARTS   AGE
elasticsearch-im-app-27155985-xxxx     0/1     Error    0          14m
elasticsearch-im-audit-27155820-xxxx   0/1     Error    0          179m
elasticsearch-im-infra-27155310-xxxx   0/1     Error    0          11h
~~~
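
For completeness, these job pods are created by the elasticsearch-im-* cronjobs, so their schedule and recent job history can be inspected as well (a small sketch, assuming the standard openshift-logging namespace):

~~~
$ oc -n openshift-logging get cronjobs | grep elasticsearch-im
$ oc -n openshift-logging get jobs --sort-by=.metadata.creationTimestamp | grep elasticsearch-im
~~~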

Comment 9 Aleksey Usov 2021-09-07 21:37:41 UTC
Seeing the same issue in 4.7.21:

# oc logs elasticsearch-im-audit-1631027700-8gqhd
========================
Index management delete process starting for audit

No indices to delete
========================
Index management delete process starting for {"error":{"root_cause":[{"type":"security_exception","reason":"_opendistro_security_dls_query

Received an empty response from elasticsearch -- server may not be ready
========================
Index management rollover process starting for audit

Current write index for audit-write: audit-000011
Checking results from _rollover call
Next write index for audit-write: audit-000011
Checking if audit-000011 exists
Checking if audit-000011 is the write index for audit-write
Done!

# oc get csv
NAME                               DISPLAY                            VERSION    REPLACES                           PHASE
cluster-logging.5.2.0-57           Red Hat OpenShift Logging          5.2.0-57   cluster-logging.5.1.0-75           Succeeded
elasticsearch-operator.5.2.0-58    OpenShift Elasticsearch Operator   5.2.0-58   elasticsearch-operator.5.1.0-96    Succeeded

Comment 10 yener 2021-09-14 07:29:37 UTC
Facing a similar issue in 4.7.24.
Cluster Logging 5.1.0-75 
OpenShift Elasticsearch Operator 5.1.0-96

$ oc logs elasticsearch-im-audit-1631232000-8mx4p
========================
Index management delete process starting for audit

No indices to delete
========================
Index management delete process starting for {"error":{"root_cause":[{"type":"security_exception","reason":"_opendistro_security_dls_query

Received an empty response from elasticsearch -- server may not be ready
========================
Index management rollover process starting for audit

Current write index for audit-write: audit-000590
Checking results from _rollover call
Next write index for audit-write: audit-000590
Checking if audit-000590 exists
Checking if audit-000590 is the write index for audit-write
Done!

Regards,
Yener Yumlu

Comment 16 David Hernández Fernández 2021-10-11 13:54:20 UTC
Still facing the issue; it can still be reproduced with the latest Cluster Logging version and the corresponding latest OpenShift Elasticsearch Operator.

Comment 24 Sashank Agarwal 2021-11-18 13:53:30 UTC
*** Bug 2000309 has been marked as a duplicate of this bug. ***

Comment 28 Anping Li 2022-01-12 15:03:56 UTC
Verified on CSV elasticsearch-operator.4.6.0-202201111633, image openshift/ose-elasticsearch-proxy/images/v4.6.0-202201111633.p0.g12d80b2.assembly.stream.

Comment 33 errata-xmlrpc 2022-01-27 08:26:04 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.6.54 extras and security update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:0181

