Bug 1890838 - Sometimes the elasticsearch-delete-xxx job failed at "Unexpected exception indices:admin/aliases/get"
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Logging
Version: 4.6
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: urgent
Target Milestone: ---
Target Release: 4.7.0
Assignee: Jeff Cantrill
QA Contact: Qiaoling Tang
URL:
Whiteboard: logging-exploration
Duplicates: 1896578 1899431 1900873 1902095
Depends On:
Blocks: 1916910 1928772
 
Reported: 2020-10-23 03:14 UTC by Qiaoling Tang
Modified: 2024-10-01 16:59 UTC
CC List: 38 users

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Clones: 1916910
Environment:
Last Closed: 2021-02-24 11:21:19 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Github openshift elasticsearch-operator pull 599 0 None closed Bug 1890838: Condense indexmanagement into single cronjob 2021-02-17 04:35:05 UTC
Red Hat Knowledge Base (Solution) 5410091 0 None None None 2020-11-30 17:24:57 UTC
Red Hat Product Errata RHBA-2021:0652 0 None None None 2021-02-24 11:22:12 UTC

Description Qiaoling Tang 2020-10-23 03:14:20 UTC
Description of problem:

Sometimes the elasticsearch-delete-xxx job fails:

$ oc get pod
NAME                                            READY   STATUS      RESTARTS   AGE
cluster-logging-operator-7795955668-zclk4       1/1     Running     0          142m
elasticsearch-cdm-zrspl8qn-1-5799c5b4df-d6c2b   2/2     Running     0          141m
elasticsearch-cdm-zrspl8qn-2-597c79948b-xz8sp   2/2     Running     0          141m
elasticsearch-cdm-zrspl8qn-3-76fbc8b95-nnhmd    2/2     Running     0          141m
elasticsearch-delete-app-1603417500-jnlzv       0/1     Error       0          68m
elasticsearch-delete-app-1603421100-nlqdc       0/1     Completed   0          8m1s
elasticsearch-delete-audit-1603421100-gfcxc     0/1     Completed   0          8m1s
elasticsearch-delete-infra-1603421100-6z9rg     0/1     Completed   0          8m1s
elasticsearch-rollover-app-1603421100-59m8q     0/1     Completed   0          8m
elasticsearch-rollover-audit-1603421100-gvrbz   0/1     Completed   0          8m
elasticsearch-rollover-infra-1603421100-25f2j   0/1     Completed   0          8m

$ oc logs elasticsearch-delete-app-1603417500-jnlzv 
{"error":{"root_cause":[{"type":"security_exception","reason":"Unexpected exception indices:admin/aliases/get"}],"type":"security_exception","reason":"Unexpected exception indices:admin/aliases/get"},"status":500}
Error while attemping to determine the active write alias: {"error":{"root_cause":[{"type":"security_exception","reason":"Unexpected exception indices:admin/aliases/get"}],"type":"security_exception","reason":"Unexpected exception indices:admin/aliases/get"},"status":500}

ES logs:

master node:
[2020-10-23T01:45:05,725][INFO ][o.e.c.m.MetaDataIndexTemplateService] [elasticsearch-cdm-zrspl8qn-2] adding template [ocp-gen-audit] for index patterns [audit*]
[2020-10-23T01:45:07,362][INFO ][o.e.c.m.MetaDataCreateIndexService] [elasticsearch-cdm-zrspl8qn-2] [infra-000006] creating index, cause [rollover_index], templates [com.redhat.viaq-openshift-operations.template.json, common.settings.operations.template.json, ocp-gen-infra], shards [3]/[1], mappings [_doc]
[2020-10-23T01:45:07,509][INFO ][o.e.c.m.MetaDataCreateIndexService] [elasticsearch-cdm-zrspl8qn-2] [app-000002] creating index, cause [rollover_index], templates [com.redhat.viaq-openshift-project.template.json, common.settings.project.template.json, ocp-gen-app], shards [3]/[1], mappings [_doc]
[2020-10-23T01:45:07,557][ERROR][c.a.o.s.f.OpenDistroSecurityFilter] [elasticsearch-cdm-zrspl8qn-2] Unexpected exception ElasticsearchSecurityException[_opendistro_security_dls_query does not match (SG 900D)]
org.elasticsearch.ElasticsearchSecurityException: _opendistro_security_dls_query does not match (SG 900D)
	at com.amazon.opendistroforelasticsearch.security.privileges.DlsFlsEvaluator.evaluate(DlsFlsEvaluator.java:133) ~[opendistro_security-0.10.1.0-redhat-1.jar:0.10.1.0-redhat-1]
	at com.amazon.opendistroforelasticsearch.security.privileges.PrivilegesEvaluator.evaluate(PrivilegesEvaluator.java:441) ~[opendistro_security-0.10.1.0-redhat-1.jar:0.10.1.0-redhat-1]
	at com.amazon.opendistroforelasticsearch.security.filter.OpenDistroSecurityFilter.apply0(OpenDistroSecurityFilter.java:253) [opendistro_security-0.10.1.0-redhat-1.jar:0.10.1.0-redhat-1]
	at com.amazon.opendistroforelasticsearch.security.filter.OpenDistroSecurityFilter.apply(OpenDistroSecurityFilter.java:119) [opendistro_security-0.10.1.0-redhat-1.jar:0.10.1.0-redhat-1]
	at org.elasticsearch.action.support.TransportAction$RequestFilterChain.proceed(TransportAction.java:165) [elasticsearch-6.8.1.redhat-6.jar:6.8.1.redhat-6]
	at org.elasticsearch.action.support.TransportAction.execute(TransportAction.java:139) [elasticsearch-6.8.1.redhat-6.jar:6.8.1.redhat-6]
	at org.elasticsearch.action.support.HandledTransportAction$TransportHandler.messageReceived(HandledTransportAction.java:89) [elasticsearch-6.8.1.redhat-6.jar:6.8.1.redhat-6]
	at org.elasticsearch.action.support.HandledTransportAction$TransportHandler.messageReceived(HandledTransportAction.java:80) [elasticsearch-6.8.1.redhat-6.jar:6.8.1.redhat-6]
	at com.amazon.opendistroforelasticsearch.security.ssl.transport.OpenDistroSecuritySSLRequestHandler.messageReceivedDecorate(OpenDistroSecuritySSLRequestHandler.java:194) [opendistro_security_ssl-0.10.0.3-redhat-1.jar:0.10.0.3-redhat-1]
	at com.amazon.opendistroforelasticsearch.security.transport.OpenDistroSecurityRequestHandler.messageReceivedDecorate(OpenDistroSecurityRequestHandler.java:284) [opendistro_security-0.10.1.0-redhat-1.jar:0.10.1.0-redhat-1]
	at com.amazon.opendistroforelasticsearch.security.ssl.transport.OpenDistroSecuritySSLRequestHandler.messageReceived(OpenDistroSecuritySSLRequestHandler.java:166) [opendistro_security_ssl-0.10.0.3-redhat-1.jar:0.10.0.3-redhat-1]
	at com.amazon.opendistroforelasticsearch.security.OpenDistroSecurityPlugin$7$1.messageReceived(OpenDistroSecurityPlugin.java:646) [opendistro_security-0.10.1.0-redhat-1.jar:0.10.1.0-redhat-1]
	at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:66) [elasticsearch-6.8.1.redhat-6.jar:6.8.1.redhat-6]
	at org.elasticsearch.transport.TcpTransport$RequestHandler.doRun(TcpTransport.java:1087) [elasticsearch-6.8.1.redhat-6.jar:6.8.1.redhat-6]
	at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-6.8.1.redhat-6.jar:6.8.1.redhat-6]
	at org.elasticsearch.common.util.concurrent.EsExecutors$DirectExecutorService.execute(EsExecutors.java:193) [elasticsearch-6.8.1.redhat-6.jar:6.8.1.redhat-6]
	at org.elasticsearch.transport.TcpTransport.handleRequest(TcpTransport.java:1046) [elasticsearch-6.8.1.redhat-6.jar:6.8.1.redhat-6]
	at org.elasticsearch.transport.TcpTransport.messageReceived(TcpTransport.java:932) [elasticsearch-6.8.1.redhat-6.jar:6.8.1.redhat-6]
	at org.elasticsearch.transport.TcpTransport.inboundMessage(TcpTransport.java:763) [elasticsearch-6.8.1.redhat-6.jar:6.8.1.redhat-6]
	at org.elasticsearch.transport.netty4.Netty4MessageChannelHandler.channelRead(Netty4MessageChannelHandler.java:53) [transport-netty4-client-6.8.1.redhat-5.jar:6.8.1.redhat-5]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:323) [netty-codec-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:297) [netty-codec-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.handler.logging.LoggingHandler.channelRead(LoggingHandler.java:241) [netty-handler-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1436) [netty-handler-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.handler.ssl.SslHandler.decodeJdkCompatible(SslHandler.java:1203) [netty-handler-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.handler.ssl.SslHandler.decode(SslHandler.java:1247) [netty-handler-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:502) [netty-codec-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:441) [netty-codec-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:278) [netty-codec-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1434) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:965) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:163) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:656) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:556) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:510) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:470) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:909) [netty-common-4.1.32.Final.jar:4.1.32.Final]
	at java.lang.Thread.run(Thread.java:834) [?:?]
[2020-10-23T01:45:07,777][INFO ][o.e.c.m.MetaDataIndexTemplateService] [elasticsearch-cdm-zrspl8qn-2] adding template [ocp-gen-app] for index patterns [app*]
[2020-10-23T01:45:07,813][INFO ][o.e.c.m.MetaDataMappingService] [elasticsearch-cdm-zrspl8qn-2] [infra-000006/jzRO2JwGROSFd8NGcZAT0A] update_mapping [_doc]
[2020-10-23T01:45:07,817][INFO ][o.e.c.m.MetaDataMappingService] [elasticsearch-cdm-zrspl8qn-2] [infra-000006/jzRO2JwGROSFd8NGcZAT0A] update_mapping [_doc]
[2020-10-23T01:45:07,845][INFO ][o.e.c.m.MetaDataIndexTemplateService] [elasticsearch-cdm-zrspl8qn-2] adding template [ocp-gen-infra] for index patterns [infra*]
[2020-10-23T01:45:07,898][INFO ][o.e.c.m.MetaDataIndexTemplateService] [elasticsearch-cdm-zrspl8qn-2] adding template [ocp-gen-audit] for index patterns [audit*]
[2020-10-23T01:45:08,271][INFO ][o.e.c.m.MetaDataMappingService] [elasticsearch-cdm-zrspl8qn-2] [infra-000006/jzRO2JwGROSFd8NGcZAT0A] update_mapping [_doc]
[2020-10-23T01:45:08,391][INFO ][o.e.c.m.MetaDataMappingService] [elasticsearch-cdm-zrspl8qn-2] [infra-000006/jzRO2JwGROSFd8NGcZAT0A] update_mapping [_doc]
[2020-10-23T01:45:08,455][INFO ][o.e.c.m.MetaDataMappingService] [elasticsearch-cdm-zrspl8qn-2] [infra-000006/jzRO2JwGROSFd8NGcZAT0A] update_mapping [_doc]
[2020-10-23T01:45:08,603][INFO ][o.e.c.m.MetaDataMappingService] [elasticsearch-cdm-zrspl8qn-2] [app-000002/deMFvGAJSB6VaW3BwyCvew] update_mapping [_doc]
[2020-10-23T01:45:08,668][INFO ][o.e.c.r.a.AllocationService] [elasticsearch-cdm-zrspl8qn-2] Cluster health status changed from [YELLOW] to [GREEN] (reason: [shards started [[infra-000006][2]] ...]).

non-master node:
[2020-10-23T00:33:20,948][INFO ][c.a.o.s.c.IndexBaseConfigurationRepository] [elasticsearch-cdm-zrspl8qn-3] Node 'elasticsearch-cdm-zrspl8qn-3' initialized
[2020-10-23T01:45:07,575][WARN ][r.suppressed             ] [elasticsearch-cdm-zrspl8qn-3] path: /app-*/_alias/app-write, params: {name=app-write, index=app-*}
org.elasticsearch.transport.RemoteTransportException: [elasticsearch-cdm-zrspl8qn-2][10.131.0.20:9300][indices:admin/aliases/get]
Caused by: org.elasticsearch.ElasticsearchSecurityException: Unexpected exception indices:admin/aliases/get
	at com.amazon.opendistroforelasticsearch.security.filter.OpenDistroSecurityFilter.apply0(OpenDistroSecurityFilter.java:274) ~[?:?]
	at com.amazon.opendistroforelasticsearch.security.filter.OpenDistroSecurityFilter.apply(OpenDistroSecurityFilter.java:119) ~[?:?]
	at org.elasticsearch.action.support.TransportAction$RequestFilterChain.proceed(TransportAction.java:165) ~[elasticsearch-6.8.1.redhat-6.jar:6.8.1.redhat-6]
	at org.elasticsearch.action.support.TransportAction.execute(TransportAction.java:139) ~[elasticsearch-6.8.1.redhat-6.jar:6.8.1.redhat-6]
	at org.elasticsearch.action.support.HandledTransportAction$TransportHandler.messageReceived(HandledTransportAction.java:89) ~[elasticsearch-6.8.1.redhat-6.jar:6.8.1.redhat-6]
	at org.elasticsearch.action.support.HandledTransportAction$TransportHandler.messageReceived(HandledTransportAction.java:80) ~[elasticsearch-6.8.1.redhat-6.jar:6.8.1.redhat-6]
	at com.amazon.opendistroforelasticsearch.security.ssl.transport.OpenDistroSecuritySSLRequestHandler.messageReceivedDecorate(OpenDistroSecuritySSLRequestHandler.java:194) ~[?:?]
	at com.amazon.opendistroforelasticsearch.security.transport.OpenDistroSecurityRequestHandler.messageReceivedDecorate(OpenDistroSecurityRequestHandler.java:284) ~[?:?]
	at com.amazon.opendistroforelasticsearch.security.ssl.transport.OpenDistroSecuritySSLRequestHandler.messageReceived(OpenDistroSecuritySSLRequestHandler.java:166) ~[?:?]
	at com.amazon.opendistroforelasticsearch.security.OpenDistroSecurityPlugin$7$1.messageReceived(OpenDistroSecurityPlugin.java:646) ~[?:?]
	at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:66) ~[elasticsearch-6.8.1.redhat-6.jar:6.8.1.redhat-6]
	at org.elasticsearch.transport.TcpTransport$RequestHandler.doRun(TcpTransport.java:1087) ~[elasticsearch-6.8.1.redhat-6.jar:6.8.1.redhat-6]
	at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[elasticsearch-6.8.1.redhat-6.jar:6.8.1.redhat-6]
	at org.elasticsearch.common.util.concurrent.EsExecutors$DirectExecutorService.execute(EsExecutors.java:193) ~[elasticsearch-6.8.1.redhat-6.jar:6.8.1.redhat-6]
	at org.elasticsearch.transport.TcpTransport.handleRequest(TcpTransport.java:1046) [elasticsearch-6.8.1.redhat-6.jar:6.8.1.redhat-6]
	at org.elasticsearch.transport.TcpTransport.messageReceived(TcpTransport.java:932) [elasticsearch-6.8.1.redhat-6.jar:6.8.1.redhat-6]
	at org.elasticsearch.transport.TcpTransport.inboundMessage(TcpTransport.java:763) [elasticsearch-6.8.1.redhat-6.jar:6.8.1.redhat-6]
	at org.elasticsearch.transport.netty4.Netty4MessageChannelHandler.channelRead(Netty4MessageChannelHandler.java:53) [transport-netty4-client-6.8.1.redhat-5.jar:6.8.1.redhat-5]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:323) [netty-codec-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:297) [netty-codec-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.handler.logging.LoggingHandler.channelRead(LoggingHandler.java:241) [netty-handler-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1436) [netty-handler-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.handler.ssl.SslHandler.decodeJdkCompatible(SslHandler.java:1203) [netty-handler-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.handler.ssl.SslHandler.decode(SslHandler.java:1247) [netty-handler-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:502) [netty-codec-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:441) [netty-codec-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:278) [netty-codec-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1434) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:965) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:163) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:656) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:556) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:510) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:470) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:909) [netty-common-4.1.32.Final.jar:4.1.32.Final]
	at java.lang.Thread.run(Thread.java:834) [?:?]

log store configuration:
    logStore:
      elasticsearch:
        nodeCount: 3
        redundancyPolicy: SingleRedundancy
        resources:
          requests:
            memory: 2Gi
        storage:
          size: 20Gi
          storageClassName: gp2
      retentionPolicy:
        application:
          maxAge: 1d
        audit:
          maxAge: 2w
        infra:
          maxAge: 3h
      type: elasticsearch


Version-Release number of selected component (if applicable):
elasticsearch-operator.4.6.0-202010221124.p0

How reproducible:
Not always. I hit this issue several days ago: https://bugzilla.redhat.com/show_bug.cgi?id=1866019#c35

Steps to Reproduce:
1. deploy logging
2. keep logging stack running for several hours
3. check pods

Actual results:


Expected results:


Additional info:

Comment 1 Jeff Cantrill 2020-10-26 12:57:23 UTC
Is this the oldest cron job?  Is it possible this is a transient issue from when the cluster is starting and the permissions have not been seeded yet?

Comment 2 Qiaoling Tang 2020-10-27 00:29:59 UTC
(In reply to Jeff Cantrill from comment #1)
> Is this the oldest cron job?  

Looking at the `AGE` field, it seems not.

> Is it possible this is a transient issue from
> when the cluster is starting and the permissions have not been seeded yet?

I don't think so. I checked the logs in the ES cluster, and the cluster had already started before the issue happened.

Comment 4 Jeff Cantrill 2020-11-23 21:33:21 UTC
*** Bug 1899431 has been marked as a duplicate of this bug. ***

Comment 5 Jeff Cantrill 2020-11-23 21:35:05 UTC
*** Bug 1900873 has been marked as a duplicate of this bug. ***

Comment 6 Lukas Vlcek 2020-11-26 15:41:04 UTC
There is a pending PR https://github.com/openshift/origin-aggregated-logging/pull/2023 and related BZ https://bugzilla.redhat.com/show_bug.cgi?id=1900757 that fixes this for cluster admins.

Comment 7 Lukas Vlcek 2020-11-26 15:44:05 UTC
Oops, sorry about comment #6, it is probably not accurate (I hit return too fast).
The fact is that there are some BZs that might be related, such as the one mentioned before or https://bugzilla.redhat.com/show_bug.cgi?id=1900772 (which is the same issue but for 4.7), but I am not really sure this is a duplicate BZ, because if it were, the description could not say "sometimes"; it would have to say "always".

Comment 8 Periklis Tsirakidis 2020-11-27 08:18:52 UTC
@lvlcek 

Please consider that any solution provided needs a backport to 4.6.z and 4.5.z, because there are a couple of customer cases from IBM in [1].

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1896578

Comment 9 Periklis Tsirakidis 2020-11-30 17:24:58 UTC
*** Bug 1896578 has been marked as a duplicate of this bug. ***

Comment 10 Periklis Tsirakidis 2020-11-30 17:25:35 UTC
*** Bug 1902095 has been marked as a duplicate of this bug. ***

Comment 19 Anand Paladugu 2021-01-06 03:57:20 UTC
I have another OCP 4.6 customer seeing this, and it started happening suddenly. The later errors (index read-only) are likely because ES made the indices read-only after running out of space, since the deletes are not working. I am checking on it (an interim unblock sketch follows after the logs below).


Any update on this BZ or workaround for this issue?



$ oc logs elasticsearch-delete-audit-1608591600-mqf8c

deleting indices: audit-003095,audit-003094
{
  "error" : {
    "root_cause" : [
      {
        "type" : "security_exception",
        "reason" : "Unexpected exception indices:admin/delete"
      }
    ],
    "type" : "security_exception",
    "reason" : "Unexpected exception indices:admin/delete"
  },
  "status" : 500
}

$ oc logs elasticsearch-rollover-app-1608591600-7pk4l
{
  "error" : {
    "root_cause" : [
      {
        "type" : "cluster_block_exception",
        "reason" : "blocked by: [FORBIDDEN/12/index read-only / allow delete (api)];"
      }
    ],
    "type" : "cluster_block_exception",
    "reason" : "blocked by: [FORBIDDEN/12/index read-only / allow delete (api)];"
  },
  "status" : 403
}

$ oc logs elasticsearch-rollover-audit-1608591600-llsvq
{
  "error" : {
    "root_cause" : [
      {
        "type" : "cluster_block_exception",
        "reason" : "blocked by: [FORBIDDEN/12/index read-only / allow delete (api)];"
      }
    ],
    "type" : "cluster_block_exception",
    "reason" : "blocked by: [FORBIDDEN/12/index read-only / allow delete (api)];"
  },
  "status" : 403
}

$ oc logs elasticsearch-rollover-infra-1608591600-p5xjn
{
  "error" : {
    "root_cause" : [
      {
        "type" : "cluster_block_exception",
        "reason" : "blocked by: [FORBIDDEN/12/index read-only / allow delete (api)];"
      }
    ],
    "type" : "cluster_block_exception",
    "reason" : "blocked by: [FORBIDDEN/12/index read-only / allow delete (api)];"
  },
  "status" : 403
}
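
A possible interim unblock for the read-only state above (not a fix for this BZ) is to free disk space or grow the PVCs, then clear the cluster block. A minimal sketch, assuming the es_util helper shipped in the Elasticsearch container, the openshift-logging namespace, and a placeholder pod name:

# clear the flood-stage read-only block on all indices (substitute a real ES pod name)
$ oc exec -n openshift-logging -c elasticsearch elasticsearch-cdm-xxxxxxxx-1-xxxxx -- \
    es_util --query="_all/_settings?pretty" -X PUT -d '{"index.blocks.read_only_allow_delete": null}'

This only removes the FORBIDDEN/12 block; the delete/rollover job failures themselves still need the fix tracked in this BZ.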

Comment 21 Anand Paladugu 2021-01-07 16:24:50 UTC
Hi

Any workaround that might work to help customers in the interim?

Thanks

Anand

Comment 22 Lukas Vlcek 2021-01-08 15:51:52 UTC
Update: We are still investigating a fix and a workaround for this BZ.

Further, I want to clarify and follow up on my comment #6:
The PR https://github.com/openshift/origin-aggregated-logging/pull/2023 is not related and does not solve this BZ.

Comment 24 Frederic Giloux 2021-01-15 16:05:39 UTC
Another customer is facing this issue. It happens 3-4 times a week. It would not be a problem for them, since the indices are cleaned up by subsequent runs, if it did not raise the KubeJobFailed alert, which requires human intervention (investigation and clearing). My understanding is that the current combination is unfortunate in this respect:
- The KubeJobFailed alert stays pending for 15 minutes before it fires.
- Jobs run every 15 minutes, so the previous failed job should normally be deleted and the alarm cleared before firing, if the second run is successful.
- However, the latest failed job is kept for debugging, which causes the alert to fire anyway (an interim cleanup sketch follows below).
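
Until the fix lands, an interim cleanup is to delete the failed Job objects by hand so the KubeJobFailed alert can clear. A minimal sketch, assuming the openshift-logging namespace and that the alert is driven by the failed Job object still being present (the job name is taken from the description above; substitute the one that actually failed):

# list index-management jobs and remove the failed one
$ oc get jobs -n openshift-logging
$ oc delete job elasticsearch-delete-app-1603417500 -n openshift-logging

The cronjob schedules a fresh run every 15 minutes, so deleting a failed Job does not lose any retention work.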

Comment 27 Anping Li 2021-01-20 10:19:58 UTC
The index can't be created because there is no cronjob.

Comment 32 Anping Li 2021-01-21 21:26:17 UTC
Verified: the delete and rollover jobs are collapsed into a single job now.

>oc get cronjob
NAME                     SCHEDULE       SUSPEND   ACTIVE   LAST SCHEDULE   AGE
elasticsearch-im-app     */15 * * * *   False     0        31s             24m
elasticsearch-im-audit   */15 * * * *   False     0        31s             24m
elasticsearch-im-infra   */15 * * * *   False     0        31s             24m

>oc get pods
NAME                                            READY   STATUS      RESTARTS   AGE
cluster-logging-operator-579b79679-845z4        1/1     Running     0          63m
elasticsearch-cdm-tsqz59lh-1-599fd87944-s7q9q   2/2     Running     0          23m
elasticsearch-im-app-1611259200-qg2w9           0/1     Completed   0          15m
elasticsearch-im-app-1611260100-ztwmh           0/1     Completed   0          6s
elasticsearch-im-audit-1611259200-tt7m4         0/1     Completed   0          15m
elasticsearch-im-audit-1611260100-frmvt         0/1     Completed   0          6s
elasticsearch-im-infra-1611259200-8sdzq         0/1     Completed   0          15m
elasticsearch-im-infra-1611260100-v5fdm         0/1     Completed   0          6s

>oc logs elasticsearch-im-infra-1611260280-kczbq

deleting indices: infra-000002
Current write index for infra-write: infra-000003
Checking results from _rollover call
Next write index for infra-write: infra-000004
Checking if infra-000004 exists
Checking if infra-000004 is the write index for infra-write
Done!
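
For anyone re-verifying, a run can also be triggered on demand instead of waiting for the next 15-minute schedule. A minimal sketch, assuming the openshift-logging namespace (the job name im-app-manual is arbitrary):

# create a one-off job from the index-management cronjob and follow its log
$ oc create job --from=cronjob/elasticsearch-im-app im-app-manual -n openshift-logging
$ oc logs -f job/im-app-manual -n openshift-logging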

Comment 33 Ronald 2021-01-27 13:40:26 UTC
Hi Guys, 

When will it be backported to version 4.6?

Thx,
Ronald

Comment 34 ewolinet 2021-01-27 22:58:51 UTC
(In reply to Ronald from comment #33)
> Hi Guys, 
> 
> When will it be backported to version 4.6?
> 
> Thx,
> Ronald

https://bugzilla.redhat.com/show_bug.cgi?id=1916910

Comment 35 hgomes 2021-02-03 17:10:47 UTC
I have a customer facing this at OCP 4.5.27

Comment 49 errata-xmlrpc 2021-02-24 11:21:19 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Errata Advisory for Openshift Logging 5.0.0), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:0652

