Bug 1890838 - Sometimes the elasticsearch-delete-xxx job failed at "Unexpected exception indices:admin/aliases/get"
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Logging
Version: 4.6
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: urgent
Target Milestone: ---
Target Release: 4.7.0
Assignee: Jeff Cantrill
QA Contact: Qiaoling Tang
URL:
Whiteboard: logging-exploration
Duplicates: 1896578 1899431 1900873 1902095
Depends On:
Blocks: 1916910 1928772
 
Reported: 2020-10-23 03:14 UTC by Qiaoling Tang
Modified: 2024-10-01 16:59 UTC
CC List: 38 users

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Clones: 1916910
Environment:
Last Closed: 2021-02-24 11:21:19 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Github openshift elasticsearch-operator pull 599 0 None closed Bug 1890838: Condense indexmanagement into single cronjob 2021-02-17 04:35:05 UTC
Red Hat Knowledge Base (Solution) 5410091 0 None None None 2020-11-30 17:24:57 UTC
Red Hat Product Errata RHBA-2021:0652 0 None None None 2021-02-24 11:22:12 UTC

Description Qiaoling Tang 2020-10-23 03:14:20 UTC
Description of problem:

Sometimes the elasticsearch-delete-xxx job fails:

$ oc get pod
NAME                                            READY   STATUS      RESTARTS   AGE
cluster-logging-operator-7795955668-zclk4       1/1     Running     0          142m
elasticsearch-cdm-zrspl8qn-1-5799c5b4df-d6c2b   2/2     Running     0          141m
elasticsearch-cdm-zrspl8qn-2-597c79948b-xz8sp   2/2     Running     0          141m
elasticsearch-cdm-zrspl8qn-3-76fbc8b95-nnhmd    2/2     Running     0          141m
elasticsearch-delete-app-1603417500-jnlzv       0/1     Error       0          68m
elasticsearch-delete-app-1603421100-nlqdc       0/1     Completed   0          8m1s
elasticsearch-delete-audit-1603421100-gfcxc     0/1     Completed   0          8m1s
elasticsearch-delete-infra-1603421100-6z9rg     0/1     Completed   0          8m1s
elasticsearch-rollover-app-1603421100-59m8q     0/1     Completed   0          8m
elasticsearch-rollover-audit-1603421100-gvrbz   0/1     Completed   0          8m
elasticsearch-rollover-infra-1603421100-25f2j   0/1     Completed   0          8m

$ oc logs elasticsearch-delete-app-1603417500-jnlzv 
{"error":{"root_cause":[{"type":"security_exception","reason":"Unexpected exception indices:admin/aliases/get"}],"type":"security_exception","reason":"Unexpected exception indices:admin/aliases/get"},"status":500}
Error while attemping to determine the active write alias: {"error":{"root_cause":[{"type":"security_exception","reason":"Unexpected exception indices:admin/aliases/get"}],"type":"security_exception","reason":"Unexpected exception indices:admin/aliases/get"},"status":500}

ES logs:

master node:
[2020-10-23T01:45:05,725][INFO ][o.e.c.m.MetaDataIndexTemplateService] [elasticsearch-cdm-zrspl8qn-2] adding template [ocp-gen-audit] for index patterns [audit*]
[2020-10-23T01:45:07,362][INFO ][o.e.c.m.MetaDataCreateIndexService] [elasticsearch-cdm-zrspl8qn-2] [infra-000006] creating index, cause [rollover_index], templates [com.redhat.viaq-openshift-operations.template.json, common.settings.operations.template.json, ocp-gen-infra], shards [3]/[1], mappings [_doc]
[2020-10-23T01:45:07,509][INFO ][o.e.c.m.MetaDataCreateIndexService] [elasticsearch-cdm-zrspl8qn-2] [app-000002] creating index, cause [rollover_index], templates [com.redhat.viaq-openshift-project.template.json, common.settings.project.template.json, ocp-gen-app], shards [3]/[1], mappings [_doc]
[2020-10-23T01:45:07,557][ERROR][c.a.o.s.f.OpenDistroSecurityFilter] [elasticsearch-cdm-zrspl8qn-2] Unexpected exception ElasticsearchSecurityException[_opendistro_security_dls_query does not match (SG 900D)]
org.elasticsearch.ElasticsearchSecurityException: _opendistro_security_dls_query does not match (SG 900D)
	at com.amazon.opendistroforelasticsearch.security.privileges.DlsFlsEvaluator.evaluate(DlsFlsEvaluator.java:133) ~[opendistro_security-0.10.1.0-redhat-1.jar:0.10.1.0-redhat-1]
	at com.amazon.opendistroforelasticsearch.security.privileges.PrivilegesEvaluator.evaluate(PrivilegesEvaluator.java:441) ~[opendistro_security-0.10.1.0-redhat-1.jar:0.10.1.0-redhat-1]
	at com.amazon.opendistroforelasticsearch.security.filter.OpenDistroSecurityFilter.apply0(OpenDistroSecurityFilter.java:253) [opendistro_security-0.10.1.0-redhat-1.jar:0.10.1.0-redhat-1]
	at com.amazon.opendistroforelasticsearch.security.filter.OpenDistroSecurityFilter.apply(OpenDistroSecurityFilter.java:119) [opendistro_security-0.10.1.0-redhat-1.jar:0.10.1.0-redhat-1]
	at org.elasticsearch.action.support.TransportAction$RequestFilterChain.proceed(TransportAction.java:165) [elasticsearch-6.8.1.redhat-6.jar:6.8.1.redhat-6]
	at org.elasticsearch.action.support.TransportAction.execute(TransportAction.java:139) [elasticsearch-6.8.1.redhat-6.jar:6.8.1.redhat-6]
	at org.elasticsearch.action.support.HandledTransportAction$TransportHandler.messageReceived(HandledTransportAction.java:89) [elasticsearch-6.8.1.redhat-6.jar:6.8.1.redhat-6]
	at org.elasticsearch.action.support.HandledTransportAction$TransportHandler.messageReceived(HandledTransportAction.java:80) [elasticsearch-6.8.1.redhat-6.jar:6.8.1.redhat-6]
	at com.amazon.opendistroforelasticsearch.security.ssl.transport.OpenDistroSecuritySSLRequestHandler.messageReceivedDecorate(OpenDistroSecuritySSLRequestHandler.java:194) [opendistro_security_ssl-0.10.0.3-redhat-1.jar:0.10.0.3-redhat-1]
	at com.amazon.opendistroforelasticsearch.security.transport.OpenDistroSecurityRequestHandler.messageReceivedDecorate(OpenDistroSecurityRequestHandler.java:284) [opendistro_security-0.10.1.0-redhat-1.jar:0.10.1.0-redhat-1]
	at com.amazon.opendistroforelasticsearch.security.ssl.transport.OpenDistroSecuritySSLRequestHandler.messageReceived(OpenDistroSecuritySSLRequestHandler.java:166) [opendistro_security_ssl-0.10.0.3-redhat-1.jar:0.10.0.3-redhat-1]
	at com.amazon.opendistroforelasticsearch.security.OpenDistroSecurityPlugin$7$1.messageReceived(OpenDistroSecurityPlugin.java:646) [opendistro_security-0.10.1.0-redhat-1.jar:0.10.1.0-redhat-1]
	at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:66) [elasticsearch-6.8.1.redhat-6.jar:6.8.1.redhat-6]
	at org.elasticsearch.transport.TcpTransport$RequestHandler.doRun(TcpTransport.java:1087) [elasticsearch-6.8.1.redhat-6.jar:6.8.1.redhat-6]
	at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-6.8.1.redhat-6.jar:6.8.1.redhat-6]
	at org.elasticsearch.common.util.concurrent.EsExecutors$DirectExecutorService.execute(EsExecutors.java:193) [elasticsearch-6.8.1.redhat-6.jar:6.8.1.redhat-6]
	at org.elasticsearch.transport.TcpTransport.handleRequest(TcpTransport.java:1046) [elasticsearch-6.8.1.redhat-6.jar:6.8.1.redhat-6]
	at org.elasticsearch.transport.TcpTransport.messageReceived(TcpTransport.java:932) [elasticsearch-6.8.1.redhat-6.jar:6.8.1.redhat-6]
	at org.elasticsearch.transport.TcpTransport.inboundMessage(TcpTransport.java:763) [elasticsearch-6.8.1.redhat-6.jar:6.8.1.redhat-6]
	at org.elasticsearch.transport.netty4.Netty4MessageChannelHandler.channelRead(Netty4MessageChannelHandler.java:53) [transport-netty4-client-6.8.1.redhat-5.jar:6.8.1.redhat-5]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:323) [netty-codec-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:297) [netty-codec-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.handler.logging.LoggingHandler.channelRead(LoggingHandler.java:241) [netty-handler-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1436) [netty-handler-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.handler.ssl.SslHandler.decodeJdkCompatible(SslHandler.java:1203) [netty-handler-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.handler.ssl.SslHandler.decode(SslHandler.java:1247) [netty-handler-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:502) [netty-codec-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:441) [netty-codec-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:278) [netty-codec-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1434) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:965) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:163) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:656) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:556) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:510) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:470) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:909) [netty-common-4.1.32.Final.jar:4.1.32.Final]
	at java.lang.Thread.run(Thread.java:834) [?:?]
[2020-10-23T01:45:07,777][INFO ][o.e.c.m.MetaDataIndexTemplateService] [elasticsearch-cdm-zrspl8qn-2] adding template [ocp-gen-app] for index patterns [app*]
[2020-10-23T01:45:07,813][INFO ][o.e.c.m.MetaDataMappingService] [elasticsearch-cdm-zrspl8qn-2] [infra-000006/jzRO2JwGROSFd8NGcZAT0A] update_mapping [_doc]
[2020-10-23T01:45:07,817][INFO ][o.e.c.m.MetaDataMappingService] [elasticsearch-cdm-zrspl8qn-2] [infra-000006/jzRO2JwGROSFd8NGcZAT0A] update_mapping [_doc]
[2020-10-23T01:45:07,845][INFO ][o.e.c.m.MetaDataIndexTemplateService] [elasticsearch-cdm-zrspl8qn-2] adding template [ocp-gen-infra] for index patterns [infra*]
[2020-10-23T01:45:07,898][INFO ][o.e.c.m.MetaDataIndexTemplateService] [elasticsearch-cdm-zrspl8qn-2] adding template [ocp-gen-audit] for index patterns [audit*]
[2020-10-23T01:45:08,271][INFO ][o.e.c.m.MetaDataMappingService] [elasticsearch-cdm-zrspl8qn-2] [infra-000006/jzRO2JwGROSFd8NGcZAT0A] update_mapping [_doc]
[2020-10-23T01:45:08,391][INFO ][o.e.c.m.MetaDataMappingService] [elasticsearch-cdm-zrspl8qn-2] [infra-000006/jzRO2JwGROSFd8NGcZAT0A] update_mapping [_doc]
[2020-10-23T01:45:08,455][INFO ][o.e.c.m.MetaDataMappingService] [elasticsearch-cdm-zrspl8qn-2] [infra-000006/jzRO2JwGROSFd8NGcZAT0A] update_mapping [_doc]
[2020-10-23T01:45:08,603][INFO ][o.e.c.m.MetaDataMappingService] [elasticsearch-cdm-zrspl8qn-2] [app-000002/deMFvGAJSB6VaW3BwyCvew] update_mapping [_doc]
[2020-10-23T01:45:08,668][INFO ][o.e.c.r.a.AllocationService] [elasticsearch-cdm-zrspl8qn-2] Cluster health status changed from [YELLOW] to [GREEN] (reason: [shards started [[infra-000006][2]] ...]).

non-master node:
[2020-10-23T00:33:20,948][INFO ][c.a.o.s.c.IndexBaseConfigurationRepository] [elasticsearch-cdm-zrspl8qn-3] Node 'elasticsearch-cdm-zrspl8qn-3' initialized
[2020-10-23T01:45:07,575][WARN ][r.suppressed             ] [elasticsearch-cdm-zrspl8qn-3] path: /app-*/_alias/app-write, params: {name=app-write, index=app-*}
org.elasticsearch.transport.RemoteTransportException: [elasticsearch-cdm-zrspl8qn-2][10.131.0.20:9300][indices:admin/aliases/get]
Caused by: org.elasticsearch.ElasticsearchSecurityException: Unexpected exception indices:admin/aliases/get
	at com.amazon.opendistroforelasticsearch.security.filter.OpenDistroSecurityFilter.apply0(OpenDistroSecurityFilter.java:274) ~[?:?]
	at com.amazon.opendistroforelasticsearch.security.filter.OpenDistroSecurityFilter.apply(OpenDistroSecurityFilter.java:119) ~[?:?]
	at org.elasticsearch.action.support.TransportAction$RequestFilterChain.proceed(TransportAction.java:165) ~[elasticsearch-6.8.1.redhat-6.jar:6.8.1.redhat-6]
	at org.elasticsearch.action.support.TransportAction.execute(TransportAction.java:139) ~[elasticsearch-6.8.1.redhat-6.jar:6.8.1.redhat-6]
	at org.elasticsearch.action.support.HandledTransportAction$TransportHandler.messageReceived(HandledTransportAction.java:89) ~[elasticsearch-6.8.1.redhat-6.jar:6.8.1.redhat-6]
	at org.elasticsearch.action.support.HandledTransportAction$TransportHandler.messageReceived(HandledTransportAction.java:80) ~[elasticsearch-6.8.1.redhat-6.jar:6.8.1.redhat-6]
	at com.amazon.opendistroforelasticsearch.security.ssl.transport.OpenDistroSecuritySSLRequestHandler.messageReceivedDecorate(OpenDistroSecuritySSLRequestHandler.java:194) ~[?:?]
	at com.amazon.opendistroforelasticsearch.security.transport.OpenDistroSecurityRequestHandler.messageReceivedDecorate(OpenDistroSecurityRequestHandler.java:284) ~[?:?]
	at com.amazon.opendistroforelasticsearch.security.ssl.transport.OpenDistroSecuritySSLRequestHandler.messageReceived(OpenDistroSecuritySSLRequestHandler.java:166) ~[?:?]
	at com.amazon.opendistroforelasticsearch.security.OpenDistroSecurityPlugin$7$1.messageReceived(OpenDistroSecurityPlugin.java:646) ~[?:?]
	at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:66) ~[elasticsearch-6.8.1.redhat-6.jar:6.8.1.redhat-6]
	at org.elasticsearch.transport.TcpTransport$RequestHandler.doRun(TcpTransport.java:1087) ~[elasticsearch-6.8.1.redhat-6.jar:6.8.1.redhat-6]
	at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[elasticsearch-6.8.1.redhat-6.jar:6.8.1.redhat-6]
	at org.elasticsearch.common.util.concurrent.EsExecutors$DirectExecutorService.execute(EsExecutors.java:193) ~[elasticsearch-6.8.1.redhat-6.jar:6.8.1.redhat-6]
	at org.elasticsearch.transport.TcpTransport.handleRequest(TcpTransport.java:1046) [elasticsearch-6.8.1.redhat-6.jar:6.8.1.redhat-6]
	at org.elasticsearch.transport.TcpTransport.messageReceived(TcpTransport.java:932) [elasticsearch-6.8.1.redhat-6.jar:6.8.1.redhat-6]
	at org.elasticsearch.transport.TcpTransport.inboundMessage(TcpTransport.java:763) [elasticsearch-6.8.1.redhat-6.jar:6.8.1.redhat-6]
	at org.elasticsearch.transport.netty4.Netty4MessageChannelHandler.channelRead(Netty4MessageChannelHandler.java:53) [transport-netty4-client-6.8.1.redhat-5.jar:6.8.1.redhat-5]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:323) [netty-codec-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:297) [netty-codec-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.handler.logging.LoggingHandler.channelRead(LoggingHandler.java:241) [netty-handler-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1436) [netty-handler-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.handler.ssl.SslHandler.decodeJdkCompatible(SslHandler.java:1203) [netty-handler-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.handler.ssl.SslHandler.decode(SslHandler.java:1247) [netty-handler-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:502) [netty-codec-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:441) [netty-codec-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:278) [netty-codec-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1434) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:965) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:163) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:656) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:556) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:510) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:470) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:909) [netty-common-4.1.32.Final.jar:4.1.32.Final]
	at java.lang.Thread.run(Thread.java:834) [?:?]

log store configuration:
    logStore:
      elasticsearch:
        nodeCount: 3
        redundancyPolicy: SingleRedundancy
        resources:
          requests:
            memory: 2Gi
        storage:
          size: 20Gi
          storageClassName: gp2
      retentionPolicy:
        application:
          maxAge: 1d
        audit:
          maxAge: 2w
        infra:
          maxAge: 3h
      type: elasticsearch


Version-Release number of selected component (if applicable):
elasticsearch-operator.4.6.0-202010221124.p0

How reproducible:
Not always. I hit this issue several days ago: https://bugzilla.redhat.com/show_bug.cgi?id=1866019#c35

Steps to Reproduce:
1. deploy logging
2. keep logging stack running for several hours
3. check pods

Actual results:


Expected results:


Additional info:

Comment 1 Jeff Cantrill 2020-10-26 12:57:23 UTC
Is this the oldest cron job?  Is it possible this is a transient issue from when the cluster is starting and the permissions have not been seeded yet?

Comment 2 Qiaoling Tang 2020-10-27 00:29:59 UTC
(In reply to Jeff Cantrill from comment #1)
> Is this the oldest cron job?  

Looking at the `AGE` field, it seems not.

> Is it possible this is a transient issue from
> when the cluster is starting and the permissions have not been seeded yet?

I don't think so. I checked the logs in the ES cluster, and the cluster had already started before the issue happened.

Comment 4 Jeff Cantrill 2020-11-23 21:33:21 UTC
*** Bug 1899431 has been marked as a duplicate of this bug. ***

Comment 5 Jeff Cantrill 2020-11-23 21:35:05 UTC
*** Bug 1900873 has been marked as a duplicate of this bug. ***

Comment 6 Lukas Vlcek 2020-11-26 15:41:04 UTC
There is a pending PR https://github.com/openshift/origin-aggregated-logging/pull/2023 and related BZ https://bugzilla.redhat.com/show_bug.cgi?id=1900757 that fixes this for cluster admins.

Comment 7 Lukas Vlcek 2020-11-26 15:44:05 UTC
Oops, sorry about comment #6, it is probably not accurate (I hit return too fast).
The fact is that there are some BZs that might be related, such as the one mentioned before or https://bugzilla.redhat.com/show_bug.cgi?id=1900772 (which is the same issue but for 4.7), but I am not really sure this is a duplicate BZ, because if it were, the description could not say "sometimes"; it would have to say "always".

Comment 8 Periklis Tsirakidis 2020-11-27 08:18:52 UTC
@lvlcek 

Please consider that any solution provided needs a backport to 4.6.z and 4.5.z, because there are a couple of customer cases from IBM in [1].

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1896578

Comment 9 Periklis Tsirakidis 2020-11-30 17:24:58 UTC
*** Bug 1896578 has been marked as a duplicate of this bug. ***

Comment 10 Periklis Tsirakidis 2020-11-30 17:25:35 UTC
*** Bug 1902095 has been marked as a duplicate of this bug. ***

Comment 19 Anand Paladugu 2021-01-06 03:57:20 UTC
I have another OCP 4.6 customer seeing this, and it started happening suddenly. The later errors (index read-only) are likely because ES made the indices read-only after running out of space, since the deletes are not working. I am checking on it (an interim unblock sketch follows after the logs below).


Any update on this BZ or workaround for this issue?



$ oc logs elasticsearch-delete-audit-1608591600-mqf8c

deleting indices: audit-003095,audit-003094
{
  "error" : {
    "root_cause" : [
      {
        "type" : "security_exception",
        "reason" : "Unexpected exception indices:admin/delete"
      }
    ],
    "type" : "security_exception",
    "reason" : "Unexpected exception indices:admin/delete"
  },
  "status" : 500
}

$ oc logs elasticsearch-rollover-app-1608591600-7pk4l
{
  "error" : {
    "root_cause" : [
      {
        "type" : "cluster_block_exception",
        "reason" : "blocked by: [FORBIDDEN/12/index read-only / allow delete (api)];"
      }
    ],
    "type" : "cluster_block_exception",
    "reason" : "blocked by: [FORBIDDEN/12/index read-only / allow delete (api)];"
  },
  "status" : 403
}

$ oc logs elasticsearch-rollover-audit-1608591600-llsvq
{
  "error" : {
    "root_cause" : [
      {
        "type" : "cluster_block_exception",
        "reason" : "blocked by: [FORBIDDEN/12/index read-only / allow delete (api)];"
      }
    ],
    "type" : "cluster_block_exception",
    "reason" : "blocked by: [FORBIDDEN/12/index read-only / allow delete (api)];"
  },
  "status" : 403
}

$ oc logs elasticsearch-rollover-infra-1608591600-p5xjn
{
  "error" : {
    "root_cause" : [
      {
        "type" : "cluster_block_exception",
        "reason" : "blocked by: [FORBIDDEN/12/index read-only / allow delete (api)];"
      }
    ],
    "type" : "cluster_block_exception",
    "reason" : "blocked by: [FORBIDDEN/12/index read-only / allow delete (api)];"
  },
  "status" : 403
}
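
A possible interim unblock for the read-only state above (not a fix for this BZ) is to free disk space or grow the PVCs, then clear the cluster block. A minimal sketch, assuming the es_util helper shipped in the Elasticsearch container, the openshift-logging namespace, and a placeholder pod name:

# clear the flood-stage read-only block on all indices (substitute a real ES pod name)
$ oc exec -n openshift-logging -c elasticsearch elasticsearch-cdm-xxxxxxxx-1-xxxxx -- \
    es_util --query="_all/_settings?pretty" -X PUT -d '{"index.blocks.read_only_allow_delete": null}'

This only removes the FORBIDDEN/12 block; the delete/rollover job failures themselves still need the fix tracked in this BZ.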

Comment 21 Anand Paladugu 2021-01-07 16:24:50 UTC
Hi

Any workaround that might work to help customers in the interim?

Thanks

Anand

Comment 22 Lukas Vlcek 2021-01-08 15:51:52 UTC
Update: We are still investigating a fix and a workaround for this BZ.

Further, I want to clarify and follow up on my comment #6:
The PR https://github.com/openshift/origin-aggregated-logging/pull/2023 is not related and does not solve this BZ.

Comment 24 Frederic Giloux 2021-01-15 16:05:39 UTC
Another customer is facing this issue. It happens 3-4 times a week. It would not be a problem for them, since the indices are cleaned up by subsequent runs, if it did not raise the KubeJobFailed alert, which requires human intervention (investigation and clearing). My understanding is that the current combination is unfortunate in this respect:
- The KubeJobFailed alert stays pending for 15 minutes before it fires.
- Jobs run every 15 minutes, so the previous failed job should normally be deleted and the alarm cleared before firing, if the second run is successful.
- However, the latest failed job is kept for debugging, which causes the alert to fire anyway (an interim cleanup sketch follows below).
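
Until the fix lands, an interim cleanup is to delete the failed Job objects by hand so the KubeJobFailed alert can clear. A minimal sketch, assuming the openshift-logging namespace and that the alert is driven by the failed Job object still being present (the job name is taken from the description above; substitute the one that actually failed):

# list index-management jobs and remove the failed one
$ oc get jobs -n openshift-logging
$ oc delete job elasticsearch-delete-app-1603417500 -n openshift-logging

The cronjob schedules a fresh run every 15 minutes, so deleting a failed Job does not lose any retention work.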

Comment 27 Anping Li 2021-01-20 10:19:58 UTC
The index can't be created because there is no cronjob.

Comment 32 Anping Li 2021-01-21 21:26:17 UTC
Verified: the delete and rollover jobs are collapsed into a single job now.

>oc get cronjob
NAME                     SCHEDULE       SUSPEND   ACTIVE   LAST SCHEDULE   AGE
elasticsearch-im-app     */15 * * * *   False     0        31s             24m
elasticsearch-im-audit   */15 * * * *   False     0        31s             24m
elasticsearch-im-infra   */15 * * * *   False     0        31s             24m

>oc get pods
NAME                                            READY   STATUS      RESTARTS   AGE
cluster-logging-operator-579b79679-845z4        1/1     Running     0          63m
elasticsearch-cdm-tsqz59lh-1-599fd87944-s7q9q   2/2     Running     0          23m
elasticsearch-im-app-1611259200-qg2w9           0/1     Completed   0          15m
elasticsearch-im-app-1611260100-ztwmh           0/1     Completed   0          6s
elasticsearch-im-audit-1611259200-tt7m4         0/1     Completed   0          15m
elasticsearch-im-audit-1611260100-frmvt         0/1     Completed   0          6s
elasticsearch-im-infra-1611259200-8sdzq         0/1     Completed   0          15m
elasticsearch-im-infra-1611260100-v5fdm         0/1     Completed   0          6s

>oc logs elasticsearch-im-infra-1611260280-kczbq

deleting indices: infra-000002
Current write index for infra-write: infra-000003
Checking results from _rollover call
Next write index for infra-write: infra-000004
Checking if infra-000004 exists
Checking if infra-000004 is the write index for infra-write
Done!
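
For anyone re-verifying, a run can also be triggered on demand instead of waiting for the next 15-minute schedule. A minimal sketch, assuming the openshift-logging namespace (the job name im-app-manual is arbitrary):

# create a one-off job from the index-management cronjob and follow its log
$ oc create job --from=cronjob/elasticsearch-im-app im-app-manual -n openshift-logging
$ oc logs -f job/im-app-manual -n openshift-logging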

Comment 33 Ronald 2021-01-27 13:40:26 UTC
Hi Guys, 

When will it be backported to version 4.6?

Thx,
Ronald

Comment 34 ewolinet 2021-01-27 22:58:51 UTC
(In reply to Ronald from comment #33)
> Hi Guys, 
> 
> When will it be backported to version 4.6?
> 
> Thx,
> Ronald

https://bugzilla.redhat.com/show_bug.cgi?id=1916910

Comment 35 hgomes 2021-02-03 17:10:47 UTC
I have a customer facing this at OCP 4.5.27

Comment 49 errata-xmlrpc 2021-02-24 11:21:19 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Errata Advisory for Openshift Logging 5.0.0), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:0652

