Description of problem:
Sometimes the elasticsearch-delete-xxx job fails:

$ oc get pod
NAME                                            READY   STATUS      RESTARTS   AGE
cluster-logging-operator-7795955668-zclk4       1/1     Running     0          142m
elasticsearch-cdm-zrspl8qn-1-5799c5b4df-d6c2b   2/2     Running     0          141m
elasticsearch-cdm-zrspl8qn-2-597c79948b-xz8sp   2/2     Running     0          141m
elasticsearch-cdm-zrspl8qn-3-76fbc8b95-nnhmd    2/2     Running     0          141m
elasticsearch-delete-app-1603417500-jnlzv       0/1     Error       0          68m
elasticsearch-delete-app-1603421100-nlqdc       0/1     Completed   0          8m1s
elasticsearch-delete-audit-1603421100-gfcxc     0/1     Completed   0          8m1s
elasticsearch-delete-infra-1603421100-6z9rg     0/1     Completed   0          8m1s
elasticsearch-rollover-app-1603421100-59m8q     0/1     Completed   0          8m
elasticsearch-rollover-audit-1603421100-gvrbz   0/1     Completed   0          8m
elasticsearch-rollover-infra-1603421100-25f2j   0/1     Completed   0          8m

$ oc logs elasticsearch-delete-app-1603417500-jnlzv
{"error":{"root_cause":[{"type":"security_exception","reason":"Unexpected exception indices:admin/aliases/get"}],"type":"security_exception","reason":"Unexpected exception indices:admin/aliases/get"},"status":500}
Error while attemping to determine the active write alias: {"error":{"root_cause":[{"type":"security_exception","reason":"Unexpected exception indices:admin/aliases/get"}],"type":"security_exception","reason":"Unexpected exception indices:admin/aliases/get"},"status":500}

ES logs:

master node:
[2020-10-23T01:45:05,725][INFO ][o.e.c.m.MetaDataIndexTemplateService] [elasticsearch-cdm-zrspl8qn-2] adding template [ocp-gen-audit] for index patterns [audit*]
[2020-10-23T01:45:07,362][INFO ][o.e.c.m.MetaDataCreateIndexService] [elasticsearch-cdm-zrspl8qn-2] [infra-000006] creating index, cause [rollover_index], templates [com.redhat.viaq-openshift-operations.template.json, common.settings.operations.template.json, ocp-gen-infra], shards [3]/[1], mappings [_doc]
[2020-10-23T01:45:07,509][INFO ][o.e.c.m.MetaDataCreateIndexService] [elasticsearch-cdm-zrspl8qn-2] [app-000002] creating index, cause [rollover_index], templates [com.redhat.viaq-openshift-project.template.json, common.settings.project.template.json, ocp-gen-app], shards [3]/[1], mappings [_doc]
[2020-10-23T01:45:07,557][ERROR][c.a.o.s.f.OpenDistroSecurityFilter] [elasticsearch-cdm-zrspl8qn-2] Unexpected exception ElasticsearchSecurityException[_opendistro_security_dls_query does not match (SG 900D)]
org.elasticsearch.ElasticsearchSecurityException: _opendistro_security_dls_query does not match (SG 900D)
    at com.amazon.opendistroforelasticsearch.security.privileges.DlsFlsEvaluator.evaluate(DlsFlsEvaluator.java:133) ~[opendistro_security-0.10.1.0-redhat-1.jar:0.10.1.0-redhat-1]
    at com.amazon.opendistroforelasticsearch.security.privileges.PrivilegesEvaluator.evaluate(PrivilegesEvaluator.java:441) ~[opendistro_security-0.10.1.0-redhat-1.jar:0.10.1.0-redhat-1]
    at com.amazon.opendistroforelasticsearch.security.filter.OpenDistroSecurityFilter.apply0(OpenDistroSecurityFilter.java:253) [opendistro_security-0.10.1.0-redhat-1.jar:0.10.1.0-redhat-1]
    at com.amazon.opendistroforelasticsearch.security.filter.OpenDistroSecurityFilter.apply(OpenDistroSecurityFilter.java:119) [opendistro_security-0.10.1.0-redhat-1.jar:0.10.1.0-redhat-1]
    at org.elasticsearch.action.support.TransportAction$RequestFilterChain.proceed(TransportAction.java:165) [elasticsearch-6.8.1.redhat-6.jar:6.8.1.redhat-6]
    at org.elasticsearch.action.support.TransportAction.execute(TransportAction.java:139) [elasticsearch-6.8.1.redhat-6.jar:6.8.1.redhat-6]
    at org.elasticsearch.action.support.HandledTransportAction$TransportHandler.messageReceived(HandledTransportAction.java:89) [elasticsearch-6.8.1.redhat-6.jar:6.8.1.redhat-6]
    at org.elasticsearch.action.support.HandledTransportAction$TransportHandler.messageReceived(HandledTransportAction.java:80) [elasticsearch-6.8.1.redhat-6.jar:6.8.1.redhat-6]
    at com.amazon.opendistroforelasticsearch.security.ssl.transport.OpenDistroSecuritySSLRequestHandler.messageReceivedDecorate(OpenDistroSecuritySSLRequestHandler.java:194) [opendistro_security_ssl-0.10.0.3-redhat-1.jar:0.10.0.3-redhat-1]
    at com.amazon.opendistroforelasticsearch.security.transport.OpenDistroSecurityRequestHandler.messageReceivedDecorate(OpenDistroSecurityRequestHandler.java:284) [opendistro_security-0.10.1.0-redhat-1.jar:0.10.1.0-redhat-1]
    at com.amazon.opendistroforelasticsearch.security.ssl.transport.OpenDistroSecuritySSLRequestHandler.messageReceived(OpenDistroSecuritySSLRequestHandler.java:166) [opendistro_security_ssl-0.10.0.3-redhat-1.jar:0.10.0.3-redhat-1]
    at com.amazon.opendistroforelasticsearch.security.OpenDistroSecurityPlugin$7$1.messageReceived(OpenDistroSecurityPlugin.java:646) [opendistro_security-0.10.1.0-redhat-1.jar:0.10.1.0-redhat-1]
    at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:66) [elasticsearch-6.8.1.redhat-6.jar:6.8.1.redhat-6]
    at org.elasticsearch.transport.TcpTransport$RequestHandler.doRun(TcpTransport.java:1087) [elasticsearch-6.8.1.redhat-6.jar:6.8.1.redhat-6]
    at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-6.8.1.redhat-6.jar:6.8.1.redhat-6]
    at org.elasticsearch.common.util.concurrent.EsExecutors$DirectExecutorService.execute(EsExecutors.java:193) [elasticsearch-6.8.1.redhat-6.jar:6.8.1.redhat-6]
    at org.elasticsearch.transport.TcpTransport.handleRequest(TcpTransport.java:1046) [elasticsearch-6.8.1.redhat-6.jar:6.8.1.redhat-6]
    at org.elasticsearch.transport.TcpTransport.messageReceived(TcpTransport.java:932) [elasticsearch-6.8.1.redhat-6.jar:6.8.1.redhat-6]
    at org.elasticsearch.transport.TcpTransport.inboundMessage(TcpTransport.java:763) [elasticsearch-6.8.1.redhat-6.jar:6.8.1.redhat-6]
    at org.elasticsearch.transport.netty4.Netty4MessageChannelHandler.channelRead(Netty4MessageChannelHandler.java:53) [transport-netty4-client-6.8.1.redhat-5.jar:6.8.1.redhat-5]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
    at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:323) [netty-codec-4.1.32.Final.jar:4.1.32.Final]
    at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:297) [netty-codec-4.1.32.Final.jar:4.1.32.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
    at io.netty.handler.logging.LoggingHandler.channelRead(LoggingHandler.java:241) [netty-handler-4.1.32.Final.jar:4.1.32.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
    at io.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1436) [netty-handler-4.1.32.Final.jar:4.1.32.Final]
    at io.netty.handler.ssl.SslHandler.decodeJdkCompatible(SslHandler.java:1203) [netty-handler-4.1.32.Final.jar:4.1.32.Final]
    at io.netty.handler.ssl.SslHandler.decode(SslHandler.java:1247) [netty-handler-4.1.32.Final.jar:4.1.32.Final]
    at io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:502) [netty-codec-4.1.32.Final.jar:4.1.32.Final]
    at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:441) [netty-codec-4.1.32.Final.jar:4.1.32.Final]
    at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:278) [netty-codec-4.1.32.Final.jar:4.1.32.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
    at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1434) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
    at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:965) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
    at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:163) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
    at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:656) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
    at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:556) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
    at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:510) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:470) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
    at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:909) [netty-common-4.1.32.Final.jar:4.1.32.Final]
    at java.lang.Thread.run(Thread.java:834) [?:?]

[2020-10-23T01:45:07,777][INFO ][o.e.c.m.MetaDataIndexTemplateService] [elasticsearch-cdm-zrspl8qn-2] adding template [ocp-gen-app] for index patterns [app*]
[2020-10-23T01:45:07,813][INFO ][o.e.c.m.MetaDataMappingService] [elasticsearch-cdm-zrspl8qn-2] [infra-000006/jzRO2JwGROSFd8NGcZAT0A] update_mapping [_doc]
[2020-10-23T01:45:07,817][INFO ][o.e.c.m.MetaDataMappingService] [elasticsearch-cdm-zrspl8qn-2] [infra-000006/jzRO2JwGROSFd8NGcZAT0A] update_mapping [_doc]
[2020-10-23T01:45:07,845][INFO ][o.e.c.m.MetaDataIndexTemplateService] [elasticsearch-cdm-zrspl8qn-2] adding template [ocp-gen-infra] for index patterns [infra*]
[2020-10-23T01:45:07,898][INFO ][o.e.c.m.MetaDataIndexTemplateService] [elasticsearch-cdm-zrspl8qn-2] adding template [ocp-gen-audit] for index patterns [audit*]
[2020-10-23T01:45:08,271][INFO ][o.e.c.m.MetaDataMappingService] [elasticsearch-cdm-zrspl8qn-2] [infra-000006/jzRO2JwGROSFd8NGcZAT0A] update_mapping [_doc]
[2020-10-23T01:45:08,391][INFO ][o.e.c.m.MetaDataMappingService] [elasticsearch-cdm-zrspl8qn-2] [infra-000006/jzRO2JwGROSFd8NGcZAT0A] update_mapping [_doc]
[2020-10-23T01:45:08,455][INFO ][o.e.c.m.MetaDataMappingService] [elasticsearch-cdm-zrspl8qn-2] [infra-000006/jzRO2JwGROSFd8NGcZAT0A] update_mapping [_doc]
[2020-10-23T01:45:08,603][INFO ][o.e.c.m.MetaDataMappingService] [elasticsearch-cdm-zrspl8qn-2] [app-000002/deMFvGAJSB6VaW3BwyCvew] update_mapping [_doc]
[2020-10-23T01:45:08,668][INFO ][o.e.c.r.a.AllocationService] [elasticsearch-cdm-zrspl8qn-2] Cluster health status changed from [YELLOW] to [GREEN] (reason: [shards started [[infra-000006][2]] ...]).

non-master node:
[2020-10-23T00:33:20,948][INFO ][c.a.o.s.c.IndexBaseConfigurationRepository] [elasticsearch-cdm-zrspl8qn-3] Node 'elasticsearch-cdm-zrspl8qn-3' initialized
[2020-10-23T01:45:07,575][WARN ][r.suppressed ] [elasticsearch-cdm-zrspl8qn-3] path: /app-*/_alias/app-write, params: {name=app-write, index=app-*}
org.elasticsearch.transport.RemoteTransportException: [elasticsearch-cdm-zrspl8qn-2][10.131.0.20:9300][indices:admin/aliases/get]
Caused by: org.elasticsearch.ElasticsearchSecurityException: Unexpected exception indices:admin/aliases/get
    at com.amazon.opendistroforelasticsearch.security.filter.OpenDistroSecurityFilter.apply0(OpenDistroSecurityFilter.java:274) ~[?:?]
    at com.amazon.opendistroforelasticsearch.security.filter.OpenDistroSecurityFilter.apply(OpenDistroSecurityFilter.java:119) ~[?:?]
    at org.elasticsearch.action.support.TransportAction$RequestFilterChain.proceed(TransportAction.java:165) ~[elasticsearch-6.8.1.redhat-6.jar:6.8.1.redhat-6]
    at org.elasticsearch.action.support.TransportAction.execute(TransportAction.java:139) ~[elasticsearch-6.8.1.redhat-6.jar:6.8.1.redhat-6]
    at org.elasticsearch.action.support.HandledTransportAction$TransportHandler.messageReceived(HandledTransportAction.java:89) ~[elasticsearch-6.8.1.redhat-6.jar:6.8.1.redhat-6]
    at org.elasticsearch.action.support.HandledTransportAction$TransportHandler.messageReceived(HandledTransportAction.java:80) ~[elasticsearch-6.8.1.redhat-6.jar:6.8.1.redhat-6]
    at com.amazon.opendistroforelasticsearch.security.ssl.transport.OpenDistroSecuritySSLRequestHandler.messageReceivedDecorate(OpenDistroSecuritySSLRequestHandler.java:194) ~[?:?]
    at com.amazon.opendistroforelasticsearch.security.transport.OpenDistroSecurityRequestHandler.messageReceivedDecorate(OpenDistroSecurityRequestHandler.java:284) ~[?:?]
    at com.amazon.opendistroforelasticsearch.security.ssl.transport.OpenDistroSecuritySSLRequestHandler.messageReceived(OpenDistroSecuritySSLRequestHandler.java:166) ~[?:?]
    at com.amazon.opendistroforelasticsearch.security.OpenDistroSecurityPlugin$7$1.messageReceived(OpenDistroSecurityPlugin.java:646) ~[?:?]
    at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:66) ~[elasticsearch-6.8.1.redhat-6.jar:6.8.1.redhat-6]
    at org.elasticsearch.transport.TcpTransport$RequestHandler.doRun(TcpTransport.java:1087) ~[elasticsearch-6.8.1.redhat-6.jar:6.8.1.redhat-6]
    at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[elasticsearch-6.8.1.redhat-6.jar:6.8.1.redhat-6]
    at org.elasticsearch.common.util.concurrent.EsExecutors$DirectExecutorService.execute(EsExecutors.java:193) ~[elasticsearch-6.8.1.redhat-6.jar:6.8.1.redhat-6]
    at org.elasticsearch.transport.TcpTransport.handleRequest(TcpTransport.java:1046) [elasticsearch-6.8.1.redhat-6.jar:6.8.1.redhat-6]
    at org.elasticsearch.transport.TcpTransport.messageReceived(TcpTransport.java:932) [elasticsearch-6.8.1.redhat-6.jar:6.8.1.redhat-6]
    at org.elasticsearch.transport.TcpTransport.inboundMessage(TcpTransport.java:763) [elasticsearch-6.8.1.redhat-6.jar:6.8.1.redhat-6]
    at org.elasticsearch.transport.netty4.Netty4MessageChannelHandler.channelRead(Netty4MessageChannelHandler.java:53) [transport-netty4-client-6.8.1.redhat-5.jar:6.8.1.redhat-5]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
    at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:323) [netty-codec-4.1.32.Final.jar:4.1.32.Final]
    at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:297) [netty-codec-4.1.32.Final.jar:4.1.32.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
    at io.netty.handler.logging.LoggingHandler.channelRead(LoggingHandler.java:241) [netty-handler-4.1.32.Final.jar:4.1.32.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
    at io.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1436) [netty-handler-4.1.32.Final.jar:4.1.32.Final]
    at io.netty.handler.ssl.SslHandler.decodeJdkCompatible(SslHandler.java:1203) [netty-handler-4.1.32.Final.jar:4.1.32.Final]
    at io.netty.handler.ssl.SslHandler.decode(SslHandler.java:1247) [netty-handler-4.1.32.Final.jar:4.1.32.Final]
    at io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:502) [netty-codec-4.1.32.Final.jar:4.1.32.Final]
    at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:441) [netty-codec-4.1.32.Final.jar:4.1.32.Final]
    at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:278) [netty-codec-4.1.32.Final.jar:4.1.32.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
    at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1434) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
    at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:965) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
    at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:163) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
    at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:656) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
    at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:556) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
    at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:510) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:470) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
    at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:909) [netty-common-4.1.32.Final.jar:4.1.32.Final]
    at java.lang.Thread.run(Thread.java:834) [?:?]

log store configuration:
  logStore:
    elasticsearch:
      nodeCount: 3
      redundancyPolicy: SingleRedundancy
      resources:
        requests:
          memory: 2Gi
      storage:
        size: 20Gi
        storageClassName: gp2
    retentionPolicy:
      application:
        maxAge: 1d
      audit:
        maxAge: 2w
      infra:
        maxAge: 3h
    type: elasticsearch

Version-Release number of selected component (if applicable):
elasticsearch-operator.4.6.0-202010221124.p0

How reproducible:
Not always. I hit this issue several days ago as well: https://bugzilla.redhat.com/show_bug.cgi?id=1866019#c35

Steps to Reproduce:
1. Deploy logging.
2. Keep the logging stack running for several hours.
3. Check the pods.

Actual results:

Expected results:

Additional info:
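For anyone trying to narrow this down, here is a minimal sketch of how the failing alias lookup can be reproduced by hand. It assumes the es_util helper that ships in the elasticsearch container, the default openshift-logging namespace, and the component=elasticsearch pod label; adjust these to your cluster.

$ ES_POD=$(oc -n openshift-logging get pods -l component=elasticsearch -o jsonpath='{.items[0].metadata.name}')
# Same request the delete cronjob makes to determine the active write alias.
$ oc -n openshift-logging exec -c elasticsearch "$ES_POD" -- \
    es_util --query="app-*/_alias/app-write?pretty"

If this request intermittently returns the security_exception above but succeeds on an immediate retry, the failure is coming from the OpenDistro security filter path rather than from the cronjob script itself.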
Is this the oldest cron job? Is it possible this is a transient issue from when the cluster is starting and the permissions have not been seeded yet?
(In reply to Jeff Cantrill from comment #1)
> Is this the oldest cron job?

Looking at the `AGE` field, it seems not.

> Is it possible this is a transient issue from
> when the cluster is starting and the permissions have not been seeded yet?

I don't think so. I checked the logs in the ES cluster; the cluster had already started well before the issue happened.
*** Bug 1899431 has been marked as a duplicate of this bug. ***
*** Bug 1900873 has been marked as a duplicate of this bug. ***
There is a pending PR, https://github.com/openshift/origin-aggregated-logging/pull/2023, and a related BZ, https://bugzilla.redhat.com/show_bug.cgi?id=1900757, that fix this for cluster admins.
Oops, sorry about comment #6; it is probably not accurate (I hit return too fast). The fact is that there are some BZs that might be related, like the one mentioned before or https://bugzilla.redhat.com/show_bug.cgi?id=1900772 (which is the same issue but for 4.7), but I am not really sure this is a duplicate BZ: if it were, the description would have to say "always" rather than "sometimes".
@lvlcek Please consider that any solution provided needs a backport to 4.6.z and 4.5.z, because there are a couple of customer cases in [1] by IBM.

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1896578
*** Bug 1896578 has been marked as a duplicate of this bug. ***
*** Bug 1902095 has been marked as a duplicate of this bug. ***
I have another OCP 4.6 customer seeing this, and it started happening suddenly. The later errors (index read-only) are most likely because ES made the indices read-only when disk space ran out, since the deletes are not working. I am checking on it. Any update on this BZ, or a workaround for this issue?

$ oc logs elasticsearch-delete-audit-1608591600-mqf8c
deleting indices: audit-003095,audit-003094
{
  "error" : {
    "root_cause" : [
      {
        "type" : "security_exception",
        "reason" : "Unexpected exception indices:admin/delete"
      }
    ],
    "type" : "security_exception",
    "reason" : "Unexpected exception indices:admin/delete"
  },
  "status" : 500
}

$ oc logs elasticsearch-rollover-app-1608591600-7pk4l
{
  "error" : {
    "root_cause" : [
      {
        "type" : "cluster_block_exception",
        "reason" : "blocked by: [FORBIDDEN/12/index read-only / allow delete (api)];"
      }
    ],
    "type" : "cluster_block_exception",
    "reason" : "blocked by: [FORBIDDEN/12/index read-only / allow delete (api)];"
  },
  "status" : 403
}

$ oc logs elasticsearch-rollover-audit-1608591600-llsvq
{
  "error" : {
    "root_cause" : [
      {
        "type" : "cluster_block_exception",
        "reason" : "blocked by: [FORBIDDEN/12/index read-only / allow delete (api)];"
      }
    ],
    "type" : "cluster_block_exception",
    "reason" : "blocked by: [FORBIDDEN/12/index read-only / allow delete (api)];"
  },
  "status" : 403
}

$ oc logs elasticsearch-rollover-infra-1608591600-p5xjn
{
  "error" : {
    "root_cause" : [
      {
        "type" : "cluster_block_exception",
        "reason" : "blocked by: [FORBIDDEN/12/index read-only / allow delete (api)];"
      }
    ],
    "type" : "cluster_block_exception",
    "reason" : "blocked by: [FORBIDDEN/12/index read-only / allow delete (api)];"
  },
  "status" : 403
}
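Not an official workaround, but for the secondary FORBIDDEN/12 errors specifically: once disk space has been freed, the read-only block normally has to be cleared by hand through the standard Elasticsearch index-settings API. A sketch, again assuming the es_util helper in the elasticsearch container and the openshift-logging namespace (the _all index pattern and pod selector are illustrative, not prescriptive):

$ ES_POD=$(oc -n openshift-logging get pods -l component=elasticsearch -o jsonpath='{.items[0].metadata.name}')
# Remove the read_only_allow_delete block from all indices after freeing space.
$ oc -n openshift-logging exec -c elasticsearch "$ES_POD" -- \
    es_util --query="_all/_settings?pretty" -X PUT \
    -d '{"index.blocks.read_only_allow_delete": null}'

Note this only clears the knock-on read-only state; it does not address the original security_exception.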
Hi,

Is there any workaround that might help customers in the interim?

Thanks,
Anand
Update: We are still investigating a fix and a workaround for this BZ.

Further, I want to clarify and follow up on my comment #6: https://github.com/openshift/origin-aggregated-logging/pull/2023 is not related and does not solve this BZ.
Another customer is facing this issue; it happens 3-4 times a week. This would not be a problem for them, since the indices are cleaned up by subsequent runs, if the KubeJobFailed alert were not raised, because that alert requires human intervention (investigation and clearing). My understanding is that the current combination is unfortunate in this respect (see the sketch after this list):

- The KubeJobFailed alert is pending for 15 minutes before it fires.
- The jobs run every 15 minutes, so previous jobs should get deleted, and hence the alert cleared before it fires, if the second run is successful.
- The latest failed job is kept for debugging, which causes the alert to fire anyway.
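A hedged sketch of the interim manual clean-up, assuming the default openshift-logging namespace; the job name below is an example built from the pod names earlier in this report, not a literal value to reuse:

$ oc -n openshift-logging get jobs | grep elasticsearch-delete
# Once a later run has completed successfully, remove the stale failed job so
# KubeJobFailed stops firing; the next schedule creates a fresh job object.
$ oc -n openshift-logging delete job elasticsearch-delete-app-1603417500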
The index can't be created because there is no cron job for it.
Verified: the delete and rollover jobs are collapsed into a single job now.

> oc get cronjob
NAME                     SCHEDULE       SUSPEND   ACTIVE   LAST SCHEDULE   AGE
elasticsearch-im-app     */15 * * * *   False     0        31s             24m
elasticsearch-im-audit   */15 * * * *   False     0        31s             24m
elasticsearch-im-infra   */15 * * * *   False     0        31s             24m

> oc get pods
NAME                                            READY   STATUS      RESTARTS   AGE
cluster-logging-operator-579b79679-845z4        1/1     Running     0          63m
elasticsearch-cdm-tsqz59lh-1-599fd87944-s7q9q   2/2     Running     0          23m
elasticsearch-im-app-1611259200-qg2w9           0/1     Completed   0          15m
elasticsearch-im-app-1611260100-ztwmh           0/1     Completed   0          6s
elasticsearch-im-audit-1611259200-tt7m4         0/1     Completed   0          15m
elasticsearch-im-audit-1611260100-frmvt         0/1     Completed   0          6s
elasticsearch-im-infra-1611259200-8sdzq         0/1     Completed   0          15m
elasticsearch-im-infra-1611260100-v5fdm         0/1     Completed   0          6s

> oc logs elasticsearch-im-infra-1611260280-kczbq
deleting indices: infra-000002
Current write index for infra-write: infra-000003
Checking results from _rollover call
Next write index for infra-write: infra-000004
Checking if infra-000004 exists
Checking if infra-000004 is the write index for infra-write
Done!
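As an additional verification step, a single index-management run can be triggered on demand from the new combined cronjob; the job name and namespace below are arbitrary examples:

$ oc -n openshift-logging create job --from=cronjob/elasticsearch-im-app im-app-manual-1
# Inspect its log the same way as above.
$ oc -n openshift-logging logs job/im-app-manual-1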
Hi guys,

When will this be backported to version 4.6?

Thanks,
Ronald
(In reply to Ronald from comment #33)
> Hi Guys,
>
> When will it be backported to version 4.6?
>
> Thx,
> Ronald

https://bugzilla.redhat.com/show_bug.cgi?id=1916910
I have a customer facing this on OCP 4.5.27.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Errata Advisory for Openshift Logging 5.0.0), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:0652