Description of problem:

On a newly installed cluster-logging instance on 4.5 that uses the log retention/rollover setup, the -rollover- and -delete- pods enter an Error state.

Version-Release number of selected component (if applicable):
RHOCP 4.5.2
EFK 4.5.0-202007012112.p0

How reproducible:
Unconfirmed

Actual results:

+ the log of "elasticsearch-delete-audit-1234567-xxxxx"

Traceback (most recent call last):
  File "<string>", line 4, in <module>
TypeError: 'int' object has no attribute '__getitem__'

+ the log of "elasticsearch-delete-infra-1234567-xxxxx"

deleting indices: infra-xxxxxx
{
  "error" : {
    "root_cause" : [
      {
        "type" : "security_exception",
        "reason" : "Unexpected exception indices:admin/delete"
      }
    ],
    "type" : "security_exception",
    "reason" : "Unexpected exception indices:admin/delete"
  },
  "status" : 500
}

+ the log of "elasticsearch-rollover-infra-1234567-xxxxx"

{
  "error" : {
    "root_cause" : [
      {
        "type" : "security_exception",
        "reason" : "Unexpected exception indices:admin/rollover"
      }
    ],
    "type" : "security_exception",
    "reason" : "Unexpected exception indices:admin/rollover"
  },
  "status" : 500
}
I have the same problem in my test environment. Occasionally a delete pod fails with the same issue. I can provide access if that would help.
Do the pods continually fail, or is it only while Elasticsearch is starting, which would be expected? We leave at least one failed pod for review, so there may always be an error pod in the list.
*** Bug 1866963 has been marked as a duplicate of this bug. ***
I also have this problem. I see pods fail intermittently throughout the day. Currently I have two delete pod failures, about 5 hours apart, and a rollover pod failure 15 minutes after the 2nd delete pod failure.

The cronjobs for these pods have them running at what looks to be 15 minute intervals (*/15 * * * *), which means the failure rate isn't particularly high, but any of these failures causes concern with the customer.
(In reply to tcort from comment #10)
> I also have this problem. I see pods fail intermittently throughout the
> day. Currently I have two delete pod failures, about 5 hours apart, and a
> Rollover pod failure 15 minutes after the 2nd delete pod failure.
>
> The cronjobs for these pods have them running at what looks to be 15 minute
> intervals (*/15 * * * *), which means the failure rate isn't particularly
> high but any of these failures cause concerns with the customer.

I have typically only seen this issue when first deploying a logging cluster, where the cronjobs are created before the Elasticsearch pods are accepting traffic. Please attach any logs that may be useful to help resolve the issue.

It's worth trying to resolve why these failures exist, but if you have more successful jobs than failed ones and the data is being curated (e.g. index rollover and deletion), the priority of this issue is low.
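For reference, a minimal sketch of how to check the curation cronjob schedules and collect the log of a failed job; the openshift-logging namespace is assumed and the grep pattern / pod name placeholder are illustrative:

$ oc -n openshift-logging get cronjobs
$ oc -n openshift-logging get pods | grep -E 'elasticsearch-(delete|rollover)'
$ oc -n openshift-logging logs <failed-job-pod>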
Please attach the artifacts from [1] along with the logs for the failed cronjobs.

[1] https://github.com/openshift/origin-aggregated-logging/blob/release-4.5/hack/logging-dump.sh
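A minimal sketch of gathering those artifacts, assuming the script is run from a host already logged in to the cluster with oc; the exact invocation may differ, so check the script's usage notes before running it:

$ git clone -b release-4.5 https://github.com/openshift/origin-aggregated-logging
$ cd origin-aggregated-logging
$ ./hack/logging-dump.sh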
This is likely tangential, but looking at the artifacts, it looks like you could benefit from the latest images [1], which were released 10-Aug. I am unable to reproduce this locally, but I don't have the same volume of logs. I'll investigate further to see if I can alter the script to provide more info.

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1860135
One way to trigger the __getitem__ issue:

{"error":{"root_cause":[{"type":"index_not_found_exception","reason":"no such index","index_uuid":"_na_","resource.type":"index_or_alias","resource.id":"test","index":"test"}
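A minimal sketch of reproducing that response by hand, querying a non-existent index through es_util the same way the cron scripts do; $pod is any Elasticsearch pod and the index name "test" is illustrative:

$ oc -n openshift-logging exec -c elasticsearch $pod -- es_util --query=test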
Added 566 indices and came across:

{"error":{"root_cause":[{"type":"too_long_frame_exception","reason":"An HTTP line is larger than 4096 bytes."}],"type":"too_long_frame_exception","reason":"An HTTP line is larger than 4096 bytes."},"status":400
(In reply to Hideshi Fukumoto from comment #15)
> Hi,
> Did you get any news/information from the attached logging dump?
> My customer strongly needs a fix or workaround to mitigate this issue,
> because he plans to start the cluster service on September 1st.
> So he has to complete his testing by August 20th at the latest.

I would disagree that this is a high-severity issue, but there is something here to fix. We need to change how we execute the deletion of indices, even though this looks like it may be related to an expired token.

Working around the delete can be accomplished by running the following for older indices:

  oc exec -c elasticsearch $pod -- es_util --query=<index_name> -XDELETE

You may need to reference [1] to determine which indices, but this would give you an idea of the ones that would be eligible to be deleted. Also, evaluating the response code more robustly may allow us to provide better log information.

[1] https://github.com/openshift/elasticsearch-operator/blob/release-4.5/pkg/indexmanagement/scripts.go#L40-L57
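A minimal sketch of that workaround applied to several indices; the index names below are illustrative, and which indices are actually eligible must still be determined per [1]:

pod=<any elasticsearch pod>
# delete each eligible index via the cluster's es_util helper, as above
for index in infra-000001 infra-000002; do
  oc exec -c elasticsearch $pod -- es_util --query=$index -XDELETE
done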
Testing is blocked by https://bugzilla.redhat.com/show_bug.cgi?id=1869117.
Verified with elasticsearch-operator.4.6.0-202008202200.p0
Hit the issue again when testing with elasticsearch-operator.4.6.0-202008291540.p0:

$ oc get pod
NAME                                            READY   STATUS      RESTARTS   AGE
cluster-logging-operator-6f94cf9f9-tgm9f        1/1     Running     0          7h52m
elasticsearch-cdm-b0g2npxl-1-78b8769956-lzf9x   2/2     Running     0          66m
elasticsearch-cdm-b0g2npxl-2-d9fcbd799-l5jj6    2/2     Running     0          66m
elasticsearch-cdm-b0g2npxl-3-8578bd869d-lhwrc   2/2     Running     0          66m
elasticsearch-delete-app-1598861700-mx89d       0/1     Completed   0          18m
elasticsearch-delete-app-1598862600-phr9k       0/1     Error       0          3m19s
elasticsearch-delete-audit-1598862600-9plcp     0/1     Completed   0          3m19s
elasticsearch-delete-infra-1598862600-ptd5v     0/1     Completed   0          3m19s
elasticsearch-rollover-app-1598862600-xctn2     0/1     Completed   0          3m19s
elasticsearch-rollover-audit-1598862600-rczfr   0/1     Completed   0          3m18s
elasticsearch-rollover-infra-1598862600-55fnn   0/1     Completed   0          3m18s
fluentd-76c7q                                   1/1     Running     0          66m
fluentd-7k7cj                                   1/1     Running     0          66m
fluentd-brj6w                                   1/1     Running     0          66m
fluentd-m9vcb                                   1/1     Running     0          66m
fluentd-vj72r                                   1/1     Running     0          66m
fluentd-xtzxt                                   1/1     Running     0          66m
kibana-d85d4bb76-hf5hq                          2/2     Running     0          66m

$ oc logs elasticsearch-delete-app-1598862600-phr9k
Traceback (most recent call last):
  File "<string>", line 4, in <module>
TypeError: 'int' object has no attribute '__getitem__'

No logs found in elasticsearch-rollover-app-1598862600-xctn2.

The ES cluster is Green, and has enough disk space. There are lots of logs in the EO:

{"level":"info","ts":1598862589.0276558,"logger":"elasticsearch-operator","caller":"k8shandler/reconciler.go:80","msg":"reconciling index management","mapping":"app"}
{"level":"info","ts":1598862589.0835047,"logger":"elasticsearch-operator","caller":"k8shandler/reconciler.go:80","msg":"reconciling index management","mapping":"infra"}
{"level":"info","ts":1598862589.13504,"logger":"elasticsearch-operator","caller":"k8shandler/reconciler.go:80","msg":"reconciling index management","mapping":"audit"}
{"level":"info","ts":1598862619.0736885,"logger":"elasticsearch-operator","caller":"k8shandler/reconciler.go:80","msg":"reconciling index management","mapping":"app"}
{"level":"info","ts":1598862619.122707,"logger":"elasticsearch-operator","caller":"k8shandler/reconciler.go:80","msg":"reconciling index management","mapping":"infra"}
{"level":"info","ts":1598862619.16467,"logger":"elasticsearch-operator","caller":"k8shandler/reconciler.go:80","msg":"reconciling index management","mapping":"audit"}
{"level":"info","ts":1598862620.095908,"logger":"elasticsearch-operator","caller":"k8shandler/reconciler.go:80","msg":"reconciling index management","mapping":"app"}
{"level":"info","ts":1598862620.1660204,"logger":"elasticsearch-operator","caller":"k8shandler/reconciler.go:80","msg":"reconciling index management","mapping":"infra"}
{"level":"info","ts":1598862620.226318,"logger":"elasticsearch-operator","caller":"k8shandler/reconciler.go:80","msg":"reconciling index management","mapping":"audit"}
Moving to UpcomingSprint while awaiting PRs to merge, etc.
Hit the issue again:

$ oc get pod
NAME                                            READY   STATUS      RESTARTS   AGE
cluster-logging-operator-d487668d9-9x9nj        1/1     Running     0          74m
elasticsearch-cdm-r8jw2yat-1-76f5c5b5b8-lfcr6   2/2     Running     0          73m
elasticsearch-cdm-r8jw2yat-2-867ff46bd7-q6r2r   2/2     Running     0          73m
elasticsearch-cdm-r8jw2yat-3-78bb547c58-2l975   2/2     Running     0          73m
elasticsearch-delete-app-1600655400-mcnlr       0/1     Completed   0          23m
elasticsearch-delete-app-1600656300-2tjdh       0/1     Error       0          8m22s
elasticsearch-delete-audit-1600656300-7f595     0/1     Completed   0          8m22s
elasticsearch-delete-infra-1600656300-mst6c     0/1     Completed   0          8m22s
elasticsearch-rollover-app-1600656300-smzzj     0/1     Completed   0          8m22s
elasticsearch-rollover-audit-1600656300-79bxd   0/1     Completed   0          8m22s
elasticsearch-rollover-infra-1600656300-h478s   0/1     Completed   0          8m22s
fluentd-48g94                                   1/1     Running     0          73m
fluentd-6bgzc                                   1/1     Running     0          73m
fluentd-h528z                                   1/1     Running     0          73m
fluentd-llc65                                   1/1     Running     0          73m
fluentd-mwng7                                   1/1     Running     0          73m
fluentd-wcfhn                                   1/1     Running     0          73m
kibana-66688f69db-wk47w                         2/2     Running     0          72m

$ oc logs elasticsearch-delete-app-1600656300-2tjdh
{"error":{"root_cause":[{"type":"security_exception","reason":"Unexpected exception indices:admin/aliases/get"}],"type":"security_exception","reason":"Unexpected exception indices:admin/aliases/get"},"status":500}
Error while attemping to determine the active write alias: {"error":{"root_cause":[{"type":"security_exception","reason":"Unexpected exception indices:admin/aliases/get"}],"type":"security_exception","reason":"Unexpected exception indices:admin/aliases/get"},"status":500}

$ oc get csv
NAME                                           DISPLAY                  VERSION                 REPLACES   PHASE
clusterlogging.4.6.0-202009192030.p0           Cluster Logging          4.6.0-202009192030.p0              Succeeded
elasticsearch-operator.4.6.0-202009192030.p0   Elasticsearch Operator   4.6.0-202009192030.p0              Succeeded

Found some errors in the ES nodes:

elasticsearch-cdm-r8jw2yat-2-867ff46bd7-q6r2r (the master node):
[2020-09-21T02:45:04,100][INFO ][o.e.c.m.MetaDataCreateIndexService] [elasticsearch-cdm-r8jw2yat-2] [infra-000005] creating index, cause [rollover_index], templates [com.redhat.viaq-openshift-operations.template.json, common.settings.operations.template.json, ocp-gen-infra], shards [3]/[1], mappings [_doc]
[2020-09-21T02:45:04,342][INFO ][o.e.c.m.MetaDataMappingService] [elasticsearch-cdm-r8jw2yat-2] [infra-000005/0uS9S_4kSxuGBcrUrmJktQ] update_mapping [_doc]
[2020-09-21T02:45:04,506][INFO ][o.e.c.m.MetaDataMappingService] [elasticsearch-cdm-r8jw2yat-2] [infra-000005/0uS9S_4kSxuGBcrUrmJktQ] update_mapping [_doc]
[2020-09-21T02:45:04,509][INFO ][o.e.c.m.MetaDataMappingService] [elasticsearch-cdm-r8jw2yat-2] [infra-000005/0uS9S_4kSxuGBcrUrmJktQ] update_mapping [_doc]
[2020-09-21T02:45:04,512][INFO ][o.e.c.m.MetaDataMappingService] [elasticsearch-cdm-r8jw2yat-2] [infra-000005/0uS9S_4kSxuGBcrUrmJktQ] update_mapping [_doc]
[2020-09-21T02:45:04,580][INFO ][o.e.c.m.MetaDataMappingService] [elasticsearch-cdm-r8jw2yat-2] [infra-000005/0uS9S_4kSxuGBcrUrmJktQ] update_mapping [_doc]
[2020-09-21T02:45:04,618][INFO ][o.e.c.m.MetaDataMappingService] [elasticsearch-cdm-r8jw2yat-2] [infra-000005/0uS9S_4kSxuGBcrUrmJktQ] update_mapping [_doc]
[2020-09-21T02:45:04,651][INFO ][o.e.c.m.MetaDataMappingService] [elasticsearch-cdm-r8jw2yat-2] [infra-000005/0uS9S_4kSxuGBcrUrmJktQ] update_mapping [_doc]
[2020-09-21T02:45:04,698][INFO ][o.e.c.m.MetaDataMappingService] [elasticsearch-cdm-r8jw2yat-2] [infra-000005/0uS9S_4kSxuGBcrUrmJktQ] update_mapping [_doc]
[2020-09-21T02:45:04,784][INFO ][o.e.c.m.MetaDataCreateIndexService] [elasticsearch-cdm-r8jw2yat-2] [app-000002]
creating index, cause [rollover_index], templates [com.redhat.viaq-openshift-project.template.json, common.settings.project.template.json, ocp-gen-app], shards [3]/[1], mappings [_doc] [2020-09-21T02:45:04,843][ERROR][c.a.o.s.f.OpenDistroSecurityFilter] [elasticsearch-cdm-r8jw2yat-2] Unexpected exception ElasticsearchSecurityException[_opendistro_security_dls_query does not match (SG 900D)] org.elasticsearch.ElasticsearchSecurityException: _opendistro_security_dls_query does not match (SG 900D) at com.amazon.opendistroforelasticsearch.security.privileges.DlsFlsEvaluator.evaluate(DlsFlsEvaluator.java:133) ~[opendistro_security-0.10.1.0-redhat-1.jar:0.10.1.0-redhat-1] at com.amazon.opendistroforelasticsearch.security.privileges.PrivilegesEvaluator.evaluate(PrivilegesEvaluator.java:441) ~[opendistro_security-0.10.1.0-redhat-1.jar:0.10.1.0-redhat-1] at com.amazon.opendistroforelasticsearch.security.filter.OpenDistroSecurityFilter.apply0(OpenDistroSecurityFilter.java:253) [opendistro_security-0.10.1.0-redhat-1.jar:0.10.1.0-redhat-1] at com.amazon.opendistroforelasticsearch.security.filter.OpenDistroSecurityFilter.apply(OpenDistroSecurityFilter.java:119) [opendistro_security-0.10.1.0-redhat-1.jar:0.10.1.0-redhat-1] at org.elasticsearch.action.support.TransportAction$RequestFilterChain.proceed(TransportAction.java:165) [elasticsearch-6.8.1.redhat-6.jar:6.8.1.redhat-6] at org.elasticsearch.action.support.TransportAction.execute(TransportAction.java:139) [elasticsearch-6.8.1.redhat-6.jar:6.8.1.redhat-6] at org.elasticsearch.action.support.HandledTransportAction$TransportHandler.messageReceived(HandledTransportAction.java:89) [elasticsearch-6.8.1.redhat-6.jar:6.8.1.redhat-6] at org.elasticsearch.action.support.HandledTransportAction$TransportHandler.messageReceived(HandledTransportAction.java:80) [elasticsearch-6.8.1.redhat-6.jar:6.8.1.redhat-6] at com.amazon.opendistroforelasticsearch.security.ssl.transport.OpenDistroSecuritySSLRequestHandler.messageReceivedDecorate(OpenDistroSecuritySSLRequestHandler.java:194) [opendistro_security_ssl-0.10.0.3-redhat-1.jar:0.10.0.3-redhat-1] at com.amazon.opendistroforelasticsearch.security.transport.OpenDistroSecurityRequestHandler.messageReceivedDecorate(OpenDistroSecurityRequestHandler.java:284) [opendistro_security-0.10.1.0-redhat-1.jar:0.10.1.0-redhat-1] at com.amazon.opendistroforelasticsearch.security.ssl.transport.OpenDistroSecuritySSLRequestHandler.messageReceived(OpenDistroSecuritySSLRequestHandler.java:166) [opendistro_security_ssl-0.10.0.3-redhat-1.jar:0.10.0.3-redhat-1] at com.amazon.opendistroforelasticsearch.security.OpenDistroSecurityPlugin$7$1.messageReceived(OpenDistroSecurityPlugin.java:646) [opendistro_security-0.10.1.0-redhat-1.jar:0.10.1.0-redhat-1] at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:66) [elasticsearch-6.8.1.redhat-6.jar:6.8.1.redhat-6] at org.elasticsearch.transport.TcpTransport$RequestHandler.doRun(TcpTransport.java:1087) [elasticsearch-6.8.1.redhat-6.jar:6.8.1.redhat-6] at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-6.8.1.redhat-6.jar:6.8.1.redhat-6] at org.elasticsearch.common.util.concurrent.EsExecutors$DirectExecutorService.execute(EsExecutors.java:193) [elasticsearch-6.8.1.redhat-6.jar:6.8.1.redhat-6] at org.elasticsearch.transport.TcpTransport.handleRequest(TcpTransport.java:1046) [elasticsearch-6.8.1.redhat-6.jar:6.8.1.redhat-6] at 
org.elasticsearch.transport.TcpTransport.messageReceived(TcpTransport.java:932) [elasticsearch-6.8.1.redhat-6.jar:6.8.1.redhat-6] at org.elasticsearch.transport.TcpTransport.inboundMessage(TcpTransport.java:763) [elasticsearch-6.8.1.redhat-6.jar:6.8.1.redhat-6] at org.elasticsearch.transport.netty4.Netty4MessageChannelHandler.channelRead(Netty4MessageChannelHandler.java:53) [transport-netty4-client-6.8.1.redhat-5.jar:6.8.1.redhat-5] at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) [netty-transport-4.1.32.Final.jar:4.1.32.Final] at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) [netty-transport-4.1.32.Final.jar:4.1.32.Final] at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340) [netty-transport-4.1.32.Final.jar:4.1.32.Final] at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:323) [netty-codec-4.1.32.Final.jar:4.1.32.Final] at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:297) [netty-codec-4.1.32.Final.jar:4.1.32.Final] at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) [netty-transport-4.1.32.Final.jar:4.1.32.Final] at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) [netty-transport-4.1.32.Final.jar:4.1.32.Final] at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340) [netty-transport-4.1.32.Final.jar:4.1.32.Final] at io.netty.handler.logging.LoggingHandler.channelRead(LoggingHandler.java:241) [netty-handler-4.1.32.Final.jar:4.1.32.Final] at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) [netty-transport-4.1.32.Final.jar:4.1.32.Final] at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) [netty-transport-4.1.32.Final.jar:4.1.32.Final] at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340) [netty-transport-4.1.32.Final.jar:4.1.32.Final] at io.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1436) [netty-handler-4.1.32.Final.jar:4.1.32.Final] at io.netty.handler.ssl.SslHandler.decodeJdkCompatible(SslHandler.java:1203) [netty-handler-4.1.32.Final.jar:4.1.32.Final] at io.netty.handler.ssl.SslHandler.decode(SslHandler.java:1247) [netty-handler-4.1.32.Final.jar:4.1.32.Final] at io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:502) [netty-codec-4.1.32.Final.jar:4.1.32.Final] at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:441) [netty-codec-4.1.32.Final.jar:4.1.32.Final] at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:278) [netty-codec-4.1.32.Final.jar:4.1.32.Final] at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) [netty-transport-4.1.32.Final.jar:4.1.32.Final] at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) [netty-transport-4.1.32.Final.jar:4.1.32.Final] at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340) [netty-transport-4.1.32.Final.jar:4.1.32.Final] at 
io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1434) [netty-transport-4.1.32.Final.jar:4.1.32.Final] at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) [netty-transport-4.1.32.Final.jar:4.1.32.Final] at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) [netty-transport-4.1.32.Final.jar:4.1.32.Final] at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:965) [netty-transport-4.1.32.Final.jar:4.1.32.Final] at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:163) [netty-transport-4.1.32.Final.jar:4.1.32.Final] at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:656) [netty-transport-4.1.32.Final.jar:4.1.32.Final] at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:556) [netty-transport-4.1.32.Final.jar:4.1.32.Final] at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:510) [netty-transport-4.1.32.Final.jar:4.1.32.Final] at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:470) [netty-transport-4.1.32.Final.jar:4.1.32.Final] at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:909) [netty-common-4.1.32.Final.jar:4.1.32.Final] at java.lang.Thread.run(Thread.java:834) [?:?] [2020-09-21T02:45:04,971][INFO ][o.e.c.m.MetaDataIndexTemplateService] [elasticsearch-cdm-r8jw2yat-2] adding template [ocp-gen-app] for index patterns [app*] [2020-09-21T02:45:05,044][INFO ][o.e.c.m.MetaDataIndexTemplateService] [elasticsearch-cdm-r8jw2yat-2] adding template [ocp-gen-infra] for index patterns [infra*] [2020-09-21T02:45:05,102][INFO ][o.e.c.m.MetaDataIndexTemplateService] [elasticsearch-cdm-r8jw2yat-2] adding template [ocp-gen-audit] for index patterns [audit*] [2020-09-21T02:45:05,177][INFO ][o.e.c.m.MetaDataMappingService] [elasticsearch-cdm-r8jw2yat-2] [infra-000005/0uS9S_4kSxuGBcrUrmJktQ] update_mapping [_doc] [2020-09-21T02:45:05,480][INFO ][o.e.c.m.MetaDataMappingService] [elasticsearch-cdm-r8jw2yat-2] [infra-000005/0uS9S_4kSxuGBcrUrmJktQ] update_mapping [_doc] [2020-09-21T02:45:05,504][INFO ][o.e.c.r.a.AllocationService] [elasticsearch-cdm-r8jw2yat-2] Cluster health status changed from [YELLOW] to [GREEN] (reason: [shards started [[app-000002][0]] ...]). [2020-09-21T02:45:06,457][INFO ][o.e.c.m.MetaDataMappingService] [elasticsearch-cdm-r8jw2yat-2] [app-000002/CTpGx2IWRC60uMz8_o0DRA] update_mapping [_doc] elasticsearch-cdm-r8jw2yat-3-78bb547c58-2l975: [2020-09-21T01:41:41,060][INFO ][c.a.o.s.c.IndexBaseConfigurationRepository] [elasticsearch-cdm-r8jw2yat-3] Node 'elasticsearch-cdm-r8jw2yat-3' initialized [2020-09-21T02:45:04,859][WARN ][r.suppressed ] [elasticsearch-cdm-r8jw2yat-3] path: /app-*/_alias/app-write, params: {name=app-write, index=app-*} org.elasticsearch.transport.RemoteTransportException: [elasticsearch-cdm-r8jw2yat-2][10.128.2.16:9300][indices:admin/aliases/get] Caused by: org.elasticsearch.ElasticsearchSecurityException: Unexpected exception indices:admin/aliases/get at com.amazon.opendistroforelasticsearch.security.filter.OpenDistroSecurityFilter.apply0(OpenDistroSecurityFilter.java:274) ~[?:?] at com.amazon.opendistroforelasticsearch.security.filter.OpenDistroSecurityFilter.apply(OpenDistroSecurityFilter.java:119) ~[?:?] 
at org.elasticsearch.action.support.TransportAction$RequestFilterChain.proceed(TransportAction.java:165) ~[elasticsearch-6.8.1.redhat-6.jar:6.8.1.redhat-6] at org.elasticsearch.action.support.TransportAction.execute(TransportAction.java:139) ~[elasticsearch-6.8.1.redhat-6.jar:6.8.1.redhat-6] at org.elasticsearch.action.support.HandledTransportAction$TransportHandler.messageReceived(HandledTransportAction.java:89) ~[elasticsearch-6.8.1.redhat-6.jar:6.8.1.redhat-6] at org.elasticsearch.action.support.HandledTransportAction$TransportHandler.messageReceived(HandledTransportAction.java:80) ~[elasticsearch-6.8.1.redhat-6.jar:6.8.1.redhat-6] at com.amazon.opendistroforelasticsearch.security.ssl.transport.OpenDistroSecuritySSLRequestHandler.messageReceivedDecorate(OpenDistroSecuritySSLRequestHandler.java:194) ~[?:?] at com.amazon.opendistroforelasticsearch.security.transport.OpenDistroSecurityRequestHandler.messageReceivedDecorate(OpenDistroSecurityRequestHandler.java:284) ~[?:?] at com.amazon.opendistroforelasticsearch.security.ssl.transport.OpenDistroSecuritySSLRequestHandler.messageReceived(OpenDistroSecuritySSLRequestHandler.java:166) ~[?:?] at com.amazon.opendistroforelasticsearch.security.OpenDistroSecurityPlugin$7$1.messageReceived(OpenDistroSecurityPlugin.java:646) ~[?:?] at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:66) ~[elasticsearch-6.8.1.redhat-6.jar:6.8.1.redhat-6] at org.elasticsearch.transport.TcpTransport$RequestHandler.doRun(TcpTransport.java:1087) ~[elasticsearch-6.8.1.redhat-6.jar:6.8.1.redhat-6] at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[elasticsearch-6.8.1.redhat-6.jar:6.8.1.redhat-6] at org.elasticsearch.common.util.concurrent.EsExecutors$DirectExecutorService.execute(EsExecutors.java:193) ~[elasticsearch-6.8.1.redhat-6.jar:6.8.1.redhat-6] at org.elasticsearch.transport.TcpTransport.handleRequest(TcpTransport.java:1046) [elasticsearch-6.8.1.redhat-6.jar:6.8.1.redhat-6] at org.elasticsearch.transport.TcpTransport.messageReceived(TcpTransport.java:932) [elasticsearch-6.8.1.redhat-6.jar:6.8.1.redhat-6] at org.elasticsearch.transport.TcpTransport.inboundMessage(TcpTransport.java:763) [elasticsearch-6.8.1.redhat-6.jar:6.8.1.redhat-6] at org.elasticsearch.transport.netty4.Netty4MessageChannelHandler.channelRead(Netty4MessageChannelHandler.java:53) [transport-netty4-client-6.8.1.redhat-5.jar:6.8.1.redhat-5] at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) [netty-transport-4.1.32.Final.jar:4.1.32.Final] at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) [netty-transport-4.1.32.Final.jar:4.1.32.Final] at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340) [netty-transport-4.1.32.Final.jar:4.1.32.Final] at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:323) [netty-codec-4.1.32.Final.jar:4.1.32.Final] at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:297) [netty-codec-4.1.32.Final.jar:4.1.32.Final] at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) [netty-transport-4.1.32.Final.jar:4.1.32.Final] at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) [netty-transport-4.1.32.Final.jar:4.1.32.Final] at 
io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340) [netty-transport-4.1.32.Final.jar:4.1.32.Final] at io.netty.handler.logging.LoggingHandler.channelRead(LoggingHandler.java:241) [netty-handler-4.1.32.Final.jar:4.1.32.Final] at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) [netty-transport-4.1.32.Final.jar:4.1.32.Final] at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) [netty-transport-4.1.32.Final.jar:4.1.32.Final] at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340) [netty-transport-4.1.32.Final.jar:4.1.32.Final] at io.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1436) [netty-handler-4.1.32.Final.jar:4.1.32.Final] at io.netty.handler.ssl.SslHandler.decodeJdkCompatible(SslHandler.java:1203) [netty-handler-4.1.32.Final.jar:4.1.32.Final] at io.netty.handler.ssl.SslHandler.decode(SslHandler.java:1247) [netty-handler-4.1.32.Final.jar:4.1.32.Final] at io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:502) [netty-codec-4.1.32.Final.jar:4.1.32.Final] at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:441) [netty-codec-4.1.32.Final.jar:4.1.32.Final] at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:278) [netty-codec-4.1.32.Final.jar:4.1.32.Final] at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) [netty-transport-4.1.32.Final.jar:4.1.32.Final] at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) [netty-transport-4.1.32.Final.jar:4.1.32.Final] at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340) [netty-transport-4.1.32.Final.jar:4.1.32.Final] at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1434) [netty-transport-4.1.32.Final.jar:4.1.32.Final] at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) [netty-transport-4.1.32.Final.jar:4.1.32.Final] at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) [netty-transport-4.1.32.Final.jar:4.1.32.Final] at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:965) [netty-transport-4.1.32.Final.jar:4.1.32.Final] at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:163) [netty-transport-4.1.32.Final.jar:4.1.32.Final] at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:656) [netty-transport-4.1.32.Final.jar:4.1.32.Final] at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:556) [netty-transport-4.1.32.Final.jar:4.1.32.Final] at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:510) [netty-transport-4.1.32.Final.jar:4.1.32.Final] at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:470) [netty-transport-4.1.32.Final.jar:4.1.32.Final] at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:909) [netty-common-4.1.32.Final.jar:4.1.32.Final] at java.lang.Thread.run(Thread.java:834) [?:?]
Moving this back to ON_QA as I am unable to verify #c35, which looks to be a completely unrelated issue. I did the following:

* Deployed the latest 4.6 CLO
* Deployed the latest 4.6 EO
* Updated the CL instance to be unmanaged
* Edited the infra policy to poll every minute, roll over every 2 minutes, and delete every 4 minutes

I consistently see both the rollover and delete crons complete successfully. Inspecting the indices, I see new ones being created and old ones removed.
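For completeness, a minimal sketch of the inspection step described above, using the _cat APIs through es_util; $pod is any Elasticsearch pod:

$ oc exec -c elasticsearch $pod -- es_util --query='_cat/indices?v'
$ oc exec -c elasticsearch $pod -- es_util --query='_cat/aliases?v'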
The clusterlogging instance has been running for 24 hours and all the jobs completed successfully, so moving this BZ to VERIFIED. I'll try to reproduce the issue in comment 35.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 4.6.1 extras update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:4198
(In reply to Andre Costa from comment #46)
> Hello,
>
> Can we re-open this ticket or have some information on the fix?
>
> We still have customers saying that on 4.5.z this is still happening.

The fix made the scripts more robust and added more log statements, so this may be a transient error. Note that we keep one copy of failed pods, and these pods run on a 15m cycle; if the ES cluster is not functional during that window, you will see failures that are transient.

Please create a new ticket with a must-gather and all the details you can acquire so we can investigate further.
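As a minimal sketch, before opening the new ticket it is worth checking whether most jobs succeed and whether the ES cluster was healthy around the time a failed job ran; the openshift-logging namespace is assumed and $pod is any Elasticsearch pod:

$ oc -n openshift-logging get pods | grep -E 'elasticsearch-(delete|rollover)-'
$ oc -n openshift-logging exec -c elasticsearch $pod -- es_util --query='_cluster/health?pretty'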
BZ opened to backport to 4.5: https://bugzilla.redhat.com/show_bug.cgi?id=1896578
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days