Description of problem:
If Elasticsearch, Fluentd, and Kibana are all scaled up at the same time (and sometimes when just Elasticsearch is), the Elasticsearch pod commonly logs a NoShardAvailableActionException error even though no real problem exists.

Version-Release number of selected component (if applicable):
3.2.1.4
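For reference, one way the startup race can be reproduced is to take the components down and bring them back up at roughly the same time while watching the ES log. This is a rough sketch only: the dc names below are from a sample deployment (the ES dc carries a generated suffix, so substitute the names from `oc get dc`), and it assumes fluentd is the node-labelled daemonset created by the deployer.

# scale the logging components down
$ oc scale --replicas=0 dc/logging-es-br17ygwu dc/logging-kibana
$ oc label node --all logging-infra-fluentd=false --overwrite

# bring everything back up together
$ oc scale --replicas=1 dc/logging-es-br17ygwu dc/logging-kibana
$ oc label node --all logging-infra-fluentd=true --overwrite

# the NoShardAvailableActionException typically appears while the pods start
$ oc logs -f dc/logging-es-br17ygwu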
Seeing this exception with the latest 3.4.0 images when fluentd isn't up because of https://bugzilla.redhat.com/show_bug.cgi?id=1405306:

$ oc get po
NAME                          READY     STATUS             RESTARTS   AGE
logging-curator-1-xdykf       1/1       Running            0          17m
logging-deployer-suu74        0/1       Completed          0          18m
logging-es-br17ygwu-1-748hc   1/1       Running            0          17m
logging-fluentd-ktoss         0/1       CrashLoopBackOff   7          17m
logging-kibana-1-2fevi        2/2       Running            0          17m

$ oc logs logging-fluentd-ktoss
(the issue described by bug #1405306)
...
panic: standard_init_linux.go:175: exec user process caused "permission denied" [recovered]
	panic: standard_init_linux.go:175: exec user process caused "permission denied"

goroutine 1 [running, locked to thread]:
panic(0x6f2ea0, 0xc42016b810)
...

$ oc logs logging-es-br17ygwu-1-748hc
...
[2016-12-19 05:32:37,687][ERROR][io.fabric8.elasticsearch.plugin.acl.DynamicACLFilter] [Aginar] Error checking ACL when seeding
NoShardAvailableActionException[No shard available for [get [.searchguard.logging-es-qwnd0r0b-1-139iu][roles][0]: routing [null]]]; nested: RemoteTransportException[[Aginar][10.129.0.37:9300][indices:data/read/get[s]]]; nested: ShardNotFoundException[no such shard];
	at org.elasticsearch.action.support.single.shard.TransportSingleShardAction$AsyncSingleAction.perform(TransportSingleShardAction.java:199)
	at org.elasticsearch.action.support.single.shard.TransportSingleShardAction$AsyncSingleAction.onFailure(TransportSingleShardAction.java:186)
	at org.elasticsearch.action.support.single.shard.TransportSingleShardAction$AsyncSingleAction.access$1300(TransportSingleShardAction.java:115)
	at org.elasticsearch.action.support.single.shard.TransportSingleShardAction$AsyncSingleAction$2.handleException(TransportSingleShardAction.java:240)
	at org.elasticsearch.transport.TransportService$DirectResponseChannel.processException(TransportService.java:872)
	at org.elasticsearch.transport.TransportService$DirectResponseChannel.sendResponse(TransportService.java:850)
	at org.elasticsearch.transport.TransportService$4.onFailure(TransportService.java:387)
	at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:39)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
Caused by: RemoteTransportException[[Aginar][10.129.0.37:9300][indices:data/read/get[s]]]; nested: ShardNotFoundException[no such shard];
Caused by: [.searchguard.logging-es-qwnd0r0b-1-139iu][[.searchguard.logging-es-qwnd0r0b-1-139iu][0]] ShardNotFoundException[no such shard]
	at org.elasticsearch.index.IndexService.shardSafe(IndexService.java:197)
	at org.elasticsearch.action.get.TransportGetAction.shardOperation(TransportGetAction.java:95)
	at org.elasticsearch.action.get.TransportGetAction.shardOperation(TransportGetAction.java:44)
	at org.elasticsearch.action.support.single.shard.TransportSingleShardAction$ShardTransportHandler.messageReceived(TransportSingleShardAction.java:282)
	at org.elasticsearch.action.support.single.shard.TransportSingleShardAction$ShardTransportHandler.messageReceived(TransportSingleShardAction.java:275)
	at org.elasticsearch.transport.TransportRequestHandler.messageReceived(TransportRequestHandler.java:33)
	at com.floragunn.searchguard.ssl.transport.SearchGuardSSLTransportService.messageReceivedDecorate(SearchGuardSSLTransportService.java:171)
	at com.floragunn.searchguard.transport.SearchGuardTransportService.messageReceivedDecorate(SearchGuardTransportService.java:190)
	at com.floragunn.searchguard.ssl.transport.SearchGuardSSLTransportService$Interceptor.messageReceived(SearchGuardSSLTransportService.java:110)
	at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:77)
	at org.elasticsearch.transport.TransportService$4.doRun(TransportService.java:376)
	at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)

After working around bug #1405306 and getting fluentd up, the NoShardAvailableActionException no longer appeared in the ES log.

Images tested with (ops registry):
openshift3/logging-deployer       755d30b7d4de
openshift3/logging-kibana         d5971557d356
openshift3/logging-fluentd        7b11a29c82c1
openshift3/logging-elasticsearch  6716a0ad8b2b
openshift3/logging-auth-proxy     ec334b0c2669
openshift3/logging-curator        9af78fc06248
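For completeness, the shard the failing GET targets can also be checked directly once things settle. This is a hedged sketch, assuming the admin cert/key are mounted at the default /etc/elasticsearch/secret/ paths used by the deployer (adjust the pod name and paths for the actual deployment):

$ oc exec logging-es-br17ygwu-1-748hc -- curl -s \
    --cacert /etc/elasticsearch/secret/admin-ca \
    --cert /etc/elasticsearch/secret/admin-cert \
    --key /etc/elasticsearch/secret/admin-key \
    "https://localhost:9200/_cat/shards/.searchguard*?v"

A STARTED primary for the .searchguard.<es-dc> index means the ACL seeding GET shown in the log above has a shard to hit.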
Added PR https://github.com/fabric8io/openshift-elasticsearch-plugin/pull/55 to mute the stack trace in cases where the shard is not available, which should fundamentally be the same situation as when the index is not available.
koji_builds = 543826
repositories =
  brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/logging-elasticsearch:rhaos-3.4-rhel-7-docker-candidate-20170313132630
  brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/logging-elasticsearch:3.4.1-10
  brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/logging-elasticsearch:3.4.1
  brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/logging-elasticsearch:latest
  brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/logging-elasticsearch:v3.4
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2017:0835