Bug 1371200

Summary: NoShardAvailableActionException error is logged while the logging cluster is still starting up
Product: OpenShift Container Platform
Component: Logging
Version: 3.2.1
Status: CLOSED ERRATA
Severity: low
Priority: low
Target Milestone: ---
Target Release: 3.4.z
Hardware: Unspecified
OS: Unspecified
Reporter: Eric Jones <erjones>
Assignee: ewolinet
QA Contact: Xia Zhao <xiazhao>
CC: aos-bugs, ewolinet, jcantril, penli, smunilla
Doc Type: Enhancement
Doc Text:
Feature: Decrease the amount of noise in the ES log at startup, caused by the cluster not yet being available when Elasticsearch tries to seed its initial ACL.
Reason: Improve the user experience when checking the ES logs.
Result: ES no longer throws an unnecessary stack trace when seeding its ACL while the cluster is starting up.
Last Closed: 2017-03-23 09:20:56 UTC
Type: Bug

Description Eric Jones 2016-08-29 14:36:43 UTC
Description of problem:
If Elasticsearch, Fluentd, and Kibana are all scaled up at the same time (and sometimes when just ES is), the Elasticsearch pod commonly logs the NoShardAvailableActionException error message even though it shouldn't: the cluster simply hasn't finished starting yet.

Version-Release number of selected component (if applicable):
3.2.1.4

Comment 4 Xia Zhao 2016-12-19 06:04:50 UTC
Seeing this exception with the latest 3.4.0 images when fluentd isn't up (due to https://bugzilla.redhat.com/show_bug.cgi?id=1405306):

$ oc get po
NAME                          READY     STATUS             RESTARTS   AGE
logging-curator-1-xdykf       1/1       Running            0          17m
logging-deployer-suu74        0/1       Completed          0          18m
logging-es-br17ygwu-1-748hc   1/1       Running            0          17m
logging-fluentd-ktoss         0/1       CrashLoopBackOff   7          17m
logging-kibana-1-2fevi        2/2       Running            0          17m

$ oc logs logging-fluentd-ktoss    (the issue described by bug #1405306)
...
panic: standard_init_linux.go:175: exec user process caused "permission denied" [recovered]
    panic: standard_init_linux.go:175: exec user process caused "permission denied"
goroutine 1 [running, locked to thread]:
panic(0x6f2ea0, 0xc42016b810)
...

$ oc logs logging-es-br17ygwu-1-748hc 
...
[2016-12-19 05:32:37,687][ERROR][io.fabric8.elasticsearch.plugin.acl.DynamicACLFilter] [Aginar] Error checking ACL when seeding
NoShardAvailableActionException[No shard available for [get [.searchguard.logging-es-qwnd0r0b-1-139iu][roles][0]: routing [null]]]; nested: RemoteTransportException[[Aginar][10.129.0.37:9300][indices:data/read/get[s]]]; nested: ShardNotFoundException[no such shard];
	at org.elasticsearch.action.support.single.shard.TransportSingleShardAction$AsyncSingleAction.perform(TransportSingleShardAction.java:199)
	at org.elasticsearch.action.support.single.shard.TransportSingleShardAction$AsyncSingleAction.onFailure(TransportSingleShardAction.java:186)
	at org.elasticsearch.action.support.single.shard.TransportSingleShardAction$AsyncSingleAction.access$1300(TransportSingleShardAction.java:115)
	at org.elasticsearch.action.support.single.shard.TransportSingleShardAction$AsyncSingleAction$2.handleException(TransportSingleShardAction.java:240)
	at org.elasticsearch.transport.TransportService$DirectResponseChannel.processException(TransportService.java:872)
	at org.elasticsearch.transport.TransportService$DirectResponseChannel.sendResponse(TransportService.java:850)
	at org.elasticsearch.transport.TransportService$4.onFailure(TransportService.java:387)
	at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:39)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
Caused by: RemoteTransportException[[Aginar][10.129.0.37:9300][indices:data/read/get[s]]]; nested: ShardNotFoundException[no such shard];
Caused by: [.searchguard.logging-es-qwnd0r0b-1-139iu][[.searchguard.logging-es-qwnd0r0b-1-139iu][0]] ShardNotFoundException[no such shard]
	at org.elasticsearch.index.IndexService.shardSafe(IndexService.java:197)
	at org.elasticsearch.action.get.TransportGetAction.shardOperation(TransportGetAction.java:95)
	at org.elasticsearch.action.get.TransportGetAction.shardOperation(TransportGetAction.java:44)
	at org.elasticsearch.action.support.single.shard.TransportSingleShardAction$ShardTransportHandler.messageReceived(TransportSingleShardAction.java:282)
	at org.elasticsearch.action.support.single.shard.TransportSingleShardAction$ShardTransportHandler.messageReceived(TransportSingleShardAction.java:275)
	at org.elasticsearch.transport.TransportRequestHandler.messageReceived(TransportRequestHandler.java:33)
	at com.floragunn.searchguard.ssl.transport.SearchGuardSSLTransportService.messageReceivedDecorate(SearchGuardSSLTransportService.java:171)
	at com.floragunn.searchguard.transport.SearchGuardTransportService.messageReceivedDecorate(SearchGuardTransportService.java:190)
	at com.floragunn.searchguard.ssl.transport.SearchGuardSSLTransportService$Interceptor.messageReceived(SearchGuardSSLTransportService.java:110)
	at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:77)
	at org.elasticsearch.transport.TransportService$4.doRun(TransportService.java:376)
	at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)

After working around bug #1405306 and getting fluentd up, the NoShardAvailableActionException no longer appeared in the ES log.

Images tested with:

ops registry:
openshift3/logging-deployer    755d30b7d4de
openshift3/logging-kibana    d5971557d356
openshift3/logging-fluentd    7b11a29c82c1
openshift3/logging-elasticsearch    6716a0ad8b2b
openshift3/logging-auth-proxy    ec334b0c2669
openshift3/logging-curator    9af78fc06248

Comment 5 Jeff Cantrill 2016-12-19 15:28:04 UTC
Added PR https://github.com/fabric8io/openshift-elasticsearch-plugin/pull/55 to mute the stack trace in cases where the shard is not available, which should fundamentally be the same situation as when the index is not available.
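
To illustrate the approach, here is a minimal sketch of the muting logic; it is not the actual PR #55 diff, and the class, method, and logger names below are hypothetical. The idea is that the ACL seeding path walks the failure's cause chain and, when it finds a NoShardAvailableActionException (the cluster is still starting and the .searchguard index has no allocated shard yet), logs a short debug message instead of an ERROR with a full stack trace.

import org.elasticsearch.action.NoShardAvailableActionException;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

// Hypothetical handler called from the ACL seeding path when the read of the
// .searchguard index fails; names are placeholders, not the plugin's real API.
public class AclSeedingFailureHandler {

    private static final Logger LOG = LoggerFactory.getLogger(AclSeedingFailureHandler.class);

    // Walk the cause chain looking for a NoShardAvailableActionException.
    private static boolean isShardUnavailable(Throwable t) {
        for (Throwable cur = t; cur != null; cur = cur.getCause()) {
            if (cur instanceof NoShardAvailableActionException) {
                return true;
            }
        }
        return false;
    }

    public void onSeedingFailure(Exception e) {
        if (isShardUnavailable(e)) {
            // Expected while the cluster is still starting up; keep the log
            // quiet instead of printing a stack trace.
            LOG.debug("ACL not seeded yet, cluster still starting: {}", e.getMessage());
        } else {
            // Anything else is a genuine failure and keeps the full stack trace.
            LOG.error("Error checking ACL when seeding", e);
        }
    }
}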

Comment 6 ewolinet 2017-03-16 17:13:48 UTC
koji_builds = 543826
repositories =
  brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/logging-elasticsearch:rhaos-3.4-rhel-7-docker-candidate-20170313132630
  brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/logging-elasticsearch:3.4.1-10
  brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/logging-elasticsearch:3.4.1
  brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/logging-elasticsearch:latest
  brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/logging-elasticsearch:v3.4

Comment 10 errata-xmlrpc 2017-03-23 09:20:56 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:0835