Red Hat Bugzilla – Bug 1274271
[intservice_public_91]got NoShardAvailableActionException in es pod logs
Last modified: 2017-02-20 09:31:13 EST
Description of problem:
There is a NoShardAvailableActionException in the ES pod logs when deploying the EFK stack on EC2 instances.
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. Build images, then push them to a private GitHub repo
2. Update MASTER_URL in https://github.com/openshift/origin-aggregated-logging/blob/master/deployment/deployer.yaml
3. Deploy the EFK stack according to:
except use the steps below to run the deployer:
oc process -f deployer.yaml -v IMAGE_PREFIX=wyue/,KIBANA_HOSTNAME=kibana.example.com,PUBLIC_MASTER_URL=https://ec2-54-158-187-217.compute-1.amazonaws.com:8443,ES_INSTANCE_RAM=1024M,ES_CLUSTER_SIZE=1 | oc create -f -
Got only one pod running, with an exception in the pod logs (please see the attachment):
[root@ip-10-164-183-106 sample-app]# oc get pods
NAME                          READY   STATUS      RESTARTS   AGE
logging-deployer-t1ov9        0/1     Completed   0          5h
logging-es-he38fok0-1-k0fqs   1/1     Running     0          5h
No obvious error in the pod logs; the pod log is attached.
Created attachment 1085479 [details]
es pod log
We've seen this frequently at startup but it doesn't appear to cause any problems. It would be nice to work out the timing issue or whatever it is that causes this, or to suppress it otherwise.
I would add that all of these exceptions are of the same general type: messages indicating that everything isn't started up yet and you just need to wait:
failed to connect to master, retrying...
Try to refresh security configuration but it failed due to org.elasticsearch.cluster.block.ClusterBlockException: blocked by: [SERVICE_UNAVAILABLE/1/state not recovered / initialized]
Error checking ACL when seeding
org.elasticsearch.cluster.block.ClusterBlockException: blocked by: [SERVICE_UNAVAILABLE/1/state not recovered / initialized];
Exception encountered when seeding initial ACL
org.elasticsearch.cluster.block.ClusterBlockException: blocked by: [SERVICE_UNAVAILABLE/1/state not recovered / initialized]
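All of these messages describe the same window: components probing Elasticsearch before the cluster state is recovered. As a minimal sketch of the "just wait and retry" behavior (the function and wiring here are illustrative, not the actual EFK code; in a pod the check would be backed by something like `GET /_cluster/health` against localhost:9200):

```python
import time

def wait_for_cluster(check, retries=30, delay=2.0, sleep=time.sleep):
    """Poll check() until the cluster block has cleared.

    check should return True once the cluster answers health queries
    (status yellow/green); until then callers see errors equivalent to
    SERVICE_UNAVAILABLE/1/state not recovered / initialized.
    Returns the number of failed attempts before success.
    """
    for attempt in range(retries):
        if check():
            return attempt
        sleep(delay)
    raise TimeoutError("cluster still blocked: state not recovered")

# Simulate the startup window seen in the logs: the first two probes
# hit the block, the third succeeds once ES has initialized.
responses = iter([False, False, True])
attempts_needed = wait_for_cluster(lambda: next(responses),
                                   sleep=lambda _: None)
```

This is why the exceptions are harmless at startup: each caller eventually retries into a ready cluster.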
This stack trace comes from the Searchguard plugin, which queries ES for its settings while ES is not yet up and responding to queries.
A patch was merged upstream to suppress this message while ES is not yet available; it will be pulled into an updated ES image.
The externally tracked issue is not related to this.
What does the "RELEASE_PENDING" status mean? One of our customers hit this issue with "logging-elasticsearch:3.2.1"; was the issue not fixed there? If not, do you have a plan to release the fix for the 3.2 Elasticsearch image?
Release pending meant that it was going to be fixed in an upcoming release of EFK (3.4).
There is no plan to fix the 3.2 Elasticsearch image at this time, since the source of the message is one of the plugins provided with it and the manner in which that plugin looks for its settings.
With the version in the pre-3.4 ES image, the plugin polls every few seconds at startup until it is able to read in its configuration. The version used with the 3.4 ES image is instead notified. Unfortunately we cannot simply update the plugin on the pre-3.4 images, since the ES versions it is written for are different (with 3.4 we moved from ES 1.5.2 to ES 2.4.1).
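The poll-vs-notify distinction above can be sketched as follows (a toy stand-in for ES serving the plugin's configuration; the class and method names are illustrative assumptions, not the real Searchguard API):

```python
import threading

class SettingsSource:
    """Toy model of ES holding the plugin's settings index."""

    def __init__(self):
        self._ready = threading.Event()
        self._listeners = []
        self.settings = None

    def publish(self, settings):
        """ES finishes startup and makes the settings readable."""
        self.settings = settings
        self._ready.set()
        for cb in self._listeners:
            cb(settings)

    def poll(self, tries=100, timeout_per_try=0.01):
        """Pre-3.4 style: retry every interval until readable.

        Each failed wait corresponds to one logged exception."""
        for _ in range(tries):
            if self._ready.wait(timeout_per_try):
                return self.settings
        raise TimeoutError("settings never became readable")

    def on_ready(self, cb):
        """3.4 style: register once, get called back when ES is up."""
        if self._ready.is_set():
            cb(self.settings)
        else:
            self._listeners.append(cb)

# Notified variant: nothing to retry, so nothing to log while waiting.
src = SettingsSource()
received = []
src.on_ready(received.append)
src.publish({"acl": "seeded"})
```

The polling variant necessarily generates failed attempts (and log noise) during the startup window; the notified variant stays silent until ES signals readiness, which is why the fix landed with the 3.4 plugin rather than a backport.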
Could you please give us the link to the PR which fixed this issue? If it was more than one commit, please give us some of them. It is strange that one user hit this several times although no report has been filed by other users. Sorry to bother you, but we would like to confirm the cause of this issue.