Bug 1274271 - [intservice_public_91]got NoShardAvailableActionException in es pod logs
Product: OpenShift Origin
Classification: Red Hat
Component: Logging
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Assigned To: ewolinet
Xia Zhao
Depends On:
Reported: 2015-10-22 07:59 EDT by wyue
Modified: 2017-02-20 09:31 EST
CC: 5 users

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Last Closed: 2017-01-19 09:35:50 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---

Attachments (Terms of Use)
es pod log (4.52 KB, text/plain)
2015-10-22 08:00 EDT, wyue

Description wyue 2015-10-22 07:59:37 EDT
Description of problem:
NoShardAvailableActionException appears in the es pod logs when deploying the EFK stack on EC2 instances.

Version-Release number of selected component (if applicable):
oc v1.0.6-823-g23eaf25
kubernetes v1.2.0-alpha.1-1107-g4c8e6f4

How reproducible:

Steps to Reproduce:
1. Build images, then push them to a private GitHub repo
2. Update MASTER_URL in https://github.com/openshift/origin-aggregated-logging/blob/master/deployment/deployer.yaml
3. Deploy the EFK stack according to:
except use the command below to run the deployer:
oc process -f deployer.yaml -v IMAGE_PREFIX=wyue/,KIBANA_HOSTNAME=kibana.example.com,PUBLIC_MASTER_URL=https://ec2-54-158-187-217.compute-1.amazonaws.com:8443,ES_INSTANCE_RAM=1024M,ES_CLUSTER_SIZE=1 | oc create -f -
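To confirm the exception is present, the es pod logs can be filtered directly. The real invocation would be `oc logs <es-pod-name> | grep NoShardAvailableActionException` (pod name taken from this report); the snippet below is a self-contained sketch of that filter, run against a sample log line rather than a live cluster:

```shell
# Against a live cluster (pod name from this report; substitute your own):
#   oc logs logging-es-he38fok0-1-k0fqs | grep NoShardAvailableActionException
#
# Self-contained demonstration of the same filter on a sample log line:
sample='org.elasticsearch.action.NoShardAvailableActionException: [searchguard][0]'
match=$(echo "$sample" | grep -o 'NoShardAvailableActionException')
echo "$match"
```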

Actual results:
Got only one pod running, with an exception in the pod logs (please see the attachment):
[root@ip-10-164-183-106 sample-app]# oc get pods
NAME                          READY     STATUS      RESTARTS   AGE
logging-deployer-t1ov9        0/1       Completed   0          5h
logging-es-he38fok0-1-k0fqs   1/1       Running     0          5h

Expected results:
no obvious error in pod logs

Additional info:

pod log is attached.
Comment 1 wyue 2015-10-22 08:00 EDT
Created attachment 1085479 [details]
es pod log
Comment 2 Luke Meyer 2015-10-22 08:58:45 EDT
We've seen this frequently at startup but it doesn't appear to cause any problems. It would be nice to work out the timing issue or whatever it is that causes this, or to suppress it otherwise.
Comment 3 Luke Meyer 2015-10-30 16:57:53 EDT
I would add that all of these exceptions are of the same general type - stuff that indicates everything isn't started up yet and you just need to wait:

failed to connect to master, retrying...

Try to refresh security configuration but it failed due to org.elasticsearch.cluster.block.ClusterBlockException: blocked by: [SERVICE_UNAVAILABLE/1/state not recovered / initialized]

Error checking ACL when seeding
org.elasticsearch.cluster.block.ClusterBlockException: blocked by: [SERVICE_UNAVAILABLE/1/state not recovered / initialized];

Exception encountered when seeding initial ACL
org.elasticsearch.cluster.block.ClusterBlockException: blocked by: [SERVICE_UNAVAILABLE/1/state not recovered / initialized]
Comment 7 ewolinet 2016-03-31 10:37:13 EDT
This stack trace comes from the Searchguard plugin, which queries ES for its settings while ES is not yet up and responding to queries.

Patch was merged upstream to suppress this message while ES is not yet available and will be pulled into an updated ES image.

The externally tracked issue is not related to this.
Comment 9 Kenjiro Nakayama 2017-01-17 20:23:20 EST
What does the "RELEASE_PENDING" status mean? One of our customers hit this issue with "logging-elasticsearch:3.2.1"; was the issue not fixed in that image? If not, do you have a plan to release the fix for the 3.2 elasticsearch image?
Comment 10 ewolinet 2017-01-19 09:35:50 EST
Release pending meant that it was going to be fixed in an upcoming release of EFK (3.4).

There is no plan to fix the 3.2 Elasticsearch image at this time, since the message originates in one of the plugins provided with it and the manner in which it reads its settings.

The plugin version in the pre-3.4 ES image polls every few seconds at startup until it is able to read its configuration. The version used with the 3.4 ES image is instead notified when the cluster is ready. Unfortunately we cannot simply update the plugin on the pre-3.4 images, since the versions of ES it is written for are different (with 3.4 we moved from ES 1.5.2 to ES 2.4.1).
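The pre-3.4 polling behaviour described above can be sketched roughly as follows. This is illustrative only, not Searchguard code: `fetch_settings` is a hypothetical stand-in for the plugin's settings read, stubbed here to fail twice (as if the cluster state were still blocked) before succeeding:

```shell
# fetch_settings is a hypothetical stand-in for the plugin reading its
# settings from ES; here it fails until the third attempt, mimicking
# "blocked by: [SERVICE_UNAVAILABLE/1/state not recovered / initialized]".
tries=0
fetch_settings() {
  tries=$((tries + 1))
  [ "$tries" -ge 3 ]   # non-zero exit (failure) on the first two attempts
}

# Poll until the settings read succeeds, instead of logging each failure.
attempts=0
until fetch_settings; do
  attempts=$((attempts + 1))
  [ "$attempts" -ge 30 ] && { echo "gave up" >&2; break; }
  sleep 0.1   # the pre-3.4 plugin used a multi-second interval
done
echo "settings read after $tries attempts"
```

The 3.4 image avoids this loop entirely: the newer plugin is notified once the cluster state is recovered rather than retrying on a timer.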
Comment 11 Kenjiro Nakayama 2017-01-30 04:34:58 EST
Could you please give us a link to the PR which fixed this issue? If it was not a single commit, please give us some of them. It is strange that one user hit this several times, although no report has been filed by other users. I'm sorry to bother you, but we would like to confirm the cause of this issue.
