Bug 1274271

Summary: [intservice_public_91] got NoShardAvailableActionException in es pod logs
Product: OKD
Reporter: wyue
Component: Logging
Assignee: ewolinet
Status: CLOSED CURRENTRELEASE
QA Contact: Xia Zhao <xiazhao>
Severity: medium
Priority: medium
Version: 3.x
CC: aos-bugs, erich, ewolinet, knakayam, xtian
Doc Type: Bug Fix
Last Closed: 2017-01-19 14:35:50 UTC
Type: Bug
Attachments: es pod log

Description wyue 2015-10-22 11:59:37 UTC
Description of problem:
A NoShardAvailableActionException appears in the ES pod logs when deploying the EFK stack on EC2 instances.

Version-Release number of selected component (if applicable):
oc v1.0.6-823-g23eaf25
kubernetes v1.2.0-alpha.1-1107-g4c8e6f4

How reproducible:


Steps to Reproduce:
1. Build the images with https://github.com/openshift/origin-aggregated-logging/blob/master/hack/build-images.sh, then push them to a private image repository.
2. Update MASTER_URL in https://github.com/openshift/origin-aggregated-logging/blob/master/deployment/deployer.yaml
3. Deploy the EFK stack according to:
https://github.com/openshift/origin-aggregated-logging/tree/master/deployment
except that the deployer was run as follows (a consolidated sketch of these steps is included after the command):
oc process -f deployer.yaml -v IMAGE_PREFIX=wyue/,KIBANA_HOSTNAME=kibana.example.com,PUBLIC_MASTER_URL=https://ec2-54-158-187-217.compute-1.amazonaws.com:8443,ES_INSTANCE_RAM=1024M,ES_CLUSTER_SIZE=1 | oc create -f -
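
For reference, a rough consolidation of the steps above as a shell sketch; the image prefix, Kibana hostname and master URL are the values from this report and are environment-specific:

# Run from a clone of origin-aggregated-logging; builds the logging images
# (they then need to be pushed to the image repository referenced by IMAGE_PREFIX).
./hack/build-images.sh

# After editing MASTER_URL in deployment/deployer.yaml, process and create the deployer.
cd deployment
oc process -f deployer.yaml \
  -v IMAGE_PREFIX=wyue/,KIBANA_HOSTNAME=kibana.example.com,PUBLIC_MASTER_URL=https://ec2-54-158-187-217.compute-1.amazonaws.com:8443,ES_INSTANCE_RAM=1024M,ES_CLUSTER_SIZE=1 \
  | oc create -f -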

Actual results:
Only one pod is running, with exceptions in the pod logs (please see the attachment):
[root@ip-10-164-183-106 sample-app]# oc get pods
NAME                          READY     STATUS      RESTARTS   AGE
logging-deployer-t1ov9        0/1       Completed   0          5h
logging-es-he38fok0-1-k0fqs   1/1       Running     0          5h
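
For completeness, the attached ES pod log can be captured with something like the following, using the pod name from the output above:

# Dump the Elasticsearch pod log to a file for inspection/attachment.
oc logs logging-es-he38fok0-1-k0fqs > es-pod.log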

Expected results:
no obvious error in pod logs


Additional info:

pod log is attached.

Comment 1 wyue 2015-10-22 12:00:36 UTC
Created attachment 1085479 [details]
es pod log

Comment 2 Luke Meyer 2015-10-22 12:58:45 UTC
We've seen this frequently at startup, but it doesn't appear to cause any problems. It would be nice to work out the timing issue (or whatever it is) that causes this, or otherwise suppress the message.

Comment 3 Luke Meyer 2015-10-30 20:57:53 UTC
I would add that all of these exceptions are of the same general type: they indicate that everything has not started up yet and you just need to wait:

io.fabric8.elasticsearch.discovery.k8s.K8sDiscovery
failed to connect to master, retrying...
org.elasticsearch.transport.ConnectTransportException

com.floragunn.searchguard.service.SearchGuardConfigService
Try to refresh security configuration but it failed due to org.elasticsearch.cluster.block.ClusterBlockException: blocked by: [SERVICE_UNAVAILABLE/1/state not recovered / initialized]

io.fabric8.elasticsearch.plugin.acl.DynamicACLFilter
Error checking ACL when seeding
org.elasticsearch.cluster.block.ClusterBlockException: blocked by: [SERVICE_UNAVAILABLE/1/state not recovered / initialized];

io.fabric8.elasticsearch.plugin.acl.DynamicACLFilter
Exception encountered when seeding initial ACL
org.elasticsearch.cluster.block.ClusterBlockException: blocked by: [SERVICE_UNAVAILABLE/1/state not recovered / initialized]
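
A rough way to confirm that these messages are confined to startup (using the pod name from this report) is to count them in the pod log once the cluster has settled; if the count stops growing, only the transient "not recovered yet" variety occurred:

# Count the transient "state not recovered" messages; re-running this a few
# minutes later should show the count no longer increasing once ES has
# finished initializing.
oc logs logging-es-he38fok0-1-k0fqs | grep -c "state not recovered / initialized"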

Comment 7 ewolinet 2016-03-31 14:37:13 UTC
This stack trace originates from the Searchguard plugin, which queries ES for its settings while ES is not yet up and responding to queries.

A patch was merged upstream to suppress this message while ES is not yet available, and it will be pulled into an updated ES image.

The externally tracked issue is not related to this.
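
As a rough check of which plugin builds a given ES pod is actually carrying (and therefore whether an image containing the suppression patch is in use), the plugin directory inside the pod can be listed; note that the /usr/share/elasticsearch/plugins path is the stock ES layout and is an assumption about the logging image:

# List the Elasticsearch plugins bundled in the running pod.
# NOTE: the plugins path is the stock ES default and may differ in the logging image.
oc exec logging-es-he38fok0-1-k0fqs -- ls /usr/share/elasticsearch/plugins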

Comment 9 Kenjiro Nakayama 2017-01-18 01:23:20 UTC
What the "RELEASE_PENDING" status means? One of the customer hit this issue with "logging-elasticsearch:3.2.1", but it was not fixed this issue? If not, do you have a plan to release the fix for 3.2 elasticsearch image?

Comment 10 ewolinet 2017-01-19 14:35:50 UTC
"Release pending" meant that it was going to be fixed in an upcoming release of EFK (3.4).

There is no plan to fix the 3.2 Elasticsearch image at this time, since the message comes from one of the plugins provided with it and from the manner in which that plugin looks for its settings.

With the version in the pre-3.4 ES image, the plugin polls every few seconds on startup until it is able to read its configuration. The version used with the 3.4 ES image is instead notified. Unfortunately we cannot simply update the plugin on the pre-3.4 images, since the version of ES it is written for is different (with 3.4 we moved from ES 1.5.2 to ES 2.4.1).
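
As a rough way to tell which ES image (and therefore which plugin behavior) a given pod is running, the image reference can be read from the pod description; the pod name below is the one from this report:

# Show the image used by the Elasticsearch pod; the tag indicates whether it is a
# pre-3.4 (ES 1.5.2) or 3.4 (ES 2.4.1) logging image.
oc describe pod logging-es-he38fok0-1-k0fqs | grep 'Image:'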

Comment 11 Kenjiro Nakayama 2017-01-30 09:34:58 UTC
@ewolinet,
Could you please give us a link to the PR that fixed this issue? If it was not only one commit, please give us some of them. It is strange that one user hit this several times, although no reports have been filed by other users. I'm sorry to bother you, but we would like to confirm the cause of this issue.