Bug 1274271 - [intservice_public_91]got NoShardAvailableActionException in es pod logs
Product: OpenShift Origin
Classification: Red Hat
Component: Logging
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Assigned To: ewolinet
Xia Zhao
Depends On:
Reported: 2015-10-22 07:59 EDT by wyue
Modified: 2017-02-20 09:31 EST
CC: 5 users

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Last Closed: 2017-01-19 09:35:50 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---

Attachments (Terms of Use)
es pod log (4.52 KB, text/plain)
2015-10-22 08:00 EDT, wyue

Description wyue 2015-10-22 07:59:37 EDT
Description of problem:
NoShardAvailableActionException appears in the es pod logs when deploying the EFK stack on EC2 instances.

Version-Release number of selected component (if applicable):
oc v1.0.6-823-g23eaf25
kubernetes v1.2.0-alpha.1-1107-g4c8e6f4

How reproducible:

Steps to Reproduce:
1. Build images, then push them to a private GitHub repo
2. Update MASTER_URL in https://github.com/openshift/origin-aggregated-logging/blob/master/deployment/deployer.yaml
3. Deploy the EFK stack according to:
except use the command below to run the deployer:
oc process -f deployer.yaml -v IMAGE_PREFIX=wyue/,KIBANA_HOSTNAME=kibana.example.com,PUBLIC_MASTER_URL=https://ec2-54-158-187-217.compute-1.amazonaws.com:8443,ES_INSTANCE_RAM=1024M,ES_CLUSTER_SIZE=1 | oc create -f -
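To confirm the exception is present, the es pod logs can be filtered directly. The real invocation would be `oc logs <es-pod-name> | grep NoShardAvailableActionException` (pod name taken from this report); the snippet below is a self-contained sketch of that filter, run against a sample log line rather than a live cluster:

```shell
# Against a live cluster (pod name from this report; substitute your own):
#   oc logs logging-es-he38fok0-1-k0fqs | grep NoShardAvailableActionException
#
# Self-contained demonstration of the same filter on a sample log line:
sample='org.elasticsearch.action.NoShardAvailableActionException: [searchguard][0]'
match=$(echo "$sample" | grep -o 'NoShardAvailableActionException')
echo "$match"
```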

Actual results:
Got only one pod running, with an exception in the pod logs (please see the attachment):
[root@ip-10-164-183-106 sample-app]# oc get pods
NAME                          READY     STATUS      RESTARTS   AGE
logging-deployer-t1ov9        0/1       Completed   0          5h
logging-es-he38fok0-1-k0fqs   1/1       Running     0          5h

Expected results:
no obvious error in pod logs

Additional info:

pod log is attached.
Comment 1 wyue 2015-10-22 08:00 EDT
Created attachment 1085479 [details]
es pod log
Comment 2 Luke Meyer 2015-10-22 08:58:45 EDT
We've seen this frequently at startup but it doesn't appear to cause any problems. It would be nice to work out the timing issue or whatever it is that causes this, or to suppress it otherwise.
Comment 3 Luke Meyer 2015-10-30 16:57:53 EDT
I would add that all of these exceptions are of the same general type - stuff that indicates everything isn't started up yet and you just need to wait:

failed to connect to master, retrying...

Try to refresh security configuration but it failed due to org.elasticsearch.cluster.block.ClusterBlockException: blocked by: [SERVICE_UNAVAILABLE/1/state not recovered / initialized]

Error checking ACL when seeding
org.elasticsearch.cluster.block.ClusterBlockException: blocked by: [SERVICE_UNAVAILABLE/1/state not recovered / initialized];

Exception encountered when seeding initial ACL
org.elasticsearch.cluster.block.ClusterBlockException: blocked by: [SERVICE_UNAVAILABLE/1/state not recovered / initialized]
Comment 7 ewolinet 2016-03-31 10:37:13 EDT
This stack trace comes from the Searchguard plugin, which queries ES for its settings while ES is not yet up and responding to queries.

Patch was merged upstream to suppress this message while ES is not yet available and will be pulled into an updated ES image.

The externally tracked issue is not related to this.
Comment 9 Kenjiro Nakayama 2017-01-17 20:23:20 EST
What does the "RELEASE_PENDING" status mean? One of our customers hit this issue with "logging-elasticsearch:3.2.1"; was the issue not fixed in that image? If not, do you have a plan to release the fix for the 3.2 elasticsearch image?
Comment 10 ewolinet 2017-01-19 09:35:50 EST
Release pending meant that it was going to be fixed in an upcoming release of EFK (3.4).

There is no plan to fix the 3.2 Elasticsearch image at this time, since the message originates in one of the plugins provided with it and the manner in which it reads its settings.

The plugin version in the pre-3.4 ES image polls every few seconds at startup until it is able to read its configuration. The version used with the 3.4 ES image is instead notified when the cluster is ready. Unfortunately we cannot simply update the plugin on the pre-3.4 images, since the versions of ES it is written for are different (with 3.4 we moved from ES 1.5.2 to ES 2.4.1).
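The pre-3.4 polling behaviour described above can be sketched roughly as follows. This is illustrative only, not Searchguard code: `fetch_settings` is a hypothetical stand-in for the plugin's settings read, stubbed here to fail twice (as if the cluster state were still blocked) before succeeding:

```shell
# fetch_settings is a hypothetical stand-in for the plugin reading its
# settings from ES; here it fails until the third attempt, mimicking
# "blocked by: [SERVICE_UNAVAILABLE/1/state not recovered / initialized]".
tries=0
fetch_settings() {
  tries=$((tries + 1))
  [ "$tries" -ge 3 ]   # non-zero exit (failure) on the first two attempts
}

# Poll until the settings read succeeds, instead of logging each failure.
attempts=0
until fetch_settings; do
  attempts=$((attempts + 1))
  [ "$attempts" -ge 30 ] && { echo "gave up" >&2; break; }
  sleep 0.1   # the pre-3.4 plugin used a multi-second interval
done
echo "settings read after $tries attempts"
```

The 3.4 image avoids this loop entirely: the newer plugin is notified once the cluster state is recovered rather than retrying on a timer.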
Comment 11 Kenjiro Nakayama 2017-01-30 04:34:58 EST
Could you please give us a link to the PR which fixed this issue? If it was not a single commit, please give us some of them. It is strange that one user hit this several times, although no report has been filed by other users. I'm sorry to bother you, but we would like to confirm the cause of this issue.
