Bug 1274271 - [intservice_public_91]got NoShardAvailableActionException in es pod logs
Product: OpenShift Origin
Classification: Red Hat
Component: Logging
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Assigned To: ewolinet
Xia Zhao
Depends On:
Reported: 2015-10-22 07:59 EDT by wyue
Modified: 2017-02-20 09:31 EST
CC: 5 users

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Last Closed: 2017-01-19 09:35:50 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---

Attachments (Terms of Use)
es pod log (4.52 KB, text/plain)
2015-10-22 08:00 EDT, wyue

Description wyue 2015-10-22 07:59:37 EDT
Description of problem:
NoShardAvailableActionException appears in the es pod logs when deploying the EFK stack on EC2 instances.

Version-Release number of selected component (if applicable):
oc v1.0.6-823-g23eaf25
kubernetes v1.2.0-alpha.1-1107-g4c8e6f4

How reproducible:

Steps to Reproduce:
1. Build images, then push them to a private GitHub repo
2. Update MASTER_URL in https://github.com/openshift/origin-aggregated-logging/blob/master/deployment/deployer.yaml
3. Deploy the EFK stack according to:
except use the command below to run the deployer:
oc process -f deployer.yaml -v IMAGE_PREFIX=wyue/,KIBANA_HOSTNAME=kibana.example.com,PUBLIC_MASTER_URL=https://ec2-54-158-187-217.compute-1.amazonaws.com:8443,ES_INSTANCE_RAM=1024M,ES_CLUSTER_SIZE=1 | oc create -f -
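To confirm the exception is present, the es pod logs can be filtered directly. The real invocation would be `oc logs <es-pod-name> | grep NoShardAvailableActionException` (pod name taken from this report); the snippet below is a self-contained sketch of that filter, run against a sample log line rather than a live cluster:

```shell
# Against a live cluster (pod name from this report; substitute your own):
#   oc logs logging-es-he38fok0-1-k0fqs | grep NoShardAvailableActionException
#
# Self-contained demonstration of the same filter on a sample log line:
sample='org.elasticsearch.action.NoShardAvailableActionException: [searchguard][0]'
match=$(echo "$sample" | grep -o 'NoShardAvailableActionException')
echo "$match"
```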

Actual results:
Got only one pod running, with an exception in the pod logs (please see the attachment):
[root@ip-10-164-183-106 sample-app]# oc get pods
NAME                          READY     STATUS      RESTARTS   AGE
logging-deployer-t1ov9        0/1       Completed   0          5h
logging-es-he38fok0-1-k0fqs   1/1       Running     0          5h

Expected results:
no obvious error in pod logs

Additional info:

pod log is attached.
Comment 1 wyue 2015-10-22 08:00 EDT
Created attachment 1085479 [details]
es pod log
Comment 2 Luke Meyer 2015-10-22 08:58:45 EDT
We've seen this frequently at startup but it doesn't appear to cause any problems. It would be nice to work out the timing issue or whatever it is that causes this, or to suppress it otherwise.
Comment 3 Luke Meyer 2015-10-30 16:57:53 EDT
I would add that all of these exceptions are of the same general type - stuff that indicates everything isn't started up yet and you just need to wait:

failed to connect to master, retrying...

Try to refresh security configuration but it failed due to org.elasticsearch.cluster.block.ClusterBlockException: blocked by: [SERVICE_UNAVAILABLE/1/state not recovered / initialized]

Error checking ACL when seeding
org.elasticsearch.cluster.block.ClusterBlockException: blocked by: [SERVICE_UNAVAILABLE/1/state not recovered / initialized];

Exception encountered when seeding initial ACL
org.elasticsearch.cluster.block.ClusterBlockException: blocked by: [SERVICE_UNAVAILABLE/1/state not recovered / initialized]
Comment 7 ewolinet 2016-03-31 10:37:13 EDT
This stack trace comes from the Searchguard plugin, which queries ES for its settings while ES is not yet up and responding to queries.

Patch was merged upstream to suppress this message while ES is not yet available and will be pulled into an updated ES image.

The externally tracked issue is not related to this.
Comment 9 Kenjiro Nakayama 2017-01-17 20:23:20 EST
What does the "RELEASE_PENDING" status mean? One of our customers hit this issue with "logging-elasticsearch:3.2.1"; was the issue not fixed in that image? If not, do you have a plan to release the fix for the 3.2 elasticsearch image?
Comment 10 ewolinet 2017-01-19 09:35:50 EST
Release pending meant that it was going to be fixed in an upcoming release of EFK (3.4).

There is no plan to fix the 3.2 Elasticsearch image at this time, since the message originates in one of the plugins provided with it and the manner in which it reads its settings.

The plugin version in the pre-3.4 ES image polls every few seconds at startup until it is able to read its configuration. The version used with the 3.4 ES image is instead notified when the cluster is ready. Unfortunately we cannot simply update the plugin on the pre-3.4 images, since the versions of ES it is written for are different (with 3.4 we moved from ES 1.5.2 to ES 2.4.1).
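The pre-3.4 polling behaviour described above can be sketched roughly as follows. This is illustrative only, not Searchguard code: `fetch_settings` is a hypothetical stand-in for the plugin's settings read, stubbed here to fail twice (as if the cluster state were still blocked) before succeeding:

```shell
# fetch_settings is a hypothetical stand-in for the plugin reading its
# settings from ES; here it fails until the third attempt, mimicking
# "blocked by: [SERVICE_UNAVAILABLE/1/state not recovered / initialized]".
tries=0
fetch_settings() {
  tries=$((tries + 1))
  [ "$tries" -ge 3 ]   # non-zero exit (failure) on the first two attempts
}

# Poll until the settings read succeeds, instead of logging each failure.
attempts=0
until fetch_settings; do
  attempts=$((attempts + 1))
  [ "$attempts" -ge 30 ] && { echo "gave up" >&2; break; }
  sleep 0.1   # the pre-3.4 plugin used a multi-second interval
done
echo "settings read after $tries attempts"
```

The 3.4 image avoids this loop entirely: the newer plugin is notified once the cluster state is recovered rather than retrying on a timer.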
Comment 11 Kenjiro Nakayama 2017-01-30 04:34:58 EST
Could you please give us a link to the PR which fixed this issue? If it was not a single commit, please give us some of them. It is strange that one user hit this several times, although no report has been filed by other users. I'm sorry to bother you, but we would like to confirm the cause of this issue.
