Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1493820

Summary: [3.5] Elastic search pod fails start and gives error "ERR: Timed out while waiting for a green or yellow cluster state."
Product: OpenShift Container Platform Reporter: Miheer Salunke <misalunk>
Component: Logging Assignee: Jeff Cantrill <jcantril>
Status: CLOSED ERRATA QA Contact: Junqi Zhao <juzhao>
Severity: high Docs Contact:
Priority: high    
Version: 3.5.0 CC: aos-bugs, bmcelvee, jcantril, misalunk, nhosoi, pportant, rmeggins, smunilla
Target Milestone: ---   
Target Release: 3.5.z   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Elasticsearch had a timing issue while trying to seed its ACL index. This caused Elasticsearch to have difficulty starting, and traffic was blocked because the ACLs were not properly seeded. This bug fix uses the `DC_NAME` instead of the pod name, resulting in SearchGuard more reliably allowing traffic to flow once the ACLs are seeded.
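As an illustration of why the change helps, assuming the usual OpenShift pod-naming pattern `<dc-name>-<deployment-number>-<random-suffix>` (the pod name below is hypothetical, not taken from this bug), the stable deploymentconfig name can be recovered by stripping the restart-specific suffix:

```shell
# Hedged sketch of the naming change; the exact variable names used in the
# fix are not reproduced here. A pod name changes on every restart, while
# the deploymentconfig (DC) name stays stable.
HOSTNAME="logging-es-n2iwege1-1-x7k2p"   # hypothetical pod name
# Strip the last two dash-separated segments (deployment number + random
# suffix) to recover the stable DC name.
DC_NAME="${HOSTNAME%-*-*}"
echo "$DC_NAME"   # -> logging-es-n2iwege1
```

Because the ACL index name no longer embeds the restart-specific pod suffix, a restarted pod finds the already-seeded ACLs (compare the `.searchguard.logging-es-n2iwege1` index in comment 15) instead of waiting on a freshly named, unseeded index.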
Story Points: ---
Clone Of: Environment:
Last Closed: 2017-11-21 05:41:13 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Miheer Salunke 2017-09-21 01:32:59 UTC
Description of problem:

Elasticsearch pod fails to start and gives the following error.


ERR: Timed out while waiting for a green or yellow cluster state.
   * Try running sgadmin.sh with -icl and -nhnv (If thats works you need to check your clustername as well as hostnames in your SSL certificates)
2
/usr/share/java/elasticsearch/config
Will connect to localhost:9300 ... done

Attachment added for information on pods and logs.
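The error output suggests retrying sgadmin.sh with `-icl` (ignore cluster name) and `-nhnv` (no hostname verification). A hedged sketch of such an invocation follows; the tool path is an assumption based on SearchGuard conventions, and the config directory comes from the `/usr/share/java/elasticsearch/config` line printed above — neither is confirmed by this report:

```shell
# Build the diagnostic command the error message hints at. All paths are
# assumptions. -icl skips the cluster-name check and -nhnv skips hostname
# verification, which isolates certificate/clustername mismatches.
SGADMIN="/usr/share/elasticsearch/plugins/search-guard-2/tools/sgadmin.sh"
CONFIG_DIR="/usr/share/java/elasticsearch/config"   # printed in the log above
CMD="$SGADMIN -cd $CONFIG_DIR -h localhost -p 9300 -icl -nhnv"
echo "diagnostic run: $CMD"
```

If this relaxed run succeeds where normal seeding does not, the error text points at a mismatch between the cluster name or hostnames and the SSL certificates.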

Version-Release number of selected component (if applicable):
OCP 3.5

How reproducible:
Always, on the customer side

Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 1 Miheer Salunke 2017-09-21 01:33:36 UTC
OpenShift runs on top of AWS

openshift v3.5.5.31
kubernetes v1.5.2+43a9be4

EFS is configured for EFK PV

Comment 2 Miheer Salunke 2017-09-21 01:35:28 UTC
Manually starting sgadmin.sh also didn't help.

Details attached.

Comment 6 Jeff Cantrill 2017-09-21 17:55:26 UTC
Miheer,

Is this the issue you pinged us about on IRC, concerning the 3.4 change that did not make it into 3.5 regarding replacing $HOSTNAME with $DC_NAME? I don't see the configmap in the attachment. Have you considered using https://github.com/openshift/origin-aggregated-logging/blob/master/hack/logging-dump.sh to gather log info?

Comment 7 Miheer Salunke 2017-09-22 03:30:53 UTC
(In reply to Jeff Cantrill from comment #6)
> Miheer,
> 
> Is this the issue you pinged us on IRC about the 3.4 change that did not
> make it into 3.5 regarding replacing $HOSTNAME with $DC_NAME?  I dont see
> the configmap in the attachment.  Have you considered using?
> https://github.com/openshift/origin-aggregated-logging/blob/master/hack/
> logging-dump.sh to gather log info

No, this seems to be a different issue. I don't recall the issue you mentioned.

Do you need the configmap and the output of the script?

Comment 8 Jeff Cantrill 2017-09-22 14:48:28 UTC
Output from the script would be useful, as it includes the configmaps among other things, which would help us diagnose. The information provided so far is insufficient for us to properly understand what is happening with the cluster.

Comment 12 Anping Li 2017-11-06 09:24:33 UTC
@Samuel, the fix is only in openshift-ansible-3.5.139. Could you move this bug to an installer errata?

Comment 15 Junqi Zhao 2017-11-10 12:50:31 UTC
Tested; ES state is green, and no errors were thrown.
# oc exec ${ES_POD} -- curl -s -k --cert /etc/elasticsearch/secret/admin-cert --key /etc/elasticsearch/secret/admin-key https://localhost:9200/_cat/indices
green open .searchguard.logging-es-n2iwege1                                     1 0      5 0  27.6kb  27.6kb 
green open .kibana                                                              1 0      1 0   3.1kb   3.1kb 
green open project.install-test.db8c4e9a-c60d-11e7-a33d-fa163ef17798.2017.11.10 1 0    662 0 251.7kb 251.7kb 
green open .kibana.ef0b7ff169fdc9202e567ce53aa5e17320cb2d7d                     1 0      6 3  37.2kb  37.2kb 
green open .operations.2017.11.10                                               1 0 116917 0  46.2mb  46.2mb 
green open project.logging.73effd17-c60d-11e7-a33d-fa163ef17798.2017.11.10      1 0    263 0 213.5kb 213.5kb 
green open project.java.3d808ece-c60f-11e7-a33d-fa163ef17798.2017.11.10         1 0   1485 0   510kb   510kb 

# openshift version
openshift v3.5.5.31.47
kubernetes v1.5.2+43a9be4
etcd 3.1.0

images:
logging-curator/images/v3.5.5.31.47-1
logging-elasticsearch/images/3.5.0-48
logging-kibana/images/3.5.0-44
logging-fluentd/images/3.5.0-39
logging-auth-proxy/images/3.5.0-38

There is one Kibana error when verifying this defect, but it is not related to ES:
https://bugzilla.redhat.com/show_bug.cgi?id=1511925
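The "green or yellow" gate that the startup error in this bug refers to can be sketched by inspecting the `status` field of an Elasticsearch `_cluster/health` response. The JSON below is a hypothetical sample, not captured from this bug; a real check would fetch it with the same authenticated curl used in the verification above:

```shell
# Hypothetical sample of an Elasticsearch _cluster/health response.
response='{"cluster_name":"logging-es","status":"green","number_of_nodes":1}'
# Extract the status field and accept green or yellow, mirroring the
# wait condition named in the original error message.
status=$(printf '%s' "$response" | sed -n 's/.*"status":"\([a-z]*\)".*/\1/p')
case "$status" in
  green|yellow) echo "cluster usable: $status" ;;
  *)            echo "still waiting: $status" ;;
esac
```

A red (or missing) status would keep the startup script in its wait loop until the timeout seen in the original report.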

Comment 18 errata-xmlrpc 2017-11-21 05:41:13 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:3255

Comment 19 Red Hat Bugzilla 2023-09-14 04:08:25 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days