Bug 1493820
| Summary: | [3.5] Elastic search pod fails start and gives error "ERR: Timed out while waiting for a green or yellow cluster state." | ||
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Miheer Salunke <misalunk> |
| Component: | Logging | Assignee: | Jeff Cantrill <jcantril> |
| Status: | CLOSED ERRATA | QA Contact: | Junqi Zhao <juzhao> |
| Severity: | high | Docs Contact: | |
| Priority: | high | ||
| Version: | 3.5.0 | CC: | aos-bugs, bmcelvee, jcantril, misalunk, nhosoi, pportant, rmeggins, smunilla |
| Target Milestone: | --- | ||
| Target Release: | 3.5.z | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | Bug Fix | |
| Doc Text: |
Elasticsearch had a timing issue when trying to seed its ACL index. This made it difficult for Elasticsearch to start, and traffic was not allowed because the ACLs were not properly seeded. This bug fix uses the `DC_NAME` instead of the pod name, so SearchGuard more reliably allows traffic to flow because the ACLs are seeded.
|
Story Points: | --- |
| Clone Of: | Environment: | ||
| Last Closed: | 2017-11-21 05:41:13 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
|
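The Doc Text above attributes the fix to using `DC_NAME` rather than the pod name. The following is a minimal sketch of why the two differ, not a reproduction of the actual fix: it assumes the usual DeploymentConfig pod naming of `<dc-name>-<deployment-number>-<random-suffix>`, so the pod name changes on every redeploy while the DC name stays stable. The function name `dc_name_from_pod` and the pod suffix `x7k2p` are hypothetical, chosen for illustration.

```shell
#!/bin/sh
# Hypothetical helper: derive the DeploymentConfig name from a pod name by
# stripping the trailing "-<deployment-number>-<random-suffix>".
# Assumes pods follow the <dc>-<n>-<suffix> naming convention.
dc_name_from_pod() {
    printf '%s\n' "$1" | sed -E 's/-[0-9]+-[a-z0-9]+$//'
}

# Illustrative pod name; the suffix is made up.
dc_name_from_pod "logging-es-n2iwege1-1-x7k2p"
```

A name derived this way is consistent with the `.searchguard.logging-es-n2iwege1` index seen in the verification output later in this report.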
Description
Miheer Salunke
2017-09-21 01:32:59 UTC
OpenShift runs on top of AWS.
openshift v3.5.5.31
kubernetes v1.5.2+43a9be4
EFS is configured for the EFK PV. Manually starting sgadmin.sh also didn't help. Details attached.

Miheer,
Is this the issue you pinged us on IRC about, the 3.4 change that did not make it into 3.5 regarding replacing $HOSTNAME with $DC_NAME? I don't see the configmap in the attachment. Have you considered using https://github.com/openshift/origin-aggregated-logging/blob/master/hack/logging-dump.sh to gather log info?

(In reply to Jeff Cantrill from comment #6)
No, this seems to be a different issue. I don't recall the issue you mentioned. Do you need the configmap and the output of the script?

Output from the script would be useful, as it includes the configmaps among other things, for us to better diagnose. The information you have provided is insufficient for us to properly understand what is happening with the cluster.

@Samuel, the fix is only in openshift-ansible-3.5.139. Could you move this bug to an installer errata?

Tested, ES state is Green, no errors were thrown.
# oc exec ${ES_POD} -- curl -s -k --cert /etc/elasticsearch/secret/admin-cert --key /etc/elasticsearch/secret/admin-key https://localhost:9200/_cat/indices
green open .searchguard.logging-es-n2iwege1 1 0 5 0 27.6kb 27.6kb
green open .kibana 1 0 1 0 3.1kb 3.1kb
green open project.install-test.db8c4e9a-c60d-11e7-a33d-fa163ef17798.2017.11.10 1 0 662 0 251.7kb 251.7kb
green open .kibana.ef0b7ff169fdc9202e567ce53aa5e17320cb2d7d 1 0 6 3 37.2kb 37.2kb
green open .operations.2017.11.10 1 0 116917 0 46.2mb 46.2mb
green open project.logging.73effd17-c60d-11e7-a33d-fa163ef17798.2017.11.10 1 0 263 0 213.5kb 213.5kb
green open project.java.3d808ece-c60f-11e7-a33d-fa163ef17798.2017.11.10 1 0 1485 0 510kb 510kb
# openshift version
openshift v3.5.5.31.47
kubernetes v1.5.2+43a9be4
etcd 3.1.0
images:
logging-curator/images/v3.5.5.31.47-1
logging-elasticsearch/images/3.5.0-48
logging-kibana/images/3.5.0-44
logging-fluentd/images/3.5.0-39
logging-auth-proxy/images/3.5.0-38
There is one kibana error when verifying this defect, but it's not related to ES
https://bugzilla.redhat.com/show_bug.cgi?id=1511925
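The manual check in the verification comment above (every index reported as green) could be scripted along these lines. This is a sketch under assumptions: it expects header-less `_cat/indices` output in the column layout shown above (health, status, index name, ...), the function name `non_green_indices` is hypothetical, and the sample lines are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: read `_cat/indices` output on stdin and print the
# name (column 3) of every index whose health (column 1) is not green.
non_green_indices() {
    awk 'NF > 0 && $1 != "green" { print $3 }'
}

# Canned sample in the same column layout; the second index is invented.
sample='green open .kibana 1 0 1 0 3.1kb 3.1kb
yellow open project.demo.2017.11.10 1 1 10 0 5kb 5kb'

printf '%s\n' "$sample" | non_green_indices
```

In practice the `oc exec ${ES_POD} -- curl ...` command from the comment could be piped into the function in place of the canned sample; empty output would then mean no index is yellow or red.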
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHBA-2017:3255

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days