Description of problem:
When upgrading from 3.3.1.20 to 3.4.1.18, Elasticsearch fails to initialize. Investigation showed that there were nearly 5000 indices, and Search Guard could not finish initializing quickly enough. Slow storage (storage latency) may also have contributed. The following messages are seen very often:

[2017-05-09 13:45:53,562][ERROR][com.floragunn.searchguard.auth.BackendRegistry] Not yet initialized
[2017-05-09 13:45:55,111][ERROR][com.floragunn.searchguard.auth.BackendRegistry] Not yet initialized
[2017-05-09 13:45:56,127][ERROR][com.floragunn.searchguard.auth.BackendRegistry] Not yet initialized
[....]

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1.
2.
3.

Actual results:
The upgrade went fine; however, the EFK stack fails to initialize.

Expected results:
The upgrade should succeed and the EFK stack should start successfully.

Additional info:
Commit pushed to master at https://github.com/openshift/origin-aggregated-logging

https://github.com/openshift/origin-aggregated-logging/commit/e125f746c81c5aeb6425e90246c62111b626c669
bug 1457642. Fix SG timeout

We repeatedly call the sgadmin script until it successfully returns, sleeping 10 seconds between retries.

Partial fix for BZ #1457642
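The retry behaviour described in the commit message above can be sketched in Python as follows. This is a minimal illustration, not the code from the commit: the function name `run_until_success` and its `max_attempts` and `sleep` parameters are hypothetical, and the actual sgadmin command line used by the logging image is not given in this report, so the command is passed in as a parameter.

```python
import subprocess
import time
from typing import Callable, Optional, Sequence


def run_until_success(
    cmd: Sequence[str],
    interval_s: float = 10.0,
    max_attempts: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Run cmd repeatedly until it exits with status 0 (hypothetical helper).

    Returns the number of attempts made. With max_attempts=None it retries
    forever, as the commit message describes; otherwise it raises
    RuntimeError once the limit is reached.
    """
    attempt = 0
    while True:
        attempt += 1
        result = subprocess.run(list(cmd))
        if result.returncode == 0:
            return attempt
        if max_attempts is not None and attempt >= max_attempts:
            raise RuntimeError(f"{cmd[0]} still failing after {attempt} attempts")
        # Seeding has not succeeded yet; wait before the next try
        # (10 seconds by default, matching the commit message).
        sleep(interval_s)
```

For example, `run_until_success(["<sgadmin command>", ...])` would block until the (placeholder) sgadmin invocation succeeds. The optional `max_attempts` bound is an addition for illustration; the commit describes an unbounded loop.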
Commit pushed to master at https://github.com/openshift/openshift-ansible

https://github.com/openshift/openshift-ansible/commit/2d9eeac5a20523e3574044bfbede1f0c0686c159
bug 1457642. Use same SG index to avoid seeding timeout
*** Bug 1464854 has been marked as a duplicate of this bug. ***
Tested with the latest v3.6 images on OCP 3.6.0; the logging system worked fine and this error did not appear in the ES log. Setting to verified.

Test env:
# openshift version
openshift v3.6.131
kubernetes v1.6.1+5115d708d7
etcd 3.2.1

ansible version:
openshift-ansible-playbooks-3.6.131-1.git.0.d87dfaa.el7.noarch

Images tested with:
openshift3/logging-elasticsearch c601094a6111
openshift3/logging-kibana c91b7ad68dc7
openshift3/logging-fluentd 82367a1102e0
openshift3/logging-curator b609245a72f9
openshift3/logging-auth-proxy 39164e25543c
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2017:1716
Is this fixed in OCP 3.5?
It looks like we did not backport this fix to 3.5 [1], though I'm not certain why, since it was fixed in 3.4 [2].

[1] https://github.com/openshift/openshift-ansible/blob/release-1.5/roles/openshift_logging/templates/elasticsearch.yml.j2#L65
[2] https://github.com/openshift/origin-aggregated-logging/blob/release-1.4/deployer/conf/elasticsearch.yml#L65

Please open an issue against 3.5 if we need to backport it. I believe this would be a regression from 3.4.