Created attachment 1294117 (es_log)

Description of problem:
After deploying logging 3.5.0 with the workaround for bug #1466626 (https://github.com/openshift/openshift-ansible/pull/4657/files) applied, ES fails with the following error:

# oc get po
NAME                          READY   STATUS             RESTARTS   AGE
logging-curator-1-6hm8z       0/1     CrashLoopBackOff   4          10m
logging-es-yn8sonty-1-285sb   1/1     Running            0          10m
logging-fluentd-l2gn8         1/1     Running            0          10m
logging-fluentd-l67v0         1/1     Running            0          10m
logging-kibana-1-jk01x        2/2     Running            0          10m

# oc logs logging-es-yn8sonty-1-285sb
Comparing the specificed RAM to the maximum recommended for ElasticSearch...
Inspecting the maximum RAM available...
ES_JAVA_OPTS: '-Dmapper.allow_dots_in_name=true -Xms4096m -Xmx4096m'
Checking if Elasticsearch is ready on https://localhost:9200 ........Will connect to localhost:9300 ... done
Contacting elasticsearch cluster 'elasticsearch' and wait for YELLOW clusterstate ...
Cannot retrieve cluster state due to: class_cast_exception: com.floragunn.searchguard.user.User cannot be cast to com.floragunn.searchguard.user.User. This is not an error, will keep on trying ...
   * Try running sgadmin.sh with -icl and -nhnv (If thats works you need to check your clustername as well as hostnames in your SSL certificates)
   * If this is not working, try running sgadmin.sh with --diagnose and see diagnose trace log file)
Cannot retrieve cluster state due to: class_cast_exception: com.floragunn.searchguard.user.User cannot be cast to com.floragunn.searchguard.user.User. This is not an error, will keep on trying ...
Inside the ES pod, the following commands did not help:
./usr/share/java/elasticsearch/plugins/search-guard-2/tools/sgadmin.sh -icl
./usr/share/java/elasticsearch/plugins/search-guard-2/tools/sgadmin.sh -nhnv
./usr/share/java/elasticsearch/plugins/search-guard-2/tools/sgadmin.sh --diagnose

Version-Release number of selected component (if applicable):
$stage_registry/openshift3/ose-logging-elasticsearch   v3.5   a7989e457354   4 days ago   399.6 MB
$stage_registry/openshift3/logging-elasticsearch       v3.5   a7989e457354   4 days ago   399.6 MB
ansible version: openshift-ansible-playbooks-3.5.91-1.git.0.28b3ddb.el7.noarch
# openshift version
openshift v3.5.5.31
kubernetes v1.5.2+43a9be4
etcd 3.1.0

How reproducible:
Always

Steps to Reproduce:
1. Apply the workaround for bug #1466626 on the Ansible control node: https://github.com/openshift/openshift-ansible/pull/4657/files
2. Deploy logging 3.5.0.
3. Check the ES pod's log.

Actual results:
ES does not start up.

Expected results:
ES starts up.

Additional info:
Full ES log attached.
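A minimal shell sketch for step 3: it checks a saved ES log, or `oc logs` output piped on stdin, for the exception text quoted in the log above. The function name `check_sg_classcast` is hypothetical, and treating this one string as the failure signature is an assumption; the check only detects the symptom and does not diagnose the cause.

```shell
#!/bin/sh
# Hypothetical helper: return 0 if the ES log given as $1 (or read from
# stdin when no argument is passed) contains the Search Guard
# class_cast_exception quoted in this report; otherwise return non-zero.
check_sg_classcast() {
    pattern='com.floragunn.searchguard.user.User cannot be cast to com.floragunn.searchguard.user.User'
    if [ "$#" -gt 0 ]; then
        grep -qF -- "$pattern" "$1"
    else
        grep -qF -- "$pattern"
    fi
    rc=$?
    if [ "$rc" -eq 0 ]; then
        echo "class_cast_exception found"
    else
        # grep returns 1 for no match and 2 for an unreadable file
        echo "class_cast_exception not found or log unreadable"
    fi
    return "$rc"
}

# Usage (pod name from this report):
#   oc logs logging-es-yn8sonty-1-285sb | check_sg_classcast
```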
Created attachment 1294119 (inventory file for logging deployment)
Created attachment 1294124 (output of the sgadmin commands)
All logging tests on 3.5 are blocked because ES does not work.
This may be the same issue as https://bugzilla.redhat.com/show_bug.cgi?id=1466962
The issue does not reproduce with the latest released images in the public registry, so this is a regression. Images tested with:
${public_registry}/openshift3/logging-kibana          v3.5   58c8d604b327   2 weeks ago   342.6 MB
${public_registry}/openshift3/logging-fluentd         v3.5   7ade1647aad2   2 weeks ago   232.8 MB
${public_registry}/openshift3/logging-elasticsearch   v3.5   3e25b6e17191   2 weeks ago   399.5 MB
${public_registry}/openshift3/logging-curator         v3.5   5222e4da1183   2 weeks ago   211.3 MB
${public_registry}/openshift3/logging-auth-proxy      v3.5   2c7989093587   2 weeks ago   215.3 MB
ES starts up successfully with the above released images.
*** Bug 1466962 has been marked as a duplicate of this bug. ***
The current images available at registry.ops... are newer than the ones mentioned by Xia Zhao. Should we try reverting to those?
After talking with Jeff on IRC, I don't think this bug is ready for testing in OCP. It seems the affected images will need to be rebuilt.
@Jeff / @Brenton - Can you share an update on this BZ? The Errata is waiting on it. The Errata release was originally scheduled for 29 June, was then pushed to 5 July, and now looks postponed again. Can you please take a look and let us know when this BZ will be ready for QA again?

Thanks,
Praveen
Escalation manager
Verified with the latest images on the brew registry: ES starts up successfully and the logging system works fine. Images tested with:
${brew_registry}/openshift3/logging-curator         v3.5   b672db1aa426   3 hours ago   211.3 MB
${brew_registry}/openshift3/logging-elasticsearch   v3.5   fbd7deba485e   9 hours ago   398.6 MB
${brew_registry}/openshift3/logging-kibana          v3.5   277c4a616a5a   7 days ago    342.6 MB
${brew_registry}/openshift3/logging-fluentd         v3.5   c09565262cad   7 days ago    232.8 MB
${brew_registry}/openshift3/logging-auth-proxy      v3.5   d79212db0381   7 days ago    215.3 MB
# openshift version
openshift v3.5.5.31
kubernetes v1.5.2+43a9be4
etcd 3.1.0
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2017:1646
Hi, I have exactly the same issue. These are the images and OCP version I used:
registry.access.redhat.com/openshift3/logging-elasticsearch   v3.5   7724dd80f73d   13 days ago   398.6 MB
registry.access.redhat.com/openshift3/logging-kibana          v3.5   277c4a616a5a   3 weeks ago   342.6 MB
registry.access.redhat.com/openshift3/logging-fluentd         v3.5   c09565262cad   3 weeks ago   232.8 MB
registry.access.redhat.com/openshift3/logging-curator         v3.5   0aa259fbc36e   3 weeks ago   211.3 MB
registry.access.redhat.com/openshift3/logging-auth-proxy      v3.5   d79212db0381   3 weeks ago   215.3 MB
oc v3.5.5.26
kubernetes v1.5.2+43a9be4
features: Basic-Auth GSSAPI Kerberos SPNEGO

The image IDs of elasticsearch (7724dd80f73d) and curator (0aa259fbc36e) differ from the ones in your verification. Do those images contain the fix? Should I upgrade OCP to fix it?

Thanks,
Jooho Lee.
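A minimal shell sketch for the image ID comparison above: it reads "<repository> <id>" lines and reports, per logging component, whether the local ID matches the one QA verified on the brew registry in the verification comment. The function name `compare_logging_ids` is hypothetical, and the expected IDs are simply copied from that comment. Assumption: a published image may be a later rebuild with a different ID, so DIFFERS only means the image is not the exact one QA verified, not that it lacks the fix.

```shell
#!/bin/sh
# Hypothetical helper: compare local logging image IDs against the IDs
# from the brew-registry verification comment. Expects lines of the form
# "<repository> <id>", e.g. from:
#   docker images --format '{{.Repository}} {{.ID}}'
compare_logging_ids() {
    while read -r repo id; do
        name=${repo##*/}   # strip the registry/namespace prefix
        case "$name" in
            logging-curator)       want=b672db1aa426 ;;
            logging-elasticsearch) want=fbd7deba485e ;;
            logging-kibana)        want=277c4a616a5a ;;
            logging-fluentd)       want=c09565262cad ;;
            logging-auth-proxy)    want=d79212db0381 ;;
            *) continue ;;         # ignore unrelated images
        esac
        if [ "$id" = "$want" ]; then
            echo "MATCH    $name $id"
        else
            echo "DIFFERS  $name $id (verified: $want)"
        fi
    done
}
```

For example, piping `docker images --format '{{.Repository}} {{.ID}}'` into `compare_logging_ids` on the node above would flag logging-elasticsearch and logging-curator as DIFFERS and the other three as MATCH.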
Jooho,

The advisory that contains various 3.5 fixes [1] was targeted for release on 2017-Jul-17, but I can't say whether it has been pushed out yet. Among the items it resolves:
* bug 1420217. Update ES plugin that squashes stack on start
* bug 1457642. Fix SG timeout
* use SG index defined in ES config

[1] https://errata.devel.redhat.com/advisory/28089
Thanks, Jeff. Does Xia Zhao have any comment on this?
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days