Bug 1467572 - Cannot retrieve cluster state due to: class_cast_exception: com.floragunn.searchguard.user.User cannot be cast to com.floragunn.searchguard.user.User. [NEEDINFO]
Status: CLOSED ERRATA
Product: OpenShift Container Platform
Classification: Red Hat
Component: Logging
Version: 3.5.0
Hardware: Unspecified OS: Unspecified
Priority: high Severity: high
Target Milestone: ---
Target Release: 3.5.z
Assigned To: Jeff Cantrill
QA Contact: Xia Zhao
Keywords: OpsBlocker, Regression, TestBlocker
Duplicates: 1466962
Depends On:
Blocks:
Reported: 2017-07-04 04:53 EDT by Xia Zhao
Modified: 2017-07-24 08:56 EDT
CC List: 12 users

See Also:
Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Cause: The build file and install scripts were out Consequence: Fix: Result:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-07-11 01:11:13 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
jlee: needinfo? (xiazhao)


Attachments
es_log (19.05 KB, text/plain), 2017-07-04 04:53 EDT, Xia Zhao
inventory file for logging deployment (701 bytes, text/plain), 2017-07-04 04:55 EDT, Xia Zhao
output of the sgadmin commands (11.58 KB, text/plain), 2017-07-04 05:12 EDT, Xia Zhao

Description Xia Zhao 2017-07-04 04:53:04 EDT
Created attachment 1294117: es_log

Description of problem:
After deploying logging 3.5.0 with the workaround for bug #1466626 from https://github.com/openshift/openshift-ansible/pull/4657/files, es encounters the following error:
# oc get po
NAME                          READY     STATUS             RESTARTS   AGE
logging-curator-1-6hm8z       0/1       CrashLoopBackOff   4          10m
logging-es-yn8sonty-1-285sb   1/1       Running            0          10m
logging-fluentd-l2gn8         1/1       Running            0          10m
logging-fluentd-l67v0         1/1       Running            0          10m
logging-kibana-1-jk01x        2/2       Running            0          10m
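
For triage, a listing like the one above can be screened for pods that are not fully ready or not in Running status. The following is a minimal sketch only; the function name unhealthy_pods and the SAMPLE constant are hypothetical, and it assumes the default column layout of `oc get po` shown here.

```python
# Sketch: flag pods whose READY count is short or whose STATUS is not Running.
# Assumes the default `oc get po` columns: NAME READY STATUS RESTARTS AGE.

SAMPLE = """\
NAME                          READY     STATUS             RESTARTS   AGE
logging-curator-1-6hm8z       0/1       CrashLoopBackOff   4          10m
logging-es-yn8sonty-1-285sb   1/1       Running            0          10m
"""


def unhealthy_pods(table_text):
    """Return (name, ready, status) for pods that are not fully ready or not Running."""
    flagged = []
    lines = [ln for ln in table_text.splitlines() if ln.strip()]
    for line in lines[1:]:  # skip the header row
        name, ready, status = line.split()[:3]
        ready_count, total = (int(n) for n in ready.split("/"))
        if ready_count < total or status != "Running":
            flagged.append((name, ready, status))
    return flagged


if __name__ == "__main__":
    for pod in unhealthy_pods(SAMPLE):
        print(pod)
```

Note that such a check only flags the curator pod here: the es pod reports 1/1 Running even though it never finishes initializing, so its log still has to be inspected.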

# oc logs logging-es-yn8sonty-1-285sb
Comparing the specificed RAM to the maximum recommended for ElasticSearch...
Inspecting the maximum RAM available...
ES_JAVA_OPTS: '-Dmapper.allow_dots_in_name=true -Xms4096m -Xmx4096m'
Checking if Elasticsearch is ready on https://localhost:9200 ........Will connect to localhost:9300 ... done
Contacting elasticsearch cluster 'elasticsearch' and wait for YELLOW clusterstate ...
Cannot retrieve cluster state due to: class_cast_exception: com.floragunn.searchguard.user.User cannot be cast to com.floragunn.searchguard.user.User. This is not an error, will keep on trying ...
   * Try running sgadmin.sh with -icl and -nhnv (If thats works you need to check your clustername as well as hostnames in your SSL certificates)
   * If this is not working, try running sgadmin.sh with --diagnose and see diagnose trace log file)
Cannot retrieve cluster state due to: class_cast_exception: com.floragunn.searchguard.user.User cannot be cast to com.floragunn.searchguard.user.User. This is not an error, will keep on trying ...
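
Note on the message: in Java, a class is identified by its name together with the classloader that defined it, so "X cannot be cast to X" generally means the same class was loaded twice by different classloaders (for example, from duplicate jars). This report does not confirm that as the root cause here. The sketch below only shows how such same-name cast failures could be spotted in a log; the names CAST_RE and find_self_casts are hypothetical.

```python
import re

# Matches Java-style messages of the form "<class A> cannot be cast to <class B>".
CAST_RE = re.compile(r"([\w.$]+) cannot be cast to ([\w.$]+)")

LOG_LINE = (
    "Cannot retrieve cluster state due to: class_cast_exception: "
    "com.floragunn.searchguard.user.User cannot be cast to "
    "com.floragunn.searchguard.user.User. This is not an error, "
    "will keep on trying ..."
)


def find_self_casts(log_text):
    """Return class names that appear on both sides of a 'cannot be cast to' message."""
    hits = []
    for match in CAST_RE.finditer(log_text):
        # Strip the sentence-ending period that can trail the class name.
        src = match.group(1).rstrip(".")
        dst = match.group(2).rstrip(".")
        if src == dst and src not in hits:
            hits.append(src)
    return hits


if __name__ == "__main__":
    print(find_self_casts(LOG_LINE))
```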

After opening a remote shell in the es pod, the following commands did not help:
./usr/share/java/elasticsearch/plugins/search-guard-2/tools/sgadmin.sh -icl
./usr/share/java/elasticsearch/plugins/search-guard-2/tools/sgadmin.sh -nhnv 
./usr/share/java/elasticsearch/plugins/search-guard-2/tools/sgadmin.sh --diagnose

Version-Release number of selected component (if applicable):
$stage_registry/openshift3/ose-logging-elasticsearch   v3.5                a7989e457354        4 days ago          399.6 MB
$stage_registry/openshift3/logging-elasticsearch       v3.5                a7989e457354        4 days ago          399.6 MB

ansible version: openshift-ansible-playbooks-3.5.91-1.git.0.28b3ddb.el7.noarch

# openshift version
openshift v3.5.5.31
kubernetes v1.5.2+43a9be4
etcd 3.1.0


How reproducible:
Always

Steps to Reproduce:
1. Work around bug #1466626 on the ansible control node: https://github.com/openshift/openshift-ansible/pull/4657/files
2. Deploy logging 3.5.0
3. Check the es pod's log

Actual results:
es does not start up

Expected results:
es should start up

Additional info:
full es log attached
Comment 1 Xia Zhao 2017-07-04 04:55 EDT
Created attachment 1294119: inventory file for logging deployment
Comment 2 Xia Zhao 2017-07-04 05:12 EDT
Created attachment 1294124: output of the sgadmin commands
Comment 3 Xia Zhao 2017-07-04 06:10:32 EDT
All logging tests on 3.5 are blocked because es is not working.
Comment 4 Jan Wozniak 2017-07-04 10:42:50 EDT
This may be the same issue as https://bugzilla.redhat.com/show_bug.cgi?id=1466962
Comment 5 Xia Zhao 2017-07-04 23:19:57 EDT
The issue did not reproduce with the latest released images in the public registry, so this is a regression.

Images tested with:
${public_registry}/openshift3/logging-kibana                                  v3.5                58c8d604b327        2 weeks ago         342.6 MB
${public_registry}/openshift3/logging-fluentd                                 v3.5                7ade1647aad2        2 weeks ago         232.8 MB
${public_registry}/openshift3/logging-elasticsearch                           v3.5                3e25b6e17191        2 weeks ago         399.5 MB
${public_registry}/openshift3/logging-curator                                 v3.5                5222e4da1183        2 weeks ago         211.3 MB
${public_registry}/openshift3/logging-auth-proxy                              v3.5                2c7989093587        2 weeks ago         215.3 MB

es starts up successfully with the above released images.
Comment 6 Jeff Cantrill 2017-07-05 07:51:09 EDT
*** Bug 1466962 has been marked as a duplicate of this bug. ***
Comment 8 ihorvath 2017-07-05 09:26:05 EDT
The current images available at registry.ops... are newer than the ones mentioned by Xia Zhao. Should we try to revert back to those?
Comment 9 Brenton Leanhardt 2017-07-05 17:06:50 EDT
After talking with Jeff on IRC, I don't think this bug is available for testing in OCP. It seems the affected images will need to be rebuilt.
Comment 10 Praveen Varma 2017-07-06 01:34:17 EDT
@Jeff and/or @Brenton - Can you share an update on this BZ? The Errata is waiting on it. The Errata release was originally scheduled for 29th June, was pushed to 5th July, and it looks like that date is being postponed as well.

Can you please take a look and let us know when this BZ will be ready for QA again?

Thanks,
Praveen
Escalation manager
Comment 13 Xia Zhao 2017-07-06 23:09:52 EDT
Verified with the latest images on the brew registry: es starts up successfully and the logging system works fine.

Images tested with:
${brew_registry}/openshift3/logging-curator         v3.5                b672db1aa426        3 hours ago         211.3 MB
${brew_registry}/openshift3/logging-elasticsearch   v3.5                fbd7deba485e        9 hours ago         398.6 MB
${brew_registry}/openshift3/logging-kibana          v3.5                277c4a616a5a        7 days ago          342.6 MB
${brew_registry}/openshift3/logging-fluentd         v3.5                c09565262cad        7 days ago          232.8 MB
${brew_registry}/openshift3/logging-auth-proxy      v3.5                d79212db0381        7 days ago          215.3 MB

# openshift version
openshift v3.5.5.31
kubernetes v1.5.2+43a9be4
etcd 3.1.0
Comment 14 errata-xmlrpc 2017-07-11 01:11:13 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:1646
Comment 15 jooho lee 2017-07-21 15:45:20 EDT
Hi, 

I have exactly the same issue.

This is the image and ocp that I used:
registry.access.redhat.com/openshift3/logging-elasticsearch       v3.5                7724dd80f73d        13 days ago         398.6 MB
registry.access.redhat.com/openshift3/logging-kibana              v3.5                277c4a616a5a        3 weeks ago         342.6 MB
registry.access.redhat.com/openshift3/logging-fluentd             v3.5                c09565262cad        3 weeks ago         232.8 MB
registry.access.redhat.com/openshift3/logging-curator             v3.5                0aa259fbc36e        3 weeks ago         211.3 MB
registry.access.redhat.com/openshift3/logging-auth-proxy          v3.5                d79212db0381        3 weeks ago         215.3 MB


oc v3.5.5.26
kubernetes v1.5.2+43a9be4
features: Basic-Auth GSSAPI Kerberos SPNEGO

The image IDs of elasticsearch ("7724dd80f73d") and curator ("0aa259fbc36e") are different from the ones in your images. Do those images contain the fixed version?

Should I upgrade ocp to fix it?

Thanks,
Jooho Lee.
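
For reference, the comparison asked about above can be done mechanically. This is a minimal sketch, assuming image listings in the repository / tag / image ID column order quoted in this bug; the function names image_ids and differing_images are hypothetical. A differing ID only shows that the image is a different build; it does not by itself tell whether the fix is included.

```python
# Sketch: compare image IDs between two image listings from this report.

VERIFIED = """\
${brew_registry}/openshift3/logging-curator         v3.5   b672db1aa426   3 hours ago   211.3 MB
${brew_registry}/openshift3/logging-elasticsearch   v3.5   fbd7deba485e   9 hours ago   398.6 MB
${brew_registry}/openshift3/logging-kibana          v3.5   277c4a616a5a   7 days ago    342.6 MB
"""

DEPLOYED = """\
registry.access.redhat.com/openshift3/logging-elasticsearch   v3.5   7724dd80f73d   13 days ago   398.6 MB
registry.access.redhat.com/openshift3/logging-kibana          v3.5   277c4a616a5a   3 weeks ago   342.6 MB
registry.access.redhat.com/openshift3/logging-curator         v3.5   0aa259fbc36e   3 weeks ago   211.3 MB
"""


def image_ids(listing):
    """Map the repository basename (e.g. 'logging-kibana') to its image ID."""
    ids = {}
    for line in listing.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        repo, _tag, image_id = fields[:3]
        ids[repo.rsplit("/", 1)[-1]] = image_id
    return ids


def differing_images(verified, deployed):
    """Return the sorted names of images present in both listings with different IDs."""
    a, b = image_ids(verified), image_ids(deployed)
    return sorted(name for name in a.keys() & b.keys() if a[name] != b[name])


if __name__ == "__main__":
    print(differing_images(VERIFIED, DEPLOYED))
```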
Comment 16 Jeff Cantrill 2017-07-23 14:05:29 EDT
Jooho,

The advisory which contains various 3.5 fixes [1] says it was targeted for release 2017-Jul-17, but I am unable to comment further if it has been pushed out.  Among the items to be resolved:

* bug 1420217. Update ES plugin that squashes stack on start
* bug 1457642. Fix SG timeout
* use SG index defined in ES config

[1] https://errata.devel.redhat.com/advisory/28089
Comment 17 jooho lee 2017-07-24 08:56:38 EDT
Thanks Jeff,

Does "xia zhao" have any comment on this?
