1467572 – Cannot retrieve cluster state due to: class_cast_exception: com.floragunn.searchguard.user.User cannot be cast to com.floragunn.searchguard.user.User.

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Logging
Sub Component:
Version:	3.5.0
Hardware:	Unspecified
OS:	Unspecified
Priority:	high
Severity:	high
Target Milestone:	---
Target Release:	3.5.z
Assignee:	Jeff Cantrill
QA Contact:	Xia Zhao
Docs Contact:
URL:
Whiteboard:
Duplicates (1):	1466962
Depends On:
Blocks:

Reported:	2017-07-04 08:53 UTC by Xia Zhao
Modified:	2023-09-14 04:00 UTC
CC List:	12 users
Fixed In Version:
Doc Type:	No Doc Update
Doc Text:	Cause: The build file and install scripts were out Consequence: Fix: Result:
Clone Of:
Environment:
Last Closed:	2017-07-11 05:11:13 UTC
Target Upstream Version:
Embargoed:

Attachments
es_log (19.05 KB, text/plain) 2017-07-04 08:53 UTC, Xia Zhao	no flags
inventory file for logging deployment (701 bytes, text/plain) 2017-07-04 08:55 UTC, Xia Zhao	no flags
output of the sgadmin commands (11.58 KB, text/plain) 2017-07-04 09:12 UTC, Xia Zhao	no flags

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHBA-2017:1646	0	normal	SHIPPED_LIVE	OpenShift Container Platform 3.5 and 3.4 images update	2017-07-11 13:45:07 UTC

Description Xia Zhao 2017-07-04 08:53:04 UTC

Created attachment 1294117 [details]
es_log

Description of problem:
After deploying logging 3.5.0 with the workaround for bug #1466626 (https://github.com/openshift/openshift-ansible/pull/4657/files), Elasticsearch (es) reports the following error:
# oc get po
NAME                          READY     STATUS             RESTARTS   AGE
logging-curator-1-6hm8z       0/1       CrashLoopBackOff   4          10m
logging-es-yn8sonty-1-285sb   1/1       Running            0          10m
logging-fluentd-l2gn8         1/1       Running            0          10m
logging-fluentd-l67v0         1/1       Running            0          10m
logging-kibana-1-jk01x        2/2       Running            0          10m
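
A listing like the one above can be filtered for pods that are not in the Running state. This is a minimal sketch and not part of the original report: the function name not_running_pods is hypothetical, and it assumes the default column layout of "oc get po" shown above (NAME, READY, STATUS, ...). Note that the es pod reports Running here even though Elasticsearch is failing, so pod status alone does not expose this problem.

```shell
#!/bin/sh
# Hypothetical helper (not an OpenShift command): reads "oc get po" output
# on stdin, skips the header row and short/blank lines, and prints the name
# and status of every pod whose STATUS column is not "Running".
not_running_pods() {
    awk 'NR > 1 && NF >= 3 && $3 != "Running" { print $1, $3 }'
}
```

Usage: oc get po | not_running_pods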

# oc logs logging-es-yn8sonty-1-285sb
Comparing the specificed RAM to the maximum recommended for ElasticSearch...
Inspecting the maximum RAM available...
ES_JAVA_OPTS: '-Dmapper.allow_dots_in_name=true -Xms4096m -Xmx4096m'
Checking if Elasticsearch is ready on https://localhost:9200 ........Will connect to localhost:9300 ... done
Contacting elasticsearch cluster 'elasticsearch' and wait for YELLOW clusterstate ...
Cannot retrieve cluster state due to: class_cast_exception: com.floragunn.searchguard.user.User cannot be cast to com.floragunn.searchguard.user.User. This is not an error, will keep on trying ...
   * Try running sgadmin.sh with -icl and -nhnv (If thats works you need to check your clustername as well as hostnames in your SSL certificates)
   * If this is not working, try running sgadmin.sh with --diagnose and see diagnose trace log file)
Cannot retrieve cluster state due to: class_cast_exception: com.floragunn.searchguard.user.User cannot be cast to com.floragunn.searchguard.user.User. This is not an error, will keep on trying ...
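
To check whether a pod log shows this symptom, the log can be searched for the exception text. This is a minimal sketch under stated assumptions: the function name count_sg_class_cast is hypothetical, and the pattern is taken from the log lines above.

```shell
#!/bin/sh
# Hypothetical helper (not part of OpenShift or Search Guard): reads an
# Elasticsearch pod log on stdin and prints the number of lines containing
# the class_cast_exception on the Search Guard User class.
# grep -c exits non-zero when the count is 0, so "|| true" keeps the
# function's exit status at 0 in that case.
count_sg_class_cast() {
    grep -c 'class_cast_exception: com\.floragunn\.searchguard\.user\.User cannot be cast' || true
}
```

Usage: oc logs logging-es-yn8sonty-1-285sb | count_sg_class_cast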

After opening a remote shell in the es pod, the following commands did not help:
./usr/share/java/elasticsearch/plugins/search-guard-2/tools/sgadmin.sh -icl
./usr/share/java/elasticsearch/plugins/search-guard-2/tools/sgadmin.sh -nhnv 
./usr/share/java/elasticsearch/plugins/search-guard-2/tools/sgadmin.sh --diagnose

Version-Release number of selected component (if applicable):
$stage_registry/openshift3/ose-logging-elasticsearch   v3.5                a7989e457354        4 days ago          399.6 MB
$stage_registry/openshift3/logging-elasticsearch       v3.5                a7989e457354        4 days ago          399.6 MB

ansible version: openshift-ansible-playbooks-3.5.91-1.git.0.28b3ddb.el7.noarch

# openshift version
openshift v3.5.5.31
kubernetes v1.5.2+43a9be4
etcd 3.1.0


How reproducible:
Always

Steps to Reproduce:
1. Work around bug #1466626 on the ansible control node: https://github.com/openshift/openshift-ansible/pull/4657/files
2. Deploy logging 3.5.0.
3. Check the es pod's log.

Actual results:
es does not start up

Expected results:
es should start up

Additional info:
full es log attached

Comment 1 Xia Zhao 2017-07-04 08:55:53 UTC

Created attachment 1294119 [details]
inventory file for logging deployment

Comment 2 Xia Zhao 2017-07-04 09:12:31 UTC

Created attachment 1294124 [details]
output of the sgadmin commands

Comment 3 Xia Zhao 2017-07-04 10:10:32 UTC

Logging tests on 3.5 are all blocked because es is not working.

Comment 4 Jan Wozniak 2017-07-04 14:42:50 UTC

This may be the same issue as https://bugzilla.redhat.com/show_bug.cgi?id=1466962

Comment 5 Xia Zhao 2017-07-05 03:19:57 UTC

The issue did not reproduce with the latest released images in the public registry, so this is a regression.

Images tested with:
${public_registry}/openshift3/logging-kibana                                  v3.5                58c8d604b327        2 weeks ago         342.6 MB
${public_registry}/openshift3/logging-fluentd                                 v3.5                7ade1647aad2        2 weeks ago         232.8 MB
${public_registry}/openshift3/logging-elasticsearch                           v3.5                3e25b6e17191        2 weeks ago         399.5 MB
${public_registry}/openshift3/logging-curator                                 v3.5                5222e4da1183        2 weeks ago         211.3 MB
${public_registry}/openshift3/logging-auth-proxy                              v3.5                2c7989093587        2 weeks ago         215.3 MB

es starts up successfully with the released images above.

Comment 6 Jeff Cantrill 2017-07-05 11:51:09 UTC

*** Bug 1466962 has been marked as a duplicate of this bug. ***

Comment 8 ihorvath 2017-07-05 13:26:05 UTC

The current images available at registry.ops... are newer than the ones mentioned by Xia Zhao. Should we try to revert back to those?

Comment 9 Brenton Leanhardt 2017-07-05 21:06:50 UTC

After talking with Jeff on IRC, I don't think this bug is available for testing in OCP. It seems the affected images will need to be rebuilt.

Comment 10 Praveen Varma 2017-07-06 05:34:17 UTC

@Jeff and/or @Brenton - Can you share an update on this BZ? The errata is waiting on it. The errata release was originally dated 29th June, was pushed to 5th July, and now looks to be postponed again.

Can you please take a look and let us know when this BZ will be ready for QA again?

Thanks,
Praveen
Escalation manager

Comment 13 Xia Zhao 2017-07-07 03:09:52 UTC

Verified with the latest images on the brew registry: es starts up successfully and the logging system works fine.

Images tested with:
${brew_registry}/openshift3/logging-curator         v3.5                b672db1aa426        3 hours ago         211.3 MB
${brew_registry}/openshift3/logging-elasticsearch   v3.5                fbd7deba485e        9 hours ago         398.6 MB
${brew_registry}/openshift3/logging-kibana          v3.5                277c4a616a5a        7 days ago          342.6 MB
${brew_registry}/openshift3/logging-fluentd         v3.5                c09565262cad        7 days ago          232.8 MB
${brew_registry}/openshift3/logging-auth-proxy      v3.5                d79212db0381        7 days ago          215.3 MB

# openshift version
openshift v3.5.5.31
kubernetes v1.5.2+43a9be4
etcd 3.1.0
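
To read the logging-elasticsearch image ID out of a listing like the one above, for comparison with the ID verified in this comment (fbd7deba485e), something along these lines could be used. This is a minimal sketch: the function name es_image_id is hypothetical, and it assumes the REPOSITORY, TAG, IMAGE ID column order shown in the listing. A differing ID only shows that the image is a different build; by itself it does not show whether the fix is included.

```shell
#!/bin/sh
# Hypothetical helper: reads an image listing on stdin (columns:
# REPOSITORY TAG IMAGE_ID ...) and prints the image ID of the first entry
# whose repository path ends in "/logging-elasticsearch".
# Repositories such as ".../ose-logging-elasticsearch" do not match.
es_image_id() {
    awk '$1 ~ /\/logging-elasticsearch$/ { print $3; exit }'
}
```

Usage, assuming the listing comes from docker images: docker images | es_image_id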

Comment 14 errata-xmlrpc 2017-07-11 05:11:13 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:1646

Comment 15 jooho lee 2017-07-21 19:45:20 UTC

Hi, 

I have exactly the same issue.

These are the images and OCP version that I used:
registry.access.redhat.com/openshift3/logging-elasticsearch       v3.5                7724dd80f73d        13 days ago         398.6 MB
registry.access.redhat.com/openshift3/logging-kibana              v3.5                277c4a616a5a        3 weeks ago         342.6 MB
registry.access.redhat.com/openshift3/logging-fluentd             v3.5                c09565262cad        3 weeks ago         232.8 MB
registry.access.redhat.com/openshift3/logging-curator             v3.5                0aa259fbc36e        3 weeks ago         211.3 MB
registry.access.redhat.com/openshift3/logging-auth-proxy          v3.5                d79212db0381        3 weeks ago         215.3 MB


oc v3.5.5.26
kubernetes v1.5.2+43a9be4
features: Basic-Auth GSSAPI Kerberos SPNEGO

The image IDs of elasticsearch ("7724dd80f73d") and curator ("0aa259fbc36e") are different from yours. Do those images contain the fix?

Should I upgrade OCP to fix it?

Thanks,
Jooho Lee.

Comment 16 Jeff Cantrill 2017-07-23 18:05:29 UTC

Jooho,

The advisory which contains various 3.5 fixes [1] says it was targeted for release 2017-Jul-17, but I am unable to say whether it has been pushed out.  Among the items to be resolved:

* bug 1420217. Update ES plugin that squashes stack on start
* bug 1457642. Fix SG timeout
* use SG index defined in ES config

[1] https://errata.devel.redhat.com/advisory/28089

Comment 17 jooho lee 2017-07-24 12:56:38 UTC

Thanks Jeff,

Does "xia zhao" have any comment on this?

Comment 18 Red Hat Bugzilla 2023-09-14 04:00:34 UTC

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days
