Description of problem: ES cluster is in a green state: 1544107971 14:52:51 logging-es green 3 3 2667 1025 0 0 0 0 - 100.0% But ES logs indicate Searchguard is not initialized: [2018-12-06T14:52:07,836][ERROR][i.f.e.p.a.ACLDocumentManager] Exception while syncing ACL to Elasticsearch java.util.concurrent.TimeoutException: Timeout after 30SECONDS while retrieving configuration for [roles, rolesmapping](index=.searchguard) [2018-12-06T00:08:34,645][ERROR][c.f.s.a.BackendRegistry ] Not yet initialized (you may need to run sgadmin) But running sgadmin script fails. See Actual Results section. Version-Release number of selected component (if applicable): ES_VER=5.6.12 OCP 3.10 Actual results: sh-4.2$ /usr/share/elasticsearch/plugins/openshift-elasticsearch/sgadmin.sh \ > -cd ${HOME}/sgconfig \ > -i .searchguard.${HOSTNAME} \ > -ks /etc/elasticsearch/secret/searchguard.key \ > -kst JKS \ > -kspass kspass \ > -ts /etc/elasticsearch/secret/searchguard.truststore \ > -tst JKS \ > -tspass tspass \ > -nhnv \ > -icl /etc/elasticsearch ~ Search Guard Admin v5 Will connect to localhost:9300 ... done ERROR StatusLogger No Log4j 2 configuration file found. Using default configuration (logging only errors to the console), or user programmatically provided configurations. Set system property 'log4j2.debug' to show Log4j 2 internal initialization logging. See https://logging.apache.org/log4j/2.x/manual/configuration.html for instructions on how to configure Log4j 2 Elasticsearch Version: 5.6.12 Search Guard Version: <unknown> Contacting elasticsearch cluster 'elasticsearch' and wait for YELLOW clusterstate ... Clustername: logging-es Clusterstate: GREEN Number of nodes: 3 Number of data nodes: 3 ERR: An unexpected ElasticsearchSecurityException occured: Search Guard not initialized (SG11) for indices:admin/exists. See https://github.com/floragunncom/search-guard-docs/blob/master/sgadmin.md Trace: ElasticsearchSecurityException[Search Guard not initialized (SG11) for indices:admin/exists. See https://github.com/floragunncom/search-guard-docs/blob/master/sgadmin.md] at com.floragunn.searchguard.filter.SearchGuardFilter.apply(SearchGuardFilter.java:122) at org.elasticsearch.action.support.TransportAction$RequestFilterChain.proceed(TransportAction.java:168) at org.elasticsearch.action.support.TransportAction.execute(TransportAction.java:142) at org.elasticsearch.action.support.HandledTransportAction$TransportHandler.messageReceived(HandledTransportAction.java:64) at org.elasticsearch.action.support.HandledTransportAction$TransportHandler.messageReceived(HandledTransportAction.java:54) at com.floragunn.searchguard.ssl.transport.SearchGuardSSLRequestHandler.messageReceivedDecorate(SearchGuardSSLRequestHandler.java:178) at com.floragunn.searchguard.transport.SearchGuardRequestHandler.messageReceivedDecorate(SearchGuardRequestHandler.java:198) at com.floragunn.searchguard.ssl.transport.SearchGuardSSLRequestHandler.messageReceived(SearchGuardSSLRequestHandler.java:140) at com.floragunn.searchguard.SearchGuardPlugin$5$1.messageReceived(SearchGuardPlugin.java:493) at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:69) at org.elasticsearch.transport.TcpTransport$RequestHandler.doRun(TcpTransport.java:1556) at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) at org.elasticsearch.common.util.concurrent.EsExecutors$1.execute(EsExecutors.java:110) at org.elasticsearch.transport.TcpTransport.handleRequest(TcpTransport.java:1513) at org.elasticsearch.transport.TcpTransport.messageReceived(TcpTransport.java:1396) at org.elasticsearch.transport.netty4.Netty4MessageChannelHandler.channelRead(Netty4MessageChannelHandler.java:75) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) at -rwxrwxrwx. 1 root root 956 Nov 11 17:54 /usr/share/elasticsearch/plugins/openshift-elasticsearch/sgadmin.sh io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340) at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:310) at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:297) at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:413) at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:265) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340) at io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:86) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340) at io.netty.handler.logging.LoggingHandler.channelRead(LoggingHandler.java:241) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340) at io.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1273) at io.netty.handler.ssl.SslHandler.decode(SslHandler.java:1084) at io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:489) at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:428) at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:265) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340) at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1334) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:926) at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:134) at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:644) at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:544) at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:498) at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:458) at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858) at java.lang.Thread.run(Thread.java:748) Additional info: Looking around online that error seems to be caused by a mismatch of certificates or something. Is our setup no longer valid to use that command when in ES 5?
> Version-Release number of selected component (if applicable): > ES_VER=5.6.12 > OCP 3.10 do you mean OCP 3.11? ES 5 is not supported on OCP 3.10 and earlier
(In reply to Rich Megginson from comment #2) > > Version-Release number of selected component (if applicable): > > ES_VER=5.6.12 > > OCP 3.10 > > do you mean OCP 3.11? ES 5 is not supported on OCP 3.10 and earlier The issue is coming up during an upgrade from 3.10 -> 3.11. Steven should be able to provide some more detail about the state of the cluster.
There is something not right here. The move to support ES5 included a change such that there was a single searchguard index which the following snippet displays its trying to create one unique to the host. Additionally, this is not even current as we moved away from the hostname to DC a while ago (ref: [1]). There has to be significantly old images as even 3.9 uses the DC name. Is the attached snapshot associated with the cluster in question? I was unable to find any reference to this issue. > > > Actual results: > sh-4.2$ /usr/share/elasticsearch/plugins/openshift-elasticsearch/sgadmin.sh \ > > -cd ${HOME}/sgconfig \ > > -i .searchguard.${HOSTNAME} \ > > -ks /etc/elasticsearch/secret/searchguard.key \ > > -kst JKS \ > > -kspass kspass \ > > -ts /etc/elasticsearch/secret/searchguard.truststore \ > > -tst JKS \ > > -tspass tspass \ > > -nhnv \ > > -icl > /etc/elasticsearch ~ Generally, manual reseeding of a cluster can be accomplished with 'oc exec -c elasticsearch $pod -- es_seed_acl' [1] https://github.com/openshift/openshift-ansible/blob/release-3.10/roles/openshift_logging_elasticsearch/templates/2.x/elasticsearch.yml.j2#L68
Please provide log output as an attachment instead of inline comments. I don't see anything in the listing that would indicate an attempt to seed the .searchguard index. Seeding occurs at initialization or reseeding by: 'oc exec -c elasticsearch $pod -- es_seed_acl' Success should display something like: 2018-12-12 04:34:39,972][INFO ][container.run ] Forcing the seeding of ACL documents [2018-12-12 04:34:39,978][INFO ][container.run ] Seeding the searchguard ACL index. Will wait up to 604800 seconds. [2018-12-12 04:34:40,134][INFO ][container.run ] Seeding the searchguard ACL index. Will wait up to 604800 seconds. /etc/elasticsearch /usr/share/elasticsearch/init Search Guard Admin v5 Will connect to localhost:9300 ... done [2018-12-12T04:34:40,710][INFO ][o.e.c.r.a.AllocationService] [logging-es-data-master-qgpa12yp] Cluster health status changed from [RED] to [GREEN] (reason: [shards started [[.operations.2018.12.05][0], [.operations.2018.12.11][0], [.searchguard][0]] ...]). [2018-12-12T04:34:40,918][INFO ][c.f.s.c.IndexBaseConfigurationRepository] Node 'logging-es-data-master-qgpa12yp' initialized ERROR StatusLogger No Log4j 2 configuration file found. Using default configuration (logging only errors to the console), or user programmatically provided configurations. Set system property 'log4j2.debug' to show Log4j 2 internal initialization logging. See https://logging.apache.org/log4j/2.x/manual/configuration.html for instructions on how to configure Log4j 2 Elasticsearch Version: 5.6.13 Search Guard Version: <unknown> Contacting elasticsearch cluster 'elasticsearch' ... Clustername: logging-es Clusterstate: GREEN Number of nodes: 1 Number of data nodes: 1 .searchguard index already exists, so we do not need to create one. Populate config from /opt/app-root/src/sgconfig/ Will update 'config' with /opt/app-root/src/sgconfig/sg_config.yml SUCC: Configuration for 'config' created or updated Will update 'roles' with /opt/app-root/src/sgconfig/sg_roles.yml SUCC: Configuration for 'roles' created or updated Will update 'rolesmapping' with /opt/app-root/src/sgconfig/sg_roles_mapping.yml SUCC: Configuration for 'rolesmapping' created or updated Will update 'internalusers' with /opt/app-root/src/sgconfig/sg_internal_users.yml SUCC: Configuration for 'internalusers' created or updated Will update 'actiongroups' with /opt/app-root/src/sgconfig/sg_action_groups.yml SUCC: Configuration for 'actiongroups' created or updated Done with success /usr/share/elasticsearch/init [2018-12-12 04:34:44,917][INFO ][container.run ] Seeded the searchguard ACL index [2018-12-12 04:34:44,926][INFO ][container.run ] Disabling auto replication There is no need to restart the pods after this step.
Created attachment 1514099 [details] es_seed_acl output
Cu executed es_seed_acl with results in attachment es_seed_acl output. They still have no logs. They had already downgraded to 3.11.16 and that did not help.
I'm not sure where that specific template came from as the one we ship [1] that matches those same indices has a differen't ordering. We take no specific actions to remove existing templates. I'm not certain I understand how this template would cause the described issue @Lukas, Should we be adding logic to remove templates on upgrades? [1] https://github.com/openshift/origin-aggregated-logging/blob/release-3.11/elasticsearch/index_templates/com.redhat.viaq-openshift-project.template.json#L954