Bug 1657363 - ES in green cluster state, but searchguard won't initialize
Summary: ES in green cluster state, but searchguard won't initialize
Keywords:
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Logging
Version: 3.11.0
Hardware: Unspecified
OS: Unspecified
unspecified
high
Target Milestone: ---
: 3.11.z
Assignee: Jeff Cantrill
QA Contact: Anping Li
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2018-12-07 18:55 UTC by Steven Walter
Modified: 2019-01-22 14:59 UTC (History)
7 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-01-22 14:59:33 UTC
Target Upstream Version:


Attachments (Terms of Use)
es_seed_acl output (2.60 KB, text/plain)
2018-12-13 16:05 UTC, Jack Ottofaro
no flags Details

Description Steven Walter 2018-12-07 18:55:55 UTC
Description of problem:
ES cluster is in a green state:
1544107971 14:52:51  logging-es green           3         3   2667 1025    0    0        0             0                  -                100.0%

But ES logs indicate Searchguard is not initialized:
[2018-12-06T14:52:07,836][ERROR][i.f.e.p.a.ACLDocumentManager] Exception while syncing ACL to Elasticsearch
java.util.concurrent.TimeoutException: Timeout after 30SECONDS while retrieving configuration for [roles, rolesmapping](index=.searchguard)

[2018-12-06T00:08:34,645][ERROR][c.f.s.a.BackendRegistry  ] Not yet initialized (you may need to run sgadmin)

But running sgadmin script fails. See Actual Results section.

Version-Release number of selected component (if applicable):
ES_VER=5.6.12
OCP 3.10


Actual results:
sh-4.2$ /usr/share/elasticsearch/plugins/openshift-elasticsearch/sgadmin.sh \
 >         -cd ${HOME}/sgconfig \
>         -i .searchguard.${HOSTNAME} \
>         -ks /etc/elasticsearch/secret/searchguard.key \
 >         -kst JKS \
>         -kspass kspass \
>         -ts /etc/elasticsearch/secret/searchguard.truststore \
>         -tst JKS \
>         -tspass tspass \
>         -nhnv \
>         -icl
/etc/elasticsearch ~
Search Guard Admin v5
Will connect to localhost:9300 ... done
ERROR StatusLogger No Log4j 2 configuration file found. Using default configuration (logging only errors to the console), or user programmatically provided configurations. Set system property 'log4j2.debug' to show Log4j 2 internal initialization logging. See https://logging.apache.org/log4j/2.x/manual/configuration.html for instructions on how to configure Log4j 2
Elasticsearch Version: 5.6.12
Search Guard Version: <unknown>
Contacting elasticsearch cluster 'elasticsearch' and wait for YELLOW clusterstate ...
Clustername: logging-es
Clusterstate: GREEN
Number of nodes: 3
Number of data nodes: 3
ERR: An unexpected ElasticsearchSecurityException occured: Search Guard not initialized (SG11) for indices:admin/exists. See https://github.com/floragunncom/search-guard-docs/blob/master/sgadmin.md
Trace:
ElasticsearchSecurityException[Search Guard not initialized (SG11) for indices:admin/exists. See https://github.com/floragunncom/search-guard-docs/blob/master/sgadmin.md]
        at com.floragunn.searchguard.filter.SearchGuardFilter.apply(SearchGuardFilter.java:122)
        at org.elasticsearch.action.support.TransportAction$RequestFilterChain.proceed(TransportAction.java:168)
        at org.elasticsearch.action.support.TransportAction.execute(TransportAction.java:142)
        at org.elasticsearch.action.support.HandledTransportAction$TransportHandler.messageReceived(HandledTransportAction.java:64)
        at org.elasticsearch.action.support.HandledTransportAction$TransportHandler.messageReceived(HandledTransportAction.java:54)
        at com.floragunn.searchguard.ssl.transport.SearchGuardSSLRequestHandler.messageReceivedDecorate(SearchGuardSSLRequestHandler.java:178)
        at com.floragunn.searchguard.transport.SearchGuardRequestHandler.messageReceivedDecorate(SearchGuardRequestHandler.java:198)
        at com.floragunn.searchguard.ssl.transport.SearchGuardSSLRequestHandler.messageReceived(SearchGuardSSLRequestHandler.java:140)
        at com.floragunn.searchguard.SearchGuardPlugin$5$1.messageReceived(SearchGuardPlugin.java:493)
        at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:69)
        at org.elasticsearch.transport.TcpTransport$RequestHandler.doRun(TcpTransport.java:1556)
        at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
        at org.elasticsearch.common.util.concurrent.EsExecutors$1.execute(EsExecutors.java:110)
        at org.elasticsearch.transport.TcpTransport.handleRequest(TcpTransport.java:1513)
        at org.elasticsearch.transport.TcpTransport.messageReceived(TcpTransport.java:1396)
        at org.elasticsearch.transport.netty4.Netty4MessageChannelHandler.channelRead(Netty4MessageChannelHandler.java:75)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
        at -rwxrwxrwx. 1 root root 956 Nov 11 17:54 /usr/share/elasticsearch/plugins/openshift-elasticsearch/sgadmin.sh
io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
        at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:310)
        at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:297)
        at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:413)
        at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:265)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
        at io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:86)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
        at io.netty.handler.logging.LoggingHandler.channelRead(LoggingHandler.java:241)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
        at io.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1273)
        at io.netty.handler.ssl.SslHandler.decode(SslHandler.java:1084)
        at io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:489)
        at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:428)
        at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:265)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
        at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1334)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
        at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:926)
        at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:134)
        at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:644)
        at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:544)
        at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:498)
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:458)
        at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858)
        at java.lang.Thread.run(Thread.java:748)



Additional info:
Looking around online that error seems to be caused by a mismatch of certificates or something. Is our setup no longer valid to use that command when in ES 5?

Comment 2 Rich Megginson 2018-12-07 19:11:47 UTC
> Version-Release number of selected component (if applicable):
> ES_VER=5.6.12
> OCP 3.10

do you mean OCP 3.11?  ES 5 is not supported on OCP 3.10 and earlier

Comment 3 Robert Bost 2018-12-08 00:35:24 UTC
(In reply to Rich Megginson from comment #2)
> > Version-Release number of selected component (if applicable):
> > ES_VER=5.6.12
> > OCP 3.10
> 
> do you mean OCP 3.11?  ES 5 is not supported on OCP 3.10 and earlier

The issue is coming up during an upgrade from 3.10 -> 3.11. Steven should be able to provide some more detail about the state of the cluster.

Comment 4 Jeff Cantrill 2018-12-10 15:46:07 UTC
There is something not right here.  The move to support ES5 included a change such that there was a single searchguard index which the following snippet displays its trying to create one unique to the host.  Additionally, this is not even current as we moved away from the hostname to DC a while ago (ref: [1]). There has to be significantly old images as even 3.9 uses the DC name.  Is the attached snapshot associated with the cluster in question? I was unable to find any reference to this issue.
> 
> 
> Actual results:
> sh-4.2$ /usr/share/elasticsearch/plugins/openshift-elasticsearch/sgadmin.sh \
>  >         -cd ${HOME}/sgconfig \
> >         -i .searchguard.${HOSTNAME} \
> >         -ks /etc/elasticsearch/secret/searchguard.key \
>  >         -kst JKS \
> >         -kspass kspass \
> >         -ts /etc/elasticsearch/secret/searchguard.truststore \
> >         -tst JKS \
> >         -tspass tspass \
> >         -nhnv \
> >         -icl
> /etc/elasticsearch ~


Generally, manual reseeding of a cluster can be accomplished with 'oc exec -c elasticsearch $pod -- es_seed_acl'

[1] https://github.com/openshift/openshift-ansible/blob/release-3.10/roles/openshift_logging_elasticsearch/templates/2.x/elasticsearch.yml.j2#L68

Comment 7 Jeff Cantrill 2018-12-12 15:05:03 UTC
Please provide log output as an attachment instead of inline comments.  I don't see anything in the listing that would indicate an attempt to seed the .searchguard index.  Seeding occurs at initialization or reseeding by:

'oc exec -c elasticsearch $pod -- es_seed_acl'

Success should display something like:

2018-12-12 04:34:39,972][INFO ][container.run            ] Forcing the seeding of ACL documents
[2018-12-12 04:34:39,978][INFO ][container.run            ] Seeding the searchguard ACL index.  Will wait up to 604800 seconds.
[2018-12-12 04:34:40,134][INFO ][container.run            ] Seeding the searchguard ACL index.  Will wait up to 604800 seconds.
/etc/elasticsearch /usr/share/elasticsearch/init
Search Guard Admin v5
Will connect to localhost:9300 ... done
[2018-12-12T04:34:40,710][INFO ][o.e.c.r.a.AllocationService] [logging-es-data-master-qgpa12yp] Cluster health status changed from [RED] to [GREEN] (reason: [shards started [[.operations.2018.12.05][0], [.operations.2018.12.11][0], [.searchguard][0]] ...]).
[2018-12-12T04:34:40,918][INFO ][c.f.s.c.IndexBaseConfigurationRepository] Node 'logging-es-data-master-qgpa12yp' initialized
ERROR StatusLogger No Log4j 2 configuration file found. Using default configuration (logging only errors to the console), or user programmatically provided configurations. Set system property 'log4j2.debug' to show Log4j 2 internal initialization logging. See https://logging.apache.org/log4j/2.x/manual/configuration.html for instructions on how to configure Log4j 2
Elasticsearch Version: 5.6.13
Search Guard Version: <unknown>
Contacting elasticsearch cluster 'elasticsearch' ...
Clustername: logging-es
Clusterstate: GREEN
Number of nodes: 1
Number of data nodes: 1
.searchguard index already exists, so we do not need to create one.
Populate config from /opt/app-root/src/sgconfig/
Will update 'config' with /opt/app-root/src/sgconfig/sg_config.yml
   SUCC: Configuration for 'config' created or updated
Will update 'roles' with /opt/app-root/src/sgconfig/sg_roles.yml
   SUCC: Configuration for 'roles' created or updated
Will update 'rolesmapping' with /opt/app-root/src/sgconfig/sg_roles_mapping.yml
   SUCC: Configuration for 'rolesmapping' created or updated
Will update 'internalusers' with /opt/app-root/src/sgconfig/sg_internal_users.yml
   SUCC: Configuration for 'internalusers' created or updated
Will update 'actiongroups' with /opt/app-root/src/sgconfig/sg_action_groups.yml
   SUCC: Configuration for 'actiongroups' created or updated
Done with success
/usr/share/elasticsearch/init
[2018-12-12 04:34:44,917][INFO ][container.run            ] Seeded the searchguard ACL index
[2018-12-12 04:34:44,926][INFO ][container.run            ] Disabling auto replication


There is no need to restart the pods after this step.

Comment 9 Jack Ottofaro 2018-12-13 16:05:50 UTC
Created attachment 1514099 [details]
es_seed_acl output

Comment 10 Jack Ottofaro 2018-12-13 16:09:54 UTC
Cu executed es_seed_acl with results in attachment es_seed_acl output. They still have no logs. They had already downgraded to 3.11.16 and that did not help.

Comment 16 Jeff Cantrill 2018-12-19 14:49:51 UTC
I'm not sure where that specific template came from as the one we ship [1] that matches those same indices has a differen't ordering.  We take no specific actions to remove existing templates.  I'm not certain I understand how this template would cause the described issue

@Lukas, Should we be adding logic to remove templates on upgrades?

[1] https://github.com/openshift/origin-aggregated-logging/blob/release-3.11/elasticsearch/index_templates/com.redhat.viaq-openshift-project.template.json#L954


Note You need to log in before you can comment on or make changes to this bug.