Bug 1663425

Summary:	Red status of ES after applying patches (Search Guard 2 plugin not available)
Product:	OpenShift Container Platform	Reporter:	Radomir Ludva <rludva>
Component:	Logging	Assignee:	Jeff Cantrill <jcantril>
Status:	CLOSED NOTABUG	QA Contact:	Anping Li <anli>
Severity:	medium	Docs Contact:
Priority:	low
Version:	3.5.0	CC:	aos-bugs, rludva, rmeggins
Target Milestone:	---
Target Release:	3.5.z
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:		Doc Type:	If docs needed, set a value
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2019-01-05 00:55:20 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:

Description Radomir Ludva 2019-01-04 09:29:15 UTC

Description of problem:
After applying the latest patches to the OCP cluster, Elasticsearch goes red.
The logs are exactly the same as in Bugzilla 1626957,
but here we see the same behavior on version 3.5.

/usr/share/java/elasticsearch/config
Will connect to localhost:9300 ... done
2019-01-03 09:15:47 INFO  SearchGuardSSLPlugin:84 - Search Guard 2 plugin not available
2019-01-03 09:15:47 INFO  SearchGuardPlugin:58 - Clustername: elasticsearch
2019-01-03 09:15:47 INFO  SearchGuardPlugin:70 - Node [null] is a transportClient: true/tribeNode: false/tribeNodeClient: false
2019-01-03 09:15:47 INFO  plugins:180 - [Sara Grey] modules [], plugins [search-guard-ssl, search-guard2], sites []
2019-01-03 09:15:47 INFO  DefaultSearchGuardKeyStore:423 - Open SSL not available (this is not an error, we simply fallback to built-in JDK SSL) because of java.lang.ClassNo
2019-01-03 09:15:47 INFO  DefaultSearchGuardKeyStore:173 - Config directory is /usr/share/java/elasticsearch/config/, from there the key- and truststore files are resolved r
2019-01-03 09:15:47 INFO  DefaultSearchGuardKeyStore:142 - sslTransportClientProvider:JDK with ciphers [TLS_ECDHE_ECDSA_WITH_AES_256_CBC_SHA384, TLS_ECDHE_RSA_WITH_AES_256_C
2019-01-03 09:15:47 INFO  DefaultSearchGuardKeyStore:144 - sslTransportServerProvider:JDK with ciphers [TLS_ECDHE_ECDSA_WITH_AES_256_CBC_SHA384, TLS_ECDHE_RSA_WITH_AES_256_C
2019-01-03 09:15:47 INFO  DefaultSearchGuardKeyStore:146 - sslHTTPProvider:null with ciphers []
2019-01-03 09:15:47 INFO  DefaultSearchGuardKeyStore:148 - sslTransport protocols [TLSv1.2, TLSv1.1]
2019-01-03 09:15:47 INFO  DefaultSearchGuardKeyStore:149 - sslHTTP protocols [TLSv1.2, TLSv1.1]
2019-01-03 09:15:48 INFO  transport:99 - [Sara Grey] Using [com.floragunn.searchguard.ssl.transport.SearchGuardSSLNettyTransport] as transport, overridden by [search-guard-s
Contacting elasticsearch cluster 'elasticsearch' and wait for YELLOW clusterstate ...
 

Version-Release number of selected component (if applicable):
ansible-2.4.2.0-2.el7.noarch
atomic-1.22.1-26.gitb507039.el7.x86_64
atomic-openshift-3.5.5.31.80-1.git.0.c4a0780.el7.x86_64
atomic-openshift-clients-3.5.5.31.80-1.git.0.c4a0780.el7.x86_64
atomic-openshift-docker-excluder-3.5.5.31.80-1.git.0.c4a0780.el7.noarch
atomic-openshift-excluder-3.5.5.31.80-1.git.0.c4a0780.el7.noarch
atomic-openshift-master-3.5.5.31.80-1.git.0.c4a0780.el7.x86_64
atomic-openshift-node-3.5.5.31.80-1.git.0.c4a0780.el7.x86_64 
atomic-openshift-sdn-ovs-3.5.5.31.80-1.git.0.c4a0780.el7.x86_64
atomic-registries-1.22.1-26.gitb507039.el7.x86_64
redhat-release-server-7.6-4.el7.x86_64
ES_CLOUD_K8S_VER=2.4.4
ES_VER=2.4.4
OSE_ES_VER=2.4.4.17

How reproducible:
After patching OCP cluster 

Actual results:
ES status: red

Expected results:
ES status: green (without any action after processing the upgrade and applying patches)

Additional information:
> I suggested that the customer delete the red Search Guard indices, but have not received an answer yet:
> If you list the red Search Guard indices, do you have any?
> curl --key /etc/elasticsearch/secret/admin-key --cert /etc/elasticsearch/secret/admin-cert --cacert /etc/elasticsearch/secret/admin-ca https://localhost:9200/_cat/indices -s | grep red
> You can delete them:
> curl --key /etc/elasticsearch/secret/admin-key --cert /etc/elasticsearch/secret/admin-cert --cacert /etc/elasticsearch/secret/admin-ca https://localhost:9200/.searchguard.logging-es<some_identifier> -X DELETE

Comment 2 Jeff Cantrill 2019-01-04 19:26:32 UTC

I'm not sure why you advised them to remove any indices, since the fact that they are 'red' is not an indication that they are 'bad' or in an error state.  The color of an index is an indication of the state of replication of the shards associated with the indices; that's it.  The first thing I would advise is to attempt to reseed all the Search Guard indices.  This must be performed for each Elasticsearch pod:

oc exec $espod -- es_seed_acl
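
To run the reseed on every pod in one pass, a loop along the following lines could be used. This is only a sketch: the function name reseed_all_es_pods is made up for illustration, and the label selector component=es is an assumption about how the Elasticsearch pods are labeled in this deployment. It also assumes the current project is the logging project.

```shell
# Reseed the Search Guard ACLs on every Elasticsearch pod.
# ASSUMPTION: the ES pods carry the label component=es; pass another
# selector as the first argument if your deployment labels them differently.
reseed_all_es_pods() {
    selector="${1:-component=es}"
    for espod in $(oc get pods -l "$selector" -o name); do
        # "-o name" prints <resource>/<name>; keep only the pod name.
        espod="${espod##*/}"
        echo "Reseeding ACLs on ${espod}"
        oc exec "$espod" -- es_seed_acl || echo "es_seed_acl failed on ${espod}" >&2
    done
}
```

Calling reseed_all_es_pods with no argument uses the assumed selector; a failure on one pod is reported on stderr and the loop continues with the next pod.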

We then need to figure out why the cluster is in the red state by looking at which indices are red.  This is most easily achievable by rsh'ing into one of the ES pods:

oc rsh $espod

and then:

QUERY=_cat/indices es_util


This may give us a clue from which we can further determine future action.
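
To narrow the _cat/indices listing down to the red indices only, the output can be filtered on its first column. The sketch below assumes the default column order of _cat/indices (health, status, index, ...); the helper name red_indices is hypothetical.

```shell
# Print the names of all red indices from "_cat/indices" output on stdin.
# ASSUMPTION: default column order, i.e. health in column 1 and the
# index name in column 3.
red_indices() {
    awk '$1 == "red" { print $3 }'
}
```

Inside the ES pod this would be used as: QUERY=_cat/indices es_util | red_indices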


[1] https://github.com/openshift/origin-aggregated-logging/blob/release-1.5/elasticsearch/utils/es_util


Lowering the priority as this cluster is older than N-2 where N is 3.11.

Comment 3 Radomir Ludva 2019-01-05 00:55:20 UTC

When this Bugzilla was created it looked like a bug, but it turned out to be a configuration issue with the NFS file system, which is not intended for storing Elasticsearch index data.
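
For similar cases, a quick check of the file system type behind the Elasticsearch data directory can rule NFS in or out early. The following is a minimal sketch, assuming GNU coreutils stat is available in the pod or on the node; the helper names and the example path are illustrative and not taken from this report.

```shell
# Report the file system type backing a directory (GNU coreutils stat).
fs_type_of() {
    stat -f -c %T "$1"
}

# Succeed when the given file system type string denotes NFS
# (for example "nfs" or "nfs4").
is_nfs_type() {
    case "$1" in
        nfs*) return 0 ;;
        *)    return 1 ;;
    esac
}
```

Example, with /elasticsearch/persistent as an assumed data path: is_nfs_type "$(fs_type_of /elasticsearch/persistent)" && echo "data directory is on NFS"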