Bug 1459054

Summary: Timeout creating SearchGuard index
Product: OpenShift Container Platform
Component: Logging
Version: 3.4.1
Status: CLOSED DUPLICATE
Severity: urgent
Priority: high
Reporter: Ruben Romero Montes <rromerom>
Assignee: Jeff Cantrill <jcantril>
QA Contact: Xia Zhao <xiazhao>
CC: aivaras.laimikis, aos-bugs, erich, jcantril, nnosenzo, pdwyer, pportant, tlarsson
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Last Closed: 2017-06-27 16:34:11 UTC
Type: Bug
Attachments:
  docker logs
  docker inspect
  all_logging
  nodes description

Description Ruben Romero Montes 2017-06-06 08:02:28 UTC
Created attachment 1285280 [details]
docker logs

Description of problem:
SearchGuard fails to initialize: creating its index times out.

[2017-06-05 08:15:36,606][INFO ][cluster.routing.allocation] [Crimson] Cluster health status changed from [RED] to [YELLOW] (reason: [shards started [[project.aes-mbaas-infra.26cf63cd-2b47-11e7-a35d-0acaab79e3f7.2017.05.22][0], [.searchguard.logging-es-xgrcmvev-3-5nb30][0], [project.nagp-il-core-int-01.7c640146-08a0-11e7-8d5d-0610033e8e3f.2017.05.22][0], [.searchguard.logging-es-sm5vnjla-3-55uhm][0]] ...]).
Clustername: logging-es
Clusterstate: YELLOW
Number of nodes: 3
Number of data nodes: 3
.searchguard.logging-es-dyppkops-2-qapt3 index does not exists, attempt to create it ...
[2017-06-05 08:15:46,856][ERROR][com.floragunn.searchguard.auth.BackendRegistry] Not yet initialized
[2017-06-05 08:15:54,250][ERROR][com.floragunn.searchguard.auth.BackendRegistry] Not yet initialized
...
ERR: An unexpected ProcessClusterEventTimeoutException occured: failed to process cluster event (create-index [.searchguard.logging-es-dyppkops-2-qapt3], cause [api]) within 30s
Trace:
ProcessClusterEventTimeoutException[failed to process cluster event (create-index [.searchguard.logging-es-dyppkops-2-qapt3], cause [api]) within 30s]
	at org.elasticsearch.cluster.service.InternalClusterService$2$1.run(InternalClusterService.java:349)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:748)

...
[2017-06-05 08:25:06,818][INFO ][cluster.routing.allocation] [Crimson] Cluster health status changed from [YELLOW] to [GREEN] (reason: [shards started [[.searchguard.logging-es-xgrcmvev-3-5nb30][0], [.searchguard.logging-es-xgrcmvev-3-5nb30][0]] ...]).
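The 30s failure above is a create-index task expiring in the master's cluster-state queue. One way to check whether that queue is backed up is the `_cat/pending_tasks` API. A minimal sketch, run from inside an es pod; the admin cert paths under /etc/elasticsearch/secret/ are assumptions based on this deployment and may differ:

```shell
# Sketch: list queued cluster-state tasks to see whether the master is the
# bottleneck behind the 30s create-index timeout. The function only prints
# the curl command so it can be reviewed before running; cert paths are
# assumed from the mounted logging-elasticsearch secret.
pending_tasks_cmd() {
  printf 'curl -s --cacert %s --cert %s --key %s "https://localhost:9200/_cat/pending_tasks?v"\n' \
    /etc/elasticsearch/secret/admin-ca \
    /etc/elasticsearch/secret/admin-cert \
    /etc/elasticsearch/secret/admin-key
}

pending_tasks_cmd
```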

Version-Release number of selected component (if applicable):
openshift3-logging-elasticsearch-3.4.1-26

How reproducible:
Only reproducible in the reporter's environment.

Steps to Reproduce:
1. Scale all three Elasticsearch DeploymentConfigs down to 0
2. Scale all three DeploymentConfigs back up to 1
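The steps above can be sketched with `oc scale`. The dc names below are inferred from the logs in this report and are illustrative; substitute the output of `oc get dc -n logging`:

```shell
# Sketch of the reproduction steps. scale_cmd only prints the oc command
# so it can be reviewed before running against a real cluster; the dc
# names are placeholders inferred from the log excerpt above.
scale_cmd() {
  printf 'oc scale dc/%s --replicas=%s -n logging\n' "$1" "$2"
}

for dc in logging-es-xgrcmvev logging-es-sm5vnjla logging-es-dyppkops; do
  scale_cmd "$dc" 0   # step 1: scale all three dcs down to 0
done
# wait until all es pods have terminated, then:
for dc in logging-es-xgrcmvev logging-es-sm5vnjla logging-es-dyppkops; do
  scale_cmd "$dc" 1   # step 2: scale all three dcs back up to 1
done
```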

Actual results:
failed to process cluster event (create-index [.searchguard.logging-es-dyppkops-2-qapt3], cause [api]) within 30s

Expected results:
The SearchGuard index to be initialized

Additional info:
  Volume type: gp2 (1500/3000 IOPS), 500 GiB of storage
    148G of data free:
    /dev/mapper/vg01-data           500G  353G  148G  71% /data
  Deployment: AWS
  Memory: 16 GB
  EC2 instances: m4.xlarge for the masters, r4.xlarge for the nodes

  Confirmed they have auto_expand_replicas: 2 in the ConfigMap.

Comment 1 Ruben Romero Montes 2017-06-06 08:05:31 UTC
Created attachment 1285281 [details]
docker inspect

Comment 2 Ruben Romero Montes 2017-06-06 08:05:52 UTC
Created attachment 1285282 [details]
all_logging

Comment 3 Ruben Romero Montes 2017-06-06 08:06:24 UTC
Created attachment 1285283 [details]
nodes description

Comment 7 Ruben Romero Montes 2017-06-07 14:08:25 UTC
A manual workaround is to initialize SearchGuard from inside each of the three pods:

 $ oc rsh <logging-es-pod>
 # /usr/share/java/elasticsearch/plugins/search-guard-2/tools/sgadmin.sh \
        -cd ${HOME}/sgconfig \
        -i .searchguard.${HOSTNAME} \
        -ks /etc/elasticsearch/secret/searchguard.key \
        -kst JKS \
        -kspass kspass \
        -ts /etc/elasticsearch/secret/searchguard.truststore \
        -tst JKS \
        -tspass tspass \
        -nhnv \
        -icl

Alternatively, try closing some old indices manually to speed up the initialization.
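Closing an index uses the Elasticsearch `_close` API. A minimal sketch, run from inside an es pod; the index pattern and the admin cert paths under /etc/elasticsearch/secret/ are assumptions based on this deployment:

```shell
# Sketch: close old indices to shrink the cluster state the master has to
# process. close_index_cmd only prints the curl invocation so it can be
# reviewed before running; cert paths are assumed from the mounted
# logging-elasticsearch secret and may differ.
close_index_cmd() {
  printf 'curl -s -XPOST --cacert %s --cert %s --key %s "https://localhost:9200/%s/_close"\n' \
    /etc/elasticsearch/secret/admin-ca \
    /etc/elasticsearch/secret/admin-cert \
    /etc/elasticsearch/secret/admin-key \
    "$1"
}

# e.g. close all project indices from one old day (pattern is illustrative):
close_index_cmd 'project.*.2017.05.22'
```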

Comment 8 Jeff Cantrill 2017-06-27 16:34:11 UTC
Closing this as a duplicate since it is all related to initialization of the SearchGuard index, for which we have a fix that needs to be ported to 3.4.1. We will resolve it against bug #1449378. Ref upstream PR to be backported: https://github.com/openshift/origin-aggregated-logging/pull/469

*** This bug has been marked as a duplicate of bug 1449378 ***