Bug 1459054 - Timeout creating SearchGuard index
Status: CLOSED DUPLICATE of bug 1449378
Product: OpenShift Container Platform
Classification: Red Hat
Component: Logging
Hardware: Unspecified  OS: Unspecified
Priority: high  Severity: urgent
Assigned To: Jeff Cantrill
QA Contact: Xia Zhao
Depends On:
Reported: 2017-06-06 04:02 EDT by Ruben Romero Montes
Modified: 2017-06-27 12:34 EDT
CC List: 8 users

Last Closed: 2017-06-27 12:34:11 EDT
Type: Bug

Attachments
docker logs (10.98 KB, application/x-xz)
2017-06-06 04:02 EDT, Ruben Romero Montes
docker inspect (4.88 KB, application/x-xz)
2017-06-06 04:05 EDT, Ruben Romero Montes
all_logging (264.22 KB, text/plain)
2017-06-06 04:05 EDT, Ruben Romero Montes
nodes description (98.87 KB, text/plain)
2017-06-06 04:06 EDT, Ruben Romero Montes

Description Ruben Romero Montes 2017-06-06 04:02:28 EDT
Created attachment 1285280 [details]
docker logs

Description of problem:
SearchGuard is unable to initialize because creating its index times out.

[2017-06-05 08:15:36,606][INFO ][cluster.routing.allocation] [Crimson] Cluster health status changed from [RED] to [YELLOW] (reason: [shards started [[project.aes-mbaas-infra.26cf63cd-2b47-11e7-a35d-0acaab79e3f7.2017.05.22][0], [.searchguard.logging-es-xgrcmvev-3-5nb30][0], [project.nagp-il-core-int-01.7c640146-08a0-11e7-8d5d-0610033e8e3f.2017.05.22][0], [.searchguard.logging-es-sm5vnjla-3-55uhm][0]] ...]).
Clustername: logging-es
Clusterstate: YELLOW
Number of nodes: 3
Number of data nodes: 3
.searchguard.logging-es-dyppkops-2-qapt3 index does not exists, attempt to create it ... [2017-06-05 08:15:46,856][ERROR][com.floragunn.searchguard.auth.BackendRegistry] Not yet initialized
[2017-06-05 08:15:54,250][ERROR][com.floragunn.searchguard.auth.BackendRegistry] Not yet initialized
ERR: An unexpected ProcessClusterEventTimeoutException occured: failed to process cluster event (create-index [.searchguard.logging-es-dyppkops-2-qapt3], cause [api]) within 30s
ProcessClusterEventTimeoutException[failed to process cluster event (create-index [.searchguard.logging-es-dyppkops-2-qapt3], cause [api]) within 30s]
	at org.elasticsearch.cluster.service.InternalClusterService$2$1.run(InternalClusterService.java:349)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:748)

[2017-06-05 08:25:06,818][INFO ][cluster.routing.allocation] [Crimson] Cluster health status changed from [YELLOW] to [GREEN] (reason: [shards started [[.searchguard.logging-es-xgrcmvev-3-5nb30][0], [.searchguard.logging-es-xgrcmvev-3-5nb30][0]] ...]).

Version-Release number of selected component (if applicable):

How reproducible:
Only in the reporting environment.

Steps to Reproduce:
1. Scale all 3 Elasticsearch deploymentConfigs down to 0
2. Scale all 3 deploymentConfigs back up to 1
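The two reproduction steps above can be sketched with the `oc` CLI. This is a minimal sketch, not a verified reproduction script: the deploymentConfig names are assumptions derived from the ES pod names in the logs, and the `logging` project name is assumed; adjust both for your cluster.

```shell
# Hypothetical dc names, taken from the pod names seen in the logs above.
DCS="logging-es-dyppkops logging-es-xgrcmvev logging-es-sm5vnjla"

# Step 1: scale all three Elasticsearch deploymentConfigs down to 0 replicas
for dc in $DCS; do
  oc scale dc/"$dc" --replicas=0 -n logging
done

# Wait for the pods to terminate, then step 2: scale back up to 1 replica each
for dc in $DCS; do
  oc scale dc/"$dc" --replicas=1 -n logging
done
```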

Actual results:
failed to process cluster event (create-index [.searchguard.logging-es-dyppkops-2-qapt3], cause [api]) within 30s

Expected results:
The SearchGuard index to be initialized

Additional info:
  Volume type is gp2 with 1500 / 3000 iops, with 500GiB of storage
    148G of data free
    /dev/mapper/vg01-data           500G  353G  148G  71% /data
  Deployment: AWS
  Memory: 16GB
  Ec2 instances are m4.xlarge for the masters, and r4.xlarge for the nodes

  Ensured they have auto_expand_replicas: 2 in the configmap.
Comment 1 Ruben Romero Montes 2017-06-06 04:05 EDT
Created attachment 1285281 [details]
docker inspect
Comment 2 Ruben Romero Montes 2017-06-06 04:05 EDT
Created attachment 1285282 [details]
all_logging
Comment 3 Ruben Romero Montes 2017-06-06 04:06 EDT
Created attachment 1285283 [details]
nodes description
Comment 7 Ruben Romero Montes 2017-06-07 10:08:25 EDT
A manual workaround is to initialize SearchGuard from inside each of the three pods:

 $ oc rsh <logging-es-pod>
 # /usr/share/java/elasticsearch/plugins/search-guard-2/tools/sgadmin.sh \
        -cd ${HOME}/sgconfig \
        -i .searchguard.${HOSTNAME} \
        -ks /etc/elasticsearch/secret/searchguard.key \
        -kst JKS \
        -kspass kspass \
        -ts /etc/elasticsearch/secret/searchguard.truststore \
        -tst JKS \
        -tspass tspass \
        -nhnv

Alternatively, closing some old indices manually can speed up the initialization.
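Closing an old index can be done from inside a logging-es pod over the cluster's TLS endpoint. A minimal sketch, assuming the admin client certificate paths under `/etc/elasticsearch/secret/` used by the logging image, and a placeholder index name:

```shell
# Run from inside a logging-es pod (oc rsh <logging-es-pod>).
# INDEX is a placeholder; substitute an old index from the cluster,
# e.g. one of the project.*.2017.05.22 indices seen in the logs above.
INDEX="project.example.2017.05.22"

# Close the index via the Elasticsearch close-index API,
# authenticating with the admin client cert/key (assumed paths).
curl -s \
  --cacert /etc/elasticsearch/secret/admin-ca \
  --cert   /etc/elasticsearch/secret/admin-cert \
  --key    /etc/elasticsearch/secret/admin-key \
  -XPOST "https://localhost:9200/${INDEX}/_close"
```

Closed indices keep their data on disk but no longer hold open shards, which reduces the work the master does when processing cluster events such as create-index.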
Comment 8 Jeff Cantrill 2017-06-27 12:34:11 EDT
Closing this as a duplicate since it is all related to the initialization of the SG index, for which we have a fix that needs to be ported back to 3.4.1. We will resolve against #1449378. Ref upstream PR to be backported: https://github.com/openshift/origin-aggregated-logging/pull/469

*** This bug has been marked as a duplicate of bug 1449378 ***
