Bug 1667801

Summary: .kibana.* index.number_of_replicas=0 when openshift_logging_es_number_of_replicas=1
Product: OpenShift Container Platform
Reporter: Takayoshi Kimura <tkimura>
Component: Logging
Assignee: Jeff Cantrill <jcantril>
Status: CLOSED ERRATA
QA Contact: Anping Li <anli>
Severity: high
Docs Contact:
Priority: high
Version: 3.11.0
CC: akiyoshi.yonekura, anli, aos-bugs, fcarrus, jortizpa, rmeggins, vhernand
Target Milestone: ---
Target Release: 3.11.z
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2019-06-26 09:07:54 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Takayoshi Kimura 2019-01-21 06:41:34 UTC
Description of problem:

The .kibana.* index.number_of_replicas is configured with 0 when openshift_logging_es_number_of_replicas=1.

If the node holding the kibana index's primary shard goes down, users cannot access Kibana, so it is a single point of failure.

> health status index                                                             uuid                   pri rep docs.count docs.deleted store.size pri.store.size
> green  open   .searchguard                                                      2C39VDiuQ7O0bNX2r03n6w   1   1          5            0     66.3kb         33.1kb
> green  open   project.glusterfs.06001576-1a05-11e9-87cc-fa163eddb797.2019.01.21 KWjd7Yu1QDKJYRi3sM6YyA   1   1       1129            0      1.7mb          929kb
> green  open   .operations.2019.01.21                                            q4QkFlf4QyeRJq2V6F2DYA   1   1      15459            0     32.3mb         16.1mb
> green  open   .kibana                                                           S98uHNKSROetE2bgvY883A   1   1          1            0      6.4kb          3.2kb
> green  open   .kibana.825a9ae8ef7609f7daf6aaf33d8a4245022aeca3                  QZcgJMDCSR-THex1VDfYDw   1   0          5            0     57.6kb         57.6kb


I think the following "number_of_replicas" should be "index.number_of_replicas".

> $ cat /usr/share/elasticsearch/index_templates/common.settings.kibana.template.json 
> {
>   "order": 0,
>   "settings": {
>     "number_of_replicas": 1,
>     "number_of_shards": 1
>   },
>   "template": ".kibana*"
> } 
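As a side note, Elasticsearch normalizes flat settings keys by prefixing `index.`, so `number_of_replicas` and `index.number_of_replicas` should be equivalent inside a template (comment 2 below shows the stored template already carries the prefixed form). A self-contained check, with the template body from the file above embedded verbatim (illustration only):

```shell
# The template body from the file above, embedded so this runs anywhere.
tmpl='{"order":0,"settings":{"number_of_replicas":1,"number_of_shards":1},"template":".kibana*"}'

# Parse it and print the replica count it declares.
echo "$tmpl" | python3 -c 'import json,sys; t=json.load(sys.stdin); print(t["settings"]["number_of_replicas"])'
# prints: 1
```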

Version-Release number of selected component (if applicable):

v3.11.59

How reproducible:

Always

Steps to Reproduce:
1. Setup EFK stack with openshift_logging_es_cluster_size=3 and openshift_logging_es_number_of_replicas=1
2.
3.

Actual results:

.kibana.* index has no replicas; it is a single point of failure

Expected results:

.kibana.* index has 1 replica and can survive 1 ES instance outage

Additional info:

Comment 2 Anping Li 2019-02-12 06:48:11 UTC
@jeff, it seems the .kibana.xxx indices are not controlled by common.settings.kibana.template.json. Image: openshift3/ose-logging-elasticsearch5/images/v3.11.82-2

oc exec -c elasticsearch logging-es-data-master-lae7hidt-1-w2jb2 -- curl -s -XGET --cacert /etc/elasticsearch/secret/admin-ca --cert /etc/elasticsearch/secret/admin-cert --key /etc/elasticsearch/secret/admin-key https://localhost:9200/_template/common.settings.kibana.template.json?pretty
{
  "common.settings.kibana.template.json" : {
    "order" : 0,
    "template" : ".kibana*",
    "settings" : {
      "index" : {
        "number_of_shards" : "1",
        "number_of_replicas" : "1"
      }
    },
    "mappings" : { },
    "aliases" : { }
  }
}


oc exec -c elasticsearch logging-es-data-master-lae7hidt-1-w2jb2 -- curl -s -XGET --cacert /etc/elasticsearch/secret/admin-ca --cert /etc/elasticsearch/secret/admin-cert --key /etc/elasticsearch/secret/admin-key https://localhost:9200/_cat/indices?v
health status index                                                                uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   .kibana.d033e22ae348aeb5660fc2140aec35850c4da997                     qxoUZi7xT2KHtUh_lhDmKA   1   0          2            1     51.9kb         51.9kb
green  open   .searchguard                                                         IN3WbvKHS7WsZtqAIkRrTw   1   1          5            0     66.3kb         33.1kb
green  open   project.install-test.0c86dc0c-2e76-11e9-8516-fa163e4fb738.2019.02.12 EDWs_VTKSNuVB7IPoLQqVw   1   1       1342            0      1.8mb        934.3kb
green  open   .kibana                                                              K5xQ0RrRQzunaQAx_7setA   1   1          1            0      6.4kb          3.2kb
green  open   project.xiaocwan-3t.07a7eb7c-2e8e-11e9-8516-fa163e4fb738.2019.02.12  T1SQL4h2Q6GxbfR0tyGhYg   1   1         29            0     74.6kb         37.3kb
green  open   .operations.2019.02.12                                               PKvdG21_SOCQOsfT0ZJPPw   1   1    1754366            0      3.4gb          1.4gb
green  open   project.xiaocwan3-p.e4ca089c-2e8f-11e9-8516-fa163e4fb738.2019.02.12  iRIiI5ZzRtCw8N_QZmNFpA   1   1        658            0        1mb        519.2kb

Comment 3 Jeff Cantrill 2019-02-12 14:04:19 UTC
(In reply to Anping Li from comment #2)
> @jeff, it seems the .kibana.xxx is not controlled by
> common.settings.kibana.template.json. 
> openshift3/ose-logging-elasticsearch5/images/v3.11.82-2

Replica settings only apply to new indices.  It is not possible to change the replica count after the fact.  You would only be able to observe the replica behavior by deleting the index and allowing Kibana or the multi-tenant plugin to recreate it.  The template would then apply to the new indices.

Was your observation made for a cluster that had pre-existing kibana indices?

Comment 4 Anping Li 2019-02-13 02:38:07 UTC
@jeff, it is a new kibana index, not a pre-existing one.

Comment 6 Jeff Cantrill 2019-04-03 23:00:10 UTC
Discovered that templates don't apply to the workflow here and that our multi-tenant plugin explicitly sets the replica count to zero.  Updating it to rely on the env var defined in the image, falling back to zero.
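The fallback described above can be sketched roughly as follows (the variable name ES_REPLICA_SHARDS is a hypothetical placeholder for illustration, not necessarily what the image actually defines):

```shell
# Sketch of the described fix: take the replica count from an environment
# variable defined in the image, falling back to zero when it is unset.
# ES_REPLICA_SHARDS is a hypothetical name used only for illustration.
kibana_replica_count() {
  echo "${ES_REPLICA_SHARDS:-0}"
}

kibana_replica_count                        # prints: 0 (variable unset)
ES_REPLICA_SHARDS=1 kibana_replica_count    # prints: 1
```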

Comment 7 Fulvio Carrus 2019-04-17 17:19:29 UTC
Same problem here: on a fresh EFK setup on OCP 3.11 with shards=1 and replicas=2, the .kibana.3698b2c200929fc5f282a2e1763573a1afc0c9bd index has 1 shard and no replicas.
All other indexes have the correct number of replicas.

----
$ curl -k --cacert ./admin-cert --key ./admin-key --cert ./admin-cert  https://logging-es:9200/_cat/indices/.kibana.*?v
health status index                                            uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   .kibana.3698b2c200929fc5f282a2e1763573a1afc0c9bd 1XY2s7Q6TJeSkfToLQiSxQ   1   2          5            0    220.7kb         73.5kb
----

I've been able to update the number of replicas on that specific index with:

----
curl -k --cacert ./admin-cert --key ./admin-key --cert ./admin-cert -XPUT --data '{ "index": {"number_of_replicas":2} }' https://logging-es:9200/.kibana.3698b2c200929fc5f282a2e1763573a1afc0c9bd/_settings
----

After that, the index has 1 shard and 2 replicas.

----
$ curl -k --cacert ./admin-cert --key ./admin-key --cert ./admin-cert  https://logging-es:9200/_cat/shards/.kibana.*
.kibana.3698b2c200929fc5f282a2e1763573a1afc0c9bd 0 p STARTED 5 73.5kb 10.130.2.143 logging-es-data-master-pr6scwwz
.kibana.3698b2c200929fc5f282a2e1763573a1afc0c9bd 0 r STARTED 5 73.5kb 10.129.0.37  logging-es-data-master-qvreeyzw
.kibana.3698b2c200929fc5f282a2e1763573a1afc0c9bd 0 r STARTED 5 73.5kb 10.128.2.86  logging-es-data-master-onl44h2q
----
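Where more than one .kibana.* index exists, the workaround above could be looped over all of them. A hedged sketch, reusing the host and cert paths from the commands above as placeholders (adjust the replica count for your cluster):

```shell
# Target replica count; adjust for your cluster size.
REPLICAS=2
# Build the settings payload once.
payload="{\"index\":{\"number_of_replicas\":${REPLICAS}}}"

# List every .kibana.* index by name, then PUT the new replica count to each.
# Host and cert paths are placeholders taken from the commands above.
for idx in $(curl -sk --cacert ./admin-cert --key ./admin-key --cert ./admin-cert \
    "https://logging-es:9200/_cat/indices/.kibana.*?h=index"); do
  curl -sk --cacert ./admin-cert --key ./admin-key --cert ./admin-cert \
    -XPUT --data "$payload" "https://logging-es:9200/${idx}/_settings"
done
```

Note this only patches existing indices; newly created .kibana.* indices would still come up with zero replicas until the underlying fix lands.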

Comment 8 Jeff Cantrill 2019-04-30 12:04:09 UTC
*** Bug 1704667 has been marked as a duplicate of this bug. ***

Comment 9 Victor Hernando 2019-05-03 06:41:30 UTC
Hi Jeff,

Do we have any estimation on when this issue will be fixed?

At this point, my customer has set the replica count by hand on this kibana index to avoid availability issues, but it would be great to have some insight into when this will be fixed.

Thanks in advance!

Comment 10 Jeff Cantrill 2019-05-03 14:43:15 UTC
(In reply to Victor Hernando from comment #9)
> Hi Jeff,
> 
> Do we have any estimation on when this issue will be fixed?

This is already fixed and merged, awaiting final QE and release.  I don't control the schedule, so I am unable to comment further.

> 
> At this point, my customer had set replica by hand on this kibana index to
> avoid availability issues, but would be great to have any insights on when
> this should be fixed.
> 
> Thanks in advance!

Comment 14 Anping Li 2019-06-14 08:21:04 UTC
All .kibana indices have a replica shard:
$ oc exec -c elasticsearch logging-es-data-master-amgmz7nf-1-mgr6n -- indices |grep kibana
green  open   .kibana.16b4d433eeef71946e93341822786a196549c2c5                     7i9T7K_7RsOLEGqInC6C8g   1   1          2            0          0              0
green  open   .kibana.33aab3c7f01620cade108f488cfd285c0e62c1ec                     WN4ozB3dSuSuq98KZC4GPg   1   1          2            1          0              0
green  open   .kibana.315f166c5aca63a157f7d41007675cb44a948b33                     4_2fPIYWR668yjZNZ1iNGA   1   1          2            0          0              0
green  open   .kibana.35cc6a0d62fb5a6042d2bb250adfb03ef31a45c8                     M1l03xRiS7e7moyNuGL9NA   1   1          2            1          0              0
green  open   .kibana.9e057c9b43173621e0726980027e35bf7ccca670                     Va75-KdISWehTkgiNxMTEg   1   1          2            0          0              0
green  open   .kibana.074ed13d61da2f51eac807a9996d1ad3cd707ebd                     oY2ovhTyRGSgVG2XbHD-lA   1   1          2            1          0              0
green  open   .kibana.b226eb2bce020f52aebe9299773d5a82c021a2b1                     PmifGGb7RG6U5dMStjkntg   1   1          2            0          0              0
green  open   .kibana                                                              A8vNElNMRTeImcFztwNiFA   1   1          1            0          0              0
green  open   .kibana.d033e22ae348aeb5660fc2140aec35850c4da997                     lvdLfVbHRcyQHG8xb79Lqg   1   1          2            0          0              0
green  open   .kibana.f276b0dbaafbc5a94a81d30c78d314d9c2b7d30b                     lsvG14g5Tdep3PCy287p8A   1   1          2            0          0              0
green  open   .kibana.ea053d11a8aad1ccf8c18f9241baeb9ec47e5d64                     VjfexYCbTOeJI3RtFe3A6w   1   1          2            0          0              0

Comment 16 errata-xmlrpc 2019-06-26 09:07:54 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:1605