Description of problem:
Previously it was enough to set openshift_logging_es_cluster_size=3 and openshift_logging_es_ops_cluster_size=3 to make an EFK deployment fault tolerant. However, the configuration of shards and replicas has since changed, which renders an EFK deployment non-redundant.

Version-Release number of selected component (if applicable):
I believe this change was introduced in 3.5.

How reproducible:
Every time

Steps to Reproduce:
1. Set openshift_logging_es_cluster_size=3
2. Set openshift_logging_es_ops_cluster_size=3 if deploying ops separately
3.

Actual results:
number_of_shards is set to 1
number_of_replicas is set to 0

Expected results:
There should be a documented way to make EFK HA and fault tolerant. The only way to do that is by also setting openshift_logging_es_number_of_shards and openshift_logging_es_number_of_replicas, which are not documented.
This was removed in 3.5, and we are no longer respecting the original values, as logged here: https://bugzilla.redhat.com/show_bug.cgi?id=1489498. Moving this to documentation, since that is what is stated as missing.
I just wanted to add that I'm continuing to see this issue with OCP 3.11. It is not clearly documented, but the solution was to specify the variables indicated above. In my case:
openshift_logging_es_number_of_shards: 1
openshift_logging_es_number_of_replicas: 2
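For anyone landing here, the settings from this report could be combined in an inventory roughly as sketched below. This assumes an INI-style inventory with the conventional [OSEv3:vars] group, which is not stated anywhere in this thread. The values are the ones reported above, not a tested recommendation. With 2 replicas, each primary shard has three copies in total, which a three-node cluster can place one per node.

```ini
# Sketch only: the [OSEv3:vars] group name is an assumption about the inventory layout.
[OSEv3:vars]
# Three Elasticsearch nodes, as in the original report
openshift_logging_es_cluster_size=3
# Not applied automatically; must be set explicitly to get redundancy
openshift_logging_es_number_of_shards=1
openshift_logging_es_number_of_replicas=2
```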
OCP 3.6-3.10 is no longer on full support [1]. Marking un-triaged bugs CLOSED DEFERRED. If you have a customer case with a support exception or have reproduced on 3.11+, please reopen and include those details. When reopening, please set the Version to the appropriate version where reproduced.
[1]: https://access.redhat.com/support/policy/updates/openshift