Bug 1544243 - Elasticsearch fails to scale up during installation when multiple replicas specified [NEEDINFO]
Status: CLOSED ERRATA
Product: OpenShift Container Platform
Classification: Red Hat
Component: Logging
Version: 3.7.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 3.7.z
Assigned To: ewolinet
QA Contact: Anping Li
Docs Contact:
Depends On: 1540099 1581058
Blocks:
Reported: 2018-02-11 11:02 EST by Andrew Block
Modified: 2018-05-22 01:47 EDT
CC List: 10 users

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: When creating an ES cluster of three or more nodes, the node quorum and recovery settings prevent the first ES node from reaching a ready and green state in time during a fresh install.
Consequence: The playbook times out waiting for the first ES node to be ready.
Fix: When new ES nodes are created, the playbook no longer waits for them to become healthy, because the changed recovery and quorum settings require all nodes to be running at the same time.
Result: The playbook no longer times out when creating large clusters of ES nodes.
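To illustrate why a lone first node cannot become healthy, here is a minimal sketch of the quorum arithmetic. It assumes the usual majority rule of floor(N/2) + 1 master-eligible nodes; the function names `quorum` and `can_elect_master` are hypothetical and are not taken from the playbook.

```shell
#!/usr/bin/env bash
# Illustrative only: majority quorum for an N-node cluster, floor(N/2) + 1.
# Assumption: the cluster uses the common majority rule for master election.
quorum() {
  local cluster_size=$1
  echo $(( cluster_size / 2 + 1 ))
}

# Succeeds (exit status 0) only if enough nodes are running to reach quorum.
can_elect_master() {
  local cluster_size=$1 running=$2
  [ "$running" -ge "$(quorum "$cluster_size")" ]
}
```

Under that assumption, a 3-node cluster needs 2 running nodes, so `can_elect_master 3 1` fails: waiting for the first node alone to go green can never succeed.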
Story Points: ---
Clone Of:
Environment:
Last Closed: 2018-04-05 05:38:31 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
ewolinet: needinfo? (andrew.block)


External Trackers
Tracker                            ID              Priority  Status  Summary  Last Updated
Red Hat Knowledge Base (Solution)  3365421         None      None    None     2018-02-27 17:23 EST
Red Hat Product Errata             RHBA-2018:0636  None      None    None     2018-04-05 05:39 EDT

Description Andrew Block 2018-02-11 11:02:34 EST
Description of problem:

Unable to fully execute the Aggregated Logging playbook when specifying multiple replicas of Elasticsearch.

The playbook fails to roll out the Elasticsearch replicas:

FAILED - RETRYING: Waiting for logging-es-data-master-hsjwgec4 to finish scaling up (60 retries left).
FAILED - RETRYING: Waiting for logging-es-data-master-hsjwgec4 to finish scaling up (59 retries left).
FAILED - RETRYING: Waiting for logging-es-data-master-hsjwgec4 to finish scaling up (58 retries left).
FAILED - RETRYING: Waiting for logging-es-data-master-hsjwgec4 to finish scaling up (57 retries left).

The issue can be worked around by specifying the following inventory variable:

logging_elasticsearch_rollout_override=true

Once the playbook completes, each Elasticsearch DeploymentConfig can be rolled out manually.

Version-Release number of selected component (if applicable):

3.7.23

How reproducible:

Always

Steps to Reproduce:
1. Specify multiple replicas of Elasticsearch in the inventory:

openshift_logging_es_number_of_replicas

2. Execute the Aggregated Logging playbook:

ansible-playbook [-i </path/to/inventory>] \
    /usr/share/ansible/openshift-ansible/playbooks/byo/openshift-cluster/openshift-logging.yml

Actual results:

The playbook fails because Elasticsearch never becomes ready.


Expected results:

The Aggregated Logging playbook completes successfully.

Additional info:
Comment 1 ewolinet 2018-02-12 09:39:44 EST
Andy,
is this during a fresh installation or an upgrade?
Comment 2 Andrew Block 2018-02-12 10:58:45 EST
(In reply to ewolinet from comment #1)
> Andy,
> is this during a fresh installation or an upgrade?

It occurs on both fresh installs and upgrades.
Comment 3 ewolinet 2018-02-12 11:53:57 EST
This should resolve the fresh install issue: https://github.com/openshift/openshift-ansible/pull/7097

When you say that the upgrade fails, does the playbook ultimately fail, or does "FAILED - RETRYING: Waiting for logging-es-data-master-hsjwgec4 to finish scaling up (# retries left)." just show up in the logs a lot?

Also, to clarify: when you say upgrade, do you mean that there is an existing deployment of logging and it is being upgraded (not just that OCP is being upgraded and logging is being freshly installed)?
Comment 4 Mark McKinstry 2018-02-12 18:24:09 EST
I've been using the below command to deploy the ES pods after using Andy's workaround:

for x in $(oc get dc -l component=es -o=custom-columns=NAME:.metadata.name --no-headers); do oc rollout latest "$x"; done
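A slightly more defensive variant of the same idea might look like the sketch below. It triggers every rollout first and only then waits, so all ES nodes start together and can form a quorum. The function name `rollout_all_es` is hypothetical, and the use of `oc rollout status` to wait is an assumption about what is wanted here, not part of the original workaround.

```shell
#!/usr/bin/env bash
# Sketch: roll out every Elasticsearch DeploymentConfig, then wait for all.
# Assumes `oc` is logged in and the current project is the logging project.
rollout_all_es() {
  local dcs dc
  # Same selector and output format as the one-liner above.
  dcs=$(oc get dc -l component=es -o=custom-columns=NAME:.metadata.name --no-headers) || return 1
  # Start all rollouts before waiting on any of them.
  for dc in $dcs; do
    oc rollout latest "dc/$dc" || return 1
  done
  # Block until each rollout reports completion (or fails).
  for dc in $dcs; do
    oc rollout status "dc/$dc" || return 1
  done
}
```

Calling `rollout_all_es` after the playbook completes would replace the manual loop; waiting on each DeploymentConfig before starting the next would risk the same deadlock the bug describes.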
Comment 7 Anping Li 2018-02-28 21:56:11 EST
Same issue with openshift3/ose-ansible/images/v3.7?
Comment 10 Anping Li 2018-03-06 01:45:16 EST
Pass with openshift-ansible:v3.7.36.
Comment 15 errata-xmlrpc 2018-04-05 05:38:31 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:0636
