Bug 1544243
Summary: | Elasticsearch fails to scale up during installation when multiple replicas specified | |
---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | Andrew Block <andrew.block> |
Component: | Logging | Assignee: | ewolinet |
Status: | CLOSED ERRATA | QA Contact: | Anping Li <anli> |
Severity: | high | Docs Contact: | |
Priority: | unspecified | |
Version: | 3.7.0 | CC: | andrew.block, aos-bugs, dmoessne, mmckinst, per.carlson, qitang, rmeggins, stwalter, tlarsson, wsun |
Target Milestone: | --- | ||
Target Release: | 3.7.z | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | | Doc Type: | Bug Fix
Doc Text: |
Cause: When creating an ES cluster of size 3 or more, the node quorum and recovery settings prevent the first ES node from ever reaching a ready and green state in time during a fresh install (the quorum arithmetic is sketched just after this table).
Consequence: The playbook times out waiting for the first ES node to be ready.
Fix: When creating new ES nodes, we no longer wait for each node to become healthy, since the recovery and quorum settings will have changed and all nodes need to be running at the same time before the cluster can recover.
Result: We no longer see the playbook time out when creating large clusters of ES nodes.
|
Story Points: | --- |
Clone Of: | | Environment: |
Last Closed: | 2018-04-05 09:38:31 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | 1540099, 1581058 | ||
Bug Blocks: |
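
A sketch of the quorum arithmetic behind the Cause above. The quorum formula is standard Elasticsearch behavior; the label selector, container name, certificate paths, and "logging" namespace below follow the usual OpenShift 3.x aggregated-logging conventions and should be treated as assumptions, not details taken from this bug:

    # With N master-eligible nodes, Elasticsearch requires a quorum of
    # floor(N/2) + 1 nodes to elect a master. For a 3-node cluster that
    # is 2 nodes, so the first node alone can never reach green.
    #
    # One way to watch cluster health from the first ES pod:
    pod=$(oc -n logging get pods -l component=es \
            -o jsonpath='{.items[0].metadata.name}')
    oc -n logging exec "$pod" -c elasticsearch -- \
      curl -s \
        --cacert /etc/elasticsearch/secret/admin-ca \
        --cert   /etc/elasticsearch/secret/admin-cert \
        --key    /etc/elasticsearch/secret/admin-key \
        'https://localhost:9200/_cluster/health?pretty'

While the quorum is unmet, the health call reports status "red" (or fails outright), which is what the playbook's readiness wait was timing out on.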
Description
Andrew Block
2018-02-11 16:02:34 UTC
Andy, is this during a fresh installation or an upgrade?

(In reply to ewolinet from comment #1)
> Andy,
> is this during a fresh installation or an upgrade?

It occurs on both installs and upgrades.

This should resolve the fresh install issue: https://github.com/openshift/openshift-ansible/pull/7097

When you say that the upgrade fails, does the playbook ultimately fail, or does "FAILED - RETRYING: Waiting for logging-es-data-master-hsjwgec4 to finish scaling up (# retries left)." just show up in the logs a lot? Also to clarify: when you say upgrade, do you mean there is an existing deployment of logging that is being upgraded (not just that OCP is being upgraded and a fresh installation of logging is being installed)?

I've been using the command below to redeploy the ES pods after applying Andy's workaround (an expanded sketch appears at the end of this thread):

    # Trigger a new deployment for every Elasticsearch DeploymentConfig.
    for x in $(oc get dc -l component=es -o custom-columns=NAME:.metadata.name --no-headers); do oc rollout latest "$x"; done

Same issue with openshift3/ose-ansible/images/v3.7?

Pass with openshift-ansible:v3.7.36.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:0636

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 500 days.
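
An expanded, hedged version of the workaround loop quoted above. `oc rollout latest` and `oc rollout status` are standard `oc` verbs; the "logging" namespace is an assumption about where the EFK stack lives. All rollouts are triggered before any waiting, since per the quorum discussion the ES nodes must come up together before the cluster can go green:

    #!/bin/bash
    set -euo pipefail
    ns=logging  # assumed namespace of the logging stack

    # Collect every Elasticsearch DeploymentConfig by label.
    dcs=$(oc -n "$ns" get dc -l component=es \
            -o custom-columns=NAME:.metadata.name --no-headers)

    # Trigger all rollouts first so the ES nodes start together.
    for dc in $dcs; do
      oc -n "$ns" rollout latest "dc/$dc"
    done

    # Only then wait for each deployment to converge.
    for dc in $dcs; do
      oc -n "$ns" rollout status "dc/$dc"
    done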