Cause: When creating an ES cluster of size 3+, the node quorum and recovery settings prevent the first ES node from ever reaching a ready and green state in time during a fresh install.
Consequence: The playbook times out waiting for the first ES node to be ready.
Fix: When we create new ES nodes, we no longer wait for each one to become healthy, because the changed recovery settings and quorum require all nodes to be running at the same time.
Result: We no longer see the playbook time out when creating large clusters of ES nodes.
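To illustrate why a single node cannot become green on its own, here is a minimal sketch of the quorum arithmetic. It assumes the usual majority rule, floor(N/2) + 1; the report does not state the exact formula the playbook uses, and the function name `quorum` is hypothetical.

```shell
#!/bin/sh
# Assumption: quorum is a simple majority of the cluster size, floor(N/2) + 1.
# This is an illustration, not the playbook's own code.
quorum() {
  echo $(( $1 / 2 + 1 ))
}

# Print the quorum for a few example cluster sizes.
for size in 1 3 5; do
  echo "cluster_size=$size quorum=$(quorum "$size")"
done
```

Under this assumption a cluster of size 3 needs 2 nodes before it can form, so a playbook that waits for the first node alone to report healthy can only time out.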
Description of problem:
Unable to fully execute Aggregated Logging playbook when specifying multiple replicas of Elasticsearch.
Fails to roll out Elasticsearch replicas:
FAILED - RETRYING: Waiting for logging-es-data-master-hsjwgec4 to finish scaling up (60 retries left).
FAILED - RETRYING: Waiting for logging-es-data-master-hsjwgec4 to finish scaling up (59 retries left).
FAILED - RETRYING: Waiting for logging-es-data-master-hsjwgec4 to finish scaling up (58 retries left).
FAILED - RETRYING: Waiting for logging-es-data-master-hsjwgec4 to finish scaling up (57 retries left).
The issue can be overcome by specifying the following inventory variable:
logging_elasticsearch_rollout_override=true
Once playbook completes, each Elasticsearch DeploymentConfig can be rolled out
Version-Release number of selected component (if applicable):
3.7.23
How reproducible:
Always
Steps to Reproduce:
1. Specify multiple replicas of Elasticsearch in inventory
openshift_logging_es_number_of_replicas
2. Execute Aggregated Logging playbook
ansible-playbook [-i </path/to/inventory>] \
/usr/share/ansible/openshift-ansible/playbooks/byo/openshift-cluster/openshift-logging.yml
Actual results:
Playbook fails as Elasticsearch never becomes ready
Expected results:
Aggregated logging playbook completes successfully
Additional info:
This should resolve the fresh install issue: https://github.com/openshift/openshift-ansible/pull/7097
When you say that the upgrade fails, is it that the playbook ultimately fails, or that "FAILED - RETRYING: Waiting for logging-es-data-master-hsjwgec4 to finish scaling up (# retries left)." shows up in the logs a lot?
Also to clarify, when you say upgrade you do mean there is an existing deployment of logging and it is being upgraded? (not just that OCP is being upgraded and a fresh installation of logging is being installed).
I've been using the below command to deploy the ES pods after using Andy's workaround:
for x in $(oc get dc -l component=es -o=custom-columns=NAME:.metadata.name --no-headers); do oc rollout latest $x; done;
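A slightly more defensive variant of the loop above, as a sketch only: it starts every rollout first so the nodes come up together, then waits on each one with `oc rollout status`. The function name `rollout_es` is hypothetical, and the sketch assumes the same `component=es` label used in the command above.

```shell
# Roll out every Elasticsearch DeploymentConfig, then wait for each rollout.
# Assumption: the DCs carry the label component=es, as in the one-liner above.
rollout_es() {
  dcs=$(oc get dc -l component=es -o=custom-columns=NAME:.metadata.name --no-headers) || return 1

  # Trigger all rollouts before waiting, so the nodes can reach quorum together.
  for dc in $dcs; do
    oc rollout latest "dc/$dc" || return 1
  done

  # Only now block until each rollout reports completion.
  for dc in $dcs; do
    oc rollout status "dc/$dc" || return 1
  done
}
```

Call `rollout_es` after the playbook completes. Waiting only after all rollouts have been triggered avoids recreating the original problem of blocking on a single node that cannot become ready by itself.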
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.
For information on the advisory, and where to find the updated
files, follow the link below.
If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHBA-2018:0636
Comment 17 Red Hat Bugzilla
2023-09-15 00:06:27 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 500 days