Description of problem:

We are trying to upgrade an OpenShift 3.11 cluster to version 219 (from 154). When upgrading the EFK stack, the playbook fails while restarting the logging-es-ops cluster. This is because the openshift_logging_es_ops_cluster_size variable is not used when counting the running pods of the ops cluster.

We are running this playbook:

/usr/share/ansible/openshift-ansible/playbooks/openshift-logging/config.yml

with openshift_logging_es_ops_cluster_size set to 3 and openshift_logging_es_cluster_size set to 5.

As you can see in the code, the task always uses the openshift_logging_es_cluster_size variable in the until clause:

~~~
## get all pods for the cluster
- command: >
    {{ openshift_client_binary }}
    --config={{ openshift.common.config_base }}/master/admin.kubeconfig
    get pod
    -l component={{ _cluster_component }},provider=openshift
    -n {{ openshift_logging_elasticsearch_namespace }}
    -o jsonpath={.items[?(@.status.phase==\"Running\")].metadata.name}
  register: _cluster_pods
  retries: "{{ __elasticsearch_ready_retries }}"
  delay: 5
  until:
    - _cluster_pods.stdout is defined
    - _cluster_pods.stdout == "" or _cluster_pods.stdout.split(' ') | count == openshift_logging_es_cluster_size
~~~

https://github.com/openshift/openshift-ansible/blob/openshift-ansible-3.11.286-1/roles/openshift_logging_elasticsearch/tasks/restart_cluster.yml#L15

As there are 3 logging-es-ops nodes instead of 5, the running-pod count never matches, so this check fails once the retries are exhausted.

Version-Release number of selected component (if applicable):
OCP 3.11

How reproducible:
100%. The customer has hit this issue every time, on different clusters.

Steps to Reproduce:
1. Set openshift_logging_es_ops_cluster_size to 3 and openshift_logging_es_cluster_size to 5 in the inventory.
2. Run /usr/share/ansible/openshift-ansible/playbooks/openshift-logging/config.yml to upgrade the logging stack.
3. Observe the failure when the playbook restarts the logging-es-ops cluster.

Actual results:
The upgrade of the logging stack fails.

Expected results:
Ideally, the playbook would check whether openshift_logging_es_ops_cluster_size is set for the logging-ops stack and, if so, use that variable when restarting the ops cluster.
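A minimal sketch of the behaviour described under Expected results, for illustration only and not necessarily how the upstream fix was written. It assumes that `_cluster_component` is `es-ops` when the ops cluster is being restarted (an assumption, not confirmed in this report) and introduces a hypothetical `_expected_cluster_size` fact:

~~~
## HYPOTHETICAL sketch: choose the expected pod count for the cluster being restarted.
## Assumes _cluster_component == 'es-ops' for the ops cluster.
- set_fact:
    _expected_cluster_size: "{{ openshift_logging_es_ops_cluster_size if _cluster_component == 'es-ops' else openshift_logging_es_cluster_size }}"

## get all pods for the cluster
- command: >
    {{ openshift_client_binary }}
    --config={{ openshift.common.config_base }}/master/admin.kubeconfig
    get pod
    -l component={{ _cluster_component }},provider=openshift
    -n {{ openshift_logging_elasticsearch_namespace }}
    -o jsonpath={.items[?(@.status.phase==\"Running\")].metadata.name}
  register: _cluster_pods
  retries: "{{ __elasticsearch_ready_retries }}"
  delay: 5
  until:
    - _cluster_pods.stdout is defined
    ## Cast to int: the templated fact may be a string, and 3 == "3" is false in Jinja2.
    - _cluster_pods.stdout == "" or _cluster_pods.stdout.split(' ') | count == _expected_cluster_size | int
~~~

The `| int` cast is a defensive choice: the size may arrive as a string (for example, INI inventory values declared in a `:vars` section are read as strings), and comparing that against the integer returned by `count` would never succeed.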
Setting UpcomingSprint, as we are unable to resolve this before EOD.
The fix is not in the package openshift-ansible-3.11.380-1.git.0.983c5d1.el7.noarch
It looks like this fix did not make it into 3.11.380-1, but it should be included in 3.11.381-1 when that is released.
Created attachment 1758430: the inventory and playbook logs (openshift-ansible-3.11.391-1.git.0.aa2204f.el7.noarch)
Verified on openshift-ansible-roles-3.11.394-6.git.0.47ec25d.el7.noarch
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: OpenShift Container Platform 3.11.394 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:0637