Description of problem:

Each ES DC is rolled out as many times as there are ES instances in the cluster. For a cluster with 9 ES instances, each ES deployment is rolled out 9 times, which greatly increases the upgrade time.

The logs below confirm this on a 3-node ES cluster: each DC is rolled out 3 times.

Number of times the "oc rollout latest" task is called within the playbook (taken from playbook output in verbose mode):

~~~
$ grep "restart_es_node.yml:26" full-log-ansible.log__20200616191914
2020-06-16 11:38:48,996 p=22003 u=root | task path: /usr/share/ansible/openshift-ansible/roles/openshift_logging_elasticsearch/tasks/restart_es_node.yml:26
2020-06-16 12:40:00,117 p=22003 u=root | task path: /usr/share/ansible/openshift-ansible/roles/openshift_logging_elasticsearch/tasks/restart_es_node.yml:26
2020-06-16 13:33:04,978 p=22003 u=root | task path: /usr/share/ansible/openshift-ansible/roles/openshift_logging_elasticsearch/tasks/restart_es_node.yml:26
2020-06-16 14:11:25,803 p=22003 u=root | task path: /usr/share/ansible/openshift-ansible/roles/openshift_logging_elasticsearch/tasks/restart_es_node.yml:26
2020-06-16 14:42:49,578 p=22003 u=root | task path: /usr/share/ansible/openshift-ansible/roles/openshift_logging_elasticsearch/tasks/restart_es_node.yml:26
2020-06-16 15:14:12,514 p=22003 u=root | task path: /usr/share/ansible/openshift-ansible/roles/openshift_logging_elasticsearch/tasks/restart_es_node.yml:26
2020-06-16 15:46:55,380 p=22003 u=root | task path: /usr/share/ansible/openshift-ansible/roles/openshift_logging_elasticsearch/tasks/restart_es_node.yml:26
2020-06-16 16:14:16,378 p=22003 u=root | task path: /usr/share/ansible/openshift-ansible/roles/openshift_logging_elasticsearch/tasks/restart_es_node.yml:26
2020-06-16 16:42:38,227 p=22003 u=root | task path: /usr/share/ansible/openshift-ansible/roles/openshift_logging_elasticsearch/tasks/restart_es_node.yml:26
~~~

Broken down by DC, each DC is rolled out 3 times:

~~~
$ grep '"oc", "--config=/etc/origin/master/admin.kubeconfig", "rollout", "latest"' full-log-ansible.log__20200616191914
2020-06-16 11:38:49,699 p=22003 u=root | changed: [appaotel3-mst-01] => {"changed": true, "cmd": ["oc", "--config=/etc/origin/master/admin.kubeconfig", "rollout", "latest", "logging-es-data-master-fbg4igk6", "-n", "openshift-logging"], "delta": "0:00:00.235072", "end": "2020-06-16 11:38:49.675674", "rc": 0, "start": "2020-06-16 11:38:49.440602", "stderr": "", "stderr_lines": [], "stdout": "deploymentconfig.apps.openshift.io/logging-es-data-master-fbg4igk6 rolled out", "stdout_lines": ["deploymentconfig.apps.openshift.io/logging-es-data-master-fbg4igk6 rolled out"]}
2020-06-16 12:40:00,745 p=22003 u=root | changed: [appaotel3-mst-01] => {"changed": true, "cmd": ["oc", "--config=/etc/origin/master/admin.kubeconfig", "rollout", "latest", "logging-es-data-master-qyqv3tao", "-n", "openshift-logging"], "delta": "0:00:00.194361", "end": "2020-06-16 12:40:00.726272", "rc": 0, "start": "2020-06-16 12:40:00.531911", "stderr": "", "stderr_lines": [], "stdout": "deploymentconfig.apps.openshift.io/logging-es-data-master-qyqv3tao rolled out", "stdout_lines": ["deploymentconfig.apps.openshift.io/logging-es-data-master-qyqv3tao rolled out"]}
2020-06-16 13:33:05,632 p=22003 u=root | changed: [appaotel3-mst-01] => {"changed": true, "cmd": ["oc", "--config=/etc/origin/master/admin.kubeconfig", "rollout", "latest", "logging-es-data-master-slamchi3", "-n", "openshift-logging"], "delta": "0:00:00.209351", "end": "2020-06-16 13:33:05.612076", "rc": 0, "start": "2020-06-16 13:33:05.402725", "stderr": "", "stderr_lines": [], "stdout": "deploymentconfig.apps.openshift.io/logging-es-data-master-slamchi3 rolled out", "stdout_lines": ["deploymentconfig.apps.openshift.io/logging-es-data-master-slamchi3 rolled out"]}
2020-06-16 14:11:26,472 p=22003 u=root | changed: [appaotel3-mst-01] => {"changed": true, "cmd": ["oc", "--config=/etc/origin/master/admin.kubeconfig", "rollout", "latest", "logging-es-data-master-fbg4igk6", "-n", "openshift-logging"], "delta": "0:00:00.208160", "end": "2020-06-16 14:11:26.448430", "rc": 0, "start": "2020-06-16 14:11:26.240270", "stderr": "", "stderr_lines": [], "stdout": "deploymentconfig.apps.openshift.io/logging-es-data-master-fbg4igk6 rolled out", "stdout_lines": ["deploymentconfig.apps.openshift.io/logging-es-data-master-fbg4igk6 rolled out"]}
2020-06-16 14:42:50,273 p=22003 u=root | changed: [appaotel3-mst-01] => {"changed": true, "cmd": ["oc", "--config=/etc/origin/master/admin.kubeconfig", "rollout", "latest", "logging-es-data-master-qyqv3tao", "-n", "openshift-logging"], "delta": "0:00:00.239782", "end": "2020-06-16 14:42:50.251568", "rc": 0, "start": "2020-06-16 14:42:50.011786", "stderr": "", "stderr_lines": [], "stdout": "deploymentconfig.apps.openshift.io/logging-es-data-master-qyqv3tao rolled out", "stdout_lines": ["deploymentconfig.apps.openshift.io/logging-es-data-master-qyqv3tao rolled out"]}
2020-06-16 15:14:13,146 p=22003 u=root | changed: [appaotel3-mst-01] => {"changed": true, "cmd": ["oc", "--config=/etc/origin/master/admin.kubeconfig", "rollout", "latest", "logging-es-data-master-slamchi3", "-n", "openshift-logging"], "delta": "0:00:00.202501", "end": "2020-06-16 15:14:13.126969", "rc": 0, "start": "2020-06-16 15:14:12.924468", "stderr": "", "stderr_lines": [], "stdout": "deploymentconfig.apps.openshift.io/logging-es-data-master-slamchi3 rolled out", "stdout_lines": ["deploymentconfig.apps.openshift.io/logging-es-data-master-slamchi3 rolled out"]}
2020-06-16 15:46:56,057 p=22003 u=root | changed: [appaotel3-mst-01] => {"changed": true, "cmd": ["oc", "--config=/etc/origin/master/admin.kubeconfig", "rollout", "latest", "logging-es-data-master-fbg4igk6", "-n", "openshift-logging"], "delta": "0:00:00.225226", "end": "2020-06-16 15:46:56.035525", "rc": 0, "start": "2020-06-16 15:46:55.810299", "stderr": "", "stderr_lines": [], "stdout": "deploymentconfig.apps.openshift.io/logging-es-data-master-fbg4igk6 rolled out", "stdout_lines": ["deploymentconfig.apps.openshift.io/logging-es-data-master-fbg4igk6 rolled out"]}
2020-06-16 16:14:17,027 p=22003 u=root | changed: [appaotel3-mst-01] => {"changed": true, "cmd": ["oc", "--config=/etc/origin/master/admin.kubeconfig", "rollout", "latest", "logging-es-data-master-qyqv3tao", "-n", "openshift-logging"], "delta": "0:00:00.188399", "end": "2020-06-16 16:14:17.006682", "rc": 0, "start": "2020-06-16 16:14:16.818283", "stderr": "", "stderr_lines": [], "stdout": "deploymentconfig.apps.openshift.io/logging-es-data-master-qyqv3tao rolled out", "stdout_lines": ["deploymentconfig.apps.openshift.io/logging-es-data-master-qyqv3tao rolled out"]}
2020-06-16 16:42:38,887 p=22003 u=root | changed: [appaotel3-mst-01] => {"changed": true, "cmd": ["oc", "--config=/etc/origin/master/admin.kubeconfig", "rollout", "latest", "logging-es-data-master-slamchi3", "-n", "openshift-logging"], "delta": "0:00:00.218636", "end": "2020-06-16 16:42:38.867869", "rc": 0, "start": "2020-06-16 16:42:38.649233", "stderr": "", "stderr_lines": [], "stdout": "deploymentconfig.apps.openshift.io/logging-es-data-master-slamchi3 rolled out", "stdout_lines": ["deploymentconfig.apps.openshift.io/logging-es-data-master-slamchi3 rolled out"]}
~~~

Version-Release number of selected component (if applicable):
openshift-ansible-3.11.219-1.git.0.8845382.el7.noarch

How reproducible:
100%

Steps to Reproduce:
1. Deploy a HA ES cluster
2. Upgrade using openshift-ansible-3.11.219-1
3.

Actual results:
Each ES DC is rolled out as many times as there are ES nodes in the cluster.

Expected results:
Rollouts should happen a fixed number of times, not increase with the number of ES instances.

Additional info:
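For anyone who wants to check their own verbose playbook log for the same pattern, the per-DC tally above can be reproduced with a small script. This is only a sketch: it assumes the log prints the task's "cmd" list in the same form as the lines quoted above, and the names ROLLOUT_RE and count_rollouts are made up for this example.

```python
import re
from collections import Counter

# Matches the argv list Ansible prints for the rollout task, for example:
#   "rollout", "latest", "logging-es-data-master-fbg4igk6"
# Assumption: the DC name is the argument right after "latest".
ROLLOUT_RE = re.compile(r'"rollout", "latest", "([^"]+)"')


def count_rollouts(lines):
    """Return a Counter mapping DC name to the number of rollout calls seen."""
    counts = Counter()
    for line in lines:
        match = ROLLOUT_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

Passing an open log file (any iterable of lines works) returns one entry per DC. A count above 1 for a DC within a single playbook run would indicate the repeated rollouts described in this report.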
Moving to UpcomingSprint, as this is unlikely to be resolved before EOS.
Hi @Nicolas,

Can you tell us how the customer is running the playbook? It looks like the handler is being triggered once for each node (as the customer states). Repeated handler runs are a known Ansible issue [1]; however, looking at the code, we have already taken an approach that should prevent the handler from running each time [2]. That said, if the variable "logging_elasticsearch_rollout_override" is set to "false" with an "-e" option, it will always override the value set by the handler.

[1] https://github.com/ansible/ansible/issues/49371
[2] https://github.com/openshift/openshift-ansible/blob/release-3.11/roles/openshift_logging_elasticsearch/handlers/main.yml#L11-L13
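To make the hypothesis concrete, here is a toy Python model of the guard. It is not the role's code. It assumes the handler is notified once per ES node, that one handler run rolls out every DC once, and that the handler then records the restart in the override variable. The function name simulate_upgrade and its parameters are invented for this sketch. The one Ansible fact it relies on is that extra vars passed with "-e" take precedence over values set with set_fact.

```python
def simulate_upgrade(es_nodes, extra_var_override=None):
    """Toy model: return the total number of DC rollouts in one playbook run.

    es_nodes: number of ES DCs (assumption: one handler notification per DC).
    extra_var_override: value passed with -e, or None when it is not passed.
    """
    fact_override = False  # value the handler would record with set_fact
    rollouts = 0
    for _ in range(es_nodes):
        # Extra vars have the highest precedence, so they mask the fact.
        if extra_var_override is None:
            effective = fact_override
        else:
            effective = extra_var_override
        if not effective:
            rollouts += es_nodes  # one cluster restart rolls out every DC once
            fact_override = True  # handler records that the restart happened
    return rollouts
```

Under these assumptions, a 3-node cluster gets 3 rollouts in total when the variable is not passed on the command line, and 9 when it is pinned to false with "-e", which would match the count reported in the description.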
Hi Eric,

I'm not sure, but I think that's unlikely. From the Ansible logs:

$ grep logging_elasticsearch_rollout_override full-log-ansible.log__20200616191914
2020-06-16 17:14:00,947 p=22003 u=root | ok: [node3-mst-01] => {"ansible_facts": {"logging_elasticsearch_rollout_override": true}, "changed": false}
2020-06-16 17:14:01,530 p=22003 u=root | ok: [node3-mst-01] => {"ansible_facts": {"logging_elasticsearch_rollout_override": true}, "changed": false}
2020-06-16 17:14:02,156 p=22003 u=root | ok: [node3-mst-01] => {"ansible_facts": {"logging_elasticsearch_rollout_override": true}, "changed": false}

Anyway, I will confirm what the calling command looks like.
Moving to UpcomingSprint as awaiting feedback
@Eric, here is the playbook call:

$ ansible-playbook -i -u linux -b /usr/share/ansible/XXXXXXX/ose/ --private-key /opt/XXXXX/XXXXXX/ssh-keys/XXXX.pem" /usr/share/ansible/openshift-ansible/playbooks/openshift-logging/config.yml
@Nicolas, Can you please provide the full ansible logs and the inventory file?
@Eric, uploading files.
Adding the UpcomingSprint label, as the investigation is ongoing and unlikely to close before EOS.
The fix is not in openshift-ansible-3.11.248-1.git.0.fd212c7.el7
Verified on openshift-ansible-3.11.256
Verified on openshift-ansible-3.11.252 too
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 3.11.272 bug fix and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:3245