Bug 1848606 - ES DCs are rolled out unnecessary amount of times.
Summary: ES DCs are rolled out unnecessary amount of times.
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Logging
Version: 3.11.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: 3.11.z
Assignee: ewolinet
QA Contact: Anping Li
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2020-06-18 15:25 UTC by Nicolas Nosenzo
Modified: 2023-12-15 18:13 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-08-26 22:44:37 UTC
Target Upstream Version:
Embargoed:


Links
System ID Private Priority Status Summary Last Updated
Github openshift openshift-ansible pull 12204 0 None closed Bug 1848606: Fix repeating logging restarts 2020-09-30 01:05:46 UTC
Red Hat Product Errata RHBA-2020:3245 0 None None None 2020-08-26 22:44:51 UTC

Description Nicolas Nosenzo 2020-06-18 15:25:13 UTC
Description of problem:

Each ES DC is rolled out once per ES instance in the cluster. In a cluster with 9 ES instances, each ES deployment is therefore rolled out 9 times, which greatly increases the upgrade time.

The logs below confirm this on a 3-node ES cluster, where each DC is rolled out 3 times. First, the number of times the "oc rollout latest" line is called within the playbook (taken from playbook output in verbose mode):
~~~
$ grep "restart_es_node.yml:26" full-log-ansible.log__20200616191914
2020-06-16 11:38:48,996 p=22003 u=root |  task path: /usr/share/ansible/openshift-ansible/roles/openshift_logging_elasticsearch/tasks/restart_es_node.yml:26
2020-06-16 12:40:00,117 p=22003 u=root |  task path: /usr/share/ansible/openshift-ansible/roles/openshift_logging_elasticsearch/tasks/restart_es_node.yml:26
2020-06-16 13:33:04,978 p=22003 u=root |  task path: /usr/share/ansible/openshift-ansible/roles/openshift_logging_elasticsearch/tasks/restart_es_node.yml:26
2020-06-16 14:11:25,803 p=22003 u=root |  task path: /usr/share/ansible/openshift-ansible/roles/openshift_logging_elasticsearch/tasks/restart_es_node.yml:26
2020-06-16 14:42:49,578 p=22003 u=root |  task path: /usr/share/ansible/openshift-ansible/roles/openshift_logging_elasticsearch/tasks/restart_es_node.yml:26
2020-06-16 15:14:12,514 p=22003 u=root |  task path: /usr/share/ansible/openshift-ansible/roles/openshift_logging_elasticsearch/tasks/restart_es_node.yml:26
2020-06-16 15:46:55,380 p=22003 u=root |  task path: /usr/share/ansible/openshift-ansible/roles/openshift_logging_elasticsearch/tasks/restart_es_node.yml:26
2020-06-16 16:14:16,378 p=22003 u=root |  task path: /usr/share/ansible/openshift-ansible/roles/openshift_logging_elasticsearch/tasks/restart_es_node.yml:26
2020-06-16 16:42:38,227 p=22003 u=root |  task path: /usr/share/ansible/openshift-ansible/roles/openshift_logging_elasticsearch/tasks/restart_es_node.yml:26
~~~

Broken down by DC, each DC is rolled out 3 times:
~~~
$ grep '"oc", "--config=/etc/origin/master/admin.kubeconfig", "rollout", "latest"' full-log-ansible.log__20200616191914
2020-06-16 11:38:49,699 p=22003 u=root |  changed: [appaotel3-mst-01] => {"changed": true, "cmd": ["oc", "--config=/etc/origin/master/admin.kubeconfig", "rollout", "latest", "logging-es-data-master-fbg4igk6", "-n", "openshift-logging"], "delta": "0:00:00.235072", "end": "2020-06-16 11:38:49.675674", "rc": 0, "start": "2020-06-16 11:38:49.440602", "stderr": "", "stderr_lines": [], "stdout": "deploymentconfig.apps.openshift.io/logging-es-data-master-fbg4igk6 rolled out", "stdout_lines": ["deploymentconfig.apps.openshift.io/logging-es-data-master-fbg4igk6 rolled out"]}
2020-06-16 12:40:00,745 p=22003 u=root |  changed: [appaotel3-mst-01] => {"changed": true, "cmd": ["oc", "--config=/etc/origin/master/admin.kubeconfig", "rollout", "latest", "logging-es-data-master-qyqv3tao", "-n", "openshift-logging"], "delta": "0:00:00.194361", "end": "2020-06-16 12:40:00.726272", "rc": 0, "start": "2020-06-16 12:40:00.531911", "stderr": "", "stderr_lines": [], "stdout": "deploymentconfig.apps.openshift.io/logging-es-data-master-qyqv3tao rolled out", "stdout_lines": ["deploymentconfig.apps.openshift.io/logging-es-data-master-qyqv3tao rolled out"]}
2020-06-16 13:33:05,632 p=22003 u=root |  changed: [appaotel3-mst-01] => {"changed": true, "cmd": ["oc", "--config=/etc/origin/master/admin.kubeconfig", "rollout", "latest", "logging-es-data-master-slamchi3", "-n", "openshift-logging"], "delta": "0:00:00.209351", "end": "2020-06-16 13:33:05.612076", "rc": 0, "start": "2020-06-16 13:33:05.402725", "stderr": "", "stderr_lines": [], "stdout": "deploymentconfig.apps.openshift.io/logging-es-data-master-slamchi3 rolled out", "stdout_lines": ["deploymentconfig.apps.openshift.io/logging-es-data-master-slamchi3 rolled out"]}
2020-06-16 14:11:26,472 p=22003 u=root |  changed: [appaotel3-mst-01] => {"changed": true, "cmd": ["oc", "--config=/etc/origin/master/admin.kubeconfig", "rollout", "latest", "logging-es-data-master-fbg4igk6", "-n", "openshift-logging"], "delta": "0:00:00.208160", "end": "2020-06-16 14:11:26.448430", "rc": 0, "start": "2020-06-16 14:11:26.240270", "stderr": "", "stderr_lines": [], "stdout": "deploymentconfig.apps.openshift.io/logging-es-data-master-fbg4igk6 rolled out", "stdout_lines": ["deploymentconfig.apps.openshift.io/logging-es-data-master-fbg4igk6 rolled out"]}
2020-06-16 14:42:50,273 p=22003 u=root |  changed: [appaotel3-mst-01] => {"changed": true, "cmd": ["oc", "--config=/etc/origin/master/admin.kubeconfig", "rollout", "latest", "logging-es-data-master-qyqv3tao", "-n", "openshift-logging"], "delta": "0:00:00.239782", "end": "2020-06-16 14:42:50.251568", "rc": 0, "start": "2020-06-16 14:42:50.011786", "stderr": "", "stderr_lines": [], "stdout": "deploymentconfig.apps.openshift.io/logging-es-data-master-qyqv3tao rolled out", "stdout_lines": ["deploymentconfig.apps.openshift.io/logging-es-data-master-qyqv3tao rolled out"]}
2020-06-16 15:14:13,146 p=22003 u=root |  changed: [appaotel3-mst-01] => {"changed": true, "cmd": ["oc", "--config=/etc/origin/master/admin.kubeconfig", "rollout", "latest", "logging-es-data-master-slamchi3", "-n", "openshift-logging"], "delta": "0:00:00.202501", "end": "2020-06-16 15:14:13.126969", "rc": 0, "start": "2020-06-16 15:14:12.924468", "stderr": "", "stderr_lines": [], "stdout": "deploymentconfig.apps.openshift.io/logging-es-data-master-slamchi3 rolled out", "stdout_lines": ["deploymentconfig.apps.openshift.io/logging-es-data-master-slamchi3 rolled out"]}
2020-06-16 15:46:56,057 p=22003 u=root |  changed: [appaotel3-mst-01] => {"changed": true, "cmd": ["oc", "--config=/etc/origin/master/admin.kubeconfig", "rollout", "latest", "logging-es-data-master-fbg4igk6", "-n", "openshift-logging"], "delta": "0:00:00.225226", "end": "2020-06-16 15:46:56.035525", "rc": 0, "start": "2020-06-16 15:46:55.810299", "stderr": "", "stderr_lines": [], "stdout": "deploymentconfig.apps.openshift.io/logging-es-data-master-fbg4igk6 rolled out", "stdout_lines": ["deploymentconfig.apps.openshift.io/logging-es-data-master-fbg4igk6 rolled out"]}
2020-06-16 16:14:17,027 p=22003 u=root |  changed: [appaotel3-mst-01] => {"changed": true, "cmd": ["oc", "--config=/etc/origin/master/admin.kubeconfig", "rollout", "latest", "logging-es-data-master-qyqv3tao", "-n", "openshift-logging"], "delta": "0:00:00.188399", "end": "2020-06-16 16:14:17.006682", "rc": 0, "start": "2020-06-16 16:14:16.818283", "stderr": "", "stderr_lines": [], "stdout": "deploymentconfig.apps.openshift.io/logging-es-data-master-qyqv3tao rolled out", "stdout_lines": ["deploymentconfig.apps.openshift.io/logging-es-data-master-qyqv3tao rolled out"]}
2020-06-16 16:42:38,887 p=22003 u=root |  changed: [appaotel3-mst-01] => {"changed": true, "cmd": ["oc", "--config=/etc/origin/master/admin.kubeconfig", "rollout", "latest", "logging-es-data-master-slamchi3", "-n", "openshift-logging"], "delta": "0:00:00.218636", "end": "2020-06-16 16:42:38.867869", "rc": 0, "start": "2020-06-16 16:42:38.649233", "stderr": "", "stderr_lines": [], "stdout": "deploymentconfig.apps.openshift.io/logging-es-data-master-slamchi3 rolled out", "stdout_lines": ["deploymentconfig.apps.openshift.io/logging-es-data-master-slamchi3 rolled out"]}
~~~
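
The per-DC tally can also be computed rather than read off the grep output. The following is a minimal sketch, assuming the verbose log format shown in the excerpt above; the function name `count_rollouts_per_dc` and the regular expression are illustrative and are not part of openshift-ansible.

```python
import re
from collections import Counter

# Captures the DC name that follows "rollout", "latest" in the logged cmd list.
# The pattern is an assumption derived from the log excerpts in this report.
ROLLOUT_RE = re.compile(r'"rollout", "latest", "([^"]+)"')


def count_rollouts_per_dc(lines):
    """Return a Counter mapping each DC name to the number of rollouts logged."""
    counts = Counter()
    for line in lines:
        match = ROLLOUT_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

Passing the open log file (for example `open("full-log-ansible.log__20200616191914")`) as `lines` would give one entry per DC. For the nine lines quoted above, that is a count of 3 for each of the three DCs.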

Version-Release number of selected component (if applicable):
openshift-ansible-3.11.219-1.git.0.8845382.el7.noarch

How reproducible:
100%

Steps to Reproduce:
1. Deploy an HA ES cluster
2. Upgrade using openshift-ansible-3.11.219-1

Actual results:
Each ES DC is rolled out as many times as there are ES nodes in the cluster.

Expected results:
The number of rollouts per DC should be fixed and should not grow with the number of ES instances.

Additional info:

Comment 1 Jeff Cantrill 2020-06-18 19:29:35 UTC
Moving to UpcomingSprint, as this is unlikely to be resolved before EOS.

Comment 2 ewolinet 2020-06-18 19:45:39 UTC
Hi @Nicolas, 

Can you describe how the customer is running the playbook? It looks like the handler is being triggered once for each node (as the customer states).

Repeating handlers is a known Ansible issue [1]; however, looking at the code, we have already taken an approach that should prevent the handler from running each time [2].

However, if the variable "logging_elasticsearch_rollout_override" is set to "false" through an "-e" option, it will always override the value set by the handler.


[1] https://github.com/ansible/ansible/issues/49371
[2] https://github.com/openshift/openshift-ansible/blob/release-3.11/roles/openshift_logging_elasticsearch/handlers/main.yml#L11-L13
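
The override scenario described above can be illustrated with a toy model. This is only a sketch of the hypothesis, not the role's actual code: the function `simulate_rollouts`, its parameters, and the assumption that the role is invoked once per ES node are all hypothetical. The one Ansible behavior it relies on is that extra vars passed with "-e" take precedence over values set at runtime with set_fact.

```python
GUARD = "logging_elasticsearch_rollout_override"


def simulate_rollouts(dc_names, extra_vars=None):
    """Toy model: the role runs once per ES node and notifies a restart
    handler each time; the handler rolls out every DC unless the guard is true."""
    extra_vars = extra_vars or {}
    facts = {GUARD: False}  # runtime facts, as set_fact would record them
    rollouts = []

    def guard_is_set():
        # Extra vars (-e) win over facts set during the run.
        return extra_vars.get(GUARD, facts[GUARD])

    for _node in dc_names:            # one (assumed) role invocation per ES node
        if not guard_is_set():        # handler is skipped once the guard holds
            rollouts.extend(dc_names)
        facts[GUARD] = True           # handler records that the restart happened
    return rollouts
```

With three DCs and no extra vars, the model performs 3 rollouts in total. With the guard pinned to false through extra vars, it performs 9, matching the reported symptom. This shows why the question about "-e" matters, but it does not establish the root cause of this bug.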

Comment 3 Nicolas Nosenzo 2020-06-19 13:03:49 UTC
Hi Eric, 

I'm not sure, but I think that's unlikely. From the Ansible logs:

$ grep logging_elasticsearch_rollout_override full-log-ansible.log__20200616191914
2020-06-16 17:14:00,947 p=22003 u=root |  ok: [node3-mst-01] => {"ansible_facts": {"logging_elasticsearch_rollout_override": true}, "changed": false}
2020-06-16 17:14:01,530 p=22003 u=root |  ok: [node3-mst-01] => {"ansible_facts": {"logging_elasticsearch_rollout_override": true}, "changed": false}
2020-06-16 17:14:02,156 p=22003 u=root |  ok: [node3-mst-01] => {"ansible_facts": {"logging_elasticsearch_rollout_override": true}, "changed": false}

Anyway, I will confirm what the calling command looks like.
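
To line up these fact assignments against the rollout timestamps, the log lines can be parsed into (time, value) pairs. The following is a minimal sketch, assuming the log line format shown in this comment; `override_fact_events` and both regular expressions are illustrative names, not part of Ansible or openshift-ansible.

```python
import re
from datetime import datetime

# Leading timestamp such as "2020-06-16 17:14:00,947" (milliseconds are dropped).
TS_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d{3} ")
# Value of the guard variable as printed in the ansible_facts JSON.
FACT_RE = re.compile(r'"logging_elasticsearch_rollout_override": (true|false)')


def override_fact_events(lines):
    """Return a list of (datetime, bool) pairs, one per line that sets the fact."""
    events = []
    for line in lines:
        ts = TS_RE.match(line)
        fact = FACT_RE.search(line)
        if ts and fact:
            when = datetime.strptime(ts.group(1), "%Y-%m-%d %H:%M:%S")
            events.append((when, fact.group(1) == "true"))
    return events
```

The resulting timestamps can then be compared with those of the "oc rollout latest" calls from the description to see when the variable was set relative to each rollout.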

Comment 4 Jeff Cantrill 2020-06-19 14:29:35 UTC
Moving to UpcomingSprint as awaiting feedback

Comment 5 Nicolas Nosenzo 2020-06-25 09:10:44 UTC
@Eric, here is the playbook call:

$ ansible-playbook -i -u linux -b /usr/share/ansible/XXXXXXX/ose/ --private-key /opt/XXXXX/XXXXXX/ssh-keys/XXXX.pem" /usr/share/ansible/openshift-ansible/playbooks/openshift-logging/config.yml

Comment 6 ewolinet 2020-06-29 19:21:10 UTC
@Nicolas,

Can you please provide the full ansible logs and the inventory file?

Comment 7 Nicolas Nosenzo 2020-06-30 11:47:54 UTC
@Eric, uploading files.

Comment 11 Periklis Tsirakidis 2020-07-06 08:10:41 UTC
Added the UpcomingSprint label, as the ongoing investigation is unlikely to close before EOS.

Comment 17 Anping Li 2020-07-22 07:38:55 UTC
The fix is not in openshift-ansible-3.11.248-1.git.0.fd212c7.el7

Comment 22 Anping Li 2020-08-04 11:49:09 UTC
Verified on openshift-ansible-3.11.256

Comment 23 Anping Li 2020-08-04 12:21:33 UTC
Verified on openshift-ansible-3.11.252 too

Comment 25 errata-xmlrpc 2020-08-26 22:44:37 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 3.11.272 bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:3245

