Description of problem:
Scaling up nodes fails at task [Apply ignition manifest].
Failing task: https://github.com/openshift/openshift-ansible/blob/master/roles/openshift_node/tasks/config.yml#L78

Version-Release number of the following components:
openshift-ansible-4.1.0-201904091404
ansible 2.7.9
  config file = /home/wmeng/openshift/openshift-ansible/ansible.cfg
  configured module search path = ['/home/wmeng/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
  ansible python module location = /usr/local/lib/python3.6/site-packages/ansible
  executable location = /usr/local/bin/ansible
  python version = 3.6.6 (default, Mar 29 2019, 00:03:27) [GCC 4.8.5 20150623 (Red Hat 4.8.5-36)]

How reproducible:
Always

Steps to Reproduce (following https://github.com/openshift/openshift-ansible/blob/master/README.md):
1. Set up an OCP4 cluster.
2. Prepare new_worker VMs.
3. Run the scaleup playbook:
   $ ansible-playbook -i inventory/hosts playbooks/scaleup.yml

Actual results:
The playbook fails because the new hosts become UNREACHABLE:

<ec2-54-199-196-17.ap-northeast-1.compute.amazonaws.com> SSH: EXEC ssh -o ControlMaster=auto -o ControlPersist=600s -o StrictHostKeyChecking=no -o 'IdentityFile="/home/wmeng/shared-secrets/aws/libra.pem"' -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o User=ec2-user -o ConnectTimeout=30 -o ControlPath=/home/wmeng/.ansible/cp/%h-%r -tt ec2-54-199-196-17.ap-northeast-1.compute.amazonaws.com '/bin/sh -c '"'"'sudo -H -S -n -u root /bin/sh -c '"'"'"'"'"'"'"'"'echo BECOME-SUCCESS-pwwpajzkopvjdnttsmmyispqpjrskwdv; /usr/bin/python /home/ec2-user/.ansible/tmp/ansible-tmp-1554971512.1921854-243809547831033/async_wrapper.py 339131747332 900 /home/ec2-user/.ansible/tmp/ansible-tmp-1554971512.1921854-243809547831033/AnsiballZ_command.py _'"'"'"'"'"'"'"'"' && sleep 0'"'"''
Escalation succeeded
<ec2-54-199-196-17.ap-northeast-1.compute.amazonaws.com> (255, b'{"started": 1, "_ansible_suppress_tmpdir_delete": true, "finished": 0, "results_file": "/root/.ansible_async/339131747332.4000", "ansible_job_id": "339131747332.4000"}\r\n', b'Shared connection to ec2-54-199-196-17.ap-northeast-1.compute.amazonaws.com closed.\r\n')
fatal: [ec2-54-199-196-17.ap-northeast-1.compute.amazonaws.com]: UNREACHABLE! => {
    "changed": false,
    "msg": "Failed to connect to the host via ssh: Shared connection to ec2-54-199-196-17.ap-northeast-1.compute.amazonaws.com closed.",
    "unreachable": true
}

PLAY RECAP *********************************************************************
ec2-18-182-49-196.ap-northeast-1.compute.amazonaws.com : ok=18 changed=7 unreachable=1 failed=0
ec2-54-199-196-17.ap-northeast-1.compute.amazonaws.com : ok=18 changed=7 unreachable=1 failed=0
ec2-54-95-137-65.ap-northeast-1.compute.amazonaws.com  : ok=18 changed=7 unreachable=1 failed=0
localhost                                              : ok=0  changed=0 unreachable=0 failed=0

Thursday 11 April 2019 04:39:00 -0400 (0:07:08.795) 0:08:59.703 ********
===============================================================================
openshift_node : Apply ignition manifest ----------------------------- 428.80s
/home/wmeng/openshift/openshift-ansible/roles/openshift_node/tasks/config.yml:78

Expected results:
The ansible-playbook run completes successfully.
Please attach full ansible -vvv logs. Please ensure openshift-ansible/ansible.cfg is being used during playbook execution. The task which failed could be related to not having ssh retries configured. https://github.com/openshift/openshift-ansible/blob/master/ansible.cfg#L38
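For reference, SSH retries are configured in the `[ssh_connection]` section of ansible.cfg; with retries enabled, a transient disconnect (such as a node rebooting) is retried instead of immediately marking the host unreachable. A minimal sketch (the exact retry count here is illustrative, not necessarily what the repo ships):

```ini
[ssh_connection]
# Retry the SSH connection this many times before declaring the host
# unreachable; helps ride out brief outages such as node reboots.
retries = 8
```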
Logs show task [Apply ignition manifest] was retried 15 times and failed after roughly 7 minutes.
It appears the node was rebooted but did not come back up afterwards. There is not much Ansible can do about that. Is this reproducible? Is there a console log from the node at the time the task failed?
Created attachment 1555403 [details] ec2-console-log
(In reply to Vadim Rutkovsky from comment #4) > It appears the node was rebooted, but it doesn't come up after it. Not much > Ansible can do about this. > > Is this reproducible? Is there a console log from the node when task failed? It can be reproduced; console log attached.
Changes to the way reboots are handled while applying the ignition manifest have merged in openshift-ansible-4.1.0-201904161832. Please test again.
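The merged change lives in openshift-ansible itself; as a rough illustration of the general pattern (a hypothetical sketch, not the actual patch), an Ansible play that reboots a node and then waits for it to come back before continuing might look like:

```yaml
# Hypothetical sketch of reboot handling -- not the actual openshift-ansible change.
- name: Reboot the node to apply the ignition manifest
  reboot:
    reboot_timeout: 600       # wait up to 10 minutes for the node to return

- name: Wait for the node to accept SSH connections again
  wait_for_connection:
    delay: 30                 # give the node time to actually go down first
    timeout: 600
```

Structuring the play this way keeps the SSH retry logic in Ansible's connection layer from racing against the reboot.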
Fixed. With openshift-ansible-4.1.0-201904201251.git.148.6de1227.el7.noarch the scaleup playbook finishes successfully:

PLAY RECAP *********************************************************************
ec2-52-194-221-152.ap-northeast-1.compute.amazonaws.com : ok=20 changed=14 unreachable=0 failed=0
localhost                                               : ok=0  changed=0  unreachable=0 failed=0

Sunday 21 April 2019 20:33:46 -0400 (0:00:40.746) 0:05:21.560 **********
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:0758