Bug 1698814 - Scale up nodes failed at task [Apply ignition manifest]
Summary: Scale up nodes failed at task [Apply ignition manifest]
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Installer
Version: 4.1.0
Hardware: Unspecified
OS: Unspecified
Target Milestone: ---
Target Release: 4.1.0
Assignee: Russell Teague
QA Contact: Weihua Meng
Depends On:
Reported: 2019-04-11 09:05 UTC by Weihua Meng
Modified: 2019-06-04 10:47 UTC (History)

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Last Closed: 2019-06-04 10:47:22 UTC
Target Upstream Version:

Attachments
ec2-console-log (62.99 KB, text/plain)
2019-04-16 08:03 UTC, Weihua Meng

System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2019:0758 0 None None None 2019-06-04 10:47:30 UTC

Description Weihua Meng 2019-04-11 09:05:39 UTC
Description of problem:
Scale up nodes failed at task [Apply ignition manifest]

Version-Release number of the following components:

ansible 2.7.9
  config file = /home/wmeng/openshift/openshift-ansible/ansible.cfg
  configured module search path = ['/home/wmeng/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
  ansible python module location = /usr/local/lib/python3.6/site-packages/ansible
  executable location = /usr/local/bin/ansible
  python version = 3.6.6 (default, Mar 29 2019, 00:03:27) [GCC 4.8.5 20150623 (Red Hat 4.8.5-36)]


How reproducible:

Steps to Reproduce:
Follow https://github.com/openshift/openshift-ansible/blob/master/README.md
1. set up OCP4 cluster
2. prepare new_worker vms
3. run scaleup playbook
$ ansible-playbook -i inventory/hosts playbooks/scaleup.yml
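For reference, an inventory along the lines the README describes might look like the sketch below. The group and variable names follow the openshift-ansible README; the hostnames and the ansible_user value are placeholders, not taken from this report.

```ini
# Hypothetical inventory/hosts sketch for the scaleup playbook.
# Hostnames below are placeholders; consult the README for the
# exact variables your openshift-ansible release expects.
[new_workers]
new-worker-0.example.com
new-worker-1.example.com

[new_workers:vars]
ansible_user=ec2-user
```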

Actual results:
failed due to host UNREACHABLE

<ec2-54-199-196-17.ap-northeast-1.compute.amazonaws.com> SSH: EXEC ssh -o ControlMaster=auto -o ControlPersist=600s -o StrictHostKeyChecking=no -o 'IdentityFile="/home/wmeng/shared-secrets/aws/libra.pem"' -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o User=ec2-user -o ConnectTimeout=30 -o ControlPath=/home/wmeng/.ansible/cp/%h-%r -tt ec2-54-199-196-17.ap-northeast-1.compute.amazonaws.com '/bin/sh -c '"'"'sudo -H -S -n -u root /bin/sh -c '"'"'"'"'"'"'"'"'echo BECOME-SUCCESS-pwwpajzkopvjdnttsmmyispqpjrskwdv; /usr/bin/python /home/ec2-user/.ansible/tmp/ansible-tmp-1554971512.1921854-243809547831033/async_wrapper.py 339131747332 900 /home/ec2-user/.ansible/tmp/ansible-tmp-1554971512.1921854-243809547831033/AnsiballZ_command.py _'"'"'"'"'"'"'"'"' && sleep 0'"'"''
Escalation succeeded
<ec2-54-199-196-17.ap-northeast-1.compute.amazonaws.com> (255, b'{"started": 1, "_ansible_suppress_tmpdir_delete": true, "finished": 0, "results_file": "/root/.ansible_async/339131747332.4000", "ansible_job_id": "339131747332.4000"}\r\n', b'Shared connection to ec2-54-199-196-17.ap-northeast-1.compute.amazonaws.com closed.\r\n')
fatal: [ec2-54-199-196-17.ap-northeast-1.compute.amazonaws.com]: UNREACHABLE! => {
    "changed": false,
    "msg": "Failed to connect to the host via ssh: Shared connection to ec2-54-199-196-17.ap-northeast-1.compute.amazonaws.com closed.",
    "unreachable": true
}

PLAY RECAP ********************************************************************************************************************************************************************************************************
ec2-18-182-49-196.ap-northeast-1.compute.amazonaws.com : ok=18   changed=7    unreachable=1    failed=0   
ec2-54-199-196-17.ap-northeast-1.compute.amazonaws.com : ok=18   changed=7    unreachable=1    failed=0   
ec2-54-95-137-65.ap-northeast-1.compute.amazonaws.com : ok=18   changed=7    unreachable=1    failed=0   
localhost                  : ok=0    changed=0    unreachable=0    failed=0   

Thursday 11 April 2019  04:39:00 -0400 (0:07:08.795)       0:08:59.703 ******** 
openshift_node : Apply ignition manifest ----------------------------------------------------------------------------------------------------------------------------------------------------------------- 428.80s
/home/wmeng/openshift/openshift-ansible/roles/openshift_node/tasks/config.yml:78 ---------------------------------------------------------------------------------------------------------------------------------

Expected results:
Ansible playbook completes successfully.

Comment 1 Russell Teague 2019-04-11 12:35:09 UTC
Please attach full ansible -vvv logs.

Please ensure openshift-ansible/ansible.cfg is being used during playbook execution.  The task which failed could be related to not having ssh retries configured.
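For context, SSH retries are an Ansible setting read from ansible.cfg, under the [ssh_connection] section; running the playbook from outside the openshift-ansible checkout means its ansible.cfg (and any retry setting in it) is not picked up. A minimal sketch, with an illustrative retry count rather than the value shipped in the repository:

```ini
# Illustrative ansible.cfg fragment; the retry count here is an
# example, not the value shipped in openshift-ansible/ansible.cfg.
# Can also be set via the ANSIBLE_SSH_RETRIES environment variable.
[ssh_connection]
retries = 15
```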


Comment 3 Russell Teague 2019-04-12 12:20:48 UTC
Logs show task [Apply ignition manifest] was tried 15 times and failed after 7 minutes.

Comment 4 Vadim Rutkovsky 2019-04-12 12:31:30 UTC
It appears the node was rebooted, but it doesn't come up after it. Not much Ansible can do about this.

Is this reproducible? Is there a console log from the node when task failed?

Comment 8 Weihua Meng 2019-04-16 08:03:37 UTC
Created attachment 1555403: ec2-console-log

Comment 9 Weihua Meng 2019-04-16 08:04:59 UTC
(In reply to Vadim Rutkovsky from comment #4)
> It appears the node was rebooted, but it doesn't come up after it. Not much
> Ansible can do about this.
> Is this reproducible? Is there a console log from the node when task failed?

It can be reproduced.

Console log attached.

Comment 10 Russell Teague 2019-04-17 12:36:27 UTC
Changes to how reboots are handled during the [Apply ignition manifest] task have merged.

Please test again.
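The merged change itself is not quoted in this report; as background, a common Ansible pattern for a task that reboots the host is to wait for the connection to come back before continuing, roughly as sketched below (the delay and timeout values are illustrative, not the role's actual settings).

```yaml
# Illustrative pattern only, not the actual openshift_node role code:
# give the node time to start rebooting, then block until SSH is
# reachable again before running further tasks.
- name: Wait for node to return after reboot
  wait_for_connection:
    delay: 60      # seconds to wait before the first connection check
    timeout: 900   # give up after 15 minutes
```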

Comment 12 Weihua Meng 2019-04-22 00:39:21 UTC


The scaleup playbook now finishes successfully.

PLAY RECAP ********************************************************************************************************************************************************************************************************
ec2-52-194-221-152.ap-northeast-1.compute.amazonaws.com : ok=20   changed=14   unreachable=0    failed=0   
localhost                  : ok=0    changed=0    unreachable=0    failed=0   

Sunday 21 April 2019  20:33:46 -0400 (0:00:40.746)       0:05:21.560 **********

Comment 14 errata-xmlrpc 2019-06-04 10:47:22 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

