Description of problem:
When there is an issue with SSHing to the undercloud from the undercloud, the Ansible playbook starts ignoring tasks on the undercloud. Because we run external deploy tasks from the undercloud, this results in incomplete settings. In our case, the deployment actually failed while starting containers in step 4, because the tasks that create Keystone resources were not invoked.

Version-Release number of selected component (if applicable):
16.2.1

How reproducible:
Always

Steps to Reproduce:
1.
2.
3.

Actual results:
The deployment continues and fails later with errors during configuration that do not look related to the undercloud being unreachable.

Expected results:
The deployment fails at an early stage because the undercloud is unreachable.

Additional info:
Hello Takashi, I guess this issue also exists in 16.1? Would you be able to confirm it? Cheers, C.
This is likely 16.2-only, because that is where we implemented the partial failure logic. It's supposed to stop if any playbook fails, so this flow seems weird. We used to SSH to the undercloud to break out of the Mistral container, so I wonder whether the solution would be to switch to a local connection now that we're not in containers anymore (for now).
We have never seen this issue in RHOSP16.1, and as Alex mentioned, it is likely to be specific to RHOSP16.2. We expected the fact-gathering task to hard-fail, but it actually does not fail. Switching to a local connection is one option to avoid any issue caused by SSH, but I'm afraid it doesn't work in OSP16.2, which uses SSH from the Mistral containers. If the fact-gathering task doesn't fail, then we might need a dummy task at the very beginning to ensure that SSH to the undercloud works.
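The dummy-task idea in the comment above can also be sketched as a preflight probe that runs before any playbook starts. This is a minimal Python sketch under stated assumptions, not the change that was merged: the function names are hypothetical, and the user, host, and 10-second timeout are placeholders. It relies only on the standard OpenSSH client options `BatchMode=yes` (fail instead of prompting for a password) and `ConnectTimeout`, and runs `true` on the remote side, so a missing or rejected key surfaces as a non-zero exit status.

```python
import subprocess
from typing import List


def build_ssh_probe_cmd(user: str, host: str, timeout_s: int = 10) -> List[str]:
    """Build an ssh command that only succeeds if key-based login works.

    Hypothetical helper. BatchMode=yes makes ssh fail instead of prompting
    for a password; ConnectTimeout bounds the TCP connection attempt.
    """
    return [
        "ssh",
        "-o", "BatchMode=yes",
        "-o", f"ConnectTimeout={timeout_s}",
        f"{user}@{host}",
        "true",  # no-op on the remote side; exit status 0 means login worked
    ]


def ssh_reachable(user: str, host: str, timeout_s: int = 10) -> bool:
    """Return True if `ssh user@host true` exits with status 0 (hypothetical helper)."""
    cmd = build_ssh_probe_cmd(user, host, timeout_s)
    try:
        result = subprocess.run(
            cmd,
            stdin=subprocess.DEVNULL,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s * 3,  # hard cap in case the session hangs after connecting
        )
    except (OSError, subprocess.TimeoutExpired):
        # ssh binary not found, or the session exceeded the hard cap
        return False
    return result.returncode == 0
```

For example, a wrapper could call `ssh_reachable("tripleo-admin", "undercloud")` (both values illustrative) and abort the deploy when it returns False. Keeping the probe outside the playbook means an unreachable undercloud stops the run regardless of how the playbook itself treats UNREACHABLE hosts.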
Hello Takashi,
Is there a way to reproduce this issue? Having a reproducer would be good, but I suspect it's one of those weird cases that hit randomly.
Cheers,
C.
Unfortunately, I have not yet established a reproducer, and the problem was resolved once I executed an ssh command from the Mistral container (or even that might have been unnecessary). One thing we can try is to move /home/tripleo-admin/.ssh/id_rsa away so that SSH using the key fails.
Good news! The patch is working just fine; we need to merge it and run the backport dance. The way to verify is pretty easy in the end.

Verification steps:
- get an Undercloud
- *before* starting the OC deploy, remove /home/tripleo-admin/.ssh/authorized_keys on the Undercloud
- start the deploy
- you should get the following log once ansible kicks in:

PLAY [Clear cached facts] ******************************************************

PLAY [Gather facts from undercloud] ********************************************
2022-02-14 14:31:40.448375 | 2442014f-b7ee-7295-22dd-0000000000f5 | TASK | Gathering Facts
2022-02-14 14:33:37.714962 | 2442014f-b7ee-7295-22dd-0000000000f5 | UNREACHABLE | Gathering Facts | undercloud
2022-02-14 14:33:37.715755 | 2442014f-b7ee-7295-22dd-0000000000f5 | TIMING | Gathering Facts | undercloud | 0:01:57.350324 | 117.27s

NO MORE HOSTS LEFT *************************************************************

PLAY RECAP *********************************************************************
undercloud : ok=0 changed=0 unreachable=1 failed=0 skipped=0 rescued=0 ignored=0
2022-02-14 14:33:37.717653 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Summary Information ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
2022-02-14 14:33:37.717776 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Total Tasks: 1 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
2022-02-14 14:33:37.717862 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Elapsed Time: 0:01:57.352443 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
2022-02-14 14:33:37.717939 | UUID | Info | Host | Task Name | Run Time
2022-02-14 14:33:37.718015 | 2442014f-b7ee-7295-22dd-0000000000f5 | SUMMARY | undercloud | Gathering Facts | 117.27s
2022-02-14 14:33:37.718120 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ End Summary Information ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
2022-02-14 14:33:37.718273 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ State Information ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
2022-02-14 14:33:37.718363 | ~~~~~~~~~~~~~~~~~~ Number of nodes which did not deploy successfully: 1 ~~~~~~~~~~~~~~~~~
2022-02-14 14:33:37.718431 | The following node(s) had failures: undercloud
2022-02-14 14:33:37.718500 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Ansible failed, check log at /var/lib/mistral/overcloud-0/ansible.log.
Overcloud Endpoint: http://192.168.100.85:5000
Overcloud Horizon Dashboard URL: http://192.168.100.85:80/dashboard
Overcloud rc file: /home/stack/overcloud-0rc
Overcloud Deployed with error
You can also have a look at mistral logs here: /var/lib/mistral/overcloud-0/ansible.log

[RedHat-8.4 - root@undercloud ~]# cat /var/lib/mistral/overcloud-0/ansible.log
2022-02-14 14:31:39,824 p=452 u=mistral n=ansible | [WARNING]: Invalid characters were found in group names but not replaced, use -vvvv to see details
2022-02-14 14:31:39,825 p=452 u=mistral n=ansible | [WARNING]: Skipping key (deprecated) in group (overcloud) as it is not a mapping, it is a <class 'ansible.parsing.yaml.objects.AnsibleUnicode'>
2022-02-14 14:31:40,367 p=452 u=mistral n=ansible | PLAY [Clear cached facts] ******************************************************
2022-02-14 14:31:40,442 p=452 u=mistral n=ansible | PLAY [Gather facts from undercloud] ********************************************
2022-02-14 14:31:40,448 p=452 u=mistral n=ansible | 2022-02-14 14:31:40.448375 | 2442014f-b7ee-7295-22dd-0000000000f5 | TASK | Gathering Facts
2022-02-14 14:33:37,715 p=452 u=mistral n=ansible | 2022-02-14 14:33:37.714962 | 2442014f-b7ee-7295-22dd-0000000000f5 | UNREACHABLE | Gathering Facts | undercloud
2022-02-14 14:33:37,716 p=452 u=mistral n=ansible | NO MORE HOSTS LEFT *************************************************************
2022-02-14 14:33:37,716 p=452 u=mistral n=ansible | PLAY RECAP *********************************************************************
2022-02-14 14:33:37,717 p=452 u=mistral n=ansible | undercloud : ok=0 changed=0 unreachable=1 failed=0 skipped=0 rescued=0 ignored=0
2022-02-14 14:33:37,717 p=452 u=mistral n=ansible | 2022-02-14 14:33:37.717653 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Summary Information ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
2022-02-14 14:33:37,717 p=452 u=mistral n=ansible | 2022-02-14 14:33:37.717776 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Total Tasks: 1 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
2022-02-14 14:33:37,717 p=452 u=mistral n=ansible | 2022-02-14 14:33:37.717862 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Elapsed Time: 0:01:57.352443 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
2022-02-14 14:33:37,717 p=452 u=mistral n=ansible | 2022-02-14 14:33:37.717939 | UUID | Info | Host | Task Name | Run Time
2022-02-14 14:33:37,718 p=452 u=mistral n=ansible | 2022-02-14 14:33:37.718015 | 2442014f-b7ee-7295-22dd-0000000000f5 | SUMMARY | undercloud | Gathering Facts | 117.27s
2022-02-14 14:33:37,718 p=452 u=mistral n=ansible | 2022-02-14 14:33:37.718120 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ End Summary Information ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
2022-02-14 14:33:37,718 p=452 u=mistral n=ansible | 2022-02-14 14:33:37.718273 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ State Information ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
2022-02-14 14:33:37,718 p=452 u=mistral n=ansible | 2022-02-14 14:33:37.718363 | ~~~~~~~~~~~~~~~~~~ Number of nodes which did not deploy successfully: 1 ~~~~~~~~~~~~~~~~~
2022-02-14 14:33:37,718 p=452 u=mistral n=ansible | 2022-02-14 14:33:37.718431 | The following node(s) had failures: undercloud
2022-02-14 14:33:37,718 p=452 u=mistral n=ansible | 2022-02-14 14:33:37.718500 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
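To check the verification result by script rather than by eye, the PLAY RECAP line is the relevant signal. The following is a minimal Python sketch, assuming the recap format shown in the log above (`host : key=value ...`, optionally behind the `p=... u=... n=ansible | ` prefix used in ansible.log); `parse_recap` and `failed_hosts` are hypothetical helpers, not part of TripleO or Ansible. It counts a host as failed when either `unreachable` or `failed` is non-zero, which is how the undercloud is reported once the fix is in.

```python
import re
from typing import Dict, List

# Matches an Ansible PLAY RECAP line such as
#   undercloud : ok=0 changed=0 unreachable=1 failed=0 skipped=0 rescued=0 ignored=0
# either at the start of the line or after a "|" (the ansible.log prefix
# "2022-02-14 14:33:37,717 p=452 u=mistral n=ansible | ").
RECAP_RE = re.compile(
    r"(?:^|\|)\s*(?P<host>[\w.-]+)\s*:\s*"
    r"(?P<counts>\w+=\d+(?:\s+\w+=\d+)*)\s*$"
)


def parse_recap(lines: List[str]) -> Dict[str, Dict[str, int]]:
    """Return {host: {counter: value}} for every recap line found (hypothetical helper)."""
    recap: Dict[str, Dict[str, int]] = {}
    for line in lines:
        match = RECAP_RE.search(line.strip())
        if match is None:
            continue  # not a recap line (task output, banners, summaries, ...)
        counts: Dict[str, int] = {}
        for pair in match.group("counts").split():
            key, value = pair.split("=", 1)
            counts[key] = int(value)
        recap[match.group("host")] = counts
    return recap


def failed_hosts(recap: Dict[str, Dict[str, int]]) -> List[str]:
    """Hosts that were unreachable or had failed tasks, sorted by name."""
    return sorted(
        host
        for host, counts in recap.items()
        if counts.get("unreachable", 0) > 0 or counts.get("failed", 0) > 0
    )
```

Run over the lines of the ansible.log above, `failed_hosts(parse_recap(lines))` returns `['undercloud']`; a healthy run would return an empty list.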
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Release of components for Red Hat OpenStack Platform 16.2.3 (Train)), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2022:4793