Bug 2049393
| Summary: | Overcloud deployment continues without external tasks if undercloud is "unreachable" | ||
|---|---|---|---|
| Product: | Red Hat OpenStack | Reporter: | Takashi Kajinami <tkajinam> |
| Component: | openstack-tripleo-common | Assignee: | Cédric Jeanneret <cjeanner> |
| Status: | CLOSED ERRATA | QA Contact: | Joe H. Rahme <jhakimra> |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | ||
| Version: | 16.2 (Train) | CC: | astupnik, bdobreli, cjeanner, mburns, slinaber |
| Target Milestone: | --- | Keywords: | Triaged |
| Target Release: | --- | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | openstack-tripleo-common-11.7.1-2.20220318011205.b5ef9a5.el8ost | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2022-06-22 16:03:29 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
Description
Takashi Kajinami
2022-02-02 05:57:47 UTC
Hello Takashi, I guess this issue also exists in 16.1? Would you be able to confirm it? Cheers, C.

This is likely 16.2 only, because that is where we implemented the partial failure logic. It's supposed to stop if any playbook fails, so this flow seems weird. We used to ssh to the undercloud to break out of the mistral container, so I wonder if the solution would be to switch to a local connection now that we're not in containers anymore (for now).

We have never seen this issue in RHOSP16.1 and, as Alex mentioned, it is likely to be specific to RHOSP16.2. We expected the fact-gathering task to fail hard, but it actually does not fail. Switching to a local connection is one option to avoid any issue caused by ssh, but I'm afraid it doesn't work in OSP16.2, which uses ssh from the mistral containers. If the gather facts task doesn't fail, then we might need a dummy task at the very beginning to ensure that ssh to the undercloud works.

Hello Takashi, is there a way to reproduce this issue? Having a reproducer would be good, but I suspect it's one of those weird cases that hit randomly. Cheers, C.

Unfortunately, I have not yet established a reproducer, and the problem was resolved once I executed an ssh command from the mistral container (or even that might have been unnecessary). One thing we can try is to move /home/tripleo-admin/.ssh/id_rsa so that ssh using the key fails.

Good news!
- the patch is working just fine, we need to merge it and run the backport dance
- the way to verify is pretty easy in the end
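For reference, the "dummy task" suggested in the discussion (make sure ssh to the undercloud works at the very beginning) boils down to a fail-fast connectivity check. The following is a minimal Python sketch of that idea, not the patch that was merged: `ssh_check_command` and `undercloud_reachable` are hypothetical names, and the `undercloud` host, the `tripleo-admin` user and the 10-second timeout are assumptions based on the paths in this report. It only relies on the standard OpenSSH options `BatchMode` and `ConnectTimeout`.

```python
import subprocess
from typing import Callable, List

# Hypothetical defaults taken from this report (inventory host "undercloud",
# keys under /home/tripleo-admin/.ssh), not from TripleO code.
DEFAULT_HOST = "undercloud"
DEFAULT_USER = "tripleo-admin"


def ssh_check_command(host: str, user: str, timeout: int = 10) -> List[str]:
    """Build an ssh command that only succeeds if key-based login works."""
    return [
        "ssh",
        # BatchMode=yes disables password prompts, so a missing key or
        # authorized_keys entry makes ssh exit non-zero instead of waiting.
        "-o", "BatchMode=yes",
        "-o", f"ConnectTimeout={timeout}",
        f"{user}@{host}",
        "true",  # remote no-op; exit status 0 means the login succeeded
    ]


def undercloud_reachable(
    host: str = DEFAULT_HOST,
    user: str = DEFAULT_USER,
    timeout: int = 10,
    run: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> bool:
    """Return True only if a non-interactive ssh login to host succeeds."""
    try:
        result = run(
            ssh_check_command(host, user, timeout),
            capture_output=True,
            timeout=timeout + 5,  # hard cap in case ssh itself stalls
        )
    except subprocess.TimeoutExpired:
        return False
    return result.returncode == 0
```

Calling `undercloud_reachable()` before any playbook starts, and aborting when it returns False, would turn the silent skip of the external tasks into an immediate error. In OSP16.2 such a check would have to run from the same place Ansible runs (the mistral container), for the reason given in the discussion.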
Verification steps:
- get an Undercloud
- *before* starting the OC deploy, remove /home/tripleo-admin/.ssh/authorized_keys on the Undercloud
- start the deploy
- you should get the following log once ansible kicks in:
PLAY [Clear cached facts] ******************************************************
PLAY [Gather facts from undercloud] ********************************************
2022-02-14 14:31:40.448375 | 2442014f-b7ee-7295-22dd-0000000000f5 | TASK | Gathering Facts
2022-02-14 14:33:37.714962 | 2442014f-b7ee-7295-22dd-0000000000f5 | UNREACHABLE | Gathering Facts | undercloud
2022-02-14 14:33:37.715755 | 2442014f-b7ee-7295-22dd-0000000000f5 | TIMING | Gathering Facts | undercloud | 0:01:57.350324 | 117.27s
NO MORE HOSTS LEFT *************************************************************
PLAY RECAP *********************************************************************
undercloud : ok=0 changed=0 unreachable=1 failed=0 skipped=0 rescued=0 ignored=0
2022-02-14 14:33:37.717653 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Summary Information ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
2022-02-14 14:33:37.717776 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Total Tasks: 1 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
2022-02-14 14:33:37.717862 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Elapsed Time: 0:01:57.352443 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
2022-02-14 14:33:37.717939 | UUID | Info | Host | Task Name | Run Time
2022-02-14 14:33:37.718015 | 2442014f-b7ee-7295-22dd-0000000000f5 | SUMMARY | undercloud | Gathering Facts | 117.27s
2022-02-14 14:33:37.718120 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ End Summary Information ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
2022-02-14 14:33:37.718273 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ State Information ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
2022-02-14 14:33:37.718363 | ~~~~~~~~~~~~~~~~~~ Number of nodes which did not deploy successfully: 1 ~~~~~~~~~~~~~~~~~
2022-02-14 14:33:37.718431 | The following node(s) had failures: undercloud
2022-02-14 14:33:37.718500 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Ansible failed, check log at /var/lib/mistral/overcloud-0/ansible.log.
Overcloud Endpoint: http://192.168.100.85:5000
Overcloud Horizon Dashboard URL: http://192.168.100.85:80/dashboard
Overcloud rc file: /home/stack/overcloud-0rc
Overcloud Deployed with error
You can also have a look at mistral logs here: /var/lib/mistral/overcloud-0/ansible.log
[RedHat-8.4 - root@undercloud ~]# cat /var/lib/mistral/overcloud-0/ansible.log
2022-02-14 14:31:39,824 p=452 u=mistral n=ansible | [WARNING]: Invalid characters were found in group names but not replaced, use
-vvvv to see details
2022-02-14 14:31:39,825 p=452 u=mistral n=ansible | [WARNING]: Skipping key (deprecated) in group (overcloud) as it is not a
mapping, it is a <class 'ansible.parsing.yaml.objects.AnsibleUnicode'>
2022-02-14 14:31:40,367 p=452 u=mistral n=ansible | PLAY [Clear cached facts] ******************************************************
2022-02-14 14:31:40,442 p=452 u=mistral n=ansible | PLAY [Gather facts from undercloud] ********************************************
2022-02-14 14:31:40,448 p=452 u=mistral n=ansible | 2022-02-14 14:31:40.448375 | 2442014f-b7ee-7295-22dd-0000000000f5 | TASK | Gathering Facts
2022-02-14 14:33:37,715 p=452 u=mistral n=ansible | 2022-02-14 14:33:37.714962 | 2442014f-b7ee-7295-22dd-0000000000f5 | UNREACHABLE | Gathering Facts | undercloud
2022-02-14 14:33:37,716 p=452 u=mistral n=ansible | NO MORE HOSTS LEFT *************************************************************
2022-02-14 14:33:37,716 p=452 u=mistral n=ansible | PLAY RECAP *********************************************************************
2022-02-14 14:33:37,717 p=452 u=mistral n=ansible | undercloud : ok=0 changed=0 unreachable=1 failed=0 skipped=0 rescued=0 ignored=0
2022-02-14 14:33:37,717 p=452 u=mistral n=ansible | 2022-02-14 14:33:37.717653 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Summary Information ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
2022-02-14 14:33:37,717 p=452 u=mistral n=ansible | 2022-02-14 14:33:37.717776 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Total Tasks: 1 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
2022-02-14 14:33:37,717 p=452 u=mistral n=ansible | 2022-02-14 14:33:37.717862 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Elapsed Time: 0:01:57.352443 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
2022-02-14 14:33:37,717 p=452 u=mistral n=ansible | 2022-02-14 14:33:37.717939 | UUID | Info | Host | Task Name | Run Time
2022-02-14 14:33:37,718 p=452 u=mistral n=ansible | 2022-02-14 14:33:37.718015 | 2442014f-b7ee-7295-22dd-0000000000f5 | SUMMARY | undercloud | Gathering Facts | 117.27s
2022-02-14 14:33:37,718 p=452 u=mistral n=ansible | 2022-02-14 14:33:37.718120 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ End Summary Information ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
2022-02-14 14:33:37,718 p=452 u=mistral n=ansible | 2022-02-14 14:33:37.718273 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ State Information ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
2022-02-14 14:33:37,718 p=452 u=mistral n=ansible | 2022-02-14 14:33:37.718363 | ~~~~~~~~~~~~~~~~~~ Number of nodes which did not deploy successfully: 1 ~~~~~~~~~~~~~~~~~
2022-02-14 14:33:37,718 p=452 u=mistral n=ansible | 2022-02-14 14:33:37.718431 | The following node(s) had failures: undercloud
2022-02-14 14:33:37,718 p=452 u=mistral n=ansible | 2022-02-14 14:33:37.718500 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
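To check a run like this one without reading the whole log, the PLAY RECAP lines can be scanned for hosts that were unreachable or failed. This is a minimal Python sketch assuming the recap format shown above (`host : ok=N changed=N unreachable=N failed=N ...`, optionally behind the timestamp prefix written to ansible.log); `parse_recap_line` and `failed_hosts` are hypothetical helpers for reading logs, not the logic in openstack-tripleo-common.

```python
import re
from typing import Dict, Iterable, List, Optional, Tuple

# One host line of an Ansible PLAY RECAP. re.search() is used so that the
# "2022-02-14 ... p=452 u=mistral n=ansible |" prefix in ansible.log is skipped.
RECAP_RE = re.compile(
    r"(?P<host>[\w.\-]+)\s*:\s*(?P<counters>ok=\d+(?:\s+\w+=\d+)*)\s*$"
)
COUNTER_RE = re.compile(r"(\w+)=(\d+)")


def parse_recap_line(line: str) -> Optional[Tuple[str, Dict[str, int]]]:
    """Return (host, counters) for a PLAY RECAP host line, else None."""
    match = RECAP_RE.search(line)
    if match is None:
        return None
    counters = {
        key: int(value)
        for key, value in COUNTER_RE.findall(match.group("counters"))
    }
    return match.group("host"), counters


def failed_hosts(lines: Iterable[str]) -> List[str]:
    """Sorted hosts whose recap reports unreachable > 0 or failed > 0."""
    bad = set()
    for line in lines:
        parsed = parse_recap_line(line)
        if parsed is None:
            continue
        host, counters = parsed
        # Count UNREACHABLE exactly like a task failure: either one means the
        # deployment should stop before the external tasks run.
        if counters.get("unreachable", 0) > 0 or counters.get("failed", 0) > 0:
            bad.add(host)
    return sorted(bad)
```

Usage: `with open("/var/lib/mistral/overcloud-0/ansible.log") as log: print(failed_hosts(log))`. For the verification run above, the only recap line is the undercloud one with `unreachable=1`, so the undercloud is the host it reports.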
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Release of components for Red Hat OpenStack Platform 16.2.3 (Train)), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHBA-2022:4793