Bug 2049393 - Overcloud deployment continues without external tasks if undercloud is "unreachable"
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-common
Version: 16.2 (Train)
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: Cédric Jeanneret
QA Contact: Joe H. Rahme
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2022-02-02 05:57 UTC by Takashi Kajinami
Modified: 2022-06-22 16:04 UTC (History)

Fixed In Version: openstack-tripleo-common-11.7.1-2.20220318011205.b5ef9a5.el8ost
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-06-22 16:03:29 UTC
Target Upstream Version:
Embargoed:


Links
System ID Private Priority Status Summary Last Updated
Launchpad 1960518 0 None None None 2022-02-10 14:41:14 UTC
OpenStack gerrit 828721 0 None NEW Ensure failures on the undercloud leads to a complete stop 2022-02-10 14:42:06 UTC
Red Hat Issue Tracker OSP-12425 0 None None None 2022-02-02 06:00:17 UTC
Red Hat Product Errata RHBA-2022:4793 0 None None None 2022-06-22 16:04:15 UTC

Description Takashi Kajinami 2022-02-02 05:57:47 UTC
Description of problem:

When ssh from the undercloud to the undercloud itself fails, the ansible playbook starts ignoring tasks on the undercloud.
Because external deploy tasks run on the undercloud, this leaves the overcloud with incomplete configuration. In our case the deployment failed while starting containers in step 4, because the tasks that create keystone resources were never invoked.


Version-Release number of selected component (if applicable):
16.2.1

How reproducible:
Always

Steps to Reproduce:
1.
2.
3.

Actual results:
Deployment continues and later fails with an error during configuration that does not look related to the undercloud being unreachable.

Expected results:
Deployment fails at an early stage because the undercloud is unreachable.


Additional info:

Comment 2 Cédric Jeanneret 2022-02-02 09:54:12 UTC
Hello Takashi,

I guess this issue also exists in 16.1? Would you be able to confirm it?

Cheers,

C.

Comment 3 Alex Schultz 2022-02-02 15:12:19 UTC
This is likely 16.2 only because that is where we implemented the partial failure logic.  It's supposed to stop if any playbook fails so this flow seems weird.  We used to ssh to the undercloud to break out of the mistral container so I wonder if the solution would be to switch to a local connection now that we're not in containers anymore (for now).

Comment 6 Takashi Kajinami 2022-02-10 06:18:02 UTC
We have never seen this issue in RHOSP16.1 and, as Alex mentioned, it is likely to be specific to RHOSP16.2.

We expected the gather facts task to hard fail, but it actually does not fail.
Switching to a local connection is one option, to avoid any issue caused by ssh, but
I'm afraid it doesn't work in OSP16.2, which uses ssh from the mistral containers.

If the gather facts task doesn't fail, then we might need a dummy task to ensure
that ssh to the undercloud works at the very beginning.
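
The "dummy task" idea above can be illustrated outside of ansible as a small preflight check. This is only a sketch under stated assumptions, not the fix that was merged: the function names `ssh_reachable` and `preflight` are hypothetical, the `tripleo-admin` user is taken from the paths mentioned in this bug, and the check simply runs `ssh ... true` in batch mode and aborts if it does not succeed.

```python
import subprocess
from typing import Callable, Iterable


def ssh_reachable(
    host: str,
    user: str = "tripleo-admin",  # assumption: user seen in this bug's paths
    timeout: int = 10,
    run: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> bool:
    """Return True if a non-interactive ssh login to `host` succeeds."""
    cmd = [
        "ssh",
        "-o", "BatchMode=yes",              # never prompt, fail instead
        "-o", f"ConnectTimeout={timeout}",  # bound the connection attempt
        f"{user}@{host}",
        "true",                             # trivial remote command
    ]
    try:
        result = run(cmd, capture_output=True, timeout=timeout + 5)
    except (subprocess.TimeoutExpired, OSError):
        # ssh hung, or the ssh binary could not be started
        return False
    return result.returncode == 0


def preflight(hosts: Iterable[str], check: Callable[[str], bool] = ssh_reachable) -> None:
    """Hypothetical guard: raise before deploying if any host is unreachable."""
    unreachable = [h for h in hosts if not check(h)]
    if unreachable:
        raise RuntimeError(
            "Unreachable hosts, aborting deployment: " + ", ".join(unreachable)
        )
```

Calling `preflight(["undercloud"])` before any playbook runs would turn an ssh problem into an immediate, explicit error instead of silently skipped tasks. The `run` and `check` parameters exist only so the logic can be exercised without a real ssh connection.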

Comment 7 Cédric Jeanneret 2022-02-10 13:51:08 UTC
Hello Takashi,

Is there a way to reproduce this issue? Having a reproducer would be good - but I suspect it's one of those weird cases hitting randomly.

Cheers,

C.

Comment 8 Takashi Kajinami 2022-02-10 14:24:52 UTC
Unfortunately I have not yet established a reproducer, and the problem was resolved once I executed an ssh command from the mistral container (or even that might have been unnecessary).

The one thing we can try is to move /home/tripleo-admin/.ssh/id_rsa so that ssh using the key fails.

Comment 9 Cédric Jeanneret 2022-02-14 14:56:17 UTC
Good news!

- the patch is working just fine, we need to merge it and run the backport dance
- the way to verify is pretty easy in the end

Verification steps:
- get an Undercloud
- *before* starting the OC deploy, remove /home/tripleo-admin/.ssh/authorized_keys on the Undercloud
- start the deploy
- you should get the following log once ansible kicks in:

PLAY [Clear cached facts] ******************************************************                                                                                                                                                              
                                                                                                                                                                                                                                              
PLAY [Gather facts from undercloud] ********************************************                                                                                                                                                              
2022-02-14 14:31:40.448375 | 2442014f-b7ee-7295-22dd-0000000000f5 |       TASK | Gathering Facts                                                                                                                                              
2022-02-14 14:33:37.714962 | 2442014f-b7ee-7295-22dd-0000000000f5 | UNREACHABLE | Gathering Facts | undercloud                                                                                                                                
2022-02-14 14:33:37.715755 | 2442014f-b7ee-7295-22dd-0000000000f5 |     TIMING | Gathering Facts | undercloud | 0:01:57.350324 | 117.27s                                                                                                      
                                                                                                                                                                                                                                              
NO MORE HOSTS LEFT *************************************************************                                                                                                                                                              
                                                                                                                                                                                                                                              
PLAY RECAP *********************************************************************                                                                                                                                                              
undercloud                 : ok=0    changed=0    unreachable=1    failed=0    skipped=0    rescued=0    ignored=0                                                                                                                            
                                                                                                                                                                                                                                              
2022-02-14 14:33:37.717653 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Summary Information ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~                                                                                                                        
2022-02-14 14:33:37.717776 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Total Tasks: 1          ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~                                                                                                                        
2022-02-14 14:33:37.717862 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Elapsed Time: 0:01:57.352443 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~                                                                                                                        
2022-02-14 14:33:37.717939 |                                 UUID |       Info |       Host |   Task Name |   Run Time                                                                                                                        
2022-02-14 14:33:37.718015 | 2442014f-b7ee-7295-22dd-0000000000f5 |    SUMMARY | undercloud | Gathering Facts | 117.27s                                                                                                                       
2022-02-14 14:33:37.718120 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ End Summary Information ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~                                                                                                                        
2022-02-14 14:33:37.718273 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ State Information ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~                                                                                                                        
2022-02-14 14:33:37.718363 | ~~~~~~~~~~~~~~~~~~ Number of nodes which did not deploy successfully: 1 ~~~~~~~~~~~~~~~~~                                                                                                                        
2022-02-14 14:33:37.718431 |  The following node(s) had failures: undercloud                                                                                                                                                                  
2022-02-14 14:33:37.718500 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~                                                                                                                        
Ansible failed, check log at /var/lib/mistral/overcloud-0/ansible.log.
Overcloud Endpoint: http://192.168.100.85:5000
Overcloud Horizon Dashboard URL: http://192.168.100.85:80/dashboard                                                                                                                                                                           
Overcloud rc file: /home/stack/overcloud-0rc                                                                                                                                                                                                  
Overcloud Deployed with error

You can also have a look at mistral logs here: /var/lib/mistral/overcloud-0/ansible.log
[RedHat-8.4 - root@undercloud ~]# cat /var/lib/mistral/overcloud-0/ansible.log
2022-02-14 14:31:39,824 p=452 u=mistral n=ansible | [WARNING]: Invalid characters were found in group names but not replaced, use                                                                                                             
-vvvv to see details

2022-02-14 14:31:39,825 p=452 u=mistral n=ansible | [WARNING]: Skipping key (deprecated) in group (overcloud) as it is not a                                                                                                                  
mapping, it is a <class 'ansible.parsing.yaml.objects.AnsibleUnicode'>

2022-02-14 14:31:40,367 p=452 u=mistral n=ansible | PLAY [Clear cached facts] ******************************************************                                                                                                          
2022-02-14 14:31:40,442 p=452 u=mistral n=ansible | PLAY [Gather facts from undercloud] ********************************************                                                                                                          
2022-02-14 14:31:40,448 p=452 u=mistral n=ansible | 2022-02-14 14:31:40.448375 | 2442014f-b7ee-7295-22dd-0000000000f5 |       TASK | Gathering Facts                                                                                          
2022-02-14 14:33:37,715 p=452 u=mistral n=ansible | 2022-02-14 14:33:37.714962 | 2442014f-b7ee-7295-22dd-0000000000f5 | UNREACHABLE | Gathering Facts | undercloud                                                                            
2022-02-14 14:33:37,716 p=452 u=mistral n=ansible | NO MORE HOSTS LEFT *************************************************************                                                                                                          
2022-02-14 14:33:37,716 p=452 u=mistral n=ansible | PLAY RECAP *********************************************************************                                                                                                          
2022-02-14 14:33:37,717 p=452 u=mistral n=ansible | undercloud                 : ok=0    changed=0    unreachable=1    failed=0    skipped=0    rescued=0    ignored=0                                                                        
2022-02-14 14:33:37,717 p=452 u=mistral n=ansible | 2022-02-14 14:33:37.717653 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Summary Information ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~                                                                    
2022-02-14 14:33:37,717 p=452 u=mistral n=ansible | 2022-02-14 14:33:37.717776 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Total Tasks: 1          ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~                                                                    
2022-02-14 14:33:37,717 p=452 u=mistral n=ansible | 2022-02-14 14:33:37.717862 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Elapsed Time: 0:01:57.352443 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~                                                                    
2022-02-14 14:33:37,717 p=452 u=mistral n=ansible | 2022-02-14 14:33:37.717939 |                                 UUID |       Info |       Host |   Task Name |   Run Time                                                                    
2022-02-14 14:33:37,718 p=452 u=mistral n=ansible | 2022-02-14 14:33:37.718015 | 2442014f-b7ee-7295-22dd-0000000000f5 |    SUMMARY | undercloud | Gathering Facts | 117.27s                                                                   
2022-02-14 14:33:37,718 p=452 u=mistral n=ansible | 2022-02-14 14:33:37.718120 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ End Summary Information ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~                                                                    
2022-02-14 14:33:37,718 p=452 u=mistral n=ansible | 2022-02-14 14:33:37.718273 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ State Information ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~                                                                    
2022-02-14 14:33:37,718 p=452 u=mistral n=ansible | 2022-02-14 14:33:37.718363 | ~~~~~~~~~~~~~~~~~~ Number of nodes which did not deploy successfully: 1 ~~~~~~~~~~~~~~~~~                                                                    
2022-02-14 14:33:37,718 p=452 u=mistral n=ansible | 2022-02-14 14:33:37.718431 |  The following node(s) had failures: undercloud                                                                                                              
2022-02-14 14:33:37,718 p=452 u=mistral n=ansible | 2022-02-14 14:33:37.718500 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
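
When checking a log like the one above during verification, the key signal is the `unreachable=1` counter in the PLAY RECAP line. The following is a minimal sketch of how that line could be parsed so that unreachable hosts are treated the same as failed ones. The helper names `parse_recap` and `broken_hosts` are hypothetical and are not part of tripleo-common or ansible.

```python
import re

# Matches recap lines such as:
#   undercloud   : ok=0  changed=0  unreachable=1  failed=0 ...
RECAP_LINE = re.compile(r"^(?P<host>\S+)\s*:\s*(?P<counters>(?:\w+=\d+\s*)+)$")


def parse_recap(lines):
    """Return {host: {counter: value}} for every PLAY RECAP host line."""
    stats = {}
    for line in lines:
        # ansible.log prefixes each line with "timestamp p=... | "; keep the tail
        text = line.rsplit(" | ", 1)[-1].strip()
        match = RECAP_LINE.match(text)
        if match:
            pairs = (item.split("=") for item in match["counters"].split())
            stats[match["host"]] = {key: int(value) for key, value in pairs}
    return stats


def broken_hosts(stats):
    """Hosts that either failed or were unreachable, sorted by name."""
    return sorted(
        host
        for host, counters in stats.items()
        if counters.get("failed", 0) or counters.get("unreachable", 0)
    )
```

For example, `broken_hosts(parse_recap(open("/var/lib/mistral/overcloud-0/ansible.log")))` would list `undercloud` for the log shown in this comment, since its recap line reports `unreachable=1` even though `failed=0`.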

Comment 17 errata-xmlrpc 2022-06-22 16:03:29 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Release of components for Red Hat OpenStack Platform 16.2.3 (Train)), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:4793

