Bug 2049393

Summary: Overcloud deployment continues without external tasks if undercloud is "unreachable"
Product: Red Hat OpenStack
Reporter: Takashi Kajinami <tkajinam>
Component: openstack-tripleo-common
Assignee: Cédric Jeanneret <cjeanner>
Status: CLOSED ERRATA
QA Contact: Joe H. Rahme <jhakimra>
Severity: medium
Docs Contact:
Priority: medium
Version: 16.2 (Train)
CC: astupnik, bdobreli, cjeanner, mburns, slinaber
Target Milestone: ---
Keywords: Triaged
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: openstack-tripleo-common-11.7.1-2.20220318011205.b5ef9a5.el8ost
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2022-06-22 16:03:29 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Takashi Kajinami 2022-02-02 05:57:47 UTC
Description of problem:

When ssh from the undercloud to itself fails, the ansible playbook starts skipping the tasks that target the undercloud.
Because external deploy tasks run on the undercloud, this leaves the deployment incompletely configured. In our case the deployment failed when starting containers in step 4, because the tasks that create keystone resources had never been invoked.


Version-Release number of selected component (if applicable):
16.2.1

How reproducible:
Always

Steps to Reproduce:
1.
2.
3.

Actual results:
Deployment continues and later fails with an error during configuration, which doesn't look related to the undercloud being unreachable.

Expected results:
Deployment fails at an early stage because the undercloud is unreachable.


Additional info:

Comment 2 Cédric Jeanneret 2022-02-02 09:54:12 UTC
Hello Takashi,

I guess this issue also exists in 16.1? Would you be able to confirm it?

Cheers,

C.

Comment 3 Alex Schultz 2022-02-02 15:12:19 UTC
This is likely 16.2 only because that is where we implemented the partial failure logic.  It's supposed to stop if any playbook fails so this flow seems weird.  We used to ssh to the undercloud to break out of the mistral container so I wonder if the solution would be to switch to a local connection now that we're not in containers anymore (for now).

Comment 6 Takashi Kajinami 2022-02-10 06:18:02 UTC
We have never seen this issue in RHOSP 16.1, and as Alex mentioned it is likely to be specific to RHOSP 16.2.

We expected the task that gathers facts to hard fail, but it actually does not.
Switching to a local connection is one option to avoid any issue caused by ssh, but
I'm afraid it won't work in OSP 16.2, which uses ssh from the mistral containers.

If the gather facts task doesn't fail, then we might need a dummy task at the very
beginning to ensure that ssh to the undercloud works.
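
To illustrate the idea of an early check, here is a minimal Python sketch of a preflight probe that fails hard when ssh does not work. This is not the actual patch: the function names are hypothetical, and the host name "undercloud" and user "tripleo-admin" are assumptions taken from the paths and logs in this report. It only relies on the standard ssh options BatchMode and ConnectTimeout.

```python
import subprocess
from typing import Callable, List, Sequence


def build_ssh_probe(host: str, user: str, timeout_s: int = 10) -> List[str]:
    """Return an ssh command that runs `true` on the host without prompting."""
    return [
        "ssh",
        "-o", "BatchMode=yes",                # never fall back to a password prompt
        "-o", f"ConnectTimeout={timeout_s}",  # give up quickly instead of hanging
        f"{user}@{host}",
        "true",
    ]


def ssh_reachable(
    cmd: Sequence[str],
    run: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> bool:
    """Run the probe; any non-zero exit status counts as unreachable."""
    result = run(list(cmd), capture_output=True, text=True)
    return result.returncode == 0


def preflight(host: str, user: str, run=subprocess.run) -> None:
    """Raise before any real work starts if the host cannot be reached."""
    if not ssh_reachable(build_ssh_probe(host, user), run=run):
        raise RuntimeError(f"{host} is not reachable over ssh as {user}; aborting")


# Hypothetical usage, with assumed host and user names:
# preflight("undercloud", "tripleo-admin")
```

The `run` parameter exists only so the check can be exercised without a real ssh connection.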

Comment 7 Cédric Jeanneret 2022-02-10 13:51:08 UTC
Hello Takashi,

Is there a way to reproduce this issue? Having a reproducer would be good - but I suspect it's one of those weird cases hitting randomly.

Cheers,

C.

Comment 8 Takashi Kajinami 2022-02-10 14:24:52 UTC
Unfortunately I have not established a reproducer yet, and the problem went away once I ran an ssh command from the mistral container (even that might have been unnecessary).

One thing we can try is to move /home/tripleo-admin/.ssh/id_rsa aside so that ssh using the key fails.
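
A rough sketch of how the key could be moved aside and reliably put back afterwards, in case it helps whoever tries this. The helper name and the ".bak" suffix are hypothetical; only the key path comes from this report.

```python
import contextlib
import os
from pathlib import Path
from typing import Iterator


@contextlib.contextmanager
def hidden(path: Path, suffix: str = ".bak") -> Iterator[Path]:
    """Rename `path` out of the way, and rename it back on exit."""
    backup = path.with_name(path.name + suffix)
    os.rename(path, backup)
    try:
        yield backup
    finally:
        # Restore the file even if the body raised.
        os.rename(backup, path)


# Hypothetical usage on the undercloud:
# with hidden(Path("/home/tripleo-admin/.ssh/id_rsa")):
#     ...  # start the overcloud deployment here and watch the ansible output
```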

Comment 9 Cédric Jeanneret 2022-02-14 14:56:17 UTC
Good news!

- the patch is working just fine, we need to merge it and run the backport dance
- the way to verify is pretty easy in the end

Verification steps:
- get an Undercloud
- *before* starting the OC deploy, remove /home/tripleo-admin/.ssh/authorized_keys on the Undercloud
- start the deploy
- you should get the following log once ansible kicks in:

PLAY [Clear cached facts] ******************************************************                                                                                                                                                              
                                                                                                                                                                                                                                              
PLAY [Gather facts from undercloud] ********************************************                                                                                                                                                              
2022-02-14 14:31:40.448375 | 2442014f-b7ee-7295-22dd-0000000000f5 |       TASK | Gathering Facts                                                                                                                                              
2022-02-14 14:33:37.714962 | 2442014f-b7ee-7295-22dd-0000000000f5 | UNREACHABLE | Gathering Facts | undercloud                                                                                                                                
2022-02-14 14:33:37.715755 | 2442014f-b7ee-7295-22dd-0000000000f5 |     TIMING | Gathering Facts | undercloud | 0:01:57.350324 | 117.27s                                                                                                      
                                                                                                                                                                                                                                              
NO MORE HOSTS LEFT *************************************************************                                                                                                                                                              
                                                                                                                                                                                                                                              
PLAY RECAP *********************************************************************                                                                                                                                                              
undercloud                 : ok=0    changed=0    unreachable=1    failed=0    skipped=0    rescued=0    ignored=0                                                                                                                            
                                                                                                                                                                                                                                              
2022-02-14 14:33:37.717653 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Summary Information ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~                                                                                                                        
2022-02-14 14:33:37.717776 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Total Tasks: 1          ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~                                                                                                                        
2022-02-14 14:33:37.717862 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Elapsed Time: 0:01:57.352443 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~                                                                                                                        
2022-02-14 14:33:37.717939 |                                 UUID |       Info |       Host |   Task Name |   Run Time                                                                                                                        
2022-02-14 14:33:37.718015 | 2442014f-b7ee-7295-22dd-0000000000f5 |    SUMMARY | undercloud | Gathering Facts | 117.27s                                                                                                                       
2022-02-14 14:33:37.718120 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ End Summary Information ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~                                                                                                                        
2022-02-14 14:33:37.718273 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ State Information ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~                                                                                                                        
2022-02-14 14:33:37.718363 | ~~~~~~~~~~~~~~~~~~ Number of nodes which did not deploy successfully: 1 ~~~~~~~~~~~~~~~~~                                                                                                                        
2022-02-14 14:33:37.718431 |  The following node(s) had failures: undercloud                                                                                                                                                                  
2022-02-14 14:33:37.718500 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~                                                                                                                        
Ansible failed, check log at /var/lib/mistral/overcloud-0/ansible.log.Overcloud Endpoint: http://192.168.100.85:5000                                                                                                                          
Overcloud Horizon Dashboard URL: http://192.168.100.85:80/dashboard                                                                                                                                                                           
Overcloud rc file: /home/stack/overcloud-0rc                                                                                                                                                                                                  
Overcloud Deployed with error

You can also have a look at mistral logs here: /var/lib/mistral/overcloud-0/ansible.log
[RedHat-8.4 - root@undercloud ~]# cat /var/lib/mistral/overcloud-0/ansible.log
2022-02-14 14:31:39,824 p=452 u=mistral n=ansible | [WARNING]: Invalid characters were found in group names but not replaced, use                                                                                                             
-vvvv to see details

2022-02-14 14:31:39,825 p=452 u=mistral n=ansible | [WARNING]: Skipping key (deprecated) in group (overcloud) as it is not a                                                                                                                  
mapping, it is a <class 'ansible.parsing.yaml.objects.AnsibleUnicode'>

2022-02-14 14:31:40,367 p=452 u=mistral n=ansible | PLAY [Clear cached facts] ******************************************************                                                                                                          
2022-02-14 14:31:40,442 p=452 u=mistral n=ansible | PLAY [Gather facts from undercloud] ********************************************                                                                                                          
2022-02-14 14:31:40,448 p=452 u=mistral n=ansible | 2022-02-14 14:31:40.448375 | 2442014f-b7ee-7295-22dd-0000000000f5 |       TASK | Gathering Facts                                                                                          
2022-02-14 14:33:37,715 p=452 u=mistral n=ansible | 2022-02-14 14:33:37.714962 | 2442014f-b7ee-7295-22dd-0000000000f5 | UNREACHABLE | Gathering Facts | undercloud                                                                            
2022-02-14 14:33:37,716 p=452 u=mistral n=ansible | NO MORE HOSTS LEFT *************************************************************                                                                                                          
2022-02-14 14:33:37,716 p=452 u=mistral n=ansible | PLAY RECAP *********************************************************************                                                                                                          
2022-02-14 14:33:37,717 p=452 u=mistral n=ansible | undercloud                 : ok=0    changed=0    unreachable=1    failed=0    skipped=0    rescued=0    ignored=0                                                                        
2022-02-14 14:33:37,717 p=452 u=mistral n=ansible | 2022-02-14 14:33:37.717653 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Summary Information ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~                                                                    
2022-02-14 14:33:37,717 p=452 u=mistral n=ansible | 2022-02-14 14:33:37.717776 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Total Tasks: 1          ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~                                                                    
2022-02-14 14:33:37,717 p=452 u=mistral n=ansible | 2022-02-14 14:33:37.717862 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Elapsed Time: 0:01:57.352443 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~                                                                    
2022-02-14 14:33:37,717 p=452 u=mistral n=ansible | 2022-02-14 14:33:37.717939 |                                 UUID |       Info |       Host |   Task Name |   Run Time                                                                    
2022-02-14 14:33:37,718 p=452 u=mistral n=ansible | 2022-02-14 14:33:37.718015 | 2442014f-b7ee-7295-22dd-0000000000f5 |    SUMMARY | undercloud | Gathering Facts | 117.27s                                                                   
2022-02-14 14:33:37,718 p=452 u=mistral n=ansible | 2022-02-14 14:33:37.718120 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ End Summary Information ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~                                                                    
2022-02-14 14:33:37,718 p=452 u=mistral n=ansible | 2022-02-14 14:33:37.718273 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ State Information ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~                                                                    
2022-02-14 14:33:37,718 p=452 u=mistral n=ansible | 2022-02-14 14:33:37.718363 | ~~~~~~~~~~~~~~~~~~ Number of nodes which did not deploy successfully: 1 ~~~~~~~~~~~~~~~~~                                                                    
2022-02-14 14:33:37,718 p=452 u=mistral n=ansible | 2022-02-14 14:33:37.718431 |  The following node(s) had failures: undercloud                                                                                                              
2022-02-14 14:33:37,718 p=452 u=mistral n=ansible | 2022-02-14 14:33:37.718500 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
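
For anyone who wants to check the result mechanically rather than by eye, the PLAY RECAP line above can be parsed with a few lines of Python. This is only a sketch with hypothetical function names, and it assumes the recap format shown in the logs above. The point is that a host counts as failed when either `failed` or `unreachable` is non-zero.

```python
import re
from typing import Dict, Tuple

# Matches "<host> : key=N key=N ..." as printed under PLAY RECAP.
RECAP_RE = re.compile(r"^(?P<host>\S+)\s*:\s*(?P<counters>(?:\w+=\d+\s*)+)$")


def parse_recap_line(line: str) -> Tuple[str, Dict[str, int]]:
    """Return (host, counters) for one PLAY RECAP line."""
    # ansible.log prefixes each line with "<timestamp> p=... | "; drop that part.
    body = line.rsplit(" | ", 1)[-1].strip()
    match = RECAP_RE.match(body)
    if match is None:
        raise ValueError(f"not a PLAY RECAP host line: {line!r}")
    counters = {
        key: int(value)
        for key, value in (pair.split("=") for pair in match.group("counters").split())
    }
    return match.group("host"), counters


def host_failed(counters: Dict[str, int]) -> bool:
    """Treat an unreachable host the same as a failed one."""
    return counters.get("failed", 0) > 0 or counters.get("unreachable", 0) > 0


line = ("undercloud                 : ok=0    changed=0    unreachable=1    "
        "failed=0    skipped=0    rescued=0    ignored=0")
host, counters = parse_recap_line(line)
print(host, host_failed(counters))  # prints "undercloud True"
```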

Comment 17 errata-xmlrpc 2022-06-22 16:03:29 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Release of components for Red Hat OpenStack Platform 16.2.3 (Train)), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:4793