Bug 2069313

Summary: Overcloud deployment fails randomly due to permission denied issues
Product: Red Hat OpenStack
Reporter: Ketan Mehta <kmehta>
Component: python-tripleoclient
Assignee: OSP Team <rhos-maint>
Status: CLOSED ERRATA
QA Contact: David Rosenfeld <drosenfe>
Severity: high
Docs Contact:
Priority: high
Version: 17.0 (Wallaby)
CC: cjeanner, drosenfe, hbrock, jschluet, jslagle, lnatapov, mburns, mkrcmari, ramishra, supadhya
Target Milestone: ---
Keywords: Triaged
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version: python-tripleoclient-16.4.1-0.20220407001042.0021766.el9ost
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2022-09-21 12:19:42 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Ketan Mehta 2022-03-28 17:20:41 UTC
Description of problem:

The overcloud deployment failed a few times at first with permission denied errors, right after the Heat stack was created and at the beginning of the Ansible execution.

It occurs intermittently, after the task that adds the overcloud nodes' keys to the undercloud's known_hosts for the heat-admin user.

Enabling ssh admin (tripleo-admin) for hosts: [xx,xx,xx,xx,xx,xx,xx,xx,yy,yy].
Using ssh user "heat-admin" for initial connection.
Using ssh key at "/home/stack/.ssh/id_rsa_tripleo" for initial connection.

Starting ssh admin enablement playbook
2022-03-25 15:13:51.125 143356 INFO tripleoclient.utils.utils [-] Running Ansible playbook: /usr/share/ansible/tripleo-playbooks/cli-enable-ssh-admin.yaml, Working directory: /home/stack/overcloud-deploy/overcloud/cli-enable-ssh-admin, Playbook directory: /usr/share/ansible/tripleo-playbooks
2022-03-25 15:13:51.126 143356 INFO tripleoclient.utils.utils [-] Temporary directory [ /tmp/tripleo0l0ekmx9 ] cleaned up
2022-03-25 15:13:51.127 143356 WARNING tripleoclient.utils.safe_write [-] The output file /home/stack/overcloud-deploy/overcloud/overcloud-deployment_status.yaml will be overriden: PermissionError: [Errno 13] Permission denied: '/home/stack/overcloud-deploy/overcloud/cli-enable-ssh-admin/hosts.yaml'
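A permission denied error on a file that the deploying user should own can come down to mode bits rather than ownership. A quick way to check is to scan the deployment working directory for regular files whose owner-write bit is missing. This is a minimal sketch with a hypothetical helper (`find_read_only` is not part of tripleoclient), and the default root `~/overcloud-deploy` is an assumption taken from the paths in the log.

```python
import os
import stat
import sys


def find_read_only(root: str) -> list[str]:
    """Return regular files under `root` whose owner-write bit (u+w) is not set."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)  # lstat: do not follow symlinks
            if stat.S_ISREG(st.st_mode) and not st.st_mode & stat.S_IWUSR:
                hits.append(path)
    return sorted(hits)


if __name__ == "__main__":
    # Assumed default: the deployment working directory seen in the log.
    root = sys.argv[1] if len(sys.argv) > 1 else os.path.expanduser("~/overcloud-deploy")
    for path in find_read_only(root):
        # stat.filemode renders the mode like `ls -l`, e.g. "-r--------"
        print(stat.filemode(os.lstat(path).st_mode), path)
```

Run as the `stack` user; any file listed will make a later write to it fail with Errno 13 for that user.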

Version-Release number of selected component (if applicable):

(undercloud) [stack@undercloud ~]$ sudo rpm -qa |grep -i tripleo-ansib
tripleo-ansible-3.3.1-0.20220307013244.130185a.el9ost.noarch

(undercloud) [stack@undercloud ~]$ sudo rpm -qa |grep -i python3-triple
python3-tripleo-common-15.4.1-0.20220314140831.3db8093.el9ost.noarch
python3-tripleoclient-16.4.1-0.20220314170843.423daff.el9ost.noarch
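The installed client build (16.4.1-0.20220314170843) predates the build listed under Fixed In Version (16.4.1-0.20220407001042), presumably the source package that `python3-tripleoclient` is built from. To check a host quickly, here is a simplified Python port of RPM's segment-by-segment version comparison. It is a sketch under stated assumptions: it ignores epochs and the `~`/`^` operators, and on a real system `rpmdev-vercmp` (from rpmdevtools) or the `rpm` Python bindings are the authoritative tools.

```python
import re
from itertools import zip_longest

# Runs of digits or runs of ASCII letters; other characters only separate segments.
_SEGMENT = re.compile(r"\d+|[A-Za-z]+")


def vercmp(a: str, b: str) -> int:
    """Simplified rpmvercmp: return -1, 0 or 1 (no epoch, '~' or '^' handling)."""
    for x, y in zip_longest(_SEGMENT.findall(a), _SEGMENT.findall(b)):
        if x is None:  # `a` ran out of segments first, so `b` is newer
            return -1
        if y is None:
            return 1
        x_num, y_num = x.isdigit(), y.isdigit()
        if x_num and y_num:
            if int(x) != int(y):  # numeric compare ignores leading zeros
                return 1 if int(x) > int(y) else -1
        elif x_num != y_num:  # a numeric segment beats an alphabetic one
            return 1 if x_num else -1
        elif x != y:
            return 1 if x > y else -1
    return 0


def compare_vr(a: str, b: str) -> int:
    """Compare "version-release" strings: version first, then release."""
    va, _, ra = a.partition("-")
    vb, _, rb = b.partition("-")
    return vercmp(va, vb) or vercmp(ra, rb)


if __name__ == "__main__":
    installed = "16.4.1-0.20220314170843.423daff.el9ost"  # from `rpm -qa` in the report
    fixed = "16.4.1-0.20220407001042.0021766.el9ost"      # Fixed In Version
    print("fix included" if compare_vr(installed, fixed) >= 0 else "fix missing")
```

For the versions in this report it prints `fix missing`: the versions are equal and the releases differ at the build timestamp segment, where 20220314170843 is older.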

How reproducible:

Random

Comment 1 Cédric Jeanneret 2022-03-29 08:28:11 UTC
I didn't hit this one during my previous tests; I'll try to find a way to reproduce it more reliably.

Question for you, Ketan: is it always a clean deploy, i.e. "brand new UC and OC", or are you reusing an existing undercloud and just iterating overcloud deploys/deletes?

Comment 2 Cédric Jeanneret 2022-03-29 09:14:20 UTC
Got some info from IRC:

- it's a re-deploy (i.e. UC is created, OC is then deployed, deleted, deployed, ...)
- it has nothing to do with ownership: -r--------. 1 stack stack unconfined_u:object_r:user_home_t:s0   67 Mar  9 03:52 hosts.yaml

So something is setting a 0400 mode on that file. Let's dig into some code!
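The `-r--------` mode is enough to explain Errno 13 even though `stack` owns the file: opening an existing file for writing checks its mode bits, and the `mode` argument of `open(2)` (`os.open` in Python) is only applied when the file is created, so a writer that passes 0600 does not repair an existing 0400 file. The sketch below reproduces that state and shows one defensive pattern, writing a temporary file in the same directory and `os.replace()`-ing it over the target, which needs write permission on the directory rather than on the file. `write_replacing` is a hypothetical helper, not the actual tripleoclient fix (which this report does not show). Root bypasses these permission checks, so the direct-write failure only reproduces for an unprivileged user.

```python
import os
import stat
import tempfile


def write_replacing(path: str, data: str, mode: int = 0o600) -> None:
    """Hypothetical helper: overwrite `path` even if the existing file is read-only.

    The data goes to a temp file in the same directory, which is then renamed
    over `path`. rename() needs write access to the directory, not to `path`.
    """
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(os.path.abspath(path)))
    try:
        with os.fdopen(fd, "w") as fh:
            fh.write(data)
        os.chmod(tmp, mode)
        os.replace(tmp, path)  # atomic on POSIX within one filesystem
    except BaseException:
        os.unlink(tmp)
        raise


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        hosts = os.path.join(workdir, "hosts.yaml")
        with open(hosts, "w") as fh:
            fh.write("old inventory\n")
        os.chmod(hosts, 0o400)  # same state as in comment 2: -r--------

        if os.geteuid() != 0:  # root bypasses the mode check
            try:
                # The 0o600 is ignored here because the file already exists.
                os.open(hosts, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
            except PermissionError as exc:
                print("direct write fails:", exc)

        write_replacing(hosts, "new inventory\n")
        with open(hosts) as fh:
            content = fh.read().strip()
        print(stat.filemode(os.stat(hosts).st_mode), content)
```

The final line prints `-rw------- new inventory`. Replacing the file instead of writing in place also means a reader never sees a half-written inventory, at the cost of needing write access to the directory.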

Comment 3 Brendan Shephard 2022-03-29 09:52:45 UTC
*** Bug 2067170 has been marked as a duplicate of this bug. ***

Comment 5 David Rosenfeld 2022-06-17 12:26:38 UTC
The permission denied error is no longer seen during overcloud deployment.

Comment 10 errata-xmlrpc 2022-09-21 12:19:42 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Release of components for Red Hat OpenStack Platform 17.0 (Wallaby)), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2022:6543