Bug 2069313 - Overcloud deployment fails randomly due to permission denied issues
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: python-tripleoclient
Version: 17.0 (Wallaby)
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: OSP Team
QA Contact: David Rosenfeld
URL:
Whiteboard:
Duplicates: 2067170
Depends On:
Blocks:
Reported: 2022-03-28 17:20 UTC by Ketan Mehta
Modified: 2022-09-21 12:20 UTC (History)

Fixed In Version: python-tripleoclient-16.4.1-0.20220407001042.0021766.el9ost
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-09-21 12:19:42 UTC
Target Upstream Version:
Embargoed:



Links
System | ID | Private | Priority | Status | Summary | Last Updated
OpenStack gerrit | 835500 | 0 | None | NEW | Make inventory file writable | 2022-03-29 09:20:35 UTC
Red Hat Issue Tracker | OSP-14346 | 0 | None | None | None | 2022-03-28 17:36:27 UTC
Red Hat Product Errata | RHEA-2022:6543 | 0 | None | None | None | 2022-09-21 12:20:14 UTC

Description Ketan Mehta 2022-03-28 17:20:41 UTC
Description of problem:

Overcloud deployment failed several times with a permission denied error, right after Heat stack creation and at the start of Ansible execution.

The failure is intermittent. When it occurs, it follows the task that adds the overcloud nodes' keys to the undercloud (known-hosts) for the heat-admin user.

Enabling ssh admin (tripleo-admin) for hosts: [xx,xx,xx,xx,xx,xx,xx,xx,yy,yy].
Using ssh user "heat-admin" for initial connection.
Using ssh key at "/home/stack/.ssh/id_rsa_tripleo" for initial connection.

Starting ssh admin enablement playbook
2022-03-25 15:13:51.125 143356 INFO tripleoclient.utils.utils [-] Running Ansible playbook: /usr/share/ansible/tripleo-playbooks/cli-enable-ssh-admin.yaml, Working directory: /home/stack/overcloud-deploy/overcloud/cli-enable-ssh-admin, Playbook directory: /usr/share/ansible/tripleo-playbooks
2022-03-25 15:13:51.126 143356 INFO tripleoclient.utils.utils [-] Temporary directory [ /tmp/tripleo0l0ekmx9 ] cleaned up
2022-03-25 15:13:51.127 143356 WARNING tripleoclient.utils.safe_write [-] The output file /home/stack/overcloud-deploy/overcloud/overcloud-deployment_status.yaml will be overriden: PermissionError: [Errno 13] Permission denied: '/home/stack/overcloud-deploy/overcloud/cli-enable-ssh-admin/hosts.yaml'

Version-Release number of selected component (if applicable):

(undercloud) [stack@undercloud ~]$ sudo rpm -qa |grep -i tripleo-ansib
tripleo-ansible-3.3.1-0.20220307013244.130185a.el9ost.noarch

(undercloud) [stack@undercloud ~]$ sudo rpm -qa |grep -i python3-triple
python3-tripleo-common-15.4.1-0.20220314140831.3db8093.el9ost.noarch
python3-tripleoclient-16.4.1-0.20220314170843.423daff.el9ost.noarch

How reproducible:

Random


Comment 1 Cédric Jeanneret 2022-03-29 08:28:11 UTC
I didn't hit that one during my previous tests - will try to find a way to reproduce it more reliably.

Question for you, Ketan: is it always a clean deploy, i.e. "brand new UC and OC", or are you re-using an existing undercloud and just iterating overcloud deploys/deletes?

Comment 2 Cédric Jeanneret 2022-03-29 09:14:20 UTC
Got some info from IRC:

- it's a re-deploy (i.e. UC is created, OC is then deployed, deleted, deployed, ...)
- it has nothing to do with ownership: -r--------. 1 stack stack unconfined_u:object_r:user_home_t:s0   67 Mar  9 03:52 hosts.yaml

So something is setting a 0400 mode on that file. Let's dig into some code!
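
The failure mode described above can be sketched in Python: a hosts.yaml left at mode 0400 by an earlier deploy cannot be reopened for writing by its own (non-root) owner, so the next deploy fails with Errno 13. The helper names below (make_writable, write_inventory) are hypothetical and exist only for illustration; this is not the change from gerrit 835500, of which only the title "Make inventory file writable" is known here. It shows one possible workaround, assuming it is acceptable to restore the owner-write bit before rewriting the file.

```python
import os
import stat
import tempfile


def make_writable(path):
    """Add the owner-write bit to an existing file; do nothing if it is absent."""
    if os.path.exists(path):
        mode = stat.S_IMODE(os.stat(path).st_mode)
        os.chmod(path, mode | stat.S_IWUSR)


def write_inventory(path, content):
    """Hypothetical helper: rewrite an inventory file left read-only by a previous run."""
    make_writable(path)
    with open(path, "w") as handle:
        handle.write(content)


if __name__ == "__main__":
    workdir = tempfile.mkdtemp()
    hosts = os.path.join(workdir, "hosts.yaml")

    # Simulate the leftover from a previous deploy: owner-read only (0400).
    with open(hosts, "w") as handle:
        handle.write("old inventory\n")
    os.chmod(hosts, 0o400)

    # A plain rewrite fails for a non-root owner (root bypasses the mode check).
    try:
        with open(hosts, "w") as handle:
            handle.write("new inventory\n")
    except PermissionError as exc:
        print("plain write failed:", exc)

    # Restoring the write bit first lets the rewrite go through.
    write_inventory(hosts, "new inventory\n")
    print(oct(stat.S_IMODE(os.stat(hosts).st_mode)))
```

Only the owner-write bit is added, so a 0400 file becomes 0600 and stays unreadable to group and others.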

Comment 3 Brendan Shephard 2022-03-29 09:52:45 UTC
*** Bug 2067170 has been marked as a duplicate of this bug. ***

Comment 5 David Rosenfeld 2022-06-17 12:26:38 UTC
Permission denied error is no longer seen during overcloud deploy.

Comment 10 errata-xmlrpc 2022-09-21 12:19:42 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Release of components for Red Hat OpenStack Platform 17.0 (Wallaby)), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2022:6543

