Description of problem:
The overcloud deployment failed a few times with "Permission denied" errors right after the heat stack was created, at the beginning of the ansible execution. It happens intermittently, after the task that adds the overcloud nodes' keys to the undercloud known_hosts for the heat-admin user:

Enabling ssh admin (tripleo-admin) for hosts: [xx,xx,xx,xx,xx,xx,xx,xx,yy,yy].
Using ssh user "heat-admin" for initial connection.
Using ssh key at "/home/stack/.ssh/id_rsa_tripleo" for initial connection.
Starting ssh admin enablement playbook
2022-03-25 15:13:51.125 143356 INFO tripleoclient.utils.utils [-] Running Ansible playbook: /usr/share/ansible/tripleo-playbooks/cli-enable-ssh-admin.yaml, Working directory: /home/stack/overcloud-deploy/overcloud/cli-enable-ssh-admin, Playbook directory: /usr/share/ansible/tripleo-playbooks
2022-03-25 15:13:51.126 143356 INFO tripleoclient.utils.utils [-] Temporary directory [ /tmp/tripleo0l0ekmx9 ] cleaned up
2022-03-25 15:13:51.127 143356 WARNING tripleoclient.utils.safe_write [-] The output file /home/stack/overcloud-deploy/overcloud/overcloud-deployment_status.yaml will be overriden: Permission Error: [Errno 13] Permission denied: '/home/stack/overcloud-deploy/overcloud/cli-enable-ssh-admin/hosts.yaml'

Version-Release number of selected component (if applicable):
(undercloud) [stack@undercloud ~]$ sudo rpm -qa |grep -i tripleo-ansib
tripleo-ansible-3.3.1-0.20220307013244.130185a.el9ost.noarch
(undercloud) [stack@undercloud ~]$ sudo rpm -qa |grep -i python3-triple
python3-tripleo-common-15.4.1-0.20220314140831.3db8093.el9ost.noarch
python3-tripleoclient-16.4.1-0.20220314170843.423daff.el9ost.noarch

How reproducible:
Random

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:

Additional info:
I didn't hit this one during my previous tests; I'll try to find a way to reproduce it more reliably. Question for you, Ketan: is it always a clean deploy, i.e. a brand-new UC and OC, or are you reusing an existing undercloud and just iterating overcloud deploys/deletes?
Got some info from IRC:
- it's a re-deploy (i.e. the UC is created, then the OC is deployed, deleted, deployed again, ...)
- it has nothing to do with ownership; the file belongs to stack:stack but is read-only:
  -r--------. 1 stack stack unconfined_u:object_r:user_home_t:s0 67 Mar 9 03:52 hosts.yaml
So something is setting mode 0400 on that file, which leaves even its owner without write access. Let's dig into the code!
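To illustrate why ownership alone does not help: mode 0400 grants the owner read access only, so opening the file for writing fails with errno 13 (EACCES) even for the stack user. The following is a minimal Python sketch, not tripleoclient code. The temporary file stands in for hosts.yaml, and the helpers `owner_can_write`, `find_readonly_files` and `overwrite_raises_permission_error` are hypothetical names for illustration. It only checks the owner write bit (ignoring ACLs), and root bypasses these permission checks, so the error reproduces only as a regular user such as stack.

```python
import os
import stat
import tempfile


def owner_can_write(path: str) -> bool:
    """Return True if the owner write bit (S_IWUSR, 0o200) is set on path."""
    return bool(os.stat(path).st_mode & stat.S_IWUSR)


def find_readonly_files(root: str) -> list:
    """Hypothetical diagnostic: sorted list of files under root whose owner write bit is clear."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not owner_can_write(path):
                found.append(path)
    return sorted(found)


def overwrite_raises_permission_error(path: str) -> bool:
    """Try to truncate and rewrite path; return True if PermissionError (errno 13) is raised."""
    try:
        with open(path, "w") as f:
            f.write("overwritten\n")
    except PermissionError as exc:
        return exc.errno == 13  # EACCES
    return False


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmpdir:
        # Stand-in for .../cli-enable-ssh-admin/hosts.yaml (hypothetical content).
        hosts = os.path.join(tmpdir, "hosts.yaml")
        with open(hosts, "w") as f:
            f.write("all: {}\n")
        os.chmod(hosts, 0o400)  # -r--------, as in the ls output above
        print("read-only files:", find_readonly_files(tmpdir))
        if os.geteuid() == 0:
            print("running as root: permission checks are bypassed")
        print("PermissionError on overwrite:", overwrite_raises_permission_error(hosts))
```

Pointing `find_readonly_files` at the overcloud-deploy working directory between redeploys could, under these assumptions, show whether other generated files also end up without the owner write bit.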
*** Bug 2067170 has been marked as a duplicate of this bug. ***
The permission denied error is no longer seen during overcloud deployment.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Release of components for Red Hat OpenStack Platform 17.0 (Wallaby)), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2022:6543