In OSP16.1 compose RHOS-16.1-RHEL-8-20200623.n.0 with ceph-ansible-4.0.23-1.el8cp.noarch, the overcloud deployment failed at the ceph-ansible step.

From undercloud-0/var/lib/mistral/overcloud/ansible.log.txt.gz:
> "TASK [ceph-config : generate ceph.conf configuration file] *********************",
> "task path: /usr/share/ceph-ansible/roles/ceph-config/tasks/main.yml:159",
> "Tuesday 23 June 2020 10:58:43 +0000 (0:00:00.261) 0:01:29.567 ********** ",
> "fatal: [controller-0]: UNREACHABLE! => {\"changed\": false, \"msg\": \"Failed to create temporary directory.In some cases, you may have been able to authenticate and did not have permissions on the target directory. Consider changing the remote tmp path in ansible.cfg to a path rooted in \\\"/tmp\\\", for more error information use -vvv. Failed command was: ( umask 77 && mkdir -p \\\"` echo /tmp/ceph_ansible_tmp `\\\"&& mkdir /tmp/ceph_ansible_tmp/ansible-tmp-1592909923.3014195-164729-113154489071080 && echo ansible-tmp-1592909923.3014195-164729-113154489071080=\\\"` echo /tmp/ceph_ansible_tmp/ansible-tmp-1592909923.3014195-164729-113154489071080 `\\\" ), exited with result 1\", \"unreachable\": true}",
> "NO MORE HOSTS LEFT *************************************************************",
> "PLAY RECAP *********************************************************************",
> "ceph-0 : ok=60 changed=3 unreachable=0 failed=0 skipped=143 rescued=0 ignored=0 ",
> "compute-0 : ok=48 changed=3 unreachable=0 failed=0 skipped=143 rescued=0 ignored=0 ",
> "controller-0 : ok=93 changed=6 unreachable=1 failed=0 skipped=214 rescued=0 ignored=0 ",
> "INSTALLER STATUS ***************************************************************",
> "Install Ceph Monitor : In Progress (0:00:12)",
> "\tThis phase can be restarted by running: roles/ceph-mon/tasks/main.yml",
> "Tuesday 23 June 2020 10:58:43 +0000 (0:00:00.072) 0:01:29.640 ********** ",
> "=============================================================================== ",

It seems that the directory /tmp/ceph_ansible_tmp was created earlier by a different user or with the wrong permissions. A warning about it is visible on controller-0, the host the task was being copied to.

controller-0/var/log/messages.txt.gz:
> Jun 23 10:57:56 controller-0 ansible-lineinfile[48655]: Invoked with path=/etc/tmpfiles.d/ceph-common.conf line=d /run/ceph 0770 root root - owner=root group=root mode=420 state=present create=True backrefs=False backup=False firstmatch=False follow=False regexp=None insertafter=None insertbefore=None validate=None seuser=None serole=None selevel=None setype=None attributes=None src=None force=None content=NOT_LOGGING_PARAMETER remote_src=None delimiter=None directory_mode=None unsafe_writes=None
> Jun 23 10:57:56 controller-0 ansible-lineinfile[48655]: [WARNING] Module remote_tmp /tmp/ceph_ansible_tmp did not exist and was created with a mode of 0700, this may cause issues when running as another user. To avoid this, create the remote_tmp dir with the correct permissions manually
Created attachment 1698475: ansible.log
Full ansible.log of the first failure
(In reply to Pavel Sedlák from comment #0)
> Jun 23 10:57:56 controller-0 ansible-lineinfile[48655]: [WARNING] Module remote_tmp /tmp/ceph_ansible_tmp did not exist and was created with a mode of 0700, this may cause issues when running as another user. To avoid this, create the remote_tmp dir with the correct permissions manually

- This was the first build with ansible 2.9.10 instead of 2.9.9 (the version changed last night)
- 2.9.10 includes [1], i.e. the change for https://github.com/ansible/ansible/issues/68218
- ceph-ansible and tripleo-ansible have not changed
- an environment with ansible 2.9.9 does not reproduce this problem

[1] https://github.com/ansible/ansible/commit/60275fd9b4db4362f435a68590264963f5a494c8#diff-4b131dc1948ab542ad6aa59bc509cb52R88
As per this error:

Jun 23 10:57:56 controller-0 ansible-lineinfile[48655]: [WARNING] Module remote_tmp /tmp/ceph_ansible_tmp did not exist and was created with a mode of 0700, this may cause issues when running as another user. To avoid this, create the remote_tmp dir with the correct permissions manually

Ansible is suggesting I "create the remote_tmp dir with the correct permissions". This is because of a change introduced in ansible 2.9.10:

https://github.com/ansible/ansible/issues/68218

My plan is to patch tripleo-ansible to do what they recommend. In other words: have tripleo-ansible create the remote_tmp dir with the correct permissions before it runs ceph-ansible (which uses that remote_tmp dir).
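To make the plan concrete, here is a minimal Python sketch of "create the remote_tmp dir with the correct permissions" before anything else touches it. This is not the tripleo-ansible patch itself. The function name `ensure_remote_tmp` is hypothetical, and the mode `0o1777` (world-writable with the sticky bit, like /tmp) is an assumption; this report does not say which mode counts as correct.

```python
import os
import stat
import tempfile


def ensure_remote_tmp(path: str, mode: int = 0o1777) -> int:
    """Create `path` if it is missing and force its permission bits to `mode`.

    Returns the permission bits the directory ends up with.
    The default mode is an assumption, not taken from the actual fix.
    """
    # exist_ok avoids failing when an earlier task already created the directory.
    os.makedirs(path, exist_ok=True)
    # makedirs() is subject to the process umask, so set the mode explicitly.
    os.chmod(path, mode)
    return stat.S_IMODE(os.stat(path).st_mode)


if __name__ == "__main__":
    # Demonstrate on a scratch location rather than the real /tmp/ceph_ansible_tmp.
    with tempfile.TemporaryDirectory() as scratch:
        demo = os.path.join(scratch, "ceph_ansible_tmp")
        print(oct(ensure_remote_tmp(demo)))
```

The explicit chmod matters because the failing run used `umask 77`: a directory created under that umask ends up as 0700 regardless of the mode requested at creation time.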
Created attachment 1698633: ceph-ansible execution with the upstream patch
Here is my assessment of what's happening:

In Ansible <= 2.9.9, there was a bug in `lineinfile` where it did not honor `ANSIBLE_REMOTE_TMP`, so the `lineinfile` task was _not_ creating `/tmp/ceph_ansible_tmp` and was most likely using `/tmp`. Since that bug was fixed, `/tmp/ceph_ansible_tmp` is being created by the `lineinfile` task and owned by `root` with `700` permissions.

Later in `ceph-ansible`, because `ANSIBLE_REMOTE_TMP` is set to the same directory, a task tries to use it but cannot, because that task is not being run as `root`.

This can be solved by:

- explicitly creating `/tmp/ceph_ansible_tmp` with appropriate permissions
- using different remote tmp directories for whatever is running that `lineinfile` task and the `ceph-ansible` run
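The permission failure described above can be illustrated with a small model of the classic POSIX owner/group/other check. This is a simplified sketch: it ignores ACLs, SELinux and capabilities, the function name `can_create_entries` is hypothetical, and UID 1000 stands in for whichever non-root user the later task runs as.

```python
import stat


def can_create_entries(dir_mode: int, dir_uid: int, dir_gid: int,
                       uid: int, gids: frozenset = frozenset()) -> bool:
    """Approximate whether `uid` may create a subdirectory in a directory.

    Creating an entry needs both write and search (execute) permission
    on the parent directory; root bypasses the check.
    """
    if uid == 0:
        return True
    if uid == dir_uid:
        need = stat.S_IWUSR | stat.S_IXUSR
    elif dir_gid in gids:
        need = stat.S_IWGRP | stat.S_IXGRP
    else:
        need = stat.S_IWOTH | stat.S_IXOTH
    return (dir_mode & need) == need


# Directory created by the root-run lineinfile task with mode 0700.
print(can_create_entries(0o700, dir_uid=0, dir_gid=0, uid=1000))
# Same directory pre-created with an assumed mode of 1777.
print(can_create_entries(0o1777, dir_uid=0, dir_gid=0, uid=1000))
```

Under this model a non-root user falls into the "other" class for a root-owned 0700 directory and has neither write nor search permission, which matches the failed `mkdir` inside `/tmp/ceph_ansible_tmp`.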
(In reply to Sam Doran from comment #17)
> In Ansible <= 2.9.9, there was a bug in `lineinfile` where it did not honor `ANSIBLE_REMOTE_TMP`

Thanks for the explanation.

> This can be solved by:
> - explicitly creating `/tmp/ceph_ansible_tmp` with appropriate permissions

The fixing patch did exactly that.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:3148