Bug 1850059 - OC deploy fails on ceph-ansible generate ceph.conf - Failed to create temporary directory.
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: tripleo-ansible
Version: 16.1 (Train)
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: rc
Target Release: 16.1 (Train on RHEL 8.2)
Assignee: John Fulton
QA Contact: Yogev Rabl
URL:
Whiteboard:
Depends On:
Blocks: 1850978
 
Reported: 2020-06-23 13:34 UTC by Pavel Sedlák
Modified: 2020-07-29 07:54 UTC (History)

Fixed In Version: tripleo-ansible-0.5.1-0.20200611113655.34b8fcc.el8ost
Doc Type: No Doc Update
Doc Text:
Clone Of:
: 1850978 1851190 (view as bug list)
Environment:
Last Closed: 2020-07-29 07:53:29 UTC
Target Upstream Version:
Embargoed:


Attachments
ansible.log (1.40 MB, application/gzip)
2020-06-23 15:09 UTC, Francesco Pantano
ceph-ansible execution with the upstream patch (2.58 MB, text/plain)
2020-06-24 15:46 UTC, Francesco Pantano


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Launchpad | 1884816 | 0 | None | None | None | 2020-06-23 17:56:09 UTC
OpenStack gerrit | 737660 | 0 | None | MERGED | Make tripleo_ceph_run_ansible handle change in remote_tmp | 2020-12-10 14:35:23 UTC
OpenStack gerrit | 737761 | 0 | None | MERGED | Make tripleo_ceph_run_ansible handle change in remote_tmp | 2020-12-10 14:34:55 UTC
Red Hat Product Errata | RHBA-2020:3148 | 0 | None | None | None | 2020-07-29 07:54:39 UTC

Description Pavel Sedlák 2020-06-23 13:34:30 UTC
In OSP16.1 compose RHOS-16.1-RHEL-8-20200623.n.0
with ceph-ansible-4.0.23-1.el8cp.noarch
Overcloud deployment failed on ceph-ansible step.

From undercloud-0/var/lib/mistral/overcloud/ansible.log.txt.gz:
> "TASK [ceph-config : generate ceph.conf configuration file] *********************",
> "task path: /usr/share/ceph-ansible/roles/ceph-config/tasks/main.yml:159",
> "Tuesday 23 June 2020  10:58:43 +0000 (0:00:00.261)       0:01:29.567 ********** ",
> "fatal: [controller-0]: UNREACHABLE! => {\"changed\": false, \"msg\": \"Failed to create temporary directory.In some cases, you may have been able to authenticate and did not have permissions on the target directory. Consider changing the remote tmp path in ansible.cfg to a path rooted in \\\"/tmp\\\", for more error information use -vvv. Failed command was: ( umask 77 && mkdir -p \\\"` echo /tmp/ceph_ansible_tmp `\\\"&& mkdir /tmp/ceph_ansible_tmp/ansible-tmp-1592909923.3014195-164729-113154489071080 && echo ansible-tmp-1592909923.3014195-164729-113154489071080=\\\"` echo /tmp/ceph_ansible_tmp/ansible-tmp-1592909923.3014195-164729-113154489071080 `\\\" ), exited with result 1\", \"unreachable\": true}",
> "NO MORE HOSTS LEFT *************************************************************",
> "PLAY RECAP *********************************************************************",
> "ceph-0                     : ok=60   changed=3    unreachable=0    failed=0    skipped=143  rescued=0    ignored=0   ",
> "compute-0                  : ok=48   changed=3    unreachable=0    failed=0    skipped=143  rescued=0    ignored=0   ",
> "controller-0               : ok=93   changed=6    unreachable=1    failed=0    skipped=214  rescued=0    ignored=0   ",
> "INSTALLER STATUS ***************************************************************",
> "Install Ceph Monitor           : In Progress (0:00:12)",
> "\tThis phase can be restarted by running: roles/ceph-mon/tasks/main.yml",
> "Tuesday 23 June 2020  10:58:43 +0000 (0:00:00.072)       0:01:29.640 ********** ",
> "=============================================================================== ",
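
The failed command in the log above runs `umask 77` before `mkdir -p`, so whichever user creates /tmp/ceph_ansible_tmp first ends up with a directory that only they can use. The following is a minimal Python sketch of that behaviour, not the code Ansible runs; the function name and the scratch paths are hypothetical and exist only for illustration.

```python
import os
import tempfile


def make_ansible_style_tmp(base_dir: str, task_dir_name: str) -> str:
    """Mimic `( umask 77 && mkdir -p BASE && mkdir BASE/TASK )` from the log.

    Hypothetical helper: it only illustrates how the umask shapes the mode
    of a freshly created base directory.
    """
    old_umask = os.umask(0o077)  # same effect as `umask 77` in the shell
    try:
        # Like `mkdir -p`: create the base directory if it is missing.
        os.makedirs(base_dir, exist_ok=True)
        # Like the second `mkdir`: create the per-task directory inside it.
        task_dir = os.path.join(base_dir, task_dir_name)
        os.mkdir(task_dir)
        return task_dir
    finally:
        os.umask(old_umask)  # restore the caller's umask


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as scratch:
        base = os.path.join(scratch, "ceph_ansible_tmp")
        make_ansible_style_tmp(base, "ansible-tmp-example")
        # Print the permission bits of the newly created base directory.
        print(oct(os.stat(base).st_mode & 0o777))
```

If the base directory already exists and belongs to another user with those owner-only permissions, the second `mkdir` is the step that fails, which matches the "exited with result 1" in the log.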

It seems that the directory /tmp/ceph_ansible_tmp may have been created earlier by a different user or with the wrong permissions.
On controller-0, the host the task was being copied to, a warning about this is visible.

controller-0/var/log/messages.txt.gz:
> Jun 23 10:57:56 controller-0 ansible-lineinfile[48655]: Invoked with path=/etc/tmpfiles.d/ceph-common.conf line=d /run/ceph 0770 root root - owner=root group=root mode=420 state=present create=True backrefs=False backup=False firstmatch=False follow=False regexp=None insertafter=None insertbefore=None validate=None seuser=None serole=None selevel=None setype=None attributes=None src=None force=None content=NOT_LOGGING_PARAMETER remote_src=None delimiter=None directory_mode=None unsafe_writes=None
> Jun 23 10:57:56 controller-0 ansible-lineinfile[48655]: [WARNING] Module remote_tmp /tmp/ceph_ansible_tmp did not exist and was created with a mode of 0700, this may cause issues when running as another user. To avoid this, create the remote_tmp dir with the correct permissions manually

Comment 2 Francesco Pantano 2020-06-23 15:09:29 UTC
Created attachment 1698475 [details]
ansible.log

Full ansible.log of the first failure

Comment 3 John Fulton 2020-06-23 19:58:09 UTC
(In reply to Pavel Sedlák from comment #0)
> Jun 23 10:57:56 controller-0 ansible-lineinfile[48655]: [WARNING] Module remote_tmp /tmp/ceph_ansible_tmp did not exist and was created with a mode of 0700, this may cause issues when running as another user. To avoid this, create the remote_tmp dir with the correct permissions manually

- This was the first build with ansible 2.9.10 instead of 2.9.9 (it was changed last night)
- 2.9.10 includes [1], i.e. this change https://github.com/ansible/ansible/issues/68218
- ceph-ansible and tripleo-ansible have not changed
- an environment with 2.9.9 does not reproduce this problem

[1] https://github.com/ansible/ansible/commit/60275fd9b4db4362f435a68590264963f5a494c8#diff-4b131dc1948ab542ad6aa59bc509cb52R88

Comment 4 John Fulton 2020-06-23 20:52:09 UTC
As per this warning:

Jun 23 10:57:56 controller-0 ansible-lineinfile[48655]: [WARNING] Module remote_tmp /tmp/ceph_ansible_tmp did not exist and was created with a mode of 0700, this may cause issues when running as another user. To avoid this, create the remote_tmp dir with the correct permissions manually

Ansible is suggesting I "create the remote_tmp dir with the correct permissions".

This is because of a change introduced in ansible 2.9.10

 https://github.com/ansible/ansible/issues/68218

My plan is to patch tripleo-ansible to do what they recommend. In other words:

Have tripleo-ansible create the remote_tmp dir with the correct permissions before it runs ceph-ansible (which uses that remote_tmp dir).
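
As a rough illustration of that plan (this is not the tripleo-ansible patch itself, which is linked in the gerrit reviews above), the idea of pre-creating the directory with explicit permissions can be sketched in Python. The function name and the default mode of 1777 (world-writable with the sticky bit, like /tmp) are assumptions made for this example; the bug report does not state which mode the patch uses.

```python
import os
import stat


def ensure_remote_tmp(path: str, mode: int = 0o1777) -> int:
    """Create `path` if needed and force its permission bits to `mode`.

    Hypothetical helper. The default mode is an assumption for this sketch,
    not a value taken from the actual fix.
    """
    os.makedirs(path, exist_ok=True)
    # mkdir honours the process umask, so set the mode explicitly afterwards.
    # This raises PermissionError if the directory belongs to another user.
    os.chmod(path, mode)
    return stat.S_IMODE(os.stat(path).st_mode)
```

Calling it before anything else touches the directory means later tasks, whichever user they run as, find a directory they can create their per-task subdirectories in. The call is idempotent, so running it on every deployment is harmless.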

Comment 9 Francesco Pantano 2020-06-24 15:46:01 UTC
Created attachment 1698633 [details]
ceph-ansible execution with the upstream patch

Comment 17 Sam Doran 2020-06-25 19:00:41 UTC
Here is my assessment of what's happening:

In Ansible <= 2.9.9, there was a bug in `lineinfile` where it did not honor `ANSIBLE_REMOTE_TMP`, so the `lineinfile` task was _not_ creating `/tmp/ceph_ansible_tmp` and was most likely using `/tmp`. Since that bug was fixed, `/tmp/ceph_ansible_tmp` is being created by the `lineinfile` task and owned by `root` with `700` permissions.

Later in `ceph-ansible`, because `ANSIBLE_REMOTE_TMP` is set to the same directory, it tries to use it but cannot because that task is not being run as `root`.

This can be solved by:
    - explicitly creating `/tmp/ceph_ansible_tmp` with appropriate permissions
    - using different remote tmp directories for whatever is running that `lineinfile` task and the `ceph-ansible` run
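
The permission logic behind this assessment can be sketched as a small Python model. It is a simplified version of the classic POSIX owner/group/other check and deliberately ignores ACLs, capabilities and SELinux; the function name and the uid values in the usage note are hypothetical.

```python
def can_create_entries(dir_mode: int, dir_uid: int, dir_gid: int,
                       uid: int, gids: frozenset = frozenset()) -> bool:
    """Return True if `uid` could create a subdirectory in the directory.

    Simplified model for illustration only: root bypasses the check,
    everyone else needs both write and search (execute) permission from
    the permission class that applies to them.
    """
    if uid == 0:
        return True
    if uid == dir_uid:
        bits = (dir_mode >> 6) & 0o7   # owner bits
    elif dir_gid in gids:
        bits = (dir_mode >> 3) & 0o7   # group bits
    else:
        bits = dir_mode & 0o7          # other bits
    return bits & 0o3 == 0o3           # write (2) and search (1)
```

For a directory owned by `root` with mode `700`, `can_create_entries(0o700, 0, 0, uid=1000)` is false while the same call with `uid=0` is true, which is the situation described above. Either proposed solution changes one input: explicit permissions change `dir_mode`, and separate remote tmp directories mean each user is the owner of the directory it uses.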

Comment 18 John Fulton 2020-06-26 11:40:41 UTC
(In reply to Sam Doran from comment #17)
> In Ansible <= 2.9.9, there was a bug in `lineinfile` where it did not honor `ANSIBLE_REMOTE_TMP`

Thanks for the explanation. 

> This can be solved by:
>     - explicitly creating `/tmp/ceph_ansible_tmp` with appropriate permissions

The fixing patch did exactly that.

Comment 24 errata-xmlrpc 2020-07-29 07:53:29 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:3148

