Bug 1850978 - OC deploy fails on ceph-ansible generate ceph.conf - Failed to create temporary directory.
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: rhosp-release
Version: 15.0 (Stein)
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: rc
Target Release: 15.0 (Stein)
Assignee: Lon Hohberger
QA Contact: David Rosenfeld
URL:
Whiteboard:
Duplicates: 1851190
Depends On: 1850059
Blocks:
Reported: 2020-06-25 10:18 UTC by Pavel Sedlák
Modified: 2020-06-26 12:40 UTC
CC List: 8 users

Fixed In Version: rhos-release-1.5.40-1
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1850059
Environment:
Last Closed: 2020-06-26 12:10:46 UTC
Target Upstream Version:
Embargoed:




Links
System            ID      Private  Priority  Status  Summary                           Last Updated
OpenStack gerrit  204410  0        None      MERGED  Updated from global requirements  2020-09-10 19:20:05 UTC

Description Pavel Sedlák 2020-06-25 10:18:21 UTC
+++ This bug was initially created as a clone of Bug #1850059 +++

In OSP 16.1 compose RHOS-16.1-RHEL-8-20200623.n.0,
with ceph-ansible-4.0.23-1.el8cp.noarch,
the overcloud deployment failed at the ceph-ansible step.

From undercloud-0/var/lib/mistral/overcloud/ansible.log.txt.gz:
> "TASK [ceph-config : generate ceph.conf configuration file] *********************",
> "task path: /usr/share/ceph-ansible/roles/ceph-config/tasks/main.yml:159",
> "Tuesday 23 June 2020  10:58:43 +0000 (0:00:00.261)       0:01:29.567 ********** ",
> "fatal: [controller-0]: UNREACHABLE! => {\"changed\": false, \"msg\": \"Failed to create temporary directory.In some cases, you may have been able to authenticate and did not have permissions on the target directory. Consider changing the remote tmp path in ansible.cfg to a path rooted in \\\"/tmp\\\", for more error information use -vvv. Failed command was: ( umask 77 && mkdir -p \\\"` echo /tmp/ceph_ansible_tmp `\\\"&& mkdir /tmp/ceph_ansible_tmp/ansible-tmp-1592909923.3014195-164729-113154489071080 && echo ansible-tmp-1592909923.3014195-164729-113154489071080=\\\"` echo /tmp/ceph_ansible_tmp/ansible-tmp-1592909923.3014195-164729-113154489071080 `\\\" ), exited with result 1\", \"unreachable\": true}",
> "NO MORE HOSTS LEFT *************************************************************",
> "PLAY RECAP *********************************************************************",
> "ceph-0                     : ok=60   changed=3    unreachable=0    failed=0    skipped=143  rescued=0    ignored=0   ",
> "compute-0                  : ok=48   changed=3    unreachable=0    failed=0    skipped=143  rescued=0    ignored=0   ",
> "controller-0               : ok=93   changed=6    unreachable=1    failed=0    skipped=214  rescued=0    ignored=0   ",
> "INSTALLER STATUS ***************************************************************",
> "Install Ceph Monitor           : In Progress (0:00:12)",
> "\tThis phase can be restarted by running: roles/ceph-mon/tasks/main.yml",
> "Tuesday 23 June 2020  10:58:43 +0000 (0:00:00.072)       0:01:29.640 ********** ",
> "=============================================================================== ",
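The temporary directory in the failed command embeds a timestamp in its name. A minimal sketch, assuming the first numeric field after "ansible-tmp-" is a Unix epoch time taken on the controller (the other two fields are not interpreted), that decodes it; the helper name is hypothetical:

```python
from datetime import datetime, timezone


def tmp_dir_created_at(name: str) -> datetime:
    """Decode the epoch timestamp in an ansible-tmp-* directory name (hypothetical helper).

    Assumes the layout ansible-tmp-<epoch seconds>-<field>-<field>, as seen in
    the failed command; only the first field is interpreted.
    """
    base = name.rstrip("/").rsplit("/", 1)[-1]  # accept a full path or a bare name
    prefix = "ansible-tmp-"
    if not base.startswith(prefix):
        raise ValueError(f"not an ansible-tmp directory name: {name!r}")
    epoch = float(base[len(prefix):].split("-")[0])
    # Drop the fractional part: the log timestamps have one-second resolution.
    return datetime.fromtimestamp(int(epoch), tz=timezone.utc)


created = tmp_dir_created_at(
    "/tmp/ceph_ansible_tmp/ansible-tmp-1592909923.3014195-164729-113154489071080"
)
print(created.isoformat())  # 2020-06-23T10:58:43+00:00
```

The decoded time matches the 10:58:43 UTC timestamp of the "generate ceph.conf configuration file" task in the log, which ties the failed directory to that task.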

It seems that the directory /tmp/ceph_ansible_tmp may have been created earlier by a different user or with the wrong permissions.
On controller-0, to which Ansible tried to copy the task, a warning about it is visible.

controller-0/var/log/messages.txt.gz:
> Jun 23 10:57:56 controller-0 ansible-lineinfile[48655]: Invoked with path=/etc/tmpfiles.d/ceph-common.conf line=d /run/ceph 0770 root root - owner=root group=root mode=420 state=present create=True backrefs=False backup=False firstmatch=False follow=False regexp=None insertafter=None insertbefore=None validate=None seuser=None serole=None selevel=None setype=None attributes=None src=None force=None content=NOT_LOGGING_PARAMETER remote_src=None delimiter=None directory_mode=None unsafe_writes=None
> Jun 23 10:57:56 controller-0 ansible-lineinfile[48655]: [WARNING] Module remote_tmp /tmp/ceph_ansible_tmp did not exist and was created with a mode of 0700, this may cause issues when running as another user. To avoid this, create the remote_tmp dir with the correct permissions manually
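The failed command runs mkdir inside /tmp/ceph_ansible_tmp, which only succeeds when the connecting user has write and search permission on that directory. The following simplified model of the POSIX permission check (it ignores ACLs and SELinux) illustrates the suspected failure, under the assumption that the directory was created by root with the 0700 mode from the warning and later used by a hypothetical unprivileged user with uid 1000:

```python
from __future__ import annotations

import stat


def can_create_entry(dir_mode: int, dir_uid: int, dir_gid: int,
                     uid: int, gids: set[int]) -> bool:
    """Simplified POSIX check: may this user create an entry (e.g. mkdir) in the directory?

    Creating an entry needs write and search (execute) permission on the
    directory. Only the first matching class (owner, then group, then other)
    is consulted. ACLs, SELinux and capabilities other than root's are ignored.
    """
    if uid == 0:
        return True  # root bypasses discretionary access checks
    if uid == dir_uid:
        needed = stat.S_IWUSR | stat.S_IXUSR
    elif dir_gid in gids:
        needed = stat.S_IWGRP | stat.S_IXGRP
    else:
        needed = stat.S_IWOTH | stat.S_IXOTH
    return (dir_mode & needed) == needed


# Hypothetical case: directory created by root (uid 0) with mode 0700,
# then used by an unprivileged user with uid/gid 1000.
print(can_create_entry(0o700, 0, 0, uid=1000, gids={1000}))   # False
# Same directory with /tmp-style permissions (1777).
print(can_create_entry(0o1777, 0, 0, uid=1000, gids={1000}))  # True
```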

--- Additional comment from fpantano on 2020-06-23 17:09:29 CEST ---

Full ansible.log of the first failure

--- Additional comment from John Fulton on 2020-06-23 21:58:09 CEST ---

(In reply to Pavel Sedlák from comment #0)
> Jun 23 10:57:56 controller-0 ansible-lineinfile[48655]: [WARNING] Module remote_tmp /tmp/ceph_ansible_tmp did not exist and was created with a mode of 0700, this may cause issues when running as another user. To avoid this, create the remote_tmp dir with the correct permissions manually

- This was the first build with ansible 2.9.10 instead of 2.9.9 (it was changed last night)
- 2.9.10 includes [1], i.e. this change https://github.com/ansible/ansible/issues/68218
- ceph-ansible and tripleo-ansible have not changed
- an environment with 2.9.9 doesn't reproduce this problem

[1] https://github.com/ansible/ansible/commit/60275fd9b4db4362f435a68590264963f5a494c8#diff-4b131dc1948ab542ad6aa59bc509cb52R88
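When checking whether a host has the 2.9.10 change, compare versions numerically rather than as strings: as plain strings, "2.9.10" sorts before "2.9.9". A minimal sketch, assuming plain X.Y.Z version strings without pre-release suffixes; the helper names are hypothetical, and the check only encodes the 2.9 boundary described above (it assumes later releases keep the change; whether other release series carry it is not established in this thread):

```python
from __future__ import annotations


def parse_version(version: str) -> tuple[int, ...]:
    """Turn a plain dotted version such as '2.9.10' into (2, 9, 10)."""
    return tuple(int(part) for part in version.split("."))


def includes_remote_tmp_change(ansible_version: str) -> bool:
    """True for 2.9.10 or later (hypothetical helper; assumes later releases keep the change)."""
    return parse_version(ansible_version) >= (2, 9, 10)


print("2.9.10" > "2.9.9")                    # False: string comparison is lexicographic
print(includes_remote_tmp_change("2.9.9"))   # False
print(includes_remote_tmp_change("2.9.10"))  # True
```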

--- Additional comment from John Fulton on 2020-06-23 22:52:09 CEST ---

As per this warning:

Jun 23 10:57:56 controller-0 ansible-lineinfile[48655]: [WARNING] Module remote_tmp /tmp/ceph_ansible_tmp did not exist and was created with a mode of 0700, this may cause issues when running as another user. To avoid this, create the remote_tmp dir with the correct permissions manually

Ansible is suggesting I "create the remote_tmp dir with the correct permissions".

This is because of a change introduced in Ansible 2.9.10:

 https://github.com/ansible/ansible/issues/68218

My plan is to patch tripleo-ansible to do what they recommend. In other words:

Have tripleo-ansible create the remote_tmp dir with the correct permissions before it runs ceph-ansible (which uses that remote_tmp dir).
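A minimal sketch of that idea in Python, meant to run on the target node before ceph-ansible. The mode is an assumption: 1777 (world-writable with the sticky bit, like /tmp itself) lets any user create their own ansible-tmp-* subdirectory while preventing users from deleting each other's; the actual tripleo-ansible change may use different permissions. The helper name is hypothetical. os.chmod is applied explicitly because the mode passed to os.makedirs is reduced by the process umask.

```python
import os
import stat
import tempfile


def prepare_remote_tmp(path: str, mode: int = 0o1777) -> int:
    """Create path if needed and force its permission bits to mode (hypothetical helper).

    Returns the resulting permission bits, including the sticky bit.
    """
    # The mode argument of os.makedirs would be reduced by the umask,
    # so the permissions are set explicitly with os.chmod below.
    os.makedirs(path, exist_ok=True)
    if not stat.S_ISDIR(os.lstat(path).st_mode):
        # Refuse to chmod through a symlink in a shared, world-writable /tmp.
        raise RuntimeError(f"{path} exists but is not a plain directory")
    os.chmod(path, mode)
    return stat.S_IMODE(os.lstat(path).st_mode)


if __name__ == "__main__":
    # Demo in a scratch directory; on a real node the path would be
    # /tmp/ceph_ansible_tmp, prepared as root before ceph-ansible runs.
    with tempfile.TemporaryDirectory() as scratch:
        print(oct(prepare_remote_tmp(os.path.join(scratch, "ceph_ansible_tmp"))))  # 0o1777
```

Because the directory then already exists, the mkdir -p in Ansible's own command is a no-op, and whether each user can create an ansible-tmp-* subdirectory depends on the 1777 bits rather than on the 0700 mode the first task happened to use.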

Comment 3 John Fulton 2020-06-25 12:53:50 UTC
In OSP15, ceph-ansible is called from tripleo-heat-templates, not from tripleo-ansible, so we can't simply backport the patch from bug 1850059. We probably have to take the code from that patch, adapt it to the tripleo-heat-templates (THT) context, and submit a new change directly to Stein only.

Comment 7 John Fulton 2020-06-25 19:27:08 UTC
As per the OSP15 docs, customers should be enabling the ansible-2.8-for-rhel-8-x86_64-rpms repository:

 https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/15/html/release_notes/chap-introduction

Comment 9 John Fulton 2020-06-26 12:40:57 UTC
*** Bug 1851190 has been marked as a duplicate of this bug. ***

