Bug 1644702

Summary: [UPGRADES][14] Converge failed during 'ceph-config : generate ceph.conf configuration file'
Product: Red Hat OpenStack
Reporter: Yurii Prokulevych <yprokule>
Component: python-tripleoclient
Assignee: Jiri Stransky <jstransk>
Status: CLOSED ERRATA
QA Contact: Yurii Prokulevych <yprokule>
Severity: urgent
Docs Contact:
Priority: urgent
Version: 14.0 (Rocky)
CC: augol, ccamacho, hbrock, jslagle, jstransk, mbracho, mburns, sgolovat
Target Milestone: beta
Keywords: Triaged
Target Release: 14.0 (Rocky)
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: python-tripleoclient-10.6.1-0.20181010222405.8c8f259.el7ost
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2019-01-11 11:54:26 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Yurii Prokulevych 2018-10-31 12:41:58 UTC
Description of problem:
-----------------------
Overcloud upgrade converge failed while running configuration for ceph:

openstack overcloud  upgrade converge \
    --templates /usr/share/openstack-tripleo-heat-templates \
    --stack qe-Cloud-0 \
    -e /home/stack/virt/internal.yaml \
    -e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation-v6.yaml \
    -e /home/stack/virt/network/network-environment-v6.yaml \
    -e /home/stack/virt/enable-tls.yaml \
    -e /home/stack/virt/inject-trust-anchor.yaml \
    -e /home/stack/virt/public_vip.yaml \
    -e /usr/share/openstack-tripleo-heat-templates/environments/ssl/tls-endpoints-public-ip.yaml \
    -e /home/stack/virt/hostnames.yml \
    -e /usr/share/openstack-tripleo-heat-templates/environments/ceph-ansible/ceph-ansible.yaml \
    -e /home/stack/virt/debug.yaml \
    -e /home/stack/virt/nodes_data.yaml \
    -e /home/stack/cli_opts_params.yaml \
    -e /home/stack/containers-prepare-parameter.yaml \
    --roles-file /usr/share/openstack-tripleo-heat-templates/roles_data.yaml 2>&1
...
        "TASK [ceph-config : generate ceph.conf configuration file] *********************", 
        "task path: /usr/share/ceph-ansible/roles/ceph-config/tasks/main.yml:84", 
        "Wednesday 31 October 2018  07:15:35 -0400 (0:00:00.306)       0:00:53.352 ***** ", 
        "fatal: [controller-2]: UNREACHABLE! => {\"changed\": false, \"msg\": \"Authentication or permission failure. In some cases, you may have been able to authenticate and did not have permissions on the target directory. Consider changing the remote tmp path in ansible.cfg to a path rooted in \\\"/tmp\\\". Failed command was: ( umask 77 && mkdir -p \\\"` echo /tmp/ceph_ansible_tmp/ansible-tmp-1540984535.5-252804450955403 `\\\" && echo ansible-tmp-1540984535.5-252804450955403=\\\"` echo /tmp/ceph_ansible_tmp/ansible-tmp-1540984535.5-252804450955403 `\\\" ), exited with result 1\", \"unreachable\": true}", 
        "PLAY RECAP *********************************************************************", 
        "ceph-0                     : ok=2    changed=0    unreachable=0    failed=0   ", 
        "ceph-1                     : ok=2    changed=0    unreachable=0    failed=0   ", 
        "ceph-2                     : ok=2    changed=0    unreachable=0    failed=0   ", 
        "compute-0                  : ok=2    changed=0    unreachable=0    failed=0   ", 
        "compute-1                  : ok=2    changed=0    unreachable=0    failed=0   ", 
        "controller-0               : ok=2    changed=0    unreachable=0    failed=0   ", 
        "controller-1               : ok=2    changed=0    unreachable=0    failed=0   ", 
        "controller-2               : ok=44   changed=4    unreachable=1    failed=0   ", 

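When triaging output like the PLAY RECAP above, it can help to pull out only the hosts that report a non-zero unreachable or failed count. The following is a minimal sketch, not part of the original report; the function name problem_hosts and the regular expression are illustrative assumptions based on the recap format shown in the log.

```python
import re

# Matches one PLAY RECAP entry such as:
#   controller-2 : ok=44 changed=4 unreachable=1 failed=0
# The pattern is an assumption derived from the recap lines in this report.
RECAP_LINE = re.compile(
    r"(?P<host>[\w.-]+)\s*:\s*ok=(?P<ok>\d+)\s+changed=(?P<changed>\d+)\s+"
    r"unreachable=(?P<unreachable>\d+)\s+failed=(?P<failed>\d+)"
)


def problem_hosts(recap_text):
    """Return hosts whose recap entry has unreachable > 0 or failed > 0."""
    hosts = []
    for match in RECAP_LINE.finditer(recap_text):
        if int(match.group("unreachable")) or int(match.group("failed")):
            hosts.append(match.group("host"))
    return hosts
```

Applied to the recap in this report, such a filter would single out controller-2, the only host with unreachable=1.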

Version-Release number of selected component (if applicable):
-------------------------------------------------------------
openstack-tripleo-heat-templates-9.0.1-0.20181013060864.ffbe879.el7ost.noarch
ceph-ansible-3.1.5-1.el7cp.noarch

Steps to Reproduce:
-------------------
1. Upgrade the undercloud to 14.
2. Follow the upgrade procedure for the overcloud.
3. Run the `openstack overcloud upgrade converge` command at the end of the upgrade.

Actual results:
---------------
Overcloud upgrade converge failed

Expected results:
-----------------
Overcloud upgrade converge succeeds

Additional info:
----------------
Virtual environment: 3 controllers + 2 computes + 3 Ceph nodes

Comment 2 John Fulton 2018-10-31 15:03:23 UTC
The remote_user is set to tripleo-admin: http://ix.io/1qxa
The remote_temp_dir is owned by heat-admin: http://ix.io/1qx6
As a result, tripleo-admin cannot write to the remote_temp_dir.

The bug is that the Ansible runs should be executed as tripleo-admin, not as heat-admin.

If the upgrade run is happening as heat-admin, then hopefully it can be switched to tripleo-admin before the ceph-ansible step runs on converge.
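The ownership mismatch described in this comment can be checked directly on an affected node. The following is a minimal sketch, not the fix that shipped in the erratum; the helper names owner_name and writable_by are hypothetical, and the check is deliberately rough: it looks only at the owner and "other" permission bits and ignores group membership, ACLs, SELinux, and root privileges.

```python
import os
import pwd
import stat


def owner_name(path):
    """Return the login name of the owner of path, or the uid as a string."""
    uid = os.stat(path).st_uid
    try:
        return pwd.getpwuid(uid).pw_name
    except KeyError:
        # No passwd entry for this uid; fall back to the numeric id.
        return str(uid)


def writable_by(path, user):
    """Rough check of whether `user` could create entries in directory `path`.

    Only the owner and "other" permission bits are considered.
    """
    mode = os.stat(path).st_mode
    if owner_name(path) == user:
        return bool(mode & stat.S_IWUSR) and bool(mode & stat.S_IXUSR)
    return bool(mode & stat.S_IWOTH) and bool(mode & stat.S_IXOTH)


if __name__ == "__main__":
    # Path and user names are taken from the log and comment in this report.
    tmp_dir = "/tmp/ceph_ansible_tmp"
    remote_user = "tripleo-admin"
    if os.path.isdir(tmp_dir):
        print("owner:", owner_name(tmp_dir))
        print("writable by %s:" % remote_user, writable_by(tmp_dir, remote_user))
    else:
        print("%s does not exist on this host" % tmp_dir)
```

On a node showing this failure, one would expect the owner to be reported as heat-admin and the directory not to be writable by tripleo-admin, matching the "Authentication or permission failure" message from the mkdir command.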

Comment 9 errata-xmlrpc 2019-01-11 11:54:26 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2019:0045