Bug 1593345

Summary: Undercloud is not reachable by mistral: "Authentication or permission failure"
Product: Red Hat OpenStack Reporter: Filip Hubík <fhubik>
Component: openstack-tripleo-common Assignee: Adriano Petrich <apetrich>
Status: CLOSED ERRATA QA Contact: Marius Cornea <mcornea>
Severity: high Docs Contact:
Priority: high    
Version: 14.0 (Rocky) CC: apetrich, aschultz, dsavinea, jslagle, mburns, psedlak, sclewis, slinaber, tvignaud
Target Milestone: Upstream M3 Keywords: Automation, AutomationBlocker, Triaged
Target Release: 14.0 (Rocky)   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: openstack-tripleo-common-9.1.1-0.20180623003933.5191b65.el7ost Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1596260 (view as bug list) Environment:
Last Closed: 2019-01-11 11:50:06 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1596260    
Attachments:
Description Flags
ansible.log for mistral step failed on UC none

Description Filip Hubík 2018-06-20 15:40:28 UTC
Description of problem:

2018-06-20 14:43:47Z [overcloud.AllNodesDeploySteps.ComputePostConfig]: CREATE_COMPLETE  state changed
2018-06-20 14:43:47Z [overcloud.AllNodesDeploySteps]: CREATE_COMPLETE  Stack CREATE completed successfully
2018-06-20 14:43:48Z [overcloud.AllNodesDeploySteps]: CREATE_COMPLETE  state changed
2018-06-20 14:43:48Z [overcloud]: CREATE_COMPLETE  Stack CREATE completed successfully

 Stack overcloud/daa56963-8ba4-49b9-8242-d6b5c74f2dc4 CREATE_COMPLETE 

Deploying overcloud configuration
Enabling ssh admin (tripleo-admin) for hosts:
192.168.24.18 192.168.24.11 192.168.24.15
Using ssh user heat-admin for initial connection.
Using ssh key at /home/stack/.ssh/id_rsa for initial connection.
Inserting TripleO short term key for 192.168.24.18
Inserting TripleO short term key for 192.168.24.11
Inserting TripleO short term key for 192.168.24.15
Starting ssh admin enablement workflow
ssh admin enablement workflow - RUNNING.
ssh admin enablement workflow - RUNNING.
ssh admin enablement workflow - RUNNING.
ssh admin enablement workflow - COMPLETE.
Removing TripleO short term key from 192.168.24.18
Removing TripleO short term key from 192.168.24.11
Removing TripleO short term key from 192.168.24.15
Removing short term keys locally
Enabling ssh admin - COMPLETE.
Config downloaded at /var/lib/mistral/9c5ad74e-3c88-4367-8502-f9f22fb86a49
Inventory generated at /var/lib/mistral/9c5ad74e-3c88-4367-8502-f9f22fb86a49/tripleo-ansible-inventory.yaml
Running ansible playbook at /var/lib/mistral/9c5ad74e-3c88-4367-8502-f9f22fb86a49/deploy_steps_playbook.yaml. See log file at /var/lib/mistral/9c5ad74e-3c88-4367-8502-f9f22fb86a49/ansible.log for progress. ...
Overcloud configuration failed.

Using /var/lib/mistral/9c5ad74e-3c88-4367-8502-f9f22fb86a49/ansible.cfg as config file

PLAY [Gather facts from undercloud] ********************************************

TASK [Gathering Facts] *********************************************************
fatal: [undercloud]: UNREACHABLE! => {"changed": false, "msg": "Authentication or permission failure. In some cases, you may have been able to authenticate and did not have permissions on the target directory. Consider changing the remote tmp path in ansible.cfg to a path rooted in \"/tmp\". Failed command was: ( umask 77 && mkdir -p \"` echo /home/mistral/.ansible/tmp/ansible-tmp-1529505902.44-148383122247259 `\" && echo ansible-tmp-1529505902.44-148383122247259=\"` echo /home/mistral/.ansible/tmp/ansible-tmp-1529505902.44-148383122247259 `\" ), exited with result 1", "unreachable": true}

PLAY RECAP *********************************************************************
undercloud                 : ok=0    changed=0    unreachable=1    failed=0


Version-Release number of selected component (if applicable):
OSPd14

How reproducible:
always

Steps to Reproduce:
1. Deploy any OSPd14 topology using InfraRed and puddle 2018-06-19.4

Actual results:
Overcloud deploy stage fails with mentioned error

Additional info:
overcloud stack is created successfully, post-deployment mistral step fails

Comment 1 Filip Hubík 2018-06-20 15:42:52 UTC
Created attachment 1453236 [details]
ansible.log for mistral step failed on UC

/var/lib/mistral/XYZ/ansible.log

Comment 3 Alex Schultz 2018-06-20 20:34:35 UTC
I was able to recreate this. It seems to happen only when mistral runs the config download items. When I manually ran the ansible playbook script in /var/lib/mistral/<uuid>/ after the fact as root, it ran fine.

James, have you seen this one before?

Comment 4 Pavel Sedlák 2018-06-21 07:47:54 UTC
The mistral user, which I believe /var/lib/mistral/<uuid>/ansible-playbook-command.sh is executed as, does not have a home directory created.

In passwd there is mistral:x:988:985:Mistral Daemons:/home/mistral:/sbin/nologin
but /home/mistral does not exist.

(That is also why running as root works: root's home exists, so the ansible tmp path exists or can be created.)


mkdir /home/mistral; chown mistral:mistral /home/mistral
lets the playbook get past the undercloud fact-gathering step.
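
A quick way to spot the same condition on another host is to compare the home field of the passwd entry against the filesystem. The following is only a minimal sketch of that check, not anything shipped with mistral or tripleo-common: the helper names `passwd_home` and `home_is_missing` are made up for illustration, and the directory test is injectable so the logic can be tried without touching a real undercloud.

```python
import os


def passwd_home(passwd_line):
    """Return (user, home) parsed from one /etc/passwd-style line."""
    fields = passwd_line.strip().split(":")
    if len(fields) != 7:
        raise ValueError("expected 7 colon-separated fields")
    # Field 0 is the login name, field 5 is the home directory.
    return fields[0], fields[5]


def home_is_missing(passwd_line, isdir=os.path.isdir):
    """True when the home directory named in the entry is not a directory.

    `isdir` is a hypothetical injection point; by default it checks the
    local filesystem.
    """
    _, home = passwd_home(passwd_line)
    return not isdir(home)


# The entry quoted in this comment.
entry = "mistral:x:988:985:Mistral Daemons:/home/mistral:/sbin/nologin"
print(passwd_home(entry))
print(home_is_missing(entry))
```

On an affected undercloud the second call would be expected to report the home as missing until the mkdir/chown workaround above has been applied.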

Comment 5 Alex Schultz 2018-06-21 16:09:22 UTC
That would seem to be a packaging issue with mistral, though we haven't seen this issue upstream which makes me wonder why we only hit this downstream.

Comment 7 Alex Schultz 2018-06-21 16:42:01 UTC
It seems that we're using ansible-2.5.4 downstream, while upstream we use 2.5.2. The connection it's failing on is supposed to be localhost, so it's not supposed to be using ssh. It's likely that there's an issue in our ansible.cfg around this.

Comment 8 Alex Schultz 2018-06-21 17:15:03 UTC
I think I've tracked this down to likely https://github.com/ansible/ansible/commit/864fd7c53e45703554bb6de608fe13a2200b6aa0

It appears that the local connection temp pathing has changed in ansible-2.5.4. Trying to figure out how we can work around this without setting remote_tmp, because that would have other impacts.

Comment 9 Alex Schultz 2018-06-21 18:27:36 UTC
Raised the issue with ansible. Current workaround is to downgrade to ansible 2.5.2. I've confirmed it is an issue with 2.5.4+ but should work in 2.4.2
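
For anyone scripting a pre-deployment check, the version boundary mentioned here can be expressed as a simple comparison. This is a rough sketch only: the function names are hypothetical, the lower bound of 2.5.4 comes from this comment, and treating every later version as affected is an assumption, since the thread does not say which release (if any) restores the old behaviour.

```python
import re

# Lower bound taken from this comment ("an issue with 2.5.4+").
# Assumption: no upper bound is known, so every later version counts as affected.
AFFECTED_FROM = (2, 5, 4)


def parse_version(text):
    """Parse the leading 'X.Y.Z' of a version string into a tuple of ints."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", text.strip())
    if match is None:
        raise ValueError("not an X.Y.Z version: %r" % text)
    return tuple(int(part) for part in match.groups())


def is_affected(version_text):
    """True when the version is at or above the reported boundary."""
    return parse_version(version_text) >= AFFECTED_FROM


for candidate in ("2.4.2", "2.5.2", "2.5.4"):
    print(candidate, is_affected(candidate))
```

The version string itself would have to be obtained separately, for example from the installed package metadata.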

Comment 10 Filip Hubík 2018-06-22 09:56:42 UTC
I can confirm that manually creating the home directory for the mistral user before the overcloud deployment works around this specific issue.

Comment 11 Alex Schultz 2018-06-22 21:00:58 UTC
*** Bug 1594385 has been marked as a duplicate of this bug. ***

Comment 16 Adriano Petrich 2018-11-12 10:14:49 UTC
Done

Comment 18 errata-xmlrpc 2019-01-11 11:50:06 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2019:0045