Description of problem:

018-06-20 14:43:47Z [overcloud.AllNodesDeploySteps.ComputePostConfig]: CREATE_COMPLETE state changed
2018-06-20 14:43:47Z [overcloud.AllNodesDeploySteps]: CREATE_COMPLETE Stack CREATE completed successfully
2018-06-20 14:43:48Z [overcloud.AllNodesDeploySteps]: CREATE_COMPLETE state changed
2018-06-20 14:43:48Z [overcloud]: CREATE_COMPLETE Stack CREATE completed successfully

Stack overcloud/daa56963-8ba4-49b9-8242-d6b5c74f2dc4 CREATE_COMPLETE

Deploying overcloud configuration
Enabling ssh admin (tripleo-admin) for hosts:
192.168.24.18 192.168.24.11 192.168.24.15
Using ssh user heat-admin for initial connection.
Using ssh key at /home/stack/.ssh/id_rsa for initial connection.
Inserting TripleO short term key for 192.168.24.18
Inserting TripleO short term key for 192.168.24.11
Inserting TripleO short term key for 192.168.24.15
Starting ssh admin enablement workflow
ssh admin enablement workflow - RUNNING.
ssh admin enablement workflow - RUNNING.
ssh admin enablement workflow - RUNNING.
ssh admin enablement workflow - COMPLETE.
Removing TripleO short term key from 192.168.24.18
Removing TripleO short term key from 192.168.24.11
Removing TripleO short term key from 192.168.24.15
Removing short term keys locally
Enabling ssh admin - COMPLETE.
Config downloaded at /var/lib/mistral/9c5ad74e-3c88-4367-8502-f9f22fb86a49
Inventory generated at /var/lib/mistral/9c5ad74e-3c88-4367-8502-f9f22fb86a49/tripleo-ansible-inventory.yaml
Running ansible playbook at /var/lib/mistral/9c5ad74e-3c88-4367-8502-f9f22fb86a49/deploy_steps_playbook.yaml. See log file at /var/lib/mistral/9c5ad74e-3c88-4367-8502-f9f22fb86a49/ansible.log for progress.
Overcloud configuration failed.
...
Using /var/lib/mistral/9c5ad74e-3c88-4367-8502-f9f22fb86a49/ansible.cfg as config file

PLAY [Gather facts from undercloud] ********************************************

TASK [Gathering Facts] *********************************************************
fatal: [undercloud]: UNREACHABLE! => {"changed": false, "msg": "Authentication or permission failure. In some cases, you may have been able to authenticate and did not have permissions on the target directory. Consider changing the remote tmp path in ansible.cfg to a path rooted in \"/tmp\". Failed command was: ( umask 77 && mkdir -p \"` echo /home/mistral/.ansible/tmp/ansible-tmp-1529505902.44-148383122247259 `\" && echo ansible-tmp-1529505902.44-148383122247259=\"` echo /home/mistral/.ansible/tmp/ansible-tmp-1529505902.44-148383122247259 `\" ), exited with result 1", "unreachable": true}

PLAY RECAP *********************************************************************
undercloud : ok=0 changed=0 unreachable=1 failed=0

Version-Release number of selected component (if applicable):
OSPd14

How reproducible:
always

Steps to Reproduce:
1. Deploy any OSPd14 topology using InfraRed and puddle 2018-06-19.4

Actual results:
Overcloud deploy stage fails with the error shown above.

Additional info:
The overcloud stack is created successfully; the post-deployment mistral step fails.
Created attachment 1453236: ansible.log for the mistral step that failed on the undercloud (/var/lib/mistral/XYZ/ansible.log)
I was able to recreate this; it seems to happen only when mistral runs the config download items. When I manually ran the ansible playbook script in /var/lib/mistral/<uuid>/ after the fact as root, it ran fine. James, have you seen this one before?
The mistral user, which I believe /var/lib/mistral/<uuid>/ansible-playbook-command.sh is executed as, does not have a home directory created. In passwd there is mistral:x:988:985:Mistral Daemons:/home/mistral:/sbin/nologin but /home/mistral does not exist (that is also why running as root works: root's home exists, so the ansible tmp path can be created). Running mkdir /home/mistral; chown mistral:mistral /home/mistral lets the playbook get past the undercloud fact gathering step.
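For anyone who wants to check an undercloud for this condition before deploying, here is a minimal Python sketch of the diagnosis above: look up the user's passwd entry and see whether the home directory it names actually exists and is owned by that user. The function name home_dir_problem and its getpwnam parameter are made up for illustration and are not part of mistral, TripleO, or ansible.

```python
import os
import pwd


def home_dir_problem(username, getpwnam=pwd.getpwnam):
    """Describe what is wrong with the user's home directory, or return None.

    getpwnam is injectable (hypothetical convenience) so the check can be
    exercised without a real 'mistral' account on the machine.
    """
    try:
        entry = getpwnam(username)
    except KeyError:
        return "no passwd entry for %s" % username

    home = entry.pw_dir
    if not os.path.isdir(home):
        # The situation reported in this bug: passwd names a home
        # directory that was never created.
        return "%s does not exist" % home
    if os.stat(home).st_uid != entry.pw_uid:
        return "%s is not owned by %s" % (home, username)
    return None


if __name__ == "__main__":
    problem = home_dir_problem("mistral")
    print(problem or "home directory looks fine")
```

On an affected undercloud this would be expected to report that /home/mistral does not exist; it only inspects the system and does not create or change anything.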
That would seem to be a packaging issue with mistral, though we haven't seen this issue upstream, which makes me wonder why we only hit it downstream.
It seems that we're using ansible-2.5.4, while upstream we use 2.5.2. The connection it's failing on is supposed to be localhost, so it's not supposed to be using ssh. It's likely that there's an issue in our ansible cfg around this.
I think I've tracked this down; the likely cause is https://github.com/ansible/ansible/commit/864fd7c53e45703554bb6de608fe13a2200b6aa0 It appears that the temp path handling for the local connection changed in ansible-2.5.4. I'm trying to figure out how we can work around this without setting remote_tmp, because that would have other impacts.
Raised the issue with ansible. The current workaround is to downgrade to ansible 2.5.2. I've confirmed it is an issue with 2.5.4+, but it should work in 2.4.2.
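A small Python sketch of the version check implied by the comment above, for scripts that want to warn before deploying. It assumes, based only on this thread, that 2.5.4 is the first affected release; no fixed release is named here, so the check has no upper bound. The names parse_version, FIRST_AFFECTED and is_affected are hypothetical helpers, not ansible APIs.

```python
import re

# Assumption taken from this thread: "an issue with 2.5.4+".
# No fixed version is mentioned, so there is deliberately no upper bound.
FIRST_AFFECTED = (2, 5, 4)


def parse_version(text):
    """Turn a string such as '2.5.4' into a tuple such as (2, 5, 4).

    A missing patch level counts as 0; any trailing suffix is ignored.
    """
    match = re.match(r"\s*(\d+)\.(\d+)(?:\.(\d+))?", text)
    if match is None:
        raise ValueError("not a version string: %r" % text)
    major, minor, patch = match.groups()
    return (int(major), int(minor), int(patch or 0))


def is_affected(version_text):
    """True if the version is at or above the first affected release."""
    return parse_version(version_text) >= FIRST_AFFECTED
```

The version string to pass in could be taken from the installed package, for example the output of rpm -q ansible with the package name prefix stripped.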
I can confirm that manually creating the home directory for the mistral user before the overcloud deployment works around this specific issue.
*** Bug 1594385 has been marked as a duplicate of this bug. ***
Done
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2019:0045