Description of problem: when trying to deploy RHOS14 (3 controller + 2 compute + 3 ceph) overcloud_deploy.sh script stuck. Version-Release number of selected component (if applicable): 2018-08-23.3 How reproducible: 100% Steps to Reproduce: 1. Try to deploy with splitstack Actual results: overcloud_deploy.sh script stuck until timeout Expected results: overcloud deployed successfully Additional info: As I can see heat stack is created successfully: (undercloud) [stack@undercloud-0 ~]$ openstack stack list +--------------------------------------+------------+----------------------------------+-----------------+----------------------+--------------+ | ID | Stack Name | Project | Stack Status | Creation Time | Updated Time | +--------------------------------------+------------+----------------------------------+-----------------+----------------------+--------------+ | e63536e5-db9c-4d15-a9f0-08a75eae9006 | overcloud | 6e12a328faab4861a962b8c02a12cba2 | CREATE_COMPLETE | 2018-08-28T11:49:52Z | None | +--------------------------------------+------------+----------------------------------+-----------------+----------------------+--------------+ However tripleo.deployment.v1.config_download_deploy exists, but not executed. /var/lib/mistral folder is empty on the undercloud. (undercloud) [stack@undercloud-0 ~]$ openstack workflow list | grep tripleo.deployment.v1.config_download_deploy | 39a4e666-adac-4201-be35-2c4b992cadce | tripleo.deployment.v1.config_download_deploy | | 6e12a328faab4861a962b8c02a12cba2 | tripleo-common-managed | timeout=240, queue_name=t... | private | 2018-08-28 11:29:11 | None | (undercloud) [stack@undercloud-0 ~]$ openstack workflow execution list | grep "tripleo.deployment.v1.config_download_deploy" (undercloud) [stack@undercloud-0 ~]$ (undercloud) [stack@undercloud-0 ~]$ cat overcloud_deploy.sh #!/bin/bash . ~/stackrc openstack overcloud deploy \ --timeout 100 \ --templates /usr/share/openstack-tripleo-heat-templates \ --libvirt-type kvm \ --disable-validation \ -r /home/stack/composable_roles/roles/roles_data.yaml \ -e /usr/share/openstack-tripleo-heat-templates/environments/deployed-server-environment.yaml \ -e /usr/share/openstack-tripleo-heat-templates/environments/deployed-server-bootstrap-environment-rhel.yaml \ -e /usr/share/openstack-tripleo-heat-templates/environments/deployed-server-pacemaker-environment.yaml \ -e /home/stack/composable_roles/network-config.yaml \ -e /home/stack/composable_roles/ctrlplane-template.yml \ -e /home/stack/composable_roles/internal.yaml \ -e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml \ -e /home/stack/composable_roles/roles-port-config.yml \ -e /home/stack/composable_roles/network/network-environment.yaml \ -e /home/stack/composable_roles/enable-tls.yaml \ -e /home/stack/composable_roles/inject-trust-anchor.yaml \ -e /home/stack/composable_roles/public_vip.yaml \ -e /usr/share/openstack-tripleo-heat-templates/environments/ssl/tls-endpoints-public-ip.yaml \ -e /usr/share/openstack-tripleo-heat-templates/environments/ceph-ansible/ceph-ansible.yaml \ -e /home/stack/composable_roles/debug.yaml \ -e /home/stack/composable_roles/docker-images.yaml \ --log-file overcloud_deployment_30.log
i believe we debugged this one and found that the enable_ssh_admin workflow had not run or failed. Please verify --overcloud-ssh-user and --overcloud-ssh-key are set correctly. I'd also need to see the full deployment log to debug further.
(In reply to James Slagle from comment #1) > i believe we debugged this one and found that the enable_ssh_admin workflow > had not run or failed. Please verify --overcloud-ssh-user and > --overcloud-ssh-key are set correctly. > > I'd also need to see the full deployment log to debug further. Yes, we've found this issue and we're updating our jobs to pass --overcloud-ssh-user, but the jobs still fail, I'm investigating today the new failures. At this point the failure on the IR side, not the product
please update the BZ if you find something.
the fix for this one will be to disable PasswordAuthenication in the ssh client so that the deployment doesn't appear to hang.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2019:0045
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days