Running a deploy command of the form:

    openstack overcloud deploy ... \
      -e $THT/environments/config-download-environment.yaml \
      --config-download

succeeds in creating the stack, but then immediately fails with:

    ssh: Could not resolve hostname : Name or service not known
    Removing short term keys locally
    Command '['ssh', '-o', 'ConnectionAttempts=6', '-o', 'ConnectTimeout=30', '-o', 'StrictHostKeyChecking=no', '-o', 'UserKnownHostsFile=/dev/null', '-o', 'StrictHostKeyChecking=no', '-i', '/home/stack/.ssh/id_rsa', '-l', 'heat-admin', u'', "echo -e '\nssh-rsa ... TripleO split stack short term key\n' >> $HOME/.ssh/authorized_keys"]' returned non-zero exit status 255
Please provide all templates (if any were modified), all environment files, the full deployment command, and the package versions in use. Undercloud logs and the full deployment log would also be helpful.
Templates and environment files are all available at:
https://github.com/CCI-MOC/rhosp-director-config/tree/new-kaizen

The initial deploy command is:
https://github.com/CCI-MOC/rhosp-director-config/blob/new-kaizen/overcloud-deploy.sh

Which package versions are of interest? Here are some:

    openstack-tripleo-image-elements-8.0.1-1.el7ost.noarch
    puppet-tripleo-8.3.4-5.el7ost.noarch
    openstack-tripleo-common-containers-8.6.3-13.el7ost.noarch
    openstack-tripleo-ui-8.3.2-1.el7ost.noarch
    openstack-tripleo-puppet-elements-8.0.1-1.el7ost.noarch
    openstack-tripleo-common-8.6.3-13.el7ost.noarch
    openstack-tripleo-heat-templates-8.0.4-20.el7ost.noarch
    ansible-tripleo-ipsec-8.1.1-0.20180308133440.8f5369a.el7ost.noarch
    openstack-tripleo-validations-8.4.2-1.el7ost.noarch
    python-tripleoclient-9.2.3-4.el7ost.noarch
https://github.com/CCI-MOC/rhosp-director-config/blob/new-kaizen/templates/deployedserverportmap.yaml doesn't look right. The format of each key is:

    <node_hostname>-<network>

<node_hostname> should be equivalent to what you're also setting in HostnameMap. When not using config-download, it would be discovered as the short hostname by running "hostname -s" on each host.
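The naming rule above can be checked mechanically before deploying. The following is a minimal sketch, not part of TripleO: it assumes both maps have already been loaded into Python dictionaries, that the network of interest is `ctlplane`, and the function name and all hostnames are hypothetical.

```python
def check_port_map_keys(hostname_map, port_map, network="ctlplane"):
    """Compare DeployedServerPortMap keys against HostnameMap values.

    Hypothetical helper for illustration only. Returns two sorted lists:
    keys that HostnameMap implies but DeployedServerPortMap lacks, and
    <something>-<network> keys that match no HostnameMap value.
    """
    suffix = "-" + network
    # Each HostnameMap value should appear as <node_hostname>-<network>.
    expected = {hostname + suffix for hostname in hostname_map.values()}
    missing = sorted(expected - set(port_map))
    # Only flag keys for this network; other entries are left alone.
    unexpected = sorted(
        key for key in port_map
        if key.endswith(suffix) and key not in expected
    )
    return missing, unexpected


if __name__ == "__main__":
    # Hypothetical data: the port map uses an FQDN where the short
    # hostname from HostnameMap is expected.
    hostname_map = {"overcloud-controller-0": "controller-0"}
    port_map = {"controller-0.example.com-ctlplane": {}}
    missing, unexpected = check_port_map_keys(hostname_map, port_map)
    print("missing:", missing)
    print("unexpected:", unexpected)
```

A non-empty `missing` list suggests a node whose port entry will not be found under the name Heat looks up.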
Please try the above suggestion and let us know.
Using an updated hostnamemap:
https://github.com/CCI-MOC/rhosp-director-config/blob/017c69abb310ae1597d904daa0f96e185fce4eba/templates/hostnamemap.yaml

And an updated deployedserverportmap:
https://github.com/CCI-MOC/rhosp-director-config/blob/017c69abb310ae1597d904daa0f96e185fce4eba/templates/deployedserverportmap.yaml

I'm still seeing the deploy fail as soon as the stack create completes:

    Deploying overcloud configuration
    Enabling ssh admin (tripleo-admin) for hosts:
    Using ssh user heat-admin for initial connection.
    Using ssh key at /home/stack/.ssh/id_rsa for initial connection.
    Generating public/private rsa key pair.
    Your identification has been saved in /tmp/tmpNJgqSh/id_rsa.
    Your public key has been saved in /tmp/tmpNJgqSh/id_rsa.pub.
    The key fingerprint is:
    SHA256:QeSLlbl0+GpklXFdVqrm4Kw2bnaBShEwXnZHlJuELG8 TripleO split stack short term key
    The key's randomart image is:
    [...]
    Inserting TripleO short term key for
    ssh: Could not resolve hostname : Name or service not known
    Removing short term keys locally
    Command '['ssh', '-o', 'ConnectionAttempts=6', '-o', 'ConnectTimeout=30', '-o', 'StrictHostKeyChecking=no', '-o', 'UserKnownHostsFile=/dev/null', '-o', 'StrictHostKeyChecking=no', '-i', '/home/stack/.ssh/id_rsa', '-l', 'heat-admin', u'', "echo -e '\nssh-rsa ... TripleO split stack short term key\n' >> $HOME/.ssh/authorized_keys"]' returned non-zero exit status 255
Check https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/13/html-single/director_installation_and_usage/#sect-Configuring_Network_Interfaces_for_the_Control_Plane again. It looks like you're missing the ip_address key under fixed_ips for each of the values in DeployedServerPortMap.

If it fails again with that update, can you provide the output of these commands as well:

    openstack stack output show overcloud RoleNetHostnameMap -f json -c output_value
    openstack stack output show overcloud HostsEntry
    openstack stack environment show overcloud
It looks like it was the missing ip_address key. That looks like a good place for a validation check, since the code expects a list of dictionaries but was given a list of strings and still tried to continue.

The deploy is not yet complete. It now gets as far as:

    Started Mistral Workflow tripleo.deployment.v1.config_download_deploy. Execution ID: ...

But it appears to be stuck. This may be hitting #1644917 now.
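As an illustration of the kind of validation check suggested above, here is a minimal sketch. It is not the check TripleO implements; it only assumes that each DeployedServerPortMap value should carry a fixed_ips list of dictionaries, each with an ip_address key, as described in this thread. The function name and the sample address are hypothetical.

```python
def validate_port_map(port_map):
    """Return a list of human-readable problems found in port_map.

    Hypothetical helper for illustration only. An empty list means
    every entry has fixed_ips as a non-empty list of dictionaries,
    each of which contains an ip_address key.
    """
    errors = []
    for name, port in port_map.items():
        if not isinstance(port, dict):
            errors.append("%s: expected a mapping, got %s"
                          % (name, type(port).__name__))
            continue
        fixed_ips = port.get("fixed_ips")
        if not isinstance(fixed_ips, list) or not fixed_ips:
            errors.append("%s: fixed_ips must be a non-empty list" % name)
            continue
        for index, entry in enumerate(fixed_ips):
            if not isinstance(entry, dict):
                # The failure mode from this thread: a bare string
                # where a dictionary with ip_address was expected.
                errors.append(
                    "%s: fixed_ips[%d] must be a mapping with an "
                    "ip_address key, got %s"
                    % (name, index, type(entry).__name__))
            elif "ip_address" not in entry:
                errors.append("%s: fixed_ips[%d] is missing ip_address"
                              % (name, index))
    return errors


if __name__ == "__main__":
    # Hypothetical entry shaped like the failing input: a list of strings.
    bad = {"controller-0-ctlplane": {"fixed_ips": ["192.168.24.2"]}}
    for problem in validate_port_map(bad):
        print(problem)
```

Failing fast on a non-empty result, before the stack is created, would surface the malformed entry instead of an empty hostname reaching ssh.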
I'm declaring this one resolved. I can't test a config-download deployment immediately, since as I understand it there may be issues around octavia + config download + deployed servers, but https://bugzilla.redhat.com/show_bug.cgi?id=1644920 is fixed, so this would probably proceed as well.
As previously stated, config-download/Ansible requires the short hostnames of the Overcloud nodes to be used in the Heat parameter `HostnameMap`. See:
https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/13/html/director_installation_and_usage/configuring-the-overcloud-with-ansible#enabling-config-download-with-pre-provisioned-nodes