Bug 1644916

Summary: Trying to use config-download in OSP13 fails to run deployment
Product: Red Hat OpenStack
Reporter: Lars Kellogg-Stedman <lars>
Component: python-tripleoclient
Assignee: James Slagle <jslagle>
Status: CLOSED NOTABUG
QA Contact: Gurenko Alex <agurenko>
Severity: medium
Priority: medium
Version: 13.0 (Queens)
CC: emacchi, hbrock, jslagle, lars, lshort, mburns, sbaker
Target Milestone: ---
Keywords: Triaged, ZStream
Last Closed: 2019-06-05 20:43:23 UTC
Type: Bug

Description Lars Kellogg-Stedman 2018-10-31 22:03:55 UTC
Running a deploy command of the form:

    openstack overcloud deploy ... \
        -e $THT/environments/config-download-environment.yaml \
        --config-download

The command succeeds in creating the stack, but then immediately fails with:

    ssh: Could not resolve hostname : Name or service not known
    Removing short term keys locally
    Command '['ssh', '-o', 'ConnectionAttempts=6', '-o', 'ConnectTimeout=30', '-o', 'StrictHostKeyChecking=no', '-o', 'UserKnownHostsFile=/dev/null', '-o', 'StrictHostKeyChecking=no', '-i', '/home/stack/.ssh/id_rsa', '-l', 'heat-admin', u'', "echo -e '\nssh-rsa ... TripleO split stack short term key\n' >> $HOME/.ssh/authorized_keys"]' returned non-zero exit status 255

Comment 1 James Slagle 2018-10-31 23:47:48 UTC
Please provide all templates (if any were modified), all environment files, the full deployment command, and the package versions in use. Undercloud logs and the full deployment log would also be helpful.

Comment 2 Lars Kellogg-Stedman 2018-11-01 13:16:40 UTC
Templates and environment files are all available at:

  https://github.com/CCI-MOC/rhosp-director-config/tree/new-kaizen

The initial deploy command is:

  https://github.com/CCI-MOC/rhosp-director-config/blob/new-kaizen/overcloud-deploy.sh

Which package versions are of interest? Here are some:

openstack-tripleo-image-elements-8.0.1-1.el7ost.noarch
puppet-tripleo-8.3.4-5.el7ost.noarch
openstack-tripleo-common-containers-8.6.3-13.el7ost.noarch
openstack-tripleo-ui-8.3.2-1.el7ost.noarch
openstack-tripleo-puppet-elements-8.0.1-1.el7ost.noarch
openstack-tripleo-common-8.6.3-13.el7ost.noarch
openstack-tripleo-heat-templates-8.0.4-20.el7ost.noarch
ansible-tripleo-ipsec-8.1.1-0.20180308133440.8f5369a.el7ost.noarch
openstack-tripleo-validations-8.4.2-1.el7ost.noarch
python-tripleoclient-9.2.3-4.el7ost.noarch

Comment 3 James Slagle 2018-11-01 13:39:30 UTC
https://github.com/CCI-MOC/rhosp-director-config/blob/new-kaizen/templates/deployedserverportmap.yaml doesn't look right. The format of each key is:

<node_hostname>-<network>

<node_hostname> should be equivalent to what you're also setting in HostnameMap.

When not using config-download, it would be discovered as the short hostname by running "hostname -s" on each host.
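
The naming rule above can be checked before deploying. The following is only a minimal sketch, assuming that the hostnames set in HostnameMap are the map's values and that the network name is "ctlplane"; the function names and all sample data are hypothetical and not taken from the reporter's templates.

```python
def expected_port_keys(hostname_map, network="ctlplane"):
    """Build the <node_hostname>-<network> key for every mapped hostname."""
    return {f"{hostname}-{network}" for hostname in hostname_map.values()}


def missing_port_keys(hostname_map, port_map, network="ctlplane"):
    """Return the expected keys that are absent from the port map, sorted."""
    return sorted(expected_port_keys(hostname_map, network) - set(port_map))


# Hypothetical data for illustration only.
hostname_map = {
    "overcloud-controller-0": "ctrl-0",
    "overcloud-novacompute-0": "compute-0",
}
port_map = {
    "ctrl-0-ctlplane": {"fixed_ips": [{"ip_address": "192.0.2.10"}]},
    # Mismatch: this key uses an FQDN instead of the short hostname.
    "compute-0.example.com-ctlplane": {"fixed_ips": [{"ip_address": "192.0.2.11"}]},
}

print(missing_port_keys(hostname_map, port_map))
```

An empty list would mean every hostname in HostnameMap has a matching DeployedServerPortMap key; here the second entry is reported because its key does not use the short hostname.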

Comment 4 James Slagle 2018-11-01 13:41:07 UTC
Please try the above suggestion and let us know.

Comment 5 Lars Kellogg-Stedman 2018-11-01 15:59:29 UTC
Using an updated hostnamemap:

  https://github.com/CCI-MOC/rhosp-director-config/blob/017c69abb310ae1597d904daa0f96e185fce4eba/templates/hostnamemap.yaml

And an updated deployedserverportmap:

  https://github.com/CCI-MOC/rhosp-director-config/blob/017c69abb310ae1597d904daa0f96e185fce4eba/templates/deployedserverportmap.yaml

I'm still seeing the deploy fail as soon as the stack create completes:

  Deploying overcloud configuration
  Enabling ssh admin (tripleo-admin) for hosts:
  
  Using ssh user heat-admin for initial connection.
  Using ssh key at /home/stack/.ssh/id_rsa for initial connection.
  Generating public/private rsa key pair.
  Your identification has been saved in /tmp/tmpNJgqSh/id_rsa.
  Your public key has been saved in /tmp/tmpNJgqSh/id_rsa.pub.
  The key fingerprint is:
  SHA256:QeSLlbl0+GpklXFdVqrm4Kw2bnaBShEwXnZHlJuELG8 TripleO split stack short term key
  The key's randomart image is:
  [...]
  Inserting TripleO short term key for
  ssh: Could not resolve hostname : Name or service not known
  Removing short term keys locally
  Command '['ssh', '-o', 'ConnectionAttempts=6', '-o', 'ConnectTimeout=30', '-o', 'StrictHostKeyChecking=no', '-o', 'UserKnownHostsFile=/dev/null', '-o', 'StrictHostKeyChecking=no', '-i', '/home/stack/.ssh/id_rsa', '-l', 'heat-admin', u'', "echo -e '\nssh-rsa ... TripleO split stack short term key\n' >> $HOME/.ssh/authorized_keys"]' returned non-zero exit status 255

Comment 6 James Slagle 2018-11-01 16:38:02 UTC
Check https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/13/html-single/director_installation_and_usage/#sect-Configuring_Network_Interfaces_for_the_Control_Plane again. It looks like you're missing the ip_address key under fixed_ips for each of the values in DeployedServerPortMap.

If it fails again with that update, can you provide the output of these commands as well:

openstack stack output show overcloud RoleNetHostnameMap -f json -c output_value

openstack stack output show overcloud HostsEntry

openstack stack environment show overcloud

Comment 7 Lars Kellogg-Stedman 2018-11-01 18:31:02 UTC
It looks like it was the missing ip_address key. That looks like a good place for a validation check, since it wants a list of dictionaries but was getting a list of strings and still trying to continue.
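
As a rough illustration of what such a validation check might look like (this is a hypothetical sketch, not code from tripleoclient or tripleo-validations), the function below flags any fixed_ips entry that is not a dictionary carrying an ip_address key. The function name and the sample entries are assumptions made for the example.

```python
def fixed_ips_errors(port_map):
    """Return one message per port entry whose fixed_ips is malformed."""
    errors = []
    for key, port in port_map.items():
        fixed_ips = port.get("fixed_ips") if isinstance(port, dict) else None
        if not isinstance(fixed_ips, list) or not fixed_ips:
            errors.append(f"{key}: fixed_ips must be a non-empty list")
            continue
        for index, entry in enumerate(fixed_ips):
            # Each entry must be a dict with a non-empty ip_address value.
            if not isinstance(entry, dict) or not entry.get("ip_address"):
                errors.append(f"{key}: fixed_ips[{index}] needs an ip_address key")
    return errors


# Hypothetical entries: a list of strings versus a list of dictionaries.
broken = {"ctrl-0-ctlplane": {"fixed_ips": ["192.0.2.10"]}}
fixed = {"ctrl-0-ctlplane": {"fixed_ips": [{"ip_address": "192.0.2.10"}]}}

print(fixed_ips_errors(broken))
print(fixed_ips_errors(fixed))
```

Failing early on the list-of-strings shape would surface the problem before the deploy reaches the ssh step with an empty hostname.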

The deploy is not yet complete.  It now gets as far as:

  Started Mistral Workflow tripleo.deployment.v1.config_download_deploy. Execution ID: ...

But it appears to be stuck.  This may be hitting #1644917 now.

Comment 8 Lars Kellogg-Stedman 2018-11-06 14:44:54 UTC
I'm declaring this one resolved. I can't test a config-download deployment immediately, since as I understand it there may be issues around octavia + config-download + deployed servers, but https://bugzilla.redhat.com/show_bug.cgi?id=1644920 is fixed, so this would probably proceed as well.

Comment 9 Luke Short 2019-06-05 20:43:23 UTC
As previously stated, config-download/Ansible requires the short hostnames of the Overcloud nodes to be used in the Heat parameter `HostnameMap`.

https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/13/html/director_installation_and_usage/configuring-the-overcloud-with-ansible#enabling-config-download-with-pre-provisioned-nodes