Bug 1730661 - Splitstack deployment fails due to no ssh connection to overcloud hosts
Summary: Splitstack deployment fails due to no ssh connection to overcloud hosts
Keywords:
Status: CLOSED DUPLICATE of bug 1734172
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-heat-templates
Version: 15.0 (Stein)
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Cédric Jeanneret
QA Contact: Sasha Smolyak
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2019-07-17 09:52 UTC by Sasha Smolyak
Modified: 2019-08-01 06:22 UTC
CC List: 3 users

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-08-01 06:22:30 UTC
Target Upstream Version:


Attachments
sosreport (14.84 MB, application/x-xz)
2019-07-17 09:52 UTC, Sasha Smolyak
overcloud install log (12.88 MB, text/plain)
2019-07-17 09:53 UTC, Sasha Smolyak

Description Sasha Smolyak 2019-07-17 09:52:17 UTC
Created attachment 1591370
sosreport

Description of problem:
Overcloud deployment fails on a splitstack setup: the overcloud nodes cannot be reached over SSH. An sosreport is attached.
[stack@undercloud-0 ~]$ cat overcloud_deployment_12.log
2019-07-17 08:04:29.269 105641 WARNING tripleoclient.plugin [  admin] Waiting for messages on queue 'tripleo' with no timeout.
2019-07-17 08:43:26.226 105641 ERROR openstack [  admin] Overcloud configuration failed.

TASK [register machine id] *****************************************************
fatal: [ceph-1]: UNREACHABLE! => {"changed": false, "msg": "Data could not be sent to remote host \"192.168.24.137\". Make sure this host can be reached over ssh: ssh: connect to host 192.168.24.137 port 22: Connection timed out\r\n", "unreachable": true}
fatal: [controller-2]: UNREACHABLE! => {"changed": false, "msg": "Data could not be sent to remote host \"192.168.24.139\". Make sure this host can be reached over ssh: ssh: connect to host 192.168.24.139 port 22: Connection timed out\r\n", "unreachable": true}
fatal: [compute-1]: UNREACHABLE! => {"changed": false, "msg": "Data could not be sent to remote host \"192.168.24.118\". Make sure this host can be reached over ssh: ssh: connect to host 192.168.24.118 port 22: Connection timed out\r\n", "unreachable": true}
fatal: [compute-0]: UNREACHABLE! => {"changed": false, "msg": "Data could not be sent to remote host \"192.168.24.144\". Make sure this host can be reached over ssh: ssh: connect to host 192.168.24.144 port 22: Connection timed out\r\n", "unreachable": true}
fatal: [ceph-0]: UNREACHABLE! => {"changed": false, "msg": "Data could not be sent to remote host \"192.168.24.148\". Make sure this host can be reached over ssh: ssh: connect to host 192.168.24.148 port 22: Connection timed out\r\n", "unreachable": true}
fatal: [controller-1]: UNREACHABLE! => {"changed": false, "msg": "Data could not be sent to remote host \"192.168.24.129\". Make sure this host can be reached over ssh: ssh: connect to host 192.168.24.129 port 22: Connection timed out\r\n", "unreachable": true}
fatal: [controller-0]: UNREACHABLE! => {"changed": false, "msg": "Data could not be sent to remote host \"192.168.24.108\". Make sure this host can be reached over ssh: ssh: connect to host 192.168.24.108 port 22: Connection timed out\r\n", "unreachable": true}
fatal: [ceph-2]: UNREACHABLE! => {"changed": false, "msg": "Data could not be sent to remote host \"192.168.24.149\". Make sure this host can be reached over ssh: ssh: connect to host 192.168.24.149 port 22: Connection timed out\r\n", "unreachable": true}
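To turn the Ansible output above into a plain host/IP list (for example, to check which nodes were affected), a small POSIX shell sketch could look like the following. The function name list_unreachable_hosts is hypothetical, and it assumes the message format shown in the log ("fatal: [<host>]: UNREACHABLE! ... connect to host <ip> port ..."), in either the escaped single-line form written by tripleoclient or one entry per line.

```shell
# Hypothetical helper: read an overcloud deploy log on stdin and print
# "<host> <ip>" for every Ansible "UNREACHABLE!" result found in it.
list_unreachable_hosts() {
  # Each match runs from "fatal: [host]" to the first "connect to host <ip> port"
  # inside the same result; [^}]* keeps a match from spilling into the next
  # host's entry. Several matches per line are printed on separate lines.
  grep -oE 'fatal: \[[^]]+\]: UNREACHABLE![^}]*connect to host [0-9.]+ port' |
    # Keep only the inventory name and the IP address.
    sed -E 's/^fatal: \[([^]]+)\].*connect to host ([0-9.]+) port$/\1 \2/'
}
```

For example: list_unreachable_hosts < overcloud_deployment_12.log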


Version-Release number of selected component (if applicable):
openstack-tripleo-heat-templates-10.6.1-0.20190713150434.2871ce0.el8ost.noarch
puppet-tripleo-10.4.2-0.20190701160408.ecbec17.el8ost.noarch
RHOS_TRUNK-15.0-RHEL-8-20190714.n.0

How reproducible:
100%

Steps to Reproduce:
1. Deploy 3 controllers, 2 computes and 3 Ceph nodes with splitstack.

Actual results:
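Deployment fails: all overcloud nodes are UNREACHABLE over SSH (connection to port 22 timed out) at TASK [register machine id].

Expected results:
Deployment succeeds.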

Additional info:

Comment 1 Sasha Smolyak 2019-07-17 09:53:35 UTC
Created attachment 1591371
overcloud install log

Comment 2 Cédric Jeanneret 2019-07-17 10:15:30 UTC
So, apparently SSH access isn't opened as expected.

We can see a resource named "003 accept ssh from all ipv4" (and the same for IPv6), but that one actually *removes* SSH access (ensure => absent).

What apparently is missing from the overcloud install log are the resources named "003 accept ssh from ctlplane subnet ...", which allow SSH access only from the ctlplane.

So this is either related to https://bugs.launchpad.net/tripleo/+bug/1836696 or it's a different issue. I tend to think it's a different one, but we would need to see the iptables rules on the unreachable hosts. This can be done as follows:

- deploy the undercloud
- introspect nodes
- run the deploy command adding the "--stack-only" parameter
- run tripleo-config-download --output-dir oc-config-download
- run tripleo-ansible-inventory --ansible_ssh_user heat-admin --static-yaml-inventory oc-inventory.yaml
- fetch the servers' IPs with "openstack server list", and connect to one of them (compute, controller, whatever)
- run ansible-playbook \
  -i oc-inventory.yaml \
  --private-key /home/stack/.ssh/id_rsa \
  --become \
  oc-config-download/deploy_steps_playbook.yaml
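The steps above, from the deploy command onwards, can be chained in a small shell function. This is a minimal sketch, not a TripleO tool: the name debug_deploy is hypothetical, and it assumes you pass it the same arguments (--templates, environment files and so on) as your usual "openstack overcloud deploy" command, which this report does not show. Logging in to a node still happens by hand, from a second terminal, while the playbook runs.

```shell
# Hypothetical helper for the "stack-only + plain Ansible" debugging flow.
# Assumption: "$@" carries your usual "openstack overcloud deploy" arguments.
debug_deploy() {
  # Create/update the overcloud stack only, without running config-download.
  openstack overcloud deploy "$@" --stack-only || return
  # Export the generated Ansible playbooks and per-node configuration.
  tripleo-config-download --output-dir oc-config-download || return
  # Build a static inventory that connects to the nodes as heat-admin.
  tripleo-ansible-inventory --ansible_ssh_user heat-admin \
    --static-yaml-inventory oc-inventory.yaml || return
  # Print the node IPs so you can ssh to one of them from another terminal.
  openstack server list || return
  # Run the overcloud deployment steps as plain Ansible.
  ansible-playbook \
    -i oc-inventory.yaml \
    --private-key /home/stack/.ssh/id_rsa \
    --become \
    oc-config-download/deploy_steps_playbook.yaml
}
```

For example: debug_deploy --templates -e <your environment files>. Each step stops the function on failure, so a broken stack-only run does not go on to run the playbook against stale output.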

That last step runs the deployment as plain Ansible. This gives you time to connect to the node, and also lets you retrieve the generated templates/configuration for the overcloud nodes.

Once the failure occurs, since you're already connected to the overcloud node, you can provide the iptables content as follows:
sudo iptables -vnL INPUT
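To check that dump quickly for the missing rule, a minimal sketch could grep it for an ACCEPT rule named "003 accept ssh ...". It assumes the Puppet-managed rules keep their resource name as an iptables comment, which "iptables -L" prints between /* and */; the function name has_ssh_accept_rule is hypothetical. It only checks that such a rule exists, not that no earlier rule drops the traffic.

```shell
# Hypothetical helper: succeed if "iptables -vnL INPUT" output (on stdin)
# contains an ACCEPT rule whose comment starts with "003 accept ssh".
has_ssh_accept_rule() {
  grep -Eq '[[:space:]]ACCEPT[[:space:]].*/\* 003 accept ssh'
}
```

For example: sudo iptables -vnL INPUT | has_ssh_accept_rule && echo "SSH rule present" || echo "SSH rule missing"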

You can enable SSH access with this simple command: sudo iptables -I INPUT -j ACCEPT (note: this accepts ALL incoming connections, not only SSH).
That will allow you to provide an environment we can use for debugging and investigation.
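If opening every port is too broad, a narrower variant would insert a rule at the top of INPUT that accepts only TCP port 22, for both IPv4 and IPv6. This is a sketch using standard iptables/ip6tables options; the function name open_ssh_only is hypothetical, it must run as root, and it does not save the rules.

```shell
# Hypothetical helper: accept only SSH (TCP/22) at the top of the INPUT chain,
# for IPv4 and IPv6. Run as root; the rules are not saved by this helper.
open_ssh_only() {
  iptables -I INPUT 1 -p tcp --dport 22 -j ACCEPT || return
  ip6tables -I INPUT 1 -p tcp --dport 22 -j ACCEPT
}
```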

Care to provide that? Thanks!

Comment 7 Sasha Smolyak 2019-08-01 06:22:30 UTC

*** This bug has been marked as a duplicate of bug 1734172 ***

