Created attachment 1591370 [details]
sosreport

Description of problem:
Overcloud deployment fails on splitstack: no ssh connection to overcloud. Sos report attached.

[stack@undercloud-0 ~]$ cat overcloud_deployment_12.log
2019-07-17 08:04:29.269 105641 WARNING tripleoclient.plugin [ admin] Waiting for messages on queue 'tripleo' with no timeout.
2019-07-17 08:43:26.226 105641 ERROR openstack [ admin] Overcloud configuration failed.
    "TASK [register machine id] *****************************************************",
    "fatal: [ceph-1]: UNREACHABLE! => {\"changed\": false, \"msg\": \"Data could not be sent to remote host \\\"192.168.24.137\\\". Make sure this host can be reached over ssh: ssh: connect to host 192.168.24.137 port 22: Connection timed out\\r\\n\", \"unreachable\": true}",
    "fatal: [controller-2]: UNREACHABLE! => {\"changed\": false, \"msg\": \"Data could not be sent to remote host \\\"192.168.24.139\\\". Make sure this host can be reached over ssh: ssh: connect to host 192.168.24.139 port 22: Connection timed out\\r\\n\", \"unreachable\": true}",
    "fatal: [compute-1]: UNREACHABLE! => {\"changed\": false, \"msg\": \"Data could not be sent to remote host \\\"192.168.24.118\\\". Make sure this host can be reached over ssh: ssh: connect to host 192.168.24.118 port 22: Connection timed out\\r\\n\", \"unreachable\": true}",
    "fatal: [compute-0]: UNREACHABLE! => {\"changed\": false, \"msg\": \"Data could not be sent to remote host \\\"192.168.24.144\\\". Make sure this host can be reached over ssh: ssh: connect to host 192.168.24.144 port 22: Connection timed out\\r\\n\", \"unreachable\": true}",
    "fatal: [ceph-0]: UNREACHABLE! => {\"changed\": false, \"msg\": \"Data could not be sent to remote host \\\"192.168.24.148\\\". Make sure this host can be reached over ssh: ssh: connect to host 192.168.24.148 port 22: Connection timed out\\r\\n\", \"unreachable\": true}",
    "fatal: [controller-1]: UNREACHABLE! => {\"changed\": false, \"msg\": \"Data could not be sent to remote host \\\"192.168.24.129\\\". Make sure this host can be reached over ssh: ssh: connect to host 192.168.24.129 port 22: Connection timed out\\r\\n\", \"unreachable\": true}",
    "fatal: [controller-0]: UNREACHABLE! => {\"changed\": false, \"msg\": \"Data could not be sent to remote host \\\"192.168.24.108\\\". Make sure this host can be reached over ssh: ssh: connect to host 192.168.24.108 port 22: Connection timed out\\r\\n\", \"unreachable\": true}",
    "fatal: [ceph-2]: UNREACHABLE! => {\"changed\": false, \"msg\": \"Data could not be sent to remote host \\\"192.168.24.149\\\". Make sure this host can be reached over ssh: ssh: connect to host 192.168.24.149 port 22: Connection timed out\\r\\n\", \"unreachable\": true}",
    ""

Version-Release number of selected component (if applicable):
openstack-tripleo-heat-templates-10.6.1-0.20190713150434.2871ce0.el8ost.noarch
puppet-tripleo-10.4.2-0.20190701160408.ecbec17.el8ost.noarch
RHOS_TRUNK-15.0-RHEL-8-20190714.n.0

How reproducible:
100%

Steps to Reproduce:
1. Try to deploy 3 controllers, 2 computes, 3 cephs with splitstack
2.
3.

Actual results:
Deployment fails

Expected results:
Deployment passes

Additional info:
Created attachment 1591371 [details] overcloud install log
So, apparently, ssh access isn't opened as expected. We can see a resource named "003 accept ssh from all ipv4" (same for ipv6), but that one actually *removes* the ssh access (ensure => absent). What appears to be missing from the overcloud_install_log are the resources named "003 accept ssh from ctlplane subnet ...", which allow ssh access only from the ctlplane.

So this is either linked to https://bugs.launchpad.net/tripleo/+bug/1836696 or it's something else. I tend to think it's something else, but we would need to see the iptables content on the unreachable hosts. This can be done as follows:

- deploy the undercloud
- introspect nodes
- run the deploy command adding the "--stack-only" parameter
- run tripleo-config-download --output-dir oc-config-download
- run tripleo-ansible-inventory --ansible_ssh_user heat-admin --static-yaml-inventory oc-inventory.yaml
- fetch the servers' IPs with "openstack server list", and connect to one of them (compute, controller, whatever)
- run
    ansible-playbook \
      -i oc-inventory.yaml \
      --private-key /home/stack/.ssh/id_rsa \
      --become \
      oc-config-download/deploy_steps_playbook.yaml

That last step runs the deploy as plain ansible. Doing so gives you time to connect to the node, and also lets you get the generated templates/configurations for the overcloud nodes.

Once you hit the failure, since you're already connected to the overcloud node, you can provide the iptables content as follows:

    sudo iptables -vnL INPUT

You can enable SSH access with this simple command:

    sudo iptables -I INPUT -j ACCEPT

(note: this will enable ALL connections, not only SSH).

That will give us an env we can use for debugging and research. Care to provide that?

Thanks!
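
To pick which nodes to collect the iptables content from, it may help to list the node/IP pairs that ansible reported as UNREACHABLE. Below is a minimal sketch, assuming the log entries look like the ones quoted in the description ("fatal: [<node>]: UNREACHABLE! => {... connect to host <ip> port 22 ...}"). The function name list_unreachable is made up for illustration; it is not part of tripleo or ansible.

```shell
# Hypothetical helper: print "<node> <ip>" for each UNREACHABLE entry in a
# deployment log. Assumes the entry format shown in the bug description.
list_unreachable() {
    # grep -o emits every match on its own line, so this also works when
    # several entries share one physical line of the log.
    grep -oE 'fatal: \[[^]]+\]: UNREACHABLE![^}]*connect to host [0-9.]+ port [0-9]+' "$1" \
        | sed -E 's/^fatal: \[([^]]+)\].* connect to host ([0-9.]+) port [0-9]+$/\1 \2/' \
        | LC_ALL=C sort -u
}
```

It can be run on the undercloud as "list_unreachable overcloud_deployment_12.log"; each output line holds a node name and the address ssh could not reach.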
*** This bug has been marked as a duplicate of bug 1734172 ***