Description of problem:
Overcloud deploy consistently fails on a timeout in a standard install with ironic enabled.

timeout 180m openstack overcloud deploy \
--templates /usr/share/openstack-tripleo-heat-templates \
--libvirt-type kvm \
--ntp-server clock.redhat.com \
--environment-file /usr/share/openstack-tripleo-heat-templates/environments/services-docker/ironic.yaml \
--environment-file /home/stack/oc_ironic.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml \
-e /home/stack/virt/network/network-environment.yaml \
-e /home/stack/virt/enable-tls.yaml \
-e /home/stack/virt/inject-trust-anchor.yaml \
-e /home/stack/virt/public_vip.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/tls-endpoints-public-ip.yaml \
-e /home/stack/virt/hostnames.yml \
-e /usr/share/openstack-tripleo-heat-templates/environments/docker.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/docker-ha.yaml \
-e /home/stack/virt/debug.yaml \
-e /home/stack/virt/nodes_data.yaml \
-e /home/stack/virt/docker-images.yaml \
-e /home/stack/virt/docker-images-ceph.yaml \
--log-file overcloud_deployment_71.log

(undercloud) [stack@undercloud-0 ~]$ openstack stack failures list overcloud --long
overcloud.AllNodesDeploySteps.ControllerDeployment_Step3.0:
  resource_type: OS::Heat::StructuredDeployment
  physical_resource_id: 3ebe28bd-dd71-471f-9c41-61a612921483
  status: CREATE_FAILED
  status_reason: |
    CREATE aborted
  deploy_stdout: |
    None
  deploy_stderr: |
    None

reproducer setup available, this failure happened 4 times so far

Version-Release number of selected component (if applicable):
openstack-tripleo-validations-7.2.1-0.20170818153714.85b7569.el7ost.noarch
openstack-tripleo-ui-7.2.1-0.20170819031831.070a522.el7ost.noarch
python-tripleoclient-7.2.1-0.20170810194758.e6b77c8.el7ost.noarch
openstack-tripleo-common-containers-7.4.1-0.20170818153039.7d74e83.el7ost.noarch
openstack-tripleo-image-elements-7.0.0-0.20170819032246.a7d1e89.el7ost.noarch
openstack-tripleo-common-7.4.1-0.20170818153039.7d74e83.el7ost.noarch
openstack-tripleo-heat-templates-7.0.0-0.20170821194254.el7ost.noarch
puppet-tripleo-7.3.0-0.20170821114704.el7ost.noarch
openstack-tripleo-puppet-elements-7.0.0-0.20170819032135.23884d3.el7ost.noarch
openstack-keystone-12.0.0-0.20170821133709.e2d33c2.el7ost.noarch
openstack-swift-object-2.15.2-0.20170821181730.c54c6b3.el7ost.noarch
openstack-ironic-conductor-9.0.2-0.20170821162300.adff15e.el7ost.noarch
openstack-tempest-16.1.1-0.20170808012534.0fc1454.el7ost.noarch
openstack-zaqar-5.0.0-0.20170821130425.a5338d3.el7ost.noarch
openstack-swift-proxy-2.15.2-0.20170821181730.c54c6b3.el7ost.noarch
openstack-heat-api-cfn-9.0.0-0.20170821132121.22d7142.el7ost.noarch
python-openstacksdk-0.9.17-0.20170821143340.7946243.el7ost.noarch
openstack-ironic-inspector-6.0.1-0.20170821143441.0e72dcb.el7ost.noarch
puppet-openstack_extras-11.3.0-0.20170805114245.dae9508.el7ost.noarch
openstack-puppet-modules-10.0.0-0.20170712001959.0333c73.el7ost.noarch
openstack-swift-account-2.15.2-0.20170821181730.c54c6b3.el7ost.noarch
openstack-neutron-11.0.0-0.20170821141644.3441b3f.el7ost.noarch
openstack-neutron-openvswitch-11.0.0-0.20170821141644.3441b3f.el7ost.noarch
openstack-heat-engine-9.0.0-0.20170821132121.22d7142.el7ost.noarch
openstack-ironic-common-9.0.2-0.20170821162300.adff15e.el7ost.noarch
openstack-mistral-api-5.0.0-0.20170821131022.0dc2ebe.el7ost.noarch
openstack-nova-api-16.0.0-0.20170818142923.716a4c8.el7ost.noarch
openstack-nova-conductor-16.0.0-0.20170818142923.716a4c8.el7ost.noarch
python-openstackclient-lang-3.12.0-0.20170821150739.f67ebce.el7ost.noarch
puppet-openstacklib-11.3.0-0.20170818121725.8832df0.el7ost.noarch
python-openstack-mistral-5.0.0-0.20170821131022.0dc2ebe.el7ost.noarch
openstack-nova-compute-16.0.0-0.20170818142923.716a4c8.el7ost.noarch
openstack-neutron-common-11.0.0-0.20170821141644.3441b3f.el7ost.noarch
openstack-heat-common-9.0.0-0.20170821132121.22d7142.el7ost.noarch
openstack-ironic-api-9.0.2-0.20170821162300.adff15e.el7ost.noarch
openstack-mistral-engine-5.0.0-0.20170821131022.0dc2ebe.el7ost.noarch
openstack-nova-scheduler-16.0.0-0.20170818142923.716a4c8.el7ost.noarch
openstack-nova-common-16.0.0-0.20170818142923.716a4c8.el7ost.noarch
openstack-heat-api-9.0.0-0.20170821132121.22d7142.el7ost.noarch
python-openstackclient-3.12.0-0.20170821150739.f67ebce.el7ost.noarch
openstack-mistral-executor-5.0.0-0.20170821131022.0dc2ebe.el7ost.noarch
openstack-glance-15.0.0-0.20170821194716.1610cda.el7ost.noarch
openstack-swift-container-2.15.2-0.20170821181730.c54c6b3.el7ost.noarch
openstack-neutron-ml2-11.0.0-0.20170821141644.3441b3f.el7ost.noarch
openstack-mistral-common-5.0.0-0.20170821131022.0dc2ebe.el7ost.noarch
openstack-selinux-0.8.9-0.1.el7ost.noarch
openstack-nova-placement-api-16.0.0-0.20170818142923.716a4c8.el7ost.noarch

How reproducible:
always

Steps to Reproduce:
1. see deploy cmd above
2.
3.

Actual results:
fails

Expected results:
create_complete

Additional info:
Please provide templates & sosreport for nodes. This bug report is not sufficient to determine what is happening.
(undercloud) [stack@undercloud-0 ~]$ cat oc_ironic.yaml
parameter_defaults:
    NtpServer: ["clock.redhat.com","clock2.redhat.com"]
    IronicEnabledDrivers:
        - pxe_ipmitool
        - fake
    NovaSchedulerDefaultFilters:
        - RetryFilter
        - AggregateInstanceExtraSpecsFilter
        - AvailabilityZoneFilter
        - RamFilter
        - DiskFilter
        - ComputeFilter
        - ComputeCapabilitiesFilter
        - ImagePropertiesFilter
    IronicCleaningDiskErase: metadata
    IronicIPXEEnabled: true
    GlanceBackend: "file"
    IronicCleaningNetwork: baremetal

The rest are standard THT and IR2 default templates.

sosreports at http://rhos-release.virt.bos.redhat.com/log/bz1488601

//if you ping me during EST hours I'll simply give you access to the setup
I had a quick look at the sosreports. No conclusion yet, but it could be a network issue or a problem bringing up the VIP for the Internal network.

I can see lots of these in controller-0:

var/log/messages:Sep 5 21:21:15 localhost journal: 2017-09-06 01:21:15.726 11 WARNING oslo_db.sqlalchemy.engines [-] SQL connection failed. -2924 attempts left.: DBConnectionError: (pymysql.err.OperationalError) (2003, "Can't connect to MySQL server on '172.17.1.12' ([Errno 113] No route to host)")

That's VLAN 20 for the Internal network. Then I can see this:

$ cat ip_neigh_show
fe80::5054:ff:fe2a:4290 dev br-ex lladdr 52:54:00:2a:42:90 router STALE
192.168.24.1 dev eth0 lladdr 52:54:00:3a:a6:f9 REACHABLE
172.17.1.14 dev vlan20 lladdr 7a:ed:8b:f9:db:11 STALE
10.0.0.1 dev br-ex lladdr 52:54:00:2a:42:90 STALE
172.17.1.12 dev vlan20 FAILED
10.0.0.111 dev br-ex lladdr 52:54:00:05:fe:73 STALE
192.168.24.7 dev eth0 lladdr 52:54:00:7c:7e:57 STALE
172.17.2.16 dev vlan50 lladdr fe:04:a0:8f:e0:ef STALE
172.17.3.14 dev vlan30 lladdr b6:03:82:a2:97:76 STALE
172.17.4.10 dev vlan40 lladdr 2e:87:84:fa:33:94 STALE
172.17.1.21 dev vlan20 lladdr 22:67:ed:c8:7a:58 REACHABLE
172.17.1.15 dev vlan20 lladdr 22:67:ed:c8:7a:58 STALE
172.17.4.19 dev vlan40 lladdr 2e:87:84:fa:33:94 STALE
172.17.3.13 dev vlan30 lladdr e6:42:ad:03:14:23 STALE
172.17.4.13 dev vlan40 lladdr 02:35:56:0c:c8:aa STALE
172.17.3.15 dev vlan30 lladdr b6:03:82:a2:97:76 STALE
10.0.0.101 dev br-ex lladdr 52:54:00:05:fe:73 STALE
172.17.2.15 dev vlan50 lladdr 46:fe:c1:6f:9c:b6 STALE
172.17.3.11 dev vlan30 lladdr ba:04:8e:35:ad:c8 STALE
172.17.1.20 dev vlan20 lladdr fa:85:17:3f:c6:32 REACHABLE
172.17.2.13 dev vlan50 lladdr c6:8d:be:3a:6d:4c STALE
10.0.0.105 dev br-ex lladdr 52:54:00:40:c7:9b STALE

It looks like a networking problem.
In controller-1:

$ cat ip_neigh_show
fe80::5054:ff:fe2a:4290 dev br-ex lladdr 52:54:00:2a:42:90 router STALE
172.17.1.15 dev vlan20 lladdr 22:67:ed:c8:7a:58 STALE
10.0.0.1 dev br-ex lladdr 52:54:00:2a:42:90 REACHABLE
172.17.2.21 dev vlan50 lladdr c6:fd:c8:f4:ee:85 STALE
172.17.4.10 dev vlan40 lladdr 2e:87:84:fa:33:94 STALE
172.17.4.19 dev vlan40 lladdr 2e:87:84:fa:33:94 STALE
172.17.1.21 dev vlan20 lladdr 22:67:ed:c8:7a:58 REACHABLE
172.17.1.12 dev vlan20 INCOMPLETE
172.17.4.11 dev vlan40 lladdr 06:e1:0e:5f:fb:15 STALE
192.168.24.7 dev eth0 lladdr 52:54:00:87:d8:55 STALE
192.168.24.1 dev eth0 lladdr 52:54:00:3a:a6:f9 REACHABLE
172.17.3.22 dev vlan30 lladdr 7a:fd:3c:24:93:06 STALE
10.0.0.104 dev br-ex lladdr 52:54:00:cc:dc:ce STALE
172.17.1.16 dev vlan20 lladdr e6:be:7b:0c:1d:ad REACHABLE

And the same in controller-2.

Also, pcs_status says the VIP is stopped:

$ grep 172.17.1.12 pcs_status
ip-172.17.1.12 (ocf::heartbeat:IPaddr2): Stopped

Then I also see this:

sos_commands/logs/journalctl_--no-pager_--boot:Sep 05 14:28:38 controller-2 puppet-user[34344]: [ALERT] 247/142838 (339) : parsing [/etc/haproxy/haproxy.cfg20170905-12-11vkb1s:132] : 'bind 172.17.1.12:443' : unable to load SSL private key from PEM file '/etc/pki/tls/private/overcloud_endpoint.pem'.

And this:

sos_commands/logs/journalctl_--no-pager_--boot:Sep 05 14:28:38 controller-2 puppet-user[34344]: (/Stage[main]/Haproxy/Haproxy::Instance[haproxy]/Haproxy::Config[haproxy]/Concat[/etc/haproxy/haproxy.cfg]/File[/etc/haproxy/haproxy.cfg]/content) [ALERT] 247/142838 (339) : Proxy 'horizon': no SSL certificate specified for bind '172.17.1.12:443' at [/etc/haproxy/haproxy.cfg20170905-12-11vkb1s:132] (use 'crt').

Not sure if that's a problem, though. Could it be worth trying without SSL and comparing, to try isolating the issue?

I can't find anything in corosync saying why that IP can't be brought up, though.
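For anyone repeating this check on other sosreports, here is a minimal sketch of the same triage in Python. It assumes the ip_neigh_show file contains plain `ip neigh show` output with the state as the last field; the function name and the choice of FAILED and INCOMPLETE as the "unresolved" states are illustrative, not part of any tool mentioned in this bug.

```python
# States in which the kernel has no link-layer address for the neighbour.
UNRESOLVED_STATES = {"FAILED", "INCOMPLETE"}


def unresolved_neighbours(ip_neigh_output):
    """Return (address, device, state) tuples for unresolved neighbour entries."""
    results = []
    for line in ip_neigh_output.splitlines():
        fields = line.split()
        # Expected shape: <address> dev <device> [lladdr <mac>] [router] <STATE>
        if len(fields) < 4 or fields[1] != "dev":
            continue
        state = fields[-1]
        if state in UNRESOLVED_STATES:
            results.append((fields[0], fields[2], state))
    return results


if __name__ == "__main__":
    # A few entries copied from the controller-0 listing in this comment.
    sample = """\
192.168.24.1 dev eth0 lladdr 52:54:00:3a:a6:f9 REACHABLE
172.17.1.14 dev vlan20 lladdr 7a:ed:8b:f9:db:11 STALE
172.17.1.12 dev vlan20 FAILED
172.17.1.21 dev vlan20 lladdr 22:67:ed:c8:7a:58 REACHABLE
"""
    for address, device, state in unresolved_neighbours(sample):
        print(f"{address} on {device}: {state}")
```

Run against the controller-0 and controller-1 listings, this would flag only 172.17.1.12 on vlan20, which matches the VIP that pcs_status reports as Stopped.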
If this is being deployed with SSL we might be hitting https://bugs.launchpad.net/tripleo/+bug/1715132
(In reply to Alex Schultz from comment #4)
> If this is being deployed with SSL we might be hitting
> https://bugs.launchpad.net/tripleo/+bug/1715132

Yes, I am about to start a non-ssl reproducer, will confirm once it's done
Looking in the logs, it does appear to be the upstream bug.

./messages:Sep 5 10:28:54 localhost journal: [ALERT] 247/142854 (339) : Proxy 'panko': no SSL certificate specified for bind '10.0.0.101:13779' at [/etc/haproxy/haproxy.cfg20170905-12-1ksguai:250] (use 'crt').
./messages:Sep 5 10:28:54 localhost journal: [ALERT] 247/142854 (339) : Proxy 'swift_proxy_server': no SSL certificate specified for bind '10.0.0.101:13808' at [/etc/haproxy/haproxy.cfg20170905-12-1ksguai:275] (use 'crt').
./messages:Sep 5 10:34:45 localhost journal: [ALERT] 247/143445 (3273) : Proxy 'aodh': no SSL certificate specified for bind '10.0.0.101:13042' at [/etc/haproxy/haproxy.cfg20170905-9-e6vwjg:26] (use 'crt').
./messages:Sep 5 10:34:45 localhost journal: [ALERT] 247/143445 (3273) : Proxy 'cinder': no SSL certificate specified for bind '10.0.0.101:13776' at [/etc/haproxy/haproxy.cfg20170905-9-e6vwjg:39] (use 'crt').
./messages:Sep 5 10:34:45 localhost journal: [ALERT] 247/143445 (3273) : Proxy 'glance_api': no SSL certificate specified for bind '10.0.0.101:13292' at [/etc/haproxy/haproxy.cfg20170905-9-e6vwjg:52] (use 'crt').
./messages:Sep 5 10:34:45 localhost journal: [ALERT] 247/143445 (3273) : Proxy 'gnocchi': no SSL certificate specified for bind '10.0.0.101:13041' at [/etc/haproxy/haproxy.cfg20170905-9-e6vwjg:65] (use 'crt').
./messages:Sep 5 10:34:45 localhost journal: [ALERT] 247/143445 (3273) : Proxy 'heat_api': no SSL certificate specified for bind '10.0.0.101:13004' at [/etc/haproxy/haproxy.cfg20170905-9-e6vwjg:85] (use 'crt').
./messages:Sep 5 10:34:45 localhost journal: [ALERT] 247/143445 (3273) : Proxy 'heat_cfn': no SSL certificate specified for bind '10.0.0.101:13005' at [/etc/haproxy/haproxy.cfg20170905-9-e6vwjg:100] (use 'crt').
./messages:Sep 5 10:34:45 localhost journal: [ALERT] 247/143445 (3273) : Proxy 'heat_cloudwatch': no SSL certificate specified for bind '10.0.0.101:13003' at [/etc/haproxy/haproxy.cfg20170905-9-e6vwjg:115] (use 'crt').
./messages:Sep 5 10:34:45 localhost journal: [ALERT] 247/143445 (3273) : Proxy 'horizon': no SSL certificate specified for bind '10.0.0.101:443' at [/etc/haproxy/haproxy.cfg20170905-9-e6vwjg:130] (use 'crt').
./messages:Sep 5 10:34:45 localhost journal: [ALERT] 247/143445 (3273) : Proxy 'horizon': no SSL certificate specified for bind '172.17.1.12:443' at [/etc/haproxy/haproxy.cfg20170905-9-e6vwjg:132] (use 'crt').
./messages:Sep 5 10:34:45 localhost journal: [ALERT] 247/143445 (3273) : Proxy 'ironic': no SSL certificate specified for bind '10.0.0.101:13385' at [/etc/haproxy/haproxy.cfg20170905-9-e6vwjg:147] (use 'crt').
./messages:Sep 5 10:34:45 localhost journal: [ALERT] 247/143445 (3273) : Proxy 'keystone_public': no SSL certificate specified for bind '10.0.0.101:13000' at [/etc/haproxy/haproxy.cfg20170905-9-e6vwjg:167] (use 'crt').
./messages:Sep 5 10:34:45 localhost journal: [ALERT] 247/143445 (3273) : Proxy 'neutron': no SSL certificate specified for bind '10.0.0.101:13696' at [/etc/haproxy/haproxy.cfg20170905-9-e6vwjg:192] (use 'crt').
./messages:Sep 5 10:34:45 localhost journal: [ALERT] 247/143445 (3273) : Proxy 'nova_novncproxy': no SSL certificate specified for bind '10.0.0.101:13080' at [/etc/haproxy/haproxy.cfg20170905-9-e6vwjg:212] (use 'crt').
./messages:Sep 5 10:34:45 localhost journal: [ALERT] 247/143445 (3273) : Proxy 'nova_osapi': no SSL certificate specified for bind '10.0.0.101:13774' at [/etc/haproxy/haproxy.cfg20170905-9-e6vwjg:224] (use 'crt').
./messages:Sep 5 10:34:45 localhost journal: [ALERT] 247/143445 (3273) : Proxy 'nova_placement': no SSL certificate specified for bind '10.0.0.101:13778' at [/etc/haproxy/haproxy.cfg20170905-9-e6vwjg:237] (use 'crt').
./messages:Sep 5 10:34:45 localhost journal: [ALERT] 247/143445 (3273) : Proxy 'panko': no SSL certificate specified for bind '10.0.0.101:13779' at [/etc/haproxy/haproxy.cfg20170905-9-e6vwjg:250] (use 'crt').
./messages:Sep 5 10:34:45 localhost journal: [ALERT] 247/143445 (3273) : Proxy 'swift_proxy_server': no SSL certificate specified for bind '10.0.0.101:13808' at [/etc/haproxy/haproxy.cfg20170905-9-e6vwjg:275] (use 'crt').
./messages:Sep 5 10:34:45 localhost journal: [ALERT] 247/143445 (3273) : Proxy 'aodh': no SSL certificate specified for bind '10.0.0.101:13042' at [/etc/haproxy/haproxy.cfg20170905-9-e6vwjg:26] (use 'crt').
./messages:Sep 5 10:34:45 localhost journal: [ALERT] 247/143445 (3273) : Proxy 'cinder': no SSL certificate specified for bind '10.0.0.101:13776' at [/etc/haproxy/haproxy.cfg20170905-9-e6vwjg:39] (use 'crt').
./messages:Sep 5 10:34:45 localhost journal: [ALERT] 247/143445 (3273) : Proxy 'glance_api': no SSL certificate specified for bind '10.0.0.101:13292' at [/etc/haproxy/haproxy.cfg20170905-9-e6vwjg:52] (use 'crt').
./messages:Sep 5 10:34:45 localhost journal: [ALERT] 247/143445 (3273) : Proxy 'gnocchi': no SSL certificate specified for bind '10.0.0.101:13041' at [/etc/haproxy/haproxy.cfg20170905-9-e6vwjg:65] (use 'crt').
./messages:Sep 5 10:34:45 localhost journal: [ALERT] 247/143445 (3273) : Proxy 'heat_api': no SSL certificate specified for bind '10.0.0.101:13004' at [/etc/haproxy/haproxy.cfg20170905-9-e6vwjg:85] (use 'crt').
./messages:Sep 5 10:34:45 localhost journal: [ALERT] 247/143445 (3273) : Proxy 'heat_cfn': no SSL certificate specified for bind '10.0.0.101:13005' at [/etc/haproxy/haproxy.cfg20170905-9-e6vwjg:100] (use 'crt').
./messages:Sep 5 10:34:45 localhost journal: [ALERT] 247/143445 (3273) : Proxy 'heat_cloudwatch': no SSL certificate specified for bind '10.0.0.101:13003' at [/etc/haproxy/haproxy.cfg20170905-9-e6vwjg:115] (use 'crt').
./messages:Sep 5 10:34:45 localhost journal: [ALERT] 247/143445 (3273) : Proxy 'horizon': no SSL certificate specified for bind '10.0.0.101:443' at [/etc/haproxy/haproxy.cfg20170905-9-e6vwjg:130] (use 'crt').
./messages:Sep 5 10:34:45 localhost journal: [ALERT] 247/143445 (3273) : Proxy 'horizon': no SSL certificate specified for bind '172.17.1.12:443' at [/etc/haproxy/haproxy.cfg20170905-9-e6vwjg:132] (use 'crt').
./messages:Sep 5 10:34:45 localhost journal: [ALERT] 247/143445 (3273) : Proxy 'ironic': no SSL certificate specified for bind '10.0.0.101:13385' at [/etc/haproxy/haproxy.cfg20170905-9-e6vwjg:147] (use 'crt').
./messages:Sep 5 10:34:45 localhost journal: [ALERT] 247/143445 (3273) : Proxy 'keystone_public': no SSL certificate specified for bind '10.0.0.101:13000' at [/etc/haproxy/haproxy.cfg20170905-9-e6vwjg:167] (use 'crt').
./messages:Sep 5 10:34:45 localhost journal: [ALERT] 247/143445 (3273) : Proxy 'neutron': no SSL certificate specified for bind '10.0.0.101:13696' at [/etc/haproxy/haproxy.cfg20170905-9-e6vwjg:192] (use 'crt').
./messages:Sep 5 10:34:45 localhost journal: [ALERT] 247/143445 (3273) : Proxy 'nova_novncproxy': no SSL certificate specified for bind '10.0.0.101:13080' at [/etc/haproxy/haproxy.cfg20170905-9-e6vwjg:212] (use 'crt').
./messages:Sep 5 10:34:45 localhost journal: [ALERT] 247/143445 (3273) : Proxy 'nova_osapi': no SSL certificate specified for bind '10.0.0.101:13774' at [/etc/haproxy/haproxy.cfg20170905-9-e6vwjg:224] (use 'crt').
./messages:Sep 5 10:34:45 localhost journal: [ALERT] 247/143445 (3273) : Proxy 'nova_placement': no SSL certificate specified for bind '10.0.0.101:13778' at [/etc/haproxy/haproxy.cfg20170905-9-e6vwjg:237] (use 'crt').
./messages:Sep 5 10:34:45 localhost journal: [ALERT] 247/143445 (3273) : Proxy 'panko': no SSL certificate specified for bind '10.0.0.101:13779' at [/etc/haproxy/haproxy.cfg20170905-9-e6vwjg:250] (use 'crt').
./messages:Sep 5 10:34:45 localhost journal: [ALERT] 247/143445 (3273) : Proxy 'swift_proxy_server': no SSL certificate specified for bind '10.0.0.101:13808' at [/etc/haproxy/haproxy.cfg20170905-9-e6vwjg:275] (use 'crt').
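Since the same alert repeats for every proxy, a summary per proxy is easier to read than the raw log. The following is a rough sketch in Python that collapses the alerts into one entry per proxy; the regular expression is written against the message text quoted in this comment only, and the function name is made up for illustration.

```python
import re
from collections import defaultdict

# Matches the haproxy alert text exactly as it appears in the excerpt above.
ALERT_RE = re.compile(
    r"Proxy '(?P<proxy>[^']+)': no SSL certificate specified "
    r"for bind '(?P<bind>[^']+)'"
)


def binds_missing_certificate(log_text):
    """Map each proxy name to the sorted, de-duplicated binds reported without a certificate."""
    found = defaultdict(set)
    for match in ALERT_RE.finditer(log_text):
        found[match.group("proxy")].add(match.group("bind"))
    return {proxy: sorted(binds) for proxy, binds in found.items()}


if __name__ == "__main__":
    # Two alerts copied from the excerpt; pass the full messages file in practice.
    sample = (
        "[ALERT] 247/143445 (3273) : Proxy 'horizon': no SSL certificate specified "
        "for bind '10.0.0.101:443' at [/etc/haproxy/haproxy.cfg20170905-9-e6vwjg:130] (use 'crt').\n"
        "[ALERT] 247/143445 (3273) : Proxy 'horizon': no SSL certificate specified "
        "for bind '172.17.1.12:443' at [/etc/haproxy/haproxy.cfg20170905-9-e6vwjg:132] (use 'crt').\n"
    )
    for proxy, binds in binds_missing_certificate(sample).items():
        print(proxy, ", ".join(binds))
```

Because the matches go into a set, the alerts that appear twice with the same timestamp in the excerpt are counted once.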
This has merged upstream in master, but it is still pending review for stable/pike: https://review.openstack.org/#/c/501127
Confirmed, with SSL disabled, I am able to deploy an overcloud, with the ironic bits enabled.
It merged in stable/pike. Please make sure that you're also not overwriting the value of DeployedSSLCertificatePath; just let it use the default value.
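One way to check for such an override is to scan the environment files passed to the deploy command for an assignment of that parameter. This is only a sketch in Python: it works on file contents supplied as strings, uses a plain text match rather than a YAML parser, and the file names and the path value in the example are hypothetical.

```python
import re

PARAM = "DeployedSSLCertificatePath"


def files_setting_parameter(env_files, param=PARAM):
    """Return the names of environment files whose text assigns `param`.

    `env_files` maps a file name to its YAML text. Lines where the key is
    commented out do not match, because only whitespace may precede the key.
    """
    pattern = re.compile(r"^\s*" + re.escape(param) + r"\s*:", re.MULTILINE)
    return [name for name, text in env_files.items() if pattern.search(text)]


if __name__ == "__main__":
    # Hypothetical file names and contents, for illustration only.
    env_files = {
        "custom-tls.yaml": (
            "parameter_defaults:\n"
            "  DeployedSSLCertificatePath: /hypothetical/custom/path.pem\n"
        ),
        "other.yaml": (
            "parameter_defaults:\n"
            "  # DeployedSSLCertificatePath: /hypothetical/commented/out.pem\n"
            "  GlanceBackend: file\n"
        ),
    }
    print(files_setting_parameter(env_files))
```

Any file this reports is a candidate for removing the setting so that the default applies.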
This is really a duplicate of bug 1486363, which is on track for inclusion in OSP12. Closing this as a duplicate.

*** This bug has been marked as a duplicate of bug 1486363 ***