Description of problem:
-----------------------
Attempt to deploy a split-stack env failed:

openstack stack failures list --long overcloud

overcloud.ComputeDeployedServerAllNodesValidationDeployment.0:
  resource_type: OS::Heat::StructuredDeployment
  physical_resource_id: 9f005ba8-a2c8-4a49-ab7f-d6f086817fcc
  status: CREATE_FAILED
  status_reason: |
    Error: resources[0]: Deployment to server failed: deploy_status_code : Deployment exited with non-zero status code: 1
  deploy_stdout: |
    Trying to ping 172.17.1.21 for local network 172.17.0.0/16.
    Ping to 172.17.1.21 succeeded.
    SUCCESS
    Trying to ping 172.17.1.21 for local network 172.17.1.0/24.
    Ping to 172.17.1.21 succeeded.
    SUCCESS
    Trying to ping 172.17.2.20 for local network 172.17.0.0/16.
    Ping to 172.17.2.20 succeeded.
    SUCCESS
    Trying to ping 172.17.2.20 for local network 172.17.2.0/24.
    Ping to 172.17.2.20 succeeded.
    SUCCESS
    Trying to ping 172.17.3.19 for local network 172.17.0.0/16.
    Ping to 172.17.3.19 succeeded.
    SUCCESS
    Trying to ping 172.17.3.19 for local network 172.17.3.0/24.
    Ping to 172.17.3.19 succeeded.
    SUCCESS
    Trying to ping 172.17.4.19 for local network 172.17.0.0/16.
    Ping to 172.17.4.19 failed. Retrying...
    Ping to 172.17.4.19 failed. Retrying...
    Ping to 172.17.4.19 failed. Retrying...
    Ping to 172.17.4.19 failed. Retrying...
    Ping to 172.17.4.19 failed. Retrying...
    Ping to 172.17.4.19 failed. Retrying...
    Ping to 172.17.4.19 failed. Retrying...
    Ping to 172.17.4.19 failed. Retrying...
    Ping to 172.17.4.19 failed. Retrying...
    Ping to 172.17.4.19 failed. Retrying...
    FAILURE
  deploy_stderr: |
    172.17.4.19 is not pingable.
Local Network: 172.17.0.0/16

Deploy command:
---------------
timeout 240m openstack overcloud deploy \
    --disable-validations \
    --templates /usr/share/openstack-tripleo-heat-templates \
    -r /usr/share/openstack-tripleo-heat-templates/deployed-server/deployed-server-roles-data.yaml \
    --libvirt-type kvm \
    --ntp-server clock.redhat.com \
    -e /usr/share/openstack-tripleo-heat-templates/environments/storage-environment.yaml \
    -e /home/stack/virt/internal.yaml \
    -e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml \
    -e /home/stack/virt/network/network-environment.yaml \
    -e /home/stack/virt/enable-tls.yaml \
    -e /home/stack/virt/inject-trust-anchor.yaml \
    -e /home/stack/virt/public_vip.yaml \
    -e /usr/share/openstack-tripleo-heat-templates/environments/ssl/tls-endpoints-public-ip.yaml \
    -e /home/stack/virt/hostnames.yml \
    -e /usr/share/openstack-tripleo-heat-templates/environments/ceph-ansible/ceph-ansible.yaml \
    -e /home/stack/virt/debug.yaml \
    -e /home/stack/virt/docker-images.yaml \
    -e /home/stack/virt/nodes_data.yaml \
    -e /usr/share/openstack-tripleo-heat-templates/environments/deployed-server-bootstrap-environment-rhel.yaml \
    -e /usr/share/openstack-tripleo-heat-templates/environments/deployed-server-pacemaker-environment.yaml \
    -e /usr/share/openstack-tripleo-heat-templates/environments/docker.yaml \
    -e /usr/share/openstack-tripleo-heat-templates/environments/docker-ha.yaml \
    -e /home/stack/SPLIT/ctlplane-net-ports.yaml \
    -e /home/stack/SPLIT/deployed-server-env.yaml \
    -e /home/stack/SPLIT/deployment-swift-data-map.yaml \
    -e /home/stack/SPLIT/network-interface-mappings.yaml

Problem seems to be with the IP addresses in network-environment.yaml (using 172.17.*.0/24 ranges) and the address on the docker0 bridge:

[root@compute-0 ~]# ip a s docker0
5: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN
    link/ether 02:42:7c:47:ea:cc brd ff:ff:ff:ff:ff:ff
    inet 172.17.0.1/16 scope global docker0
       valid_lft forever preferred_lft forever

[root@compute-0 ~]# ip r
default via 192.168.24.1 dev eth0
169.254.169.254 via 192.168.24.1 dev eth0
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1
172.17.1.0/24 dev vlan20 proto kernel scope link src 172.17.1.14
172.17.2.0/24 dev vlan50 proto kernel scope link src 172.17.2.10
172.17.3.0/24 dev vlan30 proto kernel scope link src 172.17.3.16
192.168.24.0/24 dev eth0 proto kernel scope link src 192.168.24.40

Deleting the IP from docker0 and re-running the failed hook succeeds:
---------------------------------------------------------------------
[root@compute-0 ~]# ip a d 172.17.0.1/16 dev docker0
[root@compute-0 ~]# ip r
default via 192.168.24.1 dev eth0
169.254.169.254 via 192.168.24.1 dev eth0
172.17.1.0/24 dev vlan20 proto kernel scope link src 172.17.1.14
172.17.2.0/24 dev vlan50 proto kernel scope link src 172.17.2.10
172.17.3.0/24 dev vlan30 proto kernel scope link src 172.17.3.16
192.168.24.0/24 dev eth0 proto kernel scope link src 192.168.24.40
[root@compute-0 ~]# /usr/libexec/heat-config/hooks/script < /var/lib/heat-config/deployed/f7775112-3723-4e58-99b5-a8c016de54e2.json
[2017-10-17 07:59:01,466] (heat-config) [INFO] ping_test_ips=10.0.0.107 172.17.1.21 172.17.3.19 172.17.4.19 172.17.2.20 192.168.24.50
[2017-10-17 07:59:01,466] (heat-config) [INFO] validate_fqdn=False
[2017-10-17 07:59:01,466] (heat-config) [INFO] validate_ntp=True
[2017-10-17 07:59:01,466] (heat-config) [INFO] deploy_server_id=03256fa6-b548-4cb2-a285-d30c9e42baf1
[2017-10-17 07:59:01,466] (heat-config) [INFO] deploy_action=CREATE
[2017-10-17 07:59:01,467] (heat-config) [INFO] deploy_stack_id=overcloud-ComputeDeployedServerAllNodesValidationDeployment-ppvj5sq4kgbr/87a840e0-e4ce-4f9b-b2e8-1a9c848b2ca7
[2017-10-17 07:59:01,467] (heat-config) [INFO] deploy_resource_name=0
[2017-10-17 07:59:01,467] (heat-config) [INFO] deploy_signal_transport=TEMP_URL_SIGNAL
[2017-10-17 07:59:01,467] (heat-config) [INFO] deploy_signal_id=http://192.168.24.1:8080/v1/AUTH_b62fe87241954420b40d5874782b7154/87a840e0-e4ce-4f9b-b2e8-1a9c848b2ca7/overcloud-ComputeDeployedServerAllNodesValidationDeployment-ppvj5sq4kgbr-0-4uc3bvlwi5jw?temp_url_sig=ee7f24d49be9bd6f1cd8abea3f6b25d8dc44b193&temp_url_expires=2147483586
[2017-10-17 07:59:01,467] (heat-config) [INFO] deploy_signal_verb=PUT
[2017-10-17 07:59:01,467] (heat-config) [DEBUG] Running /var/lib/heat-config/heat-config-script/f7775112-3723-4e58-99b5-a8c016de54e2
[2017-10-17 07:59:01,856] (heat-config) [INFO]
Trying to ping 172.17.1.21 for local network 172.17.1.0/24.
Ping to 172.17.1.21 succeeded.
SUCCESS
Trying to ping 172.17.2.20 for local network 172.17.2.0/24.
Ping to 172.17.2.20 succeeded.
SUCCESS
Trying to ping 172.17.3.19 for local network 172.17.3.0/24.
Ping to 172.17.3.19 succeeded.
SUCCESS
Trying to ping 192.168.24.50 for local network 192.168.24.0/24.
Ping to 192.168.24.50 succeeded.
SUCCESS
Trying to ping default gateway 192.168.24.1...Ping to 192.168.24.1 succeeded.
SUCCESS
[2017-10-17 07:59:01,856] (heat-config) [DEBUG]
[2017-10-17 07:59:01,856] (heat-config) [INFO] Completed /var/lib/heat-config/heat-config-script/f7775112-3723-4e58-99b5-a8c016de54e2
{"deploy_stdout": "Trying to ping 172.17.1.21 for local network 172.17.1.0/24.\nPing to 172.17.1.21 succeeded.\nSUCCESS\nTrying to ping 172.17.2.20 for local network 172.17.2.0/24.\nPing to 172.17.2.20 succeeded.\nSUCCESS\nTrying to ping 172.17.3.19 for local network 172.17.3.0/24.\nPing to 172.17.3.19 succeeded.\nSUCCESS\nTrying to ping 192.168.24.50 for local network 192.168.24.0/24.\nPing to 192.168.24.50 succeeded.\nSUCCESS\nTrying to ping default gateway 192.168.24.1...Ping to 192.168.24.1 succeeded.\nSUCCESS\n", "deploy_stderr": "", "deploy_status_code": 0}

Version-Release number of selected component (if applicable):
-------------------------------------------------------------
openstack-tripleo-heat-templates-7.0.1-0.20170927205937.el7ost.noarch

Additional info:
----------------
Virtual setup: 3 controllers + 2 computes + 3 ceph
Stopping docker and removing the IP address from docker0 helps to eliminate the issue.
I'm sending this one over to DFG:Containers; the issue seems to be that the default docker network on docker0 can overlap with other subnets and fail the ping IP validations. Our bootstrap for split-stack is just "yum install python-heat-agent*", and the problem is that it leaves the docker service enabled. So if you happened to reboot before starting the actual overcloud deployment, the docker0 network would come up, and if it overlaps with a subnet you use in the overcloud, the deployment will fail.
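To illustrate the overlap, here is a minimal sketch using Python's stdlib ipaddress module with the subnets from the route table above; it just demonstrates why the kernel prefers the docker0 /16 route over nothing for the 172.17.4.0/24 network that has no VLAN interface on this node:

```python
import ipaddress

# Default docker0 subnet (what docker uses when daemon.json doesn't override it)
docker0 = ipaddress.ip_network("172.17.0.0/16")

# Overcloud networks from the failed ping validation above
overcloud_nets = [
    ipaddress.ip_network("172.17.1.0/24"),
    ipaddress.ip_network("172.17.2.0/24"),
    ipaddress.ip_network("172.17.3.0/24"),
    ipaddress.ip_network("172.17.4.0/24"),
]

# Every one of these /24s falls inside docker0's /16, so with the docker0
# address present the host considers all of them "local" via that bridge.
for net in overcloud_nets:
    print(net, "overlaps docker0:", docker0.overlaps(net))
```

Each line prints True; the 172.17.4.0/24 traffic in particular gets a matching connected route via docker0 and never reaches the overcloud network, which is exactly what the failed ping validation shows.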
Adding test_blocker / automation-blocker, as it seems to block split-stack deployment completely. Should be fixed by adding the ability tracked in https://bugzilla.redhat.com/show_bug.cgi?id=1430438
Stopping docker before deployment doesn't work all the time, since it's started at some point during deployment, so it's subject to a race condition. To bypass this I've manually configured the IP address of docker0 in /etc/docker/daemon.json.
(In reply to Yurii Prokulevych from comment #5)
> Stopping docker before deployment doesn't work all the time, since it's
> started at some point during deployment, so it's subject to a race
> condition. To bypass this I've manually configured the IP address of
> docker0 in /etc/docker/daemon.json.

So have you stopped docker, or disabled it? My understanding right now is that the docker service needs to be disabled so that it stays off until it's configured at the respective deployment stage.
(In reply to Gurenko Alex from comment #6)
> (In reply to Yurii Prokulevych from comment #5)
> > Stopping docker before deployment doesn't work all the time, since it's
> > started at some point during deployment, so it's subject to a race
> > condition. To bypass this I've manually configured the IP address of
> > docker0 in /etc/docker/daemon.json.
>
> So have you stopped docker, or disabled it? My understanding right now is
> that the docker service needs to be disabled so that it stays off until
> it's configured at the respective deployment stage.

I didn't disable it; that's why it was started after the reboot. Then I stopped it before the deployment, and it got started some time in the middle of the deployment.
The docker0 bridge can be customised with /etc/docker/daemon.json. Can we do this to avoid the clash? https://docs.docker.com/engine/userguide/networking/default_network/custom-docker0/
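For reference, a minimal daemon.json sketch along the lines of the linked doc, using the documented "bip" option to move the docker0 bridge address; the 192.168.5.1/24 value here is just an illustrative choice, any range that doesn't collide with the overcloud subnets would do:

```json
{
    "bip": "192.168.5.1/24"
}
```

The docker service has to be restarted for this to take effect, and if docker0 already has the old 172.17.0.1/16 address it may need to be removed by hand, as in the ip a d workaround above.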
I see the upstream bugs/changes now.

Does split-stack actually bootstrap docker via tripleo::docker puppet?
(In reply to Steve Baker from comment #9)
> I see the upstream bugs/changes now
>
> Does split-stack actually bootstrap docker via tripleo::docker puppet?

Yes. It uses the same heat templates for software configuration as other deployments. I think the issue here, though, is that the validations that ping the VIPs run before any of the service template configurations get applied. So even if you wanted to configure docker to use a different subnet for docker0 so that it does not clash with the default subnet for internal_api, that configuration would not get applied until after the ping validations had already failed.
https://review.openstack.org/518853 merged in stable/pike.
Verified on build 2017-11-20.1
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2017:3462