Bug 1503021 - [SPLIT-STACK] Failed to deploy oc due to ip's conflict
Summary: [SPLIT-STACK] Failed to deploy oc due to ip's conflict
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-heat-templates
Version: 12.0 (Pike)
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: rc
Target Release: 12.0 (Pike)
Assignee: Martin André
QA Contact: Gurenko Alex
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2017-10-17 08:18 UTC by Yurii Prokulevych
Modified: 2018-02-05 19:15 UTC

Fixed In Version: python-tripleoclient-7.3.3-4.el7ost openstack-tripleo-heat-templates-7.0.3-7.el7ost
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-12-13 22:15:46 UTC
Target Upstream Version:
Embargoed:


Links
System ID Private Priority Status Summary Last Updated
Launchpad 1726773 0 None None None 2017-10-24 09:16:10 UTC
OpenStack gerrit 514576 0 None MERGED Make docker network configurable 2020-02-01 00:08:30 UTC
OpenStack gerrit 514581 0 None MERGED Configure docker0 bridge address 2020-02-01 00:08:30 UTC
OpenStack gerrit 517661 0 None MERGED Make docker network configurable 2020-02-01 00:08:30 UTC
OpenStack gerrit 518853 0 None MERGED Configure docker0 bridge address 2020-02-01 00:08:30 UTC
Red Hat Product Errata RHEA-2017:3462 0 normal SHIPPED_LIVE Red Hat OpenStack Platform 12.0 Enhancement Advisory 2018-02-16 01:43:25 UTC

Description Yurii Prokulevych 2017-10-17 08:18:30 UTC
Description of problem:
-----------------------
Attempt to deploy split-stack env failed:
openstack stack failures list --long overcloud
overcloud.ComputeDeployedServerAllNodesValidationDeployment.0:
  resource_type: OS::Heat::StructuredDeployment
  physical_resource_id: 9f005ba8-a2c8-4a49-ab7f-d6f086817fcc
  status: CREATE_FAILED
  status_reason: |
    Error: resources[0]: Deployment to server failed: deploy_status_code : Deployment exited with non-zero status code: 1
  deploy_stdout: |
    Trying to ping 172.17.1.21 for local network 172.17.0.0/16.
    Ping to 172.17.1.21 succeeded.
    SUCCESS
    Trying to ping 172.17.1.21 for local network 172.17.1.0/24.
    Ping to 172.17.1.21 succeeded.
    SUCCESS
    Trying to ping 172.17.2.20 for local network 172.17.0.0/16.
    Ping to 172.17.2.20 succeeded.
    SUCCESS
    Trying to ping 172.17.2.20 for local network 172.17.2.0/24.
    Ping to 172.17.2.20 succeeded.
    SUCCESS
    Trying to ping 172.17.3.19 for local network 172.17.0.0/16.
    Ping to 172.17.3.19 succeeded.
    SUCCESS
    Trying to ping 172.17.3.19 for local network 172.17.3.0/24.
    Ping to 172.17.3.19 succeeded.
    SUCCESS
    Trying to ping 172.17.4.19 for local network 172.17.0.0/16.
    Ping to 172.17.4.19 failed. Retrying...
    Ping to 172.17.4.19 failed. Retrying...
    Ping to 172.17.4.19 failed. Retrying...
    Ping to 172.17.4.19 failed. Retrying...
    Ping to 172.17.4.19 failed. Retrying...
    Ping to 172.17.4.19 failed. Retrying...
    Ping to 172.17.4.19 failed. Retrying...
    Ping to 172.17.4.19 failed. Retrying...
    Ping to 172.17.4.19 failed. Retrying...
    Ping to 172.17.4.19 failed. Retrying...
    FAILURE
  deploy_stderr: |
    172.17.4.19 is not pingable. Local Network: 172.17.0.0/16


Deploy command:
timeout 240m openstack overcloud deploy \
--disable-validations \
--templates /usr/share/openstack-tripleo-heat-templates \
-r /usr/share/openstack-tripleo-heat-templates/deployed-server/deployed-server-roles-data.yaml \
--libvirt-type kvm \
--ntp-server clock.redhat.com \
-e /usr/share/openstack-tripleo-heat-templates/environments/storage-environment.yaml \
-e /home/stack/virt/internal.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml \
-e /home/stack/virt/network/network-environment.yaml -e /home/stack/virt/enable-tls.yaml \
-e /home/stack/virt/inject-trust-anchor.yaml \
-e /home/stack/virt/public_vip.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/ssl/tls-endpoints-public-ip.yaml \
-e /home/stack/virt/hostnames.yml \
-e /usr/share/openstack-tripleo-heat-templates/environments/ceph-ansible/ceph-ansible.yaml \
-e /home/stack/virt/debug.yaml \
-e /home/stack/virt/docker-images.yaml \
-e /home/stack/virt/nodes_data.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/deployed-server-bootstrap-environment-rhel.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/deployed-server-pacemaker-environment.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/docker.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/docker-ha.yaml \
-e /home/stack/SPLIT/ctlplane-net-ports.yaml \
-e /home/stack/SPLIT/deployed-server-env.yaml \
-e /home/stack/SPLIT/deployment-swift-data-map.yaml \
-e /home/stack/SPLIT/network-interface-mappings.yaml

The problem seems to be a conflict between the IP ranges in network-environment.yaml (172.17.*.0/24) and the address on the docker0 bridge (172.17.0.1/16):
[root@compute-0 ~]# ip a s docker0
5: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN
    link/ether 02:42:7c:47:ea:cc brd ff:ff:ff:ff:ff:ff
    inet 172.17.0.1/16 scope global docker0
       valid_lft forever preferred_lft forever 

[root@compute-0 ~]# ip r
default via 192.168.24.1 dev eth0
169.254.169.254 via 192.168.24.1 dev eth0
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1
172.17.1.0/24 dev vlan20 proto kernel scope link src 172.17.1.14
172.17.2.0/24 dev vlan50 proto kernel scope link src 172.17.2.10
172.17.3.0/24 dev vlan30 proto kernel scope link src 172.17.3.16
192.168.24.0/24 dev eth0 proto kernel scope link src 192.168.24.40
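
To illustrate the conflict, here is a minimal Python sketch. It is not the validation script itself; the helper names overlapping_pairs and ips_only_on are hypothetical. It takes the connected routes from the `ip r` output above and the ping_test_ips values from the heat-config log in this report, and checks which subnets overlap and which test IPs match only the docker0 subnet.

```python
import ipaddress
from itertools import combinations

# Connected routes copied from the `ip r` output above.
LOCAL_NETWORKS = {
    "docker0": "172.17.0.0/16",
    "vlan20": "172.17.1.0/24",
    "vlan50": "172.17.2.0/24",
    "vlan30": "172.17.3.0/24",
    "eth0": "192.168.24.0/24",
}

# Values of ping_test_ips from the heat-config log in this report.
PING_TEST_IPS = [
    "10.0.0.107",
    "172.17.1.21",
    "172.17.3.19",
    "172.17.4.19",
    "172.17.2.20",
    "192.168.24.50",
]


def overlapping_pairs(networks):
    """Return sorted (device, device) pairs whose subnets overlap."""
    parsed = {dev: ipaddress.ip_network(cidr) for dev, cidr in networks.items()}
    return [
        (a, b)
        for a, b in combinations(sorted(parsed), 2)
        if parsed[a].overlaps(parsed[b])
    ]


def ips_only_on(device, networks, ips):
    """Return the IPs that fall inside `device`'s subnet and no other one."""
    parsed = {dev: ipaddress.ip_network(cidr) for dev, cidr in networks.items()}
    result = []
    for ip in ips:
        addr = ipaddress.ip_address(ip)
        matches = [dev for dev, net in parsed.items() if addr in net]
        if matches == [device]:
            result.append(ip)
    return result


if __name__ == "__main__":
    print(overlapping_pairs(LOCAL_NETWORKS))
    print(ips_only_on("docker0", LOCAL_NETWORKS, PING_TEST_IPS))
```

With these inputs, docker0 overlaps all three vlan subnets, and 172.17.4.19 is the only test IP that matches docker0 alone, which is consistent with the address reported in deploy_stderr.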

Deleting the IP from docker0 and re-running the failed hook succeeds:
---------------------------------------------------------------------
[root@compute-0 ~]# ip a d 172.17.0.1/16 dev docker0
[root@compute-0 ~]# ip r
default via 192.168.24.1 dev eth0
169.254.169.254 via 192.168.24.1 dev eth0
172.17.1.0/24 dev vlan20 proto kernel scope link src 172.17.1.14
172.17.2.0/24 dev vlan50 proto kernel scope link src 172.17.2.10
172.17.3.0/24 dev vlan30 proto kernel scope link src 172.17.3.16
192.168.24.0/24 dev eth0 proto kernel scope link src 192.168.24.40

[root@compute-0 ~]# /usr/libexec/heat-config/hooks/script < /var/lib/heat-config/deployed/f7775112-3723-4e58-99b5-a8c016de54e2.json
[2017-10-17 07:59:01,466] (heat-config) [INFO] ping_test_ips=10.0.0.107 172.17.1.21 172.17.3.19 172.17.4.19 172.17.2.20 192.168.24.50
[2017-10-17 07:59:01,466] (heat-config) [INFO] validate_fqdn=False
[2017-10-17 07:59:01,466] (heat-config) [INFO] validate_ntp=True
[2017-10-17 07:59:01,466] (heat-config) [INFO] deploy_server_id=03256fa6-b548-4cb2-a285-d30c9e42baf1
[2017-10-17 07:59:01,466] (heat-config) [INFO] deploy_action=CREATE
[2017-10-17 07:59:01,467] (heat-config) [INFO] deploy_stack_id=overcloud-ComputeDeployedServerAllNodesValidationDeployment-ppvj5sq4kgbr/87a840e0-e4ce-4f9b-b2e8-1a9c848b2ca7
[2017-10-17 07:59:01,467] (heat-config) [INFO] deploy_resource_name=0
[2017-10-17 07:59:01,467] (heat-config) [INFO] deploy_signal_transport=TEMP_URL_SIGNAL
[2017-10-17 07:59:01,467] (heat-config) [INFO] deploy_signal_id=http://192.168.24.1:8080/v1/AUTH_b62fe87241954420b40d5874782b7154/87a840e0-e4ce-4f9b-b2e8-1a9c848b2ca7/overcloud-ComputeDeployedServerAllNodesValidationDeployment-ppvj5sq4kgbr-0-4uc3bvlwi5jw?temp_url_sig=ee7f24d49be9bd6f1cd8abea3f6b25d8dc44b193&temp_url_expires=2147483586
[2017-10-17 07:59:01,467] (heat-config) [INFO] deploy_signal_verb=PUT
[2017-10-17 07:59:01,467] (heat-config) [DEBUG] Running /var/lib/heat-config/heat-config-script/f7775112-3723-4e58-99b5-a8c016de54e2
[2017-10-17 07:59:01,856] (heat-config) [INFO] Trying to ping 172.17.1.21 for local network 172.17.1.0/24.
Ping to 172.17.1.21 succeeded.
SUCCESS
Trying to ping 172.17.2.20 for local network 172.17.2.0/24.
Ping to 172.17.2.20 succeeded.
SUCCESS
Trying to ping 172.17.3.19 for local network 172.17.3.0/24.
Ping to 172.17.3.19 succeeded.
SUCCESS
Trying to ping 192.168.24.50 for local network 192.168.24.0/24.
Ping to 192.168.24.50 succeeded.
SUCCESS
Trying to ping default gateway 192.168.24.1...Ping to 192.168.24.1 succeeded.
SUCCESS

[2017-10-17 07:59:01,856] (heat-config) [DEBUG] 
[2017-10-17 07:59:01,856] (heat-config) [INFO] Completed /var/lib/heat-config/heat-config-script/f7775112-3723-4e58-99b5-a8c016de54e2
{"deploy_stdout": "Trying to ping 172.17.1.21 for local network 172.17.1.0/24.\nPing to 172.17.1.21 succeeded.\nSUCCESS\nTrying to ping 172.17.2.20 for local network 172.17.2.0/24.\nPing to 172.17.2.20 succeeded.\nSUCCESS\nTrying to ping 172.17.3.19 for local network 172.17.3.0/24.\nPing to 172.17.3.19 succeeded.\nSUCCESS\nTrying to ping 192.168.24.50 for local network 192.168.24.0/24.\nPing to 192.168.24.50 succeeded.\nSUCCESS\nTrying to ping default gateway 192.168.24.1...Ping to 192.168.24.1 succeeded.\nSUCCESS\n", "deploy_stderr": "", "deploy_status_code": 0}

Version-Release number of selected component (if applicable):
-------------------------------------------------------------
openstack-tripleo-heat-templates-7.0.1-0.20170927205937.el7ost.noarch


Additional info:
----------------
Virtual setup: 3 controllers + 2 computes + 3 ceph

Comment 2 Yurii Prokulevych 2017-10-17 09:58:10 UTC
Stopping docker and removing the IP address from docker0 helps to eliminate the issue.

Comment 3 James Slagle 2017-10-17 14:17:40 UTC
I'm sending this one over to DFG:Containers. The issue seems to be that the default docker network on docker0 can overlap with other subnets, which makes the ping IP validations fail.

Our bootstrap for split-stack is just "yum install python-heat-agent*". The issue seems to be that it leaves the docker service enabled. So if you happen to reboot before starting the actual overcloud deployment, the docker0 network gets started, and if it overlaps with a subnet you use in the overcloud, the deployment will fail.

Comment 4 Omri Hochman 2017-10-18 13:57:08 UTC
Adding test_blocker / automation-blocker, as this seems to block split-stack deployment completely.

Should be fixed by adding the ability -> https://bugzilla.redhat.com/show_bug.cgi?id=1430438

Comment 5 Yurii Prokulevych 2017-10-20 08:05:13 UTC
Stopping docker before deployment doesn't always work: it gets started at some point during the deployment, so this is subject to a race condition.
To bypass this I've manually configured the IP address of docker0 in /etc/docker/daemon.json.
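
A minimal sketch of that workaround in Python, under stated assumptions: `bip` is the Docker daemon option that sets the docker0 bridge address, but the address 172.31.0.1/24 is only an illustrative value (not necessarily what was used here), and the helper name docker0_config is hypothetical. The helper refuses any bridge address that overlaps the subnets already in use on the node.

```python
import ipaddress
import json

# Subnets in use on compute-0, taken from the `ip r` output in the description.
RESERVED_SUBNETS = [
    "172.17.1.0/24",
    "172.17.2.0/24",
    "172.17.3.0/24",
    "192.168.24.0/24",
]


def docker0_config(bip, reserved_subnets):
    """Build a daemon.json body that sets the docker0 address via `bip`.

    Raises ValueError if the bridge network overlaps a reserved subnet.
    """
    bridge = ipaddress.ip_interface(bip)
    for subnet in reserved_subnets:
        if bridge.network.overlaps(ipaddress.ip_network(subnet)):
            raise ValueError(f"{bip} overlaps {subnet}")
    return json.dumps({"bip": bip}, indent=2)


if __name__ == "__main__":
    # Illustrative address only (an assumption); pick one that is free in your environment.
    print(docker0_config("172.31.0.1/24", RESERVED_SUBNETS))
```

The printed JSON would go into /etc/docker/daemon.json, and docker would need a restart to pick it up. Passing the default 172.17.0.1/16 to the helper raises ValueError, which mirrors the clash described in this report.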

Comment 6 Gurenko Alex 2017-10-23 14:07:54 UTC
(In reply to Yurii Prokulevych from comment #5)
> Stopping docker before deployment doesn't always work: it gets started at
> some point during the deployment, so this is subject to a race condition.
> To bypass this I've manually configured the IP address of docker0 in
> /etc/docker/daemon.json.

So have you stopped or disabled docker? My understanding right now is that the docker service needs to be disabled, so that it stays off until it is configured at the respective deployment stage.

Comment 7 Yurii Prokulevych 2017-10-25 07:47:25 UTC
(In reply to Gurenko Alex from comment #6)
> (In reply to Yurii Prokulevych from comment #5)
> > Stopping docker before deployment doesn't always work: it gets started at
> > some point during the deployment, so this is subject to a race condition.
> > To bypass this I've manually configured the IP address of docker0 in
> > /etc/docker/daemon.json.
> 
> So have you stopped or disabled docker? My understanding right now is
> that the docker service needs to be disabled, so that it stays off until
> it is configured at the respective deployment stage.

I didn't disable it, which is why it was started after the reboot.
Then I stopped it before the deployment, and it got started again some time in the middle of the deployment.

Comment 8 Steve Baker 2017-10-25 21:00:00 UTC
The docker0 bridge can be customised with /etc/docker/daemon.json. Can we do this to avoid the clash?

https://docs.docker.com/engine/userguide/networking/default_network/custom-docker0/

Comment 9 Steve Baker 2017-10-25 21:08:48 UTC
I see the upstream bugs/changes now

Does split-stack actually bootstrap docker via tripleo::docker puppet?

Comment 10 James Slagle 2017-10-26 20:40:46 UTC
(In reply to Steve Baker from comment #9)
> I see the upstream bugs/changes now
> 
> Does split-stack actually bootstrap docker via tripleo::docker puppet?

Yes. It uses the same heat templates for software configuration as other deployments. I think the issue here, though, is ordering: don't the validations that ping the VIPs run before any of the service template configurations get applied?

So, even if you wanted to configure docker to use a different subnet for docker0 so that it does not clash with the default subnet for internal_api, that configuration would not get applied until after the ping validations had already failed.

Comment 12 Martin André 2017-11-10 12:31:25 UTC
https://review.openstack.org/518853 merged in stable/pike.

Comment 14 Gurenko Alex 2017-11-22 15:16:58 UTC
Verified on build 2017-11-20.1

Comment 17 errata-xmlrpc 2017-12-13 22:15:46 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:3462

