Description of problem:
Connectivity checks do not verify the management network when it's enabled.

Version-Release number of selected component (if applicable):
openstack-tripleo-heat-templates-2.0.0-12.el7ost.noarch

How reproducible:
100%

Steps to Reproduce:
1. Deploy overcloud with management network enabled

Actual results:
[root@ctrl-r01-01 ~]# grep ctrl-r01-01 /etc/hosts
10.0.0.11 ctrl-r01-01.redhat.local ctrl-r01-01
172.16.18.31 ctrl-r01-01-external
10.0.0.11 ctrl-r01-01-internalapi
10.0.0.131 ctrl-r01-01-storage
10.0.1.11 ctrl-r01-01-storagemgmt
10.0.1.131 ctrl-r01-01-tenant
172.16.17.151 ctrl-r01-01-management

[root@ctrl-r01-01 ~]# journalctl -l -u os-collect-config | grep ping_test
Jun 30 09:53:28 ctrl-r01-01.redhat.local os-collect-config[3804]: [2016-06-30 05:53:28,702] (heat-config) [DEBUG] [2016-06-30 05:53:27,911] (heat-config) [INFO] ping_test_ips=172.16.18.30 10.0.0.10 10.0.0.130 10.0.1.10 10.0.1.130

Expected results:
The ping_test is also run for the 172.16.17 subnet.

Additional info:
Is this a validation issue? Can you be more specific about the problem here?
(In reply to Jaromir Coufal from comment #2)
> Is this a validation issue? Can you be more specific about the problem here?

Yes, this is a validation issue. During the overcloud deployment there is a validation step that checks connectivity between all nodes and the controllers. When network isolation is enabled, connectivity checks are done for each of the isolated networks. The management network is optional, so it is only activated at the operator's request. When it is activated, though, no connectivity validation is run for it, so we won't catch any misconfiguration of this network.
"Validation" is a bit overloaded here. This is not something coming out of the Ansible-based validations that show up in the UI but it's a check within the Heat templates (I think). It's also not something that's broken (or halting the deployment) per se. The check verifies we can access the networks being deployed. However, if we enable the management network, it's not verified in the same way.
This is a ping test that is done by Heat at the end of deployment. It finds the active networks on the controller, from a list of networks, and pings each one. The management network IP is missing from this list in overcloud.yaml, which is why it is not getting pinged.

  AllNodesValidationConfig:
    type: OS::TripleO::AllNodes::Validation
    properties:
      PingTestIps:
        list_join:
        - ' '
        - - {get_attr: [Controller, resource.0.external_ip_address]}
          - {get_attr: [Controller, resource.0.internal_api_ip_address]}
          - {get_attr: [Controller, resource.0.storage_ip_address]}
          - {get_attr: [Controller, resource.0.storage_mgmt_ip_address]}
          - {get_attr: [Controller, resource.0.tenant_ip_address]}
Created an upstream patch to add management_ip_address to the list of IPs to ping.
https://review.openstack.org/#/c/350787/

The upstream bug is here:
https://bugs.launchpad.net/tripleo/+bug/1609554
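For readers following along, here is a rough sketch of what the change to the PingTestIps list in overcloud.yaml might look like. It only appends one entry to the list quoted in the earlier comment. The attribute name management_ip_address comes from the patch description above; that it is exposed on the Controller resource in the same way as the other *_ip_address attributes is an assumption, so check the linked review for the actual diff.

```yaml
AllNodesValidationConfig:
  type: OS::TripleO::AllNodes::Validation
  properties:
    PingTestIps:
      list_join:
      - ' '
      - - {get_attr: [Controller, resource.0.external_ip_address]}
        - {get_attr: [Controller, resource.0.internal_api_ip_address]}
        - {get_attr: [Controller, resource.0.storage_ip_address]}
        - {get_attr: [Controller, resource.0.storage_mgmt_ip_address]}
        - {get_attr: [Controller, resource.0.tenant_ip_address]}
        # Assumed new entry: mirrors the entries above for the management network
        - {get_attr: [Controller, resource.0.management_ip_address]}
```

With an entry like this in place, the joined ping_test_ips string would carry a sixth address from the management subnet.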
Seems to be merged.
Looks good:

[root@overcloud-compute-0 heat-admin]# journalctl -l -u os-collect-config | grep ping_test
Oct 17 12:13:45 overcloud-compute-0.localdomain os-collect-config[4454]: [2016-10-17 12:13:45,039] (heat-config) [DEBUG] [2016-10-17 12:13:44,139] (heat-config) [INFO] ping_test_ips=172.16.18.26 10.0.0.21 10.0.0.146 10.0.1.21 10.0.1.146 172.16.17.184

[root@overcloud-compute-0 heat-admin]# grep management /etc/hosts
172.16.17.184 overcloud-controller-0.management.localdomain overcloud-controller-0.management
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHEA-2016-2948.html