To ensure that network-related problems are caught early during overcloud deployment, we need to include the network validation tests that were recently posted upstream:

https://review.openstack.org/#/c/204781/
https://review.openstack.org/#/c/204806/

These verify that the interfaces created by os-net-config can be used to ping the undercloud controller before the configuration of OpenStack services begins. If a check fails, the deployment terminates with detailed error reporting.
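The following is a minimal bash sketch of the kind of per-network ping check described above. It is not the upstream validation-scripts/all-nodes.sh. The function names ping_check and validate_networks, the PING_CMD override, and the "IP:CIDR" argument format are all hypothetical. The sketch prints lines in the same style as the os-collect-config output quoted later in this bug, and it returns non-zero on the first unreachable address, which is how a script run by a Heat software deployment reports failure through deploy_status_code.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a per-network ping check; NOT the upstream
# validation-scripts/all-nodes.sh. PING_CMD is an assumed override hook so
# the check can be stubbed in tests; by default it sends one ping and waits
# at most 5 seconds for a reply.
PING_CMD=${PING_CMD:-"ping -c 1 -W 5"}

# ping_check TARGET LABEL
# Prints "Trying to ping TARGET for LABEL..." followed by SUCCESS or FAILURE,
# and returns the ping command's success or failure.
ping_check() {
    local target=$1 label=$2
    echo -n "Trying to ping $target for $label..."
    # $PING_CMD is left unquoted on purpose so it splits into command + flags.
    if $PING_CMD "$target" >/dev/null 2>&1; then
        echo "SUCCESS"
        return 0
    fi
    echo "FAILURE"
    return 1
}

# validate_networks IP:CIDR [IP:CIDR ...]
# Pings one address per local network and stops at the first failure, so the
# caller (e.g. a Heat software deployment) sees a non-zero exit status.
validate_networks() {
    local pair
    for pair in "$@"; do
        ping_check "${pair%%:*}" "local network ${pair#*:}" || return 1
    done
    return 0
}
```

For example, `validate_networks 172.16.0.7:172.16.0.0/24 172.16.1.9:172.16.1.0/24` would print one "Trying to ping ..." line per network and exit 0 only if every address answered.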
FYI, I tested the two upstream tripleo-heat-templates patches above. I cloned the current downstream repository and cherry-picked them onto it; my environment was otherwise the current poodle. I hit a problem with Tuskar because the new 'validation-scripts' directory isn't included in the paths from which role-extra data is gathered. I have a review out for that @ https://review.openstack.org/#/c/204781/. With this applied and the roles recreated as described in [1], I got the expected output in /var/log/messages on the compute and controller nodes:

/var/log/messages:2118:Jul 28 07:27:35 localhost os-collect-config: [2015-07-28 07:27:35,828] (heat-config) [INFO] {"deploy_stdout": "Trying to ping 172.16.0.7 for local network 172.16.0.0/24...SUCCESS\nTrying to ping 172.16.1.9 for local network 172.16.1.0/24...SUCCESS\nTrying to ping 172.16.2.10 for local network 172.16.2.0/24...SUCCESS\nTrying to ping default gateway 192.0.2.1...SUCCESS\n", "deploy_stderr": "", "deploy_status_code": 0}

[1] https://github.com/rdo-management/instack-undercloud/blob/c072ac1e16f3f75dc229c7cffae8acab0a52c1c9/doc/source/advanced_deployment/reload_roles_and_plan.rst
Oops, sorry: the link above is wrong. My review is at https://review.openstack.org/#/c/206502/ (I linked to Dan's patch instead).
*** Bug 1255453 has been marked as a duplicate of this bug. ***
Currently, with openstack-tripleo-heat-templates-0.8.6-58.el7ost.noarch, this is failing:

openstack overcloud deploy --templates --control-scale 3 --control-flavor vm --compute-scale 3 --compute-flavor baremetal --ntp-server 10.16.255.1
Deploying templates in the directory /usr/share/openstack-tripleo-heat-templates
ERROR: openstack Could not fetch contents for file:///usr/share/openstack-tripleo-heat-templates/validation-scripts/all-nodes.sh

The all-nodes.sh file is missing from the RPM:

$ rpm -ql openstack-tripleo-heat-templates | grep validation
/usr/share/openstack-tripleo-heat-templates/all-nodes-validation.yaml
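A packaging regression like this one, where a template references a file that the RPM does not ship, can be caught with a simple file-list check. The sketch below is a hypothetical helper, check_shipped_files, that reads a file list on stdin (for example from rpm -ql) and reports any required path that is missing. The function name and its interface are assumptions for illustration and are not part of any existing tool.

```shell
#!/usr/bin/env bash
# check_shipped_files PATH [PATH ...]  (hypothetical helper)
# Reads a package file list on stdin (one path per line, as printed by
# `rpm -ql`) and prints "MISSING: PATH" for every required path absent from it.
# Returns 0 if everything is present, 1 otherwise.
check_shipped_files() {
    local list f missing=0
    list=$(cat)
    for f in "$@"; do
        # -x: match the whole line, -F: treat the path as a fixed string.
        if ! grep -qxF -- "$f" <<<"$list"; then
            echo "MISSING: $f"
            missing=1
        fi
    done
    return "$missing"
}

# Example (requires the package to be installed):
#   rpm -ql openstack-tripleo-heat-templates | check_shipped_files \
#       /usr/share/openstack-tripleo-heat-templates/validation-scripts/all-nodes.sh
```

Matching whole lines as fixed strings avoids false positives from partial matches, such as all-nodes-validation.yaml satisfying a looser grep for "validation".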
Spec file issue fixed and RPM rebuilt.
Thanks, Dan and Marios. Should network validation be turned off for virt deployments?
(In reply to wes hayutin from comment #17)
> Thanks Dan, Marios,
> Should network-validation be turned off for virt deployments?

I think so, since it was designed to validate a production network, and in virt we make compromises in the network topology that confuse the validation script. The script is still very useful for bare-metal deployments with network isolation.
To be clear, the new validations were designed to work both with and without network isolation, and they should also work in both virt and non-virt environments. This: https://review.openstack.org/#/c/204781/

----

We spoke about this on IRC today, and we think the issue with testing this downstream is actually caused by a missing revert (which has already landed upstream). We need to backport this: https://review.openstack.org/#/c/205206/
Verified with openstack-tripleo-heat-templates-0.8.6-62.el7ost.noarch.

I specified a wrong VLAN in network-environment.yaml (changed StorageMgmtNetworkVlanID: 203 to StorageMgmtNetworkVlanID: 233), started the deployment, and got the following error after ~25 minutes:

-------------------------------------------------------------------
Stack failed with status: Resource CREATE failed: Error: resources.CephStorageAllNodesValidationDeployment.resources[0]: Deployment to server failed: deploy_status_code: Deployment exited with non-zero status code: 1
ERROR: openstack Heat Stack create failed

[stack@undercloud ~]$ heat resource-list overcloud -n 5 | grep -v COMPLETE
+---------------------------------------------+--------------------------------------+---------------------------------------+--------------------+----------------------+-----------------------------------------+
| resource_name                               | physical_resource_id                 | resource_type                         | resource_status    | updated_time         | parent_resource                         |
+---------------------------------------------+--------------------------------------+---------------------------------------+--------------------+----------------------+-----------------------------------------+
| CephStorageAllNodesValidationDeployment     | 9e66cf81-c708-43ec-8bba-8ca70d03e859 | OS::Heat::StructuredDeployments       | CREATE_FAILED      | 2015-09-18T23:42:50Z |                                         |
| ComputeNodesPostDeployment                  | 611f66ad-7e57-47bf-9466-9479f1fd9524 | OS::TripleO::ComputePostDeployment    | CREATE_FAILED      | 2015-09-18T23:42:50Z |                                         |
| ControllerAllNodesValidationDeployment      | 7a3f2705-2448-4226-8cd1-6c3ae8bf56f1 | OS::Heat::StructuredDeployments       | CREATE_FAILED      | 2015-09-18T23:42:50Z |                                         |
| ControllerNodesPostDeployment               | ae3e5483-e395-4632-9bcd-92d7f291f016 | OS::TripleO::ControllerPostDeployment | CREATE_FAILED      | 2015-09-18T23:42:50Z |                                         |
| 0                                           | 8118e256-0913-4470-82f8-3d0eba7630d6 | OS::Heat::StructuredDeployment        | CREATE_FAILED      | 2015-09-18T23:54:45Z | CephStorageAllNodesValidationDeployment |
| 1                                           | e50c006c-5b2c-4385-a0ba-159e007c992c | OS::Heat::StructuredDeployment        | CREATE_FAILED      | 2015-09-18T23:55:07Z | ControllerAllNodesValidationDeployment  |
| 2                                           | 1db8dccb-7015-4529-bf96-100a6005700e | OS::Heat::StructuredDeployment        | CREATE_FAILED      | 2015-09-18T23:55:07Z | ControllerAllNodesValidationDeployment  |
| ControllerOvercloudServicesDeployment_Step6 |                                      | OS::Heat::StructuredDeployments       | CREATE_IN_PROGRESS | 2015-09-18T23:55:14Z | ControllerNodesPostDeployment           |
+---------------------------------------------+--------------------------------------+---------------------------------------+--------------------+----------------------+-----------------------------------------+
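When a validation deployment fails, the useful rows are the ones whose status contains FAILED. The hypothetical failed_resources filter below reads the table printed by heat resource-list (as in the command above) on stdin. It prints each failed resource, prefixed with its parent when the resource is nested. It assumes the six-column layout shown in this comment, with resource_status in the fourth column and parent_resource in the sixth. The helper name is an illustration, not part of the heat client.

```shell
#!/usr/bin/env bash
# failed_resources  (hypothetical helper)
# Reads `heat resource-list ... -n N` table output on stdin and prints one line
# per resource whose resource_status contains FAILED, as "parent/name" for
# nested resources or just "name" for top-level ones.
# Assumes the column order: resource_name | physical_resource_id |
# resource_type | resource_status | updated_time | parent_resource.
failed_resources() {
    awk -F'|' '
        # Splitting on the pipe puts the first column in $2, so the status is
        # $5 and the parent resource is $7; border lines contain no pipes.
        $5 ~ /FAILED/ {
            name = $2; parent = $7
            gsub(/[ \t]/, "", name)
            gsub(/[ \t]/, "", parent)
            if (parent == "") print name
            else print parent "/" name
        }'
}

# Example:
#   heat resource-list overcloud -n 5 | failed_resources
```

Applied to the table above, this would list the four top-level CREATE_FAILED resources plus CephStorageAllNodesValidationDeployment/0 and ControllerAllNodesValidationDeployment/1 and /2, which point at the individual server deployments to inspect.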
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2015:1862