Created attachment 1479225
overcloud_install.log

Description of problem:
Overcloud deployment fails with error message:

STDOUT:
Unexpected status FAILED for tripleo.deployment.v1.deploy_plan
Creating Swift container to store the plan
Creating plan from template files in: /tmp/tripleoclient-tsHcno/tripleo-heat-templates
Plan created.
Processing templates in the directory /tmp/tripleoclient-tsHcno/tripleo-heat-templates
WARNING: Following parameter(s) are deprecated and still defined. Deprecated parameters will be removed soon!
OvercloudControlFlavor
WARNING: Following parameter(s) are defined but not used in plan. Could be possible that parameter is valid but currently not used.
DockerNovaMetadataConfigImage
DockerMysqlClientConfigImage
RootStackName
Deploying templates in the directory /tmp/tripleoclient-tsHcno/tripleo-heat-templates
Initializing overcloud plan deployment
Creating overcloud Heat stack
The action raised an exception [action_ex_id=6b5c768d-6139-4661-88dc-67c81a1ca1e4, action_cls='<class 'mistral.actions.action_factory.DeployStackAction'>', attributes='{}', params='{u'skip_deploy_identifier': False, u'container': u'overcloud', u'timeout': 100}']
ERROR: Property error: : resources.Compute<nested_stack>.resources.0<https://192.168.24.2:13808/v1/AUTH_b978eda285d44c5a9781d2cee57040ec/overcloud/puppet/compute-role.yaml>.resources.NetworkConfig.properties: : Unknown Property StorageMgmtInterfaceRoutes

Version-Release number of selected component (if applicable):
OSP14, containerized UC, puddle 2018-08-28.2

How reproducible:
always

Steps to Reproduce:
1. Deploy OSP14 using InfraRed, topology 1:1:1:1
2. Step ~/overcloud_deploy.sh &> overcloud_install.log fails

Additional info:
openstack-tripleo-heat-templates.noarch 9.0.0-0.20180818200902.cb08cb1.el7ost
This is due to https://review.openstack.org/#/c/580236/ which added <network>InterfaceRoutes for all networks. The nic config template files should be updated using process_templates.py to pick up any changes that have been made for OSP-14.
(In reply to Bob Fournier from comment #2)
> This is due to https://review.openstack.org/#/c/580236/ which added
> <network>InterfaceRoutes for all networks.
>
> The nic config template files should be updated using process_templates.py
> to pick up any changes that have been made for OSP-14.

Specifically, run the following commands:

$ cd /usr/share/openstack-tripleo-heat-templates # or local copy
$ ./tools/process_templates.py -n network_data.yaml -r roles_data.yaml \
    -o /home/stack/templates_generated

Then, look in /home/stack/templates_generated/network/config/single-nic-vlans/controller.yaml. You need to copy all of the parameters from the parameters: section, and replace the parameters in the existing NIC config templates. That will ensure that the new parameters exist in the custom NIC templates for the job. The version with the new parameters should work in both 13 and 14 if needed for upgrade testing.
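To make the copy-and-replace step concrete, here is a minimal Python sketch. It assumes both templates have already been loaded into plain dicts (for example with a YAML library, which is left out here). The function name, the trimmed sample templates and their values are hypothetical and only illustrate the idea; this is not a tool shipped with tripleo-heat-templates.

```python
import copy


def replace_parameters(existing_template, generated_template):
    """Return a copy of existing_template whose 'parameters' section is
    taken from generated_template; every other section is left untouched."""
    updated = copy.deepcopy(existing_template)
    updated["parameters"] = copy.deepcopy(generated_template["parameters"])
    return updated


# Hypothetical, heavily trimmed templates represented as plain dicts.
existing = {
    "heat_template_version": "queens",
    "parameters": {"StorageIpSubnet": {"type": "string", "default": ""}},
    "resources": {"OsNetConfigImpl": {"type": "OS::Heat::SoftwareConfig"}},
}
generated = {
    "heat_template_version": "rocky",
    "parameters": {
        "StorageIpSubnet": {"type": "string", "default": ""},
        "StorageInterfaceRoutes": {"type": "json", "default": []},
    },
    "resources": {},
}

updated = replace_parameters(existing, generated)

# Names that the custom template gained from the generated one.
added = sorted(set(updated["parameters"]) - set(existing["parameters"]))
print(added)
```

Only the parameters section is swapped, so the custom resources section (the actual os-net-config layout) stays exactly as the user wrote it.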
There is an Infrared patch here that will update the nic config templates used for OSP-14 CI testing: https://review.gerrithub.io/c/redhat-openstack/infrared/+/423765. That should get us past the CI issue. However, the larger issue is how to handle upgrades to OSP-14 with existing nic config files that don't have <network>InterfaceRoutes. We can't expect users to run a script to update their nic config files to add these new parameters, so I believe it will have to be done as part of the upgrade process.
Still have problems with that issue https://rhos-qe-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/view/DF%20Current%20release/job/DFG-df-deployment-14-virthost-3cont_1comp_3ceph_3db_2net_3msg-yes_UC_SSL-no_OC_SSL-ceph-ipv4-vxlan-RHELOSP-31889/3/
(In reply to Artem Hrechanychenko from comment #5)
> Still have problems with that issue
> https://rhos-qe-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/view/DF%20Current%20release/job/DFG-df-deployment-14-virthost-3cont_1comp_3ceph_3db_2net_3msg-yes_UC_SSL-no_OC_SSL-ceph-ipv4-vxlan-RHELOSP-31889/3/

That's because the existing infrared patch only covered the nic templates for the monolithic deployments; the ones for composable roles deployments weren't adjusted yet.
> That's because the existing infrared patch only covered the nic templates for the
> monolithic deployments, the ones for composable roles deployments weren't adjusted
> yet.

Exactly. The first set was to get past the phase 1 block. I've added a second patch that includes the composable_roles and virt_5nics here [1].

[1] https://review.gerrithub.io/c/redhat-openstack/infrared/+/424071
The fix in Infrared was to add the missing parameter to the parameters section of the nic config template files. As users may have OSP-13 nic config files when they upgrade to OSP-14 we think the best option would be to document this as part of the upgrade process. We can provide a list of new parameters that must be added.
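As a rough illustration of what such a list, and a check against it, could look like, here is a minimal Python sketch. The network names are the ones that show up in the verification output later in this report; a deployment with custom networks would have a different list. The function name and the sample template text are hypothetical, and the check is a plain-text match on YAML keys rather than a real YAML parse.

```python
# Network names as seen in the verification output of this bug report.
NETWORKS = ["Storage", "StorageMgmt", "InternalApi", "Tenant", "External", "Management"]


def missing_route_parameters(template_text, networks=NETWORKS):
    """Return the <network>InterfaceRoutes names that have no matching
    'Name:' key anywhere in the given template text."""
    keys = {line.strip() for line in template_text.splitlines()}
    missing = []
    for network in networks:
        name = f"{network}InterfaceRoutes"
        # A parameter definition shows up as a line consisting of 'Name:'.
        if f"{name}:" not in keys:
            missing.append(name)
    return missing


# Hypothetical fragment of a custom NIC config template.
sample = """\
parameters:
  StorageIpSubnet:
    default: ''
    type: string
  StorageInterfaceRoutes:
    default: []
    type: json
"""

for name in missing_route_parameters(sample):
    print(name)
```

Run against each custom NIC template before an upgrade, something along these lines would tell the operator which parameters still need to be added.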
(In reply to Bob Fournier from comment #8)
> The fix in Infrared was to add the missing parameter to the parameters
> section of the nic config template files. As users may have OSP-13 nic
> config files when they upgrade to OSP-14 we think the best option would be
> to document this as part of the upgrade process. We can provide a list of
> new parameters that must be added.

Thanks Bob, much appreciated. Indeed, adjusting nic templates should be a prerequisite for 13->14 upgrades.
Adding requires docs text to this one so it gets written up correctly.
I don't see the benefit of adding these to OSP13z.

Most users keep their NIC templates outside of the THT tree, i.e. the resource_registry overrides will look something like:

resource_registry:
  OS::TripleO::BlockStorage::Net::SoftwareConfig: /home/stack/templates/nic-config/cinder-storage.yaml
  OS::TripleO::Compute::Net::SoftwareConfig: /home/stack/templates/nic-config/compute.yaml
  OS::TripleO::Controller::Net::SoftwareConfig: /home/stack/templates/nic-config/controller.yaml
  OS::TripleO::ObjectStorage::Net::SoftwareConfig: /home/stack/templates/nic-config/swift-storage.yaml
  OS::TripleO::CephStorage::Net::SoftwareConfig: /home/stack/templates/nic-config/ceph-storage.yaml

So even if we were to add the parameters to the OSP-13 THT, without actually using them in the resources section, the end users would still have to update their NIC templates manually, because we cannot modify the users' custom NIC templates.

Any users using the jinja rendered NIC templates in THT/network/config/*/*.yaml will get the new parameters included automatically. (But these are sample configurations that are only useful for a handful of configurations, so only a limited set of users will benefit from this.)

IMO the way Heat validates[1] this could be relaxed. Currently any parameter passed to a template causes the validation to fail if that parameter is not defined in the template. In our case here it would be beneficial if a relaxed validation mode was available that would

a) Check if the parameter passed is missing from the parameters section of the stack resource.
b) Check if the parameter is used anywhere in the resources section of the stack template.

if a and b:
    FAIL the validation.
elif a and not b:
    WARNING on validation.

By warning only when a parameter is missing but not used in any resource, existing users wouldn't have to update their NIC templates before upgrading.

[1] https://github.com/openstack/heat/blob/master/heat/engine/properties.py#L408-L413
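The relaxed mode proposed above could be sketched in Python roughly as follows. This is not Heat's code and does not use any Heat API; the function name and the sets of names are hypothetical. It reads the proposal as: a parameter that is passed to a template but not defined there fails validation only if the resources section references it, and otherwise produces a warning.

```python
def check_passed_parameter(name, defined, used):
    """Classify a parameter that the parent stack passes to a template.

    defined: names declared in the template's parameters section
    used:    names referenced anywhere in the template's resources section
    """
    if name in defined:
        return "ok"    # declared in the template, nothing to report
    if name in used:
        return "fail"  # referenced by a resource but never declared
    return "warn"      # passed in, not declared and not referenced


# Hypothetical OSP-13 era NIC template: no InterfaceRoutes parameters.
defined = {"ControlPlaneIp", "StorageIpSubnet"}
used = {"ControlPlaneIp", "StorageIpSubnet"}

for name in ("StorageIpSubnet", "StorageMgmtInterfaceRoutes"):
    print(name, check_passed_parameter(name, defined, used))
```

Under this reading, the StorageMgmtInterfaceRoutes property from the error in this report would only produce a warning for an old template, because nothing in that template references it.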
(In reply to Harald Jensås from comment #12)
> By warning only when a parameter is missing, but not used in any resource
> the existing user wouldn't have to update their NIC templates before
> upgrading.

+1 to turning the error into a warning if the parameters are not in use.

If we cannot do that for OSP14 then I think at least a utility script to add these parameters to the existing nic templates is required (similar to yaml-nic-config-2-script.py that we used for FFU). We really don't want users to make this kind of change manually. Any missed indentation space could potentially lead to broken networking later during the upgrade process.
(In reply to Marius Cornea from comment #15)
> +1 to turning the error into a warning if the parameters are not in use.

Relaxing the validations would come with a cost: we currently fail validation on any typo in a parameter name. If we relax the validations, such errors would pass undetected.

I am -1 to my initial proposal of relaxing the validations. The above scenario could very well break networking as well, and potentially render the node unreachable, with the only recovery option being a manual reboot to single-user mode/rescue mode.

> If we cannot do that for OSP14 then I think at least a utility script to add
> these parameters to the existing nic templates is required(similar to
> yaml-nic-config-2-script.py that we used for ffu). We really don't want
> users to do this kind of changes manually. Any missed indentation space
> could potentially lead to broken networking later during the upgrade process.

I am working on a utility script.
*** Bug 1643170 has been marked as a duplicate of this bug. ***
Verified.

- The conversion script is working, as reported in: https://bugzilla.redhat.com/show_bug.cgi?id=1623007#c18
- The following describes the deployment validation:

Tested by conducting 2 deployments, before [1] and after [2] the patch https://review.gerrithub.io/c/redhat-openstack/infrared/+/424071

#All stack_env and sosreports files are at:
http://rhos-release.virt.bos.redhat.com/log/pkomarov_sosreports/BZ1623007/

(undercloud) [stack@undercloud-0 ~]$ cat core_puddle_version
2018-11-07.2
(undercloud) [stack@undercloud-0 ~]$ rhos-release -L
Installed repositories (rhel-7.6):
14
ceph-3
ceph-osd-3
rhel-7.6
(undercloud) [stack@undercloud-0 ~]$ rpm -qa|grep openstack-tripleo-heat-templates
openstack-tripleo-heat-templates-9.0.1-0.20181013060873.el7ost.noarch

#[1] Before
(undercloud) [stack@undercloud-0 ~]$ cat overcloud_install.log
Unexpected status FAILED for tripleo.deployment.v1.deploy_plan
Creating Swift container to store the plan
Creating plan from template files in: /tmp/tripleoclient-Uotgk_/tripleo-heat-templates
Plan created.
Processing templates in the directory /tmp/tripleoclient-Uotgk_/tripleo-heat-templates
WARNING: Following parameter(s) are deprecated and still defined. Deprecated parameters will be removed soon!
OvercloudControlFlavor
WARNING: Following parameter(s) are defined but not used in plan. Could be possible that parameter is valid but currently not used.
CephStorageHostnameFormat
CephAnsiblePlaybookVerbosity
ComputeHostnameFormat
ObjectStorageHostnameFormat
RootStackName
Deploying templates in the directory /tmp/tripleoclient-Uotgk_/tripleo-heat-templates
Initializing overcloud plan deployment
Creating overcloud Heat stack
The action raised an exception [action_ex_id=3c628c17-3db4-46a8-882d-5d82ef842d39, action_cls='<class 'mistral.actions.action_factory.DeployStackAction'>', attributes='{}', params='{u'skip_deploy_identifier': False, u'container': u'overcloud', u'timeout': 100}']
ERROR: Property error: : resources.ComputeInstanceHA<nested_stack>.resources.0<https://192.168.24.2:13808/v1/AUTH_3f486623e48840fea93bf625630caea6/overcloud/puppet/computeinstanceha-role.yaml>.resources.NetworkConfig.properties: : Unknown Property StorageMgmtInterfaceRoutes

(undercloud) [stack@undercloud-0 ~]$ grep 'route\|InterfaceRoutes' virt/network/three-nics-vlans/legacy/controller.yaml
description: default route for the external network
routes:
# Optionally have this interface as default route
routes:
(undercloud) [stack@undercloud-0 ~]$ grep 'route\|InterfaceRoutes' virt/network/three-nics-vlans/legacy/compute.yaml
routes:
# Optionally have this interface as default route

#[2] After
##using post infrared patch templates:
(undercloud) [stack@undercloud-0 ~]$ grep 'route\|InterfaceRoutes' virt/network/three-nics-vlans/controller.yaml
description: default route for the external network
StorageInterfaceRoutes:
JSON route e.g. [{'destination':'10.0.0.0/16', 'nexthop':'10.0.0.1'}]
from the subnet host_routes attribute.
StorageMgmtInterfaceRoutes:
JSON route e.g. [{'destination':'10.0.0.0/16', 'nexthop':'10.0.0.1'}]
from the subnet host_routes attribute.
InternalApiInterfaceRoutes:
JSON route e.g. [{'destination':'10.0.0.0/16', 'nexthop':'10.0.0.1'}]
from the subnet host_routes attribute.
TenantInterfaceRoutes:
JSON route e.g. [{'destination':'10.0.0.0/16', 'nexthop':'10.0.0.1'}]
from the subnet host_routes attribute.
ExternalInterfaceRoutes:
JSON route e.g. [{'destination':'10.0.0.0/16', 'nexthop':'10.0.0.1'}]
from the subnet host_routes attribute.
ManagementInterfaceRoutes:
JSON route e.g. [{'destination':'10.0.0.0/16', 'nexthop':'10.0.0.1'}]
from the subnet host_routes attribute.
routes:
# Optionally have this interface as default route
routes:
(undercloud) [stack@undercloud-0 ~]$ grep 'route\|InterfaceRoutes' virt/network/three-nics-vlans/compute.yaml
StorageInterfaceRoutes:
JSON route e.g. [{'destination':'10.0.0.0/16', 'nexthop':'10.0.0.1'}]
from the subnet host_routes attribute.
StorageMgmtInterfaceRoutes:
JSON route e.g. [{'destination':'10.0.0.0/16', 'nexthop':'10.0.0.1'}]
from the subnet host_routes attribute.
InternalApiInterfaceRoutes:
JSON route e.g. [{'destination':'10.0.0.0/16', 'nexthop':'10.0.0.1'}]
from the subnet host_routes attribute.
TenantInterfaceRoutes:
JSON route e.g. [{'destination':'10.0.0.0/16', 'nexthop':'10.0.0.1'}]
from the subnet host_routes attribute.
ExternalInterfaceRoutes:
JSON route e.g. [{'destination':'10.0.0.0/16', 'nexthop':'10.0.0.1'}]
from the subnet host_routes attribute.
ManagementInterfaceRoutes:
JSON route e.g. [{'destination':'10.0.0.0/16', 'nexthop':'10.0.0.1'}]
from the subnet host_routes attribute.
routes:
# Optionally have this interface as default route

#during heat-stack config creation the routes are correctly picked up by heat:
http://pastebin.test.redhat.com/667430

#overcloud deployment is successful:
PLAY [External deployment Post Deploy tasks] ***********************************

PLAY RECAP *********************************************************************
controller-0 : ok=218 changed=95 unreachable=0 failed=0
controller-1 : ok=212 changed=94 unreachable=0 failed=0
controller-2 : ok=212 changed=94 unreachable=0 failed=0
overcloud-novacomputeiha-0 : ok=172 changed=73 unreachable=0 failed=0
overcloud-novacomputeiha-1 : ok=171 changed=73 unreachable=0 failed=0
undercloud : ok=10 changed=7 unreachable=0 failed=0

Thursday 08 November 2018 07:38:40 -0500 (0:00:00.357) 1:05:02.621 *****
===============================================================================
Ansible passed.
Overcloud configuration completed.
Overcloud Endpoint: http://10.0.0.107:5000
Overcloud Horizon Dashboard URL: http://10.0.0.107:80/dashboard
Overcloud rc file: /home/stack/overcloudrc
Overcloud Deployed

#testing pacemaker cluster and overcloud services are deployed and healthy:
http://pastebin.test.redhat.com/667453
I would argue that this doctext is too narrow in scope. This wasn't just an infrared problem; it was a problem for anyone with custom nic-configs from a previous version of director.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2019:0045