Description of problem:

Deployment fails when custom network names are used, for example:

  StorageMgmtNetName: customstgmgmtname
  InternalApiNetName: custominternalapiname

Version-Release number of selected component (if applicable):
2018-05-04.1

How reproducible:
100%

Steps to Reproduce:
1. Create a yaml file with the following lines:

   parameter_defaults:
     StorageMgmtNetName: customstgmgmtname
     InternalApiNetName: custominternalapiname

2. Use this yaml with the overcloud_deploy.sh script.

Actual results:
Deployment fails with a very long list of errors:

  overcloud.AllNodesDeploySteps.ControllerDeployment_Step1.1:
    resource_type: OS::Heat::StructuredDeployment
    physical_resource_id: 812f55a2-4e62-4330-b444-03f8babb83b1
    status: CREATE_FAILED
    status_reason: |
      Error: resources[1]: Deployment to server failed: deploy_status_code : Deployment exited with non-zero status code: 2
    deploy_stdout: |
      <...>
      "2018-05-07 19:03:53,339 ERROR: 22970 -- ERROR configuring haproxy"

Expected results:
CREATE_COMPLETE

Additional info:
The initial bug was opened for RHOS10.
*** This bug has been marked as a duplicate of bug 1564654 ***
Sorry, it's not a dupe.

  "Error: /Stage[main]/Haproxy/Haproxy::Instance[haproxy]/Haproxy::Config[haproxy]/Concat[/etc/haproxy/haproxy.cfg]/File[/etc/haproxy/haproxy.cfg]/content:
  change from {md5}1f337186b0e1ba5ee82760cb437fb810 to {md5}685eb1bcc74004d40c46ef85d025553e failed:
  Execution of '/usr/sbin/haproxy -f /etc/haproxy/haproxy.cfg20180507-12-r8yetb -c' returned 1:
  [ALERT] 126/190309 (349) : parsing [/etc/haproxy/haproxy.cfg20180507-12-r8yetb:37] : 'server 172.17.1.17:8042' : invalid address: 'check' in 'check'"
Alex and I have dug into this a bit. Before composable networks, we could rename a network by changing these two parameters, but in OSP13 it requires more steps that need to be documented, since we are now using jinja templates for network configuration:

https://github.com/openstack/tripleo-heat-templates/blob/e64c10b9c13188f37e6f122475fe02280eaa6686/puppet/all-nodes-config.j2.yaml#L180

This can be considered a backward-incompatible change in the way network names are managed, but as long as it is well documented and tested by QE, I think it is acceptable. Example of a file that also needs to be updated:

https://github.com/openstack/tripleo-heat-templates/blob/e64c10b9c13188f37e6f122475fe02280eaa6686/network_data.yaml#L67

In any case, we decided to assign this bug to DFG:HardProv and start a documentation change describing how we expect our users to rename a network (listing the files that need to be updated).
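For reference, a network entry in network_data.yaml looks roughly like this (a sketch of the template format; the subnet and pool values are illustrative, not taken from this deployment):

```yaml
# One entry per composable network; renaming touches name_lower
- name: InternalApi
  name_lower: internal_api          # the value the jinja templates key off
  vip: true
  ip_subnet: '172.16.2.0/24'
  allocation_pools: [{'start': '172.16.2.4', 'end': '172.16.2.250'}]
```

The jinja-rendered templates derive hieradata keys, VIP names, and ServiceNetMap entries from name_lower, which is why changing only the *NetName parameters is no longer sufficient.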
There may be a couple of issues. If it's the setting of hieradata in all_nodes_config.j2.yaml per comment 6, Harald has a patch upstream for that: https://review.openstack.org/#/c/569544/. However, those hardcoded hieradata values existed even in OSP-10 (https://github.com/openstack/tripleo-heat-templates/blob/stable/newton/puppet/all-nodes-config.yaml#L176), so I fail to see how this test would have worked then or since. It's better with the jinja changes, as all the hardcoded hieradata can be removed except the InternalApi entry needed for Contrail. If it's that the names in network_data.yaml need to be updated, that can be handled via docs. Note that the original fix was for https://bugs.launchpad.net/tripleo/+bug/1651541, which was prior to the jinja changes for composable networks. Attempting to duplicate this.
Alex - can you put the sosreports in a different place that is accessible? As before, I'm having issues pulling these down from Google Drive; I get multiple redirect errors. Thanks.
OK, so I discussed this with Bob, and I think there are several problems:

1. Since implementing composable networks, there are two ways to rename a network: network_data.yaml and the *NetName parameters. This presents a problem because it's not always easy to know which one to use (I guess the *NetName parameters should always take precedence, but it's fairly confusing, and IMHO we should consider deprecating/removing these parameters).

2. The ServiceNetMap and NetIpMap were updated in an attempt to provide backwards compatibility for the *NetName parameters (ref https://review.openstack.org/#/q/topic:bug/1651541), but https://review.openstack.org/#/c/531036/ added the hieradata for the per-network VIPs using only the network_data names. I suspect this is why the haproxy config is failing.

I think we have a workaround, which is to update network_data.yaml instead of *NetName (one disadvantage is that I don't think this will work via the UI). In terms of a fix, if we can prove it's the VIP hieradata that is the problem, we could update those key names using an approach similar to my previous patches for ServiceNetMap and NetIpMap. But I do think we should consider deprecating *NetName for Rocky, as the logic required to maintain them is pretty ugly, and it's confusing to have two interfaces that do the same thing.
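To illustrate the two overlapping interfaces described above (the custom name is the one from the bug reproducer; the file placement is a sketch, not a prescribed layout):

```yaml
# Interface 1: the *NetName parameter, in an environment file passed with -e.
# This is the pre-composable-networks mechanism that now breaks the VIP hieradata.
parameter_defaults:
  InternalApiNetName: custominternalapiname

# Interface 2: network_data.yaml, passed with -n, changing name_lower
# on the network entry. This is what the jinja templates actually consume.
- name: InternalApi
  name_lower: custominternalapiname
```

Interface 2 is the workaround verified later in this bug.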
Thanks, Steven. I was able to verify the workaround, at least for the internal_api custom naming, using network_data.yaml as follows:

1. I copied network_data.yaml locally and updated name_lower for InternalApi:

     name_lower: custominternalapiname

   then used that network_data.yaml in the deployment.

2. I made a local parameter for ServiceNetMap in network_environment.yaml and changed all uses of "internal_api" to "custominternalapiname".

Note - I believe ServiceNetMap had to be manually updated in this case because name_lower is the new custom name, and this replacement code no longer matches it: https://github.com/openstack/tripleo-heat-templates/blob/master/network/service_net_map.j2.yaml#L130

This workaround worked fine, and the deployment completed with the custom InternalApi name.

There seems to be an issue using a custom StorageMgmt name, because the VipPort name is hardcoded for StorageMgmt and can't be substituted like the other networks: https://github.com/openstack/tripleo-heat-templates/blob/master/overcloud.j2.yaml#L803. I will add a patch for this.

Currently I'm trying to get the deployment to work with the *NetName substitution by changing access to "name_lower" to instead access "{get_param: {{network.name}}NetName}]}". However, the jinja templating makes extensive use of "network_lower", so I'm not sure supporting both methods in OSP-13z will be possible. If not, I'd recommend we document the name_lower substitution in network_data.yaml, fix the StorageMgmt VipPort issue as above, and also fix the ServiceNetMap substitution to handle an updated name_lower so we don't need a local copy of ServiceNetMap.
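A minimal sketch of the ServiceNetMap override described in step 2. The two service keys shown are a hypothetical subset for illustration; a real override must list every *Network key in ServiceNetMap that previously mapped to internal_api:

```yaml
parameter_defaults:
  ServiceNetMap:
    KeystoneAdminApiNetwork: custominternalapiname   # was internal_api
    NovaApiNetwork: custominternalapiname            # was internal_api
    # ...and so on for each remaining service key that used internal_api
```

The follow-on fix to the service_net_map.j2.yaml substitution (mentioned above) is intended to make this manual override unnecessary.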
Created attachment 1449234 [details] network_data.yaml file for workaround This must be included in deployment using "-n network_data.yaml"
Created attachment 1449235 [details] network-environment.yaml with changes to ServiceNetMap
I've verified the workaround using network_data.yaml and included the files that can be used. The network_data.yaml has the custom names in name_lower for the networks used in the bug description - StorageMgmt and InternalApi. This is the recommended way of implementing custom network names going forward. The network-environment.yaml sets up ServiceNetMap to use these custom names for the particular services. A follow-on fix in 13z release will change it so that modifications to ServiceNetMap are not needed.
Inadvertently closed; reopening to leave open for additional fixes in OSP-13z.
Thanks, Bob. Adding doc-text / release-note flags to address the following concern from Dan's side: "We should probably include a note in the upgrade instructions regardless, just in case anyone does have these parameters in place. I suspect that any attempt to upgrade without making changes to network_data.yaml would result in an error, rather than blowing up the whole stack, but it would still require manual effort to fix."
With this fix, in order to change the network name being used, network_data.yaml must be edited and included in the deployment. For example, to change the name of the InternalApi network, make these changes in network_data.yaml:

  - name: InternalApi                       <- no change to this line
    name_lower: internal_custom             <- new name of network
    service_net_map_replace: internal_api   <- new line; must match the old name_lower

After deployment, this would result in:

(undercloud) [stack@host01 ~]$ openstack network list -c Name
+------------------+
| Name             |
+------------------+
| storage          |
| management       |
| tenant           |
| internal_custom  |   <- new name
| ctlplane         |
| storage_mgmt     |
| external         |
+------------------+

Note also, the workaround as described in comment 14 will have the same effect. It can be used prior to this fix being available (this fix will be in OSP-13z2).
Deployed OSP13 puddle: 2018-08-03.3

This is verification of the ability to change to a custom network name prior to deploying the overcloud. This is not a change on an active deployment.

Environment:
openstack-tripleo-heat-templates-8.0.4-10.el7ost.noarch

1) Updated network_data.yaml (there is other data on all the networks; this shows one change). Example from /usr/share/openstack-tripleo-heat-templates/network_data.yaml:

  - name: StorageMgmt
    name_lower: custom_storage_mgmt
    service_net_map_replace: storage_mgmt

2) Add a "-n /home/stack/network_data.yaml" line to your deployment configuration. Example below:

(undercloud) [stack@undercloud-0 ~]$ cat overcloud_deploy.sh
#!/bin/bash
openstack overcloud deploy \
  --timeout 100 \
  --templates /usr/share/openstack-tripleo-heat-templates \
  --stack overcloud \
  --libvirt-type kvm \
  --ntp-server clock.redhat.com \
  -e /home/stack/virt/internal.yaml \
  -e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml \
  -n /home/stack/network_data.yaml \
  -e /home/stack/virt/network/network-environment.yaml \
  -e /home/stack/virt/hostnames.yml \
  -e /usr/share/openstack-tripleo-heat-templates/environments/ceph-ansible/ceph-ansible.yaml \
  -e /home/stack/virt/nodes_data.yaml \
  -e /home/stack/virt/extra_templates.yaml \
  -e /home/stack/virt/docker-images.yaml \
  --log-file overcloud_deployment_68.log

2018-08-09 21:46:27Z [overcloud]: CREATE_COMPLETE  Stack CREATE completed successfully

 Stack overcloud CREATE_COMPLETE

Host 10.0.0.107 not found in /home/stack/.ssh/known_hosts
Started Mistral Workflow tripleo.deployment.v1.get_horizon_url.
Execution ID: 88add242-5bae-4210-834d-1dba3b52fe2c
Overcloud Endpoint: http://10.0.0.107:5000/
Overcloud Horizon Dashboard URL: http://10.0.0.107:80/dashboard
Overcloud rc file: /home/stack/overcloudrc
Overcloud Deployed

(undercloud) [stack@undercloud-0 ~]$ cat network_data.yaml

(undercloud) [stack@undercloud-0 ~]$ openstack server list
+--------------------------------------+--------------+--------+------------------------+----------------+------------+
| ID                                   | Name         | Status | Networks               | Image          | Flavor     |
+--------------------------------------+--------------+--------+------------------------+----------------+------------+
| 0ade3f2c-e7ca-4f9d-ac3a-6c7c81c56804 | controller-0 | ACTIVE | ctlplane=192.168.24.9  | overcloud-full | controller |
| 6fa7a118-136d-4c5d-99b6-01d9b9e2b1b9 | controller-2 | ACTIVE | ctlplane=192.168.24.14 | overcloud-full | controller |
| c91bb0a9-06b7-4e40-9bfa-cf960d852e94 | ceph-0       | ACTIVE | ctlplane=192.168.24.10 | overcloud-full | ceph       |
| 23561ba4-7cb3-4479-bf95-3d888db05ad7 | controller-1 | ACTIVE | ctlplane=192.168.24.8  | overcloud-full | controller |
| 9a4fa664-f1cf-4435-9df1-3a2ca2291502 | compute-2    | ACTIVE | ctlplane=192.168.24.17 | overcloud-full | compute    |
| c61e94b1-2a0a-4562-ac58-098d3d430db1 | ceph-2       | ACTIVE | ctlplane=192.168.24.15 | overcloud-full | ceph       |
| 403ade27-428e-4564-b07a-fdafe59a29e6 | ceph-1       | ACTIVE | ctlplane=192.168.24.18 | overcloud-full | ceph       |
| 9cdee3bd-2c62-485c-94b4-c06c8ee3a3c7 | compute-1    | ACTIVE | ctlplane=192.168.24.12 | overcloud-full | compute    |
| 8756510a-df95-47ac-a662-b63fed394e99 | compute-0    | ACTIVE | ctlplane=192.168.24.6  | overcloud-full | compute    |
+--------------------------------------+--------------+--------+------------------------+----------------+------------+

(undercloud) [stack@undercloud-0 ~]$ openstack network list -c Name
+---------------------+
| Name                |
+---------------------+
| custom_storage_mgmt |   <----- successfully changed
| external            |
| management          |
| internal_api        |
| ctlplane            |
| storage             |
| tenant              |
+---------------------+
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2018:2574