Bug 1269285
Summary: | [Docs][Director] Overcloud Updates failed with network isolation | |
---|---|---|---
Product: | Red Hat OpenStack | Reporter: | mathieu bultel <mbultel>
Component: | documentation | Assignee: | RHOS Documentation Team <rhos-docs>
Status: | CLOSED CURRENTRELEASE | QA Contact: | RHOS Documentation Team <rhos-docs>
Severity: | unspecified | Docs Contact: |
Priority: | urgent | |
Version: | 7.0 (Kilo) | CC: | arkady_kanevsky, augol, gchenuet, gfidente, hbrock, jcoufal, jslagle, lbopf, mbultel, mburns, mcornea, ohochman, randy_perryman, rhel-osp-director-maint, rybrown, srevivo, wayne_allen, whayutin, zbitter
Target Milestone: | ga | Keywords: | Automation, Documentation, Triaged
Target Release: | 8.0 (Liberty) | |
Hardware: | Unspecified | |
OS: | Unspecified | |
Whiteboard: | | |
Fixed In Version: | | Doc Type: | Bug Fix
Doc Text: | | Story Points: | ---
Clone Of: | | |
: | 1318397 (view as bug list) | Environment: |
Last Closed: | 2016-11-18 22:21:46 UTC | Type: | Bug
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Bug Depends On: | 1276204 | |
Bug Blocks: | | |
Attachments: | | |
Description (mathieu bultel, 2015-10-06 21:08:18 UTC)
It sounds very much like we're modifying one of the subnets in such a way as to cause it to be replaced, which is something we need to not do on upgrade. Can you please note in a comment what the pre-update subnet was and what you're changing it to?

I was able to reproduce this bug, and I've attached the new and old revisions of the templates for the GA and the latest puddle. The stack failed with status:

resources.Networks: resources.TenantNetwork: Conflict: resources.TenantSubnet: Unable to complete operation on subnet 50a7023c-43ce-42dd-aefe-04612f281f58. One or more ports have an IP allocation from this subnet.

The difference in the network templates is here: http://pastebin.test.redhat.com/320385 because in the older version it wasn't necessary to include the provisioning NIC in the configuration.

Created attachment 1083398 [details]: OSP 7 GA templates
Created attachment 1083399 [details]: netconfigs for puddle deploy
Created attachment 1083401 [details]: netconfigs for GA
Created attachment 1083402 [details]: templates from latest puddle
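When the update fails with this status, the failing nested resources can usually be located from the undercloud. The following is only a minimal sketch of one way to do that, assuming the stack is named "overcloud" and the standard ~/stackrc credentials file; it is not taken from the bug report or its attachments:

# Sketch: list failed resources across the nested overcloud stacks
source ~/stackrc
heat resource-list -n 5 overcloud | grep -i failed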
So far I've been able to reproduce, but only when running a deploy like:

openstack --debug overcloud deploy --templates -e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml -e /home/stack/templates/network-environment.yaml --ntp-server clock.redhat.com -e /usr/share/openstack-tripleo-heat-templates/overcloud-resource-registry.yaml

This includes overcloud-resource-registry.yaml *after* network-isolation.yaml, which overrides the subnet information and networks back to no-ops, causing Heat to try to delete the network. I avoided this problem by changing my environment file order, as such:

openstack --debug overcloud deploy --templates -e /usr/share/openstack-tripleo-heat-templates/overcloud-resource-registry.yaml -e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml -e /home/stack/templates/network-environment.yaml --ntp-server clock.redhat.com

Note that network-isolation.yaml and network-environment.yaml are last. This may (partially) be a docs fix, but we may be able to do something in code instead.

After further testing, I don't think we can make any changes to fix this. Perhaps the best solution would be some additional upgrade documentation about ensuring the *order* of all environment files is identical.

Dan, we need to make sure we add docs that specify the order of the environment files must be identical to the previous deployment.

I have tried to update my deployment with this command:

openstack overcloud update stack -i overcloud --debug --templates -e /usr/share/openstack-tripleo-heat-templates/overcloud-resource-registry-puppet.yaml -e /usr/share/openstack-tripleo-heat-templates/environments/net-single-nic-with-vlans.yaml -e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml -e /home/stack/network-environment.yaml -e /home/stack/upgrade.yml

So the net-iso templates are at the end of the command line, but it seems that these templates are ignored by the update. The update still tried to override the subnets and failed because they can't be deleted while there are ports associated with them. Still trying to figure out this issue...

Hi Mathieu, can you attach the contents of ~/network-environment.yaml and ~/upgrade.yaml?
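Since the eventual fix here is documentation about keeping the environment file order identical between the initial deploy and later updates, one way to make that repeatable is to record the order once and reuse it. The snippet below is only a hypothetical helper (the file ~/deploy-env-order.txt and the exact path list are illustrative, not from the bug report):

# Hypothetical helper: record the environment file order used at deploy time
cat > ~/deploy-env-order.txt <<'EOF'
/usr/share/openstack-tripleo-heat-templates/overcloud-resource-registry-puppet.yaml
/usr/share/openstack-tripleo-heat-templates/environments/net-single-nic-with-vlans.yaml
/usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml
/home/stack/network-environment.yaml
EOF

# Reuse the same order (prefixing each path with -e) for the update command
ENV_ARGS=$(sed 's/^/-e /' ~/deploy-env-order.txt | tr '\n' ' ')
openstack overcloud update stack -i overcloud --templates $ENV_ARGS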
The network-isolation.yaml:

parameter_defaults:
  InternalApiNetCidr: 172.16.20.0/24
  StorageNetCidr: 172.16.21.0/24
  TenantNetCidr: 172.16.22.0/24
  ExternalNetCidr: 172.16.23.0/24
  InternalApiAllocationPools: [{'start': '172.16.20.10', 'end': '172.16.20.100'}]
  StorageAllocationPools: [{'start': '172.16.21.10', 'end': '172.16.21.100'}]
  TenantAllocationPools: [{'start': '172.16.22.10', 'end': '172.16.22.100'}]
  ExternalAllocationPools: [{'start': '172.16.23.10', 'end': '172.16.23.100'}]
  ExternalInterfaceDefaultRoute: 172.16.23.251
  NeutronExternalNetworkBridge: "''"
  ControlPlaneSubnetCidr: "24"
  ControlPlaneDefaultRoute: 192.0.2.1
  EC2MetadataIp: 192.0.2.1

And the upgrade.yml:

parameters:
  ServiceNetMap:
    NeutronTenantNetwork: tenant
    CeilometerApiNetwork: internal_api
    MongoDbNetwork: internal_api
    CinderApiNetwork: internal_api
    CinderIscsiNetwork: storage
    GlanceApiNetwork: storage
    GlanceRegistryNetwork: internal_api
    KeystoneAdminApiNetwork: internal_api
    KeystonePublicApiNetwork: internal_api
    NeutronApiNetwork: internal_api
    HeatApiNetwork: internal_api
    NovaApiNetwork: internal_api
    NovaMetadataNetwork: internal_api
    NovaVncProxyNetwork: internal_api
    SwiftMgmtNetwork: storage_mgmt
    SwiftProxyNetwork: storage
    HorizonNetwork: internal_api
    MemcachedNetwork: internal_api
    RabbitMqNetwork: internal_api
    RedisNetwork: internal_api
    MysqlNetwork: internal_api
    CephClusterNetwork: storage_mgmt
    CephPublicNetwork: storage
    ControllerHostnameResolveNetwork: internal_api
    ComputeHostnameResolveNetwork: internal_api
    BlockStorageHostnameResolveNetwork: internal_api
    ObjectStorageHostnameResolveNetwork: internal_api
    CephStorageHostnameResolveNetwork: storage

resource_registry:
  OS::TripleO::Network::Ports::NetVipMap: /usr/share/openstack-tripleo-heat-templates/network/ports/net_vip_map_external.yaml
  OS::TripleO::Network::Ports::CtlplaneVipPort: /usr/share/openstack-tripleo-heat-templates/network/ports/noop.yaml
  OS::TripleO::Network::Ports::ExternalVipPort: /usr/share/openstack-tripleo-heat-templates/network/ports/noop.yaml
  OS::TripleO::Network::Ports::InternalApiVipPort: /usr/share/openstack-tripleo-heat-templates/network/ports/noop.yaml
  OS::TripleO::Network::Ports::StorageVipPort: /usr/share/openstack-tripleo-heat-templates/network/ports/noop.yaml
  OS::TripleO::Network::Ports::StorageMgmtVipPort: /usr/share/openstack-tripleo-heat-templates/network/ports/noop.yaml
  OS::TripleO::Network::Ports::TenantVipPort: /usr/share/openstack-tripleo-heat-templates/network/ports/noop.yaml
  OS::TripleO::Network::Ports::RedisVipPort: /usr/share/openstack-tripleo-heat-templates/network/ports/from_service.yaml

parameter_defaults:
  ControlPlaneIP: {{ control_virtual_ip.stdout }}
  ExternalNetworkVip: {{ public_virtual_ip.stdout }}
  InternalApiNetworkVip: {{ internal_api_virtual_ip.stdout }}
  StorageNetworkVip: {{ storage_virtual_ip.stdout }}
  StorageMgmtNetworkVip: {{ storage_management_virtual_ip.stdout }}
  ServiceVips:
    redis: {{ redis_virtual_ip.stdout }}

Note that the {{ redis_virtual_ip.stdout }} value corresponds to the IP shown in the output of neutron port-list. I'm redeploying 2 new environments (the previous one was broken).

So with this custom network-isolation configuration, Heat tries to apply the default network configuration: it tries to remove the subnets on the current deployment and fails because there are ports attached to them.

Mathieu, I think we found the culprit.
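For reference, the ports that block the subnet deletion can be listed directly. This is only a sketch, assuming the isolation networks live in the undercloud's Neutron (where the TripleO network templates create them), the standard ~/stackrc credentials file, and the subnet ID from the error message earlier in this bug:

# Sketch: show which ports still hold an IP allocation from the tenant subnet
source ~/stackrc
neutron port-list | grep 50a7023c-43ce-42dd-aefe-04612f281f58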
To update the subnet allocation pools without deleting/recreating the subnet we need to land: https://bugzilla.redhat.com/show_bug.cgi?id=1276204

Mathieu, I just realized bz #1276204 is in MODIFIED state; can you check which version of openstack-heat is installed on the undercloud before updating? Also, has the ~/network-environment.yaml changed between the initial deployment and the update attempt?

No, the ~/network-environment.yaml that I've pasted you is the initial file and the one I used for the update. Do I need to change something there?

From the error message you posted it seems that Heat is trying to delete/recreate the subnet, which is what it would do before the fix for bz #1276204 if you were to update the allocation pools; can you check if this reproduces with openstack-heat-2015.1.1-8.el7ost?

It seems that it reproduces with openstack-heat-2015.1.1-8.el7ost. What I have updated:

sudo yum localinstall -y openstack-heat-api-2015.1.1-8.el7ost.noarch.rpm openstack-heat-api-cfn-2015.1.1-8.el7ost.noarch.rpm openstack-heat-api-cloudwatch-2015.1.1-8.el7ost.noarch.rpm openstack-heat-common-2015.1.1-8.el7ost.noarch.rpm openstack-heat-engine-2015.1.1-8.el7ost.noarch.rpm

Still the same with the latest puddle: openstack-heat version 1-2.1. Can you please retest this?
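Before retesting, it is worth confirming which openstack-heat packages are actually installed on the undercloud; a minimal sketch of that check (the version string to compare against is the one mentioned above, openstack-heat-2015.1.1-8.el7ost):

# List the installed openstack-heat packages on the undercloud
rpm -qa | grep openstack-heat
# Restarting the Heat services after a package update is likely needed
# (an assumption here, not something stated in this bug).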