Bug 1269285
| Summary: | [Docs][Director] Overcloud Updates failed with network isolation | | |
|---|---|---|---|
| Product: | Red Hat OpenStack | Reporter: | mathieu bultel <mbultel> |
| Component: | documentation | Assignee: | RHOS Documentation Team <rhos-docs> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | RHOS Documentation Team <rhos-docs> |
| Severity: | unspecified | Docs Contact: | |
| Priority: | urgent | | |
| Version: | 7.0 (Kilo) | CC: | arkady_kanevsky, augol, gchenuet, gfidente, hbrock, jcoufal, jslagle, lbopf, mbultel, mburns, mcornea, ohochman, randy_perryman, rhel-osp-director-maint, rybrown, srevivo, wayne_allen, whayutin, zbitter |
| Target Milestone: | ga | Keywords: | Automation, Documentation, Triaged |
| Target Release: | 8.0 (Liberty) | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | | |
| : | 1318397 (view as bug list) | Environment: | |
| Last Closed: | 2016-11-18 22:21:46 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | 1276204 | | |
| Bug Blocks: | | | |
Description
mathieu bultel
2015-10-06 21:08:18 UTC
It sounds very much like we're modifying one of the subnets in such a way as to cause it to be replaced, which is something we must not do on upgrade. Can you please note in a comment what the pre-update subnet was and what you're changing it to?
I was able to reproduce this bug, and I've attached the new and old revisions of the templates for the GA and the latest puddle.
Stack failed with status: resources.Networks: resources.TenantNetwork: Conflict: resources.TenantSubnet: Unable to complete operation on subnet 50a7023c-43ce-42dd-aefe-04612f281f58. One or more ports have an IP allocation from this subnet.
The difference in the network templates is here: http://pastebin.test.redhat.com/320385
It exists because in the older version it wasn't necessary to include the provisioning NIC in the configuration.
Created attachment 1083398 [details]
OSP 7 GA templates
Created attachment 1083399 [details]
netconfigs for puddle deploy
Created attachment 1083401 [details]
netconfigs for GA
Created attachment 1083402 [details]
templates from latest puddle
So far I've been able to reproduce, but only when running a deploy like:
openstack --debug overcloud deploy --templates -e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml -e /home/stack/templates/network-environment.yaml --ntp-server clock.redhat.com -e /usr/share/openstack-tripleo-heat-templates/overcloud-resource-registry.yaml
This includes overcloud-resource-registry.yaml *after* network-isolation.yaml, which overrides the subnet information and maps the networks to NOOPs, causing Heat to try to delete the network.
I avoided this problem by changing my environment file order, as follows:
openstack --debug overcloud deploy --templates -e /usr/share/openstack-tripleo-heat-templates/overcloud-resource-registry.yaml -e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml -e /home/stack/templates/network-environment.yaml --ntp-server clock.redhat.com
Note that the network-iso and network-env files are last. This may (partially) be a docs fix, but we may be able to do something in code instead.
After further testing, I don't think we can make any code changes to fix this. Perhaps the best solution would be some additional upgrade documentation about ensuring the *order* is identical for all environment files.
Dan, we need to make sure we add docs that specify that the order of the environment files must be identical to the previous deployment.
I have tried to update my deployment with this command:
openstack overcloud update stack -i overcloud --debug --templates -e /usr/share/openstack-tripleo-heat-templates/overcloud-resource-registry-puppet.yaml -e /usr/share/openstack-tripleo-heat-templates/environments/net-single-nic-with-vlans.yaml -e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml -e /home/stack/network-environment.yaml -e /home/stack/upgrade.yml
So the net-iso templates are at the end of the command line, but it seems that the templates are ignored by the update.
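The ordering effect described above can be illustrated with a minimal sketch. It assumes a simplified model in which each environment file contributes a resource_registry mapping and later -e files override earlier ones. The registry entries shown (the `OS::TripleO::Network::Tenant` key and the `network/noop.yaml` and `network/tenant.yaml` values) are trimmed-down assumptions for illustration, not the full contents of the shipped files, and `merge_registries` is a hypothetical helper, not Heat's actual merge code.

```python
# Simplified model: each environment file contributes a resource_registry
# mapping, and files passed later with -e override earlier ones.
# The mappings below are assumptions for illustration only.

def merge_registries(env_files):
    """Merge resource_registry dicts in command-line order; the last one wins."""
    merged = {}
    for _name, registry in env_files:
        merged.update(registry)
    return merged

# Hypothetical, trimmed-down contents of the two environment files.
RESOURCE_REGISTRY = (
    "overcloud-resource-registry.yaml",
    {"OS::TripleO::Network::Tenant": "network/noop.yaml"},
)
NETWORK_ISOLATION = (
    "network-isolation.yaml",
    {"OS::TripleO::Network::Tenant": "network/tenant.yaml"},
)

# Order from the failing command: the registry file comes last.
failing_order = merge_registries([NETWORK_ISOLATION, RESOURCE_REGISTRY])
# Order from the working command: network isolation comes last.
working_order = merge_registries([RESOURCE_REGISTRY, NETWORK_ISOLATION])

print(failing_order["OS::TripleO::Network::Tenant"])
print(working_order["OS::TripleO::Network::Tenant"])
```

In this model the failing order leaves the tenant network mapped to the no-op template, which matches the reported behaviour of Heat trying to delete the existing network.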
The update still tried to override the subnets, and failed because they can't be deleted while ports are still associated with them. Still trying to figure out this issue...
Hi Mathieu, can you attach the contents of ~/network-environment.yaml and ~/upgrade.yaml?
The network-isolation.yaml:
parameter_defaults:
  InternalApiNetCidr: 172.16.20.0/24
  StorageNetCidr: 172.16.21.0/24
  TenantNetCidr: 172.16.22.0/24
  ExternalNetCidr: 172.16.23.0/24
  InternalApiAllocationPools: [{'start': '172.16.20.10', 'end': '172.16.20.100'}]
  StorageAllocationPools: [{'start': '172.16.21.10', 'end': '172.16.21.100'}]
  TenantAllocationPools: [{'start': '172.16.22.10', 'end': '172.16.22.100'}]
  ExternalAllocationPools: [{'start': '172.16.23.10', 'end': '172.16.23.100'}]
  ExternalInterfaceDefaultRoute: 172.16.23.251
  NeutronExternalNetworkBridge: "''"
  ControlPlaneSubnetCidr: "24"
  ControlPlaneDefaultRoute: 192.0.2.1
  EC2MetadataIp: 192.0.2.1
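As a sanity check on the values above, the following sketch uses Python's standard ipaddress module to confirm that each allocation pool lies inside its CIDR and that the external default route sits in the external network but outside the pool. It only validates the numbers as pasted; it says nothing about what Heat or Neutron do with them. The helper names `pool_in_cidr` and `route_is_usable` are hypothetical.

```python
import ipaddress

# CIDRs and allocation pools copied from the parameter_defaults above.
NETWORKS = {
    "InternalApi": ("172.16.20.0/24", "172.16.20.10", "172.16.20.100"),
    "Storage": ("172.16.21.0/24", "172.16.21.10", "172.16.21.100"),
    "Tenant": ("172.16.22.0/24", "172.16.22.10", "172.16.22.100"),
    "External": ("172.16.23.0/24", "172.16.23.10", "172.16.23.100"),
}
EXTERNAL_DEFAULT_ROUTE = "172.16.23.251"


def pool_in_cidr(cidr, start, end):
    """Return True if start..end is a valid range inside the given CIDR."""
    net = ipaddress.ip_network(cidr)
    first = ipaddress.ip_address(start)
    last = ipaddress.ip_address(end)
    return first in net and last in net and first <= last


def route_is_usable(route, cidr, start, end):
    """Return True if the route is inside the CIDR but outside the pool."""
    addr = ipaddress.ip_address(route)
    in_pool = ipaddress.ip_address(start) <= addr <= ipaddress.ip_address(end)
    return addr in ipaddress.ip_network(cidr) and not in_pool


results = {name: pool_in_cidr(*values) for name, values in NETWORKS.items()}
route_ok = route_is_usable(EXTERNAL_DEFAULT_ROUTE, *NETWORKS["External"])

for name, ok in results.items():
    print(name, "ok" if ok else "pool outside CIDR")
print("External default route", "ok" if route_ok else "problem")
```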
And the upgrade.yml:
parameters:
  ServiceNetMap:
    NeutronTenantNetwork: tenant
    CeilometerApiNetwork: internal_api
    MongoDbNetwork: internal_api
    CinderApiNetwork: internal_api
    CinderIscsiNetwork: storage
    GlanceApiNetwork: storage
    GlanceRegistryNetwork: internal_api
    KeystoneAdminApiNetwork: internal_api
    KeystonePublicApiNetwork: internal_api
    NeutronApiNetwork: internal_api
    HeatApiNetwork: internal_api
    NovaApiNetwork: internal_api
    NovaMetadataNetwork: internal_api
    NovaVncProxyNetwork: internal_api
    SwiftMgmtNetwork: storage_mgmt
    SwiftProxyNetwork: storage
    HorizonNetwork: internal_api
    MemcachedNetwork: internal_api
    RabbitMqNetwork: internal_api
    RedisNetwork: internal_api
    MysqlNetwork: internal_api
    CephClusterNetwork: storage_mgmt
    CephPublicNetwork: storage
    ControllerHostnameResolveNetwork: internal_api
    ComputeHostnameResolveNetwork: internal_api
    BlockStorageHostnameResolveNetwork: internal_api
    ObjectStorageHostnameResolveNetwork: internal_api
    CephStorageHostnameResolveNetwork: storage
resource_registry:
  OS::TripleO::Network::Ports::NetVipMap: /usr/share/openstack-tripleo-heat-templates/network/ports/net_vip_map_external.yaml
  OS::TripleO::Network::Ports::CtlplaneVipPort: /usr/share/openstack-tripleo-heat-templates/network/ports/noop.yaml
  OS::TripleO::Network::Ports::ExternalVipPort: /usr/share/openstack-tripleo-heat-templates/network/ports/noop.yaml
  OS::TripleO::Network::Ports::InternalApiVipPort: /usr/share/openstack-tripleo-heat-templates/network/ports/noop.yaml
  OS::TripleO::Network::Ports::StorageVipPort: /usr/share/openstack-tripleo-heat-templates/network/ports/noop.yaml
  OS::TripleO::Network::Ports::StorageMgmtVipPort: /usr/share/openstack-tripleo-heat-templates/network/ports/noop.yaml
  OS::TripleO::Network::Ports::TenantVipPort: /usr/share/openstack-tripleo-heat-templates/network/ports/noop.yaml
  OS::TripleO::Network::Ports::RedisVipPort: /usr/share/openstack-tripleo-heat-templates/network/ports/from_service.yaml
parameter_defaults:
  ControlPlaneIP: {{ control_virtual_ip.stdout }}
  ExternalNetworkVip: {{ public_virtual_ip.stdout }}
  InternalApiNetworkVip: {{ internal_api_virtual_ip.stdout }}
  StorageNetworkVip: {{ storage_virtual_ip.stdout }}
  StorageMgmtNetworkVip: {{ storage_management_virtual_ip.stdout }}
  ServiceVips:
    redis: {{ redis_virtual_ip.stdout }}
Note that the {{ redis_virtual_ip.stdout }} value corresponds to the IP shown in the output of neutron port-list.
I'm redeploying 2 new environments (the previous one was broken).
So with this custom network-isolation configuration, Heat tries to apply the default network configuration: it tries to remove the subnets in the current deployment and fails because there are ports attached to them.
Mathieu, I think we found the culprit. To update the subnet allocation pools without deleting and recreating the subnet, we need to land: https://bugzilla.redhat.com/show_bug.cgi?id=1276204
Mathieu, I just realized bz #1276204 is in MODIFIED state. Can you check which version of openstack-heat is installed on the undercloud before updating? Also, was ~/network-environment.yaml changed between the initial deployment and the update attempt?
No, the ~/network-environment.yaml that I pasted is both the initial file and the one I used for the update. Do I need to change something there?
From the error message you posted, it seems that Heat is trying to delete and recreate the subnet, which is what it would do before the fix for bz #1276204 if you were to update the allocation pools. Can you check whether this reproduces with openstack-heat-2015.1.1-8.el7ost?
It seems that it reproduces with openstack-heat-2015.1.1-8.el7ost. What I have updated:
sudo yum localinstall -y openstack-heat-api-2015.1.1-8.el7ost.noarch.rpm openstack-heat-api-cfn-2015.1.1-8.el7ost.noarch.rpm openstack-heat-api-cloudwatch-2015.1.1-8.el7ost.noarch.rpm openstack-heat-common-2015.1.1-8.el7ost.noarch.rpm openstack-heat-engine-2015.1.1-8.el7ost.noarch.rpm
Still the same with the latest puddle: openstack-heat version 1-2.1
Can you please retest this?
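The recommendation earlier in this thread, that environment files be passed in the same order as in the previous deployment, could be checked before running an update with something like the sketch below. It is only an illustration: `environment_files` is a hypothetical helper that extracts the -e arguments from a command string, and the two commands are the failing and working deploy commands quoted in this bug.

```python
import shlex


def environment_files(command):
    """Return the arguments that follow each -e flag, in command-line order."""
    args = shlex.split(command)
    envs = []
    for index, arg in enumerate(args):
        if arg == "-e" and index + 1 < len(args):
            envs.append(args[index + 1])
    return envs


# The two deploy commands quoted in this bug (failing order, then working order).
failing_cmd = (
    "openstack --debug overcloud deploy --templates "
    "-e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml "
    "-e /home/stack/templates/network-environment.yaml "
    "--ntp-server clock.redhat.com "
    "-e /usr/share/openstack-tripleo-heat-templates/overcloud-resource-registry.yaml"
)
working_cmd = (
    "openstack --debug overcloud deploy --templates "
    "-e /usr/share/openstack-tripleo-heat-templates/overcloud-resource-registry.yaml "
    "-e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml "
    "-e /home/stack/templates/network-environment.yaml "
    "--ntp-server clock.redhat.com"
)

failing_envs = environment_files(failing_cmd)
working_envs = environment_files(working_cmd)

same_files = sorted(failing_envs) == sorted(working_envs)
same_order = failing_envs == working_envs

print("same files:", same_files)
print("same order:", same_order)
```

A comparison like this flags the case where the same set of files is passed but in a different order, which is the situation reported here.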