Description of problem:

During a stack update from 7.0 to 7.1, the VIPs change.

Version-Release number of selected component (if applicable):
openstack-tripleo-heat-templates-0.8.6-71.el7ost.noarch
openstack-heat-templates-0-0.6.20150605git.el7ost.noarch

How reproducible:
Install an overcloud on 7.0 and update the stack to 7.1.

Steps to Reproduce:
1. Install the undercloud / overcloud on 7.0
2. Update the undercloud to 7.1
3. Update openstack-puppet-modules on all nodes, cf. https://bugzilla.redhat.com/show_bug.cgi?id=1267318
4. Update the stack:
   openstack overcloud deploy --templates /home/stack/templates-7.1/ [...]

Actual results:
At least my internal VIP changed from 10.154.20.10 to 10.154.20.23 during the update.

| VipMap | d0e33d89-5c11-422f-8ed4-69bcf0a514ca | OS::TripleO::Network::Ports::NetVipMap | CREATE_COMPLETE | 2015-10-15T16:50:25Z |

heat output-show d0e33d89-5c11-422f-8ed4-69bcf0a514ca --all
[
  {
    "output_value": {
      "storage": "10.154.22.20",
      "ctlplane": "10.153.20.85",
      "external": "198.154.188.59",
      "internal_api": "10.154.20.23",
      "storage_mgmt": "10.154.23.16",
      "tenant": ""
    },
    "description": "A Hash containing a mapping of network names to assigned IPs for a specific machine.\n",
    "output_key": "net_ip_map"
  }
]

Expected results:
The VIPs are not supposed to change.

Additional info:
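A quick way to confirm whether a VIP moved is to compare the VIP ports in the undercloud neutron before and after the stack update. This is only a minimal sketch; the exact port names (e.g. internal_api_virtual_ip) depend on the deployed networks:

  # run on the undercloud, before and after the stack update
  $ source stackrc
  $ neutron port-list | grep virtual_ip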
This is because the VIPs are managed by a dedicated resource type in 7.1, which they were not in 7.0; on update, Heat creates the new resource (it did not exist before) and this results in a new IP. Also see https://bugzilla.redhat.com/show_bug.cgi?id=1272347#c2
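One way to see the newly created VIP resources after the update is to list the stack resources and filter on the VIP types, roughly as below (the exact resource names vary with the template version):

  $ heat resource-list overcloud | grep -i vip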
Will try with double mapping first, e.g.:

  OS::TripleO::Network::Ports::ExternalVipPort: OS::TripleO::Controller::Ports::ExternalPort
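For reference, the full double-mapping attempt would look roughly like the environment snippet below. This is only a sketch: the Controller port types for the other networks are assumed to follow the same naming, and as the next comment notes this approach did not work:

  resource_registry:
    OS::TripleO::Network::Ports::ExternalVipPort: OS::TripleO::Controller::Ports::ExternalPort
    OS::TripleO::Network::Ports::InternalApiVipPort: OS::TripleO::Controller::Ports::InternalApiPort
    OS::TripleO::Network::Ports::StorageVipPort: OS::TripleO::Controller::Ports::StoragePort
    OS::TripleO::Network::Ports::StorageMgmtVipPort: OS::TripleO::Controller::Ports::StorageMgmtPort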
Double mapping did not work; we'll have to resort to providing the VIPs manually as input parameters. I will post an update with instructions as soon as I have tested it.
There is a workaround which worked with a single network; we're testing it with network isolation too. Steps are:

1. Collect the overcloud VIPs by querying the undercloud neutron *before* the update:

   $ neutron port-list

2. Edit an upgrade.yaml with the following contents:

   resource_registry:
     OS::TripleO::Network::Ports::NetVipMap: /usr/share/openstack-tripleo-heat-templates/network/ports/net_vip_map_external.yaml
     OS::TripleO::Network::Ports::CtlplaneVipPort: /usr/share/openstack-tripleo-heat-templates/network/ports/noop.yaml
     OS::TripleO::Network::Ports::ExternalVipPort: /usr/share/openstack-tripleo-heat-templates/network/ports/noop.yaml
     OS::TripleO::Network::Ports::InternalApiVipPort: /usr/share/openstack-tripleo-heat-templates/network/ports/noop.yaml
     OS::TripleO::Network::Ports::StorageVipPort: /usr/share/openstack-tripleo-heat-templates/network/ports/noop.yaml
     OS::TripleO::Network::Ports::StorageMgmtVipPort: /usr/share/openstack-tripleo-heat-templates/network/ports/noop.yaml
     OS::TripleO::Network::Ports::TenantVipPort: /usr/share/openstack-tripleo-heat-templates/network/ports/noop.yaml
     OS::TripleO::Network::Ports::RedisVipPort: /usr/share/openstack-tripleo-heat-templates/network/ports/from_service.yaml

   parameter_defaults:
     ControlPlaneIP: 192.0.2.18
     ExternalNetworkVip: 192.0.2.19
     InternalApiNetworkVip: 192.0.2.18
     StorageNetworkVip: 192.0.2.18
     StorageMgmtNetworkVip: 192.0.2.18
     ServiceVips:
       redis: 192.0.2.20

   When deploying without some of the networks, the non-existent VIPs can be set to the same value as InternalApiNetworkVip; when deploying with a single network, the non-existent VIPs can be set to the same value as ControlPlaneIP.

3. Perform the upgrade passing the additional argument -e /path/to/upgrade.yaml (see the example deploy command below).
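For example, with the deploy command from the bug description, the upgrade step would look roughly like this (a sketch only; the templates path and any other environment files depend on the original deployment):

  $ openstack overcloud deploy --templates /home/stack/templates-7.1/ \
      [... original deployment arguments ...] \
      -e /path/to/upgrade.yaml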
The workaround in comment #5 works in scenarios using network isolation too.
I raised an upstream bug: https://bugs.launchpad.net/heat/+bug/1508115 It describes some possible ways we could make Heat less destructive on update, which I think would fix this problem. I don't currently have an ETA for implementing that though, so continuing to discuss workarounds is wise. Regarding the workaround in comment #5 - won't the neutron ports be deleted due to the switch to noop.yaml? E.g. those statically assigned IPs could end up being re-assigned later via the neutron IPAM.
Hi Steven, thanks. I will check on the neutron ports and whether we can exclude those from the IP pool.
Steven, the neutron ports are indeed deleted as you suggested. We could potentially exclude the VIPs from the allocation pools with:

  ExternalAllocationPools: [{'start': '10.0.0.5', 'end': '10.0.0.250'}]
  StorageAllocationPools: [{'start': '172.16.1.5', 'end': '172.16.1.250'}]
  StorageMgmtAllocationPools: [{'start': '172.16.3.5', 'end': '172.16.3.250'}]
  InternalApiAllocationPools: [{'start': '172.16.2.6', 'end': '172.16.2.250'}]

but that won't work, emitting:

  Conflict: resources.StorageSubnet: Unable to complete operation on subnet a4922be8-5358-4bca-b2f7-a6605839d00f. One or more ports have an IP allocation from this subnet.

From the heat logs it seems to be attempting a DELETE on the neutron network/subnet; I suppose that is caused by the allocation_pools parameter being updated?
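For completeness, these parameters would be supplied via parameter_defaults in an environment file, roughly as sketched below (the pool ranges are the ones above, chosen so they exclude the VIP addresses; as noted, this attempt fails with the Conflict error):

  parameter_defaults:
    ExternalAllocationPools: [{'start': '10.0.0.5', 'end': '10.0.0.250'}]
    StorageAllocationPools: [{'start': '172.16.1.5', 'end': '172.16.1.250'}]
    StorageMgmtAllocationPools: [{'start': '172.16.3.5', 'end': '172.16.3.250'}]
    InternalApiAllocationPools: [{'start': '172.16.2.6', 'end': '172.16.2.250'}]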
potential upstream fix: https://review.openstack.org/#/c/238194/
We are not going to support 7.0 to 7.1 upgrades, and both 7.1 and 7.0 were upgraded to 7.2 in CI and in the lab. Is that enough to close this?
Pre update:

stack@instack:~>>> neutron port-list | grep internal_api_virtual_ip
| 6ee53c74-5813-4163-a3e7-c788c30bff5c | internal_api_virtual_ip | fa:16:3e:4b:8b:eb | {"subnet_id": "2ea8bb56-7d3c-4a02-b9f5-a17533de6001", "ip_address": "172.16.20.10"} |

Post update:

stack@instack:~>>> neutron port-list | grep internal_api_virtual_ip
| 6ee53c74-5813-4163-a3e7-c788c30bff5c | internal_api_virtual_ip | fa:16:3e:4b:8b:eb | {"subnet_id": "2ea8bb56-7d3c-4a02-b9f5-a17533de6001", "ip_address": "172.16.20.10"} |
(In reply to Amit Ugol from comment #20)
> We are not going to support 7.0 to 7.1 and both 7.1 and 7.0 were upgraded to
> 7.2 in CI and in the lab. is it enough to close this ?

I'd say it depends on what kind of tests you have in the CI:

* On how many machines did you try?
* How many were on hardware / virtual?
* Did you have a ceph cluster or swift storage?
* Did you have some instances running?
* Did you try spawning instances afterwards?
* Did you try accessing already running instances afterwards?

I would like to be sure the CI tests are as close as possible to clients' environments, which usually means having roughly 10 bare-metal servers, instances running, instances that will need to be spawned, etc.

Thanks
This bug is specific to 7.0 -> 7.1. If this issue is not reproducible in upgrades from 7.x to 7.2, this bug can be closed. Based on Marius's comment, this is verified.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2015:2650