Bug 1272357 - director stack update 7.0 to 7.1 VIP change
Summary: director stack update 7.0 to 7.1 VIP change
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-heat-templates
Version: 7.0 (Kilo)
Hardware: x86_64
OS: Linux
Priority: urgent
Severity: high
Target Milestone: y2
Target Release: 7.0 (Kilo)
Assignee: Giulio Fidente
QA Contact: Marius Cornea
URL:
Whiteboard:
Depends On: 1276204 1278537
Blocks:
 
Reported: 2015-10-16 08:20 UTC by Cyril Lopez
Modified: 2015-12-21 16:52 UTC
CC List: 14 users

Fixed In Version: openstack-tripleo-heat-templates-0.8.6-84.el7ost
Doc Type: Bug Fix
Doc Text:
Previously, the Orchestration resource implementing the overcloud VIPs changed between releases because of a wrong mapping in the resource registry file. This caused the overcloud VIP ports to be deleted and recreated. With this update, the overcloud VIPs are again mapped to the same Orchestration resource they were mapped to in previous releases. As a result, the overcloud VIP ports are updated in place.
Clone Of:
Environment:
Last Closed: 2015-12-21 16:52:17 UTC
Target Upstream Version:
Embargoed:




Links
Red Hat Product Errata RHSA-2015:2650 (SHIPPED_LIVE): Moderate: Red Hat Enterprise Linux OpenStack Platform 7 director update - 2015-12-21 21:44:54 UTC

Description Cyril Lopez 2015-10-16 08:20:09 UTC
Description of problem:
During a stack update from 7.0 to 7.1, the VIPs change.

Version-Release number of selected component (if applicable):
openstack-tripleo-heat-templates-0.8.6-71.el7ost.noarch
openstack-heat-templates-0-0.6.20150605git.el7ost.noarch


How reproducible:
Install an overcloud on 7.0 and update the stack to 7.1.

Steps to Reproduce:
1. Install the undercloud and overcloud on 7.0
2. Update the undercloud to 7.1
3. Update openstack-puppet-modules on all nodes; see https://bugzilla.redhat.com/show_bug.cgi?id=1267318
4. To update the stack, run: openstack overcloud deploy --templates /home/stack/templates-7.1/ [...]


Actual results:

At the very least, the internal API VIP changed from 10.154.20.10 to 10.154.20.23 during the update.

 | VipMap                                      | d0e33d89-5c11-422f-8ed4-69bcf0a514ca          | OS::TripleO::Network::Ports::NetVipMap            | CREATE_COMPLETE | 2015-10-15T16:50:25Z |                                             |
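
The VipMap row above appears to come from a resource listing of the overcloud stack; a minimal way to find that nested stack id (assuming the stack is named "overcloud") would be:

  $ heat resource-list overcloud | grep VipMap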
 
 heat output-show d0e33d89-5c11-422f-8ed4-69bcf0a514ca --all
[ 
  { 
    "output_value": {
      "storage": "10.154.22.20",
      "ctlplane": "10.153.20.85",
      "external": "198.154.188.59",
      "internal_api": "10.154.20.23",
      "storage_mgmt": "10.154.23.16",
      "tenant": ""
    },
    "description": "A Hash containing a mapping of network names to assigned IPs for a specific machine.\n",
    "output_key": "net_ip_map"
  }
]


Expected results:
The VIPs are not supposed to change.

Additional info:

Comment 2 Giulio Fidente 2015-10-16 09:06:49 UTC
This is because the VIPs are managed by a dedicated resource type in 7.1 that did not exist in 7.0; when upgrading, heat creates the new resource (it did not previously exist), and this results in a new IP.

Also see https://bugzilla.redhat.com/show_bug.cgi?id=1272347#c2
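
For reference, one quick way to see which VIP port resource types the shipped 7.1 templates register (assuming the stock template location and the overcloud-resource-registry-puppet.yaml file name) is:

  $ grep -i VipPort /usr/share/openstack-tripleo-heat-templates/overcloud-resource-registry-puppet.yaml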

Comment 3 Giulio Fidente 2015-10-16 11:17:50 UTC
Will try with double mapping first, e.g.:

  OS::TripleO::Network::Ports::ExternalVipPort: OS::TripleO::Controller::Ports::ExternalPort
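
A complete environment for that experiment might look like the sketch below; only the ExternalVipPort line is quoted above, the remaining lines are assumptions following the same naming pattern:

# illustrative sketch only; every line except ExternalVipPort is assumed
resource_registry:
  OS::TripleO::Network::Ports::ExternalVipPort: OS::TripleO::Controller::Ports::ExternalPort
  OS::TripleO::Network::Ports::InternalApiVipPort: OS::TripleO::Controller::Ports::InternalApiPort
  OS::TripleO::Network::Ports::StorageVipPort: OS::TripleO::Controller::Ports::StoragePort
  OS::TripleO::Network::Ports::StorageMgmtVipPort: OS::TripleO::Controller::Ports::StorageMgmtPort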

Comment 4 Giulio Fidente 2015-10-19 15:29:22 UTC
Double mapping did not work; we'll have to resort to providing the VIPs manually as input parameters. I will post an update with instructions as soon as I have it tested.

Comment 5 Giulio Fidente 2015-10-20 15:37:27 UTC
There is a workaround which worked with a single network; we're testing it with network isolation too.

Steps are:

1. collect the overcloud VIPs by querying the undercloud neutron *before* the update:

  $ neutron port-list
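
Assuming the VIP ports follow the usual *_virtual_ip naming (e.g. internal_api_virtual_ip), they can be filtered out of that listing with something like:

  $ neutron port-list | grep virtual_ip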

2. create an upgrade.yaml environment file with the following contents:

resource_registry:
  OS::TripleO::Network::Ports::NetVipMap: /usr/share/openstack-tripleo-heat-templates/network/ports/net_vip_map_external.yaml
  OS::TripleO::Network::Ports::CtlplaneVipPort: /usr/share/openstack-tripleo-heat-templates/network/ports/noop.yaml
  OS::TripleO::Network::Ports::ExternalVipPort: /usr/share/openstack-tripleo-heat-templates/network/ports/noop.yaml
  OS::TripleO::Network::Ports::InternalApiVipPort: /usr/share/openstack-tripleo-heat-templates/network/ports/noop.yaml
  OS::TripleO::Network::Ports::StorageVipPort: /usr/share/openstack-tripleo-heat-templates/network/ports/noop.yaml
  OS::TripleO::Network::Ports::StorageMgmtVipPort: /usr/share/openstack-tripleo-heat-templates/network/ports/noop.yaml
  OS::TripleO::Network::Ports::TenantVipPort: /usr/share/openstack-tripleo-heat-templates/network/ports/noop.yaml
  OS::TripleO::Network::Ports::RedisVipPort: /usr/share/openstack-tripleo-heat-templates/network/ports/from_service.yaml

parameter_defaults:
  ControlPlaneIP: 192.0.2.18
  ExternalNetworkVip: 192.0.2.19
  InternalApiNetworkVip: 192.0.2.18
  StorageNetworkVip: 192.0.2.18
  StorageMgmtNetworkVip: 192.0.2.18
  ServiceVips:
    redis: 192.0.2.20

when deploying without some of the networks, the VIPs for the non-existent networks can be set to the same value as InternalApiNetworkVip; when deploying with a single network, they can be set to the same value as ControlPlaneIP

3. perform the upgrade, passing the file as an additional argument:

  -e /path/to/upgrade.yaml
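
Putting it together, the update command from the reproduction steps only gains the extra environment file; any environment files and options used for the original deployment (elided below) still have to be repeated:

  $ openstack overcloud deploy --templates /home/stack/templates-7.1/ \
      -e /path/to/upgrade.yaml [...]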

Comment 6 Giulio Fidente 2015-10-20 17:56:28 UTC
The workaround in comment #5 works in scenarios using network isolation too.

Comment 7 Steven Hardy 2015-10-22 10:22:38 UTC
I raised an upstream bug https://bugs.launchpad.net/heat/+bug/1508115

This describes some possible ways we could make heat less destructive on update, which I think would fix this problem.  I don't currently have an ETA for implementing that though, so continuing to discuss workarounds is wise.

Re the workaround in comment #5 - won't the neutron ports be deleted due to the switch to noop.yaml? E.g. those statically assigned IPs could end up being re-assigned later via the neutron IPAM.

Comment 8 Giulio Fidente 2015-10-22 12:38:55 UTC
Hi Steven, thanks. I will check on the neutron ports and whether we can exclude those from the IP pool.

Comment 9 Giulio Fidente 2015-10-23 02:21:40 UTC
Steven, the neutron ports are indeed deleted as you suggested. We could potentially exclude the VIPs from the allocation pools with:

  ExternalAllocationPools: [{'start': '10.0.0.5', 'end': '10.0.0.250'}]
  StorageAllocationPools: [{'start': '172.16.1.5', 'end': '172.16.1.250'}]
  StorageMgmtAllocationPools: [{'start': '172.16.3.5', 'end': '172.16.3.250'}]
  InternalApiAllocationPools: [{'start': '172.16.2.6', 'end': '172.16.2.250'}]

but that does not work; the update fails with:

  Conflict: resources.StorageSubnet: Unable to complete operation on subnet a4922be8-5358-4bca-b2f7-a6605839d00f. One or more ports have an IP allocation from this subnet.

From heat logs it seems to be attempting a DELETE on the neutron network/subnet; I suppose that is caused by the allocation_pools parameter being updated?

Comment 10 James Slagle 2015-11-06 16:48:29 UTC
potential upstream fix: https://review.openstack.org/#/c/238194/

Comment 20 Amit Ugol 2015-12-09 13:35:18 UTC
We are not going to support 7.0 to 7.1, and both 7.0 and 7.1 were upgraded to 7.2 in CI and in the lab. Is that enough to close this?

Comment 21 Marius Cornea 2015-12-16 03:17:14 UTC
pre update:
stack@instack:~>>> neutron port-list | grep internal_api_virtual_ip
| 6ee53c74-5813-4163-a3e7-c788c30bff5c | internal_api_virtual_ip       | fa:16:3e:4b:8b:eb | {"subnet_id": "2ea8bb56-7d3c-4a02-b9f5-a17533de6001", "ip_address": "172.16.20.10"} |

post update:
stack@instack:~>>> neutron port-list | grep internal_api_virtual_ip
| 6ee53c74-5813-4163-a3e7-c788c30bff5c | internal_api_virtual_ip       | fa:16:3e:4b:8b:eb | {"subnet_id": "2ea8bb56-7d3c-4a02-b9f5-a17533de6001", "ip_address": "172.16.20.10"} |

Comment 22 hrosnet 2015-12-16 08:31:19 UTC
(In reply to Amit Ugol from comment #20)
> We are not going to support 7.0 to 7.1, and both 7.0 and 7.1 were upgraded
> to 7.2 in CI and in the lab. Is that enough to close this?

I'd say it depends on what kind of tests you have in the CI:
* On how many machines did you try?
* How many were on hardware / virtual?
* Did you have a ceph cluster or swift storage?
* Did you have some instances running?
* Tried spawning instances afterwards?
* Tried accessing to already running instances afterwards?

I would like to be sure the CI tests are as close as possible to clients' environments, which usually means having roughly 10 baremetal servers, instances running, instances that will need to be spawned, etc.

Thanks

Comment 23 Jaromir Coufal 2015-12-17 10:11:37 UTC
This bug is specific to 7.0 -> 7.1. If this issue is not reproducible in upgrades from 7.x to 7.2, this bug can be closed. Based on Marius's comment, this is verified.

Comment 25 errata-xmlrpc 2015-12-21 16:52:17 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2015:2650

