Bug 1272357 - director stack update 7.0 to 7.1 VIP change
Summary: director stack update 7.0 to 7.1 VIP change
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-heat-templates
Version: 7.0 (Kilo)
Hardware: x86_64
OS: Linux
Priority: urgent
Severity: high
Target Milestone: y2
Target Release: 7.0 (Kilo)
Assignee: Giulio Fidente
QA Contact: Marius Cornea
URL:
Whiteboard:
Depends On: 1276204 1278537
Blocks:
 
Reported: 2015-10-16 08:20 UTC by Cyril Lopez
Modified: 2015-12-21 16:52 UTC
CC List: 14 users

Fixed In Version: openstack-tripleo-heat-templates-0.8.6-84.el7ost
Doc Type: Bug Fix
Doc Text:
Previously, the Orchestration resource implementing the overcloud VIPs changed between releases because of a wrong mapping in the resource registry file. This caused the overcloud VIP ports to be deleted and recreated. With this update, the overcloud VIPs are again mapped to the same Orchestration resource they were mapped to in previous releases. As a result, the overcloud VIP ports are updated in place.
Clone Of:
Environment:
Last Closed: 2015-12-21 16:52:17 UTC
Target Upstream Version:
Embargoed:




Links
Red Hat Product Errata RHSA-2015:2650 (SHIPPED_LIVE): Moderate: Red Hat Enterprise Linux OpenStack Platform 7 director update - 2015-12-21 21:44:54 UTC

Description Cyril Lopez 2015-10-16 08:20:09 UTC
Description of problem:
During a stack update from 7.0 to 7.1, the VIPs change.

Version-Release number of selected component (if applicable):
openstack-tripleo-heat-templates-0.8.6-71.el7ost.noarch
openstack-heat-templates-0-0.6.20150605git.el7ost.noarch


How reproducible:
Install an overcloud on 7.0 and update the stack to 7.1.

Steps to Reproduce:
1. Install the undercloud and overcloud on 7.0
2. Update the undercloud to 7.1
3. Update openstack-puppet-modules on all nodes; see https://bugzilla.redhat.com/show_bug.cgi?id=1267318
4. To update the stack, run: openstack overcloud deploy --templates /home/stack/templates-7.1/ [...]


Actual results:

At the very least, the internal API VIP changed from 10.154.20.10 to 10.154.20.23 during the update.

 | VipMap                                      | d0e33d89-5c11-422f-8ed4-69bcf0a514ca          | OS::TripleO::Network::Ports::NetVipMap            | CREATE_COMPLETE | 2015-10-15T16:50:25Z |                                             |
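
The VipMap row above appears to come from a resource listing of the overcloud stack; a minimal way to find that nested stack id (assuming the stack is named "overcloud") would be:

  $ heat resource-list overcloud | grep VipMap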
 
 heat output-show d0e33d89-5c11-422f-8ed4-69bcf0a514ca --all
[ 
  { 
    "output_value": {
      "storage": "10.154.22.20",
      "ctlplane": "10.153.20.85",
      "external": "198.154.188.59",
      "internal_api": "10.154.20.23",
      "storage_mgmt": "10.154.23.16",
      "tenant": ""
    },
    "description": "A Hash containing a mapping of network names to assigned IPs for a specific machine.\n",
    "output_key": "net_ip_map"
  }
]


Expected results:
The VIPs are not supposed to change.

Additional info:

Comment 2 Giulio Fidente 2015-10-16 09:06:49 UTC
This is because the VIPs are managed by a dedicated resource type in 7.1 that did not exist in 7.0; when upgrading, heat creates the new resource (it did not previously exist), and this results in a new IP.

Also see https://bugzilla.redhat.com/show_bug.cgi?id=1272347#c2
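
For reference, one quick way to see which VIP port resource types the shipped 7.1 templates register (assuming the stock template location and the overcloud-resource-registry-puppet.yaml file name) is:

  $ grep -i VipPort /usr/share/openstack-tripleo-heat-templates/overcloud-resource-registry-puppet.yaml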

Comment 3 Giulio Fidente 2015-10-16 11:17:50 UTC
Will try with double mapping first, e.g.:

  OS::TripleO::Network::Ports::ExternalVipPort: OS::TripleO::Controller::Ports::ExternalPort
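
A complete environment for that experiment might look like the sketch below; only the ExternalVipPort line is quoted above, the remaining lines are assumptions following the same naming pattern:

# illustrative sketch only; every line except ExternalVipPort is assumed
resource_registry:
  OS::TripleO::Network::Ports::ExternalVipPort: OS::TripleO::Controller::Ports::ExternalPort
  OS::TripleO::Network::Ports::InternalApiVipPort: OS::TripleO::Controller::Ports::InternalApiPort
  OS::TripleO::Network::Ports::StorageVipPort: OS::TripleO::Controller::Ports::StoragePort
  OS::TripleO::Network::Ports::StorageMgmtVipPort: OS::TripleO::Controller::Ports::StorageMgmtPort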

Comment 4 Giulio Fidente 2015-10-19 15:29:22 UTC
Double mapping did not work; we'll have to resort to providing the VIPs manually as input parameters. I will post an update with instructions as soon as I have it tested.

Comment 5 Giulio Fidente 2015-10-20 15:37:27 UTC
There is a workaround which worked with a single network; we're testing it with network isolation too.

Steps are:

1. collect the overcloud VIPs by querying the undercloud neutron *before* the update:

  $ neutron port-list
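
Assuming the VIP ports follow the usual *_virtual_ip naming (e.g. internal_api_virtual_ip), they can be filtered out of that listing with something like:

  $ neutron port-list | grep virtual_ip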

2. create an upgrade.yaml environment file with the following contents:

resource_registry:
  OS::TripleO::Network::Ports::NetVipMap: /usr/share/openstack-tripleo-heat-templates/network/ports/net_vip_map_external.yaml
  OS::TripleO::Network::Ports::CtlplaneVipPort: /usr/share/openstack-tripleo-heat-templates/network/ports/noop.yaml
  OS::TripleO::Network::Ports::ExternalVipPort: /usr/share/openstack-tripleo-heat-templates/network/ports/noop.yaml
  OS::TripleO::Network::Ports::InternalApiVipPort: /usr/share/openstack-tripleo-heat-templates/network/ports/noop.yaml
  OS::TripleO::Network::Ports::StorageVipPort: /usr/share/openstack-tripleo-heat-templates/network/ports/noop.yaml
  OS::TripleO::Network::Ports::StorageMgmtVipPort: /usr/share/openstack-tripleo-heat-templates/network/ports/noop.yaml
  OS::TripleO::Network::Ports::TenantVipPort: /usr/share/openstack-tripleo-heat-templates/network/ports/noop.yaml
  OS::TripleO::Network::Ports::RedisVipPort: /usr/share/openstack-tripleo-heat-templates/network/ports/from_service.yaml

parameter_defaults:
  ControlPlaneIP: 192.0.2.18
  ExternalNetworkVip: 192.0.2.19
  InternalApiNetworkVip: 192.0.2.18
  StorageNetworkVip: 192.0.2.18
  StorageMgmtNetworkVip: 192.0.2.18
  ServiceVips:
    redis: 192.0.2.20

when deploying without some of the networks, the VIPs for the non-existent networks can be set to the same value as InternalApiNetworkVip; when deploying with a single network, they can be set to the same value as ControlPlaneIP

3. perform the upgrade, passing the file as an additional argument:

  -e /path/to/upgrade.yaml
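
Putting it together, the update command from the reproduction steps only gains the extra environment file; any environment files and options used for the original deployment (elided below) still have to be repeated:

  $ openstack overcloud deploy --templates /home/stack/templates-7.1/ \
      -e /path/to/upgrade.yaml [...]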

Comment 6 Giulio Fidente 2015-10-20 17:56:28 UTC
The workaround in comment #5 works in scenarios using network isolation too.

Comment 7 Steven Hardy 2015-10-22 10:22:38 UTC
I raised an upstream bug https://bugs.launchpad.net/heat/+bug/1508115

This describes some possible ways we could make heat less destructive on update, which I think would fix this problem.  I don't currently have an ETA for implementing that though, so continuing to discuss workarounds is wise.

Re the workaround in comment #5 - won't the neutron ports be deleted due to the switch to noop.yaml? E.g. those statically assigned IPs could end up being re-assigned later via the neutron IPAM.

Comment 8 Giulio Fidente 2015-10-22 12:38:55 UTC
Hi Steven, thanks. I will check on the neutron ports and whether we can exclude those from the IP pool.

Comment 9 Giulio Fidente 2015-10-23 02:21:40 UTC
Steven, the neutron ports are indeed deleted as you suggested. We could potentially exclude the VIPs from the allocation pools with:

  ExternalAllocationPools: [{'start': '10.0.0.5', 'end': '10.0.0.250'}]
  StorageAllocationPools: [{'start': '172.16.1.5', 'end': '172.16.1.250'}]
  StorageMgmtAllocationPools: [{'start': '172.16.3.5', 'end': '172.16.3.250'}]
  InternalApiAllocationPools: [{'start': '172.16.2.6', 'end': '172.16.2.250'}]

but that does not work; the update fails with:

  Conflict: resources.StorageSubnet: Unable to complete operation on subnet a4922be8-5358-4bca-b2f7-a6605839d00f. One or more ports have an IP allocation from this subnet.

From heat logs it seems to be attempting a DELETE on the neutron network/subnet; I suppose that is caused by the allocation_pools parameter being updated?

Comment 10 James Slagle 2015-11-06 16:48:29 UTC
potential upstream fix: https://review.openstack.org/#/c/238194/

Comment 20 Amit Ugol 2015-12-09 13:35:18 UTC
We are not going to support 7.0 to 7.1, and both 7.0 and 7.1 were upgraded to 7.2 in CI and in the lab. Is that enough to close this?

Comment 21 Marius Cornea 2015-12-16 03:17:14 UTC
pre update:
stack@instack:~>>> neutron port-list | grep internal_api_virtual_ip
| 6ee53c74-5813-4163-a3e7-c788c30bff5c | internal_api_virtual_ip       | fa:16:3e:4b:8b:eb | {"subnet_id": "2ea8bb56-7d3c-4a02-b9f5-a17533de6001", "ip_address": "172.16.20.10"} |

post update:
stack@instack:~>>> neutron port-list | grep internal_api_virtual_ip
| 6ee53c74-5813-4163-a3e7-c788c30bff5c | internal_api_virtual_ip       | fa:16:3e:4b:8b:eb | {"subnet_id": "2ea8bb56-7d3c-4a02-b9f5-a17533de6001", "ip_address": "172.16.20.10"} |

Comment 22 hrosnet 2015-12-16 08:31:19 UTC
(In reply to Amit Ugol from comment #20)
> We are not going to support 7.0 to 7.1, and both 7.0 and 7.1 were upgraded
> to 7.2 in CI and in the lab. Is that enough to close this?

I'd say it depends on what kind of tests you have in the CI:
* On how many machines did you try?
* How many were on hardware / virtual?
* Did you have a ceph cluster or swift storage?
* Did you have some instances running?
* Tried spawning instances afterwards?
* Tried accessing to already running instances afterwards?

I would like to be sure the CI tests are as close as possible to clients' environments, which usually means having roughly 10 baremetal servers, instances running, instances that will need to be spawned, etc.

Thanks

Comment 23 Jaromir Coufal 2015-12-17 10:11:37 UTC
This bug is specific to 7.0 -> 7.1. If this issue is not reproducible in upgrades from 7.x to 7.2, this bug can be closed. Based on Marius's comment, this is verified.

Comment 25 errata-xmlrpc 2015-12-21 16:52:17 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2015:2650

