Bug 1269285 - [Docs][Director]Overcloud Updates failed with network isolation
Status: CLOSED CURRENTRELEASE
Product: Red Hat OpenStack
Classification: Red Hat
Component: documentation
Version: 7.0 (Kilo)
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: unspecified
Target Milestone: ga
Target Release: 8.0 (Liberty)
Assigned To: RHOS Documentation Team
QA Contact: RHOS Documentation Team
Keywords: Automation, Documentation, Triaged
Depends On: 1276204
Blocks:
Reported: 2015-10-06 17:08 EDT by mathieu bultel
Modified: 2018-01-07 05:50 EST

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 1318397 (view as bug list)
Environment:
Last Closed: 2016-11-18 17:21:46 EST
Type: Bug


Attachments
OSP 7 GA templates (75.14 KB, application/x-gzip), 2015-10-15 16:31 EDT, Ryan Brown
netconfigs for puddle deploy (35.94 KB, application/x-gzip), 2015-10-15 16:32 EDT, Ryan Brown
netconfigs for GA (35.90 KB, application/x-gzip), 2015-10-15 16:33 EDT, Ryan Brown
templates from latest puddle (93.35 KB, application/x-gzip), 2015-10-15 16:34 EDT, Ryan Brown

Description mathieu bultel 2015-10-06 17:08:18 EDT
Description of problem:

When updating the overcloud with the network isolation option, the update fails with the following error:

openstack overcloud deploy --templates -e /usr/share/openstack-tripleo-heat-templates/overcloud-resource-registry-puppet.yaml -e ~/network-environment.yaml

WARNING: rdomanager_oscplugin.v1.overcloud_deploy.DeployOvercloud There are 3 ironic nodes with no profile that will not be used: 8bbce96c-9d70-43c2-87c4-4b4fad44ddcf, 27583213-463c-4176-8256-36747da0df23, 9d9f3354-5d63-4e40-9ad0-aa053aa97b71

ERROR: rdomanager_oscplugin.v1.overcloud_deploy.DeployOvercloud Configuration has 1 warnings, fix them before proceeding.

Deploying templates in the directory /usr/share/openstack-tripleo-heat-templates
Stack failed with status: resources.Networks: resources.TenantNetwork: Conflict: resources.TenantSubnet: Unable to complete operation on subnet 1b29f2db-01ba-438b-b4d4-3453dec411ff. One or more ports have an IP allocation from this subnet.

ERROR: openstack Heat Stack update failed.


The update can continue if we delete all ports on the deployment.
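
To find which ports hold an allocation from the subnet named in the error, a port listing can be filtered by subnet ID. The following is a minimal sketch, assuming the ports have already been exported as a list of dictionaries with a `fixed_ips` field; the helper name `ports_on_subnet` and the sample ports are hypothetical and only illustrate the filtering step.

```python
def ports_on_subnet(ports, subnet_id):
    """Return the IDs of ports that have at least one fixed IP on subnet_id."""
    return [
        port["id"]
        for port in ports
        if any(ip.get("subnet_id") == subnet_id for ip in port.get("fixed_ips", []))
    ]


# Subnet ID taken from the error message above; the ports are made-up samples.
SUBNET = "1b29f2db-01ba-438b-b4d4-3453dec411ff"
sample_ports = [
    {"id": "port-a", "fixed_ips": [{"subnet_id": SUBNET, "ip_address": "172.16.22.10"}]},
    {"id": "port-b", "fixed_ips": [{"subnet_id": "other-subnet", "ip_address": "192.0.2.5"}]},
    {"id": "port-c", "fixed_ips": []},
]

blocking = ports_on_subnet(sample_ports, SUBNET)
print(blocking)
```

Only the ports returned by this filter would block deletion of the subnet.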
Comment 2 Zane Bitter 2015-10-06 17:57:17 EDT
It sounds very much like we're modifying one of the Subnets in such a way as to cause it to be replaced, which is something we must not do on upgrade.
Comment 3 chris alfonso 2015-10-09 12:19:28 EDT
Can you please note in a comment what the pre-update subnet was and what you're changing it to?
Comment 5 Ryan Brown 2015-10-15 16:30:09 EDT
I was able to repro this bug, and I've attached the new & old revisions of the templates for the GA and the latest puddle. 

Stack failed with status: resources.Networks: resources.TenantNetwork: Conflict: resources.TenantSubnet: Unable to complete operation on subnet 50a7023c-43ce-42dd-aefe-04612f281f58. One or more ports have an IP allocation from this subnet.

The difference in the network templates is here: http://pastebin.test.redhat.com/320385. It exists because in the older version it wasn't necessary to include the provisioning NIC in the configuration.
Comment 6 Ryan Brown 2015-10-15 16:31 EDT
Created attachment 1083398
OSP 7 GA templates
Comment 7 Ryan Brown 2015-10-15 16:32 EDT
Created attachment 1083399
netconfigs for puddle deploy
Comment 8 Ryan Brown 2015-10-15 16:33 EDT
Created attachment 1083401
netconfigs for GA
Comment 9 Ryan Brown 2015-10-15 16:34 EDT
Created attachment 1083402
templates from latest puddle
Comment 10 Ryan Brown 2015-10-16 16:57:09 EDT
So far I've been able to reproduce, but only when running a deploy like: 

openstack --debug overcloud deploy --templates -e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml -e /home/stack/templates/network-environment.yaml --ntp-server clock.redhat.com -e /usr/share/openstack-tripleo-heat-templates/overcloud-resource-registry.yaml

This places overcloud-resource-registry.yaml *after* network-isolation.yaml, which overrides the subnet and network resources with no-ops and causes Heat to try to delete the networks. I avoided the problem by changing the order of my environment files:

openstack --debug overcloud deploy --templates -e /usr/share/openstack-tripleo-heat-templates/overcloud-resource-registry.yaml -e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml -e /home/stack/templates/network-environment.yaml --ntp-server clock.redhat.com

Note that the network-iso and network-env are last. This may (partially) be a docs fix, but we may be able to do something in code instead.
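
The effect of the file order can be illustrated with a small merge. This is a simplified sketch, assuming that a later environment file overrides an earlier one for the same `resource_registry` key; the two dictionaries are hypothetical stand-ins for the real files, and the mapped template paths are illustrative rather than copied from the templates.

```python
def merge_resource_registries(environments):
    """Merge resource_registry mappings in order; later entries win."""
    merged = {}
    for env in environments:
        merged.update(env.get("resource_registry", {}))
    return merged


# Illustrative stand-ins for the default registry and network-isolation.yaml.
default_registry = {
    "resource_registry": {"OS::TripleO::Network::Tenant": "network/noop.yaml"}
}
network_isolation = {
    "resource_registry": {"OS::TripleO::Network::Tenant": "network/tenant.yaml"}
}

# Registry last: the no-op mapping replaces the real network.
broken = merge_resource_registries([network_isolation, default_registry])
# Network isolation last: the real network mapping is kept.
working = merge_resource_registries([default_registry, network_isolation])

print(broken["OS::TripleO::Network::Tenant"])
print(working["OS::TripleO::Network::Tenant"])
```

With the registry passed last, the tenant network resolves to the no-op template, which matches the attempted deletion described above.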
Comment 11 Ryan Brown 2015-10-20 14:47:10 EDT
After further testing, I don't think we can make any changes to fix this. Perhaps the best solution would be some additional upgrade documentation about ensuring the *order* is identical for all environment files.
Comment 12 chris alfonso 2015-11-10 12:42:45 EST
Dan, we need to make sure we add docs that specify the order of the environment files must be identical to the previous deployment.
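
A check like this could be scripted before an update. The sketch below is only an illustration, assuming the previous and current deploy commands are available as strings; the helper names are hypothetical, and only the `-e` form used in this report is handled.

```python
import shlex


def environment_files(command):
    """Return the arguments that follow each -e option, in command-line order."""
    tokens = shlex.split(command)
    return [
        tokens[i + 1]
        for i, token in enumerate(tokens[:-1])
        if token == "-e"
    ]


def same_environment_order(previous, current):
    """True when both commands pass the same environment files in the same order."""
    return environment_files(previous) == environment_files(current)


registry = "/usr/share/openstack-tripleo-heat-templates/overcloud-resource-registry.yaml"
isolation = "/usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml"
custom = "/home/stack/templates/network-environment.yaml"

initial = f"openstack overcloud deploy --templates -e {registry} -e {isolation} -e {custom}"
reordered = f"openstack overcloud deploy --templates -e {isolation} -e {custom} -e {registry}"

print(same_environment_order(initial, initial))
print(same_environment_order(initial, reordered))
```

Comparing ordered lists rather than sets is deliberate: the same files in a different order count as a mismatch.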
Comment 13 mathieu bultel 2015-11-10 12:49:22 EST
I have tried to update my deployment with this command:

openstack overcloud update stack -i overcloud --debug --templates -e /usr/share/openstack-tripleo-heat-templates/overcloud-resource-registry-puppet.yaml -e /usr/share/openstack-tripleo-heat-templates/environments/net-single-nic-with-vlans.yaml -e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml -e /home/stack/network-environment.yaml -e  /home/stack/upgrade.yml

So the network isolation templates are at the end of the command line, but they seem to be ignored by the update. The update still tried to override the subnets and failed because they can't be deleted while ports are associated with them.

Still trying to figure out this issue...
Comment 14 Giulio Fidente 2015-11-12 08:51:27 EST
Hi Mathieu, can you attach the contents of ~/network-environment.yaml and ~/upgrade.yaml?
Comment 15 mathieu bultel 2015-11-12 09:01:16 EST
The network-environment.yaml:

parameter_defaults:
  InternalApiNetCidr: 172.16.20.0/24
  StorageNetCidr: 172.16.21.0/24
  TenantNetCidr: 172.16.22.0/24
  ExternalNetCidr: 172.16.23.0/24
  InternalApiAllocationPools: [{'start': '172.16.20.10', 'end': '172.16.20.100'}]
  StorageAllocationPools: [{'start': '172.16.21.10', 'end': '172.16.21.100'}]
  TenantAllocationPools: [{'start': '172.16.22.10', 'end': '172.16.22.100'}]
  ExternalAllocationPools: [{'start': '172.16.23.10', 'end': '172.16.23.100'}]
  ExternalInterfaceDefaultRoute: 172.16.23.251
  NeutronExternalNetworkBridge: "''"
  ControlPlaneSubnetCidr: "24"
  ControlPlaneDefaultRoute: 192.0.2.1
  EC2MetadataIp: 192.0.2.1
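
As a sanity check on a file like this, each allocation pool can be verified against its CIDR. This is a minimal sketch using Python's standard `ipaddress` module; the parameter values are copied from the file above, while the helper name `pool_within_cidr` is hypothetical.

```python
import ipaddress


def pool_within_cidr(cidr, pools):
    """True when every pool's start and end lie inside cidr and start <= end."""
    network = ipaddress.ip_network(cidr)
    for pool in pools:
        start = ipaddress.ip_address(pool["start"])
        end = ipaddress.ip_address(pool["end"])
        if start not in network or end not in network or start > end:
            return False
    return True


# Values taken from the parameter_defaults shown above.
params = {
    "InternalApi": ("172.16.20.0/24", [{"start": "172.16.20.10", "end": "172.16.20.100"}]),
    "Storage": ("172.16.21.0/24", [{"start": "172.16.21.10", "end": "172.16.21.100"}]),
    "Tenant": ("172.16.22.0/24", [{"start": "172.16.22.10", "end": "172.16.22.100"}]),
    "External": ("172.16.23.0/24", [{"start": "172.16.23.10", "end": "172.16.23.100"}]),
}

results = {name: pool_within_cidr(cidr, pools) for name, (cidr, pools) in params.items()}
print(results)
```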


And the upgrade.yml:

parameters:
    ServiceNetMap:
        NeutronTenantNetwork: tenant
        CeilometerApiNetwork: internal_api
        MongoDbNetwork: internal_api
        CinderApiNetwork: internal_api
        CinderIscsiNetwork: storage
        GlanceApiNetwork: storage
        GlanceRegistryNetwork: internal_api
        KeystoneAdminApiNetwork: internal_api
        KeystonePublicApiNetwork: internal_api
        NeutronApiNetwork: internal_api
        HeatApiNetwork: internal_api
        NovaApiNetwork: internal_api
        NovaMetadataNetwork: internal_api
        NovaVncProxyNetwork: internal_api
        SwiftMgmtNetwork: storage_mgmt
        SwiftProxyNetwork: storage
        HorizonNetwork: internal_api
        MemcachedNetwork: internal_api
        RabbitMqNetwork: internal_api
        RedisNetwork: internal_api
        MysqlNetwork: internal_api
        CephClusterNetwork: storage_mgmt
        CephPublicNetwork: storage
        ControllerHostnameResolveNetwork: internal_api
        ComputeHostnameResolveNetwork: internal_api
        BlockStorageHostnameResolveNetwork: internal_api
        ObjectStorageHostnameResolveNetwork: internal_api
        CephStorageHostnameResolveNetwork: storage

resource_registry:
    OS::TripleO::Network::Ports::NetVipMap: /usr/share/openstack-tripleo-heat-templates/network/ports/net_vip_map_external.yaml
    OS::TripleO::Network::Ports::CtlplaneVipPort: /usr/share/openstack-tripleo-heat-templates/network/ports/noop.yaml
    OS::TripleO::Network::Ports::ExternalVipPort: /usr/share/openstack-tripleo-heat-templates/network/ports/noop.yaml
    OS::TripleO::Network::Ports::InternalApiVipPort: /usr/share/openstack-tripleo-heat-templates/network/ports/noop.yaml
    OS::TripleO::Network::Ports::StorageVipPort: /usr/share/openstack-tripleo-heat-templates/network/ports/noop.yaml
    OS::TripleO::Network::Ports::StorageMgmtVipPort: /usr/share/openstack-tripleo-heat-templates/network/ports/noop.yaml
    OS::TripleO::Network::Ports::TenantVipPort: /usr/share/openstack-tripleo-heat-templates/network/ports/noop.yaml
    OS::TripleO::Network::Ports::RedisVipPort: /usr/share/openstack-tripleo-heat-templates/network/ports/from_service.yaml

parameter_defaults:
    ControlPlaneIP: {{  control_virtual_ip.stdout }}
    ExternalNetworkVip: {{  public_virtual_ip.stdout }}
    InternalApiNetworkVip: {{  internal_api_virtual_ip.stdout }}
    StorageNetworkVip: {{  storage_virtual_ip.stdout }}
    StorageMgmtNetworkVip: {{  storage_management_virtual_ip.stdout }}
    ServiceVips:
        redis: {{ redis_virtual_ip.stdout }}



Note that the {{ redis_virtual_ip.stdout }} value corresponds to the IP shown in the output of neutron port-list.

I'm redeploying 2 new environments (the previous one was broken).
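
Because the file above still contains template placeholders, it may be worth confirming that they were all rendered before the file is passed to the update. The following is a rough sketch, assuming the placeholders use the `{{ name }}` form shown above; the helper name `unrendered_placeholders` is hypothetical.

```python
import re

# Matches placeholders such as "{{  control_virtual_ip.stdout }}".
PLACEHOLDER = re.compile(r"\{\{\s*([\w.]+)\s*\}\}")


def unrendered_placeholders(text):
    """Return the names of placeholders still present in text."""
    return PLACEHOLDER.findall(text)


template_text = """parameter_defaults:
    ControlPlaneIP: {{  control_virtual_ip.stdout }}
    ServiceVips:
        redis: {{ redis_virtual_ip.stdout }}
"""
# 192.0.2.x addresses are example values, not taken from the deployment.
rendered_text = """parameter_defaults:
    ControlPlaneIP: 192.0.2.10
    ServiceVips:
        redis: 192.0.2.11
"""

leftover = unrendered_placeholders(template_text)
print(leftover)
print(unrendered_placeholders(rendered_text))
```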
Comment 16 mathieu bultel 2015-11-12 09:03:04 EST
So with this custom network isolation configuration, Heat tries to apply the default network configuration: it tries to remove the subnets of the current deployment and fails because there are ports attached to them.
Comment 17 Giulio Fidente 2015-11-12 09:14:10 EST
Mathieu, I think we found the culprit. To update a subnet's allocation pools without deleting and recreating the subnet, we need to land: https://bugzilla.redhat.com/show_bug.cgi?id=1276204
Comment 18 Giulio Fidente 2015-11-12 09:20:50 EST
Mathieu, I just realized bz #1276204 is in MODIFIED state, can you check which version of openstack-heat is installed on the undercloud before updating? Also, is the ~/network-environment.yaml changed in between the initial deployment and the update attempt?
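
Whether the environment changed between the two runs can be checked by comparing the `parameter_defaults` of both versions of the file. This is a minimal sketch, assuming both versions have already been loaded into dictionaries; the helper name `changed_parameters` and the modified pool in the sample are hypothetical.

```python
def changed_parameters(before, after):
    """Return {name: (old, new)} for parameter_defaults entries that differ."""
    old = before.get("parameter_defaults", {})
    new = after.get("parameter_defaults", {})
    return {
        key: (old.get(key), new.get(key))
        for key in sorted(set(old) | set(new))
        if old.get(key) != new.get(key)
    }


initial_env = {
    "parameter_defaults": {
        "TenantNetCidr": "172.16.22.0/24",
        "TenantAllocationPools": [{"start": "172.16.22.10", "end": "172.16.22.100"}],
    }
}
# Hypothetical update that widens the tenant allocation pool.
update_env = {
    "parameter_defaults": {
        "TenantNetCidr": "172.16.22.0/24",
        "TenantAllocationPools": [{"start": "172.16.22.10", "end": "172.16.22.200"}],
    }
}

diff = changed_parameters(initial_env, update_env)
print(sorted(diff))
```

An empty result means the parameters are identical, so any subnet replacement would have to come from somewhere else.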
Comment 19 mathieu bultel 2015-11-12 09:27:17 EST
No, the ~/network-environment.yaml that I pasted is the initial file, and it is the same one I used for the update.
Do I need to change something there?
Comment 20 Giulio Fidente 2015-11-12 14:30:59 EST
From the error message you posted, it seems that Heat is trying to delete and recreate the subnet, which is what it would do before the fix for bz #1276204 if you were to update the allocation pools. Can you check if this reproduces with openstack-heat-2015.1.1-8.el7ost?
Comment 21 mathieu bultel 2015-11-12 14:39:14 EST
It seems that it reproduces with openstack-heat-2015.1.1-8.el7ost.

What I updated:

sudo yum localinstall -y openstack-heat-api-2015.1.1-8.el7ost.noarch.rpm openstack-heat-api-cfn-2015.1.1-8.el7ost.noarch.rpm openstack-heat-api-cloudwatch-2015.1.1-8.el7ost.noarch.rpm openstack-heat-common-2015.1.1-8.el7ost.noarch.rpm openstack-heat-engine-2015.1.1-8.el7ost.noarch.rpm
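
To confirm that every package in such a command carries the same version and release, the file names can be split into their parts. This is a small sketch, assuming the usual `name-version-release.arch.rpm` naming; the helper name `parse_rpm_filename` is hypothetical.

```python
def parse_rpm_filename(filename):
    """Split name-version-release.arch.rpm into its components."""
    stem = filename[: -len(".rpm")] if filename.endswith(".rpm") else filename
    stem, arch = stem.rsplit(".", 1)
    name, version, release = stem.rsplit("-", 2)
    return {"name": name, "version": version, "release": release, "arch": arch}


# File names taken from the yum command above.
packages = [
    "openstack-heat-api-2015.1.1-8.el7ost.noarch.rpm",
    "openstack-heat-api-cfn-2015.1.1-8.el7ost.noarch.rpm",
    "openstack-heat-api-cloudwatch-2015.1.1-8.el7ost.noarch.rpm",
    "openstack-heat-common-2015.1.1-8.el7ost.noarch.rpm",
    "openstack-heat-engine-2015.1.1-8.el7ost.noarch.rpm",
]

parsed = [parse_rpm_filename(p) for p in packages]
builds = {(p["version"], p["release"]) for p in parsed}
print(builds)
```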
Comment 22 mathieu bultel 2015-11-16 10:19:35 EST
Still the same with the latest puddle:
openstack-heat version 1-2.1
Comment 23 Mike Burns 2016-01-20 12:49:41 EST
Can you please retest this?