Bug 1269285

Summary: [Docs][Director]Overcloud Updates failed with network isolation
Product: Red Hat OpenStack
Reporter: mathieu bultel <mbultel>
Component: documentation
Assignee: RHOS Documentation Team <rhos-docs>
Status: CLOSED CURRENTRELEASE
QA Contact: RHOS Documentation Team <rhos-docs>
Severity: unspecified
Docs Contact:
Priority: urgent
Version: 7.0 (Kilo)
CC: arkady_kanevsky, augol, gchenuet, gfidente, hbrock, jcoufal, jslagle, lbopf, mbultel, mburns, mcornea, ohochman, randy_perryman, rhel-osp-director-maint, rybrown, srevivo, wayne_allen, whayutin, zbitter
Target Milestone: ga
Keywords: Automation, Documentation, Triaged
Target Release: 8.0 (Liberty)
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 1318397 (view as bug list) Environment:
Last Closed: 2016-11-18 22:21:46 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1276204
Bug Blocks:
Attachments (Description: Flags):
OSP 7 GA templates: none
netconfigs for puddle deploy: none
netconfigs for GA: none
templates from latest puddle: none

Description mathieu bultel 2015-10-06 21:08:18 UTC
Description of problem:

When updating the overcloud with the network isolation option, the update fails with the following error:

openstack overcloud deploy --templates -e /usr/share/openstack-tripleo-heat-templates/overcloud-resource-registry-puppet.yaml -e ~/network-environment.yaml

WARNING: rdomanager_oscplugin.v1.overcloud_deploy.DeployOvercloud There are 3 ironic nodes with no profile that will not be used: 8bbce96c-9d70-43c2-87c4-4b4fad44ddcf, 27583213-463c-4176-8256-36747da0df23, 9d9f3354-5d63-4e40-9ad0-aa053aa97b71

ERROR: rdomanager_oscplugin.v1.overcloud_deploy.DeployOvercloud Configuration has 1 warnings, fix them before proceeding.

Deploying templates in the directory /usr/share/openstack-tripleo-heat-templates
Stack failed with status: resources.Networks: resources.TenantNetwork: Conflict: resources.TenantSubnet: Unable to complete operation on subnet 1b29f2db-01ba-438b-b4d4-3453dec411ff. One or more ports have an IP allocation from this subnet.

ERROR: openstack Heat Stack update failed.


The update can continue if we delete all ports on the deployment.

Comment 2 Zane Bitter 2015-10-06 21:57:17 UTC
It sounds very much like we're modifying one of the Subnets in such a way as to cause it to be replaced, which is something we need to not do on upgrade.

Comment 3 chris alfonso 2015-10-09 16:19:28 UTC
Can you please note in a comment what the pre-update subnet was and what you're changing it to?

Comment 5 Ryan Brown 2015-10-15 20:30:09 UTC
I was able to repro this bug, and I've attached the new & old revisions of the templates for the GA and the latest puddle. 

Stack failed with status: resources.Networks: resources.TenantNetwork: Conflict: resources.TenantSubnet: Unable to complete operation on subnet 50a7023c-43ce-42dd-aefe-04612f281f58. One or more ports have an IP allocation from this subnet.

The difference in the network templates is here http://pastebin.test.redhat.com/320385 because in the older version it wasn't necessary to include the provisioning nic in the configuration.

Comment 6 Ryan Brown 2015-10-15 20:31:51 UTC
Created attachment 1083398 [details]
OSP 7 GA templates

Comment 7 Ryan Brown 2015-10-15 20:32:25 UTC
Created attachment 1083399 [details]
netconfigs for puddle deploy

Comment 8 Ryan Brown 2015-10-15 20:33:24 UTC
Created attachment 1083401 [details]
netconfigs for GA

Comment 9 Ryan Brown 2015-10-15 20:34:01 UTC
Created attachment 1083402 [details]
templates from latest puddle

Comment 10 Ryan Brown 2015-10-16 20:57:09 UTC
So far I've been able to reproduce, but only when running a deploy like: 

openstack --debug overcloud deploy --templates -e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml -e /home/stack/templates/network-environment.yaml --ntp-server clock.redhat.com -e /usr/share/openstack-tripleo-heat-templates/overcloud-resource-registry.yaml

This includes overcloud-resource-registry.yaml *after* network-isolation.yaml, which overrides the subnet information and maps the networks to NOOPs, causing Heat to try to delete the network. I avoided this problem by changing my environment file order, like so:

openstack --debug overcloud deploy --templates -e /usr/share/openstack-tripleo-heat-templates/overcloud-resource-registry.yaml -e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml -e /home/stack/templates/network-environment.yaml --ntp-server clock.redhat.com

Note that the network-iso and network-env are last. This may (partially) be a docs fix, but we may be able to do something in code instead.
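
The ordering effect described in this comment can be sketched in a few lines of Python. This is only an illustration, assuming that environment files are merged in command-line order and that a later file's resource_registry entry replaces an earlier one for the same key. The dicts stand in for parsed YAML, and the registry key and template paths are illustrative values, not copied from the attached templates.

```python
def merge_resource_registry(environments):
    """Merge resource_registry sections in order; later entries win."""
    merged = {}
    for env in environments:
        merged.update(env.get("resource_registry", {}))
    return merged


# Illustrative stand-ins for the two environment files (assumed contents).
base_registry = {
    "resource_registry": {"OS::TripleO::Network::Tenant": "network/noop.yaml"}
}
network_isolation = {
    "resource_registry": {"OS::TripleO::Network::Tenant": "network/tenant.yaml"}
}

# Registry passed last: the NOOP mapping overrides the isolated network.
failing_order = merge_resource_registry([network_isolation, base_registry])
# Network isolation passed last: the real network template is kept.
working_order = merge_resource_registry([base_registry, network_isolation])

print(failing_order["OS::TripleO::Network::Tenant"])
print(working_order["OS::TripleO::Network::Tenant"])
```

With the registry file last, the tenant network resolves to the NOOP template, which is consistent with Heat attempting to delete the existing network and subnet.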

Comment 11 Ryan Brown 2015-10-20 18:47:10 UTC
After further testing, I don't think we can make any changes to fix this. Perhaps the best solution would be some additional upgrade documentation about ensuring the *order* is identical for all environment files.

Comment 12 chris alfonso 2015-11-10 17:42:45 UTC
Dan, we need to make sure we add docs that specify that the order of the environment files must be identical to the previous deployment.
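
One way to check the ordering requirement before running an update is to compare the -e arguments of the original and the new command. The following is a minimal sketch, not part of director: the helper names are hypothetical, only the short -e form is handled, and paths are compared as plain strings without expanding ~ or resolving symlinks.

```python
import shlex


def environment_files(command):
    """Return the arguments that follow each -e option, in order."""
    tokens = shlex.split(command)
    return [tokens[i + 1] for i, tok in enumerate(tokens[:-1]) if tok == "-e"]


def same_environment_order(first_command, second_command):
    """True when both commands pass the same files in the same order."""
    return environment_files(first_command) == environment_files(second_command)


templates = "/usr/share/openstack-tripleo-heat-templates"
deploy = (
    "openstack overcloud deploy --templates "
    f"-e {templates}/overcloud-resource-registry.yaml "
    f"-e {templates}/environments/network-isolation.yaml "
    "-e /home/stack/templates/network-environment.yaml"
)
update = (
    "openstack overcloud deploy --templates "
    f"-e {templates}/environments/network-isolation.yaml "
    "-e /home/stack/templates/network-environment.yaml "
    f"-e {templates}/overcloud-resource-registry.yaml"
)

print(same_environment_order(deploy, deploy))
print(same_environment_order(deploy, update))
```

The second comparison reports a mismatch because the registry file moved from first to last, which is the reordering that triggered the failure in comment 10.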

Comment 13 mathieu bultel 2015-11-10 17:49:22 UTC
I have tried to update my deployment with this command:

openstack overcloud update stack -i overcloud --debug --templates -e /usr/share/openstack-tripleo-heat-templates/overcloud-resource-registry-puppet.yaml -e /usr/share/openstack-tripleo-heat-templates/environments/net-single-nic-with-vlans.yaml -e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml -e /home/stack/network-environment.yaml -e  /home/stack/upgrade.yml

So the net-iso templates are at the end of the command line, but it seems that the templates are ignored by the update. The update still tried to override the subnets, and failed because they can't be deleted while there are ports associated with them.

Still trying to figure out this issue...

Comment 14 Giulio Fidente 2015-11-12 13:51:27 UTC
hi Mathieu, can you attach the contents of ~/network-environment.yaml and ~/upgrade.yaml?

Comment 15 mathieu bultel 2015-11-12 14:01:16 UTC
the network-isolation.yaml:

parameter_defaults:
  InternalApiNetCidr: 172.16.20.0/24
  StorageNetCidr: 172.16.21.0/24
  TenantNetCidr: 172.16.22.0/24
  ExternalNetCidr: 172.16.23.0/24
  InternalApiAllocationPools: [{'start': '172.16.20.10', 'end': '172.16.20.100'}]
  StorageAllocationPools: [{'start': '172.16.21.10', 'end': '172.16.21.100'}]
  TenantAllocationPools: [{'start': '172.16.22.10', 'end': '172.16.22.100'}]
  ExternalAllocationPools: [{'start': '172.16.23.10', 'end': '172.16.23.100'}]
  ExternalInterfaceDefaultRoute: 172.16.23.251
  NeutronExternalNetworkBridge: "''"
  ControlPlaneSubnetCidr: "24"
  ControlPlaneDefaultRoute: 192.0.2.1
  EC2MetadataIp: 192.0.2.1


And the upgrade.yml:

parameters:
    ServiceNetMap:
        NeutronTenantNetwork: tenant
        CeilometerApiNetwork: internal_api
        MongoDbNetwork: internal_api
        CinderApiNetwork: internal_api
        CinderIscsiNetwork: storage
        GlanceApiNetwork: storage
        GlanceRegistryNetwork: internal_api
        KeystoneAdminApiNetwork: internal_api
        KeystonePublicApiNetwork: internal_api
        NeutronApiNetwork: internal_api
        HeatApiNetwork: internal_api
        NovaApiNetwork: internal_api
        NovaMetadataNetwork: internal_api
        NovaVncProxyNetwork: internal_api
        SwiftMgmtNetwork: storage_mgmt
        SwiftProxyNetwork: storage
        HorizonNetwork: internal_api
        MemcachedNetwork: internal_api
        RabbitMqNetwork: internal_api
        RedisNetwork: internal_api
        MysqlNetwork: internal_api
        CephClusterNetwork: storage_mgmt
        CephPublicNetwork: storage
        ControllerHostnameResolveNetwork: internal_api
        ComputeHostnameResolveNetwork: internal_api
        BlockStorageHostnameResolveNetwork: internal_api
        ObjectStorageHostnameResolveNetwork: internal_api
        CephStorageHostnameResolveNetwork: storage

resource_registry:
    OS::TripleO::Network::Ports::NetVipMap: /usr/share/openstack-tripleo-heat-templates/network/ports/net_vip_map_external.yaml
    OS::TripleO::Network::Ports::CtlplaneVipPort: /usr/share/openstack-tripleo-heat-templates/network/ports/noop.yaml
    OS::TripleO::Network::Ports::ExternalVipPort: /usr/share/openstack-tripleo-heat-templates/network/ports/noop.yaml
    OS::TripleO::Network::Ports::InternalApiVipPort: /usr/share/openstack-tripleo-heat-templates/network/ports/noop.yaml
    OS::TripleO::Network::Ports::StorageVipPort: /usr/share/openstack-tripleo-heat-templates/network/ports/noop.yaml
    OS::TripleO::Network::Ports::StorageMgmtVipPort: /usr/share/openstack-tripleo-heat-templates/network/ports/noop.yaml
    OS::TripleO::Network::Ports::TenantVipPort: /usr/share/openstack-tripleo-heat-templates/network/ports/noop.yaml
    OS::TripleO::Network::Ports::RedisVipPort: /usr/share/openstack-tripleo-heat-templates/network/ports/from_service.yaml

parameter_defaults:
    ControlPlaneIP: {{  control_virtual_ip.stdout }}
    ExternalNetworkVip: {{  public_virtual_ip.stdout }}
    InternalApiNetworkVip: {{  internal_api_virtual_ip.stdout }}
    StorageNetworkVip: {{  storage_virtual_ip.stdout }}
    StorageMgmtNetworkVip: {{  storage_management_virtual_ip.stdout }}
    ServiceVips:
        redis: {{ redis_virtual_ip.stdout }}



Note that the {{ redis_virtual_ip.stdout }} value corresponds to the IP in the output of neutron port-list.

I'm redeploying 2 new environments (the previous one was broken).

Comment 16 mathieu bultel 2015-11-12 14:03:04 UTC
So with this custom network-isolation configuration, Heat tries to apply the default network configuration: it tries to remove the subnets on the current deployment and fails because there are ports attached to them.

Comment 17 Giulio Fidente 2015-11-12 14:14:10 UTC
Mathieu, I think we found the culprit. To update the subnets allocation pool without deleting/recreating the subnet we need to land: https://bugzilla.redhat.com/show_bug.cgi?id=1276204

Comment 18 Giulio Fidente 2015-11-12 14:20:50 UTC
Mathieu, I just realized bz #1276204 is in MODIFIED state, can you check which version of openstack-heat is installed on the undercloud before updating? Also, is the ~/network-environment.yaml changed in between the initial deployment and the update attempt?
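
The question of whether the environment file changed between the initial deployment and the update can be answered by comparing the two sets of parameter_defaults. The sketch below is only an illustration: the dicts stand in for parsed YAML, the "initial" values are taken from the file posted in comment 15, and the edited allocation pool is a hypothetical change made up for the example.

```python
def changed_parameters(before, after):
    """Map each differing parameter name to its (before, after) values."""
    keys = sorted(set(before) | set(after))
    return {
        key: (before.get(key), after.get(key))
        for key in keys
        if before.get(key) != after.get(key)
    }


initial = {
    "TenantNetCidr": "172.16.22.0/24",
    "TenantAllocationPools": [{"start": "172.16.22.10", "end": "172.16.22.100"}],
}
unchanged = dict(initial)
# Hypothetical edit: widen the tenant allocation pool before the update.
hypothetical_edit = dict(
    initial,
    TenantAllocationPools=[{"start": "172.16.22.10", "end": "172.16.22.200"}],
)

print(changed_parameters(initial, unchanged))
print(changed_parameters(initial, hypothetical_edit))
```

An empty result means the file itself is not the source of the subnet change, which points back at the defaults or at the ordering of the environment files.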

Comment 19 mathieu bultel 2015-11-12 14:27:17 UTC
No, the ~/network-environment.yaml that I pasted is the initial file and the one I used for the update.
Do I need to change something there?

Comment 20 Giulio Fidente 2015-11-12 19:30:59 UTC
From the error message you posted it seems that Heat is trying to delete/recreate the subnet, which is what it would do before the fix for bz #1276204 if you were to update the allocation pools; can you check if this reproduces with openstack-heat-2015.1.1-8.el7ost ?

Comment 21 mathieu bultel 2015-11-12 19:39:14 UTC
It seems that it reproduces with openstack-heat-2015.1.1-8.el7ost.

What I updated:

sudo yum localinstall -y openstack-heat-api-2015.1.1-8.el7ost.noarch.rpm openstack-heat-api-cfn-2015.1.1-8.el7ost.noarch.rpm openstack-heat-api-cloudwatch-2015.1.1-8.el7ost.noarch.rpm openstack-heat-common-2015.1.1-8.el7ost.noarch.rpm openstack-heat-engine-2015.1.1-8.el7ost.noarch.rpm

Comment 22 mathieu bultel 2015-11-16 15:19:35 UTC
Still the same with the latest puddle:
openstack-heat version 1-2.1

Comment 23 Mike Burns 2016-01-20 17:49:41 UTC
Can you please retest this?