Bug 1483246 - Failing deployment "Unable to create the flat network. Physical network tenant is in use."
Summary: Failing deployment "Unable to create the flat network. Physical network tenant is in use."
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: rhosp-director
Version: 10.0 (Newton)
Hardware: All
OS: All
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: 10.0 (Newton)
Assignee: Emilien Macchi
QA Contact: Gurenko Alex
URL:
Whiteboard:
Depends On:
Blocks: 1672280 1697985
 
Reported: 2017-08-19 15:43 UTC by Robin Cernin
Modified: 2022-03-13 14:24 UTC
CC List: 19 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Cloned As: 1672280 1697985
Environment:
Last Closed: 2019-04-10 13:02:41 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker OSP-11313 0 None None None 2021-12-10 15:24:56 UTC

Description Robin Cernin 2017-08-19 15:43:17 UTC
With the initial templates/scripts, we are failing to update the stack because it tries to create the Tenant network and Internal API network even though they already exist, and fails with: "Unable to create the flat network. Physical network tenant is in use."

Here is the Tenant network that was created during the initial deployment:

[stack@undercloud ~]$ neutron net-show 7f7b9b09-b268-4a69-9a68-ee568b93d45d
+---------------------------+--------------------------------------+
| Field                     | Value                                |
+---------------------------+--------------------------------------+
| admin_state_up            | False                                |
| availability_zone_hints   |                                      |
| availability_zones        |                                      |
| created_at                | 2017-07-12T12:55:13Z                 |
| description               |                                      |
| id                        | 7f7b9b09-b268-4a69-9a68-ee568b93d45d |
| ipv4_address_scope        |                                      |
| ipv6_address_scope        |                                      |
| mtu                       | 1500                                 |
| name                      | tenant                               |
| project_id                | 230c24dced0c4cdfa9ec68555ad7557e     |
| provider:network_type     | flat                                 |
| provider:physical_network | tenant                               |
| provider:segmentation_id  |                                      |
| revision_number           | 4                                    |
| router:external           | False                                |
| shared                    | False                                |
| status                    | ACTIVE                               |
| subnets                   | 190e7d09-553f-407c-abd5-35870706b424 |
| tags                      |                                      |
| tenant_id                 | 230c24dced0c4cdfa9ec68555ad7557e     |
| updated_at                | 2017-07-12T12:55:13Z                 |
+---------------------------+--------------------------------------+

And the subnet:

[stack@undercloud ~]$ neutron subnet-show 190e7d09-553f-407c-abd5-35870706b424
+-------------------+---------------------------------------------------+
| Field             | Value                                             |
+-------------------+---------------------------------------------------+
| allocation_pools  | {"start": "192.168.4.10", "end": "192.168.4.200"} |
| cidr              | 192.168.4.0/24                                    |
| created_at        | 2017-07-12T12:55:13Z                              |
| description       |                                                   |
| dns_nameservers   |                                                   |
| enable_dhcp       | False                                             |
| gateway_ip        |                                                   |
| host_routes       |                                                   |
| id                | 190e7d09-553f-407c-abd5-35870706b424              |
| ip_version        | 4                                                 |
| ipv6_address_mode |                                                   |
| ipv6_ra_mode      |                                                   |
| name              | tenant_subnet                                     |
| network_id        | 7f7b9b09-b268-4a69-9a68-ee568b93d45d              |
| project_id        | 230c24dced0c4cdfa9ec68555ad7557e                  |
| revision_number   | 2                                                 |
| service_types     |                                                   |
| subnetpool_id     |                                                   |
| tenant_id         | 230c24dced0c4cdfa9ec68555ad7557e                  |
| updated_at        | 2017-07-12T12:55:13Z                              |
+-------------------+---------------------------------------------------+

Now, during an update using the same scripts/templates that we used for the deployment, Heat complains that it can't create TenantNetwork, even though TenantNetwork is already created.

2017-08-19 14:40:11Z [overcloud.Networks]: UPDATE_FAILED  resources.Networks: resources.TenantNetwork: Conflict: resources.TenantNetwork: Unable to create the flat network. Physical network tenant is in use.
Neutron server returns request_ids: ['req-419b306b-1259-4c9b-928c-5f1c24863393']
2017-08-19 14:40:11Z [overcloud]: UPDATE_FAILED  resources.Networks: resources.TenantNetwork: Conflict: resources.TenantNetwork: Unable to create the flat network. Physical network tenant is in use.
Neutron server returns request_ids: ['req-419b306b-1259-4c9b-928c-5f1c24863393']
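
To see exactly which nested resources failed, a quick sketch using the heat CLI shipped with OSP 10:

[stack@undercloud ~]$ heat resource-list -n 5 overcloud | grep -i FAILED
[stack@undercloud ~]$ heat resource-show overcloud Networks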

 grep -i tenant  network-environment.yaml
  TenantNetCidr: 192.168.4.0/24
  TenantAllocationPools: [{'start': '192.168.4.10', 'end': '192.168.4.200'}]
  TenantNetworkVlanID: 104

[root@undercloud ~]# grep "req-419b306b-1259-4c9b-928c-5f1c24863393" /var/log/neutron/*
/var/log/neutron/server.log:2017-08-19 14:40:09.539 6929 DEBUG oslo_messaging._drivers.amqpdriver [req-419b306b-1259-4c9b-928c-5f1c24863393 b0772c0c215743c7b5a7957d93393e43 230c24dced0c4cdfa9ec68555ad7557e - - -] CAST unique_id: 0b5b7f18868343b99b70ac72b2d7ec28 NOTIFY exchange 'neutron' topic 'notifications.info' _send /usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py:432
/var/log/neutron/server.log:2017-08-19 14:40:09.540 6929 DEBUG oslo_messaging._drivers.amqpdriver [req-419b306b-1259-4c9b-928c-5f1c24863393 b0772c0c215743c7b5a7957d93393e43 230c24dced0c4cdfa9ec68555ad7557e - - -] CAST unique_id: c0d036ee263245be82f58a4695713c64 NOTIFY exchange 'neutron' topic 'notifications.info' _send /usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py:432
/var/log/neutron/server.log:2017-08-19 14:40:09.541 6929 DEBUG neutron.api.v2.base [req-419b306b-1259-4c9b-928c-5f1c24863393 b0772c0c215743c7b5a7957d93393e43 230c24dced0c4cdfa9ec68555ad7557e - - -] Request body: {u'network': {u'shared': False, u'admin_state_up': False, u'name': u'tenant', u'provider:physical_network': u'tenant', u'provider:network_type': u'flat'}} prepare_request_body /usr/lib/python2.7/site-packages/neutron/api/v2/base.py:684
/var/log/neutron/server.log:2017-08-19 14:40:09.544 6929 DEBUG neutron.db.quota.driver [req-419b306b-1259-4c9b-928c-5f1c24863393 b0772c0c215743c7b5a7957d93393e43 230c24dced0c4cdfa9ec68555ad7557e - - -] Resources subnetpool,port have unlimited quota limit. It is not required to calculate headroom  make_reservation /usr/lib/python2.7/site-packages/neutron/db/quota/driver.py:190
/var/log/neutron/server.log:2017-08-19 14:40:09.547 6929 DEBUG neutron.quota.resource [req-419b306b-1259-4c9b-928c-5f1c24863393 b0772c0c215743c7b5a7957d93393e43 230c24dced0c4cdfa9ec68555ad7557e - - -] Usage tracker for resource:network and tenant:230c24dced0c4cdfa9ec68555ad7557e is out of sync, need to count used quota count /usr/lib/python2.7/site-packages/neutron/quota/resource.py:270
/var/log/neutron/server.log:2017-08-19 14:40:09.550 6929 DEBUG neutron.quota.resource [req-419b306b-1259-4c9b-928c-5f1c24863393 b0772c0c215743c7b5a7957d93393e43 230c24dced0c4cdfa9ec68555ad7557e - - -] Quota usage for network was recalculated. Used quota:6. count /usr/lib/python2.7/site-packages/neutron/quota/resource.py:289
/var/log/neutron/server.log:2017-08-19 14:40:09.552 6929 DEBUG neutron.db.quota.driver [req-419b306b-1259-4c9b-928c-5f1c24863393 b0772c0c215743c7b5a7957d93393e43 230c24dced0c4cdfa9ec68555ad7557e - - -] Attempting to reserve 1 items for resource network. Total usage: 7; quota limit: 10; headroom:3 make_reservation /usr/lib/python2.7/site-packages/neutron/db/quota/driver.py:222
/var/log/neutron/server.log:2017-08-19 14:40:09.603 6929 DEBUG neutron.plugins.ml2.drivers.type_flat [req-419b306b-1259-4c9b-928c-5f1c24863393 b0772c0c215743c7b5a7957d93393e43 230c24dced0c4cdfa9ec68555ad7557e - - -] Reserving flat network on physical network tenant reserve_provider_segment /usr/lib/python2.7/site-packages/neutron/plugins/ml2/drivers/type_flat.py:106
/var/log/neutron/server.log:2017-08-19 14:40:09.643 6929 INFO neutron.api.v2.resource [req-419b306b-1259-4c9b-928c-5f1c24863393 b0772c0c215743c7b5a7957d93393e43 230c24dced0c4cdfa9ec68555ad7557e - - -] create failed (client error): There was a conflict when trying to complete your request.
/var/log/neutron/server.log:2017-08-19 14:40:09.644 6929 INFO neutron.wsgi [req-419b306b-1259-4c9b-928c-5f1c24863393 b0772c0c215743c7b5a7957d93393e43 230c24dced0c4cdfa9ec68555ad7557e - - -] 192.168.0.1 - - [19/Aug/2017 14:40:09] "POST /v2.0/networks.json HTTP/1.1" 409 349 0.107384
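
The 409 simply means a flat network already claims the "tenant" physical network, which can be confirmed directly (a sketch):

[stack@undercloud ~]$ neutron net-list | grep -w tenant
[stack@undercloud ~]$ neutron net-show tenant | grep -E 'provider|status'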

We tried to bypass this by forcing the network update, adding:

NetworkDeploymentActions: ['CREATE','UPDATE']

Updating network configuration after a deployment With Red Hat OpenStack Platform Director 
https://access.redhat.com/solutions/2213711

However the update failed with the same issue.
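
For reference, the parameter was set in the parameter_defaults section of network-environment.yaml, alongside the Tenant* parameters shown above (a sketch):

parameter_defaults:
  NetworkDeploymentActions: ['CREATE','UPDATE']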

Comment 15 bigswitch 2018-03-09 00:41:04 UTC
We tried the same workaround, adding in network-environment.yaml:

NetworkDeploymentActions: ['CREATE','UPDATE'] 

Still the same issue: it tries to create the same network again.

We also tried adding the following:

NetworkDeploymentActions: ['CREATE']

Still the same issue; it errors out because the network already exists.

Comment 19 Bob Fournier 2018-10-17 14:37:19 UTC
Some things I see from the templates and deploy command...

They aren't including network-isolation.yaml in the deployment, but have included the bulk of the contents of network-isolation.yaml in the network-environment.yaml file:
  OS::TripleO::Compute::Net::SoftwareConfig: /home/stack/templates/nic-configs/compute.yaml
  OS::TripleO::Controller::Net::SoftwareConfig: /home/stack/templates/nic-configs/controller.yaml

*** This section is normally in network-isolation.yaml
  OS::TripleO::Network::External: /usr/share/openstack-tripleo-heat-templates/network/external.yaml
  OS::TripleO::Network::InternalApi: /usr/share/openstack-tripleo-heat-templates/network/internal_api.yaml
  OS::TripleO::Network::Storage: /usr/share/openstack-tripleo-heat-templates/network/storage.yaml
...

This breaks the normal inclusion order of network-isolation.yaml first and network-environment.yaml second, but as such is probably not a problem.  The main issue is that it's not possible to tell whether they did a deployment first without network-isolation.yaml and then modified network-environment.yaml at a later date to include the settings for OS::TripleO::Network::External.  Based on error messages such as the following, it's likely that at one point the deployment did not include the isolated network definitions:
Conflict: resources.ExternalNetwork: Unable to create the flat network. Physical network external is in use.

This update failure message also indicates there has been a change to the VipPort network definition between deployments, most likely from non-isolated to isolated.
NotSupported: resources.VipPort: Update to properties network of VipPort (OS::Neutron::Port) is not supported.
overcloud.StorageMgmtVirtualIP:
  resource_type: OS::TripleO::Network::Ports::StorageMgmtVipPort
  physical_resource_id: 286d8922-2aaa-4db4-8542-a3cfd612d735
  status: UPDATE_FAILED
  status_reason: |
    UPDATE aborted
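
A sketch for digging into that failed VIP resource (assuming the physical_resource_id above is the nested port stack):

[stack@undercloud ~]$ openstack stack resource show overcloud StorageMgmtVirtualIP
[stack@undercloud ~]$ openstack stack resource list 286d8922-2aaa-4db4-8542-a3cfd612d735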

As for how to recover from this: from what I understand, it's recommended to delete the existing set of resources and redeploy; I know some folks in the field have done this previously (e.g. https://bugzilla.redhat.com/show_bug.cgi?id=1572017).  It's best to restore to a known working config, but I don't know if there are any backups available.

Comment 20 Bob Fournier 2018-10-17 14:47:23 UTC
Also, it would be useful to see the output of "openstack port list".
Based on this stack failure:

Update to properties network of VipPort (OS::Neutron::Port) is not supported.
overcloud.StorageMgmtVirtualIP:
  resource_type: OS::TripleO::Network::Ports::StorageMgmtVipPort
  physical_resource_id: 286d8922-2aaa-4db4-8542-a3cfd612d735
  status: UPDATE_FAILED
  status_reason: |
    UPDATE aborted

It looks like the StorageMgmtVipPort was created at some point and that output will confirm it.
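
For example (a sketch; the VIP port name should contain the network name):

[stack@undercloud ~]$ openstack port list | grep -i storage_mgmt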

In network-environment.yaml, no StorageMgmtVipPort is being created now:
# Port assignments for the VIPs
  OS::TripleO::Network::Ports::ExternalVipPort: /usr/share/openstack-tripleo-heat-templates/network/ports/external.yaml
  OS::TripleO::Network::Ports::InternalApiVipPort: /usr/share/openstack-tripleo-heat-templates/network/ports/internal_api.yaml
  OS::TripleO::Network::Ports::StorageVipPort: /usr/share/openstack-tripleo-heat-templates/network/ports/storage.yaml
  OS::TripleO::Network::Ports::RedisVipPort: /usr/share/openstack-tripleo-heat-templates/network/ports/vip.yaml

which is why the error occurs.

Comment 25 Bob Fournier 2018-10-18 14:05:30 UTC
Daniel - I agree.

Also, since they're not including network-isolation.yaml but have partially included the definitions from network-isolation.yaml in network-environment.yaml, it's possible they are missing some necessary settings. At a minimum I'd recommend adding the definitions for the StorageMgmt network, even though this network is not being used, e.g. (a path-adjusted sketch follows the list):

 OS::TripleO::Network::StorageMgmt: ../network/storage_mgmt.yaml
 OS::TripleO::Network::Ports::StorageMgmtVipPort: ../network/ports/storage_mgmt.yaml
 OS::TripleO::Controller::Ports::StorageMgmtPort: ../network/ports/storage_mgmt.yaml
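
The same entries with paths adjusted to the installed templates, matching the other registry entries in this network-environment.yaml (a sketch):

  OS::TripleO::Network::StorageMgmt: /usr/share/openstack-tripleo-heat-templates/network/storage_mgmt.yaml
  OS::TripleO::Network::Ports::StorageMgmtVipPort: /usr/share/openstack-tripleo-heat-templates/network/ports/storage_mgmt.yaml
  OS::TripleO::Controller::Ports::StorageMgmtPort: /usr/share/openstack-tripleo-heat-templates/network/ports/storage_mgmt.yaml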

This should help with the error:
Update to properties network of VipPort (OS::Neutron::Port) is not supported.
overcloud.StorageMgmtVirtualIP:

Following that, it's likely there could be other issues with networks already in use.  As indicated earlier, it's very hard to tell the history here and what ports were defined in network-environment.yaml in a previous deployment, since they didn't include the standard network-isolation.yaml and we don't know the revision history of network-environment.yaml.  There probably are existing neutron DB entries that need to be removed; a quick survey is sketched below.
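
A sketch for surveying what neutron currently holds on the undercloud before removing anything:

[stack@undercloud ~]$ neutron net-list
[stack@undercloud ~]$ neutron subnet-list
[stack@undercloud ~]$ neutron port-list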

Comment 27 Bob Fournier 2018-10-18 16:01:16 UTC
You can try the changes in Comment 25 to add the StorageMgmtVipPort, since network-isolation.yaml was not used; this may help with the failure from Comment 19. With no backups, it may be necessary to remove existing neutron DB entries after this.

Comment 28 Bob Fournier 2018-10-18 17:36:10 UTC
It looks like the problem was first introduced on 10/10: we can see isolated ports in the deployment being changed to non-isolated (noop.yaml), which isn't good:

heat-engine.log-20181013.gz:2018-10-10 13:55:15.938 3545 WARNING heat.engine.environment [req-9740b97b-7163-4503-b3a8-9431a3f6678f - - - - -] Changing OS::TripleO::Compute::Ports::InternalApiPort from http://100.71.249.40:8080/v1/AUTH_b45f28b402d54025bf6a9d023ce96d58/overcloud/user-files/usr/share/openstack-tripleo-heat-templates/network/ports/internal_api.yaml to http://100.71.249.40:8080/v1/AUTH_b45f28b402d54025bf6a9d023ce96d58/overcloud/network/ports/noop.yaml
heat-engine.log-20181013.gz:2018-10-10 13:55:15.940 3545 WARNING heat.engine.environment [req-9740b97b-7163-4503-b3a8-9431a3f6678f - - - - -] Changing OS::TripleO::Controller::Ports::StoragePort from http://100.71.249.40:8080/v1/AUTH_b45f28b402d54025bf6a9d023ce96d58/overcloud/user-files/usr/share/openstack-tripleo-heat-templates/network/ports/storage.yaml to http://100.71.249.40:8080/v1/AUTH_b45f28b402d54025bf6a9d023ce96d58/overcloud/network/ports/noop.yaml

It looks like that resulted in these errors:
| Networks                                  | 37f41c7d-458d-49e6-91b3-5c0533face2e | resources.Networks: Conflict: resources.StorageNetwork.resources.StorageSubnet: Unable to complete operation on subnet 005f5d92-82b5-4a05-9e2a-ebc8d5adf169: One or more ports have an IP allocation from this subnet. | UPDATE_FAILED      | 2018-10-10T13:56:44Z |
|                                           |                                      | Neutron server returns request_ids: ['re                                                                                                                                                                               |                    |                      |
| overcloud                                 | 6aae59cb-db4d-4781-a1d8-bc21ad0e6201 | resources.Networks: Conflict: resources.StorageNetwork.resources.StorageSubnet: Unable to complete operation on subnet 005f5d92-82b5-4a05-9e2a-ebc8d5adf169: One or more ports have an IP allocation from this subnet. | UPDATE_FAILED      | 2018-10-10T13:56:44Z |


It's not clear what caused the isolated ports to change to noop.yaml, whether it was the "openstack overcloud node delete" or whether the templates changed.
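
To enumerate every registry change of this kind, something like this can be run on the undercloud (a sketch, assuming the default log location):

[root@undercloud ~]# zgrep 'Changing OS::TripleO' /var/log/heat/heat-engine.log*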


On 10/11 we see those ports changed back from noop.yaml to an isolated port:

heat-engine.log-20181013.gz:2018-10-11 14:01:10.635 3545 WARNING heat.engine.environment [req-e442a727-b2c0-48eb-bf70-a90e09dcd8bd - - - - -] Changing OS::TripleO::Compute::Ports::InternalApiPort from http://100.71.249.40:8080/v1/AUTH_b45f28b402d54025bf6a9d023ce96d58/overcloud/network/ports/noop.yaml to http://100.71.249.40:8080/v1/AUTH_b45f28b402d54025bf6a9d023ce96d58/overcloud/user-files/usr/share/openstack-tripleo-heat-templates/network/ports/internal_api.yaml
heat-engine.log-20181013.gz:2018-10-11 14:01:10.636 3545 WARNING heat.engine.environment [req-e442a727-b2c0-48eb-bf70-a90e09dcd8bd - - - - -] Changing OS::TripleO::Controller::Ports::StoragePort from http://100.71.249.40:8080/v1/AUTH_b45f28b402d54025bf6a9d023ce96d58/overcloud/network/ports/noop.yaml to http://100.71.249.40:8080/v1/AUTH_b45f28b402d54025bf6a9d023ce96d58/overcloud/user-files/usr/share/openstack-tripleo-heat-templates/network/ports/storage.yaml

which caused:
| Networks                                  | df2caf81-3999-401f-95be-3a867472495a | resources.Networks: resources.InternalNetwork: Conflict: resources.InternalApiNetwork: Unable to create the flat network. Physical network internal_api is in use.                                                     | UPDATE_FAILED      | 2018-10-11T14:07:14Z |
|                                           |                                      | Neutron server returns request_ids: ['req-4cbe16be-5511-454a-b010-a8304a57bbd1']                                                                                                                                       |                    |                      |
| overcloud                                 | 7316fd54-b2da-439e-81ea-c38e69aad139 | resources.Networks: resources.InternalNetwork: Conflict: resources.InternalApiNetwork: Unable to create the flat network. Physical network internal_api is in use.                                                     | UPDATE_FAILED      | 2018-10-11T14:07:14Z |
|                                           |                                      | Neutron server returns request_ids: ['req-4cbe16be-5511-454a-b010-a8304a57bbd1'] 


BTW - hold off on making the change to StorageMgmtVipPort in network-environment.yaml for now.

Comment 29 Bob Fournier 2018-10-18 20:36:07 UTC
So however it happened, the heat database is in a problematic state for the InternalApi, External, and Storage networks since Oct. 10, which is why deploys since then are failing. We are working on a plan to recover the heat database and will have more shortly.

Comment 31 Bob Fournier 2018-10-19 20:58:08 UTC
Thomas has updated the case:

============================================================

I had a look at the database dump. It does indeed look like a "classic" environment change. I crafted the following queries, which basically cancel out the updates that were done. They are to be run on the undercloud against the heat database:

# Remove old internal subnet
DELETE from resource WHERE id=2013;

# Update new one
UPDATE resource set nova_instance='4296e4d2-8494-439f-9c01-6d4f5b616d80', action='CREATE' WHERE id=3339;

# Remove old external subnet
DELETE from resource WHERE id=2015;

# Update new one
UPDATE resource set nova_instance='d9f95eaf-6271-4022-9ff7-0cc293e8fc2c', action='CREATE' WHERE id=3345;

# Remove old storage subnet
DELETE from resource WHERE id=2017;

# Update new one
UPDATE resource set nova_instance='005f5d92-82b5-4a05-9e2a-ebc8d5adf169', action='CREATE' WHERE id=3342;

# Remove old internal network
DELETE from resource WHERE id=2014;

# Update new one
UPDATE resource set nova_instance='3cfad9eb-3306-44ab-8319-a5cea6cecb00', status='COMPLETE' WHERE id=3380;

# Remove old external network
DELETE from resource WHERE id=2016;

# Update new one
UPDATE resource set nova_instance='d443be1e-7196-48d6-8bce-f16fb0d330d5', status='COMPLETE' WHERE id=3381;

# Remove old storage network
DELETE from resource WHERE id=2018;

# Update new one
UPDATE resource set nova_instance='11c15afb-f2f6-46cd-978c-4ace5f2538cc', status='COMPLETE' WHERE id=3382;


Once done, restart heat-engine and retry the deployment.
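
A sketch of applying this on the undercloud (back up the database first; assuming root DB access via the local socket and the OSP 10 service name):

[root@undercloud ~]# mysqldump heat > /root/heat-db-backup.sql
[root@undercloud ~]# mysql heat -e "DELETE FROM resource WHERE id=2013;"   # repeat for each statement above
[root@undercloud ~]# systemctl restart openstack-heat-engine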

Comment 36 Bob Fournier 2019-01-02 14:17:18 UTC
Regarding the RCA, from the case, which has since been closed:
Trying to retrace the issue, the problem for heat first appears here:

2018-10-10 13:55:15.938 3545 WARNING heat.engine.environment: Changing OS::TripleO::Compute::Ports::InternalApiPort from internal_api.yaml to noop.yaml

From mistral engine logs we can see:

2018-10-10 13:53:28.682 1540 INFO workflow_trace [-] Starting workflow [name=tripleo.plan_management.v1.update_deployment_plan..]
2018-10-10 13:54:53.811 1540 INFO workflow_trace [-] Starting workflow [name=tripleo.scale.v1.delete_node..]

node delete doesn't call update_deployment_plan; node deploy does.

In the mistral api logs:

2018-10-10 13:53:39.456 3515 INFO mistral.api.controllers.v2.environment [-] Update environment ... u'environments': [{u'path': u'overcloud-resource-registry-puppet.yaml'}]}

user-environment is missing here.

There are no corresponding deploy_plan workflow calls, so it's likely that they ran "openstack overcloud deploy --update-plan-only --templates" without the environment files.
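
For comparison, a plan-only update that keeps the plan intact must pass the very same environment files as the original deploy (a sketch; the -e list here is illustrative):

[stack@undercloud ~]$ openstack overcloud deploy --update-plan-only --templates \
    -e /home/stack/templates/network-environment.yaml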

========

Closing this bug.

