Description of problem:

> According to ticket number ####, support suggested that I run an update with this line commented out:
> # - /home/stack/openstack-tripleo-heat-templates/environments/network-isolation.yaml

First I tried to deploy with that line commented out and the error occurred. Now, even with the line uncommented, the problem still occurs.

So if I understand the update history here, the latest two stack updates that were run used this command with these contents. First, with network-isolation.yaml commented out in answers.yaml:

~~~
openstack overcloud deploy -r /home/stack/templates/roles_data.yaml --answers-file ~/answers.yaml --ntp-server a.ntp.br,b.ntp.br,c.ntp.br,pool.ntp.br

answers.yaml:
templates: /home/stack/openstack-tripleo-heat-templates/
environments:
  - /home/stack/templates/node-info.yaml
  - /home/stack/templates/ports.yaml
  - /home/stack/templates/compute-hci.yaml
  - /home/stack/templates/overcloud_images.yaml
  - /home/stack/templates/environment-file-1.yaml
  - /home/stack/templates/cinder-dellps-config.yaml
  - /usr/share/openstack-tripleo-heat-templates/environments/cinder-backup.yaml
  - /usr/share/openstack-tripleo-heat-templates/environments/ceph-ansible/ceph-ansible.yaml
  - /usr/share/openstack-tripleo-heat-templates/environments/ceph-ansible/ceph-rgw.yaml
  - /home/stack/templates/storage-config.yaml
  - /home/stack/templates/ceph-config.yaml
  - /home/stack/templates/ceph-config-per_node.yaml
  # - /home/stack/openstack-tripleo-heat-templates/environments/network-isolation.yaml
  - /home/stack/templates/network-environment.yaml
  - /usr/share/openstack-tripleo-heat-templates/environments/tls-endpoints-public-dns.yaml
  - /home/stack/templates/rhel-registration/environment-rhel-registration.yaml
  - /home/stack/templates/rhel-registration/rhel-registration-resource-registry.yaml
  - /home/stack/templates/enable-tls.yaml
  - /usr/share/openstack-tripleo-heat-templates/environments/services/ironic.yaml
  - /home/stack/templates/ironic.yaml
  - /home/stack/templates/fencing.yaml
  - /usr/share/openstack-tripleo-heat-templates/environments/services/octavia.yaml
  - /usr/share/openstack-tripleo-heat-templates/environments/auditd.yaml
  - /home/stack/templates/remove_manila.yaml
~~~

Then the stack failed, right? Then you tried re-running the same deploy command, but with network-isolation uncommented in answers.yaml:

~~~
openstack overcloud deploy -r /home/stack/templates/roles_data.yaml --answers-file ~/answers.yaml --ntp-server a.ntp.br,b.ntp.br,c.ntp.br,pool.ntp.br

answers.yaml:
templates: /home/stack/openstack-tripleo-heat-templates/
environments:
  - /home/stack/templates/node-info.yaml
  - /home/stack/templates/ports.yaml
  - /home/stack/templates/compute-hci.yaml
  - /home/stack/templates/overcloud_images.yaml
  - /home/stack/templates/environment-file-1.yaml
  - /home/stack/templates/cinder-dellps-config.yaml
  - /usr/share/openstack-tripleo-heat-templates/environments/cinder-backup.yaml
  - /usr/share/openstack-tripleo-heat-templates/environments/ceph-ansible/ceph-ansible.yaml
  - /usr/share/openstack-tripleo-heat-templates/environments/ceph-ansible/ceph-rgw.yaml
  - /home/stack/templates/storage-config.yaml
  - /home/stack/templates/ceph-config.yaml
  - /home/stack/templates/ceph-config-per_node.yaml
  - /home/stack/openstack-tripleo-heat-templates/environments/network-isolation.yaml
  - /home/stack/templates/network-environment.yaml
  - /usr/share/openstack-tripleo-heat-templates/environments/tls-endpoints-public-dns.yaml
  - /home/stack/templates/rhel-registration/environment-rhel-registration.yaml
  - /home/stack/templates/rhel-registration/rhel-registration-resource-registry.yaml
  - /home/stack/templates/enable-tls.yaml
  - /usr/share/openstack-tripleo-heat-templates/environments/services/ironic.yaml
  - /home/stack/templates/ironic.yaml
  - /home/stack/templates/fencing.yaml
  - /usr/share/openstack-tripleo-heat-templates/environments/services/octavia.yaml
  - /usr/share/openstack-tripleo-heat-templates/environments/auditd.yaml
  - /home/stack/templates/remove_manila.yaml
~~~

And that's where we are right now with regard to the stack deploy/update commands issued, and the stack failure? Then I assume that, sometime before these two latest deploy commands, the most recent stack deploy/update command that ran SUCCESSFULLY was actually the one below, before trying to comment out network-isolation:

~~~
openstack overcloud deploy -r /home/stack/templates/roles_data.yaml --answers-file ~/answers.yaml --ntp-server a.ntp.br,b.ntp.br,c.ntp.br,pool.ntp.br

answers.yaml:
templates: /home/stack/openstack-tripleo-heat-templates/
environments:
  - /home/stack/templates/node-info.yaml
  - /home/stack/templates/ports.yaml
  - /home/stack/templates/compute-hci.yaml
  - /home/stack/templates/overcloud_images.yaml
  - /home/stack/templates/environment-file-1.yaml
  - /home/stack/templates/cinder-dellps-config.yaml
  - /usr/share/openstack-tripleo-heat-templates/environments/cinder-backup.yaml
  - /usr/share/openstack-tripleo-heat-templates/environments/ceph-ansible/ceph-ansible.yaml
  - /usr/share/openstack-tripleo-heat-templates/environments/ceph-ansible/ceph-rgw.yaml
  - /home/stack/templates/storage-config.yaml
  - /home/stack/templates/ceph-config.yaml
  - /home/stack/templates/ceph-config-per_node.yaml
  - /home/stack/openstack-tripleo-heat-templates/environments/network-isolation.yaml
  - /home/stack/templates/network-environment.yaml
  - /usr/share/openstack-tripleo-heat-templates/environments/tls-endpoints-public-dns.yaml
  - /home/stack/templates/rhel-registration/environment-rhel-registration.yaml
  - /home/stack/templates/rhel-registration/rhel-registration-resource-registry.yaml
  - /home/stack/templates/enable-tls.yaml
  - /usr/share/openstack-tripleo-heat-templates/environments/services/ironic.yaml
  - /home/stack/templates/ironic.yaml
  - /home/stack/templates/fencing.yaml
  - /usr/share/openstack-tripleo-heat-templates/environments/services/octavia.yaml
  - /usr/share/openstack-tripleo-heat-templates/environments/auditd.yaml
  - /home/stack/templates/remove_manila.yaml
~~~

Do I understand the deploy command/template history correctly? I ask these questions to make sure I'm not missing anything here, and there are quite a few templates to parse through; your knowledge of the history might speed things up.

Thank you,
-Andrew

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1.
2.
3.

Actual results:

Stack overcloud UPDATE_FAILED

overcloud.AllNodesDeploySteps.WorkflowTasks_Step2_Execution:
  resource_type: OS::Mistral::ExternalResource
  physical_resource_id: 5e18c1b6-4f88-44c5-983c-9ca4c8df5387
  status: CREATE_FAILED
  status_reason: |
    resources.WorkflowTasks_Step2_Execution: ERROR
overcloud.Networks.InternalNetwork.InternalApiNetwork:
  resource_type: OS::Neutron::Net
  physical_resource_id:
  status: CREATE_FAILED
  status_reason: |
    Conflict: resources.InternalApiNetwork: Unable to create the flat network. Physical network internal_api is in use.
    Neutron server returns request_ids: ['req-2966d94f-8ef8-47f8-b5c2-825fa312f774']
overcloud.Networks.StorageMgmtNetwork.StorageMgmtNetwork:
  resource_type: OS::Neutron::Net
  physical_resource_id:
  status: CREATE_FAILED
  status_reason: |
    Conflict: resources.StorageMgmtNetwork: Unable to create the flat network. Physical network storage_mgmt is in use.
    Neutron server returns request_ids: ['req-ad21eb7e-acb1-4598-9fdb-5cb2dfb9755e']
overcloud.Networks.StorageNetwork.StorageNetwork:
  resource_type: OS::Neutron::Net
  physical_resource_id:
  status: CREATE_FAILED
  status_reason: |
    Conflict: resources.StorageNetwork: Unable to create the flat network. Physical network storage is in use.
    Neutron server returns request_ids: ['req-4e86fd73-d963-4b8c-bedb-cae28b35b0bb']
overcloud.Networks.TenantNetwork.TenantNetwork:
  resource_type: OS::Neutron::Net
  physical_resource_id:
  status: CREATE_FAILED
  status_reason: |
    Conflict: resources.TenantNetwork: Unable to create the flat network. Physical network tenant is in use.
    Neutron server returns request_ids: ['req-44b521e9-387f-4bbd-a8cf-d05a7ce583fc']
overcloud.Networks.ExternalNetwork.ExternalNetwork:
  resource_type: OS::Neutron::Net
  physical_resource_id:
  status: CREATE_FAILED
  status_reason: |
    Conflict: resources.ExternalNetwork: Unable to create the flat network. Physical network external is in use.
    Neutron server returns request_ids: ['req-89b07cd7-8662-47b2-992b-6a0a3e7c861a']

Heat Stack update failed.
Heat Stack update failed.

Expected results:
Overcloud update.

Additional info:
*** Bug 1572685 has been marked as a duplicate of this bug. ***
If you deploy without network-isolation.yaml and then deploy WITH network-isolation.yaml, it is expected that the deployment would fail with errors like "Unable to create the flat network. Physical network internal_api is in use". On the update, Heat would create ports for all the nodes on the isolated networks, but ports already exist on the ctlplane from the previous deployment (with no network-isolation.yaml). So if that is the sequence of deployments, this is as expected.
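One way to see this state on the undercloud is to check which node ports already live on the provisioning network before toggling network isolation. A minimal sketch, with an invented saved listing; on a real undercloud the input would come from something like `openstack port list --network ctlplane`:

```shell
# Hypothetical saved port listing (IDs and names are made up for illustration).
cat > /tmp/ctlplane_ports.txt <<'EOF'
11111111-aaaa controller-0_ctlplane
22222222-bbbb compute-0_ctlplane
33333333-cccc compute-1_ctlplane
EOF

# Count node ports already allocated on the provisioning network; a non-zero
# count means a later switch to isolated networks will hit port conflicts.
grep -c '_ctlplane$' /tmp/ctlplane_ports.txt
```

With the sample data above this prints 3, i.e. three node ports already exist on the ctlplane.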
Summary of Problem:

The OpenStack deploy worked fine with these templates, and the customer added 3 ceph nodes with no issues after the initial deployment. Changes were then made to that same template set to remove manila:

~~~
diff deploy-error.tar.gz/templates/remove_manila.yaml deploy-rh.tar.gz/templates/remove_manila.yaml
5c5
< OS::TripleO::Services::ManilaBackendGeneric: OS::Heat::None
---
> OS::TripleO::Services::ManilaBackendCephFs: OS::Heat::None
~~~

This was the only change, and the redeploy failed. Errors below:

~~~
2018-03-12 17:05:57Z [overcloud-Networks-teq7ain3shg3-TenantNetwork-xxxxx.TenantNetwork]: CREATE_FAILED Conflict: resources.TenantNetwork: Unable to create the flat network. Physical network tenant is in use. Neutron server returns request_ids: ['req-44b521e9-387f-4bbd-a8cf-d05a7ce583fc']
2018-03-12 17:05:57Z [overcloud-Networks-teq7ain3shg3.StorageMgmtNetwork]: UPDATE_IN_PROGRESS state changed
2018-03-12 17:05:57Z [overcloud-Networks-xxxx-ManagementNetwork-zyywtktmpe3v]: UPDATE_IN_PROGRESS Stack UPDATE started
2018-03-12 17:05:58Z [overcloud-Networks-xxxx-ManagementNetwork-zyywtktmpe3v]: UPDATE_COMPLETE Stack UPDATE completed successfully
2018-03-12 17:05:58Z [overcloud-Networks-xxxx-TenantNetwork-xxxx]: UPDATE_FAILED Conflict: resources.TenantNetwork: Unable to create the flat network. Physical network tenant is in use.
~~~

Trying to understand why the redeploy failed with such a minor change.
Created attachment 1435156 [details] Initial deployment failure logs
Thanks Stan. As we talked about offline, we'll use this BZ to track the initial deployment failure and https://bugzilla.redhat.com/show_bug.cgi?id=1572017 to track the subsequent failure and the potential database recovery efforts. The DFG:DF should be involved to help with 1572017 and the recovery efforts. I've added the logs to this BZ from the case for this initial deployment failure. In addition, a BZ referenced in the case may be relevant here, although that BZ is from OSP-10 - https://bugzilla.redhat.com/show_bug.cgi?id=1483246. And, no problem... ;-)
From the initial set of logs in the case (sosreport-manager.eveocloud.net.02053305-20180312141530) we see these heat/neutron errors across multiple subnets, which are most likely the source of the problem:

~~~
heat/heat-engine.log:2018-03-12 13:50:24.852 3824 DEBUG neutronclient.v2_0.client [req-a0400374-96df-4a43-b56c-add8f668581e - admin - default default] Error message: {"NeutronError": {"message": "Unable to complete operation on subnet 1aba3e87-2dc6-4ddb-b120-a27a6143151f: One or more ports have an IP allocation from this subnet.", "type": "SubnetInUse", "detail": ""}} _handle_fault_response /usr/lib/python2.7/site-packages/neutronclient/v2_0/client.py:258
~~~

which results in many of these heat stack traces:

~~~
heat/heat-engine.log:2018-03-12 13:51:32.064 3827 ERROR heat.engine.resource Conflict: Unable to complete operation on subnet 23c47e0b-0f97-42b5-b184-0e87f80aa7a3: One or more ports have an IP allocation from this subnet.
~~~

This is a fairly common error message that can be due to multiple causes, usually a change in the nic config templates between deployments that causes new ports to be created, e.g.:

https://bugzilla.redhat.com/show_bug.cgi?id=1353735
https://bugzilla.redhat.com/show_bug.cgi?id=1429081

At this point it is difficult to go back and determine what exact configuration change caused the problem without the port list after the failed deployment, but there appear to be significant changes between the two deployments based on the answers files that were used (answers-complete-19-01-09.yaml was the initial deployment and answers.yaml was the one that failed):

~~~
bfournie-OSX:bug1572686 bfournie$ diff answers-complete-19-01-09.yaml answers.yaml
9,11c9,11
< - /home/stack/openstack-tripleo-heat-templates/environments/cinder-backup.yaml
< - /home/stack/openstack-tripleo-heat-templates/environments/ceph-ansible/ceph-ansible.yaml
< - /home/stack/openstack-tripleo-heat-templates/environments/ceph-ansible/ceph-rgw.yaml
---
> - /usr/share/openstack-tripleo-heat-templates/environments/cinder-backup.yaml
> - /usr/share/openstack-tripleo-heat-templates/environments/ceph-ansible/ceph-ansible.yaml
> - /usr/share/openstack-tripleo-heat-templates/environments/ceph-ansible/ceph-rgw.yaml
13a14
> - /home/stack/templates/ceph-config-per_node.yaml
16c17
< - /home/stack/openstack-tripleo-heat-templates/environments/tls-endpoints-public-dns.yaml
---
> - /usr/share/openstack-tripleo-heat-templates/environments/tls-endpoints-public-dns.yaml
20c21
< - /home/stack/openstack-tripleo-heat-templates/environments/services/ironic.yaml
---
> - /usr/share/openstack-tripleo-heat-templates/environments/services/ironic.yaml
21a23,26
> - /home/stack/templates/fencing.yaml
> - /usr/share/openstack-tripleo-heat-templates/environments/services/octavia.yaml
> - /usr/share/openstack-tripleo-heat-templates/environments/auditd.yaml
> - /home/stack/templates/remove_manila.yaml
~~~

I assume this has been tried, but did you do a deployment using the initial answers file? In conjunction with the output of "openstack port list", it might help point to which ports got created erroneously and can be removed.
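If stale ports do turn up in the "openstack port list" output, a cautious cleanup could be scripted along these lines. This is only a sketch: the port IDs below are invented, and the echo is a dry-run stand-in for the real `openstack port delete`.

```shell
# Invented stale-port IDs; in practice this file would be produced by
# filtering `openstack port list` to the affected subnet.
printf '%s\n' \
  'aaaaaaaa-1111-stale' \
  'bbbbbbbb-2222-stale' > /tmp/stale_ports.txt

while read -r port_id; do
  # Dry run only; swap echo for `openstack port delete "$port_id"` once each
  # ID has been verified against the failed-deployment port list.
  echo "would delete port $port_id"
done < /tmp/stale_ports.txt
```

Running the dry run first makes it easy to review exactly which ports would be removed before touching Neutron.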
Bob,

So what could be tried is to use an undercloud db from before the issue and try a deployment with corrections, to get to a clean undercloud db? I do not think we tried this method. The other way would be to take the damaged db, delete the networks, and redeploy. Let me get the "openstack port list" output and give it a try.

Thanks,
Stan
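The db-restore option above might look roughly like the following. This is a dry-run sketch only (the commands are echoed, not executed), the dump filenames are hypothetical, and an actual backup/restore of the undercloud heat database should be coordinated with support.

```shell
# Dry-run sketch: print the intended recovery commands instead of running them.
damaged_backup="/home/stack/heat-damaged.sql"   # hypothetical path
known_good="/home/stack/heat-known-good.sql"    # hypothetical pre-failure dump

# 1. Preserve the damaged DB before any recovery attempt.
echo "mysqldump heat > $damaged_backup"

# 2. Restore the pre-failure dump, then retry the deploy with corrected templates.
echo "mysql heat < $known_good"
```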
To further clarify what template files changed between the initial and the first failed deployment (i.e. answers-complete-19-01-09.yaml vs answers.yaml): I compared the differences between the local files and the OSP-12 installed files, and all are the same. The additional includes of these files in the failed deployment are the only differences:

/home/stack/templates/ceph-config-per_node.yaml
/home/stack/templates/fencing.yaml
/usr/share/openstack-tripleo-heat-templates/environments/services/octavia.yaml
/usr/share/openstack-tripleo-heat-templates/environments/auditd.yaml
/home/stack/templates/remove_manila.yaml

Of these, the change in remove_manila.yaml would be the most intrusive, as it sets all the manila services to None:

~~~
resource_registry:
  OS::TripleO::Services::ManilaApi: OS::Heat::None
  OS::TripleO::Services::ManilaScheduler: OS::Heat::None
  OS::TripleO::Services::ManilaShare: OS::Heat::None
  OS::TripleO::Services::ManilaBackendGeneric: OS::Heat::None
~~~

which causes these heat warnings:

~~~
heat-engine.log:2018-03-12 13:49:46.786 3828 WARNING heat.engine.environment [req-b7d64eb5-f61a-4d13-9df9-87a6ed1d8b2f - - - - -] Changing OS::TripleO::Services::ManilaScheduler from https://manager.eveocloud.net:13808/v1/AUTH_4c2b30aaf2514fdda1379512277503a7/overcloud/puppet/services/manila-scheduler.yaml to OS::Heat::None
heat-engine.log:2018-03-12 13:49:46.788 3828 WARNING heat.engine.environment [req-b7d64eb5-f61a-4d13-9df9-87a6ed1d8b2f - - - - -] Changing OS::TripleO::Services::ManilaBackendCephFs from https://manager.eveocloud.net:13808/v1/AUTH_4c2b30aaf2514fdda1379512277503a7/overcloud/puppet/services/manila-backend-cephfs.yaml to OS::Heat::None
heat-engine.log:2018-03-12 13:49:46.789 3828 WARNING heat.engine.environment [req-b7d64eb5-f61a-4d13-9df9-87a6ed1d8b2f - - - - -] Changing OS::TripleO::Services::ManilaApi from https://manager.eveocloud.net:13808/v1/AUTH_4c2b30aaf2514fdda1379512277503a7/overcloud/puppet/services/manila-api.yaml to OS::Heat::None
~~~

Although I don't think that would cause the "One or more ports have an IP allocation from this subnet" issue. One thing that's not possible to tell is whether network-environment.yaml changed between deployments, i.e. IP assignments, vlan/vxlan changes, etc.
Recreated the failure with the network conflict. Only added network-isolation.yaml and remove_manila.yaml. It should have worked, because the initial creation used the same file with my additions.

test-deploy.sh:

~~~
#!/bin/bash
timeout 100m openstack overcloud deploy \
  --templates /usr/share/openstack-tripleo-heat-templates \
  --libvirt-type kvm \
  --ntp-server clock.redhat.com \
  -r /home/stack/templates/roles_data.yaml \
  -e /home/stack/templates/config_lvm.yaml \
  -e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml \
  -e /home/stack/templates/network/network-environment.yaml \
  -e /home/stack/templates/service_net_map.yaml \
  -e /home/stack/templates/hostnames.yml \
  -e /home/stack/templates/debug.yaml \
  -e /home/stack/templates/nodes_data.yaml \
  -e /home/stack/templates/docker-images.yaml \
  -e /home/stack/templates/remove_manila.yaml \
  --log-file overcloud_deployment_02.log
~~~

~~~
(undercloud) [stack@undercloud-0 templates]$ ./test-deploy.sh
Started Mistral Workflow tripleo.validations.v1.check_pre_deployment_validations. Execution ID: 523b5384-6795-4973-952c-5ce20f04affa
Waiting for messages on queue '789ace2f-33ca-4ba2-be13-e6b3d240e59b' with no timeout.
Removing the current plan files
Uploading new plan files
Started Mistral Workflow tripleo.plan_management.v1.update_deployment_plan. Execution ID: 7c4ec64f-10aa-43f9-a9de-f60ee086040c
Plan updated.
Processing templates in the directory /tmp/tripleoclient-2zNi8r/tripleo-heat-templates
Started Mistral Workflow tripleo.plan_management.v1.get_deprecated_parameters. Execution ID: e45caaea-7ebc-4942-91bb-2d2832d15d50
WARNING: Following parameters are deprecated and still defined. Deprecated parameters will be removed soon!
  OvercloudControlFlavor
Deploying templates in the directory /tmp/tripleoclient-2zNi8r/tripleo-heat-templates
Started Mistral Workflow tripleo.deployment.v1.deploy_plan. Execution ID: 069657b2-d788-4329-b994-df78aecbcd60
2018-05-19 18:41:34Z [Networks]: UPDATE_IN_PROGRESS state changed
2018-05-19 18:41:35Z [overcloud-Networks-2xixxy6jehcg]: UPDATE_IN_PROGRESS Stack UPDATE started
2018-05-19 18:41:35Z [DefaultPasswords]: UPDATE_IN_PROGRESS state changed
2018-05-19 18:41:35Z [overcloud-Networks-2xixxy6jehcg.InternalNetwork]: UPDATE_IN_PROGRESS state changed
2018-05-19 18:41:36Z [ServiceNetMap]: UPDATE_IN_PROGRESS state changed
2018-05-19 18:41:37Z [overcloud-Networks-2xixxy6jehcg.TenantNetwork]: UPDATE_IN_PROGRESS state changed
2018-05-19 18:41:37Z [overcloud-Networks-2xixxy6jehcg-InternalNetwork-dhjytfwuwd3d]: UPDATE_IN_PROGRESS Stack UPDATE started
2018-05-19 18:41:37Z [overcloud-ServiceNetMap-uxrbrby6jz5z]: UPDATE_IN_PROGRESS Stack UPDATE started
2018-05-19 18:41:37Z [overcloud-ServiceNetMap-uxrbrby6jz5z]: UPDATE_COMPLETE Stack UPDATE completed successfully
2018-05-19 18:41:38Z [overcloud-Networks-2xixxy6jehcg-InternalNetwork-dhjytfwuwd3d]: UPDATE_COMPLETE Stack UPDATE completed successfully
2018-05-19 18:41:38Z [DefaultPasswords]: UPDATE_COMPLETE state changed
2018-05-19 18:41:38Z [ServiceNetMap]: UPDATE_COMPLETE state changed
2018-05-19 18:41:38Z [overcloud-Networks-2xixxy6jehcg-TenantNetwork-hpy5rft765og]: UPDATE_IN_PROGRESS Stack UPDATE started
2018-05-19 18:41:38Z [overcloud-Networks-2xixxy6jehcg.StorageMgmtNetwork]: CREATE_IN_PROGRESS state changed
2018-05-19 18:41:38Z [overcloud-Networks-2xixxy6jehcg.ManagementNetwork]: UPDATE_IN_PROGRESS state changed
2018-05-19 18:41:39Z [overcloud-Networks-2xixxy6jehcg-TenantNetwork-hpy5rft765og]: UPDATE_COMPLETE Stack UPDATE completed successfully
2018-05-19 18:41:40Z [overcloud-Networks-2xixxy6jehcg.StorageNetwork]: UPDATE_IN_PROGRESS state changed
2018-05-19 18:41:40Z [overcloud-Networks-2xixxy6jehcg-ManagementNetwork-hmepq7r5b54d]: UPDATE_IN_PROGRESS Stack UPDATE started
2018-05-19 18:41:40Z [overcloud-Networks-2xixxy6jehcg-ManagementNetwork-hmepq7r5b54d]: UPDATE_COMPLETE Stack UPDATE completed successfully
2018-05-19 18:41:41Z [overcloud-Networks-2xixxy6jehcg-StorageNetwork-qhaqwfqbmyyn]: UPDATE_IN_PROGRESS Stack UPDATE started
2018-05-19 18:41:41Z [overcloud-Networks-2xixxy6jehcg.ExternalNetwork]: UPDATE_IN_PROGRESS state changed
2018-05-19 18:41:42Z [overcloud-Networks-2xixxy6jehcg-StorageNetwork-qhaqwfqbmyyn]: UPDATE_COMPLETE Stack UPDATE completed successfully
2018-05-19 18:41:42Z [overcloud-Networks-2xixxy6jehcg-ExternalNetwork-xj6kslsodp7y]: UPDATE_IN_PROGRESS Stack UPDATE started
2018-05-19 18:41:42Z [overcloud-Networks-2xixxy6jehcg.ManagementNetwork]: UPDATE_COMPLETE state changed
2018-05-19 18:41:42Z [overcloud-Networks-2xixxy6jehcg.TenantNetwork]: UPDATE_COMPLETE state changed
2018-05-19 18:41:43Z [overcloud-Networks-2xixxy6jehcg.StorageMgmtNetwork]: CREATE_COMPLETE state changed
2018-05-19 18:41:43Z [overcloud-Networks-2xixxy6jehcg.InternalNetwork]: UPDATE_COMPLETE state changed
2018-05-19 18:41:43Z [overcloud-Networks-2xixxy6jehcg.StorageNetwork]: UPDATE_COMPLETE state changed
2018-05-19 18:41:43Z [overcloud-Networks-2xixxy6jehcg-StorageMgmtNetwork-jatrtw7o7ooe]: DELETE_IN_PROGRESS Stack DELETE started
2018-05-19 18:41:43Z [overcloud-Networks-2xixxy6jehcg-StorageMgmtNetwork-jatrtw7o7ooe.StorageMgmtSubnet]: DELETE_IN_PROGRESS state changed
2018-05-19 18:41:43Z [overcloud-Networks-2xixxy6jehcg-ExternalNetwork-xj6kslsodp7y]: UPDATE_COMPLETE Stack UPDATE completed successfully
2018-05-19 18:41:44Z [overcloud-Networks-2xixxy6jehcg.ExternalNetwork]: UPDATE_COMPLETE state changed
2018-05-19 18:42:54Z [overcloud-Networks-2xixxy6jehcg-StorageMgmtNetwork-jatrtw7o7ooe.StorageMgmtSubnet]: DELETE_FAILED Conflict: resources.StorageMgmtSubnet: Unable to complete operation on subnet cbc2aa4d-0b4b-4ed1-82e1-66d5990be246: One or more ports have an IP allocation from this subnet. Neutron server returns request_ids: ['req-f9d42fb9-cca3-4806-8387-28b2c15342ca']
2018-05-19 18:42:54Z [overcloud-Networks-2xixxy6jehcg-StorageMgmtNetwork-jatrtw7o7ooe]: DELETE_FAILED Resource DELETE failed: Conflict: resources.StorageMgmtSubnet: Unable to complete operation on subnet cbc2aa4d-0b4b-4ed1-82e1-66d5990be246: One or more ports have an IP allocation from this subnet. Neutron server returns request_ids: ['req-f9d42fb9-cca3-4
2018-05-19 18:42:55Z [overcloud-Networks-2xixxy6jehcg]: UPDATE_FAILED Conflict: resources.StorageMgmtNetwork.resources.StorageMgmtSubnet: Unable to complete operation on subnet cbc2aa4d-0b4b-4ed1-82e1-66d5990be246: One or more ports have an IP allocation from this subnet. Neutron server returns request_ids: ['req-f9d42fb9-c
2018-05-19 18:42:56Z [Networks]: UPDATE_FAILED resources.Networks: Conflict: resources.StorageMgmtNetwork.resources.StorageMgmtSubnet: Unable to complete operation on subnet cbc2aa4d-0b4b-4ed1-82e1-66d5990be246: One or more ports have an IP allocation from this subnet. Neutron server returns request_i
2018-05-19 18:42:57Z [overcloud]: UPDATE_FAILED resources.Networks: Conflict: resources.StorageMgmtNetwork.resources.StorageMgmtSubnet: Unable to complete operation on subnet cbc2aa4d-0b4b-4ed1-82e1-66d5990be246: One or more ports have an IP allocation from this subnet. Neutron server returns request_i

Stack overcloud UPDATE_FAILED

overcloud.Networks:
  resource_type: OS::TripleO::Network
  physical_resource_id: c398f599-10a6-42ad-943a-82d771aa2732
  status: UPDATE_FAILED
  status_reason: |
    resources.Networks: Conflict: resources.StorageMgmtNetwork.resources.StorageMgmtSubnet: Unable to complete operation on subnet cbc2aa4d-0b4b-4ed1-82e1-66d5990be246: One or more ports have an IP allocation from this subnet.
    Neutron server returns request_ids: ['req-f9d42fb9-cca3-4806-8387-28b2c15342ca']
overcloud.AllNodesDeploySteps.ComputeDeployment_Step1.0:
  resource_type: OS::Heat::StructuredDeployment
  physical_resource_id: b58dc5e8-eccd-4029-8891-04aebd970af8
  status: UPDATE_FAILED
  status_reason: |
    Error: resources[0]: Deployment to server failed: deploy_status_code : Deployment exited with non-zero status code: 2
  deploy_stdout: |
    ...
    "2018-05-19 17:19:50,513 ERROR: 929733 -- ERROR configuring crond",
    "2018-05-19 17:19:50,513 ERROR: 929733 -- ERROR configuring nova_libvirt"
    ],
    "failed_when_result": true
    }
    to retry, use: --limit @/var/lib/heat-config/heat-config-ansible/59f23edb-0094-4874-a9a8-944b8ece9e78_playbook.retry
    PLAY RECAP *********************************************************************
    localhost : ok=6 changed=2 unreachable=0 failed=1
    (truncated, view all with --long)
  deploy_stderr: |

Heat Stack update failed.
Heat Stack update failed.
~~~
Just to clarify, this bz is tracking the first failure in the deployment following the initial deployment. The differences between the first failed deployment and the initial one are the files identified in https://bugzilla.redhat.com/show_bug.cgi?id=1572686#c12 and further in comment 14. In both of these deployments network-isolation.yaml was being used. I think we'd expect network config failures, for example the "one or more ports" allocation error, when doing a new deployment where network-isolation.yaml is either added (if it wasn't in the initial) or removed (if it was in the initial). I think knowing whether it was added or removed will help figure out how to recover the DB for the other issue, https://bugzilla.redhat.com/show_bug.cgi?id=1572017, but there appears to be another config-related issue that caused the deployment to get stuck in that state in the first place. It's still not clear why the file changes in comment 14 caused the problem, although we don't have enough info to know whether other changes were made in the network-environment.yaml file, such as IP range changes etc.
I am wondering if turning on the tech preview of Load Balancing as a Service might have caused the issue. I will look at the network-environment.yaml from before and after the issue. Thanks
> I am wondering if turning on an the tech preview of Load Balancing as a Service might > have caused the issue. Yeah, good point, the addition of /usr/share/openstack-tripleo-heat-templates/environments/services/octavia.yaml could be an issue.
After discussion with Stan, closing this out. The initial config had network-isolation.yaml and then it was removed, causing the duplicate ports.