Bug 1572686 - Update of Overcloud failed during redeploy OSP 12
Summary: Update of Overcloud failed during redeploy OSP 12
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: instack-undercloud
Version: 12.0 (Pike)
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: James Slagle
QA Contact: Arik Chernetsky
URL:
Whiteboard:
Duplicates: 1572685
Depends On:
Blocks:
Reported: 2018-04-27 15:03 UTC by Stan Toporek
Modified: 2021-12-10 16:22 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-05-24 00:26:19 UTC
Target Upstream Version:
Embargoed:


Attachments
Initial deployment failure logs (8.48 MB, application/x-tar)
2018-05-11 22:25 UTC, Bob Fournier


Links
System ID Private Priority Status Summary Last Updated
Red Hat Bugzilla 1572017 0 high CLOSED Undercloud neutron and heat db corrupted after commenting out network-isolation.yaml from answers.yaml and doing overclo... 2022-08-04 14:56:31 UTC
Red Hat Issue Tracker OSP-11408 0 None None None 2021-12-10 16:22:19 UTC

Description Stan Toporek 2018-04-27 15:03:49 UTC
Description of problem:

> According to ticket number ####, support suggested that I run an update with this line commented out:
> #  - /home/stack/openstack-tripleo-heat-templates/environments/network-isolation.yaml
> First I tried to deploy with that line commented out and the error occurred. Now, even with this line uncommented, the problem occurs.

So, if I understand the update history correctly, the two most recent stack updates both ran the command below, the first time with network-isolation.yaml commented out in answers.yaml?

~~~
openstack overcloud deploy -r /home/stack/templates/roles_data.yaml --answers-file ~/answers.yaml --ntp-server a.ntp.br,b.ntp.br,c.ntp.br,pool.ntp.br

answers.yaml:

templates: /home/stack/openstack-tripleo-heat-templates/
environments:
  - /home/stack/templates/node-info.yaml
  - /home/stack/templates/ports.yaml
  - /home/stack/templates/compute-hci.yaml
  - /home/stack/templates/overcloud_images.yaml
  - /home/stack/templates/environment-file-1.yaml
  - /home/stack/templates/cinder-dellps-config.yaml
  - /usr/share/openstack-tripleo-heat-templates/environments/cinder-backup.yaml
  - /usr/share/openstack-tripleo-heat-templates/environments/ceph-ansible/ceph-ansible.yaml
  - /usr/share/openstack-tripleo-heat-templates/environments/ceph-ansible/ceph-rgw.yaml
  - /home/stack/templates/storage-config.yaml
  - /home/stack/templates/ceph-config.yaml
  - /home/stack/templates/ceph-config-per_node.yaml
#  - /home/stack/openstack-tripleo-heat-templates/environments/network-isolation.yaml
  - /home/stack/templates/network-environment.yaml
  - /usr/share/openstack-tripleo-heat-templates/environments/tls-endpoints-public-dns.yaml
  - /home/stack/templates/rhel-registration/environment-rhel-registration.yaml
  - /home/stack/templates/rhel-registration/rhel-registration-resource-registry.yaml
  - /home/stack/templates/enable-tls.yaml
  - /usr/share/openstack-tripleo-heat-templates/environments/services/ironic.yaml
  - /home/stack/templates/ironic.yaml
  - /home/stack/templates/fencing.yaml
  - /usr/share/openstack-tripleo-heat-templates/environments/services/octavia.yaml
  - /usr/share/openstack-tripleo-heat-templates/environments/auditd.yaml
  - /home/stack/templates/remove_manila.yaml
~~~


Then the stack failed, right? Then you tried re-running the same deploy command, but with network-isolation.yaml uncommented in answers.yaml?


~~~
openstack overcloud deploy -r /home/stack/templates/roles_data.yaml --answers-file ~/answers.yaml --ntp-server a.ntp.br,b.ntp.br,c.ntp.br,pool.ntp.br

answers.yaml:

templates: /home/stack/openstack-tripleo-heat-templates/
environments:
  - /home/stack/templates/node-info.yaml
  - /home/stack/templates/ports.yaml
  - /home/stack/templates/compute-hci.yaml
  - /home/stack/templates/overcloud_images.yaml
  - /home/stack/templates/environment-file-1.yaml
  - /home/stack/templates/cinder-dellps-config.yaml
  - /usr/share/openstack-tripleo-heat-templates/environments/cinder-backup.yaml
  - /usr/share/openstack-tripleo-heat-templates/environments/ceph-ansible/ceph-ansible.yaml
  - /usr/share/openstack-tripleo-heat-templates/environments/ceph-ansible/ceph-rgw.yaml
  - /home/stack/templates/storage-config.yaml
  - /home/stack/templates/ceph-config.yaml
  - /home/stack/templates/ceph-config-per_node.yaml
  - /home/stack/openstack-tripleo-heat-templates/environments/network-isolation.yaml
  - /home/stack/templates/network-environment.yaml
  - /usr/share/openstack-tripleo-heat-templates/environments/tls-endpoints-public-dns.yaml
  - /home/stack/templates/rhel-registration/environment-rhel-registration.yaml
  - /home/stack/templates/rhel-registration/rhel-registration-resource-registry.yaml
  - /home/stack/templates/enable-tls.yaml
  - /usr/share/openstack-tripleo-heat-templates/environments/services/ironic.yaml
  - /home/stack/templates/ironic.yaml
  - /home/stack/templates/fencing.yaml
  - /usr/share/openstack-tripleo-heat-templates/environments/services/octavia.yaml
  - /usr/share/openstack-tripleo-heat-templates/environments/auditd.yaml
  - /home/stack/templates/remove_manila.yaml
~~~


And that's where we are right now with regard to the stack deploy/update commands issued, and stack failure?


Then I assume that, sometime before these two latest deploy commands, the most recent stack deploy/update that ran SUCCESSFULLY was the one below, before anyone tried commenting out network-isolation.yaml?


~~~
openstack overcloud deploy -r /home/stack/templates/roles_data.yaml --answers-file ~/answers.yaml --ntp-server a.ntp.br,b.ntp.br,c.ntp.br,pool.ntp.br

answers.yaml:

templates: /home/stack/openstack-tripleo-heat-templates/
environments:
  - /home/stack/templates/node-info.yaml
  - /home/stack/templates/ports.yaml
  - /home/stack/templates/compute-hci.yaml
  - /home/stack/templates/overcloud_images.yaml
  - /home/stack/templates/environment-file-1.yaml
  - /home/stack/templates/cinder-dellps-config.yaml
  - /usr/share/openstack-tripleo-heat-templates/environments/cinder-backup.yaml
  - /usr/share/openstack-tripleo-heat-templates/environments/ceph-ansible/ceph-ansible.yaml
  - /usr/share/openstack-tripleo-heat-templates/environments/ceph-ansible/ceph-rgw.yaml
  - /home/stack/templates/storage-config.yaml
  - /home/stack/templates/ceph-config.yaml
  - /home/stack/templates/ceph-config-per_node.yaml
  - /home/stack/openstack-tripleo-heat-templates/environments/network-isolation.yaml
  - /home/stack/templates/network-environment.yaml
  - /usr/share/openstack-tripleo-heat-templates/environments/tls-endpoints-public-dns.yaml
  - /home/stack/templates/rhel-registration/environment-rhel-registration.yaml
  - /home/stack/templates/rhel-registration/rhel-registration-resource-registry.yaml
  - /home/stack/templates/enable-tls.yaml
  - /usr/share/openstack-tripleo-heat-templates/environments/services/ironic.yaml
  - /home/stack/templates/ironic.yaml
  - /home/stack/templates/fencing.yaml
  - /usr/share/openstack-tripleo-heat-templates/environments/services/octavia.yaml
  - /usr/share/openstack-tripleo-heat-templates/environments/auditd.yaml
  - /home/stack/templates/remove_manila.yaml
~~~


Do I understand the deploy command/template history correctly? I ask to make sure I'm not missing anything. There are also quite a few templates to parse through, so your knowledge of the history might speed things up.


Thank you,
-Andrew

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:
Stack overcloud UPDATE_FAILED 

overcloud.AllNodesDeploySteps.WorkflowTasks_Step2_Execution:
  resource_type: OS::Mistral::ExternalResource
  physical_resource_id: 5e18c1b6-4f88-44c5-983c-9ca4c8df5387
  status: CREATE_FAILED
  status_reason: |
    resources.WorkflowTasks_Step2_Execution: ERROR
overcloud.Networks.InternalNetwork.InternalApiNetwork:
  resource_type: OS::Neutron::Net
  physical_resource_id: 
  status: CREATE_FAILED
  status_reason: |
    Conflict: resources.InternalApiNetwork: Unable to create the flat network. Physical network internal_api is in use.
    Neutron server returns request_ids: ['req-2966d94f-8ef8-47f8-b5c2-825fa312f774']
overcloud.Networks.StorageMgmtNetwork.StorageMgmtNetwork:
  resource_type: OS::Neutron::Net
  physical_resource_id: 
  status: CREATE_FAILED
  status_reason: |
    Conflict: resources.StorageMgmtNetwork: Unable to create the flat network. Physical network storage_mgmt is in use.
    Neutron server returns request_ids: ['req-ad21eb7e-acb1-4598-9fdb-5cb2dfb9755e']
overcloud.Networks.StorageNetwork.StorageNetwork:
  resource_type: OS::Neutron::Net
  physical_resource_id: 
  status: CREATE_FAILED
  status_reason: |
    Conflict: resources.StorageNetwork: Unable to create the flat network. Physical network storage is in use.
    Neutron server returns request_ids: ['req-4e86fd73-d963-4b8c-bedb-cae28b35b0bb']
overcloud.Networks.TenantNetwork.TenantNetwork:
  resource_type: OS::Neutron::Net
  physical_resource_id: 
  status: CREATE_FAILED
  status_reason: |
    Conflict: resources.TenantNetwork: Unable to create the flat network. Physical network tenant is in use.
    Neutron server returns request_ids: ['req-44b521e9-387f-4bbd-a8cf-d05a7ce583fc']
overcloud.Networks.ExternalNetwork.ExternalNetwork:
  resource_type: OS::Neutron::Net
  physical_resource_id: 
  status: CREATE_FAILED
  status_reason: |
    Conflict: resources.ExternalNetwork: Unable to create the flat network. Physical network external is in use.
    Neutron server returns request_ids: ['req-89b07cd7-8662-47b2-992b-6a0a3e7c861a']
Heat Stack update failed.
Heat Stack update failed.

Expected results:

The overcloud update completes successfully.
Additional info:

Comment 2 Alex Schultz 2018-04-30 16:35:27 UTC
*** Bug 1572685 has been marked as a duplicate of this bug. ***

Comment 5 Bob Fournier 2018-05-10 21:56:20 UTC
If you deploy without network-isolation.yaml and then deploy WITH network-isolation.yaml, the deployment is expected to fail with errors such as "Unable to create the flat network. Physical network internal_api is in use".
On the update, Heat would create ports for all the nodes on the isolated networks, but ports already exist on the ctlplane from the previous deployment (the one without network-isolation.yaml).

So if that is the sequence of deployments, this is as expected.
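
That sequence can be checked mechanically before re-running a deploy. The following is only a minimal sketch, assuming the answers files are plain text with one "- path" entry per environment and that commented-out entries start with "#". The helper names active_environments and isolation_toggled are hypothetical and are not part of any TripleO tooling.

```python
def active_environments(answers_text):
    """Return the environment paths listed in an answers file.

    Lines that start with '#' (commented-out entries) are skipped.
    """
    envs = []
    for line in answers_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("- "):
            envs.append(stripped[2:].strip())
    return envs


def isolation_toggled(previous_text, current_text, name="network-isolation.yaml"):
    """Return True if `name` is active in exactly one of the two answers files."""
    def uses(text):
        return any(path.endswith("/" + name) for path in active_environments(text))
    return uses(previous_text) != uses(current_text)


# Shortened sample based on the answers.yaml quoted in this bug.
previous = """templates: /home/stack/openstack-tripleo-heat-templates/
environments:
  - /home/stack/templates/node-info.yaml
  - /home/stack/openstack-tripleo-heat-templates/environments/network-isolation.yaml
  - /home/stack/templates/network-environment.yaml
"""
# Same file with the network-isolation.yaml entry commented out.
current = previous.replace("  - /home/stack/openstack", "#  - /home/stack/openstack")

print(isolation_toggled(previous, current))
```

If the check reports a toggle, the flat network and port conflicts described above are the likely outcome of the update.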

Comment 9 Stan Toporek 2018-05-11 20:11:29 UTC
Summary of Problem:

The OpenStack deploy worked fine with the templates. The customer added 3 Ceph nodes with no issues after the initial deployment. Changes were then made to the same templates to remove Manila.

diff deploy-error.tar.gz/templates/remove_manila.yaml deploy-rh.tar.gz/templates/remove_manila.yaml
5c5
<   OS::TripleO::Services::ManilaBackendGeneric: OS::Heat::None
---
>   OS::TripleO::Services::ManilaBackendCephFs: OS::Heat::None

This was the only change and the redeploy failed. Errors below:

2018-03-12 17:05:57Z [overcloud-Networks-teq7ain3shg3-TenantNetwork-xxxxx.TenantNetwork]: CREATE_FAILED  Conflict: resources.TenantNetwork: Unable to create the flat network. Physical network tenant is in use.
Neutron server returns request_ids: ['req-44b521e9-387f-4bbd-a8cf-d05a7ce583fc']
2018-03-12 17:05:57Z [overcloud-Networks-teq7ain3shg3.StorageMgmtNetwork]: UPDATE_IN_PROGRESS  state changed
2018-03-12 17:05:57Z [overcloud-Networks-xxxx-ManagementNetwork-zyywtktmpe3v]: UPDATE_IN_PROGRESS  Stack UPDATE started
2018-03-12 17:05:58Z [overcloud-Networks-xxxx-ManagementNetwork-zyywtktmpe3v]: UPDATE_COMPLETE  Stack UPDATE completed successfully
2018-03-12 17:05:58Z [overcloud-Networks-xxxx-TenantNetwork-xxxx]: UPDATE_FAILED  Conflict: resources.TenantNetwork: Unable to create the flat network. Physical network tenant is in use.

We are trying to understand why the redeploy failed with such a minor change.

Comment 10 Bob Fournier 2018-05-11 22:25:15 UTC
Created attachment 1435156 [details]
Initial deployment failure logs

Comment 11 Bob Fournier 2018-05-11 22:30:57 UTC
Thanks Stan.  As we talked about offline, we'll use this BZ to track the initial deployment failure and https://bugzilla.redhat.com/show_bug.cgi?id=1572017 to track the subsequent failure and the potential database recovery efforts.  The DFG:DF should be involved to help with 1572017 and the recovery efforts.

I've added the logs to this BZ from the case for this initial deployment failure.

In addition, a BZ referenced in the case may be relevant here, although that BZ is from OSP-10 - https://bugzilla.redhat.com/show_bug.cgi?id=1483246.

And, no problem... ;-)

Comment 12 Bob Fournier 2018-05-14 16:02:23 UTC
From the initial set of logs in the case (sosreport-manager.eveocloud.net.02053305-20180312141530) we see these heat/neutron errors across multiple subnets which are most likely the source of the problem:

heat/heat-engine.log:2018-03-12 13:50:24.852 3824 DEBUG neutronclient.v2_0.client [req-a0400374-96df-4a43-b56c-add8f668581e - admin - default default] Error message: {"NeutronError": {"message": "Unable to complete operation on subnet 1aba3e87-2dc6-4ddb-b120-a27a6143151f: One or more ports have an IP allocation from this subnet.", "type": "SubnetInUse", "detail": ""}} _handle_fault_response /usr/lib/python2.7/site-packages/neutronclient/v2_0/client.py:258

which results in many of these heat stack traces:
heat/heat-engine.log:2018-03-12 13:51:32.064 3827 ERROR heat.engine.resource Conflict: Unable to complete operation on subnet 23c47e0b-0f97-42b5-b184-0e87f80aa7a3: One or more ports have an IP allocation from this subnet.

This is a fairly common error message that can have multiple causes, usually a change in the NIC config templates between deployments that causes new ports to be created, e.g.:
https://bugzilla.redhat.com/show_bug.cgi?id=1353735
https://bugzilla.redhat.com/show_bug.cgi?id=1429081
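
To see how many distinct subnets are affected, the log can be scanned for this message. This is a rough sketch only: the function name subnets_in_use is hypothetical, and the sample lines are shortened copies of the two heat-engine.log entries quoted above.

```python
import re
from collections import Counter

# Matches the Neutron "SubnetInUse" message and captures the subnet UUID.
SUBNET_IN_USE = re.compile(
    r"Unable to complete operation on subnet ([0-9a-f-]{36}): "
    r"One or more ports have an IP allocation from this subnet"
)


def subnets_in_use(log_lines):
    """Count SubnetInUse errors per subnet ID across the given log lines."""
    counts = Counter()
    for line in log_lines:
        counts.update(SUBNET_IN_USE.findall(line))
    return counts


sample = [
    '2018-03-12 13:50:24.852 3824 DEBUG neutronclient.v2_0.client Error message: '
    '{"NeutronError": {"message": "Unable to complete operation on subnet '
    '1aba3e87-2dc6-4ddb-b120-a27a6143151f: One or more ports have an IP '
    'allocation from this subnet.", "type": "SubnetInUse", "detail": ""}}',
    '2018-03-12 13:51:32.064 3827 ERROR heat.engine.resource Conflict: Unable to '
    'complete operation on subnet 23c47e0b-0f97-42b5-b184-0e87f80aa7a3: One or '
    'more ports have an IP allocation from this subnet.',
]

for subnet_id, count in sorted(subnets_in_use(sample).items()):
    print(subnet_id, count)
```

On a real sosreport the input would be the lines of heat/heat-engine.log rather than the in-line sample.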

At this point it's difficult to determine exactly which configuration change caused the problem without the port list from after the failed deployment, but there appear to be significant changes between the two deployments, based on the answers files that were used (answers-complete-19-01-09.yaml was the initial deployment and answers.yaml was the one that failed):
bfournie-OSX:bug1572686 bfournie$ diff answers-complete-19-01-09.yaml answers.yaml 
9,11c9,11
<   - /home/stack/openstack-tripleo-heat-templates/environments/cinder-backup.yaml
<   - /home/stack/openstack-tripleo-heat-templates/environments/ceph-ansible/ceph-ansible.yaml
<   - /home/stack/openstack-tripleo-heat-templates/environments/ceph-ansible/ceph-rgw.yaml
---
>   - /usr/share/openstack-tripleo-heat-templates/environments/cinder-backup.yaml
>   - /usr/share/openstack-tripleo-heat-templates/environments/ceph-ansible/ceph-ansible.yaml
>   - /usr/share/openstack-tripleo-heat-templates/environments/ceph-ansible/ceph-rgw.yaml
13a14
>   - /home/stack/templates/ceph-config-per_node.yaml
16c17
<   - /home/stack/openstack-tripleo-heat-templates/environments/tls-endpoints-public-dns.yaml
---
>   - /usr/share/openstack-tripleo-heat-templates/environments/tls-endpoints-public-dns.yaml
20c21
<   - /home/stack/openstack-tripleo-heat-templates/environments/services/ironic.yaml
---
>   - /usr/share/openstack-tripleo-heat-templates/environments/services/ironic.yaml
21a23,26
>   - /home/stack/templates/fencing.yaml
>   - /usr/share/openstack-tripleo-heat-templates/environments/services/octavia.yaml
>   - /usr/share/openstack-tripleo-heat-templates/environments/auditd.yaml
>   - /home/stack/templates/remove_manila.yaml

I assume this has been tried, but did you do a deployment using the initial answers file? In conjunction with the output of "openstack port list", it might help point to which ports were created erroneously and can be removed.
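
As a rough illustration of how the port list could be matched against a subnet from the errors above, here is a sketch that reads JSON as produced by "openstack port list -f json". The column names ("ID", "Name", "Fixed IP Addresses") and the two possible shapes of the fixed-IP field are assumptions about the client output and may differ between client versions. The port IDs, names and IP addresses in the sample are made up, and ports_on_subnet is a hypothetical helper.

```python
import json
import re

# Older clients are assumed to print fixed IPs as a string such as
# "ip_address='172.16.3.10', subnet_id='<uuid>'".
SUBNET_ID = re.compile(r"subnet_id='([0-9a-f-]{36})'")


def ports_on_subnet(port_list_json, subnet_id):
    """Return (port ID, port name) pairs for ports with a fixed IP on subnet_id."""
    matches = []
    for port in json.loads(port_list_json):
        fixed_ips = port.get("Fixed IP Addresses", "")
        if isinstance(fixed_ips, list):
            # Assumed newer shape: a list of {"subnet_id": ..., "ip_address": ...} dicts.
            subnets = {entry.get("subnet_id") for entry in fixed_ips}
        else:
            subnets = set(SUBNET_ID.findall(fixed_ips))
        if subnet_id in subnets:
            matches.append((port["ID"], port.get("Name", "")))
    return matches


# Made-up sample data; only the subnet UUIDs come from the log lines in this bug.
sample = json.dumps([
    {"ID": "port-a", "Name": "example_port_1",
     "Fixed IP Addresses":
         "ip_address='172.16.3.10', subnet_id='1aba3e87-2dc6-4ddb-b120-a27a6143151f'"},
    {"ID": "port-b", "Name": "example_port_2",
     "Fixed IP Addresses": [
         {"subnet_id": "23c47e0b-0f97-42b5-b184-0e87f80aa7a3",
          "ip_address": "192.168.24.12"}]},
])

print(ports_on_subnet(sample, "1aba3e87-2dc6-4ddb-b120-a27a6143151f"))
```

The result is only a list of candidates. Whether a given port is safe to remove still has to be decided by hand.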

Comment 13 Stan Toporek 2018-05-14 21:04:45 UTC
Bob,

So what could be tried is to use the undercloud DB from before the issue and run a deployment with corrections, to get to a clean undercloud DB? I do not think we tried this method. The other way would be to take the damaged DB, delete the networks, and redeploy. Let me get the "openstack port list" output and give it a try.

Thanks,

Stan

Comment 14 Bob Fournier 2018-05-14 21:49:32 UTC
To further clarify which template files changed between the initial and the first failed deployment (answers-complete-19-01-09.yaml vs. answers.yaml): I compared the local files against the OSP-12 installed files and they are all the same. The only differences are these additional files included in the failed deployment:

/home/stack/templates/ceph-config-per_node.yaml
/home/stack/templates/fencing.yaml
/usr/share/openstack-tripleo-heat-templates/environments/services/octavia.yaml
/usr/share/openstack-tripleo-heat-templates/environments/auditd.yaml
/home/stack/templates/remove_manila.yaml
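
The same comparison can be sketched in a few lines. This treats the local copy under /home/stack/openstack-tripleo-heat-templates/ and the installed copy under /usr/share/openstack-tripleo-heat-templates/ as equivalent, which is only valid because the files were compared and found identical. The function names and the shortened sample lists are illustrative.

```python
LOCAL_ROOT = "/home/stack/openstack-tripleo-heat-templates/"
INSTALLED_ROOT = "/usr/share/openstack-tripleo-heat-templates/"


def normalize(path):
    """Map a path under the local template copy onto the installed location."""
    if path.startswith(LOCAL_ROOT):
        return INSTALLED_ROOT + path[len(LOCAL_ROOT):]
    return path


def added_environments(initial, failed):
    """Return entries of `failed` that have no equivalent in `initial`."""
    seen = {normalize(path) for path in initial}
    return [path for path in failed if normalize(path) not in seen]


# Shortened excerpts of the two answers files discussed in this bug.
initial = [
    "/home/stack/openstack-tripleo-heat-templates/environments/cinder-backup.yaml",
    "/home/stack/templates/ceph-config.yaml",
    "/home/stack/openstack-tripleo-heat-templates/environments/services/ironic.yaml",
    "/home/stack/templates/ironic.yaml",
]
failed = [
    "/usr/share/openstack-tripleo-heat-templates/environments/cinder-backup.yaml",
    "/home/stack/templates/ceph-config.yaml",
    "/home/stack/templates/ceph-config-per_node.yaml",
    "/usr/share/openstack-tripleo-heat-templates/environments/services/ironic.yaml",
    "/home/stack/templates/ironic.yaml",
    "/home/stack/templates/fencing.yaml",
    "/home/stack/templates/remove_manila.yaml",
]

for path in added_environments(initial, failed):
    print(path)
```

Entries whose only difference is the template root drop out, leaving just the files that were newly included.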

Of these, remove_manila.yaml would be the most intrusive change, as it sets all the Manila services to None:
resource_registry:
  OS::TripleO::Services::ManilaApi: OS::Heat::None
  OS::TripleO::Services::ManilaScheduler: OS::Heat::None
  OS::TripleO::Services::ManilaShare: OS::Heat::None
  OS::TripleO::Services::ManilaBackendGeneric: OS::Heat::None

Which causes these heat warnings:
heat-engine.log:2018-03-12 13:49:46.786 3828 WARNING heat.engine.environment [req-b7d64eb5-f61a-4d13-9df9-87a6ed1d8b2f - - - - -] Changing OS::TripleO::Services::ManilaScheduler from https://manager.eveocloud.net:13808/v1/AUTH_4c2b30aaf2514fdda1379512277503a7/overcloud/puppet/services/manila-scheduler.yaml to OS::Heat::None
heat-engine.log:2018-03-12 13:49:46.788 3828 WARNING heat.engine.environment [req-b7d64eb5-f61a-4d13-9df9-87a6ed1d8b2f - - - - -] Changing OS::TripleO::Services::ManilaBackendCephFs from https://manager.eveocloud.net:13808/v1/AUTH_4c2b30aaf2514fdda1379512277503a7/overcloud/puppet/services/manila-backend-cephfs.yaml to OS::Heat::None
heat-engine.log:2018-03-12 13:49:46.789 3828 WARNING heat.engine.environment [req-b7d64eb5-f61a-4d13-9df9-87a6ed1d8b2f - - - - -] Changing OS::TripleO::Services::ManilaApi from https://manager.eveocloud.net:13808/v1/AUTH_4c2b30aaf2514fdda1379512277503a7/overcloud/puppet/services/manila-api.yaml to OS::Heat::None

Although I don't think that would cause the "One or more ports have an IP allocation from this subnet" issue.

One thing that's not possible to tell is whether network-environment.yaml changed between deployments, e.g. IP assignments, VLAN/VXLAN changes, etc.

Comment 15 Stan Toporek 2018-05-19 18:50:45 UTC
Recreated the failure with a network conflict.

Only added network-isolation.yaml and remove_manila.yaml.

It should have worked because the initial creation used the same file, apart from my additions.

test-deploy.sh
#! /bin/bash

timeout 100m openstack overcloud deploy \
  --templates /usr/share/openstack-tripleo-heat-templates \
  --libvirt-type kvm \
  --ntp-server clock.redhat.com \
  -r /home/stack/templates/roles_data.yaml \
  -e /home/stack/templates/config_lvm.yaml \
  -e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml \
  -e /home/stack/templates/network/network-environment.yaml \
  -e /home/stack/templates/service_net_map.yaml \
  -e /home/stack/templates/hostnames.yml \
  -e /home/stack/templates/debug.yaml \
  -e /home/stack/templates/nodes_data.yaml \
  -e /home/stack/templates/docker-images.yaml \
  -e /home/stack/templates/remove_manila.yaml \
  --log-file overcloud_deployment_02.log


(undercloud) [stack@undercloud-0 templates]$ ./test-deploy.sh 
Started Mistral Workflow tripleo.validations.v1.check_pre_deployment_validations. Execution ID: 523b5384-6795-4973-952c-5ce20f04affa
Waiting for messages on queue '789ace2f-33ca-4ba2-be13-e6b3d240e59b' with no timeout.
Removing the current plan files
Uploading new plan files
Started Mistral Workflow tripleo.plan_management.v1.update_deployment_plan. Execution ID: 7c4ec64f-10aa-43f9-a9de-f60ee086040c
Plan updated.
Processing templates in the directory /tmp/tripleoclient-2zNi8r/tripleo-heat-templates
Started Mistral Workflow tripleo.plan_management.v1.get_deprecated_parameters. Execution ID: e45caaea-7ebc-4942-91bb-2d2832d15d50
WARNING: Following parameters are deprecated and still defined. Deprecated parameters will be removed soon!
  OvercloudControlFlavor
Deploying templates in the directory /tmp/tripleoclient-2zNi8r/tripleo-heat-templates
Started Mistral Workflow tripleo.deployment.v1.deploy_plan. Execution ID: 069657b2-d788-4329-b994-df78aecbcd60
2018-05-19 18:41:34Z [Networks]: UPDATE_IN_PROGRESS  state changed
2018-05-19 18:41:35Z [overcloud-Networks-2xixxy6jehcg]: UPDATE_IN_PROGRESS  Stack UPDATE started
2018-05-19 18:41:35Z [DefaultPasswords]: UPDATE_IN_PROGRESS  state changed
2018-05-19 18:41:35Z [overcloud-Networks-2xixxy6jehcg.InternalNetwork]: UPDATE_IN_PROGRESS  state changed
2018-05-19 18:41:36Z [ServiceNetMap]: UPDATE_IN_PROGRESS  state changed
2018-05-19 18:41:37Z [overcloud-Networks-2xixxy6jehcg.TenantNetwork]: UPDATE_IN_PROGRESS  state changed
2018-05-19 18:41:37Z [overcloud-Networks-2xixxy6jehcg-InternalNetwork-dhjytfwuwd3d]: UPDATE_IN_PROGRESS  Stack UPDATE started
2018-05-19 18:41:37Z [overcloud-ServiceNetMap-uxrbrby6jz5z]: UPDATE_IN_PROGRESS  Stack UPDATE started
2018-05-19 18:41:37Z [overcloud-ServiceNetMap-uxrbrby6jz5z]: UPDATE_COMPLETE  Stack UPDATE completed successfully
2018-05-19 18:41:38Z [overcloud-Networks-2xixxy6jehcg-InternalNetwork-dhjytfwuwd3d]: UPDATE_COMPLETE  Stack UPDATE completed successfully
2018-05-19 18:41:38Z [DefaultPasswords]: UPDATE_COMPLETE  state changed
2018-05-19 18:41:38Z [ServiceNetMap]: UPDATE_COMPLETE  state changed
2018-05-19 18:41:38Z [overcloud-Networks-2xixxy6jehcg-TenantNetwork-hpy5rft765og]: UPDATE_IN_PROGRESS  Stack UPDATE started
2018-05-19 18:41:38Z [overcloud-Networks-2xixxy6jehcg.StorageMgmtNetwork]: CREATE_IN_PROGRESS  state changed
2018-05-19 18:41:38Z [overcloud-Networks-2xixxy6jehcg.ManagementNetwork]: UPDATE_IN_PROGRESS  state changed
2018-05-19 18:41:39Z [overcloud-Networks-2xixxy6jehcg-TenantNetwork-hpy5rft765og]: UPDATE_COMPLETE  Stack UPDATE completed successfully
2018-05-19 18:41:40Z [overcloud-Networks-2xixxy6jehcg.StorageNetwork]: UPDATE_IN_PROGRESS  state changed
2018-05-19 18:41:40Z [overcloud-Networks-2xixxy6jehcg-ManagementNetwork-hmepq7r5b54d]: UPDATE_IN_PROGRESS  Stack UPDATE started
2018-05-19 18:41:40Z [overcloud-Networks-2xixxy6jehcg-ManagementNetwork-hmepq7r5b54d]: UPDATE_COMPLETE  Stack UPDATE completed successfully
2018-05-19 18:41:41Z [overcloud-Networks-2xixxy6jehcg-StorageNetwork-qhaqwfqbmyyn]: UPDATE_IN_PROGRESS  Stack UPDATE started
2018-05-19 18:41:41Z [overcloud-Networks-2xixxy6jehcg.ExternalNetwork]: UPDATE_IN_PROGRESS  state changed
2018-05-19 18:41:42Z [overcloud-Networks-2xixxy6jehcg-StorageNetwork-qhaqwfqbmyyn]: UPDATE_COMPLETE  Stack UPDATE completed successfully
2018-05-19 18:41:42Z [overcloud-Networks-2xixxy6jehcg-ExternalNetwork-xj6kslsodp7y]: UPDATE_IN_PROGRESS  Stack UPDATE started
2018-05-19 18:41:42Z [overcloud-Networks-2xixxy6jehcg.ManagementNetwork]: UPDATE_COMPLETE  state changed
2018-05-19 18:41:42Z [overcloud-Networks-2xixxy6jehcg.TenantNetwork]: UPDATE_COMPLETE  state changed
2018-05-19 18:41:43Z [overcloud-Networks-2xixxy6jehcg.StorageMgmtNetwork]: CREATE_COMPLETE  state changed
2018-05-19 18:41:43Z [overcloud-Networks-2xixxy6jehcg.InternalNetwork]: UPDATE_COMPLETE  state changed
2018-05-19 18:41:43Z [overcloud-Networks-2xixxy6jehcg.StorageNetwork]: UPDATE_COMPLETE  state changed
2018-05-19 18:41:43Z [overcloud-Networks-2xixxy6jehcg-StorageMgmtNetwork-jatrtw7o7ooe]: DELETE_IN_PROGRESS  Stack DELETE started
2018-05-19 18:41:43Z [overcloud-Networks-2xixxy6jehcg-StorageMgmtNetwork-jatrtw7o7ooe.StorageMgmtSubnet]: DELETE_IN_PROGRESS  state changed
2018-05-19 18:41:43Z [overcloud-Networks-2xixxy6jehcg-ExternalNetwork-xj6kslsodp7y]: UPDATE_COMPLETE  Stack UPDATE completed successfully
2018-05-19 18:41:44Z [overcloud-Networks-2xixxy6jehcg.ExternalNetwork]: UPDATE_COMPLETE  state changed
2018-05-19 18:42:54Z [overcloud-Networks-2xixxy6jehcg-StorageMgmtNetwork-jatrtw7o7ooe.StorageMgmtSubnet]: DELETE_FAILED  Conflict: resources.StorageMgmtSubnet: Unable to complete operation on subnet cbc2aa4d-0b4b-4ed1-82e1-66d5990be246: One or more ports have an IP allocation from this subnet.
Neutron server returns request_ids: ['req-f9d42fb9-cca3-4806-8387-28b2c15342ca']
2018-05-19 18:42:54Z [overcloud-Networks-2xixxy6jehcg-StorageMgmtNetwork-jatrtw7o7ooe]: DELETE_FAILED  Resource DELETE failed: Conflict: resources.StorageMgmtSubnet: Unable to complete operation on subnet cbc2aa4d-0b4b-4ed1-82e1-66d5990be246: One or more ports have an IP allocation from this subnet.
Neutron server returns request_ids: ['req-f9d42fb9-cca3-4
2018-05-19 18:42:55Z [overcloud-Networks-2xixxy6jehcg]: UPDATE_FAILED  Conflict: resources.StorageMgmtNetwork.resources.StorageMgmtSubnet: Unable to complete operation on subnet cbc2aa4d-0b4b-4ed1-82e1-66d5990be246: One or more ports have an IP allocation from this subnet.
Neutron server returns request_ids: ['req-f9d42fb9-c
2018-05-19 18:42:56Z [Networks]: UPDATE_FAILED  resources.Networks: Conflict: resources.StorageMgmtNetwork.resources.StorageMgmtSubnet: Unable to complete operation on subnet cbc2aa4d-0b4b-4ed1-82e1-66d5990be246: One or more ports have an IP allocation from this subnet.
Neutron server returns request_i
2018-05-19 18:42:57Z [overcloud]: UPDATE_FAILED  resources.Networks: Conflict: resources.StorageMgmtNetwork.resources.StorageMgmtSubnet: Unable to complete operation on subnet cbc2aa4d-0b4b-4ed1-82e1-66d5990be246: One or more ports have an IP allocation from this subnet.
Neutron server returns request_i

 Stack overcloud UPDATE_FAILED 

overcloud.Networks:
  resource_type: OS::TripleO::Network
  physical_resource_id: c398f599-10a6-42ad-943a-82d771aa2732
  status: UPDATE_FAILED
  status_reason: |
    resources.Networks: Conflict: resources.StorageMgmtNetwork.resources.StorageMgmtSubnet: Unable to complete operation on subnet cbc2aa4d-0b4b-4ed1-82e1-66d5990be246: One or more ports have an IP allocation from this subnet.
    Neutron server returns request_ids: ['req-f9d42fb9-cca3-4806-8387-28b2c15342ca']
overcloud.AllNodesDeploySteps.ComputeDeployment_Step1.0:
  resource_type: OS::Heat::StructuredDeployment
  physical_resource_id: b58dc5e8-eccd-4029-8891-04aebd970af8
  status: UPDATE_FAILED
  status_reason: |
    Error: resources[0]: Deployment to server failed: deploy_status_code : Deployment exited with non-zero status code: 2
  deploy_stdout: |
    ...
            "2018-05-19 17:19:50,513 ERROR: 929733 -- ERROR configuring crond", 
            "2018-05-19 17:19:50,513 ERROR: 929733 -- ERROR configuring nova_libvirt"
        ], 
        "failed_when_result": true
    }
    	to retry, use: --limit @/var/lib/heat-config/heat-config-ansible/59f23edb-0094-4874-a9a8-944b8ece9e78_playbook.retry
    
    PLAY RECAP *********************************************************************
    localhost                  : ok=6    changed=2    unreachable=0    failed=1   
    
    (truncated, view all with --long)
  deploy_stderr: |

Heat Stack update failed.
Heat Stack update failed.

Comment 17 Bob Fournier 2018-05-21 13:44:39 UTC
Just to clarify, this BZ is tracking the first failure in the deployment following the initial deployment. The differences between the first failure and the initial deployment are the files identified in https://bugzilla.redhat.com/show_bug.cgi?id=1572686#c12 and further in Comment 14. In both of these deployments network-isolation.yaml was being used.

I think we'd expect network config failures, for example the "One or more ports have an IP allocation" error, when doing a new deployment in which network-isolation.yaml is either added (if it wasn't in the initial deployment) or removed (if it was).

I think the adding-or-removing scenario will help to figure out how to recover the DB for the other issue, https://bugzilla.redhat.com/show_bug.cgi?id=1572017, but there appears to be another config-related issue that caused the deployment to get stuck in that state in the first place. It's still not clear why the file changes in Comment 14 caused the problem, although we don't have enough info to know if other changes were made in the network-environment.yaml file, such as IP range changes.

Comment 18 Stan Toporek 2018-05-21 18:43:34 UTC
I am wondering if turning on the tech preview of Load Balancing as a Service might have caused the issue. I will look at network-environment.yaml from before and after the issue.

Thanks

Comment 19 Bob Fournier 2018-05-21 20:37:36 UTC
> I am wondering if turning on the tech preview of Load Balancing as a Service might have caused the issue.

Yeah, good point, the addition of 
/usr/share/openstack-tripleo-heat-templates/environments/services/octavia.yaml
could be an issue.

Comment 20 Bob Fournier 2018-05-24 00:26:19 UTC
After discussion with Stan, closing this out. The initial config had network-isolation.yaml and then it was removed, causing the duplicate ports.

