Bug 1651805

Summary: [RFE] Add protection against users running overcloud deploy without all original environment files
Product: Red Hat OpenStack Reporter: Jeremy <jmelvin>
Component: rhosp-director Assignee: RHOS Maint <rhos-maint>
Status: CLOSED DUPLICATE QA Contact: Gurenko Alex <agurenko>
Severity: high Docs Contact:
Priority: high    
Version: 10.0 (Newton) CC: dbecker, mburns, morazi, pweeks
Target Milestone: --- Keywords: FutureFeature, Triaged, ZStream
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2020-12-02 20:40:09 UTC Type: Bug

Description Jeremy 2018-11-20 21:56:19 UTC
Description of problem:

  Currently, users can easily destroy their overcloud by running a deploy without the original environment files, specifically the network environment files. If an overcloud was deployed with isolated networks and a subsequent deploy, scale, or update is run without all of the original network environment files, the overcloud can be destroyed. This happens because, without the network environment files that specify the isolated networks, the deployment puts everything on the control plane network and attempts to delete the isolated networks.
  This problem can also occur if users deploy from a script and add a new line to the script without the trailing \ line continuation. The deploy command then ignores any environment files listed after the mistake. Given that all of these scenarios are user mistakes, we need to make sure it is not so easy to destroy an environment.
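
The line-continuation mistake can be reproduced without touching a real overcloud. The sketch below is illustrative only: fake_deploy is a hypothetical stand-in for openstack overcloud deploy that just counts the -e arguments it receives, and the file names are placeholders.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for "openstack overcloud deploy": it only reports
# how many -e environment files were passed to it.
fake_deploy() {
  local count=0 arg
  for arg in "$@"; do
    if [ "$arg" = "-e" ]; then
      count=$((count + 1))
    fi
  done
  echo "environment files received: $count"
}

# Correct: every continued line ends with a backslash.
fake_deploy --templates \
  -e network-isolation.yaml \
  -e network-environment.yaml
# prints "environment files received: 2"

# Broken: the backslash after the first -e line is missing, so the command
# ends there. The last line is then run as a separate command named "-e",
# which fails; the error is silenced here only to keep the demo quiet.
fake_deploy --templates \
  -e network-isolation.yaml
  -e network-environment.yaml 2>/dev/null || true
# prints "environment files received: 1"
```

In a real deploy script the stray last line would only produce a "command not found" error after the deploy itself has already started without the dropped files.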



Version-Release number of selected component (if applicable):
All OSP versions

How reproducible:
unknown

Steps to Reproduce:
1. Deploy an overcloud with isolated networks.
2. Re-run the deploy command without the network environment files.

Actual results:
overcloud destroyed

Expected results:
A warning message, or some other safeguard, alerting users that they must include the network environment files from the original deployment.
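
One possible shape for such a warning, as a rough sketch only: a wrapper-side check that compares the -e arguments of the current command against a saved list of the files used in the original deployment. Nothing here is part of director or the openstack CLI. The function names, the manifest file (one path per line), and its location are all assumptions made for illustration, and the comparison is a plain string match on the paths as typed.

```shell
#!/usr/bin/env bash
# Hypothetical pre-deploy check; not an existing director feature.

# Print each argument that follows -e or --environment-file, one per line.
list_env_files() {
  while [ "$#" -gt 0 ]; do
    case "$1" in
      -e|--environment-file)
        if [ "$#" -ge 2 ]; then
          printf '%s\n' "$2"
          shift
        fi
        ;;
    esac
    shift
  done
}

# $1 is a manifest listing the original environment files, one per line.
# The remaining arguments are the deploy options about to be used.
# Prints a warning for every manifest entry that is absent and returns 1
# if at least one is missing, 0 otherwise.
missing_env_files() {
  local manifest=$1 wanted given missing=0
  shift
  given=$(list_env_files "$@")
  while IFS= read -r wanted || [ -n "$wanted" ]; do
    [ -n "$wanted" ] || continue
    if ! printf '%s\n' "$given" | grep -Fxq -- "$wanted"; then
      printf 'WARNING: missing original environment file: %s\n' "$wanted"
      missing=1
    fi
  done < "$manifest"
  return "$missing"
}

# Example use inside a deploy wrapper (manifest path is a placeholder):
#   missing_env_files "$HOME/original-env-files.txt" "$@" || exit 1
#   openstack overcloud deploy "$@"
```

A check like this would not catch an environment file that is present but has been emptied or edited; it only guards against files being left off the command line.
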
Additional info:

Currently the only safeguard against this is that when the deployment attempts to delete the overcloud networks, it cannot, because there are still neutron ports on those networks. The deploy therefore fails with "unable to delete network" errors such as:
[overcloud-Networks-impwtkh5lxyq-ExternalNetwork-3373wgx7a2nm]: DELETE_FAILED  Resource DELETE failed: Conflict: resources.ExternalSubnet: Unable to complete operation on subnet f2972d18-95a8-43c7-aa98-8f1e5250dab3: One or more ports have an IP allocation from this subnet.

UPDATE_FAILED  resources.Networks: Conflict: 
resources.InternalNetwork.resources.InternalApiSubnet: Unable to complete operation on subnet 8c4ba2d9-cbb1-4398-a024-c405a16daf3a: One or more ports have an IP allocation from this subnet.

Comment 2 Jeremy 2018-11-20 21:57:45 UTC
This problem usually happens when customers attempt to scale, because the documentation says to use --compute-scale:

#osp10
https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/10/html-single/director_installation_and_usage/index#sect-Scaling_the_Overcloud
$ openstack overcloud deploy --templates --compute-scale 5 [OTHER_OPTIONS]
IMPORTANT
Make sure to include all environment files and options from your initial overcloud creation. This includes the same scale parameters for non-Compute nodes.


#osp13
https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/13/html-single/director_installation_and_usage/index#sect-Adding_Compute_or_Ceph_Storage_Nodes
(undercloud) $ openstack overcloud deploy --templates -e /home/stack/templates/node-info.yaml [OTHER_OPTIONS]
IMPORTANT
Make sure to include all environment files and options from your initial overcloud creation. This includes the same scale parameters for non-Compute nodes.


I realize --compute-scale is deprecated in favor of ComputeCount, which helps; however, customers can still hit this issue even with the IMPORTANT message telling them to include all of the original environment files.

Comment 6 Jeremy 2019-04-11 17:08:51 UTC
This issue also happens with an empty resource_registry in environment files: https://access.redhat.com/solutions/4055621

Comment 7 pweeks 2020-12-02 20:40:09 UTC

*** This bug has been marked as a duplicate of bug 1538803 ***