Bug 1651805 - [RFE] Add protection against users running overcloud deploy without all original environment files
Keywords:
Status: CLOSED DUPLICATE of bug 1538803
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: rhosp-director
Version: 10.0 (Newton)
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Assignee: RHOS Maint
QA Contact: Gurenko Alex
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2018-11-20 21:56 UTC by Jeremy
Modified: 2022-03-13 16:58 UTC
CC List: 4 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-12-02 20:40:09 UTC
Target Upstream Version:
Embargoed:


Links
System: Red Hat Issue Tracker | ID: OSP-13837 | Last Updated: 2022-03-13 16:58:47 UTC

Description Jeremy 2018-11-20 21:56:19 UTC
Description of problem:

  Currently, users can easily destroy their overcloud by running a deploy without the original environment files, specifically the network environment files. If the overcloud was deployed with isolated networks and a subsequent deploy, scale, or update is run without all of the original network environment files, the overcloud can be destroyed. This happens because, without the network environment files that define the isolated networks, the deployment puts everything on the control plane network and attempts to delete the isolated networks.
  This problem can also occur when users deploy from a script and add a new line without the trailing line-continuation backslash (\). The shell ends the command at that line, so none of the environment files listed after the mistake are passed to the deploy. Although all of these scenarios are user mistakes, we need to make sure it is not this easy to destroy an environment.
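A minimal sketch of a deploy script that avoids the missing-backslash trap, assuming bash: each environment file is an array element, so there is no line continuation to forget, and a file that does not exist stops the script before the deploy runs. The build_deploy_args helper and the file paths are hypothetical placeholders, not part of director; list the files from your own original deployment.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical paths: replace with every environment file used by the ORIGINAL deploy.
ENV_FILES=(
  /home/stack/templates/node-info.yaml
  /home/stack/templates/network-environment.yaml
)

# Hypothetical helper: fill the global DEPLOY_ARGS array with
# "--templates -e FILE -e FILE ..." for each argument, and return
# non-zero (with an error message) if any listed file does not exist.
build_deploy_args() {
  local f
  DEPLOY_ARGS=(--templates)
  for f in "$@"; do
    if [[ ! -f "$f" ]]; then
      echo "ERROR: environment file not found: $f" >&2
      return 1
    fi
    DEPLOY_ARGS+=(-e "$f")
  done
}

# In a real script:
#   build_deploy_args "${ENV_FILES[@]}"
#   openstack overcloud deploy "${DEPLOY_ARGS[@]}"
```

Unlike a backslash-continued command, a mistake in one array line affects only that element; it cannot silently cut off every file listed after it.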



Version-Release number of selected component (if applicable):
All OSP versions

How reproducible:
unknown

Steps to Reproduce:
1. Deploy the overcloud with isolated networks.
2. Re-run the deploy command without the network environment files.

Actual results:
overcloud destroyed

Expected results:
A warning message, or some other alert, telling users that they need to include the network environment files from the original deployment.
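A minimal sketch of the kind of warning requested, assuming bash and assuming the operator keeps a plain-text record of the environment files from the last successful deploy (one path per line). Director does not write such a record; the record file and the check_env_files name are hypothetical. The check compares exact path strings, so a file passed under a different path spelling is reported as missing.

```shell
#!/usr/bin/env bash

# Hypothetical guard. First argument: a record file with one path per line
# (kept by the operator, not by director). Remaining arguments: the -e files
# for this run. Prints a warning for every recorded file not passed now and
# returns 1 if any are missing, 0 otherwise.
check_env_files() {
  local previous_list=$1
  shift
  local missing=0 prev f found
  while IFS= read -r prev || [[ -n "$prev" ]]; do
    [[ -z "$prev" ]] && continue   # ignore blank lines
    found=0
    for f in "$@"; do
      if [[ "$f" == "$prev" ]]; then
        found=1
        break
      fi
    done
    if (( found == 0 )); then
      echo "WARNING: $prev was used in the previous deploy but is missing now" >&2
      missing=1
    fi
  done < "$previous_list"
  return "$missing"
}
```

Call it with the record file followed by the environment files for the new run, and abort the deploy if it returns non-zero.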
Additional info:

Currently, the only safeguard against this is that when the deployment attempts to delete the overcloud networks, it cannot, because Neutron ports still have IP allocations on those networks. The deploy therefore fails with network deletion errors such as:
[overcloud-Networks-impwtkh5lxyq-ExternalNetwork-3373wgx7a2nm]: DELETE_FAILED  Resource DELETE failed: Conflict: resources.ExternalSubnet: Unable to complete operation on subnet f2972d18-95a8-43c7-aa98-8f1e5250dab3: One or more ports have an IP allocation from this subnet.

UPDATE_FAILED  resources.Networks: Conflict: resources.InternalNetwork.resources.InternalApiSubnet: Unable to complete operation on subnet 8c4ba2d9-cbb1-4398-a024-c405a16daf3a: One or more ports have an IP allocation from this subnet.
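When a deploy has already failed this way, the subnet UUIDs in these errors identify which isolated networks the update tried to delete. A minimal sketch for listing them, assuming bash and assuming the deploy output was saved to a log file; the extract_blocked_subnets name is hypothetical, and it only matches the Neutron message text quoted in the errors above.

```shell
#!/usr/bin/env bash

# Hypothetical helper: print each subnet UUID named in a Neutron
# "Unable to complete operation on subnet <uuid>" error, once, sorted.
extract_blocked_subnets() {
  grep -oE 'Unable to complete operation on subnet [0-9a-f-]{36}' "$1" \
    | awk '{ print $NF }' \
    | sort -u
}
```

Each reported ID can then be checked on the undercloud (with stackrc sourced) using openstack port list --fixed-ip subnet=<subnet-id> to see which ports still hold addresses, which is what blocks the deletion.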

Comment 2 Jeremy 2018-11-20 21:57:45 UTC
This problem usually happens when customers attempt to scale, because the documentation says to use --compute-scale:

#osp10
https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/10/html-single/director_installation_and_usage/index#sect-Scaling_the_Overcloud
$ openstack overcloud deploy --templates --compute-scale 5 [OTHER_OPTIONS]
IMPORTANT
Make sure to include all environment files and options from your initial overcloud creation. This includes the same scale parameters for non-Compute nodes.


#osp13
https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/13/html-single/director_installation_and_usage/index#sect-Adding_Compute_or_Ceph_Storage_Nodes
(undercloud) $ openstack overcloud deploy --templates -e /home/stack/templates/node-info.yaml [OTHER_OPTIONS]
IMPORTANT
Make sure to include all environment files and options from your initial overcloud creation. This includes the same scale parameters for non-Compute nodes.


I realize --compute-scale is deprecated in favor of ComputeCount, and that helps; however, customers can still hit this issue even with the IMPORTANT note telling them to include all environment files from the initial deployment.
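Because --compute-scale is deprecated in favor of ComputeCount, the node counts can live in an environment file such as the node-info.yaml from the OSP 13 example. A minimal sketch that writes such a file, assuming bash; the write_node_info helper is hypothetical, and the flavor names control and compute are assumed defaults that must match your own deployment. The scale-out command must still pass every original environment file in [OTHER_OPTIONS].

```shell
#!/usr/bin/env bash

# Hypothetical helper: write ControllerCount/ComputeCount to an environment
# file instead of passing --compute-scale on the command line.
# The flavor names "control" and "compute" are assumptions; match your deployment.
write_node_info() {
  local path=$1 controllers=$2 computes=$3
  cat > "$path" <<EOF
parameter_defaults:
  OvercloudControllerFlavor: control
  OvercloudComputeFlavor: compute
  ControllerCount: ${controllers}
  ComputeCount: ${computes}
EOF
}
```

For example, write_node_info /home/stack/templates/node-info.yaml 3 5, then rerun the deploy with the same -e list as the initial deployment, including this file.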

Comment 6 Jeremy 2019-04-11 17:08:51 UTC
This issue also happens when an environment file contains an empty resource_registry section: https://access.redhat.com/solutions/4055621
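A rough pre-deploy check for the empty resource_registry case, assuming bash and awk. It is a line-based sketch, not a YAML parser: it flags a file whose top-level resource_registry: key (or resource_registry: {}) is followed only by blank or comment lines before the next top-level key or the end of the file. The has_empty_resource_registry name is hypothetical.

```shell
#!/usr/bin/env bash

# Hypothetical check: exit status 0 if the file has a top-level
# resource_registry key with no indented entries under it, 1 otherwise.
# Line-based heuristic, not a YAML parser.
has_empty_resource_registry() {
  awk '
    /^resource_registry:[ \t]*([{][}][ \t]*)?(#.*)?$/ { in_rr = 1; next }
    in_rr && /^[ \t]*(#.*)?$/ { next }            # blank or comment line
    in_rr && /^[ \t]+[^ \t]/ { in_rr = 0; next }  # indented entry found
    in_rr { empty = 1; in_rr = 0 }                # next top-level key reached
    END { if (empty || in_rr) exit 0; exit 1 }
  ' "$1"
}
```

For example, loop over the templates directory and print any file for which the check succeeds before running the deploy.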

Comment 7 pweeks 2020-12-02 20:40:09 UTC

*** This bug has been marked as a duplicate of bug 1538803 ***

