Bug 1396227

Summary: [RFE] [Docs] Reduce Unintended Rebuilds of Overcloud nodes by better documenting impacts
Product: Red Hat OpenStack Reporter: Bradford Nichols <bradnichols>
Component: documentation Assignee: Dan Macpherson <dmacpher>
Status: CLOSED CURRENTRELEASE QA Contact: RHOS Documentation Team <rhos-docs>
Severity: medium Docs Contact:
Priority: low    
Version: 13.0 (Queens) CC: aschultz, bradnichols, ccopello, dbecker, jcoufal, jthomas, mburns, morazi, rhel-osp-director-maint, srevivo
Target Milestone: zstream Keywords: FutureFeature
Target Release: 13.0 (Queens)   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2019-11-20 16:55:00 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Bradford Nichols 2016-11-17 18:22:30 UTC
During the troubleshooting of an initial failed scale-out attempt and the subsequent rebuilding of a cloud, it became clear that it is very easy for an operator to cause unintended overcloud node rebuilds. At best, these cause outages as nodes are taken offline and rebuilt; at worst, they result in unintended or broken configurations.
 
a. Modify documentation to describe in more detail consequences and behaviour of overcloud modify operations. 

The current v9 documentation says little about the possible nature, scope and impact of overcloud modifications. 
https://access.redhat.com/documentation/en/red-hat-openstack-platform/9/single/director-installation-and-usage#sect-Modifying_the_Overcloud_Environment
“…. The director checks the overcloud stack in heat, and then updates each item in the stack with the environment files and heat templates. It does not recreate the Overcloud, but rather changes the existing Overcloud. …”


b. Provide a description of changes that will be made and the end result when deploy and update operations are invoked
For director operations which will change the overcloud configuration, provide a report of changes that will be made and why, and then the overall result against these changes when complete. 


c. Provide a --dry-run flag which would report on syntax and planned changes
For director operations which will change the overcloud configuration, provide a dry-run flag which reports the changes that will be made and why, along with any errors in syntax and arguments.
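The kind of report requested here can be sketched as follows. This is only an illustration of the idea, not an existing director feature: the function names `plan_changes` and `format_report` are hypothetical, and the sketch assumes the deployed and proposed parameters have already been loaded into flat dictionaries (the parameter names and values are example data).

```python
UNSET = "<unset>"  # placeholder shown when a parameter is absent on one side


def plan_changes(deployed, proposed):
    """Return a sorted list of (parameter, old, new) for every difference."""
    changes = []
    for key in sorted(set(deployed) | set(proposed)):
        old = deployed.get(key, UNSET)
        new = proposed.get(key, UNSET)
        if old != new:
            changes.append((key, old, new))
    return changes


def format_report(changes):
    """Render the planned changes as a short human-readable report."""
    if not changes:
        return "No parameter changes planned."
    lines = ["Planned parameter changes:"]
    for key, old, new in changes:
        lines.append(f"  {key}: {old!r} -> {new!r}")
    return "\n".join(lines)


# Example data only: what is deployed today versus what the next run would apply.
deployed = {"ComputeCount": 10, "ControllerCount": 3, "NtpServer": "pool.ntp.org"}
proposed = {"ComputeCount": 2, "ControllerCount": 3}

print(format_report(plan_changes(deployed, proposed)))
```

A report like this would surface the drop in compute nodes and the missing NTP setting before anything is changed, which is the point of the dry-run request.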


d. Provide a ‘do you really want to do this’ user question
For director operations which will change the overcloud configuration, prompt the user by default for an interactive confirmation before proceeding, as is already implemented in the stack delete operation.
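A minimal sketch of such a prompt, assuming a hypothetical helper named `confirm` (it is not part of any existing client): the default answer is "no", and an `assume_yes` switch lets automation skip the question. The `ask` parameter exists only so the example can run without a terminal.

```python
def confirm(question, ask=input, assume_yes=False):
    """Return True only if the operator explicitly answers yes.

    ask        -- callable used to read the answer (defaults to input)
    assume_yes -- skip the prompt entirely, e.g. for scripted runs
    """
    if assume_yes:
        return True
    answer = ask(f"{question} [y/N]: ").strip().lower()
    return answer in ("y", "yes")


# Usage with a stubbed answer so the example does not block on a terminal.
if confirm("Update the overcloud stack?", ask=lambda prompt: "n"):
    print("proceeding")
else:
    print("aborted")
```

Defaulting to "no" means an accidental Enter keypress leaves the overcloud untouched.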


e. Provide a global ‘lock down your overcloud’ feature/setting
As an integrated feature or as an externally documented procedure, provide a way to prohibit major overcloud changes. Externally documented methods could do something with IPMI passwords or PXE network traffic to prevent redeployments.

Comment 1 Jon Thomas 2017-06-27 16:38:02 UTC
As per the meeting, I have split out the original bz:

-dry run parts B and C:

https://bugzilla.redhat.com/show_bug.cgi?id=1465577

-‘do you really want to do this’ user question part D:

https://bugzilla.redhat.com/show_bug.cgi?id=1465574

-lock down part E:

https://bugzilla.redhat.com/show_bug.cgi?id=1465569

Comment 2 Jaromir Coufal 2017-07-24 17:34:30 UTC
Docs effort only for this BZ:

a. Modify documentation to describe in more detail consequences and behaviour of overcloud modify operations. 

The current v9 documentation says little about the possible nature, scope and impact of overcloud modifications. 
https://access.redhat.com/documentation/en/red-hat-openstack-platform/9/single/director-installation-and-usage#sect-Modifying_the_Overcloud_Environment
“…. The director checks the overcloud stack in heat, and then updates each item in the stack with the environment files and heat templates. It does not recreate the Overcloud, but rather changes the existing Overcloud. …”


b. Provide a description of changes that will be made and the end result when deploy and update operations are invoked
For director operations which will change the overcloud configuration, provide a report of changes that will be made and why, and then the overall result against these changes when complete.

Comment 3 Red Hat Bugzilla Rules Engine 2017-07-24 17:35:34 UTC
This bugzilla has been removed from the release since it has not been triaged, and needs to be reviewed for targeting another release.

Comment 4 Dan Macpherson 2018-03-19 07:19:41 UTC
Scoping old BZ.

Brad - In terms of a) and b), how specific should we go in terms of defining changes? Should we talk mostly about configuration stages (i.e. Step 1 to 5)?

I guess, I'm looking for a bit more detail on what things we should be saying to customers in this regard?

Comment 5 Bradford Nichols 2018-04-02 18:59:40 UTC
Dan, 
Errors to warn people against stem from not really knowing what is in the yaml files before they launch a redeploy. Could be no configuration control, confusion, bad edits, etc. The two major examples I'm aware of are:
- They thought they were scaling out and didn't realize the yaml networking section was completely different from what was currently deployed, so all networks/communication broke on redeploy.
- They thought they were updating networking but used either stale command-line flags or nodeinfo.yaml and did a drastic scale-down of overcloud nodes to some typical hello-world baseline of 3 controllers and 2 computes.
Perhaps you could use the above examples with a high-level warning statement like: "The yaml files that define your cloud are numerous and complex. Before you redeploy your cloud, you need to ensure you understand the exact differences from the last deployment, or you could get unexpected and disruptive consequences. For example ...."
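The second example above (an accidental scale-down) is the sort of thing a small pre-deploy check could catch. The following is a rough sketch under stated assumptions, not a director tool: `find_scale_downs` is a hypothetical helper, the parameters are assumed to be already loaded into dictionaries, and any name ending in `Count` is assumed to be a node count.

```python
def find_scale_downs(deployed, proposed):
    """Return {parameter: (old, new)} for node counts that shrink.

    A count that is missing from the proposed parameters is reported with
    new=None, because this sketch cannot tell what value would then apply.
    """
    shrinking = {}
    for name, old in deployed.items():
        if not name.endswith("Count"):
            continue  # only node-count style parameters are checked
        new = proposed.get(name)
        if new is None or new < old:
            shrinking[name] = (old, new)
    return shrinking


# Example data only: a large cloud about to be redeployed with baseline counts.
last_deployment = {"ControllerCount": 3, "ComputeCount": 40, "NtpServer": "pool.ntp.org"}
next_deployment = {"ControllerCount": 3, "ComputeCount": 2}

for name, (old, new) in find_scale_downs(last_deployment, next_deployment).items():
    print(f"WARNING: {name} would drop from {old} to {new}")
```

The same comparison could be extended to the networking parameters from the first example, where any difference at all is worth a warning.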

Comment 6 Dan Macpherson 2018-08-06 04:25:49 UTC
Scoping old bugs. This BZ might still be useful to implement. It's worth adding the suggestion Brad identified in comment #5 in the following section:

https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/13/html/director_installation_and_usage/chap-performing_tasks_after_overcloud_creation#sect-Modifying_the_Overcloud_Environment

Comment 7 Chuck Copello 2019-11-20 16:55:00 UTC
Closing; warning notes added per comment 6.