Bug 1396227 - [RFE] [Docs] Reduce Unintended Rebuilds of Overcloud nodes by better documenting impacts
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: documentation
Version: 13.0 (Queens)
Hardware: Unspecified
OS: Unspecified
Priority: low
Severity: medium
Target Milestone: zstream
Target Release: 13.0 (Queens)
Assignee: Dan Macpherson
QA Contact: RHOS Documentation Team
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2016-11-17 18:22 UTC by Bradford Nichols
Modified: 2019-11-20 16:55 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-11-20 16:55:00 UTC
Target Upstream Version:
Embargoed:



Description Bradford Nichols 2016-11-17 18:22:30 UTC
During the troubleshooting of the initial failed scale-out attempt and the subsequent rebuilding of a cloud, it became clear that it is very easy for an operator to cause unintended overcloud node rebuilds. At best these cause outages as nodes are taken offline and rebuilt; at worst they result in unintended or broken configurations.
 
a. Modify documentation to describe in more detail the consequences and behaviour of overcloud modify operations.

The current v9 documentation says little about the possible nature, scope and impact of overcloud modifications. 
https://access.redhat.com/documentation/en/red-hat-openstack-platform/9/single/director-installation-and-usage#sect-Modifying_the_Overcloud_Environment
“…. The director checks the overcloud stack in heat, and then updates each item in the stack with the environment files and heat templates. It does not recreate the Overcloud, but rather changes the existing Overcloud. …”


b. Provide a description of changes that will be made and the end result when deploy and update operations are invoked
For director operations which will change the overcloud configuration, provide a report of changes that will be made and why, and then the overall result against these changes when complete. 


c. Provide a --dry-run flag which would report on syntax and planned changes
For director operations which will change the overcloud configuration, provide a dry-run flag which reports the changes that will be made and why, along with any errors in syntax and arguments.


d. Provide a ‘do you really want to do this’ user question
For director operations which will change the overcloud configuration, prompt the user by default for an interactive confirmation before proceeding, as the stack delete operation already does.


e. Provide a global ‘lock down your overcloud’ feature/setting
As an integrated feature or as an externally documented procedure, provide a way to prohibit major overcloud changes. Externally documented methods could do something with IPMI passwords or PXE network traffic to prevent redeployments.

Comment 1 Jon Thomas 2017-06-27 16:38:02 UTC
As per the meeting, I have split out the original bz:

-dry run parts B and C:

https://bugzilla.redhat.com/show_bug.cgi?id=1465577

-‘do you really want to do this’ user question parts D:

https://bugzilla.redhat.com/show_bug.cgi?id=1465574

-lock down parts E:

https://bugzilla.redhat.com/show_bug.cgi?id=1465569

Comment 2 Jaromir Coufal 2017-07-24 17:34:30 UTC
Docs effort only for this BZ:

a. Modify documentation to describe in more detail consequences and behaviour of overcloud modify operations. 

The current v9 documentation says little about the possible nature, scope and impact of overcloud modifications. 
https://access.redhat.com/documentation/en/red-hat-openstack-platform/9/single/director-installation-and-usage#sect-Modifying_the_Overcloud_Environment
“…. The director checks the overcloud stack in heat, and then updates each item in the stack with the environment files and heat templates. It does not recreate the Overcloud, but rather changes the existing Overcloud. …”


b. Provide a description of changes that will be made and the end result when deploy and update operations are invoked
For director operations which will change the overcloud configuration, provide a report of changes that will be made and why, and then the overall result against these changes when complete.

Comment 3 Red Hat Bugzilla Rules Engine 2017-07-24 17:35:34 UTC
This bugzilla has been removed from the release since it has not been triaged, and needs to be reviewed for targeting another release.

Comment 4 Dan Macpherson 2018-03-19 07:19:41 UTC
Scoping old BZ.

Brad - In terms of a) and b), how specific should we go in terms of defining changes? Should we talk mostly about configuration stages (i.e. Step 1 to 5)?

I guess, I'm looking for a bit more detail on what things we should be saying to customers in this regard?

Comment 5 Bradford Nichols 2018-04-02 18:59:40 UTC
Dan, 
The errors to warn people against stem from not really knowing what is in the YAML files before they launch a redeploy. The cause could be a lack of configuration control, confusion, bad edits, etc. The two major examples I'm aware of are:
- they thought they were scaling out and didn't realize the YAML networking section was completely different from what was currently deployed, so all networks and communication break on redeploy
- they thought they were updating networking but used either stale command-line flags or a stale nodeinfo.yaml, and did a drastic scale-down of overcloud nodes to a typical hello-world baseline of 3 controllers and 2 computes
Perhaps you could use the above examples with a high-level warning statement like: "The YAML files that define your cloud are numerous and complex. Before you redeploy your cloud, you need to ensure you understand the exact differences from the last deployment, or you could get unexpected and disruptive consequences. For example ...."
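
As a rough illustration of what "understand the exact differences from the last deployment" could look like in practice, here is a minimal sketch that compares the text of a deployed environment file with a proposed one and flags any node count that goes down. It is not part of director and does not parse YAML; it is a plain line-based comparison using only the Python standard library. The helper names `node_counts` and `review_changes`, the default file name, and the sample contents are all hypothetical, and `ControllerCount` / `ComputeCount` are used only as examples of count parameters.

```python
import difflib
import re

# Matches lines such as "  ComputeCount: 10" (assumed naming: <Role>Count).
COUNT_RE = re.compile(r"^\s*(\w+Count):\s*(\d+)\s*$")


def node_counts(text):
    """Collect '<Role>Count: N' entries found in environment file text."""
    counts = {}
    for line in text.splitlines():
        match = COUNT_RE.match(line)
        if match:
            counts[match.group(1)] = int(match.group(2))
    return counts


def review_changes(deployed, proposed, name="environment.yaml"):
    """Return (unified diff lines, list of (parameter, before, after) reductions)."""
    diff = list(difflib.unified_diff(
        deployed.splitlines(),
        proposed.splitlines(),
        fromfile=f"deployed/{name}",
        tofile=f"proposed/{name}",
        lineterm="",
    ))
    old, new = node_counts(deployed), node_counts(proposed)
    # Only counts present in both files are compared; a count that is
    # missing from the proposed file is not detected by this sketch.
    reductions = [
        (key, old[key], new[key])
        for key in sorted(old)
        if key in new and new[key] < old[key]
    ]
    return diff, reductions


if __name__ == "__main__":
    # Hypothetical sample data standing in for real environment files.
    deployed = "parameter_defaults:\n  ControllerCount: 3\n  ComputeCount: 10\n"
    proposed = "parameter_defaults:\n  ControllerCount: 3\n  ComputeCount: 2\n"
    diff, reductions = review_changes(deployed, proposed)
    print("\n".join(diff))
    for key, before, after in reductions:
        print(f"WARNING: {key} drops from {before} to {after}")
```

A check like this only covers the files it is given. It would not catch stale command-line flags, so the set of files and flags passed to the deploy command still needs to be reviewed by hand.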

Comment 6 Dan Macpherson 2018-08-06 04:25:49 UTC
Scoping old bugs. This BZ might still be useful to implement. It's worth adding the suggestion Brad identified in comment #5 to the following section:

https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/13/html/director_installation_and_usage/chap-performing_tasks_after_overcloud_creation#sect-Modifying_the_Overcloud_Environment

Comment 7 Chuck Copello 2019-11-20 16:55:00 UTC
Closing; warning notes added per comment 6.

