Red Hat Bugzilla – Bug 1248220
[Docs] [Director] Issues with overcloud package update documentation
Last modified: 2016-01-11 05:09:09 EST
A small grab-bag of issues with section 7.7 (Updating the Overcloud):
* The title is potentially misleading, since this section refers to updating packages installed on nodes that make up the overcloud, not to making changes to the Heat stack named 'overcloud'.
* The title of section 7.7.2 has been copied-and-pasted from elsewhere, and refers to deleting nodes instead of updating packages.
* The explanation of the -i option is misleading, as it is implied that without this option no breakpoints will be set. In fact, breakpoints are always set. With the -i option, the CLI remains in the foreground and prompts the user to clear the breakpoints at the appropriate times. Without it, the CLI exits immediately and the user is left to clear the breakpoints manually (which is both non-obvious and almost impossible to do at the correct times and in the correct sequence). Note that if the user forgets to pass -i or if the CLI process dies for any reason (e.g. Ctrl-C), the user can resume an in-progress update by running the command again (and passing -i). Changes to the upstream docs that landed today make this a little clearer than it was: https://repos.fedorapeople.org/repos/openstack-m/docs/master/post_deployment/package_update.html
Assigning to Dan for review.
The first two look good now, but the breakpoint part is still potentially misleading - this doesn't really describe the situation:
"The breakpoint process is usually automatic. However, the -i option provides an interactive mode that requires confirmation at each breakpoint."
The breakpoint process is automatic (with manual confirmations) when you specify the -i option.
When you don't specify the -i option, nothing happens at all. No packages are updated. The overcloud stack moves to the UPDATE_IN_PROGRESS state and stays there for several hours (holding a lock that prevents you from making other changes to the stack), until it finally times out and moves to UPDATE_FAILED without having ever attempted to update any packages. This is because the command line client, which is responsible for clearing the breakpoints, has exited, so Heat hits the first breakpoint and stops.
The user should always, always, always use -i, because that's the only way the breakpoints get cleared. If you accidentally run the command without -i (or you run it with -i but accidentally exit prematurely, e.g. with Ctrl-C), the fix is to run it again with -i - it will automatically detect the existing operation in progress and continue clearing breakpoints where it left off.
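To make the behaviour described above concrete, here is a minimal toy model in Python. It is only a sketch of the semantics described in this comment, not the real Heat or CLI code: `ToyStack`, `run_update`, and `max_confirmations` are hypothetical names invented for illustration, and the node names are made up.

```python
from dataclasses import dataclass, field


@dataclass
class ToyStack:
    """Hypothetical stand-in for the overcloud stack; not Heat's real API."""
    nodes: list
    status: str = "CREATE_COMPLETE"
    updated: list = field(default_factory=list)

    def start_update(self):
        # Starting an update always sets breakpoints, with or without -i.
        # If an update is already in progress, keep its state so it can resume.
        if self.status != "UPDATE_IN_PROGRESS":
            self.status = "UPDATE_IN_PROGRESS"
            self.updated = []

    def pending(self):
        # Nodes still waiting at a breakpoint, in update order.
        return [n for n in self.nodes if n not in self.updated]

    def clear_next_breakpoint(self):
        # Clearing one breakpoint lets exactly one node update its packages.
        node = self.pending()[0]
        self.updated.append(node)
        if not self.pending():
            self.status = "UPDATE_COMPLETE"
        return node

    def time_out(self):
        # With nobody clearing breakpoints, the update eventually fails.
        if self.status == "UPDATE_IN_PROGRESS":
            self.status = "UPDATE_FAILED"


def run_update(stack, interactive, max_confirmations=None):
    """Model one run of the update command.

    interactive=False models omitting -i: the client exits immediately.
    max_confirmations models the client dying early (e.g. Ctrl-C).
    """
    stack.start_update()
    if not interactive:
        return  # client has exited; the stack waits at the first breakpoint
    cleared = 0
    while stack.pending():
        if max_confirmations is not None and cleared >= max_confirmations:
            return  # client interrupted; remaining breakpoints stay set
        stack.clear_next_breakpoint()
        cleared += 1


if __name__ == "__main__":
    stack = ToyStack(["controller-0", "compute-0", "compute-1"])
    run_update(stack, interactive=False)   # forgot -i: nothing is updated
    print(stack.status, stack.updated)
    run_update(stack, interactive=True)    # rerun with -i: resumes and finishes
    print(stack.status, stack.updated)
```

In this model, the non-interactive run leaves the stack in UPDATE_IN_PROGRESS with no nodes updated, and rerunning interactively picks up the existing operation instead of starting a new one, which mirrors the recovery path described above.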
No prob, I can update this.
However, this also raises a question: Why have -i as an option if it's compulsory? If users should always include the -i, why not remove the -i option and make the -i happen automatically? Is there ever a situation where they wouldn't use -i?
Yep, I agree it was probably a mistake to make it an option.
How does this sound:
Running an update on all nodes in parallel might cause problems. For example, an update of a package might involve restarting a service, which can disrupt other nodes. For this reason, the update process uses a set of breakpoints to update the nodes one at a time. When one node completes its package update, the update process moves on to the next node.
## IMPORTANT ##
The update process also requires the <option>-i</option> option. This puts the command in an interactive mode that requires confirmation at each breakpoint. Without the <option>-i</option> option, the Overcloud nodes will not update their packages.
A lot better. Consider changing the last sentence to something like:
"Without the <option>-i</option> option, the update will remain paused at the first breakpoint. The update process can be resumed by running the command again with the <option>-i</option> option."
so that people know how to recover when they inevitably miss the option out.
Good idea. Adding that in.
This is now live:
Zane, how are the changes? Anything else in that section that needs a fix?
This content is live on the Customer Portal.