| Summary: | Shutdown and Start up procedure request for OSP Director based setups | ||
|---|---|---|---|
| Product: | Red Hat OpenStack | Reporter: | Aviv Guetta <aguetta> |
| Component: | documentation | Assignee: | Dan Macpherson <dmacpher> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | RHOS Documentation Team <rhos-docs> |
| Severity: | low | Docs Contact: | |
| Priority: | medium | ||
| Version: | 8.0 (Liberty) | CC: | aguetta, brault, dbecker, dmacpher, jcoufal, jefbrown, jliberma, jslagle, lbopf, mburns, morazi, mschuppe, radoslaw.smigielski, rhel-osp-director-maint, rhos-docs, scohen, srevivo |
| Target Milestone: | --- | ||
| Target Release: | --- | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2017-05-15 13:57:28 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
Description
Aviv Guetta
2016-10-10 15:40:37 UTC
Content reviewed by Don Domingo and is now live. Here's the link to the general reboot procedures in the Director guide:

https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/10/html/director_installation_and_usage/sect-rebooting_the_overcloud

And I've added the same reboot procedures at the end of each relevant upgrade procedure:

Director - https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/10/html/upgrading_red_hat_openstack_platform/chap-upgrading_the_environment#sect-Major-Updating_Director_Packages
Object Storage - https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/10/html/upgrading_red_hat_openstack_platform/chap-upgrading_the_environment#sect-Major-Upgrading_the_Overcloud-Swift
Controller - https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/10/html/upgrading_red_hat_openstack_platform/chap-upgrading_the_environment#sect-Major-Upgrading_the_Overcloud-Controller
Ceph Storage - https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/10/html/upgrading_red_hat_openstack_platform/chap-upgrading_the_environment#sect-Major-Upgrading_the_Overcloud-Ceph
Compute - https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/10/html/upgrading_red_hat_openstack_platform/chap-upgrading_the_environment#sect-Major-Upgrading_the_Overcloud-Compute

For each upgrade procedure, I have this text before the reboot:

"Check the /var/log/yum.log file on the [ROLE] node you have upgraded to see if either the kernel or openvswitch packages have updated their major or minor versions. If so, perform a reboot of each node:"

@Aviv -- How does the content look to you? Did you have any suggestions for improvements?

Hi Dan,

* [9.4] There should be a separator between the overall description and the steps to reboot the compute node.
* [9.4] There should be a sanity check after the compute node is up.
* [all] I don't understand why the procedure always states "Select the next node to reboot." If nodes should be rebooted one by one, it should be explicitly stated.
* [all] In order to avoid issues (like rebooting a faulty node, which will not start afterwards), there should also be a sanity check before the reboot.
* [all] Another use for the procedure is to shut down the compute node (and start it up later), so adding a 'poweroff' option should be considered as well.

Aviv

Hi Aviv,

Thanks for the feedback. I might need some more information on these requests. Responses inline...

(In reply to Aviv Guetta from comment #11)
> Hi Dan,
> * [9.4] There should be a separator between the overall description and the
> steps to reboot the compute node.

I'm not sure what you mean by a separator. Can you elaborate on this?

> * [9.4] There should be a sanity check after the compute node is up.

Sure, have you got a recommendation for a Compute sanity check?

> * [all] I don't understand why the procedure always states "Select the next
> node to reboot." If nodes should be rebooted one by one, it should be
> explicitly stated.

I think you answered already that this one is done.

> * [all] In order to avoid issues (like rebooting a faulty node, which will
> not start afterwards), there should also be a sanity check before the reboot.

What sort of pre-reboot sanity check were you thinking? I only ask because one of the reasons you could be rebooting a node is because there's a fault with the node. What did you want to check for?

> * [all] Another use for the procedure is to shut down the compute node (and
> start it up later), so adding a 'poweroff' option should be considered as
> well.

I think the same processes can also apply to powering off nodes. Just instead of rebooting, you would just power off. So instead of a whole new procedure for power off, maybe a note to say the same procedures can be used, but powering the node off instead of rebooting it. Would that make sense or is there more to it?

(In reply to Dan Macpherson from comment #13)

Hi Dan,

> > * [9.4] There should be a separator between the overall description and the
> > steps to reboot the compute node.
>
> I'm not sure what you mean by a separator. Can you elaborate on this?

There are 3 steps which describe the process, then there is the practical part ('list compute nodes'). There should be a separator between them.

> > * [9.4] There should be a sanity check after the compute node is up.
>
> Sure, have you got a recommendation for a Compute sanity check?

> > * [all] I don't understand why the procedure always states "Select the next
> > node to reboot." If nodes should be rebooted one by one, it should be
> > explicitly stated.
>
> I think you answered already that this one is done.

Ack

> > * [all] In order to avoid issues (like rebooting a faulty node, which will
> > not start afterwards), there should also be a sanity check before the reboot.
>
> What sort of pre-reboot sanity check were you thinking? I only ask because
> one of the reasons you could be rebooting a node is because there's a fault
> with the node. What did you want to check for?

It can be a shutdown as well. Additionally, an operator can reboot a node for one reason and avoid other issues. It should also give the operator a good view of the current status of the node and the environment before taking such an action.

At first glance, I'd suggest:

- [undercloud] Check that the status of all computes is ok:
  # [root@undercloud-0 ~]# openstack server list
- [overcloud] Examine the OpenStack services on the rebooted compute:
  # [heat-admin@compute-1 ~]$ sudo systemctl list-units "openstack*" "neutron*" "openvswitch*"
  ## This command can be changed according to the customer environment (in case of additional services).

> > * [all] Another use for the procedure is to shut down the compute node (and
> > start it up later), so adding a 'poweroff' option should be considered as
> > well.
>
> I think the same processes can also apply to powering off nodes. Just
> instead of rebooting, you would just power off. So instead of a whole new
> procedure for power off, maybe a note to say the same procedures can be used,
> but powering the node off instead of rebooting it. Would that make sense
> or is there more to it?

It should just be mentioned, alongside the reboot command step.

Hi Dan,

Additionally, there should be a separation between Overcloud and Undercloud, as Director is the Undercloud and the controllers / computes are the Overcloud. Currently there is one (wrong) title: "CHAPTER 9. REBOOTING THE OVERCLOUD"

Hi Dan,

I didn't receive any comments from the customer, and as we did provide the documentation already, I think we can close this Bugzilla.

Thanks,
Aviv

No prob, Aviv. I just switched back to ASSIGNED to take care of the feedback from comment #15.
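
For readers who want to script the service check proposed in the thread (the `systemctl list-units` command run on a Compute node), the following is a minimal sketch, not part of the published procedure. The function names `inactive_units` and `units_healthy` are hypothetical, and the sketch assumes the listing is produced with the `--plain --no-legend` options so that each line starts with the unit name, followed by the LOAD, ACTIVE, and SUB columns.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (illustration only, not from the Director guide).
#
# Intended input on stdin, run on a Compute node:
#   sudo systemctl list-units --plain --no-legend \
#     "openstack*" "neutron*" "openvswitch*"
# Each input line is expected to look like: UNIT LOAD ACTIVE SUB DESCRIPTION...

# Print the name of every unit whose ACTIVE column is not "active".
inactive_units() {
  awk 'NF >= 3 && $3 != "active" { print $1 }'
}

# Return 0 when every listed unit is active, 1 otherwise.
# Offending unit names are reported on stderr.
units_healthy() {
  local bad
  bad=$(inactive_units)
  if [ -n "$bad" ]; then
    printf 'Units not active:\n%s\n' "$bad" >&2
    return 1
  fi
  return 0
}

# Example (assumption: run before and after the reboot or power-off):
#   sudo systemctl list-units --plain --no-legend \
#     "openstack*" "neutron*" "openvswitch*" | units_healthy
```

Note that `systemctl list-units` without `--all` lists only units that are active, failed, or have pending jobs, so this sketch mainly flags failed or still-activating services. As the thread points out, the unit patterns may need adjusting for environments with additional services.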