[Documentation] [Update/Upgrade] [OSP8/9/10]: what is the safe way to run a reboot cycle of the Ceph nodes post update/upgrade?

On a client call, we were asked what the safe way is to run a reboot cycle of the Ceph nodes after an update/upgrade. The Ceph engineers recommended setting the following flags and proceeding in this order:

(1) ceph osd set noout
(2) ceph osd set norebalance
(3) reboot the Ceph nodes one by one
(4) wait between node reboots until the PGs are back to normal

This was not documented in OSP8: https://access.redhat.com/documentation/en/red-hat-openstack-platform/8/single/upgrading-red-hat-openstack-platform/
Also remember to re-enable those after all nodes are back and the PGs are back to normal, with:

ceph osd unset noout
ceph osd unset norebalance
We should also wait for the PGs to return to normal (all active+clean) after each storage node comes back up, not only at the end. The list could therefore be changed to:

(1) ceph osd set noout
(2) ceph osd set norebalance
(3) reboot one ceph-storage node
(4) after the reboot, monitor the Ceph cluster status until the PGs are back to normal
... repeat steps 3 and 4 for each remaining node
(5) ceph osd unset noout
(6) ceph osd unset norebalance
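The sequence above could be sketched as a shell loop. This is a minimal illustration, not the documented procedure: the node names (ceph-0, ceph-1, ceph-2), the heat-admin SSH user, and the exact format of the `ceph pg stat` output are all assumptions, so adapt before use.

```shell
#!/bin/bash
# Sketch of the reboot cycle above. Node names, SSH user, and the
# `ceph pg stat` output format are assumptions, not from the docs.

# Returns success when a `ceph pg stat` line reports every PG as active+clean,
# e.g. "512 pgs: 512 active+clean; 1.1 GiB data, 3.3 GiB used, ..."
pgs_clean() {
  local total clean
  total=$(echo "$1" | grep -oE '[0-9]+ pgs' | grep -oE '[0-9]+')
  clean=$(echo "$1" | grep -oE '[0-9]+ active\+clean' | head -n1 | grep -oE '[0-9]+')
  [ -n "$total" ] && [ "$total" = "$clean" ]
}

# Guarded so the sketch can be read or sourced without touching a cluster.
if [ "${RUN_REBOOT_CYCLE:-0}" = 1 ]; then
  ceph osd set noout
  ceph osd set norebalance
  for node in ceph-0 ceph-1 ceph-2; do            # placeholder node names
    ssh "heat-admin@$node" 'sudo reboot' || true  # connection drops on reboot
    # Wait for the PGs to settle before moving on to the next node.
    until pgs_clean "$(ceph pg stat)"; do
      sleep 30
    done
  done
  ceph osd unset noout
  ceph osd unset norebalance
fi
```

Note that the helper is deliberately strict: it only passes when the total PG count equals the active+clean count, so any degraded, undersized, or peering PGs keep the loop waiting.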
Implemented this content in the following guides:

Director Guide: https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/10/html/director_installation_and_usage/sect-rebooting_the_overcloud#sect-Rebooting-Ceph
Upgrade Guide: https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/10/html/upgrading_red_hat_openstack_platform/chap-upgrading_the_environment#sect-Major-Upgrading_the_Overcloud-Ceph
Ceph Storage Guide: https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/10/html/red_hat_ceph_storage_for_the_overcloud/creation#rebooting_the_environment

@Omri and Giulio -- do you have any suggestions for improving this content?
Hi Dan, the instructions look good to me, thanks! One comment on step 1) of the reboot process, which says: "Select the first Ceph Storage node to reboot and log into it." While the above *will* work fine, for better compatibility with future releases we might prefer to tell people to log in to one of the *controller* nodes to issue the "ceph osd set ..." and "ceph osd unset ..." commands, instead of the first storage node. In the future the storage nodes might not have the necessary permissions to run commands that affect the entire cluster; the controllers (or better, the nodes running ceph-mon) always will. Not sure if there is time/resources to change that?
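The suggestion above could look something like the following. A minimal sketch only: the controller hostname (overcloud-controller-0) and the heat-admin SSH user are assumptions, and the reboot loop between the set and unset calls is elided.

```shell
#!/bin/bash
# Sketch: issue the cluster-wide flag commands from a controller (a node
# running ceph-mon) instead of a storage node. The controller hostname and
# SSH user below are assumptions.

CONTROLLER="heat-admin@overcloud-controller-0"  # placeholder hostname

# Builds the remote ceph command for setting/unsetting an OSD flag.
osd_flag_cmd() {
  echo "sudo ceph osd $1 $2"  # $1: set|unset, $2: noout|norebalance
}

# Guarded so the sketch can be read or sourced without touching a cluster.
if [ "${RUN_ON_CLUSTER:-0}" = 1 ]; then
  ssh "$CONTROLLER" "$(osd_flag_cmd set noout)"
  ssh "$CONTROLLER" "$(osd_flag_cmd set norebalance)"
  # ... reboot the storage nodes one at a time here ...
  ssh "$CONTROLLER" "$(osd_flag_cmd unset noout)"
  ssh "$CONTROLLER" "$(osd_flag_cmd unset norebalance)"
fi
```

Keeping the flag commands on a mon-capable node means the procedure keeps working even if storage nodes lose the admin keyring in a later release.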
Hi Giulio, I have implemented the suggestion from comment #4: https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/10/html-single/director_installation_and_usage/#sect-Rebooting-Ceph How does it look now?
Perfect, thanks for the update!