Bug 1392753

Summary: [Docs][RFE] Document how to do a complete Ceph upgrade before compute nodes are upgraded
Product: Red Hat OpenStack
Reporter: Giulio Fidente <gfidente>
Component: documentation
Assignee: Dan Macpherson <dmacpher>
Status: CLOSED CURRENTRELEASE
QA Contact: RHOS Documentation Team <rhos-docs>
Severity: unspecified
Priority: medium
Version: 10.0 (Newton)
CC: augol, ccamacho, dbecker, gfidente, jcoufal, mandreou, mburns, morazi, ohochman, rhel-osp-director-maint, sathlang, seb, shan, srevivo
Target Milestone: ga
Target Release: 10.0 (Newton)
Keywords: Documentation, FutureFeature, Triaged
Hardware: Unspecified
OS: Unspecified
Doc Type: If docs needed, set a value
Last Closed: 2018-08-07 04:27:46 UTC
Type: Bug

Description Giulio Fidente 2016-11-08 07:16:27 UTC
Currently the upgrade process starts from the controller nodes, which can optionally host the CephMon service; if found, Ceph will be upgraded first.

The CephMon upgrade will set the CRUSH tunables to their 'default' for the release [1].

At the end of the controller node upgrade, we provide instructions to move on with the upgrade of the compute nodes and then the ceph storage nodes.

With the current order of operations, the Ceph cluster upgrade cannot be completed until all compute nodes are upgraded. Ceph supports rolling upgrades of the OSDs, but it is recommended to upgrade all daemons to the same release to take advantage of new CRUSH tunables and diminish risks.

If desired, users can finish the Ceph cluster upgrade before the compute nodes are upgraded by upgrading the ceph storage nodes first. This should be an option documented in the upgrade guide.

1. http://docs.ceph.com/docs/master/rados/operations/crush-map/#tuning-crush
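For reference, the tunables behaviour described above can be inspected and adjusted by hand once all daemons are on the same release. A minimal sketch, assuming admin access to a monitor node (the profile name is an example; pick the one matching the target release per the linked CRUSH documentation):

```shell
# Show the tunables currently in effect on the cluster.
ceph osd crush show-tunables

# After all mons/OSDs run the target release, apply the release's
# recommended tunables profile (triggers data rebalancing).
ceph osd crush tunables default
```

Note that changing tunables can cause significant data movement, which is one reason the description recommends finishing the whole cluster upgrade first.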

Comment 1 Giulio Fidente 2016-11-08 07:19:58 UTC
> If desired, user can finish the Ceph cluster upgrade before the compute nodes are upgraded by upgrading the ceph storage nodes first.

To be clear, with the above I meant upgrading the ceph storage nodes before the compute nodes, not before the controllers, which shall always be upgraded first, as is currently the case.

Comment 2 Marios Andreou 2016-11-08 08:10:26 UTC
Assigning this to DFG:DF-Lifecycle for now, though this may ultimately be a Documentation bug. Thanks for filing, gfidente. I'll add it to the scrum agenda for discussion this evening so we make sure to triage and process this properly.

Comment 3 Jaromir Coufal 2016-11-08 17:04:42 UTC
Dan, we will test one more time with our QE, they will confirm here that it works, and after that could you please update our upgrade docs to note that users can (but don't have to) switch the compute and ceph upgrade steps?

Thanks
-- Jarda

Comment 7 seb 2016-11-09 11:11:30 UTC
Thanks!

Comment 8 Dan Macpherson 2016-11-09 11:16:59 UTC
(In reply to Jaromir Coufal from comment #3)
> Dan, we will test one more time with our QE, they will confirm here that it
> works, and after that could you please update our upgrade docs to note that
> users can (but don't have to) switch the compute and ceph upgrade steps?
> 
> Thanks
> -- Jarda

Sure. Does this only apply to OSP10, or will it also require a backport to OSP9?

Comment 9 Giulio Fidente 2016-11-09 11:54:50 UTC
hi Dan, this only applies to OSP(d) upgrades from 9 to 10.

Comment 10 Dan Macpherson 2016-11-09 15:18:10 UTC
ACK, thanks

Comment 11 Dan Macpherson 2016-11-28 03:05:46 UTC
Jarda and Giulio, do we have any results from the QE testing? What is the recommended order:

a) Ceph, then Compute
b) Compute, then Ceph
c) Either, then the other

Comment 17 Sofer Athlan-Guyot 2017-01-30 14:45:26 UTC
Hi,

The upstream documentation http://docs.openstack.org/developer/tripleo-docs/post_deployment/upgrade.html has been updated to reflect the change in order.

Comment 21 Dan Macpherson 2017-02-01 02:53:55 UTC
In any case, what I'll do is keep the order as compute, then ceph. But if Sofer's test is successful, I'll include a note for compute that says "You can upgrade Ceph first if you prefer".

Any objections to this?

Comment 22 Jaromir Coufal 2017-03-13 05:15:35 UTC
No objections.

Sofer, what was the result?

Comment 23 Sofer Athlan-Guyot 2017-03-21 09:04:33 UTC
Hi Jaromir, Dan,

hum ... I did not test it. I just pointed out that the upstream doc uses gfidente's order and that we should match it. QE would be the best place to test it, I guess.

Comment 24 Sofer Athlan-Guyot 2017-05-30 10:59:39 UTC
Amit, any news on this one? I think that if we make the change, QE should be ready to make it too. Even if I run the test today and it works, we know that upgrade/update is a moving target, and if it's not coded into QE then it's bad.

IMHO we should match what upstream is documenting, so we should change the QE testing order. Here's what the upstream doc says:


4. Upgrade ceph storage nodes

If the deployment has any ceph storage nodes, upgrade them one-by-one using the upgrade-non-controller.sh script on the undercloud node:

upgrade-non-controller.sh --upgrade <nova-id of ceph storage node>

5. Upgrade compute nodes

Upgrade compute nodes one-by-one using the upgrade-non-controller.sh script on the undercloud node:

upgrade-non-controller.sh --upgrade <nova-id of compute node>

in https://docs.openstack.org/developer/tripleo-docs/post_deployment/upgrade.html
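The upstream order quoted above could be driven from the undercloud with a simple loop. A sketch, assuming the undercloud is sourced and the node names below are placeholders (list the real nova IDs with `nova list`):

```shell
# Upstream order: upgrade ceph storage nodes first, one by one,
# then the compute nodes. Node names here are hypothetical examples.
for node in ceph-0 ceph-1 ceph-2; do
    upgrade-non-controller.sh --upgrade "$node"
done

for node in compute-0 compute-1; do
    upgrade-non-controller.sh --upgrade "$node"
done
```

The one-by-one loop matters: upgrading ceph storage nodes serially keeps enough OSDs up for the cluster to stay healthy during the rolling upgrade.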

Comment 25 Amit Ugol 2018-02-01 07:26:33 UTC
No, we test it in the exact same way as before.

Comment 26 Dan Macpherson 2018-08-07 04:27:46 UTC
Closing this BZ because I don't think we arrived at a conclusion here, and I have restructured the docs to use the following order:

1. Controller
2. Ceph
3. Compute

https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/10/html/upgrading_red_hat_openstack_platform/chap-upgrading_the_environment#upgrading_the_overcloud