Bug 1392753 - [Docs][RFE] Document how to do a complete Ceph upgrade before compute nodes are upgraded
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: documentation
Version: 10.0 (Newton)
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: unspecified
Target Milestone: ga
Target Release: 10.0 (Newton)
Assignee: Dan Macpherson
QA Contact: RHOS Documentation Team
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2016-11-08 07:16 UTC by Giulio Fidente
Modified: 2018-08-07 13:48 UTC
CC: 14 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-08-07 04:27:46 UTC
Target Upstream Version:



Description Giulio Fidente 2016-11-08 07:16:27 UTC
Currently the upgrade process starts from the controller nodes, which can optionally host the CephMon service; if found, Ceph is upgraded first.

The CephMon upgrade will set the CRUSH tunables to the 'default' profile for the release [1].
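
For reference, the tunables profile currently in effect can be inspected, and a profile applied manually if needed, from any host with admin access to the cluster (a sketch only; the available profile names depend on the Ceph release):

ceph osd crush show-tunables
ceph osd crush tunables default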

At the end of the controller node upgrade, we provide instructions to move on with the upgrade of the compute nodes and then the ceph storage nodes.

With the current order of operations, the Ceph cluster upgrade cannot be completed until all compute nodes are upgraded. Ceph supports rolling upgrades of the OSDs, but it is recommended to upgrade all daemons to the same release to take advantage of the new CRUSH tunables and to reduce risk.
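
For example, whether all daemons are already running the same release can be checked with something like the following (a sketch, run from a host with admin access to the cluster; substitute the actual mon ID):

ceph tell mon.<mon-id> version
ceph tell osd.* version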

If desired, users can finish the Ceph cluster upgrade before the compute nodes are upgraded by upgrading the ceph storage nodes first. This should be an option documented in the upgrade guide.

1. http://docs.ceph.com/docs/master/rados/operations/crush-map/#tuning-crush

Comment 1 Giulio Fidente 2016-11-08 07:19:58 UTC
> If desired, user can finish the Ceph cluster upgrade before the compute nodes are upgraded by upgrading the ceph storage nodes first.

To be clear, with the above I meant upgrading the ceph storage nodes before the compute nodes, not before the controllers, which shall always be upgraded first, as is currently the case.

Comment 2 Marios Andreou 2016-11-08 08:10:26 UTC
Assigning this to DFG:DF-Lifecycle for now, though this may ultimately be a documentation bug. Thanks for filing, gfidente. I'll add it to the scrum agenda for discussion this evening so we make sure to triage and process this properly.

Comment 3 Jaromir Coufal 2016-11-08 17:04:42 UTC
Dan, we will test one more time with our QE; they will confirm here that it works. After that, could you please update our upgrade docs to note that users can (but don't have to) switch the compute and ceph upgrade steps?

Thanks
-- Jarda

Comment 7 seb 2016-11-09 11:11:30 UTC
Thanks!

Comment 8 Dan Macpherson 2016-11-09 11:16:59 UTC
(In reply to Jaromir Coufal from comment #3)
> Dan, we will test one more time with our QE, they will confirm here that it
> works and after that could you please update our upgrade docs that user can
> (but don't have to) switch compute and ceph upgrade steps?
> 
> Thanks
> -- Jarda

Sure. Does this only apply to OSP10, or will it also require a backport to OSP9?

Comment 9 Giulio Fidente 2016-11-09 11:54:50 UTC
hi Dan, this only applies to OSP(d) upgrades from 9 to 10.

Comment 10 Dan Macpherson 2016-11-09 15:18:10 UTC
ACK, thanks

Comment 11 Dan Macpherson 2016-11-28 03:05:46 UTC
Jarda and Giulio, do we have any results from the QE testing? What is the recommended order:

a) Ceph, then Compute
b) Compute, then Ceph
c) Either, then the other

Comment 17 Sofer Athlan-Guyot 2017-01-30 14:45:26 UTC
Hi,

The upstream documentation http://docs.openstack.org/developer/tripleo-docs/post_deployment/upgrade.html has been updated to reflect the change in order.

Comment 21 Dan Macpherson 2017-02-01 02:53:55 UTC
In any case, what I'll do is keep the order as compute, then ceph. But if Sofer's test is successful, I'll include a note for compute that says "You can upgrade Ceph first if you prefer".

Any objections to this?

Comment 22 Jaromir Coufal 2017-03-13 05:15:35 UTC
No objections.

Sofer, what was the result?

Comment 23 Sofer Athlan-Guyot 2017-03-21 09:04:33 UTC
Hi Jaromir, Dan,

Hmm... I did not test it. I just pointed out that the upstream doc uses the order gfidente described and that we should match it. QE would be the best place to test it, I guess.

Comment 24 Sofer Athlan-Guyot 2017-05-30 10:59:39 UTC
Amit, any news on this one? I think that if we make the change, QE should be ready to make it too. Even if I run the test today and it works, we know that upgrade/update is a moving target, and if it's not coded into QE testing then it's bad.

IMHO we should match what upstream is documenting, so we should change the QE testing order. Here's what the upstream doc says:


4. Upgrade ceph storage nodes

If the deployment has any ceph storage nodes, upgrade them one-by-one using the upgrade-non-controller.sh script on the undercloud node:

upgrade-non-controller.sh --upgrade <nova-id of ceph storage node>

5. Upgrade compute nodes

Upgrade compute nodes one-by-one using the upgrade-non-controller.sh script on the undercloud node:

upgrade-non-controller.sh --upgrade <nova-id of compute node>

in https://docs.openstack.org/developer/tripleo-docs/post_deployment/upgrade.html
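
For context, the <nova-id> placeholder in the commands above is the Nova server ID (or name) of the overcloud node; it can be obtained on the undercloud with something like the following (a sketch, assuming the stackrc credentials are sourced):

source ~/stackrc
openstack server list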

Comment 25 Amit Ugol 2018-02-01 07:26:33 UTC
No, we test it in the exact same way as before.

Comment 26 Dan Macpherson 2018-08-07 04:27:46 UTC
Closing this BZ because I don't think we arrived at a conclusion here, and I restructured the docs to use the following order:

1. Controller
2. Ceph
3. Compute

https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/10/html/upgrading_red_hat_openstack_platform/chap-upgrading_the_environment#upgrading_the_overcloud
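
For the node-by-node part of that order, the commands are the same as in the upstream doc quoted in comment 24; a rough sketch, with the controller upgrade itself driven separately through the director and not shown here:

# after the controller nodes have been upgraded:
upgrade-non-controller.sh --upgrade <nova-id of ceph storage node>   # repeat for each ceph storage node
upgrade-non-controller.sh --upgrade <nova-id of compute node>        # repeat for each compute node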

