Currently the upgrade process starts from the controller nodes, which can optionally host the CephMon service; if found, Ceph will be upgraded first. The CephMon upgrade sets the CRUSH tunables to their 'default' profile for the release [1]. At the end of the controller node upgrade, we provide instructions to move on with the upgrade of the compute nodes and then the ceph storage nodes.

With the current order of operations, the Ceph cluster upgrade cannot be completed until all compute nodes are upgraded. Ceph supports rolling upgrades of the OSDs, but it is recommended to upgrade all daemons to the same release to take advantage of the new CRUSH tunables and to reduce risk. If desired, users can finish the Ceph cluster upgrade before the compute nodes are upgraded by upgrading the ceph storage nodes first. This should be a documented option in the upgrade guide.

[1] http://docs.ceph.com/docs/master/rados/operations/crush-map/#tuning-crush
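For illustration, a minimal sketch of the alternative order (controllers first, as today, then ceph storage nodes, then compute nodes), assuming the upgrade-non-controller.sh script used for non-controller node upgrades; the node IDs are placeholders, not a definitive procedure:

  # controllers are upgraded first via the usual major-upgrade stack update
  # then upgrade each ceph storage node, one by one:
  upgrade-non-controller.sh --upgrade <nova-id of ceph storage node>
  # and only afterwards each compute node, one by one:
  upgrade-non-controller.sh --upgrade <nova-id of compute node>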
> If desired, users can finish the Ceph cluster upgrade before the compute nodes are upgraded by upgrading the ceph storage nodes first.

To be clear, with the above I meant upgrading the ceph storage nodes before the compute nodes, not before the controllers, which shall always be upgraded first, as they currently are.
Assigning this to DFG:DF-Lifecycle for now, though this may ultimately be a Documentation bug. Thanks for filing, gfidente; I'll add it to the scrum agenda for discussion this evening so we make sure to triage and process this properly.
Dan, we will test this one more time with our QE; they will confirm here that it works, and after that could you please update our upgrade docs to say that users can (but don't have to) switch the compute and ceph upgrade steps?

Thanks
-- Jarda
Thanks!
(In reply to Jaromir Coufal from comment #3)
> Dan, we will test this one more time with our QE; they will confirm here
> that it works, and after that could you please update our upgrade docs to
> say that users can (but don't have to) switch the compute and ceph upgrade
> steps?
>
> Thanks
> -- Jarda

Sure. Does this only apply to OSP10, or will it also require a backport to OSP9?
Hi Dan, this only applies to OSP(d) upgrades from 9 to 10.
ACK, thanks
Jarda and Giulio, do we have any results from the QE testing? What is the recommended order:

a) Ceph, then Compute
b) Compute, then Ceph
c) Either, then the other
Hi, The upstream documentation http://docs.openstack.org/developer/tripleo-docs/post_deployment/upgrade.html has been updated to reflect the change in order.
In any case, what I'll do is keep the order as compute, then ceph. But if Sofer's test is successful, I'll include a note in the compute step that says "You can upgrade Ceph first if you prefer". Any objections to this?
No objections. Sofer, what was the result?
Hi Jaromir, Dan, hmm ... I did not test it. I just pointed out that the upstream doc uses the order gfidente described and that we should match it. QE would be the best place to test it, I guess.
Amit, any news on this one? I think that if we make the change, QE should be ready to make it too. Even if I run the test today and it works, we know that upgrade/update is a moving target, and if it's not coded in QE then it's bad. IMHO we should match what upstream is documenting, so we should change the QE testing order. Here's what the upstream doc says:

4. Upgrade ceph storage nodes

   If the deployment has any ceph storage nodes, upgrade them one-by-one
   using the upgrade-non-controller.sh script on the undercloud node:

   upgrade-non-controller.sh --upgrade <nova-id of ceph storage node>

5. Upgrade compute nodes

   Upgrade compute nodes one-by-one using the upgrade-non-controller.sh
   script on the undercloud node:

   upgrade-non-controller.sh --upgrade <nova-id of compute node>

from https://docs.openstack.org/developer/tripleo-docs/post_deployment/upgrade.html
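For reference, a rough sketch of how those two steps could be driven from the undercloud, assuming the ceph storage and compute node names contain "ceph" and "compute" respectively (an assumption about the deployment's naming, adjust to your environment); this is only an illustration, not the documented procedure:

  #!/bin/bash
  # Sketch only: upgrade ceph storage nodes first, then compute nodes,
  # calling upgrade-non-controller.sh once per node as the upstream doc describes.
  source ~/stackrc

  # Ceph storage nodes, one by one
  for id in $(openstack server list -f value -c ID -c Name | awk '/ceph/ {print $1}'); do
      upgrade-non-controller.sh --upgrade "$id"
  done

  # Compute nodes, one by one
  for id in $(openstack server list -f value -c ID -c Name | awk '/compute/ {print $1}'); do
      upgrade-non-controller.sh --upgrade "$id"
  done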
No, we test it in the exact same way as before.
Closing this BZ because I don't think we arrived at a conclusion here, and I restructured the docs to go:

1. Controller
2. Ceph
3. Compute

https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/10/html/upgrading_red_hat_openstack_platform/chap-upgrading_the_environment#upgrading_the_overcloud