Bug 2212247

Summary: Add steps to upgrade RHCS 5 to RHCS 6 involving RHEL 8 to RHEL 9 upgrades with Stretch mode enabled.
Product: [Red Hat Storage] Red Hat Ceph Storage Reporter: Pawan <pdhiran>
Component: Documentation    Assignee: Akash Raj <akraj>
Documentation sub component: Administration Guide QA Contact: Pawan <pdhiran>
Status: CLOSED CURRENTRELEASE Docs Contact: Anjana Suparna Sriram <asriram>
Severity: high    
Priority: unspecified CC: cephqe-warriors, msaini, rmandyam, saraut, vpapnoi
Version: 6.1   
Target Milestone: ---   
Target Release: 6.1z1   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2023-08-28 07:31:48 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Pawan 2023-06-05 03:00:01 UTC
Describe the issue:
Add steps to upgrade RHCS 5 to RHCS 6 involving RHEL 8 to RHEL 9 upgrades with Stretch mode enabled.

Steps involved:
=====
Pre-reqs:
 - RHCS 5 on RHEL 8 with the necessary hosts and daemons running and stretch mode enabled.
 - Take a backup of the cephadm binary (/usr/sbin/cephadm), ceph.pub (/etc/ceph), and the Ceph cluster's public SSH keys from the admin node (a rough sketch follows).
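   A rough sketch of the backup, assuming default paths and a hypothetical /backup directory on the admin node:
   -> mkdir -p /backup
   -> cp /usr/sbin/cephadm /backup/
   -> cp /etc/ceph/ceph.pub /backup/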

Procedure:
1. Add a second admin node to the cluster (to manage the cluster while the admin node is being re-provisioned).
   -> ceph orch host label add <hostname> _admin

2. Set the “noout” flag.
   -> ceph osd set noout

For the steps to remove a host from the cluster, follow [2].

3. Drain one host.
   -> ceph orch host drain <hostname> --force

4. If the host being drained has OSDs present, zap the devices (so that they can be used to re-deploy OSDs once the host is added back).
   -> ceph orch device zap <hostname> <disk> --force

5. Remove the host from the cluster.
   -> ceph orch host rm <hostname> --force


6. Re-provision the respective host from RHEL 8 to RHEL 9 following guide [3] (a leapp sketch follows).
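   A minimal sketch of an in-place upgrade using leapp per guide [3] (the target version 9.2 is only an example):
   -> dnf install leapp-upgrade
   -> leapp preupgrade --target 9.2
   -> leapp upgrade --target 9.2
   -> reboot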

7. After the host upgrade, install the regular dependencies for Ceph (lvm2, podman, chrony, python3) on the host.
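   For example (the exact package set may vary; these are the dependencies listed above):
   -> dnf install -y lvm2 podman chrony python3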


For the steps to add a host to the cluster, follow [1].

8. Add/copy the “ceph.pub” key to the re-provisioned node.
  -> ssh-copy-id -f -i ~/PATH root@<hostname>

8.1. If the removed host had a mon daemon, then, before adding the host back to the cluster, set the mon deployment to unmanaged.
  -> ceph orch apply mon <placement> --unmanaged

9. Add the host back to the cluster with the labels it had earlier.
  -> ceph orch host add <host> <IP> --labels=<labels>

9.1. If the removed host originally had a mon daemon deployed, the mon daemon needs to be added back manually with the location attributes. Follow doc [4].
  -> # ceph mon add <hostname> <IP> <location>
eg: ceph mon add ceph-pdhiran-4xtsy5-node8 10.0.211.62 datacenter=DC2

  -> # ceph orch daemon add mon <hostname>
eg: ceph orch daemon add mon ceph-pdhiran-4xtsy5-node8

Note: Doc [4] describes replacement, but here we only need to add a new mon. A new section could be added for this (adding a new data site mon).

10. Verify that the daemons on the re-provisioned host are running successfully with the same Ceph version.
  -> ceph orch ps
11. Set back the mon daemon placement to managed.
  -> ceph orch apply mon <placement>

12. Repeat the same with all the other hosts except the arbiter node.
The arbiter mon cannot be drained or removed from its host. Instead, re-provision the arbiter mon to another node first, and then drain and remove the host. Follow doc [4] (a rough sketch follows below).
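   A rough sketch of moving the arbiter/tiebreaker mon, assuming the set_new_tiebreaker workflow described in doc [4] (placeholders are illustrative):
   -> ceph orch apply mon <placement> --unmanaged
   -> ceph mon add <new-hostname> <IP> <tiebreaker-location>
   -> ceph orch daemon add mon <new-hostname>
   -> ceph mon set_new_tiebreaker <new-hostname>
   -> ceph orch daemon rm mon.<old-arbiter-hostname> --force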

13. Follow the same approach to re-provision the admin nodes, and in the meantime use the second admin node to manage the cluster.
14. Copy the backed-up files back to the node.
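   For example, copying the backups taken in the pre-req step back to the re-provisioned admin node (the /backup path is only an assumption):
   -> scp /backup/cephadm root@<hostname>:/usr/sbin/cephadm
   -> scp /backup/ceph.pub root@<hostname>:/etc/ceph/ceph.pub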
15. Add the admin nodes back to the cluster using the second admin node. Set the mon deployment to unmanaged.
16. Follow doc [4] to add back the old arbiter mon and remove the temporary mon created earlier.
17. Unset the “noout” flag.
  -> ceph osd unset noout
18. Verify the Ceph version and that the cluster status is healthy, with all daemons working as expected after the RHEL upgrade.
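  For example:
  -> ceph versions
  -> ceph -s
  -> ceph orch ps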
19. Follow doc [5] to perform the RHCS 5 to RHCS 6 upgrade.


Docs:
[1]. https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/6/html-single/operations_guide/index#adding-hosts-using-the-ceph-orchestrator_ops 
[2]. https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/6/html-single/operations_guide/index#removing-hosts-using-the-ceph-orchestrator_ops
[3]. https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/9/html/upgrading_from_rhel_8_to_rhel_9/index 
[4]. https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/5/html/troubleshooting_guide/troubleshooting-clusters-in-stretch-mode#replacing-the-tiebreaker-with-a-new-monitor_diag 
[5]. https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/6/html/upgrade_guide/upgrade-a-red-hat-ceph-storage-cluster-using-cephadm#doc-wrapper 

Describe the task you were trying to accomplish:
Performing a stretch mode RHCS upgrade involving a RHEL 8 to RHEL 9 upgrade. This is different from a normal cluster upgrade, as it involves manually adding and removing mons with location attributes.

Suggestions for improvement:
Add a new section for RHCS stretched cluster upgrade with RHEL 8 to 9 migration.

Document URL:
https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/5/html-single/administration_guide/index#stretch-clusters-for-ceph-storage

Chapter/Section Number and Title:

Product Version:
6.1

Environment Details:
RHEL 8 to RHEL 9 upgrade with RHCS 5 to RHCS 6 upgrade.

Any other versions of this document that also needs this update:

Additional information:

The steps for general ceph upgrade referred from : https://bugzilla.redhat.com/show_bug.cgi?id=2142958#c0