Bug 2129763
| Summary: | [cee/sd][ceph-ansible][RFE] Additional pre-check required prior to removing legacy RGW daemons in the cephadm-adopt playbook | ||
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat Ceph Storage | Reporter: | Prasanth M V <pmv> |
| Component: | Ceph-Ansible | Assignee: | Teoman ONAY <tonay> |
| Status: | CLOSED WONTFIX | QA Contact: | Aditya Ramteke <aramteke> |
| Severity: | medium | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | 5.2 | CC: | aschoen, ceph-eng-bugs, gmeno, msaini, nthomas, sostapov |
| Target Milestone: | --- | Keywords: | FutureFeature |
| Target Release: | 5.3z5 | ||
| Hardware: | x86_64 | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2023-08-02 13:33:31 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
Missed the 5.3 z1 window. Moving to 6.1. Please advise if this is a problem. Missed the 5.3 z4 deadline. Moving from z4 to z5.
Description of problem:

While adopting RGW daemons to cephadm by running the "cephadm-adopt" playbook in the RHCS 5 upgrade process:

- The first ansible PLAY executed in the playbook is "[redeploy rgw daemons]".
- In this play, the new RGW daemons are created by the TASK "[update the placement of radosgw hosts]".
- The next ansible PLAY is "[stop and remove legacy ceph rgw daemons]".
- In this PLAY the legacy RGW daemons are removed by the following TASKs in the playbook:
  - TASK [stop and disable ceph-radosgw systemd service] (1st task)
  - TASK [stop and disable ceph-radosgw systemd target] (2nd task)
  - TASK [reset failed ceph-radosgw systemd unit] (3rd task)
  - TASK [remove ceph-radosgw systemd files] (4th task)
  - TASK [remove legacy ceph radosgw data] (5th task)
  - TASK [remove legacy ceph radosgw directory] (6th task)
- This removal procedure is required for the adoption of RGW daemons to cephadm: the ports must be freed for the deployment of the new RGW daemons, otherwise the deployment of the new RGW daemons fails with an error that the port is already occupied.
- However, it is also important to check that the new RGW daemons are deployed and managed by cephadm before the legacy daemons are removed completely (in the 4th/5th/6th tasks).
- In a scenario where the new RGW daemons were not deployed, or are not managed by cephadm for any reason, and the legacy RGW daemons are then removed completely in the next steps (4th/5th/6th tasks), there is a chance of production impact because no RGW daemons (either legacy or new) are left running in the cluster.
- Considering that the legacy daemon must be stopped to release the port for the new RGW daemon, I would suggest adding a step/task to check whether the new RGW daemons are deployed before removing the legacy RGW daemons completely.
- My suggestion is to stop and disable the systemd service/target first (1st/2nd tasks), which frees the ports for the newly created RGW daemons; after those tasks, add a step/task to verify that the new RGW daemons are deployed and managed by cephadm (i.e. that the new RGWs are up and running). Once that is confirmed, the remaining removal tasks for the legacy daemons can proceed (see the task sketch at the end of this description).

Version-Release number of selected component (if applicable):
- Red Hat Ceph Storage 5.2
- ceph-ansible-6.0.27.9-1.el8cp.noarch
- cephadm-16.2.8-85.el8cp.noarch

How reproducible:
- Not applicable
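To illustrate the kind of guard being requested, the task below waits until cephadm reports at least one running RGW daemon before the playbook continues to the destructive removal tasks. This is only a minimal sketch, not the actual cephadm-adopt implementation: the task name, the `mons` group reference, the retry/delay values, and the assumption that the `ceph` CLI and the `daemon_type`/`status_desc` fields of `ceph orch ps --format json` are available on the first monitor host are all illustrative.

```yaml
# Hypothetical pre-check: block until cephadm reports at least one running
# RGW daemon, so that the later "remove legacy ceph radosgw data/directory"
# tasks only run once the new daemons are actually up.
- name: wait for cephadm managed rgw daemons to be running
  ansible.builtin.command: ceph orch ps --format json
  register: orch_ps_rgw
  delegate_to: "{{ groups['mons'][0] }}"  # assumes ceph-ansible's default 'mons' group
  run_once: true
  changed_when: false
  retries: 30
  delay: 10
  until: >
    (orch_ps_rgw.stdout | from_json
     | selectattr('daemon_type', 'equalto', 'rgw')
     | selectattr('status_desc', 'equalto', 'running')
     | list | length) > 0
```

If the new daemons never come up, the task fails once the retries are exhausted and the playbook stops before the 4th/5th/6th removal tasks run, leaving the legacy data and systemd files in place for manual recovery, which is the safety net requested above.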