Bug 2043436

Summary: [CEPH3][DOC] Missing Ceph MON replacing procedure
Product: [Red Hat Storage] Red Hat Ceph Storage
Reporter: Rafal Szmigiel <rszmigie>
Component: Documentation
Assignee: Ranjini M N <rmandyam>
Documentation sub component: Operations Guide
QA Contact: Veera Raghava Reddy <vereddy>
Status: CLOSED CURRENTRELEASE
Docs Contact:
Severity: low
Priority: unspecified
CC: asriram, jveiraca, rmandyam
Version: 3.3
Keywords: NoDocsQEReview
Target Milestone: ---
Target Release: Backlog
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2022-03-07 07:09:44 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions: 
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Rafal Szmigiel 2022-01-21 09:10:37 UTC
Describe the issue:

There is no documented MON replacement procedure for Ceph 3 in the RH docs, as there is for Ceph 4 (https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/4/html-single/troubleshooting_guide/index#replacing-a-failed-monitor_diag).

We have a procedure to remove a MON (https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/3/html-single/operations_guide/index#removing-a-ceph-monitor-using-the-command-line-interface-ops), and we refer to it from https://access.redhat.com/solutions/5000261, guiding customers to follow it in order to redeploy a failed MON on an RHOSP13 controller node.

The problem is that in the remove MON procedure for Ceph 3, the doc says that deleting the /var/lib/ceph/mon/$CLUSTER_NAME-$MONITOR_ID data is an optional step ("11. Optionally, you can delete the monitor data:"). Leaving that data in place prevents director from re-deploying the MON on the node (the ceph-mon container won't be created). As a result, a customer who follows the RH docs literally will end up with an unhealthy Ceph cluster running 2 out of 3 MONs.
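
For illustration, a minimal sketch of the removal sequence this implies, assuming the standard Ceph 3 CLI from the cited procedure; $CLUSTER_NAME and $MONITOR_ID are the placeholders used in that doc, and the exact command to stop the monitor depends on whether it runs containerized on the OSP13 controller:

    # Stop the monitor service (on a containerized OSP13 controller, stop the ceph-mon container instead)
    systemctl stop ceph-mon@$MONITOR_ID
    # Remove the monitor from the cluster's monitor map
    ceph mon remove $MONITOR_ID
    # Delete the monitor data - per this bug, this step must be treated as mandatory,
    # otherwise director will not recreate the ceph-mon container when the MON is redeployed
    rm -rf /var/lib/ceph/mon/$CLUSTER_NAME-$MONITOR_ID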

Describe the task you were trying to accomplish:

Reinstalling a failed Ceph 3 MON on an OSP13 controller node following the RH documentation.

Suggestions for improvement:

We should either update the Ceph 3 documentation to add a MON replacement procedure that includes deleting the monitor data, as the Ceph 4 documentation does (https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/4/html-single/troubleshooting_guide/index#replacing-a-failed-monitor_diag),

OR

update the https://access.redhat.com/solutions/5000261 article to warn customers that deleting /var/lib/ceph/mon/$CLUSTER_NAME-$MONITOR_ID on the controller node with the failed Ceph MON is mandatory, not optional.

Document URL:

https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/3/html-single/operations_guide/index#removing-a-ceph-monitor-using-the-command-line-interface-ops

https://access.redhat.com/solutions/5000261


Chapter/Section Number and Title:

Product Version:

Ceph 3

Environment Details:

Ceph 3 running on RHOSP13

Any other versions of this document that also need this update:

Additional information: