Bug 2094273 - [DDF] Add an example here, because it's not clear what is the syntax of "failed-osd-id1". It's the number only of the
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat OpenShift Data Foundation
Classification: Red Hat Storage
Component: documentation
Version: 4.10
Hardware: All
OS: All
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: ODF 4.10.5
Assignee: Agil Antony
QA Contact: Neha Berry
URL:
Whiteboard:
Depends On: 1995530 2094271 2094272 2128435
Blocks:
 
Reported: 2022-06-07 10:26 UTC by Agil Antony
Modified: 2023-08-09 16:43 UTC
CC List: 11 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 2094272
Environment:
Last Closed: 2023-03-09 12:47:11 UTC
Embargoed:



Comment 6 Oded 2022-07-18 14:13:11 UTC
I tested replacing multiple OSDs on a VMware LSO cluster:

In Section 2.4.1, step 18, we need to add the FORCE_OSD_REMOVAL flag, for example:

$ oc process -n openshift-storage ocs-osd-removal -p FAILED_OSD_IDS=0,1 FORCE_OSD_REMOVAL=true | oc create -n openshift-storage -f -


https://dxp-docp-prod.apps.ext-waf.spoke.prod.us-west-2.aws.paas.redhat.com/documentation/en-us/red_hat_openshift_data_foundation/4.10/html/replacing_nodes/openshift_data_foundation_deployed_using_local_storage_devices#replacing_storage_nodes_on_vmware_infrastructure
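
For the documentation itself, the generic form of the command with a <failed_osd_id> placeholder would presumably read:

$ oc process -n openshift-storage ocs-osd-removal -p FAILED_OSD_IDS=<failed_osd_id> FORCE_OSD_REMOVAL=true | oc create -n openshift-storage -f -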

Setup:
OCP Version: 4.11.0-0.nightly-2022-07-16-020951
ODF Version: 4.11.0-113
LSO Version: local-storage-operator.4.11.0-202207121147

Test Process:
$ oc get -n openshift-storage pods -l app=rook-ceph-osd -o wide
NAME                               READY   STATUS    RESTARTS   AGE   IP            NODE        NOMINATED NODE   READINESS GATES
rook-ceph-osd-0-7f557d75d-xggxv    2/2     Running   0          78m   10.129.2.22   compute-0   <none>           <none>
rook-ceph-osd-1-759bb46bc6-wth4l   2/2     Running   0          78m   10.128.2.45   compute-1   <none>           <none>
rook-ceph-osd-2-5bb4c984c7-zzm57   2/2     Running   0          78m   10.131.0.32   compute-2   <none>           <none>

Delete OSDs 0 and 1 (both now in CrashLoopBackOff):
$ oc get -n openshift-storage pods -l app=rook-ceph-osd -o wide
NAME                               READY   STATUS             RESTARTS      AGE   IP            NODE        NOMINATED NODE   READINESS GATES
rook-ceph-osd-0-7f557d75d-xggxv    1/2     CrashLoopBackOff   2 (6s ago)    82m   10.129.2.22   compute-0   <none>           <none>
rook-ceph-osd-1-759bb46bc6-wth4l   1/2     CrashLoopBackOff   1 (18s ago)   82m   10.128.2.45   compute-1   <none>           <none>
rook-ceph-osd-2-5bb4c984c7-zzm57   2/2     Running            0             82m   10.131.0.32   compute-2   <none>           <none>


$ oc scale -n openshift-storage deployment rook-ceph-osd-0 --replicas=0
deployment.apps/rook-ceph-osd-0 scaled
$ oc scale -n openshift-storage deployment rook-ceph-osd-1 --replicas=0
deployment.apps/rook-ceph-osd-1 scaled

$ oc get -n openshift-storage pods -l ceph-osd-id=0
NAME                              READY   STATUS        RESTARTS   AGE
rook-ceph-osd-0-7f557d75d-xggxv   0/2     Terminating   4          84m

$ oc get -n openshift-storage pods -l ceph-osd-id=1
NAME                               READY   STATUS        RESTARTS   AGE
rook-ceph-osd-1-759bb46bc6-wth4l   0/2     Terminating   3          84m


$ oc delete -n openshift-storage pod rook-ceph-osd-0-7f557d75d-xggxv --grace-period=0 --force
warning: Immediate deletion does not wait for confirmation that the running resource has been terminated. The resource may continue to run on the cluster indefinitely.
pod "rook-ceph-osd-0-7f557d75d-xggxv" force deleted
$ oc delete -n openshift-storage pod rook-ceph-osd-1-759bb46bc6-wth4l --grace-period=0 --force
warning: Immediate deletion does not wait for confirmation that the running resource has been terminated. The resource may continue to run on the cluster indefinitely.
pod "rook-ceph-osd-1-759bb46bc6-wth4l" force deleted

$ oc process -n openshift-storage ocs-osd-removal -p FAILED_OSD_IDS=0,1 FORCE_OSD_REMOVAL=true |oc create -n openshift-storage -f -
  
$ oc get pod -l job-name=ocs-osd-removal-job -n openshift-storage 
NAME                        READY   STATUS      RESTARTS   AGE
ocs-osd-removal-job-dfdmr   0/1     Completed   0          92s

$ oc logs -l job-name=ocs-osd-removal-job -n openshift-storage --tail=-1 | egrep -i 'completed removal'
2022-07-18 13:43:19.207129 I | cephosd: completed removal of OSD 0
2022-07-18 13:43:25.063755 I | cephosd: completed removal of OSD 1
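
The completed removal job can then be cleaned up. This step was not captured in the log above, so the exact command is an assumption:

$ oc delete -n openshift-storage job ocs-osd-removal-job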

$ oc get pv -L kubernetes.io/hostname | grep localblock | grep Released
local-pv-5c3e0bc0                          100Gi      RWO            Delete           Released   openshift-storage/ocs-deviceset-localblock-0-data-0svf5w   localblock                             93m   compute-0
local-pv-a76136ea                          100Gi      RWO            Delete           Released   openshift-storage/ocs-deviceset-localblock-0-data-29plq4   localblock                             93m   compute-1


$ oc delete pv local-pv-5c3e0bc0 
persistentvolume "local-pv-5c3e0bc0" deleted
$ oc delete pv local-pv-a76136ea
persistentvolume "local-pv-a76136ea" deleted


$ oc get pods | grep osd
rook-ceph-osd-0-6dbb477cc7-tggf9                                  2/2     Running     0              80s
rook-ceph-osd-1-6d69d57d84-qfc5n                                  2/2     Running     0              79s
rook-ceph-osd-2-5bb4c984c7-zzm57                                  2/2     Running     0              94m
rook-ceph-osd-prepare-007401b8286106910c461cc5d73d9687-rvwt8      0/1     Completed   0              6m47s
rook-ceph-osd-prepare-e6e56ed144a9c3ed7d6873038aa03aee-6cs4b      0/1     Completed   0              6m46s
rook-ceph-osd-prepare-f2a2313c490e4d5c1d127f4f5c4e8141-5fkpq      0/1     Completed   0              95m
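
As a final check, the wide listing from the start of the test can be rerun to confirm the replacement OSDs are scheduled across the compute nodes again (not captured in the log above):

$ oc get -n openshift-storage pods -l app=rook-ceph-osd -o wide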

Comment 9 Oded 2022-07-21 10:50:04 UTC
I tested the node replacement procedure on an LSO cluster [VMware] with cluster-wide encryption.
We should add a new step, "Add a new disk to the new worker node", before step 16 [Verify that the new localblock PV is available].



https://docs.google.com/document/d/1m720IElmcnqLMW_iNcSrx75tatFT7X2yMhYB7NZxhXc/edit
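
For step 16 itself, the check from comment 6 should work for the new disk as well; filtering on Available here is my assumption about the expected state:

$ oc get pv -L kubernetes.io/hostname | grep localblock | grep Available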

Comment 10 Oded 2022-07-21 11:47:30 UTC
https://dxp-docp-prod.apps.ext-waf.spoke.prod.us-west-2.aws.paas.redhat.com/documentation/en-us/red_hat_openshift_data_foundation/4.10/html-single/replacing_nodes/index?lb_target=preview#replacing-an-operational-node-using-local-storage-devices_vmware-upi-operational

Bug fixed.
1. The FORCE_OSD_REMOVAL flag was added, for example:
   oc process -n openshift-storage ocs-osd-removal -p FAILED_OSD_IDS=1 FORCE_OSD_REMOVAL=true | oc create -f -
2. The <failed_osd_id> string was fixed.

