Bug 2059027
| Summary: | Device Replacement with FORCE_OSD_REMOVAL, OSD moved to "destroyed" state. | ||
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat OpenShift Data Foundation | Reporter: | Oded <oviner> |
| Component: | rook | Assignee: | Travis Nielsen <tnielsen> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Oded <oviner> |
| Severity: | unspecified | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | 4.10 | CC: | hnallurv, madam, mmuench, muagarwa, nberry, ocs-bugs, odf-bz-bot |
| Target Milestone: | --- | ||
| Target Release: | ODF 4.10.0 | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | 4.10.0-175 | Doc Type: | No Doc Update |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2022-04-21 09:12:49 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
Device Replacement with FORCE_OSD_REMOVAL, old OSD deleted
SetUp:
Provider: VMware
OCP Version: 4.10.0-0.nightly-2022-03-05-023708
ODF Version: 4.10.0-177
Test Process:
1.Identify the OSD that needs to be replaced and the OpenShift Container Platform node that has the OSD scheduled on it.
$ oc get -n openshift-storage pods -l app=rook-ceph-osd -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
rook-ceph-osd-0-6d87d76d7-p575f 2/2 Running 0 4h52m 10.129.2.26 compute-2 <none> <none>
rook-ceph-osd-1-79666595d9-srvp7 2/2 Running 0 4h52m 10.128.2.23 compute-0 <none> <none>
rook-ceph-osd-2-65c497cfc8-4sxnr 2/2 Running 0 4h52m 10.131.0.25 compute-1 <none> <none>
$ oc get -n openshift-storage pods -l app=rook-ceph-osd -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
rook-ceph-osd-0-6d87d76d7-p575f 2/2 Running 0 4h54m 10.129.2.26 compute-2 <none> <none>
rook-ceph-osd-1-79666595d9-srvp7 1/2 CrashLoopBackOff 1 (9s ago) 4h54m 10.128.2.23 compute-0 <none> <none>
rook-ceph-osd-2-65c497cfc8-4sxnr 2/2 Running 0 4h54m 10.131.0.25 compute-1 <none> <none>
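Note: the failed OSD can also be picked out programmatically rather than by eye; a minimal sketch, assuming the failed OSD pod is the only one not reporting 2/2 in the READY column:
$ oc get -n openshift-storage pods -l app=rook-ceph-osd --no-headers | awk '$2 != "2/2" {print $1}'
Here this prints rook-ceph-osd-1-79666595d9-srvp7, i.e. osd.1 on compute-0.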
2.Scale down the OSD deployment for the OSD to be replaced.
$ osd_id_to_remove=1
$ oc scale -n openshift-storage deployment rook-ceph-osd-${osd_id_to_remove} --replicas=0
deployment.apps/rook-ceph-osd-1 scaled
3.Verify that the rook-ceph-osd pod is terminated.
$ oc get -n openshift-storage pods -l ceph-osd-id=${osd_id_to_remove}
NAME READY STATUS RESTARTS AGE
rook-ceph-osd-1-79666595d9-srvp7 0/2 Terminating 3 4h55m
$ oc delete pod rook-ceph-osd-1-79666595d9-srvp7 --force --grace-period=0
warning: Immediate deletion does not wait for confirmation that the running resource has been terminated. The resource may continue to run on the cluster indefinitely.
pod "rook-ceph-osd-1-79666595d9-srvp7" force deleted
$ oc get -n openshift-storage pods -l ceph-osd-id=${osd_id_to_remove}
No resources found in openshift-storage namespace.
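Note: the manual polling and force deletion above can be replaced with a single blocking call; a sketch, assuming termination finishes within the timeout:
$ oc wait -n openshift-storage --for=delete pod -l ceph-osd-id=${osd_id_to_remove} --timeout=120s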
4.Remove the old OSD from the cluster so that you can add a new OSD.
$ oc process -n openshift-storage ocs-osd-removal -p FAILED_OSD_IDS=${osd_id_to_remove} |oc create -n openshift-storage -f -
job.batch/ocs-osd-removal-job created
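Note: the parameters accepted by the removal template (FAILED_OSD_IDS here, FORCE_OSD_REMOVAL in step 7) can be listed before processing it, e.g.:
$ oc process -n openshift-storage ocs-osd-removal --parameters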
5.Check the status of the ocs-osd-removal pod.
$ oc get pod -l job-name=ocs-osd-removal-job -n openshift-storage
NAME READY STATUS RESTARTS AGE
ocs-osd-removal-job-8hdzv 1/1 Running 0 88s
$ oc logs -l job-name=ocs-osd-removal-job -n openshift-storage --tail=-1
2022-03-06 18:18:43.824586 I | rookcmd: starting Rook v4.10.0-0.4a36b5f4bbabe54c9dd2671886325a5771191b30 with arguments '/usr/local/bin/rook ceph osd remove --osd-ids=1 --force-osd-removal false'
2022-03-06 18:18:43.824742 I | rookcmd: flag values: --force-osd-removal=false, --help=false, --log-level=DEBUG, --operator-image=, --osd-ids=1, --preserve-pvc=false, --service-account=
2022-03-06 18:18:43.824747 I | op-mon: parsing mon endpoints: b=172.30.6.164:6789,c=172.30.75.171:6789,a=172.30.82.155:6789
2022-03-06 18:18:44.881080 I | cephclient: writing config file /var/lib/rook/openshift-storage/openshift-storage.config
2022-03-06 18:18:44.881281 I | cephclient: generated admin config in /var/lib/rook/openshift-storage
2022-03-06 18:18:44.881395 D | cephclient: config file @ /etc/ceph/ceph.conf:
[global]
fsid = a983e53d-33d9-4c09-8d71-842aa48e2219
mon initial members = a b c
mon host = [v2:172.30.82.155:3300,v1:172.30.82.155:6789],[v2:172.30.6.164:3300,v1:172.30.6.164:6789],[v2:172.30.75.171:3300,v1:172.30.75.171:6789]
bdev_flock_retry = 20
mon_osd_full_ratio = .85
mon_osd_backfillfull_ratio = .8
mon_osd_nearfull_ratio = .75
mon_max_pg_per_osd = 600
mon_pg_warn_max_object_skew = 0
mon_data_avail_warn = 15
[osd]
osd_memory_target_cgroup_limit_ratio = 0.8
[client.admin]
keyring = /var/lib/rook/openshift-storage/client.admin.keyring
2022-03-06 18:18:44.881443 D | exec: Running command: ceph osd dump --connect-timeout=15 --cluster=openshift-storage --conf=/var/lib/rook/openshift-storage/openshift-storage.config --name=client.admin --keyring=/var/lib/rook/openshift-storage/client.admin.keyring --format json
2022-03-06 18:18:45.258092 I | cephosd: validating status of osd.1
2022-03-06 18:18:45.258151 I | cephosd: osd.1 is marked 'DOWN'
2022-03-06 18:18:45.258195 D | exec: Running command: ceph osd safe-to-destroy 1 --connect-timeout=15 --cluster=openshift-storage --conf=/var/lib/rook/openshift-storage/openshift-storage.config --name=client.admin --keyring=/var/lib/rook/openshift-storage/client.admin.keyring --format json
2022-03-06 18:18:45.639791 W | cephosd: osd.1 is NOT be ok to destroy, retrying in 1m until success
2022-03-06 18:19:45.640689 D | exec: Running command: ceph osd safe-to-destroy 1 --connect-timeout=15 --cluster=openshift-storage --conf=/var/lib/rook/openshift-storage/openshift-storage.config --name=client.admin --keyring=/var/lib/rook/openshift-storage/client.admin.keyring --format json
2022-03-06 18:19:45.995720 W | cephosd: osd.1 is NOT be ok to destroy, retrying in 1m until success
2022-03-06 18:20:45.995993 D | exec: Running command: ceph osd safe-to-destroy 1 --connect-timeout=15 --cluster=openshift-storage --conf=/var/lib/rook/openshift-storage/openshift-storage.config --name=client.admin --keyring=/var/lib/rook/openshift-storage/client.admin.keyring --format json
2022-03-06 18:20:46.381228 W | cephosd: osd.1 is NOT be ok to destroy, retrying in 1m until success
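Note: without FORCE_OSD_REMOVAL the job loops here by design: it re-runs "ceph osd safe-to-destroy" every minute and only proceeds once Ceph reports that destroying the OSD no longer risks data, typically after backfill to the remaining OSDs completes. The same check can be issued by hand from the rook-ceph toolbox, for example:
sh-4.4$ ceph osd safe-to-destroy 1
sh-4.4$ ceph status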
6.Delete the Job:
$ oc delete job ocs-osd-removal-job
job.batch "ocs-osd-removal-job" deleted
7.Run OSD Removal job with FORCE_OSD_REMOVAL
$ oc process -n openshift-storage ocs-osd-removal -p FORCE_OSD_REMOVAL=true -p FAILED_OSD_IDS=${osd_id_to_remove} |oc create -n openshift-storage -f -
job.batch/ocs-osd-removal-job created
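Note: completion of the job can also be awaited directly instead of polling the pod; a sketch, assuming it finishes within the timeout:
$ oc wait -n openshift-storage --for=condition=complete job/ocs-osd-removal-job --timeout=300s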
8.Check the status of the ocs-osd-removal pod with FORCE_OSD_REMOVAL
$ oc get pod -l job-name=ocs-osd-removal-job -n openshift-storage
NAME READY STATUS RESTARTS AGE
ocs-osd-removal-job-gpr9x 0/1 Completed 0 37s
$ oc logs -l job-name=ocs-osd-removal-job -n openshift-storage --tail=-1
2022-03-06 18:23:59.509428 I | rookcmd: starting Rook v4.10.0-0.4a36b5f4bbabe54c9dd2671886325a5771191b30 with arguments '/usr/local/bin/rook ceph osd remove --osd-ids=1 --force-osd-removal true'
2022-03-06 18:23:59.509543 I | rookcmd: flag values: --force-osd-removal=true, --help=false, --log-level=DEBUG, --operator-image=, --osd-ids=1, --preserve-pvc=false, --service-account=
2022-03-06 18:23:59.509546 I | op-mon: parsing mon endpoints: b=172.30.6.164:6789,c=172.30.75.171:6789,a=172.30.82.155:6789
2022-03-06 18:24:00.531096 I | cephclient: writing config file /var/lib/rook/openshift-storage/openshift-storage.config
2022-03-06 18:24:00.531303 I | cephclient: generated admin config in /var/lib/rook/openshift-storage
2022-03-06 18:24:00.531426 D | cephclient: config file @ /etc/ceph/ceph.conf:
[global]
fsid = a983e53d-33d9-4c09-8d71-842aa48e2219
mon initial members = b c a
mon host = [v2:172.30.6.164:3300,v1:172.30.6.164:6789],[v2:172.30.75.171:3300,v1:172.30.75.171:6789],[v2:172.30.82.155:3300,v1:172.30.82.155:6789]
bdev_flock_retry = 20
mon_osd_full_ratio = .85
mon_osd_backfillfull_ratio = .8
mon_osd_nearfull_ratio = .75
mon_max_pg_per_osd = 600
mon_pg_warn_max_object_skew = 0
mon_data_avail_warn = 15
[osd]
osd_memory_target_cgroup_limit_ratio = 0.8
[client.admin]
keyring = /var/lib/rook/openshift-storage/client.admin.keyring
2022-03-06 18:24:00.531455 D | exec: Running command: ceph osd dump --connect-timeout=15 --cluster=openshift-storage --conf=/var/lib/rook/openshift-storage/openshift-storage.config --name=client.admin --keyring=/var/lib/rook/openshift-storage/client.admin.keyring --format json
2022-03-06 18:24:00.997096 I | cephosd: validating status of osd.1
2022-03-06 18:24:00.997128 I | cephosd: osd.1 is marked 'DOWN'
2022-03-06 18:24:00.997142 D | exec: Running command: ceph osd safe-to-destroy 1 --connect-timeout=15 --cluster=openshift-storage --conf=/var/lib/rook/openshift-storage/openshift-storage.config --name=client.admin --keyring=/var/lib/rook/openshift-storage/client.admin.keyring --format json
2022-03-06 18:24:01.341574 I | cephosd: osd.1 is NOT be ok to destroy but force removal is enabled so proceeding with removal
2022-03-06 18:24:01.341629 D | exec: Running command: ceph osd find 1 --connect-timeout=15 --cluster=openshift-storage --conf=/var/lib/rook/openshift-storage/openshift-storage.config --name=client.admin --keyring=/var/lib/rook/openshift-storage/client.admin.keyring --format json
2022-03-06 18:24:01.714774 I | cephosd: marking osd.1 out
2022-03-06 18:24:01.714819 D | exec: Running command: ceph osd out osd.1 --connect-timeout=15 --cluster=openshift-storage --conf=/var/lib/rook/openshift-storage/openshift-storage.config --name=client.admin --keyring=/var/lib/rook/openshift-storage/client.admin.keyring --format json
2022-03-06 18:24:02.713567 I | cephosd: removing the OSD deployment "rook-ceph-osd-1"
2022-03-06 18:24:02.713605 D | op-k8sutil: removing rook-ceph-osd-1 deployment if it exists
2022-03-06 18:24:02.713612 I | op-k8sutil: removing deployment rook-ceph-osd-1 if it exists
2022-03-06 18:24:02.731191 I | op-k8sutil: Removed deployment rook-ceph-osd-1
2022-03-06 18:24:02.737131 I | op-k8sutil: "rook-ceph-osd-1" still found. waiting...
2022-03-06 18:24:04.747394 I | op-k8sutil: confirmed rook-ceph-osd-1 does not exist
2022-03-06 18:24:04.759802 I | cephosd: removing the osd prepare job "rook-ceph-osd-prepare-ocs-deviceset-2-data-07z25p"
2022-03-06 18:24:04.789368 I | cephosd: removing the OSD PVC "ocs-deviceset-2-data-07z25p"
2022-03-06 18:24:04.797281 I | cephosd: purging osd.1
2022-03-06 18:24:04.797322 D | exec: Running command: ceph osd purge osd.1 --force --yes-i-really-mean-it --connect-timeout=15 --cluster=openshift-storage --conf=/var/lib/rook/openshift-storage/openshift-storage.config --name=client.admin --keyring=/var/lib/rook/openshift-storage/client.admin.keyring --format json
2022-03-06 18:24:05.222582 I | cephosd: attempting to remove host '\x01' from crush map if not in use
2022-03-06 18:24:05.222653 D | exec: Running command: ceph osd crush rm ocs-deviceset-2-data-07z25p --connect-timeout=15 --cluster=openshift-storage --conf=/var/lib/rook/openshift-storage/openshift-storage.config --name=client.admin --keyring=/var/lib/rook/openshift-storage/client.admin.keyring --format json
2022-03-06 18:24:06.227372 D | exec: Running command: ceph crash ls --connect-timeout=15 --cluster=openshift-storage --conf=/var/lib/rook/openshift-storage/openshift-storage.config --name=client.admin --keyring=/var/lib/rook/openshift-storage/client.admin.keyring --format json
2022-03-06 18:24:06.612621 I | cephosd: no ceph crash to silence
2022-03-06 18:24:06.612655 I | cephosd: completed removal of OSD 1
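Note: per the log above, the OSD deployment rook-ceph-osd-1, its prepare job and the backing PVC ocs-deviceset-2-data-07z25p were removed; the rook-ceph operator then recreates the deployment on a new PVC (visible in the next step). The PVC removal can be cross-checked directly (expected to return NotFound):
$ oc get -n openshift-storage pvc ocs-deviceset-2-data-07z25p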
9.Check OSD pods status
$ oc get -n openshift-storage pods -l app=rook-ceph-osd
NAME READY STATUS RESTARTS AGE
rook-ceph-osd-0-6d87d76d7-p575f 2/2 Running 0 5h5m
rook-ceph-osd-1-67d9dcddf9-sw7qv 2/2 Running 0 117s
rook-ceph-osd-2-65c497cfc8-4sxnr 2/2 Running 0 5h5m
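Note: the replacement osd.1 pod (117s old) should be backed by a newly provisioned device-set PVC rather than the deleted ocs-deviceset-2-data-07z25p; a quick way to check is to list the device-set PVCs again:
$ oc get -n openshift-storage pvc | grep ocs-deviceset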
10.Check Ceph status:
sh-4.4$ ceph status
cluster:
id: a983e53d-33d9-4c09-8d71-842aa48e2219
health: HEALTH_WARN
1 daemons have recently crashed
services:
mon: 3 daemons, quorum a,b,c (age 5h)
mgr: a(active, since 5h)
mds: 1/1 daemons up, 1 hot standby
osd: 3 osds: 3 up (since 4m), 3 in (since 5m)
rgw: 1 daemon active (1 hosts, 1 zones)
data:
volumes: 1/1 healthy
pools: 11 pools, 177 pgs
objects: 1.46k objects, 3.5 GiB
usage: 11 GiB used, 289 GiB / 300 GiB avail
pgs: 177 active+clean
io:
client: 1.7 KiB/s rd, 12 KiB/s wr, 2 op/s rd, 1 op/s wr
sh-4.4$ ceph crash ls
ID ENTITY NEW
2022-03-06T18:15:26.619459Z_f294c3ce-f5a0-48ed-9a7a-be7058ddfb01 osd.1 *
Archive the crash entry
sh-4.4$ ceph crash archive-all
Check Ceph health
sh-4.4$ ceph health
HEALTH_OK
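Note: ceph crash archive-all only archives the entries so they stop counting toward the RECENT_CRASH health warning; it does not delete them, so ceph crash ls will still list the archived item. To confirm nothing new is pending:
sh-4.4$ ceph crash ls-new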
11.Check ceph osd tree:
sh-4.4$ ceph osd tree
ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
-1 0.29306 root default
-8 0.09769 rack rack0
-7 0.09769 host ocs-deviceset-2-data-0bzqhb
1 hdd 0.09769 osd.1 up 1.00000 1.00000
-12 0.09769 rack rack1
-11 0.09769 host ocs-deviceset-1-data-075jmc
2 hdd 0.09769 osd.2 up 1.00000 1.00000
-4 0.09769 rack rack2
-3 0.09769 host ocs-deviceset-0-data-0dtsbc
0 hdd 0.09769 osd.0 up 1.00000 1.00000
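Note: in contrast to the failing run described below, no OSD is left in the "destroyed" state after the forced removal. A quick check that can be repeated on any run:
sh-4.4$ ceph osd tree | grep destroyed || echo "no destroyed OSDs"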
Description of problem (please be as detailed as possible and provide log snippets):
After the OSD removal job failed, I added the parameter FORCE_OSD_REMOVAL. The OSD removal job completed, but the OSD moved to the "destroyed" state.

Version of all relevant components (if applicable):
OCP Version: 4.10.0-0.nightly-2022-02-22-093600
ODF Version: 4.10.0-166
LSO Version: local-storage-operator.4.11.0-202202221716
Provider: VMware

Does this issue impact your ability to continue to work with the product (please explain in detail what is the user impact)?

Is there any workaround available to the best of your knowledge?

Rate from 1 - 5 the complexity of the scenario you performed that caused this bug (1 - very simple, 5 - very complex)?

Is this issue reproducible?

Can this issue be reproduced from the UI?

If this is a regression, please provide more details to justify this:

Steps to Reproduce:
1.Remove the old OSD from the cluster [without FORCE_OSD_REMOVAL] -> the ocs-osd-removal-job pod is stuck in the Running state
$ oc process -n openshift-storage ocs-osd-removal -p FAILED_OSD_IDS=${osd_id_to_remove} |oc create -n openshift-storage -f -
2022-02-23 09:10:05.259280 I | cephosd: osd.2 is marked 'DOWN'
2022-02-23 09:10:05.259296 D | exec: Running command: ceph osd safe-to-destroy 2 --connect-timeout=15 --cluster=openshift-storage --conf=/var/lib/rook/openshift-storage/openshift-storage.config --name=client.admin --keyring=/var/lib/rook/openshift-storage/client.admin.keyring --format json
2022-02-23 09:10:05.649466 W | cephosd: osd.2 is NOT be ok to destroy, retrying in 1m until success
2022-02-23 09:11:05.650412 D | exec: Running command: ceph osd safe-to-destroy 2 --connect-timeout=15 --cluster=openshift-storage --conf=/var/lib/rook/openshift-storage/openshift-storage.config --name=client.admin --keyring=/var/lib/rook/openshift-storage/client.admin.keyring --format json
2.Delete the ocs-osd-removal-job
$ oc delete job ocs-osd-removal-job
job.batch "ocs-osd-removal-job" deleted
3.Remove the old OSD from the cluster [with FORCE_OSD_REMOVAL] -> the ocs-osd-removal-job moved to the Completed state.
$ oc process -n openshift-storage ocs-osd-removal -p FORCE_OSD_REMOVAL=true -p FAILED_OSD_IDS=${osd_id_to_remove} |oc create -n openshift-storage -f -
job.batch/ocs-osd-removal-job created
$ oc get pod -l job-name=ocs-osd-removal-job -n openshift-storage
NAME READY STATUS RESTARTS AGE
ocs-osd-removal-job-xvsj2 0/1 Completed 0 6m57s
$ oc logs -l job-name=ocs-osd-removal-job -n openshift-storage --tail=-1
2022-02-23 22:49:18.141374 I | rookcmd: starting Rook v4.10.0-0.e43e46bc94063280e8d782b01674a68cacc4e8bc with arguments '/usr/local/bin/rook ceph osd remove --osd-ids=2 --force-osd-removal true'
2022-02-23 22:49:18.141476 I | rookcmd: flag values: --force-osd-removal=true, --help=false, --log-level=DEBUG, --operator-image=, --osd-ids=2, --preserve-pvc=false, --service-account=
2022-02-23 22:49:18.141481 I | op-mon: parsing mon endpoints: b=172.30.150.138:6789,c=172.30.42.121:6789,a=172.30.150.9:6789
2022-02-23 22:49:19.222219 I | cephclient: writing config file /var/lib/rook/openshift-storage/openshift-storage.config
2022-02-23 22:49:19.222386 I | cephclient: generated admin config in /var/lib/rook/openshift-storage
2022-02-23 22:49:19.222470 D | cephclient: config file @ /etc/ceph/ceph.conf:
[global]
fsid = 0149cef0-9902-4336-b08a-c825e0b56687
mon initial members = b c a
mon host = [v2:172.30.150.138:3300,v1:172.30.150.138:6789],[v2:172.30.42.121:3300,v1:172.30.42.121:6789],[v2:172.30.150.9:3300,v1:172.30.150.9:6789]
bdev_flock_retry = 20
mon_osd_full_ratio = .85
mon_osd_backfillfull_ratio = .8
mon_osd_nearfull_ratio = .75
mon_max_pg_per_osd = 600
mon_pg_warn_max_object_skew = 0
mon_data_avail_warn = 15
[osd]
osd_memory_target_cgroup_limit_ratio = 0.8
[client.admin]
keyring = /var/lib/rook/openshift-storage/client.admin.keyring
2022-02-23 22:49:19.222489 D | exec: Running command: ceph osd dump --connect-timeout=15 --cluster=openshift-storage --conf=/var/lib/rook/openshift-storage/openshift-storage.config --name=client.admin --keyring=/var/lib/rook/openshift-storage/client.admin.keyring --format json
2022-02-23 22:49:19.580090 I | cephosd: validating status of osd.2
2022-02-23 22:49:19.580116 I | cephosd: osd.2 is marked 'DOWN'
2022-02-23 22:49:19.580131 D | exec: Running command: ceph osd safe-to-destroy 2 --connect-timeout=15 --cluster=openshift-storage --conf=/var/lib/rook/openshift-storage/openshift-storage.config --name=client.admin --keyring=/var/lib/rook/openshift-storage/client.admin.keyring --format json
2022-02-23 22:49:19.928676 I | cephosd: osd.2 is NOT be ok to destroy but force removal is enabled so proceeding with removal
2022-02-23 22:49:19.928727 D | exec: Running command: ceph osd find 2 --connect-timeout=15 --cluster=openshift-storage --conf=/var/lib/rook/openshift-storage/openshift-storage.config --name=client.admin --keyring=/var/lib/rook/openshift-storage/client.admin.keyring --format json
2022-02-23 22:49:20.256448 I | cephosd: marking osd.2 out
2022-02-23 22:49:20.256508 D | exec: Running command: ceph osd out osd.2 --connect-timeout=15 --cluster=openshift-storage --conf=/var/lib/rook/openshift-storage/openshift-storage.config --name=client.admin --keyring=/var/lib/rook/openshift-storage/client.admin.keyring --format json
2022-02-23 22:49:20.756771 I | cephosd: removing the OSD deployment "rook-ceph-osd-2"
2022-02-23 22:49:20.756805 D | op-k8sutil: removing rook-ceph-osd-2 deployment if it exists
2022-02-23 22:49:20.756809 I | op-k8sutil: removing deployment rook-ceph-osd-2 if it exists
2022-02-23 22:49:20.769692 I | op-k8sutil: Removed deployment rook-ceph-osd-2
2022-02-23 22:49:20.779769 I | op-k8sutil: "rook-ceph-osd-2" still found. waiting...
2022-02-23 22:49:22.788297 I | op-k8sutil: confirmed rook-ceph-osd-2 does not exist
2022-02-23 22:49:22.798924 I | cephosd: removing the osd prepare job "rook-ceph-osd-prepare-b59c39dd57cd891848ca9de1e242595b"
2022-02-23 22:49:22.809935 I | cephosd: removing the OSD PVC "ocs-deviceset-localblock-0-data-2qqh27"
2022-02-23 22:49:22.826913 I | cephosd: destroying osd.2
2022-02-23 22:49:22.826951 D | exec: Running command: ceph osd destroy osd.2 --yes-i-really-mean-it --connect-timeout=15 --cluster=openshift-storage --conf=/var/lib/rook/openshift-storage/openshift-storage.config --name=client.admin --keyring=/var/lib/rook/openshift-storage/client.admin.keyring --format json
2022-02-23 22:49:23.200485 I | cephosd: removing osd.2 from ceph
2022-02-23 22:49:23.200518 D | exec: Running command: ceph osd crush rm compute-0 --connect-timeout=15 --cluster=openshift-storage --conf=/var/lib/rook/openshift-storage/openshift-storage.config --name=client.admin --keyring=/var/lib/rook/openshift-storage/client.admin.keyring --format json
2022-02-23 22:49:23.549683 E | cephosd: failed to remove CRUSH host "compute-0". exit status 39
2022-02-23 22:49:23.549715 D | exec: Running command: ceph crash ls --connect-timeout=15 --cluster=openshift-storage --conf=/var/lib/rook/openshift-storage/openshift-storage.config --name=client.admin --keyring=/var/lib/rook/openshift-storage/client.admin.keyring --format json
2022-02-23 22:49:23.893552 I | cephosd: no ceph crash to silence
2022-02-23 22:49:23.893583 I | cephosd: completed removal of OSD 2
4.Check Ceph status:
sh-4.4$ ceph osd tree
ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
-1 0.78149 root default
-7 0.39075 host compute-0
2 hdd 0.09769 osd.2 destroyed 0 1.00000
3 hdd 0.09769 osd.3 destroyed 0 1.00000
4 hdd 0.09769 osd.4 up 1.00000 1.00000
5 hdd 0.09769 osd.5 up 1.00000 1.00000
-5 0.19537 host compute-1
0 hdd 0.09769 osd.0 up 1.00000 1.00000
6 hdd 0.09769 osd.6 up 1.00000 1.00000
-3 0.19537 host compute-2
1 hdd 0.09769 osd.1 up 1.00000 1.00000
7 hdd 0.09769 osd.7 up 1.00000 1.00000
sh-4.4$ ceph status
cluster:
id: 0149cef0-9902-4336-b08a-c825e0b56687
health: HEALTH_OK
services:
mon: 3 daemons, quorum a,b,c (age 3d)
mgr: a(active, since 3d)
mds: 1/1 daemons up, 1 hot standby
osd: 8 osds: 6 up (since 3d), 6 in (since 3d)
rgw: 1 daemon active (1 hosts, 1 zones)
data:
volumes: 1/1 healthy
pools: 11 pools, 273 pgs
objects: 27.22k objects, 104 GiB
usage: 317 GiB used, 283 GiB / 600 GiB avail
pgs: 273 active+clean
io:
client: 1.2 KiB/s rd, 9.7 KiB/s wr, 2 op/s rd, 1 op/s wr
sh-4.4$ ceph osd df
ID CLASS WEIGHT REWEIGHT SIZE RAW USE DATA OMAP META AVAIL %USE VAR PGS STATUS
2 hdd 0.09769 0 0 B 0 B 0 B 0 B 0 B 0 B 0 0 0 destroyed
3 hdd 0.09769 0 0 B 0 B 0 B 0 B 0 B 0 B 0 0 0 destroyed
4 hdd 0.09769 1.00000 100 GiB 50 GiB 50 GiB 32 KiB 797 MiB 50 GiB 50.31 0.95 124 up
5 hdd 0.09769 1.00000 100 GiB 55 GiB 55 GiB 92 KiB 838 MiB 45 GiB 55.41 1.05 149 up
0 hdd 0.09769 1.00000 100 GiB 50 GiB 50 GiB 78 KiB 994 MiB 50 GiB 50.50 0.96 135 up
6 hdd 0.09769 1.00000 100 GiB 55 GiB 55 GiB 119 KiB 665 MiB 45 GiB 55.25 1.05 138 up
1 hdd 0.09769 1.00000 100 GiB 51 GiB 50 GiB 130 KiB 728 MiB 49 GiB 50.87 0.96 138 up
7 hdd 0.09769 1.00000 100 GiB 55 GiB 54 GiB 7 KiB 949 MiB 45 GiB 54.89 1.04 135 up
TOTAL 600 GiB 317 GiB 312 GiB 460 KiB 4.9 GiB 283 GiB 52.87
MIN/MAX VAR: 0.95/1.05 STDDEV: 2.32
Actual results:
OLD OSD moved to "destroyed" state.
Expected results:
OLD OSD removed
Additional info:
MG: http://rhsqe-repo.lab.eng.blr.redhat.com/OCS/ocs-qe-bugs/bz-2059027.tar.gz
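Note on the failing run: with ODF 4.10.0-166 the job executed "ceph osd destroy osd.2 --yes-i-really-mean-it", which marks the OSD as destroyed but keeps its ID in the CRUSH map, whereas the fixed build verified above runs "ceph osd purge", which removes the entry entirely. If an ID is already stuck in the destroyed state, one possible manual cleanup (an assumption, not taken from this report; only once the data has rebalanced to the remaining OSDs) is to purge it from the toolbox:
sh-4.4$ ceph osd purge 2 --yes-i-really-mean-it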