Tested in version:
ocs-operator.v4.8.0-416.ci
4.8.0-0.nightly-2021-06-13-101614
ceph version 14.2.11-181.el8cp (68fea1005601531fe60d2979c56ea63bc073c84f) nautilus (stable)

This was tested on a cluster where PVC snapshot and clone tests had been executed. Images were already present in the trash, so I did not create new snapshots or clones.

List of images present:

sh-4.4# rbd trash ls ocs-storagecluster-cephblockpool
1077f3e0f019f csi-vol-7701031e-cd9f-11eb-b551-0a580a800210
1077f6d9c4285 csi-vol-00443f03-cd9f-11eb-b551-0a580a800210
1077faffae2a0 csi-snap-e7ba191e-cda2-11eb-b551-0a580a800210
1077fff5a887a csi-vol-167a2441-cd9f-11eb-b551-0a580a800210
abfd236218df csi-vol-a2cb5579-cf64-11eb-a93e-0a580a810228
abfd31767421 csi-vol-38ba2b8f-cf62-11eb-a93e-0a580a810228
abfd345c897c csi-vol-3c01aee8-cf61-11eb-a93e-0a580a810228
abfd4b4f2bae csi-snap-9da28c5b-cf61-11eb-a93e-0a580a810228
abfd507d451d csi-snap-c441ed75-cf63-11eb-a93e-0a580a810228
abfd585e0a55 csi-vol-3acc1595-cf61-11eb-a93e-0a580a810228
abfd5fe06684 csi-snap-49525f2e-cf64-11eb-a93e-0a580a810228
abfd644c6d8 csi-snap-ec350e12-cf63-11eb-a93e-0a580a810228
abfd6884376c csi-vol-3b552e03-cf61-11eb-a93e-0a580a810228
abfd6d2b3bc8 csi-vol-5450bc11-cf67-11eb-a93e-0a580a810228
abfd7013d7bc csi-snap-ce854113-cf63-11eb-a93e-0a580a810228
abfd75609a5b csi-snap-a4eae183-cf61-11eb-a93e-0a580a810228
abfd81b3417c csi-snap-370ce8ab-cf64-11eb-a93e-0a580a810228
abfd83fa5861 csi-snap-d8875f1f-cf63-11eb-a93e-0a580a810228
abfd8df3b9a4 csi-snap-0075cb86-cf64-11eb-a93e-0a580a810228
abfd93d0ca18 csi-snap-1b37f24d-cf64-11eb-a93e-0a580a810228
abfdaebfcb0 csi-snap-ad188c21-cf61-11eb-a93e-0a580a810228
abfdc3f5f4b4 csi-vol-067f0a21-cf66-11eb-a93e-0a580a810228
abfdc8ea549a csi-vol-a9c81ce0-cf66-11eb-a93e-0a580a810228
abfddaf781b1 csi-snap-40dc0ec7-cf64-11eb-a93e-0a580a810228
abfddfd9ace0 csi-snap-245d8cfe-cf64-11eb-a93e-0a580a810228
abfdf78bd3e9 csi-snap-f5a36265-cf63-11eb-a93e-0a580a810228
abfdf82519d5 csi-snap-130b4793-cf64-11eb-a93e-0a580a810228
sh-4.4#

sh-4.4# rbd trash purge ocs-storagecluster-cephblockpool
Removing images: 11% complete...2021-06-17 12:34:17.594 7fb0f77fe700 -1 librbd::image::PreRemoveRequest: 0x7fb0f0005ea0 check_image_watchers: image has watchers - not removing
Removing images: 50% complete...2021-06-17 12:51:51.384 7fb0f77fe700 -1 librbd::image::PreRemoveRequest: 0x7fb0f00053c0 check_image_watchers: image has watchers - not removing
Removing images: 84% complete...failed.
rbd: some expired images could not be removed
Ensure that they are closed/unmapped, do not have snapshots (including trashed snapshots with linked clones), are not in a group and were moved to the trash successfully.
sh-4.4#

The command failed after reaching 84% completion.

Images present after the failure of the 'rbd trash purge' command:
sh-4.4# rbd trash ls ocs-storagecluster-cephblockpool
1077f3e0f019f csi-vol-7701031e-cd9f-11eb-b551-0a580a800210
1077f6d9c4285 csi-vol-00443f03-cd9f-11eb-b551-0a580a800210
1077faffae2a0 csi-snap-e7ba191e-cda2-11eb-b551-0a580a800210
1077fff5a887a csi-vol-167a2441-cd9f-11eb-b551-0a580a800210
sh-4.4#

Ceph status:

sh-4.4# ceph status
  cluster:
    id:     eef8c304-8405-4383-ad40-14d52ef135aa
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum b,c,d (age 2d)
    mgr: a(active, since 2d)
    mds: ocs-storagecluster-cephfilesystem:1 {0=ocs-storagecluster-cephfilesystem-b=up:active} 1 up:standby-replay
    osd: 3 osds: 3 up (since 2d), 3 in (since 2d)
    rgw: 1 daemon active (ocs.storagecluster.cephobjectstore.a)

  data:
    pools:   10 pools, 176 pgs
    objects: 40.06k objects, 134 GiB
    usage:   372 GiB used, 1.1 TiB / 1.5 TiB avail
    pgs:     176 active+clean

  io:
    client:   13 MiB/s rd, 167 MiB/s wr, 3.24k op/s rd, 4.08k op/s wr
sh-4.4#

Hi Scott/Mudit,

Marking this bug as FailedQA because the 'rbd trash purge' command failed. Please let me know if I missed anything in the verification procedure.
Logs collected after testing (Comment #7) http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/jijoy-lso-jun14/jijoy-lso-jun14_20210614T080510/logs/testcases_1623940737/
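For context (not part of the recorded verification): the failures at 11% and 50% above are watcher-related, and one way to check whether a trashed image still has a watcher is to query its RADOS header object directly, since RBD clients register watches on that object. A minimal sketch, assuming the standard 'rbd_header.<image-id>' object naming and taking one image ID from the 'rbd trash ls' output above:

# Hypothetical check: list watchers on the header object of a trashed image.
# A non-empty result means the image is still open/mapped somewhere, so
# 'rbd trash purge' will skip it.
rados -p ocs-storagecluster-cephblockpool listwatchers rbd_header.1077f3e0f019f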
Madhu, PTAL https://bugzilla.redhat.com/show_bug.cgi?id=1964373#c9
> Removing images: 11% complete...2021-06-17 12:34:17.594 7fb0f77fe700 -1 librbd::image::PreRemoveRequest: 0x7fb0f0005ea0 check_image_watchers: image has watchers - not removing
> Removing images: 50% complete...2021-06-17 12:51:51.384 7fb0f77fe700 -1 librbd::image::PreRemoveRequest: 0x7fb0f00053c0 check_image_watchers: image has watchers - not removing
> Removing images: 84% complete...failed.
> rbd: some expired images could not be removed
> Ensure that they are closed/unmapped, do not have snapshots (including trashed snapshots with linked clones), are not in a group, and were moved to the trash successfully.

I need to check this one. @Jilju, can you please provide the steps you used to verify it?
* Were any Kubernetes PVCs or snapshots still present when you ran the trash purge?

@Jilju, please also check Ilya's comment in #c9.
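To answer that from the Kubernetes side, a quick way (illustrative only, not part of the recorded run) to see whether any PVCs or volume snapshots were still present at purge time would be:

# Hypothetical checks: list PVCs and CSI volume snapshots across all namespaces.
oc get pvc --all-namespaces
oc get volumesnapshot --all-namespaces
oc get volumesnapshotcontent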
Madhu,

Snapshots were present when the 'rbd trash purge' command was issued. The parent PVC was not present for some of the snapshots.
As mentioned in comment #16, since snapshots (with and without a parent PVC) and PVC clones were already present, I did not create any new PVCs/snapshots for this test. So some RBD images were left behind, which is expected. Ceph status was also clean. But the output of the 'rbd trash purge' command (comment #7) shows that the command failed. The 'rbd trash purge' command should report success after skipping the relevant images. Isn't that the expected behaviour?
(In reply to Jilju Joy from comment #13)
> Madhu,
>
> Snapshots were present when rbd trash purge command was issued. Parent PVC
> was not present for some of the snapshots.
> As mentioned in comment #16, as there were snapshots (with and without

As mentioned in comment #7
Jilju, did you try deleting the images one by one after the failure, as mentioned by Ilya in comment #9?

Ilya, I have one question. The error message says that expired images could not be removed. What is the meaning of 'expired' here?

Madhu, I agree with Jilju here because an image in the trash which is the parent of a clone/snapshot can remain in the trash until all the dependants of that image are deleted. This is the expected behaviour; I think we paid this cost to have parity with Kubernetes expectations.
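As an aside, a sketch of how such a dependency could be inspected from the Ceph side (illustrative only; <image-id> stands for an ID reported by 'rbd trash ls', and the --image-id/--all options are assumed to be available in this rbd build):

# Hypothetical check: list the snapshots (including trashed ones) of a
# trashed image to see what is still keeping it in the trash.
rbd snap ls --image-id <image-id> -p ocs-storagecluster-cephblockpool --all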
(In reply to Mudit Agarwal from comment #15)
> Jilju, did you try deleting the images one by one after the failure as
> mentioned by Ilya in comment #9.

Mudit, I did not try that. The cluster was destroyed before I noticed Ilya's comment. I was expecting the 'rbd trash purge' command to succeed after skipping the images which still have watchers.
This looks like expected behaviour, as discussed in the comments above. Moving it back to ON_QA after an offline discussion with Madhu and Jilju.
Verified in version:
ocs-operator.v4.8.0-422.ci
OCP 4.8.0-0.nightly-2021-06-19-005119
ceph version 14.2.11-181.el8cp (68fea1005601531fe60d2979c56ea63bc073c84f) nautilus (stable)

Steps performed:

1. List the images present in trash.

sh-4.4# rbd trash ls ocs-storagecluster-cephblockpool
606527837db5 csi-vol-2515acb6-d344-11eb-93d0-0a580a80020f
6065338d897 csi-vol-d2a241c0-d344-11eb-93d0-0a580a80020f
60653a65abb csi-vol-15505254-d344-11eb-93d0-0a580a80020f
606550cf1430 csi-snap-cf581a31-d344-11eb-93d0-0a580a80020f
60656fd04246 csi-snap-e8a2e21d-d344-11eb-93d0-0a580a80020f
606570a61bd csi-vol-f5554124-d343-11eb-93d0-0a580a80020f
6065ab0ab2b2 csi-vol-ec061a6a-d344-11eb-93d0-0a580a80020f
6065b3d36baf csi-vol-0ac34e57-d344-11eb-93d0-0a580a80020f
sh-4.4#

2. Try to delete all images using the 'rbd trash purge' command.

sh-4.4# rbd trash purge ocs-storagecluster-cephblockpool
Removing images: 50% complete...failed.
rbd: some expired images could not be removed
Ensure that they are closed/unmapped, do not have snapshots (including trashed snapshots with linked clones), are not in a group and were moved to the trash successfully.
sh-4.4#

'rbd trash purge' did not hang; it completed after skipping the relevant images.

3. List the images remaining in trash.

sh-4.4# rbd trash ls ocs-storagecluster-cephblockpool
606527837db5 csi-vol-2515acb6-d344-11eb-93d0-0a580a80020f
60653a65abb csi-vol-15505254-d344-11eb-93d0-0a580a80020f
60656fd04246 csi-snap-e8a2e21d-d344-11eb-93d0-0a580a80020f
6065ab0ab2b2 csi-vol-ec061a6a-d344-11eb-93d0-0a580a80020f
sh-4.4#

4. Check ceph status.

sh-4.4# ceph status
  cluster:
    id:     339aeda9-4b2a-4733-b085-d33fa584ca6f
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum a,b,c (age 18h)
    mgr: a(active, since 18h)
    mds: ocs-storagecluster-cephfilesystem:1 {0=ocs-storagecluster-cephfilesystem-a=up:active} 1 up:standby-replay
    osd: 3 osds: 3 up (since 18h), 3 in (since 18h)
    rgw: 1 daemon active (ocs.storagecluster.cephobjectstore.a)

  data:
    pools:   10 pools, 176 pgs
    objects: 11.57k objects, 39 GiB
    usage:   118 GiB used, 2.9 TiB / 3 TiB avail
    pgs:     176 active+clean

  io:
    client:   2.8 KiB/s rd, 180 KiB/s wr, 3 op/s rd, 5 op/s wr
sh-4.4#

5. Try to delete each of the remaining images.

sh-4.4# rbd trash rm 606527837db5 -p ocs-storagecluster-cephblockpool
rbd: image has snapshots - these must be deleted with 'rbd snap purge' before the image can be removed.
Removing image: 0% complete...failed.
sh-4.4#
sh-4.4# rbd trash rm 60653a65abb -p ocs-storagecluster-cephblockpool
rbd: image has snapshots - these must be deleted with 'rbd snap purge' before the image can be removed.
Removing image: 0% complete...failed.
sh-4.4#
sh-4.4# rbd trash rm 60656fd04246 -p ocs-storagecluster-cephblockpool
rbd: image has snapshots - these must be deleted with 'rbd snap purge' before the image can be removed.
Removing image: 0% complete...failed.
sh-4.4#
sh-4.4# rbd trash rm 6065ab0ab2b2 -p ocs-storagecluster-cephblockpool
rbd: image has snapshots - these must be deleted with 'rbd snap purge' before the image can be removed.
Removing image: 0% complete...failed.
sh-4.4#

The attempt to delete the remaining images from trash failed. This confirms that the 'rbd trash purge' command did not skip any image that could have been deleted.
6. Check ceph status again.

sh-4.4# ceph status
  cluster:
    id:     339aeda9-4b2a-4733-b085-d33fa584ca6f
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum a,b,c (age 18h)
    mgr: a(active, since 18h)
    mds: ocs-storagecluster-cephfilesystem:1 {0=ocs-storagecluster-cephfilesystem-a=up:active} 1 up:standby-replay
    osd: 3 osds: 3 up (since 18h), 3 in (since 18h)
    rgw: 1 daemon active (ocs.storagecluster.cephobjectstore.a)

  data:
    pools:   10 pools, 176 pgs
    objects: 11.58k objects, 39 GiB
    usage:   118 GiB used, 2.9 TiB / 3 TiB avail
    pgs:     176 active+clean

  io:
    client:   5.8 KiB/s rd, 8.9 KiB/s wr, 7 op/s rd, 5 op/s wr
sh-4.4#
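For completeness, the 'rbd trash rm' failures above are expected as long as the dependent snapshots/clones still exist. A rough sketch of how cleanup would normally proceed in a CSI-managed cluster (hypothetical; the object names below are placeholders and this was not attempted as part of this verification) is to remove the owning Kubernetes objects and then retry the purge:

# Delete the Kubernetes objects that own the dependent snapshots/clones
# (hypothetical names), and let Ceph CSI remove the backing RBD snapshots.
oc delete volumesnapshot <snapshot-name> -n <namespace>
oc delete pvc <cloned-pvc-name> -n <namespace>
# Once CSI cleanup has completed, the purge should be able to remove the images.
rbd trash purge ocs-storagecluster-cephblockpool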
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Red Hat OpenShift Container Storage 4.8.0 container images bug fix and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2021:3003