Description of problem (please be as detailed as possible and provide log snippets):

If the RBD provisioner leader pod is deleted while a thick PVC provisioning is in progress, the resulting volume will not be thick provisioned.

This was tested with a 20 GiB PVC. The PVC reached the Bound state, but the used size of the rbd image is only 984 MiB. This 984 MiB could be the amount that had been thick provisioned before the csi-rbdplugin-provisioner leader pod was deleted. The provisioned size is 20 GiB.

Test case error showing the rbd du output of the image and the describe output of the PV:

E       AssertionError: PVC pvc-test-f854452cb4a64712beb2ce4a0ffd691 is not thick provisioned. Rbd image csi-vol-74a77a0a-b7c1-11eb-9543-0a580a830148 expected used size: 20GiB. Actual used size 984MiB.
E       Rbd du out:
E       NAME                                          PROVISIONED  USED
E       csi-vol-74a77a0a-b7c1-11eb-9543-0a580a830148  20 GiB       984 MiB
E       PV describe:
E       Name:            pvc-487fb831-03b5-4bc6-8f67-72b5eee94a7c
E       Labels:          <none>
E       Annotations:     pv.kubernetes.io/provisioned-by: openshift-storage.rbd.csi.ceph.com
E       Finalizers:      [kubernetes.io/pv-protection]
E       StorageClass:    ocs-storagecluster-ceph-rbd-thick
E       Status:          Bound
E       Claim:           namespace-test-d57cab19c5f14cd5a75ae33eb/pvc-test-f854452cb4a64712beb2ce4a0ffd691
E       Reclaim Policy:  Delete
E       Access Modes:    RWO
E       VolumeMode:      Filesystem
E       Capacity:        20Gi
E       Node Affinity:   <none>
E       Message:
E       Source:
E           Type:              CSI (a Container Storage Interface (CSI) volume source)
E           Driver:            openshift-storage.rbd.csi.ceph.com
E           FSType:            ext4
E           VolumeHandle:      0001-0011-openshift-storage-0000000000000001-74a77a0a-b7c1-11eb-9543-0a580a830148
E           ReadOnly:          false
E           VolumeAttributes:  clusterID=openshift-storage
E                              csi.storage.k8s.io/pv/name=pvc-487fb831-03b5-4bc6-8f67-72b5eee94a7c
E                              csi.storage.k8s.io/pvc/name=pvc-test-f854452cb4a64712beb2ce4a0ffd691
E                              csi.storage.k8s.io/pvc/namespace=namespace-test-d57cab19c5f14cd5a75ae33eb
E                              imageFeatures=layering
E                              imageFormat=2
E                              imageName=csi-vol-74a77a0a-b7c1-11eb-9543-0a580a830148
E                              journalPool=ocs-storagecluster-cephblockpool
E                              pool=ocs-storagecluster-cephblockpool
E                              storage.kubernetes.io/csiProvisionerIdentity=1621299316893-8081-openshift-storage.rbd.csi.ceph.com
E                              thickProvision=true
E       Events:          <none>
E
E       assert '984MiB' == '20GiB'
E         - 984MiB
E         + 20GiB

tests/manage/pv_services/test_delete_provisioner_pod_while_thick_provisioning.py:90: AssertionError

Test case: tests/manage/pv_services/test_delete_provisioner_pod_while_thick_provisioning.py::TestDeleteProvisionerPodWhileThickProvisioning::test_delete_provisioner_pod_while_thick_provisioning

ocs-ci PR: https://github.com/red-hat-storage/ocs-ci/pull/4300

OCS and OCP must-gather logs: http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/jijoy-17may/jijoy-17may_20210517T110547/logs/failed_testcase_ocs_logs_1621332632/test_delete_provisioner_pod_while_thick_provisioning_ocs_logs/

Test case logs: http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/jijoy-17may/jijoy-17may_20210517T110547/logs/ocs-ci-logs-1621332632/by_outcome/failed/tests/manage/pv_services/test_delete_provisioner_pod_while_thick_provisioning.py/TestDeleteProvisionerPodWhileThickProvisioning/test_delete_provisioner_pod_while_thick_provisioning

===========================================================================

Version of all relevant components (if applicable):
OCS                   4.8.0-394.ci
OCP                   4.8.0-0.nightly-2021-05-15-141455
Ceph                  14.2.11-147.el8cp (1f54d52f20d93c1b91f1ec6af4c67a4b81402800) nautilus (stable)
rook_csi_provisioner  ose-csi-external-provisioner@sha256:0d1cab421c433c213d37043dd0dbaa6a2942ccf1d21d35afc32e35ce8216ddec

Does this issue impact your ability to continue to work with the product (please explain in detail what is the user impact)?
Yes, the volume will not be thick provisioned.

Is there any workaround available to the best of your knowledge?
No

Rate from 1 - 5 the complexity of the scenario you performed that caused this bug (1 - very simple, 5 - very complex)?
2

Is this issue reproducible?
Yes

Can this issue be reproduced from the UI?
Yes

If this is a regression, please provide more details to justify this:
RBD thick provisioning is a new feature in OCS 4.8.

=======================================================================

Steps to Reproduce:
1. Start creating a PVC of size 20GiB, using the "ocs-storagecluster-ceph-rbd-thick" storage class in which thick provisioning is enabled.
2. While step 1 is in progress, delete the csi-rbdplugin-provisioner leader pod.
3. Wait for the new csi-rbdplugin-provisioner pod to spin up.
4. Wait for the PVC to reach the Bound state and check the used size of the corresponding rbd image:
   rbd du -p ocs-storagecluster-cephblockpool csi-vol-74a77a0a-b7c1-11eb-9543-0a580a830148
   where "ocs-storagecluster-cephblockpool" is the pool name and "csi-vol-74a77a0a-b7c1-11eb-9543-0a580a830148" is the image name.
   (A command-level sketch of steps 2 and 4 is given under Additional info below.)

OR

Run the ocs-ci test case tests/manage/pv_services/test_delete_provisioner_pod_while_thick_provisioning.py::TestDeleteProvisionerPodWhileThickProvisioning::test_delete_provisioner_pod_while_thick_provisioning

====================================================================

Actual results:
The used size of the rbd image is not equal to the provisioned size.

Expected results:
The used size of the rbd image should be equal to the provisioned size.

Additional info:
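A rough command-level sketch of steps 2 and 4, assuming the external-provisioner publishes its leader election through a Lease object named after the CSI driver (dots replaced by dashes), that the Lease holderIdentity corresponds to the leader pod, and that a rook-ceph-tools pod is deployed; the lease name and label selector here are assumptions, and placeholders are in angle brackets:

# Find the current provisioner leader (the Lease holder), then delete that pod:
oc -n openshift-storage get lease openshift-storage-rbd-csi-ceph-com -o jsonpath='{.spec.holderIdentity}'
oc -n openshift-storage delete pod <csi-rbdplugin-provisioner-leader-pod>

# Once the PVC is Bound, compare used and provisioned sizes from the toolbox pod:
TOOLS_POD=$(oc -n openshift-storage get pod -l app=rook-ceph-tools -o name | head -n 1)
oc -n openshift-storage rsh $TOOLS_POD rbd du -p ocs-storagecluster-cephblockpool <rbd-image-name>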
If thick provisioning is aborted or restarted, the metadata of the RBD image will not contain the thick-provisioning key/value pair. Since the CreateVolume request has the thick-provision option set, an interrupted thick provision can be detected by the (missing) value in the RBD image metadata.
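For illustration, the marker can also be inspected manually from the toolbox pod with the rbd image-meta command. This is only a sketch for manual verification, not the driver's own code path, and since the exact metadata key name used by ceph-csi is not quoted in this report, listing all keys is shown instead:

# List all image-meta key/value pairs set on the image; a completed thick provision
# is expected to carry a thick-provisioning marker, an interrupted one is not.
rbd image-meta list ocs-storagecluster-cephblockpool/csi-vol-74a77a0a-b7c1-11eb-9543-0a580a830148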
Upstream change posted at https://github.com/ceph/ceph-csi/pull/2101
https://github.com/openshift/ceph-csi/pull/52 has been merged and will be included in the next build.
Verified using the test case tests/manage/pv_services/test_delete_provisioner_pod_while_thick_provisioning.py::TestDeleteProvisionerPodWhileThickProvisioning::test_delete_provisioner_pod_while_thick_provisioning, PR https://github.com/red-hat-storage/ocs-ci/pull/4300

Test case logs: http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/jnk-pr4300-b822/jnk-pr4300-b822_20210607T052437/logs/

rbd du output from the test case log:

2021-06-07 07:28:07,452 - MainThread - INFO - ocs_ci.utility.utils.exec_cmd.486 - Executing command: oc -n openshift-storage rsh rook-ceph-tools-56dc89c8c9-dw2tz rbd du -p ocs-storagecluster-cephblockpool csi-vol-8eb3c279-c761-11eb-a3b3-0a580a81020e
2021-06-07 07:28:18,042 - MainThread - DEBUG - ocs_ci.utility.utils.exec_cmd.499 - Command stdout:
NAME                                          PROVISIONED  USED
csi-vol-8eb3c279-c761-11eb-a3b3-0a580a81020e  20 GiB       20 GiB

Verified in version:
OCS operator          v4.8.0-409.ci
Cluster Version       4.8.0-0.nightly-2021-06-06-164529
Ceph Version          14.2.11-147.el8cp (1f54d52f20d93c1b91f1ec6af4c67a4b81402800) nautilus (stable)
rook_csi_provisioner  ose-csi-external-provisioner@sha256:611a895fdc5c9d3b1561cfa0eb01d67349985c2a2909f00c3b010a693667ff8a
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Red Hat OpenShift Container Storage 4.8.0 container images bug fix and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2021:3003