This is happening inside the new go-ceph function SubVolumeInfo introduced in 4.7.
Able to reproduce the issue.

RCA: When a subvolume is deleted with the --retain-snapshots option and "ceph fs subvolume info" is then run on the same subvolume, only a very limited set of attributes is returned.

>> Normal subvolume:

sh-4.2# ceph fs subvolume info myfs m1
{
    "atime": "2021-01-29 12:29:28",
    "bytes_pcent": "undefined",
    "bytes_quota": "infinite",
    "bytes_used": 0,
    "created_at": "2021-01-29 12:29:28",
    "ctime": "2021-01-29 12:29:28",
    "data_pool": "myfs-data0",
    "features": [
        "snapshot-clone",
        "snapshot-autoprotect",
        "snapshot-retention"
    ],
    "gid": 0,
    "mode": 16877,
    "mon_addrs": [
        "10.101.86.141:6789"
    ],
    "mtime": "2021-01-29 12:29:28",
    "path": "/volumes/_nogroup/m1/972f9188-23be-46fc-a2c1-1355f2e566f0",
    "pool_namespace": "",
    "state": "complete",
    "type": "subvolume",
    "uid": 0
}

>> Subvolume deleted with retained snapshots:

sh-4.2# ceph fs subvolume info myfs csi-vol-6ed1eaac-622c-11eb-8f93-0242ac110010 csi
{
    "features": [
        "snapshot-clone",
        "snapshot-autoprotect",
        "snapshot-retention"
    ],
    "mon_addrs": [
        "10.101.86.141:6789"
    ],
    "state": "snapshot-retained",
    "type": "subvolume"
}

But the getSubVolumeInfo() function in CephCSI expects all attributes to always be present, and therefore fails when it cannot find the bytes_quota attribute.

Steps to reproduce:
1. Create a PVC.
2. Create a snapshot of it.
3. Delete the PVC.
4. Delete the snapshot, or create a clone from the snapshot (it fails in both cases).
Verified in version:
OCS 4.7.0-250.ci
OCP 4.7.0-0.nightly-2021-02-05-005950
Ceph 14.2.11-112.el8cp
rook_csi_ceph: cephcsi@sha256:c905ff5a45d888829d0d90b2e4eb3e107068e14606fb4e775314500f4358c423
rook_csi_provisioner: ose-csi-external-provisioner@sha256:0fdb8b5c8fc327f142840c24f618c156009d42257bd20405fbe5f427ab2f3e26
rook_csi_snapshotter: ose-csi-external-snapshotter@sha256:30a261dfafa3d7ff462a2b821aa8e00e15e5662dd03d474a2d83a35135a3d4db

Verified by the test case:
tests/manage/pv_services/pvc_snapshot/test_snapshot_at_different_pvc_utlilization_level.py::TestSnapshotAtDifferentPvcUsageLevel::test_snapshot_at_different_usage_level
Test result: https://ocs4-jenkins-csb-ocsqe.apps.ocp4.prod.psi.redhat.com/job/qe-deploy-ocs-cluster/249/testReport/
Run id: 1612519390

Also verified on a cluster upgraded from OCS version 4.6.2 to 4.7.0-250.ci.
OCP version: 4.7.0-0.nightly-2021-02-04-075559
Verified by the test case:
tests/manage/pv_services/pvc_snapshot/test_snapshot_at_different_pvc_utlilization_level.py::TestSnapshotAtDifferentPvcUsageLevel::test_snapshot_at_different_usage_level
Build: https://ocs4-jenkins-csb-ocsqe.apps.ocp4.prod.psi.redhat.com/job/qe-deploy-ocs-cluster/218/
Run id: 1612456597
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: Red Hat OpenShift Container Storage 4.7.0 security, bug fix, and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:2041