Moving to ON_QA just to make sure the state follows the parent bug's state.
A bug shouldn't be ON_QA without all ACKs. :-)
And we should also ensure that the Ceph container with the fix for this BZ is available for OCS QE to test.
Humble, we can also track the ceph-csi work with this BZ, for which Yug has merged a PR. Moving it back to POST; please mark it back to ON_QA once this PR (https://github.com/ceph/ceph-csi/pull/1458) is merged into downstream 4.6.
Considering we have this in the latest downstream OCS build, I am moving it to ON_QA.
Again, as Michael said in https://bugzilla.redhat.com/show_bug.cgi?id=1854501#c3, we should not move it to ON_QA until we have all the ACKs (qa_ack is missing in this case). Also, can you paste the link to the downstream PR here?
https://github.com/openshift/ceph-csi/commit/b32569bc47d2f89a5abe9b309caf8446fbc47d4f -> I updated the d/s branch last week, soon after the upstream PR merge. I also spoke to Jilju about getting it qualified without delay to make sure we don't have any hiccups.
Apart from verifying deletion of a parent PVC that has snapshots, do we need to perform any other validation?
Tested parent PVC deletion when snapshots are present. 5 snapshots were present when the parent PVC was deleted. The PVC got deleted but the PV remained in released state.

>       raise TimeoutError(msg)
E       TimeoutError: Timeout when waiting for pvc-c6e8b731-f98b-457a-8775-7f294ddcefb4 to delete. Describe output:
E       Name:            pvc-c6e8b731-f98b-457a-8775-7f294ddcefb4
E       Labels:          <none>
E       Annotations:     pv.kubernetes.io/provisioned-by: openshift-storage.cephfs.csi.ceph.com
E       Finalizers:      [kubernetes.io/pv-protection]
E       StorageClass:    ocs-storagecluster-cephfs
E       Status:          Released
E       Claim:           namespace-test-e6e9d09fe8c64bd8a3eecbaf8738d9ab/pvc-test-d737f9a2ad5c4c30ab4c3add2a6eb630
E       Reclaim Policy:  Delete
E       Access Modes:    RWO
E       VolumeMode:      Filesystem
E       Capacity:        10Gi
E       Node Affinity:   <none>
E       Message:
E       Source:
E           Type:              CSI (a Container Storage Interface (CSI) volume source)
E           Driver:            openshift-storage.cephfs.csi.ceph.com
E           FSType:            ext4
E           VolumeHandle:      0001-0011-openshift-storage-0000000000000001-845cfd54-f918-11ea-b36b-0a580a83000f
E           ReadOnly:          false
E           VolumeAttributes:  clusterID=openshift-storage
E                              fsName=ocs-storagecluster-cephfilesystem
E                              storage.kubernetes.io/csiProvisionerIdentity=1600322755225-8081-openshift-storage.cephfs.csi.ceph.com
E                              subvolumeName=csi-vol-845cfd54-f918-11ea-b36b-0a580a83000f
E       Events:
E         Type     Reason              Age                From                                                                                                                          Message
E         ----     ------              ----               ----                                                                                                                          -------
E         Warning  VolumeFailedDelete  19s (x7 over 60s)  openshift-storage.cephfs.csi.ceph.com_csi-cephfsplugin-provisioner-7bf8fbdbdc-4nkkl_40e4b2b6-dc79-44e3-b282-490d70377b1a  rpc error: code = Internal desc = an error (exit status 39) occurred while running ceph args: [fs subvolume rm ocs-storagecluster-cephfilesystem csi-vol-845cfd54-f918-11ea-b36b-0a580a83000f --group_name csi -m 172.30.64.112:6789,172.30.128.7:6789,172.30.117.216:6789 -c /etc/ceph/ceph.conf -n client.csi-cephfs-provisioner --keyfile=***stripped***]

ocs_ci/ocs/ocp.py:655: TimeoutError

Tested in version:
OCS operator v4.6.0-88.ci
Cluster Version 4.6.0-0.nightly-2020-09-17-004654
Ceph Version 14.2.8-91.el8cp (75b4845da7d469665bd48d1a49badcc3677bf5cd) nautilus (stable)
rook_ceph             rook-ceph@sha256:43d293e1110cf657086ec8c85ed4b8561803f1df6ccbbefa6b1544b50548bbfc
rook_csi_attacher     ose-csi-external-attacher@sha256:eb7596df3ae25878c69d0ebb187a22fe29ce493457402fa9560a4f32efd5fd09
rook_csi_ceph         cephcsi@sha256:99a92c29dd4fe94db8d1a8af0c375ba2cc0994a1f0a72d7833de5cf1f3cf6152
rook_csi_provisioner  ose-csi-external-provisioner-rhel7@sha256:0f35049599d8cc80f3a611fd3d02965317284a0151e98e0177e182fe733ee47c
rook_csi_snapshotter  ose-csi-external-snapshotter-rhel7@sha256:bd81f802e9abc7869f6967828a304e84fa6a34f62ccbe96be3fdd8bf8eb143cb

Automation run result: https://ocs4-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/job/qe-deploy-ocs-cluster/12494/testReport/

Must-gather: http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/jijoy-sep17/jijoy-sep17_20200917T052511/logs/failed_testcase_ocs_logs_1600369381/test_snapshot_at_different_usage_level_ocs_logs/

Test case logs: http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/jijoy-sep17/jijoy-sep17_20200917T052511/logs/ocs-ci-logs-1600369381/tests/manage/pv_services/pvc_snapshot/test_snapshot_at_different_pvc_utlilization_level.py/TestSnapshotAtDifferentPvcUsageLevel/test_snapshot_at_different_usage_level/

ocs-ci test case: tests/manage/pv_services/pvc_snapshot/test_snapshot_at_different_pvc_utlilization_level.py

This test case was executed from https://github.com/red-hat-storage/ocs-ci/pull/2965
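For what it's worth, exit status 39 from the "fs subvolume rm" call maps to ENOTEMPTY on Linux, i.e. the subvolume still holds the snapshots, which matches the behaviour above. If it helps with triage, the retained snapshots can be listed from the rook-ceph toolbox with something along these lines (the toolbox deployment name is an assumption; the volume/subvolume/group names are taken from the describe output above):

  # list snapshots still attached to the subvolume backing the Released PV
  oc -n openshift-storage rsh deployment/rook-ceph-tools \
    ceph fs subvolume snapshot ls ocs-storagecluster-cephfilesystem \
    csi-vol-845cfd54-f918-11ea-b36b-0a580a83000f --group_name csi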
Yug, PTAL
Thanks Yug. This means that the Ceph version in use doesn't have the fix for https://bugzilla.redhat.com/show_bug.cgi?id=1848494. If I remember correctly, the fix is supposed to land in OCS with RHCS 4.1z2, but we are not yet pointing to that. Shyam, please correct me if I am wrong.
Fine. From c#15 and the follow-up comments, it's clear that the build is missing the "--retain-snaps" support, which is causing this behaviour. We have to see how to get the fixed RHCS build here, and from then on it should work as expected. Boris, do we know when the latest RHCS 4.1.z2 build will be consumed by the OCS builds? In particular, we are looking for the build that has the fix for https://bugzilla.redhat.com/show_bug.cgi?id=1848494.
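Just to illustrate what the fixed behaviour should look like (names reused from the failure output above; I'm assuming "--retain-snaps" refers to the upstream "--retain-snapshots" option), the provisioner-side removal boils down to something like:

  # remove the subvolume while retaining its snapshots (needs an RHCS build with the bz#1848494 fix)
  ceph fs subvolume rm ocs-storagecluster-cephfilesystem \
    csi-vol-845cfd54-f918-11ea-b36b-0a580a83000f --group_name csi --retain-snapshots

On a build without that option, the plain "fs subvolume rm" fails with ENOTEMPTY (exit status 39), which is the VolumeFailedDelete event seen in the failed run.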
Once we have OCS 4.6 pointing to RHCS 4.1.z2, we should not see this issue.
Moving it to ON_QA; the latest 4.6 build should have the required RHCS image.
The below ocs-ci test covers CephFS parent PVC deletion when snapshots are present. The snapshots are restored to new PVCs after deleting the parent PVC.

tests/manage/pv_services/pvc_snapshot/test_snapshot_at_different_pvc_utlilization_level.py::TestSnapshotAtDifferentPvcUsageLevel::test_snapshot_at_different_usage_level

Test case passed in version:
OCS operator v4.6.0-113.ci
Ceph Version 14.2.8-111.el8cp (2e6029d57bc594eceba4751373da6505028c2650) nautilus (stable)
Cluster Version 4.6.0-0.nightly-2020-10-08-210814
rook_csi_ceph         cephcsi@sha256:3b2fff211845eab398d66262a4c47eb5eadbcd982de80387aa47dd23f6572b22
rook_csi_snapshotter  ose-csi-external-snapshotter@sha256:0359271dc35325385c9be9a5b353cbbc870998aa17d3542ab920acc3a9d59273

Hi Yug, should I test some other scenario before marking this bug as verified?
Thanks Jilju,

The following steps should verify the snapshot retention feature:
- Create PVC and app
- Create Snapshot
- Delete the PVC and app
- Create a PVC from snapshot
- Delete the PVC and app
- Delete the snapshot

Since the test validates these, I think it should be fine.
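For reference, a rough kubectl sequence for those steps (the namespace, manifest file names, and resource names here are illustrative placeholders, not the ones the ocs-ci test uses):

  # 1. create PVC and app pod, 2. snapshot the PVC
  kubectl -n test-ns apply -f pvc.yaml -f app-pod.yaml
  kubectl -n test-ns apply -f volumesnapshot.yaml          # VolumeSnapshot referencing the PVC

  # 3. delete the PVC and app; the VolumeSnapshot should remain usable
  kubectl -n test-ns delete -f app-pod.yaml -f pvc.yaml

  # 4. restore: new PVC whose dataSource references the VolumeSnapshot, plus an app pod
  kubectl -n test-ns apply -f restored-pvc.yaml -f restored-app-pod.yaml

  # 5. delete the restored PVC and app, 6. delete the snapshot
  kubectl -n test-ns delete -f restored-app-pod.yaml -f restored-pvc.yaml
  kubectl -n test-ns delete -f volumesnapshot.yaml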
(In reply to Yug Gupta from comment #28)
> Thanks Jilju,
>
> The following steps should verify the snapshot retention feature:
> - Create PVC and app
> - Create Snapshot
> - Delete the PVC and app
> - Create a PVC from snapshot
> - Delete the PVC and app
> - Delete the snapshot
>
> Since the test validates these, I think it should be fine.

Thanks Yug.
Adding AutomationTriaged keyword.

Test case: tests/manage/pv_services/pvc_snapshot/test_snapshot_at_different_pvc_utlilization_level.py::TestSnapshotAtDifferentPvcUsageLevel::test_snapshot_at_different_usage_level
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: Red Hat OpenShift Container Storage 4.6.0 security, bug fix, enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2020:5605