Bug 1854501 - [Tracker-rhcs bug 1848494] pybind/mgr/volumes: Add the ability to keep snapshots of subvolumes independent of the source subvolume
Summary: [Tracker-rhcs bug 1848494] pybind/mgr/volumes: Add the ability to keep snapshots of subvolumes independent of the source subvolume
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenShift Container Storage
Classification: Red Hat Storage
Component: csi-driver
Version: 4.6
Hardware: All
OS: All
Priority: high
Severity: high
Target Milestone: ---
Target Release: OCS 4.6.0
Assignee: Yug Gupta
QA Contact: Jilju Joy
URL:
Whiteboard:
Depends On: 1848494 1859464
Blocks:
 
Reported: 2020-07-07 15:23 UTC by Humble Chirammal
Modified: 2020-12-17 06:23 UTC
CC List: 15 users

Fixed In Version: 4.6.0-102.ci
Doc Type: No Doc Update
Doc Text:
Clone Of: 1848494
Environment:
Last Closed: 2020-12-17 06:22:31 UTC
Embargoed:




Links
System / ID                              Status   Summary                                      Last Updated
Github ceph ceph-csi pull 1458           closed   cephfs: Add support for snapshot retention   2020-12-21 08:05:15 UTC
Red Hat Product Errata RHSA-2020:5605    None     None                                         2020-12-17 06:23:46 UTC

Comment 2 Humble Chirammal 2020-08-25 15:19:02 UTC
Moving to ON_QA just to make sure the state follows the parent bug's state.

Comment 3 Michael Adam 2020-08-25 15:48:17 UTC
A bug shouldn't be ON_QA without all ACKs. :-)

Comment 4 Rejy M Cyriac 2020-08-26 09:01:25 UTC
And we should also ensure that the Ceph container with the fix for this BZ is available for OCS QE to test

Comment 5 Mudit Agarwal 2020-09-16 08:07:44 UTC
Humble, we can also use this BZ to track the ceph-csi work, for which Yug has merged a PR.
Moving it back to POST; please mark it back to ON_QA once this PR (https://github.com/ceph/ceph-csi/pull/1458) is merged to the downstream 4.6 branch.

Comment 6 Humble Chirammal 2020-09-16 10:37:19 UTC
Considering we have this in the latest downstream OCS build, I am moving it to ON_QA.

Comment 7 Mudit Agarwal 2020-09-16 10:58:29 UTC
Again, as Michael said in https://bugzilla.redhat.com/show_bug.cgi?id=1854501#c3, we should not move it to ON_QA till we have all the acks (qa_ack is missing in this case).

Also, can you paste the link to the D/S PR here?

Comment 8 Humble Chirammal 2020-09-16 11:02:48 UTC
https://github.com/openshift/ceph-csi/commit/b32569bc47d2f89a5abe9b309caf8446fbc47d4f -> I updated the d/s branch last week, soon after the upstream PR merge. I also spoke to Jilju about getting it qualified without delay to make sure we don't have any hiccups.

Comment 9 Jilju Joy 2020-09-16 14:34:35 UTC
Apart from verifying deletion of a parent PVC which contains snapshots, do we need to perform any other validation?

Comment 10 Jilju Joy 2020-09-17 19:58:33 UTC
Tested parent PVC deletion when snapshots are present. 5 snapshots were present when the parent PVC was deleted. The PVC got deleted, but the PV remained in Released state.


>               raise TimeoutError(msg)
E               TimeoutError: Timeout when waiting for pvc-c6e8b731-f98b-457a-8775-7f294ddcefb4 to delete. Describe output: Name:            pvc-c6e8b731-f98b-457a-8775-7f294ddcefb4
E               Labels:          <none>
E               Annotations:     pv.kubernetes.io/provisioned-by: openshift-storage.cephfs.csi.ceph.com
E               Finalizers:      [kubernetes.io/pv-protection]
E               StorageClass:    ocs-storagecluster-cephfs
E               Status:          Released
E               Claim:           namespace-test-e6e9d09fe8c64bd8a3eecbaf8738d9ab/pvc-test-d737f9a2ad5c4c30ab4c3add2a6eb630
E               Reclaim Policy:  Delete
E               Access Modes:    RWO
E               VolumeMode:      Filesystem
E               Capacity:        10Gi
E               Node Affinity:   <none>
E               Message:         
E               Source:
E                   Type:              CSI (a Container Storage Interface (CSI) volume source)
E                   Driver:            openshift-storage.cephfs.csi.ceph.com
E                   FSType:            ext4
E                   VolumeHandle:      0001-0011-openshift-storage-0000000000000001-845cfd54-f918-11ea-b36b-0a580a83000f
E                   ReadOnly:          false
E                   VolumeAttributes:      clusterID=openshift-storage
E                                          fsName=ocs-storagecluster-cephfilesystem
E                                          storage.kubernetes.io/csiProvisionerIdentity=1600322755225-8081-openshift-storage.cephfs.csi.ceph.com
E                                          subvolumeName=csi-vol-845cfd54-f918-11ea-b36b-0a580a83000f
E               Events:
E                 Type     Reason              Age                From                                                                                                                      Message
E                 ----     ------              ----               ----                                                                                                                      -------
E                 Warning  VolumeFailedDelete  19s (x7 over 60s)  openshift-storage.cephfs.csi.ceph.com_csi-cephfsplugin-provisioner-7bf8fbdbdc-4nkkl_40e4b2b6-dc79-44e3-b282-490d70377b1a  rpc error: code = Internal desc = an error (exit status 39) occurred while running ceph args: [fs subvolume rm ocs-storagecluster-cephfilesystem csi-vol-845cfd54-f918-11ea-b36b-0a580a83000f --group_name csi -m 172.30.64.112:6789,172.30.128.7:6789,172.30.117.216:6789 -c /etc/ceph/ceph.conf -n client.csi-cephfs-provisioner --keyfile=***stripped***]

ocs_ci/ocs/ocp.py:655: TimeoutError
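
For context: exit status 39 from the "fs subvolume rm" call maps to ENOTEMPTY on Linux, which is consistent with the mgr refusing to remove a subvolume that still carries the five snapshots, since this build lacks snapshot retention. A minimal sketch of how this could be confirmed from the rook-ceph toolbox (the toolbox label and namespace are assumptions; the filesystem, subvolume, and group names are taken from the error above):

# Assumes the rook-ceph toolbox pod is deployed in openshift-storage.
TOOLBOX=$(oc -n openshift-storage get pod -l app=rook-ceph-tools -o name | head -n1)

# List the snapshots still held by the subvolume named in the error above.
oc -n openshift-storage exec "$TOOLBOX" -- \
  ceph fs subvolume snapshot ls ocs-storagecluster-cephfilesystem \
    csi-vol-845cfd54-f918-11ea-b36b-0a580a83000f --group_name csi

# Without the retention feature, "fs subvolume rm" on this subvolume fails
# with ENOTEMPTY (exit status 39), which is what the provisioner hits.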



Tested in version: 
OCS operator 	v4.6.0-88.ci
Cluster Version 	4.6.0-0.nightly-2020-09-17-004654
Ceph Version 	14.2.8-91.el8cp (75b4845da7d469665bd48d1a49badcc3677bf5cd) nautilus (stable)
rook_ceph 	rook-ceph@sha256:43d293e1110cf657086ec8c85ed4b8561803f1df6ccbbefa6b1544b50548bbfc
rook_csi_attacher 	ose-csi-external-attacher@sha256:eb7596df3ae25878c69d0ebb187a22fe29ce493457402fa9560a4f32efd5fd09
rook_csi_ceph 	cephcsi@sha256:99a92c29dd4fe94db8d1a8af0c375ba2cc0994a1f0a72d7833de5cf1f3cf6152
rook_csi_provisioner 	ose-csi-external-provisioner-rhel7@sha256:0f35049599d8cc80f3a611fd3d02965317284a0151e98e0177e182fe733ee47c
rook_csi_snapshotter 	ose-csi-external-snapshotter-rhel7@sha256:bd81f802e9abc7869f6967828a304e84fa6a34f62ccbe96be3fdd8bf8eb143cb


Automation run result: https://ocs4-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/job/qe-deploy-ocs-cluster/12494/testReport/

Must-gather: http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/jijoy-sep17/jijoy-sep17_20200917T052511/logs/failed_testcase_ocs_logs_1600369381/test_snapshot_at_different_usage_level_ocs_logs/

Test case logs: http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/jijoy-sep17/jijoy-sep17_20200917T052511/logs/ocs-ci-logs-1600369381/tests/manage/pv_services/pvc_snapshot/test_snapshot_at_different_pvc_utlilization_level.py/TestSnapshotAtDifferentPvcUsageLevel/test_snapshot_at_different_usage_level/

ocs-ci test case: tests/manage/pv_services/pvc_snapshot/test_snapshot_at_different_pvc_utlilization_level.py
This test case was executed from https://github.com/red-hat-storage/ocs-ci/pull/2965

Comment 11 Mudit Agarwal 2020-09-18 03:52:11 UTC
Yug, PTAL

Comment 16 Mudit Agarwal 2020-09-18 09:10:35 UTC
Thanks Yug, this means that the Ceph version in use doesn't have the fix for https://bugzilla.redhat.com/show_bug.cgi?id=1848494

If I remember correctly, the fix is supposed to land in OCS with RHCS 4.1z2 but we are not yet pointing to that.
Shyam, please correct me if I am wrong.

Comment 19 Humble Chirammal 2020-09-18 10:54:47 UTC
Fine, from c#15 and follow-up comments, it's clear that we are missing the "--retain-snaps" capability in the build, which is causing this behaviour.

We have to see how to get the fixed RHCS build here; from then on it should work as expected.

Boris, do we know when the latest build of RHCS 4.1.z2 will be consumed by the OCS builds? Specifically, we are looking for the build which has the https://bugzilla.redhat.com/show_bug.cgi?id=1848494 fix in it.
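
For reference, a rough sketch of what the fixed build provides: the mgr/volumes "fs subvolume rm" command gains a --retain-snapshots option (the "--retain-snaps" capability referred to above), so the CSI driver can remove the subvolume while keeping its snapshots. The filesystem, subvolume, and group names below are reused from comment 10 purely for illustration:

# On an RHCS build that carries the bug 1848494 fix:
ceph fs subvolume rm ocs-storagecluster-cephfilesystem \
  csi-vol-845cfd54-f918-11ea-b36b-0a580a83000f --group_name csi --retain-snapshots

# The subvolume is removed, but its snapshots should remain listable:
ceph fs subvolume snapshot ls ocs-storagecluster-cephfilesystem \
  csi-vol-845cfd54-f918-11ea-b36b-0a580a83000f --group_name csi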

Comment 20 Mudit Agarwal 2020-09-21 02:50:24 UTC
Once we have OCS 4.6 pointing to RHCS 4.1.z2, we should not see this issue.

Comment 26 Mudit Agarwal 2020-09-30 05:50:40 UTC
Moving it to ON_QA; the latest 4.6 build should have the required RHCS image.

Comment 27 Jilju Joy 2020-10-09 09:46:44 UTC
The below ocs-ci test covers CephFS parent PVC deletion when snapshots are present. The snapshots will be restored to new PVCs after deleting the parent PVC.

tests/manage/pv_services/pvc_snapshot/test_snapshot_at_different_pvc_utlilization_level.py::TestSnapshotAtDifferentPvcUsageLevel::test_snapshot_at_different_usage_level

Test case passed in version:

OCS operator 	v4.6.0-113.ci
Ceph Version 	14.2.8-111.el8cp (2e6029d57bc594eceba4751373da6505028c2650) nautilus (stable)
Cluster Version 	4.6.0-0.nightly-2020-10-08-210814
rook_csi_ceph 	cephcsi@sha256:3b2fff211845eab398d66262a4c47eb5eadbcd982de80387aa47dd23f6572b22
rook_csi_snapshotter 	ose-csi-external-snapshotter@sha256:0359271dc35325385c9be9a5b353cbbc870998aa17d3542ab920acc3a9d59273


Hi Yug,

Should I test some other scenario before marking this bug as verified?

Comment 28 Yug Gupta 2020-10-09 10:05:45 UTC
Thanks Jilju,

The following steps should verify the snapshot retention feature:
- Create PVC and app
- Create Snapshot
- Delete the PVC and app
- Create a PVC from snapshot
- Delete the PVC and app
- Delete the snapshot

Since the test validates these, I think it should be fine.
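
A minimal sketch of that verification flow on the OpenShift side, assuming the default OCS 4.6 CephFS storage class and snapshot class names (pvc.yaml, app.yaml, and all object names below are illustrative, not taken from the test case):

# 1. Create a PVC and an app pod that uses it.
oc apply -f pvc.yaml -f app.yaml

# 2. Snapshot the PVC (snapshot class name is an assumed OCS default).
cat <<EOF | oc apply -f -
apiVersion: snapshot.storage.k8s.io/v1beta1
kind: VolumeSnapshot
metadata:
  name: cephfs-pvc-snap
spec:
  volumeSnapshotClassName: ocs-storagecluster-cephfsplugin-snapclass
  source:
    persistentVolumeClaimName: cephfs-pvc
EOF

# 3. Delete the parent PVC and app; with snapshot retention in place the
#    PV/subvolume cleanup should now succeed even though the snapshot exists.
oc delete -f app.yaml -f pvc.yaml

# 4. Restore the snapshot into a new PVC (storage class and size taken from
#    the PV output in comment 10) and attach an app to it.
cat <<EOF | oc apply -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: cephfs-pvc-restore
spec:
  storageClassName: ocs-storagecluster-cephfs
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
  dataSource:
    apiGroup: snapshot.storage.k8s.io
    kind: VolumeSnapshot
    name: cephfs-pvc-snap
EOF

# 5. Clean up: delete the restored PVC and app, then the snapshot.
oc delete pvc cephfs-pvc-restore
oc delete volumesnapshot cephfs-pvc-snap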

Comment 29 Jilju Joy 2020-10-12 15:41:00 UTC
(In reply to Yug Gupta from comment #28)
> Thanks Jilju,
> 
> The following steps should verify the snapshot retention feature:
> - Create PVC and app
> - Create Snapshot
> - Delete the PVC and app
> - Create a PVC from snapshot
> - Delete the PVC and app
> - Delete the snapshot
> 
> Since the test validates these, I think it should be fine.

Thanks Yug.

Comment 30 Jilju Joy 2020-10-12 15:44:54 UTC
Adding AutomationTriaged keyword.
Test case tests/manage/pv_services/pvc_snapshot/test_snapshot_at_different_pvc_utlilization_level.py::TestSnapshotAtDifferentPvcUsageLevel::test_snapshot_at_different_usage_level

Comment 32 errata-xmlrpc 2020-12-17 06:22:31 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: Red Hat OpenShift Container Storage 4.6.0 security, bug fix, enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5605

