Bug 1870334 - Ceph-CSI should allow providing a different size for creating Clone PVC/restore PVC
Summary: Ceph-CSI should allow providing a different size for creating Clone PVC/resto...
Keywords:
Status: CLOSED DEFERRED
Alias: None
Product: Red Hat OpenShift Data Foundation
Classification: Red Hat Storage
Component: csi-driver
Version: 4.6
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: high
Target Milestone: ---
Assignee: Yug Gupta
QA Contact: Jilju Joy
URL:
Whiteboard:
Depends On:
Blocks: 1882359
 
Reported: 2020-08-19 19:15 UTC by Neha Berry
Modified: 2023-08-09 16:37 UTC (History)
12 users

Fixed In Version: 4.10.0-113
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-02-01 13:17:11 UTC
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Github ceph ceph-csi pull 2716 0 None Merged rbd: add support for bigger size restore/clone PVC 2022-01-31 07:42:24 UTC

Description Neha Berry 2020-08-19 19:15:53 UTC
Description of problem (please be detailed as possible and provide log
snippets):
----------------------------------------------------------------------
While discussing clone support for CSI, we found that even though Kubernetes allows providing a size greater than the source PVC size, Ceph-CSI does not support it yet for OCS 4.6.

Hence, if a user provides a different size during clone creation (from the UI or CLI), the PVC stays in Pending state and retries keep happening.

For blocking this from the UI, Bug 1870331 was raised.

It would be good to block the creation from the CLI as well, so that the endless creation retry loop does not happen.

Though this is expected behavior given the current clone design for Ceph-CSI, it would be good to provide a way to block it.


Current Observation:
-----------------------
If we create a clone via the CLI/UI with a size different from the source PVC, the clone PVC stays in Pending state and retries keep appearing in the logs:

  Warning  ProvisioningFailed    3m30s (x14 over 27m)  openshift-storage.rbd.csi.ceph.com_csi-rbdplugin-provisioner-58ffd4559d-pqdb6_e2222a36-5810-4ccd-b322-6c6c336444e2  failed to provision volume with StorageClass "ocs-storagecluster-ceph-rbd": rpc error: code = InvalidArgument desc = size missmatch, requested volume size 21474836480 and source volume size 10737418240
  Normal   ExternalProvisioning  119s (x105 over 27m)  persistentvolume-controller     
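
As a quick sanity check on the arithmetic: the raw byte counts in the error correspond exactly to a 20Gi request against a 10Gi source (1Gi = 2^30 = 1073741824 bytes). A minimal shell check:

```shell
# Convert the byte values from the provisioner error to Gi to confirm
# the requested 20Gi clone vs. the 10Gi source volume.
GI=1073741824                       # bytes per Gi (2^30)
echo "$((21474836480 / GI))Gi"      # requested clone size -> 20Gi
echo "$((10737418240 / GI))Gi"      # source volume size   -> 10Gi
```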


Version of all relevant components (if applicable):
----------------------------------------------------------------------

$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.6.0-0.nightly-2020-08-18-165040   True        False         4h5m    Cluster version is 4.6.0-0.nightly-2020-08-18-165040

$ oc get csv -n openshift-storage
NAME                         DISPLAY                       VERSION        REPLACES   PHASE
ocs-operator.v4.6.0-533.ci   OpenShift Container Storage   4.6.0-533.ci              Succeeded

sh-4.4# ceph version
ceph version 14.2.8-91.el8cp (75b4845da7d469665bd48d1a49badcc3677bf5cd) nautilus (stable)
sh-4.4# 



Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?
----------------------------------------------------------------------
Yes, the clone PVC stays in Pending state and endless retries keep happening.

Is there any workaround available to the best of your knowledge?
----------------------------------------------------------------------
Not sure

Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?
----------------------------------------------------------------------

2

Is this issue reproducible?
----------------------------------------------------------------------

Yes

Can this issue be reproduced from the UI?
----------------------------------------------------------------------
Yes

If this is a regression, please provide more details to justify this:
----------------------------------------------------------------------
No

Steps to Reproduce:
----------------------------------------------------------------------
1. Create an OCS + OCP 4.6 cluster on AWS/VMware
2. Create a PVC from the UI: Storage -> PersistentVolumeClaims -> Create Persistent Volume Claim
3. Once it is Bound, create a clone PVC from the UI or CLI, providing a size different from the source volume

4. CLI YAML:

$ cat clone.yaml 
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: test-clone-cli
  namespace: test-clone
spec:
  storageClassName: ocs-storagecluster-ceph-rbd 
  dataSource:
    name: test-pvc
    kind: PersistentVolumeClaim 
    apiGroup: ""
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 20Gi # NOTE this capacity must be specified and must be >= the capacity of the source volume
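
Until bigger-size clones are supported, a practical way to avoid the Pending loop from the CLI is to request exactly the source PVC's size. A hedged sketch (the PVC and namespace names reuse this bug's example; the hardcoded size stands in for querying the live source PVC):

```shell
# Generate a clone PVC manifest whose requested size matches the source
# PVC exactly, so the RBD provisioner does not reject it with a size
# mismatch. "10Gi" must equal the source PVC's requested storage.
SRC_SIZE="10Gi"

cat > clone-same-size.yaml <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: test-clone-cli
  namespace: test-clone
spec:
  storageClassName: ocs-storagecluster-ceph-rbd
  dataSource:
    name: test-pvc
    kind: PersistentVolumeClaim
    apiGroup: ""
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: ${SRC_SIZE}
EOF

echo "wrote clone-same-size.yaml requesting ${SRC_SIZE}"
```

In a live cluster the size could be read from the source PVC first, e.g. `oc get pvc test-pvc -n test-clone -o jsonpath='{.spec.resources.requests.storage}'`, and the manifest then applied with `oc apply -f clone-same-size.yaml`.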


Actual results:
----------------------------------------------------------------------
The UI and CLI both allow providing a different size for the clone PVC, and the resulting clone PVC stays in Pending state.

$ oc get pvc
NAME             STATUS    VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS                  AGE
test-clone-cli   Pending                                                                        ocs-storagecluster-ceph-rbd   9m21s
test-pvc         Bound     pvc-222051c3-68ed-4650-8142-856f4f5af361   10Gi       RWO            ocs-storagecluster-ceph-rbd   39m
test-pvc-clone   Pending                                                                        ocs-storagecluster-ceph-rbd   36m



Expected results:
----------------------------------------------------------------------
The UI and the CLI should both handle this more gracefully by not allowing a different size for the clone PVC, or at least by displaying an error message.


Additional info:
----------------------------------------------------------------------


$ oc describe pvc test-clone-cli
Name:          test-clone-cli
Namespace:     test-clone
StorageClass:  ocs-storagecluster-ceph-rbd
Status:        Pending
Volume:        
Labels:        <none>
Annotations:   volume.beta.kubernetes.io/storage-provisioner: openshift-storage.rbd.csi.ceph.com
Finalizers:    [kubernetes.io/pvc-protection]
Capacity:      
Access Modes:  
VolumeMode:    Filesystem
DataSource:
  APIGroup:  
  Kind:      PersistentVolumeClaim
  Name:      test-pvc
Mounted By:  <none>
Events:
  Type     Reason                Age                   From                                                                                                                Message
  ----     ------                ----                  ----                                                                                                                -------
  Normal   Provisioning          2m13s (x11 over 10m)  openshift-storage.rbd.csi.ceph.com_csi-rbdplugin-provisioner-58ffd4559d-pqdb6_e2222a36-5810-4ccd-b322-6c6c336444e2  External provisioner is provisioning volume for claim "test-clone/test-clone-cli"
  Warning  ProvisioningFailed    2m13s (x11 over 10m)  openshift-storage.rbd.csi.ceph.com_csi-rbdplugin-provisioner-58ffd4559d-pqdb6_e2222a36-5810-4ccd-b322-6c6c336444e2  failed to provision volume with StorageClass "ocs-storagecluster-ceph-rbd": rpc error: code = InvalidArgument desc = size missmatch, requested volume size 21474836480 and source volume size 10737418240
  Normal   ExternalProvisioning  40s (x43 over 10m)    persistentvolume-controller                                                                                         waiting for a volume to be created, either by external provisioner "openshift-storage.rbd.csi.ceph.com" or manually created by system administrator

Comment 4 Humble Chirammal 2020-08-20 03:24:06 UTC
Neha, thanks for opening this bug as a follow-up to our discussion.

I would like to share a couple of thoughts here:

First of all, this scenario (we will be supporting same-size restore only) has been communicated since the EPIC readout, and it will be documented as well. We had also given this requirement to the UI team from the start; the only reason they have not enforced it yet is that the code is shared between the OCP and OCS drivers.

We will check what more we can do here from the CSI driver side.

In the meantime, I would like to request a couple of experiments from your end:

1) Can you try the same for CephFS now with the OCS 4.6 build? It should work.
2) Can you also try editing or patching the PVC to match the size of the parent PVC while it is pending with this error?

Comment 5 Neha Berry 2020-08-20 11:35:55 UTC
(In reply to Humble Chirammal from comment #4)
> 1) Can you try the same for CephFS now with OCS 4.6 build  ? -> it should
> work
> 2) Can you also try editing or patching the PVC to match the size of parent
> PVC when its pending and while you have this error ?

I can try it in a day or two. Thanks

Comment 6 Elena Bondarenko 2020-08-24 13:11:27 UTC
I tried cloning with a bigger size than original for CephFS today, it worked without any issues.

Comment 7 Jilju Joy 2020-08-26 13:36:22 UTC
(In reply to Humble Chirammal from comment #4)
> Neha, thanks for opening this bug as a followup for our discussion.
> 
> I would like to share couple of thoughts here:
> 
> First of all, this scenario was communicated ( we will be supporting same
> size restore)  since EPIC readout. This will be documented as well. We had
> also provided this requirement to UI team since start and the only reason
> for them to not enable till now was, the code is general one for OCP and OCS
> drivers.
> 
> We will check on what extra we can do here from CSI driver:
> 
> But, I would like to request couple of experiments from your end:
> 
> 1) Can you try the same for CephFS now with OCS 4.6 build  ? -> it should
> work
As per comment #6, this works.
> 2) Can you also try editing or patching the PVC to match the size of parent
> PVC when its pending and while you have this error ?
For pending rbd PVC, editing the requested storage does not work.

$ oc patch PersistentVolumeClaim test-clone-pvc-clone -p '{"spec": {"resources": {"requests": {"storage": "5Gi"}}}}'
The PersistentVolumeClaim "test-clone-pvc-clone" is invalid: 
* spec: Forbidden: spec is immutable after creation except resources.requests for bound claims
  core.PersistentVolumeClaimSpec{
  	AccessModes: []core.PersistentVolumeAccessMode{"ReadWriteOnce"},
  	Selector:    nil,
  	Resources: core.ResourceRequirements{
  		Limits: nil,
- 		Requests: core.ResourceList{
- 			s"storage": {i: resource.int64Amount{value: 5368709120}, s: "5Gi", Format: "BinarySI"},
- 		},
+ 		Requests: core.ResourceList{
+ 			s"storage": {i: resource.int64Amount{value: 10737418240}, s: "10Gi", Format: "BinarySI"},
+ 		},
  	},
  	VolumeName:       "",
  	StorageClassName: &"ocs-storagecluster-ceph-rbd",
  	... // 2 identical fields
  }

* spec.resources.requests.storage: Forbidden: field can not be less than previous value

Comment 8 Humble Chirammal 2020-08-26 13:49:14 UTC
Thanks Jilju, yes, that is the expected result. :) Shrinking to a smaller value is blocked once the size is set, which is also the actual blocker for recovering from a failed resize; I just wanted to double-confirm this at the very first provisioning.

The CephFS result is actually temporary until we have a core fix in CephFS for the clone size attribute. Once it is available, we will disable bigger-size clones there as well, to keep parity between our RBD and CephFS drivers.

We will also explore whether we could support this for both. But I don't want to introduce any confusion here: our current plan is to stick with the requirement agreed at the start of the feature planning, which is that we will not allow volume restore/clone at a bigger size.

Comment 9 Mudit Agarwal 2020-09-29 09:19:49 UTC
Humble, have we reached a consensus here?
Should we close this as WONT_FIX, or move it to the next release if there is a plan to do this in the future?

Comment 11 Humble Chirammal 2020-09-29 10:04:12 UTC
Targeting this to OCS 4.7 or beyond, as we need to get support from the upstream Kubernetes side.

Comment 12 Mudit Agarwal 2020-11-04 15:29:53 UTC
Madhu, should we add doc text for this known issue?

Comment 18 Mudit Agarwal 2021-09-07 10:56:05 UTC
Still dependent on upstream Kubernetes changes; this can't be fixed in 4.9.

Comment 19 Niels de Vos 2021-11-15 13:46:55 UTC
Yug, it seems that the Cinder CSI-driver does a resize in NodeStageVolume() now. This looks like something that we can do as well.

See https://github.com/kubernetes/cloud-provider-openstack/pull/1563 for details.

Comment 27 Mudit Agarwal 2022-02-01 13:17:11 UTC
Sure.

